Orthogonal Persistence

103 points by mpweiher 2 years ago

geophile 2 years ago

That phrase certainly brings back memories, from when I worked at an object-oriented database startup.

The object-orientation was actually pretty unimportant (except for those products that brought in persistence via inheritance -- so not really orthogonal). No, the point was adding a new storage class to programming languages.

I worked at Object Design, and we had (IMHO) an incredibly elegant approach. In our approach, persistence really was orthogonal to type, for C/C++. If you want a FooBar, you would write "new FooBar(123)". That gives you a FooBar in the heap, disappears at process end (or on deletion), etc. Or you could write "new(db) FooBar(123)", and then on commit (we had transactions of course), the FooBar would be in the database, and accessible by other processes.
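To make "persistence is orthogonal to type" concrete, here is a rough Python analogue. It is only a sketch under stated assumptions, not Object Design's API: the `Database` class, its `new`/`get`/`commit` methods, and the JSON file format are all hypothetical, and there is no page faulting or real transaction machinery. The point it illustrates is that `FooBar` is the same class whether it is allocated transiently or in the database.

```python
import json
import os
import tempfile


class FooBar:
    """An ordinary class; nothing in it mentions persistence."""

    def __init__(self, n):
        self.n = n


class Database:
    """Hypothetical toy store: saves the fields of named objects on commit."""

    def __init__(self, path):
        self.path = path
        self._live = {}   # name -> object allocated "in the database"
        self._saved = {}  # name -> field dict as of the last commit
        if os.path.exists(path):
            with open(path) as f:
                self._saved = json.load(f)

    def new(self, name, cls, *args):
        """Counterpart of `new(db) FooBar(123)`: same class, other lifetime."""
        obj = cls(*args)
        self._live[name] = obj
        return obj

    def get(self, name, cls):
        """Rebuild a committed object from its saved fields."""
        if name not in self._live:
            obj = cls.__new__(cls)
            obj.__dict__.update(self._saved[name])
            self._live[name] = obj
        return self._live[name]

    def commit(self):
        """Make every database-allocated object visible to later openers."""
        for name, obj in self._live.items():
            self._saved[name] = dict(vars(obj))
        with open(self.path, "w") as f:
            json.dump(self._saved, f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.json")
    db = Database(path)
    transient = FooBar(123)                 # like `new FooBar(123)`
    persistent = db.new("fb", FooBar, 123)  # like `new(db) FooBar(123)`
    persistent.n += 1
    db.commit()
    # Opening the file again stands in for a second process.
    reopened = Database(path).get("fb", FooBar)
    return transient.n, reopened.n
```

The class definition never changes; only the allocation site decides the lifetime, which is the sense in which persistence acts as a storage class here.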

A page-faulting mechanism would bring in pages containing locations that your program referenced. That itself was very elegant.

But the really beautiful thing about this architecture was getting it to work in a 32-bit address space. We did some clever things about mapping portions of the address space during the faulting process to make things work transparently. (This problem pretty much disappears with a 64-bit address space.)

Separate from all that, we had a collection library, integrated with an OO query language. E.g., you could have a collection of widgets in your database, write "widgets[: weight < 0.01 and !strcmp(color, 'red') :]", and get back a set containing the qualifying widgets. We also supported 1:1, 1:n, and m:n relationships, which would maintain pointers and sets of pointers in both directions.
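For readers who have not seen this style of query, the following Python sketch shows roughly what the widget query and a 1:n relationship amount to. It is an illustration only: `Widget`, `light_red`, `Part`, and `Assembly` are hypothetical names, and nothing here reflects the actual collection library or query syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    name: str
    weight: float
    color: str


def light_red(widgets):
    """Analogue of widgets[: weight < 0.01 and !strcmp(color, 'red') :]."""
    return {w for w in widgets if w.weight < 0.01 and w.color == "red"}


class Part:
    def __init__(self, name):
        self.name = name
        self.assembly = None  # the "1" side: pointer back to the owner


class Assembly:
    def __init__(self, name):
        self.name = name
        self.parts = set()  # the "n" side: set of pointers to members

    def add(self, part):
        """Update both directions so the two sides never disagree."""
        if part.assembly is not None:
            part.assembly.parts.discard(part)
        self.parts.add(part)
        part.assembly = self
```

Moving a part from one assembly to another through `add` fixes up the old owner's set as well, which is the "maintained in both directions" property described above.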

It was a "database system" because our VCs wanted it to be. But it really wasn't. It really was a new storage class for C/C++, and later, for Smalltalk and Java.

Object Design also had a spectacularly talented group of engineers, many of whom came from MIT AI Lab/Symbolics.

aeontech 2 years ago

That sounds fascinating! Does anything like this exist now in the open source world?
- geophile 2 years ago
  
  Not that I know of.
- jinjin2 2 years ago
  
  The closest I know of is Realm, which is an amazing object database and seems to do a lot of what the op describes. But after the acquisition by MongoDB they seem to have drifted a bit more in a document database direction.
jdougan 2 years ago

How would you compare it to Gemstone/S? I spent a large chunk of the 90s working on a maintenance management system with a GS back end.
- mpweiher 2 years ago
  
  Can you describe your experience working with that GS backend?
  
  jdougan 2 years ago
  
  What is it you want to know?
- geophile 2 years ago
  
  I could have answered that question as much as 20 years ago. It’s been a while …

082349872349872 2 years ago

See the end for a discussion of "Unfriendly Persistence": https://github.com/mighty-gerbils/gerbil-persist/blob/master...

(the data you'd like to keep is more volatile than you'd wish, but the data others keep on you is much less volatile than you'd wish)

convolvatron 2 years ago

having this as a model would be lovely and I agree wholeheartedly with the exposition. it's interesting though to think about what impact this model would have on programming. a lot of our processing and tooling are built around the notion that programs are _almost_ right, and that we can bring them in and out - and hopefully in the process our precious data hasn't been mangled.

when we express state directly in programs, we gain a lot, but our notion of trashy disposable execution goes away and now we have to think a lot more about how that system evolves.

usrusr 2 years ago

Reminds me of the discussions on hn when Intel Optane wasn't quite dead yet. Those always seemed to end with the conclusion that if separation between volatile and persistent memory had not been forced on us by technological reality, it would be a concept we'd better have invented at some point.
- catskul2 2 years ago
  
  Think you can find any of those discussions? I'd be curious to read/browse.
  
  usrusr 2 years ago
  
  This is the one that was featured in my memory:
  https://news.ycombinator.com/item?id=32314814
  But the most recent one is also interesting: https://news.ycombinator.com/item?id=38527437
- 082349872349872 2 years ago
  
  I was tangentially involved with an orthogonally persistent OS, and we indeed had had to reinvent a distinct journalling channel and special optionally-volatile storage, for DBMS-style applications.
  
  kragen 2 years ago
  
  other reasons to need optionally-volatile storage include secure encryption key generation (reusing randomness often fatally compromises it) and device drivers (if you restore the internal state of your device driver from a checkpoint, but not the state of the device, you will probably crash the system the next time the driver tries to frob the device)
  despite this, virtual machine checkpoints in qemu work well enough for many purposes
- cmrdporcupine 2 years ago
  
  I am not sure about this?
  Volatile memory is at this time merely an outgrowth of the uptime of the system. Back when people routinely turned their machines "off and on again", it became part of that convention. But now uptime can be measured in years, and even personal laptops can enter and exit suspended state for weeks on end without clearing volatile memory.
  What we have developed in software systems to accommodate this on long running processes is garbage collection.
  If the volatile/non-volatile distinction had never developed, all that would have happened is that R&D into garbage collection would have been more intense, and earlier.
  In fact Lisp had garbage collection from day 1.
  Systems like Smalltalk were also built from the ground up on an image-based model where all reachable state was persistent.
  In other words: transient data does not necessitate volatile memory. It necessitates garbage collection, though. (And likely also a distinction in programming between "performant" memory areas and non-performant, assuming our NV storage is the latter.)
  In a way, programmers having to deal with their garbage upfront and not relying on "have you tried turning it off and on again?" could have created better software engineering practices earlier? Maybe?
  
  48864w6ui 2 years ago
  
  Back in the days when minicomputers (which required a walk to the air-conditioned machine room to reboot) and microcomputers (which had a case or keyboard switch) coexisted, the former were way less flaky than the latter.

couchand 2 years ago

> Atomic sections must be short: long atomic sections will break liveness, i.e. may cause the system to become unresponsive.

> ...

> Based on disk latency, we may target say a millisecond as duration before which to commit the current transaction. When the timer is reached, the transaction is delayed until all current atomic sections are completed; and (possibly after a grace period) new atomic sections are blocked from even being started, until after the transaction is committed.

Maybe I'm reading this wrong, but the limitations on transaction duration seem to be disqualifying for real usage? If it's not possible to run an atomic transaction for longer than a few milliseconds without bringing the system down?

thom 2 years ago

This was very fashionable 15-20 years ago, in both application and OS research. One such Java framework:

https://prevayler.org/

Less ambitious than TFA overall, I grant you.

sillywalk 2 years ago

I'm not sure if it counts as "orthogonal persistence", but there's the Aurora Operating System[0] from 2021. It's apparently based on FreeBSD, and can run most unmodified apps, as well as having an API to support its Store.
"We present the Aurora single level store (SLS), an OS that simplifies persistence by automatically persisting all traditionally ephemeral application state. With recent storage hardware like NVMe SSDs and NVDIMMs, Aurora is able to continuously checkpoint entire applications with millisecond granularity. Aurora is the first full POSIX single level store to handle complex applications ranging from databases to web browsers"
[0] https://rcs.uwaterloo.ca/pubs/hotos21-aurora.pdf
mdaniel 2 years ago

BSD 3-clause https://github.com/prevayler/prevayler/blob/master/LICENSE.t...

qazxcvbnm 2 years ago

> Persistence is Orthogonal to the Data Model, ...

I have some experience with a custom data runtime where the persistence is orthogonal to the data model, with silhouettes reminiscent of the described solutions in many of the features of my system, including multiple orthogonal/model-agnostic persistence backends, automatic data synchronisation, persistable executions, automatable schema changes, automatic reactivity.

This direction can indeed bring about great savings in various parts of development; however, it seems to me that more subtlety than indicated in the post is required.

The programmer must be provided with ergonomic means to give denotations for things like when and where to persist, in order to reduce data movement, and to keep the system performant (this does not violate orthogonality; we may specify e.g. to persist at the logical location, say, in the cloud, without having to specify the physical persistence). For instance, considering the case of schema changes, unless the system bundles its language inside the database, for performance's sake, to perform such changes in an "Orthogonal Persistence" system external to the database would take a completely disproportionate amount of time relative to using SQL in the database. The data runtime I work with uses the idea of lenses (where valid lenses would necessarily be reversible) to allow for coherent, undoable schema changes, but I still resort to SQL for regular (eager) migrations (the lenses system for schema changes can still be useful for migrations applied lazily).
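As a rough illustration of the lens idea (a Python sketch under my own assumptions, not the runtime described above), a schema change can be modelled as a pair of functions that convert a record forward and back. The names `Lens`, `rename_field`, and `scale_field`, and the example fields, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lens:
    """A reversible schema change: forward migrates, backward undoes it."""
    forward: Callable[[dict], dict]
    backward: Callable[[dict], dict]

    def then(self, other):
        """Compose two lenses; the undo runs the steps in reverse order."""
        return Lens(
            forward=lambda rec: other.forward(self.forward(rec)),
            backward=lambda rec: self.backward(other.backward(rec)),
        )


def rename_field(old, new):
    def move(src, dst):
        def apply(rec):
            out = dict(rec)  # copy so the input record is left untouched
            out[dst] = out.pop(src)
            return out
        return apply
    return Lens(forward=move(old, new), backward=move(new, old))


def scale_field(name, factor):
    return Lens(
        forward=lambda rec: {**rec, name: rec[name] * factor},
        backward=lambda rec: {**rec, name: rec[name] / factor},
    )


# Example change: store weights in grams under a new field name.
v1_to_v2 = rename_field("weight", "weight_g").then(scale_field("weight_g", 1000))
```

Because each record is converted independently, a lens like `v1_to_v2` could be applied lazily when a record is read, while an eager migration of a whole table is still likely to be faster as SQL inside the database.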

jerf 2 years ago

Or, to put it another way: like visual programming, like "programming languages should be able to wear syntaxes like themes", like "there ought to be some sort of nocode type solution with all the power of conventional programming but easy enough for anyone to pick up", there are reasons why this is not how all programming works already. Good ones and big ones. And none of those reasons are that nobody has had the idea before or put work into implementing it. If you want to succeed with an approach like this, you're going to need to understand them.
To be honest, such experience as I've had with automated persistence has generally actually strongly convinced me of the opposite, that it is a positive good that we do not get persistence everywhere. Consider the understanding that we get from functional programming that state is generally dangerous and to be carefully managed. Pervasive persistence fights hard against that careful management. Now state is not just in your program up until the OS process is terminated, but it's all permanently and automatically persisted. You get a huge new class of bugs involving path dependence on what bits of code were running across what bits of state when, and who ran which versions, and you hit them all the time, and they are nightmares to debug. At least when the program has the courtesy to completely cease existing and leave some particular concrete bit of state behind for the future, and then run through your code to load it back from that location, you have boundaries, and procedures for minimization and reconstruction. I actually shy away from too much automated persistence, and also have a very skeptical eye on the ever-present promise of memory that is as fast as RAM but persists like SSDs... I rather expect the computing world will discover that "rebooting" is not just a crutch, but actually a pretty fundamental and useful tool. However much in theory your software should never need it, in practice it's just too useful.
That said, best of luck to those jousting with this windmill. I'm not saying don't joust, people in general probably don't joust enough, I'm just saying, learn the history of why this hasn't worked before and learn the challenges. Success is at the very least more likely if one learns from the previous efforts.
- dTal 2 years ago
  
  Offtopic perhaps, but I am interested in reading an explanation of the good, big reasons why not "programming languages should be able to wear syntaxes like themes". Racket seems quite an interesting counterpoint, and I've never heard it argued as fundamentally flawed.
  
  jerf 2 years ago
  
  The people who think syntax should be worn like themes do not realize how deeply syntax interacts with a language. It isn't a skin.
  Your homework, should you choose to do it, because of course I can't make you and don't really care :), is to write a "Haskell theme" for Python. The theme must retain *args and **kwargs capabilities, as well as all other Python capabilities, though those two things will be one of the first major issues you hit. On the flip side, write a Python theme for Haskell. (This one based on my own stabs at it doesn't have such problems with the capabilities of the language, but it sure does take all of Haskell's elegance and wrap it in the grace of an elephant with a sprained leg.)
  Also, bear in mind my claim is not that it can't be done. My claim is more like, nobody would want to use the resulting language with the resulting "skin". Languages are not all the same. Even dynamic versus static imperative languages don't really "skin" very well; compare idiomatic Ruby ActiveRecord-based code with Rust code. The differences are not just skin-deep.

pilgrim0 2 years ago

It seems like the Tuple Space [1] model for distributed computing, put forth by Linda, lends itself perfectly to this use case. I very much appreciate the ratio of simplicity to power offered by tuple spaces. In its original form maybe it’s too simple, but there are many ways it can be improved upon and brought into modernity.

[1] https://en.m.wikipedia.org/wiki/Tuple_space
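
For anyone unfamiliar with the model, a minimal single-process sketch in Python follows. It is an assumption-laden toy: the `TupleSpace` class is hypothetical, `None` is used as the wildcard, and `rd`/`in_` return `None` instead of blocking until a match appears as they would in Linda. Persisting the space would amount to storing the tuple list durably.

```python
class TupleSpace:
    """Toy tuple space with Linda-style out / rd / in operations."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Publish a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        # None in the pattern matches any value in that position.
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Return the first matching tuple without removing it."""
        for t in self._tuples:
            if self._matches(pattern, t):
                return t
        return None

    def in_(self, *pattern):
        """Return and remove the first matching tuple ("in" in Linda)."""
        t = self.rd(*pattern)
        if t is not None:
            self._tuples.remove(t)
        return t
```

Producers and consumers never refer to each other, only to tuple shapes, which is where the simplicity-to-power ratio comes from.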

qazxcvbnm 2 years ago

> Transactions are not modular because every function needs to know whether it’s already in a transaction or not, to be conscious of what global entry point in a completely different module owns the transaction.

I fail to understand the section about why transactions are unmodular. I've never encountered transaction code where the initiator of the transaction would affect the computation; could anyone elucidate this?

marcosdumay 2 years ago

You can't write this in any random function that you don't know who will call:

```
do $$
begin transaction;
update page set change_count = change_count + 1 where page_id = 1;
if (select change_count = 100 from page where page_id = 1) then
    rollback;
else
    commit;
end if;
$$
```

layer8 2 years ago

This is probably about the nesting of transactions.
If you start a transaction when the calling code already started a transaction, then either you get an error because nested transactions are forbidden, or a reference counter is incremented for the transaction, so that when you close your inner transaction, no commit is done at that point, and instead the commit is only done when the outermost transaction closes.
This latter case means that you don’t know when your inner transaction really commits, and also if you perform multiple inner transactions and the later one fails, the earlier one will implicitly also be rolled back, because they are all really just one shared outer transaction.
Of course, you could use separate database connections with independent transactions, but then you get into deadlocks or other problems when you really work on the same data.
So you can’t have modules that build on each other, while each being able to use transactions independently from each other. Transactions don’t compose in that way.
You would basically have to “color” every function based on whether it may perform a transaction or not, and within a transaction block you would only be allowed to call functions that don’t themselves perform a transaction. It becomes more complicated when you have transactions that are not lexically scoped, but for example live in an object.
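
The reference-counting behaviour can be sketched in a few lines of Python. This is a toy model with hypothetical names (`Connection`, `record_order`, `charge_card`, `checkout`), not any real database driver; it only shows why an inner "commit" is not a commit and why a later inner rollback undoes earlier inner work.

```python
class Connection:
    """Toy store with one shared, reference-counted transaction."""

    def __init__(self):
        self.committed = {}   # durable state
        self._pending = {}    # writes made inside the open transaction
        self._depth = 0       # number of begin() calls still open
        self._doomed = False  # set once any nested level rolls back

    def begin(self):
        self._depth += 1

    def put(self, key, value):
        if self._depth == 0:
            raise RuntimeError("put() outside a transaction")
        self._pending[key] = value

    def commit(self):
        self._close()

    def rollback(self):
        self._doomed = True
        self._close()

    def _close(self):
        if self._depth == 0:
            raise RuntimeError("no open transaction")
        self._depth -= 1
        if self._depth > 0:
            return  # inner close: nothing becomes durable yet
        if not self._doomed:
            self.committed.update(self._pending)
        self._pending = {}
        self._doomed = False


def record_order(conn):
    conn.begin()
    conn.put("order", "widget x1")
    conn.commit()  # may only decrement the counter, depending on the caller


def charge_card(conn, ok):
    conn.begin()
    conn.put("charge", 9.99)
    if ok:
        conn.commit()
    else:
        conn.rollback()


def checkout(conn, card_ok):
    conn.begin()
    record_order(conn)
    charge_card(conn, card_ok)
    conn.commit()
```

Called on its own, `record_order` really commits; called from `checkout` with a failing card, its write silently disappears. The function's behaviour depends on who called it, which is the non-modularity being discussed.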

Retr0id 2 years ago

>servers only see a unindexed random-looking key value store

(quoted from the main readme)

I bet there's some fun attacks waiting to happen, related to watching for specific access patterns. Avoidable I'm sure, but I imagine it'll require awareness from application developers.

Retr0id 2 years ago

idk if the authors are reading this, but here's some feedback on the row encryption scheme:
1. Please use an AEAD!
2. IIUC, the current design exposes the hashes of the data values. This seems undesirable and I think you can avoid it.

nahuel0x 2 years ago

Surprised not to see Smalltalk mentioned in the article.

layer8 2 years ago

Smalltalk doesn’t persist by default, you have to explicitly save a snapshot of your image.
- mpweiher 2 years ago
  
  Still pretty orthogonal...
jdougan 2 years ago

Not so much Smalltalk, but Gemstone/S should have gotten a mention.

bugbuddy 2 years ago

Yes, we can absolutely implement this, but your computer now runs x times slower and/or is y times more expensive. For the vast majority of people, this would be a quaint exercise, and no real market exists for it.