Very cool project! Being able to replay history is huge and makes it possible to look back in time without having to make full copies of the database. This is something that is very much lacking in many SQL systems, where you need 'temporal tables' to achieve the same effect, but those are really limited: they have to be set up specifically and often duplicate data unnecessarily. If you are interested in this topic, I suggest you study Datomic and the EAVT data model. This is likely where database architecture is headed.
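For a feel of what that looks like, here is a minimal sketch (not Datomic's actual API; all names are made up) of the EAVT idea: every change is an immutable (entity, attribute, value, transaction) fact, and "the database as of transaction T" is just a filter over the fact log.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Fact
{
  std::uint64_t entity;    // which thing the fact is about
  std::string attribute;   // e.g. "person/name"
  std::string value;       // the asserted value (real systems use typed values)
  std::uint64_t tx;        // monotonically increasing transaction id
  bool added;              // true = assertion, false = retraction
};

// "What was this attribute as of transaction tx?" = replay the fact log up to tx.
std::string value_as_of(const std::vector<Fact> &log,
                        std::uint64_t entity,
                        const std::string &attribute,
                        std::uint64_t tx)
{
  std::string result;
  for (const Fact &f: log)
    if (f.tx <= tx && f.entity == entity && f.attribute == attribute)
      result = f.added ? f.value : std::string();   // retraction clears the value
  return result;
}
```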
> The database is stored in memory. So it must be small enough to fit in RAM, and the full journal has to be replayed from scratch when opening a file.
For larger datasets, you really want disk support. Using something like SQLite or DuckDB as an append-only store is another way to achieve this effect.
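As a rough illustration (table and column names are made up, not part of any existing tool), an append-only SQLite journal can be as simple as an insert-only events table:

```cpp
#include <sqlite3.h>

int main()
{
  sqlite3 *db = nullptr;
  if (sqlite3_open("journal.sqlite", &db) != SQLITE_OK)
    return 1;

  // One insert-only table; rows are never updated or deleted,
  // so the table doubles as the full history.
  sqlite3_exec(db,
               "CREATE TABLE IF NOT EXISTS journal("
               " id INTEGER PRIMARY KEY AUTOINCREMENT,"
               " timestamp TEXT DEFAULT CURRENT_TIMESTAMP,"
               " event TEXT NOT NULL)",
               nullptr, nullptr, nullptr);

  // Appending an event is a normal ACID transaction in SQLite.
  sqlite3_exec(db,
               "INSERT INTO journal(event)"
               " VALUES('{\"op\":\"insert\",\"table\":\"person\"}')",
               nullptr, nullptr, nullptr);

  sqlite3_close(db);
  return 0;
}
```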
Also, the lack of a proper query language will be a problem for long-term serious use. A simple hand-rolled program API can only get you so far before you need more advanced querying.
> Unlike XML or JSON, joedb is a binary file format that does not require any parsing. So, joedb files are much smaller, and processing data is much faster.
Some time ago I created a JSON-compatible serialization format that is zero-copy (no parsing required): https://github.com/fastserial/lite3
It doesn't do transactions or history versioning, but it is also very fast in memory. Something like jq or JSONPath on a disk-file version of this format could be interesting.
By Rémi Coulom of Monte Carlo tree search fame. I think he originally used it in his Go engines.
If you have reliable file locking, you can implement a journal-only database with multiple users without needing a server. You have to take care of write errors and deal with partial writes, which can be tricky with a binary format. A long time ago, I implemented one based on XML. Some non-Windows file servers (Citrix?) did not have reliable file locking, causing corrupted files.
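A minimal POSIX sketch of that idea (the length-prefix framing is my own assumption, not joedb's format): take an exclusive flock, append one complete record, fsync, release. A reader can then detect and skip a trailing partial record left by a crashed writer.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <cstdint>
#include <string>

bool append_record(const char *path, const std::string &payload)
{
  int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0)
    return false;

  if (flock(fd, LOCK_EX) != 0)   // blocks until this process owns the file
  {
    close(fd);
    return false;
  }

  // Length-prefixed record: a reader ignores a final record whose body is
  // shorter than its announced size (a partial write from a crashed writer).
  const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
  const bool ok =
    write(fd, &size, sizeof size) == static_cast<ssize_t>(sizeof size) &&
    write(fd, payload.data(), size) == static_cast<ssize_t>(size) &&
    fsync(fd) == 0;

  flock(fd, LOCK_UN);
  close(fd);
  return ok;
}
```

Note that flock over network file systems is exactly where this tends to break, which matches the corruption you saw on those file servers.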
Author of joedb here. I noticed a surge in traffic to my web site, and found this post. I would be glad to chat about joedb.
I have been developing this library for more than 10 years. I could not find a simple lightweight tool to serialize data to files with proper ACID transactions, and did not want to use a SQL database. Even SQLite is not that light, and using SQL strings as an API from C is very unpleasant. I thought about the simplest possible way to implement ACID transactions, and came up with the design of joedb. It is orders of magnitude less complex than a SQL database, and provides the simple type-safe low-level access to data I want in C++ code.
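For readers unfamiliar with the project, here is a hypothetical illustration (not joedb's actual generated API) of what type-safe low-level access looks like compared with building SQL strings from C:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Person { std::string name; int year_of_birth; };

class PersonTable
{
  std::vector<Person> rows;

public:
  // Schema is known at compile time: no SQL strings to build or parse,
  // and type errors are caught by the compiler instead of at runtime.
  std::size_t insert(const std::string &name, int year_of_birth)
  {
    rows.push_back({name, year_of_birth});
    return rows.size() - 1;
  }

  const std::string &get_name(std::size_t id) const {return rows[id].name;}
  void set_name(std::size_t id, const std::string &name) {rows[id].name = name;}
};

// Compare with SQL from C, which goes through strings for everything:
//   sqlite3_exec(db, "INSERT INTO person(name, year) VALUES('Joe', 1990)", ...);
```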
Was going to say that I hope Joe doesn't end up going to prison for an unspeakable crime, but then I saw it was an acronym.
Is that a Reiser reference or am I missing something?
An approach very close to one I've been thinking about lately.
My three cents: compact the journal when its size exceeds the actual data size. With thresholds or other knobs, the point being that the initial load time should be directly proportional to the amount of actual data. Everything else/older is a backup.
The value of the journal having history (with comments and timestamps) is huge. What I'd prefer to see is a startup sequence of: replay the journal, build the in-memory structure, optionally move the old journal to a backup name, and write out a minimal/compressed/comment-and-timestamp-stripped journal to a new file. The rewrite could optionally be based on the size delta; e.g. only write if it's less than half the size of the old journal. This keeps journals append-only while still giving access to the full history. It does require some external management to avoid file usage growing even faster than with a single journal, but it reduces startup time, and it allows a management strategy like just deleting backup files older than a given date (once they're in cold backup, if needed).
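A self-contained toy of that rotation strategy (using a plain key=value line journal rather than joedb's binary format): replay to the latest state, and only rewrite if the compacted form is less than half the size of the old file, keeping the old journal around as the full-history backup.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

void maybe_compact(const fs::path &journal)
{
  // Replay the journal: last write per key wins.
  std::map<std::string, std::string> state;
  std::ifstream in(journal);
  std::string line;
  while (std::getline(in, line))
  {
    const auto eq = line.find('=');
    if (eq != std::string::npos)
      state[line.substr(0, eq)] = line.substr(eq + 1);
  }
  in.close();

  // Serialize the minimal journal (history, comments, timestamps dropped).
  std::ostringstream compacted;
  for (const auto &[key, value]: state)
    compacted << key << '=' << value << '\n';

  // Rotate only if the saving is worth it: less than half the old size.
  if (compacted.str().size() * 2 < fs::file_size(journal))
  {
    fs::rename(journal, journal.string() + ".backup");  // old journal = full-history backup
    std::ofstream(journal) << compacted.str();          // fresh, minimal, append-only journal
  }
}
```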
It is very valuable, but compaction enables a number of use cases where events are generated in significant quantity or you need to save space, like if you’re implementing event sourcing at the GUI layer (the event store is basically a journal).
But the event store is also your undo stack, then. Keeping it infinite (or deliberately trimming it at application launch) improves user experience.
Only for some use cases. I don’t think the parent is arguing for forcing compaction. I’d personally use this with periodic compaction (cronjob), but I can see the utility either way.
You can selectively compact the journal, compacting only the numerous GUI events and leaving domain events uncompacted (I do this for a CAD app I develop).
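Roughly like this (the event names are illustrative, not from my actual app): high-volume GUI events are dropped while domain events survive verbatim, so the compacted journal stays a faithful domain history.

```cpp
#include <string>
#include <vector>

struct Event { std::string kind; std::string payload; };

std::vector<Event> compact(const std::vector<Event> &journal)
{
  std::vector<Event> out;
  for (const Event &e: journal)
  {
    // High-volume GUI events are dropped; domain events (e.g. "add_line") are kept.
    const bool gui_event = e.kind == "mouse_move" || e.kind == "viewport_pan";
    if (!gui_event)
      out.push_back(e);
  }
  return out;
}
```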
KV store in Rust, backed by a disaggregated, replicated journal: https://github.com/s2-streamstore/s2-kv-demo