If you’re building Python async apps (FastAPI, background jobs, etc.) with SQLite, you’ll eventually hit two issues:
- Opening/closing connections is fast, but not free—overhead adds up under load
- SQLite writes are globally locked
aiosqlitepool is a tiny library that adds connection pooling for any asyncio SQLite driver (like aiosqlite):
- It avoids repeated database connection setup (syscalls, memory allocation) and teardown (syscalls, deallocation) by reusing long-lived connections
- Long-lived connections keep SQLite's in-memory page cache "hot." This serves frequently requested data directly from memory, speeding up repetitive queries and reducing I/O operations
- Allows your application to process significantly more database queries per second under heavy load
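To make the pooling idea concrete, here's a minimal hand-rolled version of the same pattern (an illustrative sketch built on asyncio.Queue and aiosqlite, not aiosqlitepool's actual API):

    import asyncio
    import aiosqlite

    class MiniPool:
        """Toy pool: pre-opened, long-lived connections handed out on demand."""

        def __init__(self, path: str, size: int = 4):
            self._path = path
            self._size = size
            self._queue: asyncio.Queue = asyncio.Queue()

        async def open(self):
            # Pay connection setup once, up front, instead of per query.
            for _ in range(self._size):
                conn = await aiosqlite.connect(self._path)
                await conn.execute("PRAGMA journal_mode = WAL")
                await self._queue.put(conn)

        async def acquire(self) -> aiosqlite.Connection:
            # Waits for a free long-lived connection (hot page cache included).
            return await self._queue.get()

        async def release(self, conn: aiosqlite.Connection):
            await self._queue.put(conn)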
Enjoy!
How does this help with the second issue, the write locks?
No idea if it applies, but one way would be to direct all writes (including any transaction that may eventually write) to a single connection.
Then writers queue up, while readers are unimpeded.
if you enable WAL mode with sqlite then readers are not blocked by the writer, so only writers queue up, without needing any special-case handling to achieve it
(in general you _really_ should use WAL mode if using sqlite concurrently, though you should also read the documentation about WAL mode)
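For reference, turning it on is a single pragma; journal_mode is persistent (stored in the database file), while busy_timeout is per connection:

    import sqlite3

    conn = sqlite3.connect("app.db")
    # Persistent: once set, the database file stays in WAL mode.
    conn.execute("PRAGMA journal_mode = WAL")
    # Per connection: wait up to 5s for a lock instead of failing fast.
    conn.execute("PRAGMA busy_timeout = 5000")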
Writers won't queue up; rather, they'll storm the place, taking turns asking “can I go now?” and sleeping for (tens, hundreds of) milliseconds at a time.
This only gets “worse” as computers get faster: imagine how many write transactions a serial writer could have completed (in WAL mode with normal synchronous mode) while all your writers were sleeping after the previous one left, because they didn't line up.
And, if you have a single limited pool, your readers will now be stuck waiting for an available connection too (because they're all taken by sleeping writers).
It's much fairer and more efficient for writers to line up with blocking application locks.
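A minimal sketch of those application-level locks, assuming a single shared aiosqlite writer connection (asyncio.Lock wakes waiters in FIFO order, so writers genuinely line up instead of busy-polling):

    import asyncio
    import aiosqlite

    write_lock = asyncio.Lock()  # waiters are woken in arrival order

    async def write(db: aiosqlite.Connection, sql: str, args: tuple = ()):
        # All writes funnel through one connection, one at a time;
        # readers on other connections are never blocked by this lock.
        async with write_lock:
            await db.execute(sql, args)
            await db.commit()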
I was running into some horrendous issues with WAL, where the WAL file would grow boundlessly, eventually leading to very slow reads and writes.
It's fixable by periodically forcing the WAL to be truncated, but it took me a lot of time and pain to figure it out.
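For anyone hitting the same thing, the periodic truncation can be as small as this (wal_checkpoint(TRUNCATE) is standard SQLite; the interval and setup are made up for illustration):

    import asyncio
    import aiosqlite

    async def checkpoint_loop(db: aiosqlite.Connection, interval: float = 60.0):
        # TRUNCATE checkpoints the WAL and resets it to zero bytes,
        # provided no reader is still pinning an old snapshot.
        while True:
            await asyncio.sleep(interval)
            await db.execute("PRAGMA wal_checkpoint(TRUNCATE)")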
Interesting, were there any warning signs beyond general query slowdown?
Around what amount of load would you say the overhead of opening/closing becomes a problem?
Doesn’t SQLite have its own in-memory cache? Is this about having more control re cache size?
yes, per "open connection", hence why not closing+reopening connections all the time helps the cache ;)
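(It's also tunable per connection; a negative cache_size is in KiB, so this asks for roughly 64 MiB:)

    import sqlite3

    conn = sqlite3.connect("app.db")
    # Per-connection page cache; negative values are KiB (~64 MiB here).
    conn.execute("PRAGMA cache_size = -65536")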
This is strange on so many levels.
SQLite does not even do network I/O.
How does sharing a connection (and transaction scope) in an asyncio environment even work? Won’t you still need a connection per asyncio context?
Does sqlite_open really take long compared to the inevitable contention for the write lock you’ll see when you have many concurrent contexts?
Does sqlite_open even register in comparison with the overhead of the python interpreter?
What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
If you're talking to a 100KB SQLite database file this kind of thing is likely unnecessary; just opening and closing a connection for each query is probably fine.
If you're querying a multi-GB SQLite database there are things like per-connection caches that may benefit from a connection pool.
> What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
Basically yes - aiosqlite works by opening each connection in a dedicated thread and then sending async queries to it and waiting for a response that gets sent to a Future.
https://github.com/omnilib/aiosqlite/blob/895fd9183b43cecce8...
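Roughly this shape, as a simplified sketch of the pattern (not aiosqlite's actual code):

    import asyncio
    import queue
    import sqlite3
    import threading

    class ThreadedConnection:
        """One dedicated thread owns the SQLite connection."""

        def __init__(self, path: str):
            self._requests: queue.Queue = queue.Queue()
            self._thread = threading.Thread(
                target=self._run, args=(path,), daemon=True
            )
            self._thread.start()

        def _run(self, path: str):
            conn = sqlite3.connect(path)  # created and used in this thread only
            while True:
                loop, future, sql, args = self._requests.get()
                try:
                    result = conn.execute(sql, args).fetchall()
                    loop.call_soon_threadsafe(future.set_result, result)
                except Exception as exc:
                    loop.call_soon_threadsafe(future.set_exception, exc)

        async def execute(self, sql: str, args: tuple = ()):
            loop = asyncio.get_running_loop()
            future = loop.create_future()
            self._requests.put((loop, future, sql, args))
            return await future  # the event loop stays free meanwhile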
That's even crazier - so you're using asyncio because you have a ton of slow network-bound stuff - but for your database access you are running every sqlite connection in its own thread and just managing those threads via the asyncio event loop?
Thread pooling for databases, whether network based, or disk based, is common. A lot of times it will be baked into your client, so the fact that you think it’s crazy means you’ve only dealt with clients that did this for you.
For really large data sets, you can query and wait a few minutes before getting a result. Do you really want to await that?
What is crazy about that?
Of course I don't know what the parent is thinking, but my thought is: why can't it be entirely event loop driven? What are the threads adding here?
(I don't know anything about that project and this isn't meant as a criticism of its design or a challenge - cos I'd probably lose :-) )
SQLite doesn't have a separate server process; it does all of the work for queries in your process. So it's intrinsically CPU-heavy, and it needs threads to avoid blocking the event loop.
One way to look at it is that with a client-server database and an async client library, you have a thread pool in the database server process to do the heavy lifting, and async clients talk to it via TCP. With SQLite, you have that "server" thread pool in the same process instead, and async "clients" talk to it via in-process communication.
Because the sqlite3 library that Python ships isn't async, and SQLite itself usually doesn't offer an async API.
Python's asyncio is single threaded. If you didn't send them into a different thread, the entire event loop would block, and it would degenerate to a fully synchronous single threaded program with additional overhead.
This is a common paradigm for blocking APIs (e.g. the sqlite driver)
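The standard-library version of that paradigm is asyncio.to_thread, which runs the blocking call on a worker thread (untested sketch; the db path and query are placeholders):

    import asyncio
    import sqlite3

    def blocking_query(path: str, sql: str) -> list:
        # Ordinary synchronous sqlite3 call; it would stall the event
        # loop if called directly from a coroutine.
        conn = sqlite3.connect(path)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()

    async def main():
        # Offloaded to the default thread pool; other tasks keep running.
        rows = await asyncio.to_thread(blocking_query, "app.db", "SELECT 1")
        print(rows)

    asyncio.run(main())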
> If you're querying a multi-GB SQLite database
In which case SQLite is probably the wrong tool for the job, and you should be using Postgres or MySQL, which are actually designed from the ground up for lots of concurrent connections.
SQLite is amazing. I love SQLite. But I love it for single-user single-machine scenarios. Not multi-user. Not over a network.
SQLite is a great database for organizing data in desktop applications, including both productivity software and even video games. It's certainly not at all unreasonable for those use cases to have files that are in the low GB and I would much rather use SQLite to process that data instead of bundling MySQL or Postgres into my application.
Multi-GB is tiny these days.
I didn't say anything about concurrent access. SQLite with WAL mode is fine for that these days for dozens of concurrent readers/writers (OK only one writer gets to write at a time, but if your writes queue for 1-2ms who cares?) - if you're dealing with hundreds or thousands over a network then yeah, use a server-based database engine.
Multi GB is tiny, but that doesn't make SQLite magically better at large queries of multi GB databases. That's why DuckDB has been getting more popular.
Sqlite != DuckDB... two totally different database types: one is row-based, the other column-based. They're built for different workloads, and each can handle extremely heavy ones.
I've always had trouble getting multiple processes write access to the same SQLite file. For example, if a Node.js backend is working with the file and I try to access it with a different tool (Adminer, for example), it fails ("file in use" or something like that). Should that work? I don't know if I'm doing something wrong, but this has been my experience across multiple projects.
There are multiple aspects to it:
- sqlite is a bit like a RW-locked database: either any number of readers xor exactly one writer and no readers
- but with WAL mode enabled readers and writers (mostly) don't block each other, i.e. you can have any number of readers and up to one writer (so normally you want WAL mode if there is any concurrent access)
- if a transaction (including an implicit one from a single statement without "begin", or e.g. one upgrading from a read to a write transaction) is blocked too long by a different process's write transaction, SQLITE_BUSY might be returned (see the sketch below)
- in addition, file locks might be used by SQLite bindings or similar to prevent multi-application access; normally you wouldn't expect that, but given that sqlite had an OPEN_EXCLUSIVE option in the past (which should be ignored by halfway modern implementations) I wouldn't be surprised to find it
- your file system might also prevent concurrent access to sqlite db files; this is a super obscure niche case, but I have seen it once (in a shared server, network filesystem(??) context, probably because sqlite really doesn't like network filesystems, which often have unreliable implementations of some of the primitives sqlite needs for proper synchronization)
as other comments pointed out, enabling WAL mode will (probably) fix your issues
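Two pragmatic mitigations for the SQLITE_BUSY case above, both standard SQLite (the table and timeout here are made up for illustration):

    import sqlite3

    # isolation_level=None keeps the sqlite3 module out of transaction
    # management so we can issue BEGIN IMMEDIATE ourselves.
    conn = sqlite3.connect("shared.db", isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")
    # Retry for up to 5s on lock contention instead of failing fast.
    conn.execute("PRAGMA busy_timeout = 5000")

    # Taking the write lock up front avoids the read-to-write upgrade,
    # which fails with SQLITE_BUSY even when busy_timeout is set.
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE counters SET n = n + 1 WHERE id = 1")
    conn.execute("COMMIT")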
They can't write concurrently, but generally speaking yes, they can: https://sqlite.org/faq.html#q5
Your throughput will be much worse than a single process, but it's possible, and sometimes convenient. Maybe something in your stack is trying to hold open a writable connection in both processes?
That is because SQLite's default journal mode is a rollback journal, but for concurrent reads and writes you need to switch it to WAL:
PRAGMA journal_mode = WAL;
> In which case SQLite is probably the wrong tool for the job
Why? If all it's missing is an async connection pool to make it a good tool for more jobs, what's the problem with just creating one?
It's a bit of reinventing the wheel, since solving all the problems that come with network access is precisely why those databases exist, and they've already done that work.
asyncpg is a nice python library for postgres.
I think postgres releasing a nice linkable, "serverless" library would be pretty amazing, to make the need for abusing sqlite like this (I do it too) go away.
https://pglite.dev/ is a version of that, in 3MB of WASM.
A connection pool is absolutely a best practice. One of the biggest benefits is managing a cache of prepared statements, the page cache, etc. Maybe you have temp tables or temp triggers too.
Even better is to have separate pools for the writer connection and readers in WAL mode. Then you only have to cache write-relevant statements once. I am skeptical about a dedicated thread per call, because that seems like it would add a bunch of latency.
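A sketch of that split, assuming aiosqlite (the pool here is just an asyncio.Queue; a real pool would add lifecycle handling):

    import asyncio
    import aiosqlite

    async def open_pools(path: str, readers: int = 4):
        # One dedicated writer: write-relevant prepared statements,
        # temp tables, and temp triggers live in exactly one place.
        writer = await aiosqlite.connect(path)
        await writer.execute("PRAGMA journal_mode = WAL")

        # A small pool of read-only connections for concurrent readers.
        reader_pool: asyncio.Queue = asyncio.Queue()
        for _ in range(readers):
            conn = await aiosqlite.connect(path)
            await conn.execute("PRAGMA query_only = ON")  # enforce read-only
            await reader_pool.put(conn)
        return writer, reader_pool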
> The primary challenge with SQLite in a concurrent environment (like an asyncio web application) is not connection time, but write contention. SQLite uses a database-level lock for writes. When multiple asynchronous tasks try to write to the database simultaneously through their own separate connections, they will collide. This contention leads to a cascade of SQLITE_BUSY or SQLITE_LOCKED errors.
I really don't get it. How would this help?
The benchmarks don't mention which journal mode SQLite is configured with, which is very suspicious, as that makes a huge difference under concurrent load.
FYI, I once had a few long-lived connections with WAL, and the WAL file just exploded. Turns out SQLite won't truncate the WAL if there are open connections.
Using WAL2 should make that problem better. It has two WAL files that it alternates between when making writes, so the system has an opportunity to checkpoint the WAL file not in use.
https://sqlite.org/src/doc/wal2/doc/wal2.md
I've been thinking about trying pre-serialization of SQLite commands to enable a single writer against a singleton SQLiteConnection, using something like Channel<T> or another high-performance MPSC abstraction. Most SQLite providers have an internal mutex that handles serialization, but if we can avoid all contention on that mutex, things might go faster. Opening and closing SQLite connections is expensive; if we can reuse the same instance, things go a lot faster.
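The same MPSC idea sketched in Python, with asyncio.Queue feeding a single writer task (class name and setup are invented for illustration):

    import asyncio
    import aiosqlite

    class SerializedWriter:
        """All writes flow through one queue into one connection, so
        commands are pre-serialized by arrival order, with no lock contention."""

        def __init__(self, db: aiosqlite.Connection):
            # Must be constructed inside a running event loop.
            self._db = db
            self._queue: asyncio.Queue = asyncio.Queue()
            self._task = asyncio.create_task(self._drain())

        async def _drain(self):
            while True:
                sql, args, future = await self._queue.get()
                try:
                    await self._db.execute(sql, args)
                    await self._db.commit()
                    future.set_result(None)
                except Exception as exc:
                    future.set_exception(exc)

        async def execute(self, sql: str, args: tuple = ()) -> None:
            future = asyncio.get_running_loop().create_future()
            self._queue.put_nowait((sql, args, future))
            await future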