As a counterpoint, Derek Sivers says Simplify: move code into database functions <https://sive.rs/pg>
I think they are both right, Alex Kondov for keeping his domain logic mostly in the application layer, and Derek Sivers for mostly keeping his in the database layer.
Why? Because that's what they know well, and they can keep the domain logic mostly in one place. If Alex can keep most of his domain logic in the application layer that he knows better, that's the better choice for him. If Derek can keep most of his domain logic in the database layer that he knows how to utilize well, that's the better choice for him.
(I can't find it now but I thought someone very smartly reflected on the Why Perl? <https://news.ycombinator.com/item?id=35646612> post today that the main reason why anyone will choose any programming language is that they know how to use it well. I think the same applies here. This is why many programming languages are valid choices for different people and projects, and this is why different architectures are valid choices too.)
When I was choosing jobs in the real world[1], I would avoid any company that depended heavily on stored procedures for business logic, where things were ruled by “database developers” and all the application developers did was call stored procedures.
Stored procedures are harder to unit test, roll back automatically, keep under version control, etc.
I’ve never seen a system where, 5 years in, anyone said “I’m so glad we decided to use stored procedures everywhere.”
[1] I work in consulting now and have no illusions that I face the same issues that day-to-day developers do. I get to lead development “solutions” and move on.
Your database will be the first major thing to cave under load.
The traditional tools most folks use (postgres/mysql/maria/etc) want to scale vertically, not horizontally (you can do it, but it's hard, and has lots of drawbacks).
Eventually you will hit hardware limits. Providers only have instances that are so big. There is a ceiling on how much you can scale your DB without having to do major, major work.
I've seen this pattern play out now at 3 high growth companies I've worked at. The one that was using lots of stored procs and materialized views got hit the fastest and the hardest.
I’ve worked with the big boys and seen plenty of times where databases fell over, no matter how much hardware you threw at them, because of locking and other database contention issues.
It’s much easier to horizontally scale a bunch of application servers than database servers.
“Clusters” only help when you can deal with eventual consistency (which, admittedly, is acceptable more often than most businesses realize). Sure, you can scale your read-only instances.
BTW: I work in the consulting department of a little company I am sure you have heard of that knows something about dealing with “the big boys” at scale.
More generically, scaling stateful things is generally harder than scaling stateless things (because you have to figure out what to do with the state).
Even with horizontally scaling systems, you either have to redistribute the data on scale actions or keep the data in place and send new data to new infra (which may or may not be helpful).
I would agree, but I think it's worth checking out one of Sivers' postgres repos where I thought he tested things pretty well[0]. Also things like Postgres are often run as a "server" on local dev or CI but could easily be run more ephemerally[1]. That should help with testing various states and behavior of the database.
I still would actually follow what you said though, because often the places which rely on stored procedures are, as you say, ruled by developers with the database access to manage those procedures, and from what I've seen they often do not have anything close to a reproducible setup. Besides the fact that often it's some proprietary database which is much more difficult to run locally than sqlite3 or PostgreSQL.
Still, I keep that separate from considering whether views/stored procedures are actually a good solution. I think sometimes they really are. Honestly, Sivers' experiments really convinced me of that.
It is a matter of database tooling; Oracle and SQL Server provide quite good IDEs, graphical debugging, merge tooling, and testing infrastructure.
I can tell you that at least one pharmaceutical company is quite happy to have plenty of stored procedures in their Oracle databases, including making use of Apex, and it has been in production for at least several decades.
And when you get ready to roll back, can you just revert all of your code and deploy? Can you just do a “git branch” and work on your own isolated code in your dev account? How well does merging work when you do a pull request and have to merge your code and make sure your stored procedures are up to date?
If there are ten developers with 10 feature branches are they going to be running 10 instances of Oracle with their own “branches” of the stored procedures?
And then copy “enough” data onto their computers to have a realistic test, make sure you have some type of script so sensitive data isn’t on their laptops, and ensure schema changes are constantly propagated to each developer’s workstation. This isn’t even mentioning if you have a commercial database like Oracle where you have to deal with licenses.
And how do you merge changes and do rollbacks?
Not to mention the whole
GetCustomer_1
GetCustomer_2
GetCustomer_3
Where code is copied and pasted and modified slightly.
It’s much different. As someone who is a long-time developer who also deals with the “DevOps” side of the world, I never design architectures for the former (development) without thinking about the latter from day one.
You can’t just wave your hands and throw it over to the “DevOps” folks. Doing so is kind of just the opposite of what “DevOps” is supposed to be.
I’ve worked on a codebase that had 300K lines of PL/SQL and I didn’t find it all that fun to work with. The problem is that you are working in a very constrained environment, starved of language features and libraries, and with a deployment model that is deeply tied to the data and therefore more cumbersome. The IDEs helped but didn’t matter that much. Also, Oracle has per-CPU licenses, so the more code runs in the database the more expensive it becomes.
I think the best design is an API in a language designed for building APIs, on top of a dumb database, with all access gated through the API.
If you're going with that approach, I think a lot of databases have facilities for extending outside plain SQL. AFAIK Oracle can bind to Java and Postgres has an extension interface.
I do tend to agree with the API-outside-the-DB pattern. I guess Oracle supports some form of QoS but I don't think MySQL and Postgres do. That means having multiple apps hit the same schema can cause starvation issues.
Also not sure how hard it is to monitor user resource usage. That's more important for billing (even if it's "fake" money from 1 department to another)
The line of reasoning seems valid (do what you do best); anyone who has had to deal with stored procedures in legacy systems will, however, disagree. Stored procedures are to be avoided.
What is wrong with stored procedures? They are not fashionable nor especially expressive, but they perform reliably and they do pretty much anything that might be expected of relational logic.
They're Turing-complete and modular, so it's not really about what they can or cannot do.
Testability, tooling and the open-source ecosystem are either bad or non-existent. Writing PL/SQL is the worst environment I've worked in. That database sent emails, processed CSVs, scheduled jobs, etc., yet there was still a web app to maintain next to it.
They're OK for certain things like essential triggers or performance-sensitive functions, but I would never deliberately put app logic in there. Major red flag.
> Testability, tooling and the open-source ecosystem are either bad or non-existent
If you're properly testing the code in your application that exercises persistence, that means your test harness runs a real database like the one you're running in production and thus you can also write the database logic tests using your own application's testing facilities.
Of the things you listed, "the database sends e-mail" is the only one where I'd think you'd have to change the code at all, and have the database go through a mockable middle-man so that it becomes testable; but everything else can be comfortably tested from a test suite that is able to talk to a real database.
Yep. Releasing, testing, debugging, etc are all more difficult in stored procs than in a “regular” language. Stored procs have other down sides:
- often unique to that DB, so locks you in
- Scaling that code is now tied to scaling your DB tier
- Tooling is often very inadequate
- Versioning and backwards compatibility of code can be a challenge
Some of those concerns apply to any database. Your query could slow down if the database picks a bad plan, so you could say you will never trust the db to scale. That's separate from scaling the stored proc - just using the db can run into a scaling issue.
No, what I mean is your code scaling is now directly tied to how your DB scales. Your SP code can be impacting the rest of your DB, and vice-versa. I have seen large SP-based systems require Oracle boxes to be scaled up at enormous cost (hundreds of thousands or even millions of lines of SP code).
Not because of slow queries, but just the cost of executing the stored procs themselves.
I've never worked on a system where we didn't make an effort to make sure we were NOT dependent on a specific DB vendor/implementation and avoiding SP's has always been a part of that.
We went through and got Oracle out of our enterprise after a particularly nasty licensing negotiation and audit with them some years ago. So it does happen.
I expect at some point we'll have a similar initiative around cloud providers.
Depends on the product in question. For integration products, it's very common to support multiple database vendors with the same core engine. So any SQL written and the DB data types used are generally SQL-92/99.
This is pretty good since one can generally run fast unit tests of DB code with an embedded database.
I’ve done it twice. Mass migration off Oracle and before that DB2 and Sybase. Current system is also occasionally sold on premise to customers who have their own db preference and we can support them all.
I used to deal with stored procedures in legacy systems and honestly it was not bad.
The first concern was getting the stored procedures into version control and creating a mechanism to update the systems based on the things in version control upon deployment. After that it was smooth sailing.
I think most people are reacting to the impedance mismatch of normal software lifecycles versus sticky databases. Naive or legacy users and documentation use the database as the system of record for the custom code, rather than as an execution environment provisioned from a real system of record like a source control system.
Where it gets even more tricky is not just stored procedures but application-specific functions embedded in views, or triggers running custom functions. It's no longer just a library of functions you can choose to call (or not) during a query, but code that runs on its own based on clients' queries that never directly mention the functions.
The same goes for schema management, and I think that is a big reason why so many developers fixate on "schemaless" approaches. They want to pretend that the database exists in a static way outside the software lifecycle, just like they ignore the filesystem and operating system and treat it as an unchanging abstraction.
Stored procedures and functions are very good for many things. I am not claiming you should put all your business logic in the database, like old-time Oracle consultants wanted (for self-serving reasons), but some calculations are better handled in a single place, instead of being reimplemented in different parts of the stack, or stacks, given that it is common to write a project using one framework and language, and then migrate to another one.
The only reason to avoid SP is that you don't know any SQL and your ORM can't write the function calls, so you can't call them.
Which is a general complaint I have about modern software development: many things are done in convoluted ways because some developers don't know SQL and don't want to learn it.
When the data will:
+ 'outlive' the user interface - this data will be used forever but we are not sure where (e.g. financial docs, records, etc.)
+ 'extend' past the user interface - this data started on an installed desktop app / green screen but we are not sure how we may want to extend it (e.g. to an API or to the web)
then the DB is the right place to put business logic.
Coupling the data with business logic, in the DB, allows almost complete flexibility in how it is accessed and interacted with, at the expense of complete lock-in to the DB platform. Which may or may not be a bad thing. But if you have a skilled SQL team and some resources to pay the DB licensing and support piper, this is a good direction to ensure app support longevity.
For example, you can do the same thing on desktop or phone or web client or even hardware switches: say you have a DB stored_procedure to indicate that the process of manufacturing this part has reached stage X
(this is obviously really contrived but it is an example to make things clear)
You can run this, if it is in the DB by:
- hitting this button on the assembly line
- scanning a barcode can do the same thing
- having a user change a status in a UI
- have a batch job run off a script with a list of parts to apply the status to
And even better, if you want to update how the stored proc works, you update once in the DB and it is available to all 'interfaces' instead of having to update hardware switch code + user screen UI code + batch script code etc etc...
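To make that concrete, here is a minimal sketch of the idea in PostgreSQL; the table and procedure names are invented for illustration:

    -- Hypothetical sketch; table and procedure names are made up.
    CREATE PROCEDURE set_part_stage(p_part_id bigint, p_stage int)
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE parts
        SET stage = p_stage, stage_changed_at = now()
        WHERE id = p_part_id;

        INSERT INTO part_stage_history (part_id, stage, recorded_at)
        VALUES (p_part_id, p_stage, now());
    END;
    $$;

    -- Every "interface" (assembly-line button, barcode scanner,
    -- UI, batch script) ends up issuing the same call:
    CALL set_part_stage(42, 3);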
I know that an argument can be made that you have an app server layer separate from the DB so this is not a 'real' problem but this approach just puts the app server IN the DB
One closing thought: it is often easy to forget that, in most cases, the data is the product of software development and the tooling is just support. Only counterpoints I can think of here are games and interactive demo systems where the process ( of playing the game or using the software ) is the product.
Positioning the database and front-end as being similar is a mistake if business goals are thought about from a "the data is the product" perspective. If the rules about how to handle, manage and interpret the data are a core component of that data product, then the argument that the right place for business logic is in the DB is made even stronger.
Derek is right. Tooling is the main reason not to put code in DBs. It’s a pain, that’s all. Tools are like ‘70s calls to say never forget me’. But logically it makes eminent sense. At the end of the day, writing code in the DB is not that different than writing components for an app server.
There is also another reason. Designing effective schemas and bundled logic is not junior level work. Microservices, imo, all song and dance aside, really were trying to deconstruct the “monolith” of the schema: a schema that serves the requirements of today, is modular, and is possible to sanely extend in the future is not commodity work. So a practical reason not to do it is the required labor (expertise).
Both are implementations of the same idea: not splitting the logic into two services deployed separately. Either store data directly on disk or in a general-purpose piece of infrastructure that needs no configuration.
I think a really important aspect is that keeping the logic in your head (and others learning it) is helped by mirroring this single source of truth in your implementation.
Okay, how about "Either put your domain logic in the database, or put it in the code, but don't randomly scatter your shit throughout the code and database"?
Some RDBMS features, such as stored procedures and views, can quite naturally be versioned, and become as ephemeral as your application code.
For example, have your application set up a schema to contain its version-specific database components. The schema will contain an immutable application version, such as its commit hash, in its name. This allows a given version of the application to only use its own set of sprocs and views. On deploy, run the SQL scripts to create the sprocs and views for that version.
You'll note that sprocs and views are not mutated under this strategy. This allows multiple versions to exist concurrently, which facilitates things like blue-green deploys.
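A rough sketch of what that deploy step could look like on PostgreSQL, assuming a hypothetical commit hash abc123 baked into the schema name:

    -- Run at deploy time for the application version built from commit abc123.
    CREATE SCHEMA app_abc123;

    CREATE VIEW app_abc123.active_customers AS
        SELECT id, name FROM customers WHERE active;

    CREATE FUNCTION app_abc123.get_customer(p_id bigint)
    RETURNS SETOF customers
    LANGUAGE sql STABLE AS $$
        SELECT * FROM customers WHERE id = p_id;
    $$;

    -- The application built from commit abc123 sets its search_path
    -- (or fully qualifies names) so it only ever sees its own objects:
    --   SET search_path TO app_abc123, public;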
What you're describing can be done to great success with the right type of requirements. Say you're building an information warehouse for OLAP-style work. Totally makes sense to put all the heavy stuff in the DB because the data model is going to be pretty rigid and it's the sort of asset to get enough attention from the business to be supported correctly with a full staff.
On the other hand, defining everything in code and using the db as a dumb store that's fractionally better than using files is a pretty good strategy when you may be handing over support to a team that won't be giving full-time attention to the app. It also handles those key-person risks etc. by keeping things in one area of the app code.
I've only ever seen this work with "application managed databases" i.e. where the app truly owns and controls everything related to the database and can freely run ddl. The db code is then just part of the app and can be spun up against a completely empty db.
The death for this kind of thing is DBAs administering the database separately, because the coupling will kill you.
My two main reasons for (almost) never putting logic in a database are
1) it will tie you to a specific database. I know a handful of companies that have almost all of their IPR in the shape of stored procedures. Even if they wanted to escape from the database vendor they can't. Not without rewriting everything they have spent a decade or two building.
2) languages such as PL/SQL are much poorer than any modern programming language. If you have ever tried to write a business application in PL/SQL: it is no fun. At least not for me.
(What I mean by "almost never" is that I do use constraints quite a lot which you could argue isn't logic, but it is still asking the database to validate my data and model).
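To illustrate the kind of constraints meant here (a made-up table, not from the comment), the database validates the data and the model without the application depending on it:

    -- Hypothetical example of constraints as an extra safeguard.
    CREATE TABLE invoice (
        id          bigint PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customer(id),
        amount      numeric NOT NULL CHECK (amount >= 0),
        status      text NOT NULL CHECK (status IN ('draft', 'sent', 'paid'))
    );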
Even then, the implementation of consistency levels alone makes you have to rethink swaths of your code when you switch databases.
This is why I don't really care, changing data stores is almost never trivial even if you just use them for data storage.
What if the type isn't supported, what limitations on reading and writing exist in THIS engine? Which types of queries are weirdly slow and amazingly fast?
The list goes on, even for things where you don't need to add additional logic.
> This is why I don't really care, changing data stores is almost never trivial even if you just use them for data storage
Agreed. It’s strange to hear the lock-in argument when in practice even a set of pretty regular queries and DDL is generally non-portable, neither syntax nor performance-wise. Personally I can’t think of saying “let’s move <projectname> to <servername>” seriously at work, unless it’s a prototype which doesn’t require support yet or is a really insignificant auxiliary db.
>It’s strange to hear the lock-in argument when in practice even a set of pretty regular queries and DDL is generally non-portable, neither syntax nor performance-wise.
Many ORMs can solve this problem.
>I can’t think of saying “let’s move <projectname> to <servername>” seriously at work, unless it’s a prototype which doesn’t require support yet or is a really insignificant auxiliary db.
I've had to do it three times in my career. It usually comes from management. It ain't pretty. Two were from M&A and one was to get rid of an obsolete database.
I don't think the expectation that you are going to write DDL statements that will work in all SQL databases is reasonable. Which is why I never do this. My take is: it isn't going to be easy so don't even bother.
I have separate DDLs for different databases. And while I've heard a lot of people theorize that this is a maintenance nightmare, it has never been that in my experience.
You are going to spend a lot more time being frustrated if you try to make one DDL work for all databases. Especially if a new database is added to the mix (which might not even be an SQL database).
I have written perhaps half a dozen applications in the past 3-4 years that supports more than one database. Typically for schemas somewhere in the region of 5 to 15 tables. It rarely takes me more than 5-10 minutes to take the SQLite DDL and adapt it to, say PostgreSQL, including rewriting the SQL statements in the storage API layer. And you can design with this in mind.
(I often start with SQLite as it is the least feature rich database. Anything I can express for SQLite is easy to express for other SQL databases, and possibly NoSQL stores. And I can make do with "lowest common denominator" because any fancy stuff tends to just make life harder, so I tend to avoid it unless I have very, very specific needs)
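As an illustration of how mechanical the adaptation usually is (a made-up table, but representative of the "lowest common denominator" approach):

    -- SQLite version
    CREATE TABLE event (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        device_id  TEXT NOT NULL,
        payload    TEXT NOT NULL,       -- JSON stored as text
        created_at INTEGER NOT NULL     -- unix epoch seconds
    );

    -- PostgreSQL version of the same table
    CREATE TABLE event (
        id         bigserial PRIMARY KEY,
        device_id  text NOT NULL,
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL
    );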
Granted, it is a bit annoying to do schema updates across several DDLs, but so far, I've spent far less effort than a colleague of mine who always tries to maintain a single DDL and a single storage API layer implementation that is supposed to work across 2-3 SQL databases.
Also granted that it isn't elegant to maintain different DDL and store API implementations, but it turns out to not be as much work as people tend to claim.
There are tools that can handle this, Liquibase is one that I have used.
You write a change set that describes the schema you want to build and it generates DDL for whatever database you are using. It can also reverse engineer a change set from one database that you can use to generate DDL for a different database.
I supported an application that ran on half a dozen different DBs and it wasn't much of a problem. There were a few places where I had to specify change sets specific to one database implementation.
I was looking at a schema generator written in Go a while back, but I can't remember the name. It almost did everything I wanted except it didn't support SQLite (since no proper grammar exists for its peculiar variant of SQL).
What I'm looking for is something that can generate schemas for SQLite, PostgreSQL, MySQL and perhaps MS SQL Server that can be run from the command line (so I can use it from Make)
I have the opposite experience: that for almost all projects I have done in the past 20 years, switching data store is almost always nearly trivial. However, it does come at the cost of giving the design careful thought and sometimes forgoing useful functionality that it is tempting to make use of.
One reason for this is that I had a very bad experience with this. I was involved in two projects in the 1990s where we went all in and used a lot of features unique to two different databases. In one of the cases, the database vendor (a small niche player) got acquired by an automation company and their "real-time capable" database was no longer available for licensing as a separate product. This left us stranded. The other project involved loads of PL/SQL in an Oracle and what killed us was the combination of Oracle licensing fees, the scaling limits of the database and that it was really hard to find good PL/SQL developers.
Today we use several strategies to keep the data store choice flexible.
1) we use databases as pure data stores. Sure, we use constraints, but only as an (optional) extra safeguard that lets us know when we have screwed up. We aren't dependent on the constraints, but they are immensely useful for rooting out inconsistencies in the code and the model.
2) we try to model the data in ways that can be realized on all databases (or stores) of a certain class. Yes, this means that sometimes you have to use slightly different data types than you'd like, but that isn't the end of the world. In some cases what you store is a reasonably useful representation of a type, and then you create richer types to represent it in the application.
3) we tend to start development using SQL databases, but not all parts of the schema are expressed as a purely relational model. This way we can split the schema and handle parts of it using an entirely different class of database or store. This gives us the convenience of being able to start a project with a single database, exploit the availability of constraint enforcement to show us where we make mistakes, but as we need to scale up, parts of the schema can easily be migrated to, for instance, a timeseries database.
This both helps us off the ground quickly and it allows us to postpone actually building a scaled up implementation. (This has proven particularly useful on projects where we have been able to postpone handling a large scale case for years and then having more technology to choose from later).
4) to accomplish 3 we often spend a lot of time understanding what the key data access patterns for the application are going to be and keep this in mind when designing a schema and a scaling plan. It also helps if you can identify where you can decide that data is going to be immutable so that consistency becomes easier later.
5) the application logic never talks directly to the database. There is always a domain specific store service API between the application and the database. We try to make this API as semantically narrow as possible, and try not to make any promises that would be hard to keep if we have to change storage technology. Again, it helps to partition this API into areas that can be realized with different classes of databases, so you can make fewer promises on the parts of the API that will handle the part of the schema that drives most of the operational cost.
6) all tests are written against the store service API, never directly against any database or storage technology. This way you already have a comprehensive test suite if you need to migrate to a new database entirely, or for when you start implementing different parts of the store using different database technologies.
In the beginning this approach will require a bit more work, but when you have established practices and it becomes a habit, it actually doesn't represent more work to implement than more direct ways of using databases. Yes, we sometimes forego using functionality offered by the database that would have saved us some work in the short term, but it pays off in the long run because you end up with simpler and more flexible designs.
A few years ago I did an experiment with two new hires. Their first task was to add support for a relatively new database to an existing application. They had zero prior experience with the application. I pointed them to the storage service API and the tests and I said "write an implementation of it that uses database X" (which was a NoSQL database). After about two days of work they had an implementation that passed all of the tests and could run all of the benchmarks without any problems. This included adding configuration options that allowed the application to initialize and run with the new store implementation. I think we discovered only 2-3 minor issues in their code over the next 12 months, which isn't bad for a first contribution to an unknown code base.
(Since we only had one set of tests which was leveraged across all implementations it also meant that every time we expanded and improved the tests, we'd be able to improve all implementations as the tests got better at finding subtle mistakes).
Because it was so cheap and easy to add support for new databases, I think we had support for something like 5 different databases. Including an embedded SQL database so that you could run the application locally for experimentation or integration testing (it is really nice to be able to fire up the application with an in-memory database for integration tests. It wasn't unusual that we ran integration tests that would create and destroy the storage a few hundred times in just a few seconds). After I left the project they added support for another database and migrated terabytes of production data with minimal downtime. The migration tools they wrote for this used the store service API as well, which means they developed one tool that would let them migrate between arbitrary databases. (So if, in the future, they chose to switch databases again, the migration tool already exists)
We also had adapters that allowed us to run different permutations of databases. So we could handle parts of the schema suited for relational databases using an SQL database, and handle the bulky bits with somewhat narrower requirements using a NoSQL database. This requires a lot of care when you design the API to have built-in "fault lines" where you can split the schema, but with a bit of practice it becomes a habit. (In some cases we've made "conditional" schemas - so for instance if you run the entire thing in one database you may have a bunch of constraints to help you ensure consistency, but if you run with mixed stores, you may not apply the constraints that would span across databases. Again, we try not to depend on the constraints, but they are a great help in designing the schema, the store service API and the application).
If you do #5 well then yeah, you can get away with it, because your database is just a repository pattern away.
I think people want to avoid this because it's annoying, or you have to think a decent bit up front, or it's just cheaper and easier to do this directly this one time, until it's everywhere.
I think people over-estimate the cost of doing this and lack experience with the benefits. I've been doing this for ~20 years now, and when comparing our code to that of a few customers that use databases in a more "chaotic" manner, it is easy to see how quickly it pays off.
It performs well because the context switch between the application and the database is costly in terms of performance. If there is logic to be performed per row, and that logic also requires data that's in the DB, then the performance can be much faster with PL/SQL or similar DB code.
That context switch is routing queries and results between the client and server using IO requests.
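A tiny sketch of the difference (hypothetical rule, PostgreSQL syntax): the app-side version pays one round trip per row, while the in-database version pays none:

    -- App-side version (pseudocode): one round trip per row.
    --   for each (id, balance) in SELECT id, balance FROM accounts:
    --       UPDATE accounts SET balance = <new value> WHERE id = <id>
    --
    -- In-database version: the per-row logic runs next to the data,
    -- with no client/server round trip per row.
    UPDATE accounts
    SET balance = balance - CASE WHEN balance > 1000 THEN 10 ELSE 1 END
    WHERE last_activity < now() - interval '1 year';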
Most databases allow you to write extensions in your language of choice; this can be a good middle ground where you write a domain-specific DB extension in the language of your application. This way, you can share logic between your database and your application layer.
I do that in Go for chemical stuff, I am pleased with this approach.
Which database are you using and what kind of extensions are we talking about?
(I often create Go types which are then serialized to some useful representation in the database, and de-serialized to Go types when I query the database. I usually use SQLX for most interaction with SQL databases since it does a lot of the tedious work for me (without introducing ORM'y madness). On the rare occasion where this eats too much into performance I can always use the DB library directly)
The argument is not that you should use application code where SQL is more efficient. The argument is that the sql shouldn’t be in the database itself as stored procedures separate from your application code.
Only DB portability would be a reason not to use stored procedures, that I can think of. If you want to write an application that can use multiple DBs then you might have a problem.
Even if you don’t use stored procedures, simple stuff like how you limit the return results is different between databases like SQL Server and MySQL as is how you do bulk inserts.
How you specify locks, the supported data types, casting, intrinsic functions, conditional statements (case functions) are all different between databases.
I used to write a lot of PL/SQL and your assertion is simplistic at best, and misleading at worst.
First off it depends on what processing you are doing on what kinds of data sizes. If you are doing relatively simple things, sure. But remember that PL/SQL isn't particularly fast if you need to do anything more complicated. Don't do anything complicated (we'll get to that in point three).
Second, someone has to write that code. Writing PL/SQL is laborious because the tooling is bad. Pick any half-decent language today and the tooling is going to be fantastic. You have language servers, you have optimizing compilers, you have profilers, you have debuggers, you have lots of ways to instrument, inspect, and verify code. Where do you plan to recruit PL/SQL developers and how much are you going to pay them? Because in my experience: there really aren't that many usefully competent PL/SQL developers available on the market. There are lots of people willing to "give it a go", but competence is rare.
Third, it complicates your database. Instead of just doing relational math and executing queries it now also runs code that might not be as easy to model on the same cores, eating into your database IO and memory budget and causing lots of variability. Sure you can hound your PL/SQL developers to adopt various tricks in order to lower the variability of database performance, but that gets expensive. And the code gets ugly.
Fourth: It makes no sense to apply extra load to your most expensive component, which also happens to be the component class that most often also ends up being the bottleneck of your entire system. (Cloud provider pricing is a somewhat useful proxy for the cost of different systems. Have a look at the cost of RDS capacity compared to EC2)
Fifth: did you measure or did you assume? In the cases where you measured a clear gain, was it still worth the cost of all of the above? Did you measure the right thing? Could you argue your case with numbers?
My experience is that no matter how good you make the database logic, a problem will remain: the logic is in the database.
Want to do nice things to reason about your data changes using your code, or maybe show someone a preview of their changes? Sorry! All the interesting logic is in the database. You need to write your changeset to the database to figure out what’s really in it.
There are a lot of reasons not to use stored procedures heavily and I personally wouldn’t even work for a company that did, this is a relatively poor one.
This is like all of the people who think using Terraform gives them “cloud agnosticism”. Even how you do limits and bulk SQL is different between the major databases.
Being database agnostic is so hard. I have an app that I test on Postgres, MariaDB and SQLite, I thought I was doing well. Nope, turns out it doesn't work with MySQL or MsSQL and would need a big schema change to support them (something about diamonds in the cascade relationships).
As a counterpoint, tying yourself to a specific database is the only way to actually use any database to its full potential. I'm a big fan of Postgres and use it in basically all my projects. But if I was forced to write my code in a way that I could easily swap Postgres for MySQL or Oracle at a moment's notice, I would lose much of the value that I feel Postgres offers.
CQRS simply means "replicate the data", while ES refers to event sourcing. When used together, CQRS/ES entails replicating data on various storage systems by replaying events from the Event Store. This replication can occur continuously on any RDBMS, and you can utilize all of its features, including vendor-specific ones, for querying.
In this setup, your application houses domain-specific code, also known as the "core domain" in DDD terms, while generic code exists in generic tools like RDBMS, called the "generic sub-domain."
For example, report generation is in the generic sub-domain, and RDBMSes are great at that.
In all my career (started coding in 1986) I keep hearing about 1); I have worked on several RDBMS-agnostic products, and never saw a customer actually migrate to another RDBMS.
PL/SQL is just an Ada dialect with a bit of SQL on top.
> In all my career (started coding in 1986) I keep hearing about 1); I have worked on several RDBMS-agnostic products, and never saw a customer actually migrate to another RDBMS.
no kidding, you hear about this way more than it actually comes up. especially for the folks who treat the RDBMS as a dumb store and don't use the features.
I've seen apps replaced/rewritten on top of a database multiple times. I've seen it be very convenient to have enough logic in the database that it was easy and low-risk to have multiple programs share a database, without sharing much or any code.
I've never actually seen a database system get swapped out, in real life. The only two ways I can imagine it happening are: 1) someone made a colossal fuck-up picking the DB software—and, it'd have to be really bad, I've seen a lot of applications stick with a mistake there and just deal with the pain; or 2) extreme hockey-stick growth (which, file under "good problem to have, and we can afford a proper migration")
In fact, #2 is usually addressable with supplementary databases, a little re-structuring of the existing database, and smarter caching, except in super-rare scenarios of truly wild growth or unusual access patterns. More often than not, even then, the original database sticks around in some form or another.
Shows how different the software ecosystem is. For a product that I worked on - during a decade, the DB system got swapped out 4 times - all driven by customer demand. The app software evolved continuously but the DB was hard changed - with all relevant schema and data migration applied during the upgrade process.
The phrase "to have multiple programs share a database" says a lot about what school of thought one belongs to in terms of architecture.
Using the database as the integration point went out of fashion a bit over 20 years ago. And for good reason. It is not a good integration pattern. It makes the implementation the integration interface rather than using a tightly controlled abstraction that can be kept narrow and preserve degrees of freedom for future evolution. And as you allude to: it gets worse because you have to move more application knowledge and logic into the database to make sure applications don't mess it up.
In fact, this approach is so bad that Amazon's CEO explicitly forbade it at Amazon (Bezos API Mandate, 2002). How often do you see CEOs, even of tech companies, find an integration pattern so counterproductive and dangerous they announce to the company that they will fire anyone using it?
I would say that about 60-70% of projects I've worked on in the last decade have switched databases at least once. One project had support for 5-6 databases and used three different databases in production over a span of 5 years. Adding support for them had fairly low cost (because the internal design made it easy) and it gave us the opportunity to try several things in production, at scale, and then choose what produced the best results. This is a lot better than just choosing something up front and sticking with it because you'll have the data to make an informed choice.
> 2) languages such as PL/SQL are much poorer than any modern programming language
true but it is for a reason.
For example you can replace/upgrade any piece of code on the running system transactionally. But in a modern programming language, let's say Java, it is not possible.
Should PL/SQL be similar to a "modern programming language" it would lose a lot of its perks.
Generally speaking, PL/SQL is not much different from DDL like "create table ...". The latter is similarly code describing "business logic".
I followed this approach once and ended up abandoning a week's worth of work because it was just far too slow (hours vs < 1 minute) when running on large quantities of data.
>This leaves more room for error and makes our logic harder to test since it can’t be easily mocked or executed with the rest of our application-level tests.
I ended up writing integration tests on the SQL queries. They were slower than the unit tests I had before but 1.5 seconds per test vs. 0.1 second per test was a price I was pretty happy to pay.
In general I would prefer to keep domain logic all in one place but practicality trumps ideology.
Yeah, I'm having trouble understanding this. This sentence is very confusing to me:
> While this gives you more flexibility, the tricky detail is that filtering happens only after your data is fetched. This is not different from fetching the results and filtering them inside your application. It just saves you some keystrokes.
> It just saves you some keystrokes.
No, it saves transferring tens of gigabytes to my application. I assume I'm missing something here, or maybe my data is different.
That's a valid viewpoint, but I've also seen people be very successful with SQL triggers, stored procedures and all of that.
Before it was fashionable to build out "service layers" inside a back end application or as an API I've seen people build the equivalent by making all database accesses in an application go through a set of stored procedures that, basically, provide an API.
It can work very well.
There is also the idea, less well developed, from the RDF world where you could have some system of inference that works like triggers but is based on logic and is potentially more disciplined. Let's say it hasn't caught on and I remember drinking in a hotel bar with the marketing head of one vendor, after a conference, who was frustrated that people didn't see the value.
Once upon a time, I administered a system that was built as a client application that just spoke SQL, and a set of administrative tools running as the db superuser to create the db and its triggers. Permissions were handled entirely internally through functions - user management was just database user management. Most simple operations happened on views. More complex operations were functions.
It worked pretty well - the worst part was Microsoft licensing, as we needed a SQL Server client license for every connected user. The second worst was the upgrade path - shutting the db server down such that clients couldn't connect but the super admin could to run the updates was tricky and failure-prone, mostly because of a lack of tooling.
Today, with features like row-level permissions and notify in postgres, I think you could do great things. That said, I mostly recommend the article's advice to my customers - the database is a database, put logic in your application.
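For what it's worth, a small sketch of those two Postgres features (table, policy, and channel names invented):

    -- Hypothetical sketch of the Postgres features mentioned above.
    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

    -- Each database user only sees their own rows.
    CREATE POLICY orders_owner ON orders
        USING (owner = current_user);

    -- Notify listeners (e.g. connected clients) when something changes.
    NOTIFY order_changes, 'order 42 updated';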
MSSQL licensing is just all kinds of nuts. As far as I know, the official stance is that anyone accessing data from the DB (not just the DB itself) needs a CAL. This means that with CAL licensing, every user of a web app needs one. I'd feel sorry for people that have to deal with this if it wasn't 90% self-inflicted pain.
So don't use the database for what it is intended because...it might be harder? I very much disagree with this article. It is not so simple to just declare no business logic in the DB when the DB schema itself IS a BIG part of the business logic. When working with data, the best place to work with the data is where it is stored, not pulled out and put back in.
And your app is more likely to be replaced long before the data and/or database is.
Stop treating the database like a black box and use it for what it is intended.
You missed the word maintain, and that is a very important point. If I can only maintain the data through an app and not directly in the database then the database is nothing more than a card file. Why not use the database features it was intended to provide? The app will be gone long before the data. Always. The next app will need to reproduce the previous app's logic. What if that was already done? It is better to have a mix, used judiciously, than to just consider the database only for storing and retrieving data. What a waste of resources to not use those things!
Data integrity is maintained through FK, unique indexes, proper modeling, etc.
Also, an RDBMS is hard to scale horizontally, so the more CPU tasks you put on it, the sooner you will hit a bottleneck that is not easy to get out of. An application layer, if designed properly, can scale a lot more.
>The app will be gone long before the data.
When this happens at all, it happens rarely, and in my experience the data enters a read-only state during sunsetting. Most of the time, if a new system is required, a migration occurs to a new schema that the new app supports.
We use a mix of SQL and NoSQL databases for different and specific purposes. Your suggestion is to do the data manipulation outside of the database which typically means pulling data sets out, doing something with them, and then putting them back in. That will never be more efficient than just doing it in the database.
Our SQL databases are actually scaled horizontally using read replicas since writes are much less frequent than reads and we can easily add/remove instances as demand grows and shrinks. IMO, it wasn't any more difficult to scale out our databases than it would have been for any other K8s workload (application or DB).
> Your suggestion is to do the data manipulation outside of the database which typically means pulling data sets out, doing something with them, and then putting them back in.
It doesn't mean that at all. You can still leverage SQL to get amazing performance for queries and mutations across large datasets without burying that in a stored procedure. Yes it's generally a bad idea to pull thousands of records across the network and into memory to manipulate them before putting them back. But that's a false dichotomy.
I think the problem here is one of total disregard of any performance issues.
To me this is the kind of people who only write SQL through an ORM, suffer constantly from the N+1 problem, and say that RDBMSs are slow compared to key/value stores.
I wonder how good can ChatGPT become for SQL and if it can replace these people xD
On the other hand, there are good reasons to put logic in the database, chiefly:
- Performance
- Consistency
- Security
Basically, you want to construct a public API in the form of stored procedures, functions and views, and then funnel all clients through that API, by granting execute/select permissions only on the artifacts which are part of the public API.
That can be good for performance not just because computation is close to data so you don’t pay the communication cost, but also because it opens optimization opportunities that might be impractical on the client, e.g. by returning several interdependent resultsets at once, or by cleverly using temporary tables or cursors.
It also ensures that a rogue client can’t circumvent the consistency rules implemented by your API. If you care about your data, this is a big one!
And finally, the API may be a natural place for performing authorization in case your security needs cannot be readily met by the database’s built-in security.
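A compact sketch of that setup in PostgreSQL terms (object and role names are illustrative, not prescriptive):

    -- Sketch of the "public API" idea; names are made up.
    CREATE SCHEMA api;

    CREATE VIEW api.open_orders AS
        SELECT id, customer_id, total FROM orders WHERE status = 'open';

    -- SECURITY DEFINER lets the function touch the base tables
    -- even though clients have no direct grants on them.
    CREATE FUNCTION api.place_order(p_customer bigint, p_total numeric)
    RETURNS bigint
    LANGUAGE plpgsql SECURITY DEFINER AS $$
    DECLARE v_id bigint;
    BEGIN
        INSERT INTO orders (customer_id, total, status)
        VALUES (p_customer, p_total, 'open')
        RETURNING id INTO v_id;
        RETURN v_id;
    END;
    $$;

    -- Clients get access to the API objects only, not the base tables.
    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM app_role;
    GRANT USAGE ON SCHEMA api TO app_role;
    GRANT SELECT ON api.open_orders TO app_role;
    GRANT EXECUTE ON FUNCTION api.place_order(bigint, numeric) TO app_role;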
—
Perhaps the biggest argument against it is the fact that most SQL-based languages are firmly stuck in the 80s. For the most part, they’re just pain to work with. But enduring that pain might very well be worth the end result.
Performance is my primary reason for using stored procedures. The thing that almost every high-performance app eventually runs up against is sheer query quantity. No matter how good you are with doing set operations rather than operating on one value at a time, the time inevitably comes when you need to do a query from your app, do something to the result, and generate the next query, and repeat this for an arbitrary number of times. If you can move that process to the database, you can get performance wins so massive your peers won't believe such wins were even possible.
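A hypothetical illustration of that pattern in PL/pgSQL: a chain of data-dependent lookups that would be N sequential round trips from the application becomes a single call (table and column names invented):

    -- Walking a chain of linked records inside the database.
    CREATE FUNCTION trace_shipment(p_start bigint)
    RETURNS bigint
    LANGUAGE plpgsql AS $$
    DECLARE
        v_current bigint := p_start;
        v_next    bigint;
    BEGIN
        LOOP
            SELECT forwarded_to INTO v_next
            FROM shipments WHERE id = v_current;
            EXIT WHEN v_next IS NULL;
            v_current := v_next;
        END LOOP;
        RETURN v_current;  -- the final record in the chain
    END;
    $$;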
"Perhaps the biggest argument against it is the fact that most SQL-based languages are firmly stuck in the 80s."
Postgres is chipping away at this at least. It's still a mess, but every time I leave for a couple of years and come back, the game has been upped. I don't know exactly how long it's been in there, but as a for-instance, Postgres stored procedures have print debugging now, which I described in a Go context here but should apply to any other language that has a good Postgres-specific driver: https://www.reddit.com/r/golang/comments/12k6c6m/persistent_... That's just a night & day change. I can live with print-debugging even if I don't get a true debugger, but working without even print debugging is a little too 1960s for my tastes. There's also been a lot of improvement in being able to intermix arrays, tables, and other such things with each other, which used to be a major hassle, though it's still pretty chaotic in terms of naming schemes and such.
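For anyone who hasn't seen it, the print debugging presumably refers to something along these lines in PL/pgSQL (a sketch; procedure and table names are made up), where RAISE emits messages that the client driver can surface:

    CREATE OR REPLACE PROCEDURE recalc_totals()
    LANGUAGE plpgsql AS $$
    DECLARE
        v_rows int;
    BEGIN
        UPDATE order_totals SET total = total;  -- placeholder work
        GET DIAGNOSTICS v_rows = ROW_COUNT;
        RAISE NOTICE 'recalc_totals touched % rows', v_rows;
    END;
    $$;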
Testing code is exactly where I'm doing the print debugging from. I'm sure not doing it on production unless I have no choice!
I don't take a dogmatic view of testing. There is no requirement that test code in X must be run in a test harness in X. There is no problem testing my database procedures with code being driven from my primary app language. What matters is that cases are covered, and that the desired outcomes are verified. My database code is perfectly well covered with testing. If yours isn't, that's a process problem on your end and is your problem, not mine.
I wasn't accusing you of anything, I was genuinely asking. Do you mean that your testing logic is done in the code? I don't really know what that has to do with print debugging. Are you reading output from the database and running assertions on that?
For what it’s worth, we implement tests in the client language (C# in our case) and just call the stored procedures/views (in SQL Server) as any client would. Most tests are written so that they rollback after execution, and to tolerate any residual data that might have been left after the tests that must commit. A very small number of tests require a clean database, which we accomplish by traversing INFORMATION_SCHEMA and just emptying all the tables (we also temporarily disable constraints and triggers).
These tests are just normal “unit” tests from the perspective of Visual Studio, which brings one enormous benefit: you can put a breakpoint in T-SQL code and just right-click on the test and choose Debug, and the breakpoint will be hit! That’s possible because Visual Studio debugger can switch from C# to T-SQL seamlessly.
This is a fantastic way to debug in exactly the right test case which exposes the bug or behavior that you want to test.
I agree that SQL and flavors of PL/SQL have a 1980s vibe attached to them (well, they were born in those times!), but even small adjustments can make them feel better: for example, dropping the TYPING IN CAPS tradition.
Queries like `select this from that`, in my opinion, feel different than `SELECT this FROM that`, and we can often use color to distinguish keywords from names.
If you want Don't-Repeat-Yourself (DRY), you have to decide to put the domain logic either in the database or in a service*. In the first case, having two applications directly talk to the database is feasible since the common logic is implemented at the database level.
In the latter case, you need to ensure that every application uses the service and no application directly talks to the database. If multiple apps directly talk to the database where the domain logic is in the service layer, you will end up with code duplication and hard to trace bugs (unless you take extraordinary steps).
It is an architectural trade-off. There are use cases where you may choose one over the other.
>> If multiple apps directly talk to the database where the domain logic is in the service layer, you will end up with code duplication and hard to trace bugs
This is more common than putting it in a service in my experience. I haven't seen it done in the database in a new application since the 90's.
In the DotCom era it wasn't unusual for the entire business to be implemented inside an Oracle database running on a Sun server (including the web server ;) How fun it was to debug such a system!
Persistence Ignorance [1], also known as PI, is a principle that states that the classes used to model a software application's business domain should not be affected by the way they are stored.
This principle applies to traditional application stacks, except for certain Big Data stacks such as kdb+/q, or situations where optimizing the data layer for performance and integrity is necessary.
For instance, when using kdb+/q, a separate CI pipeline is utilized alongside the main application's CI pipeline, including separate tests, linters, code or query formatters, etc.
When it is necessary to use such systems, the best approach is to isolate them as a separate bounded context, such as a microservice.
Another one of the famous database holy wars, "don't use stored procedures for business logic". And at the moment HN rises to the occasion with a ratio of 28 points / 60 comments.
I think at this point issues like these are pretty much baked in already. The vast majority of applications don't use stored procedures, because they:
a. are extremely vendor specific
b. are written imperatively, rather than declaratively, in vendor-specific languages that, let's face it, are about as user friendly as REXX (Google it, kids), which makes them very difficult to be expressive with
c. are not very straightforward to keep versioned in source control (of course they can be versioned, but the pipeline from db environment -> source control typically has to be pretty custom, plus you have to get your DBA to use it)
d. introduce all kinds of novel problems in integrating with application-level constructs such as database drivers, SQL builders, and dare I say ORMs - areas which, while all potentially solvable (yes, including ORMs), are not being solved, because the folks who swear by stored procedures pretty much despise all those other things.
So in some ways the way the stored procedure community is so opposed to application level constructs is kind of what keeps the community isolated, and in some cases, renders what might be useful technologies as completely unused (where I am referring to MySQL stored procedures, which...exist! But I wouldn't dare ever try to use them because who wants to be first, really).
MySQL stored procedures work fine when invoked from application code IME. The lack of native collection types is not ideal when you need to inject N values into a bit of data logic. As such, and for other reasons, I personally prefer raw parameterized SQL passed through a lightweight ORM that handles mapping for me as well as securely marshals a collection value into a parameterized query. But beyond that I'd say that they are "usable".
Can you elaborate on the challenges you've faced with them?
for MySQL? only that they do not seem to be commonly used at all, within the already small set of modern applications that scale out on stored procedures successfully. MySQL's base of maturity is the PHP application that is using straight SQL.
I'm not an SP guy, so while an SP app on a platform with lots and lots of widespread use and maturity for that style of programming, like Oracle or SQL Server, is already unpleasant for me, at least I'd know I was on well-trod ground; doing it for MySQL, where issues I hit would have very little precedent / workarounds / community, I'd not want to get involved with for anything important.
One of the egregious examples of this is Oracle's Application Express technology - write APIs and screens in PL/SQL. It's such a horrifyingly, nakedly cynical lock-in strategy that it took me a while to come to terms with it when I first came across it.
I think the problem here is NoSQL; with an RDBMS, data integrity wouldn't be a problem, and that's why I'd prefer it over NoSQL. While it can be easier to start prototyping without a defined structure, I'm more comfortable with an RDBMS.
Last year I tried a lot of those new document-oriented databases and for me they are awful: you can't be sure what's in each record, so I need to be more careful with my application logic not to mess up, while with an RDBMS you can easily define your structure and constraints and be confident that your data is always correct. Also, adding new fields is a nightmare without migrations.
This year I started thinking that we might need a database that lets us write proper stored procedures in a way that feels like programming (I mean PL/SQL feels pretty hard and it seems that not so many devs are using it). For example, what I wanted was to create a trigger so that when a user buys something, it subtracts the quantity from the inventory and creates a new entry in notifications and orders; that way my app just needs to check if there are unsent notifications and show the user their orders, without that logic being in my app. And it does feel easy to just edit those procedures with migrations.
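Something along those lines could look roughly like this in PostgreSQL (a sketch with made-up table names, not a recommendation on whether the logic belongs there):

    -- Rough sketch of the trigger described above.
    CREATE FUNCTION handle_purchase() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE inventory
        SET quantity = quantity - NEW.quantity
        WHERE product_id = NEW.product_id;

        INSERT INTO orders (user_id, product_id, quantity)
        VALUES (NEW.user_id, NEW.product_id, NEW.quantity);

        INSERT INTO notifications (user_id, message, sent)
        VALUES (NEW.user_id, 'Your order was received', false);

        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER on_purchase
    AFTER INSERT ON purchases
    FOR EACH ROW EXECUTE FUNCTION handle_purchase();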
I think there's a distinction to be drawn about logic being stored within the DBMS and logic being implemented and executed directly in the DB.
Actually storing logic in the DBMS is often challenging unless you want to put a lot of effort in - source control tools and abilities to deal with multiple versions of logic are far less mature in the DBMS world, and while it's possible to make these work, you'll be doing a lot on your own to come up with something bespoke.
But there's no reason you can't leverage the DB more directly from your backend code. Given the author's example code, that's already inefficient with a single row and would be a nightmare for multiple rows. It's perfectly reasonable to take code like that and have it directly execute SQL statements where a lot of the logic is expressed in the SQL itself, but still colocated with your application code rather than in stored procedures, triggers, etc.
We do use an ORM for most of our DB interactions, but if something starts to stretch beyond the most basic of use cases, we're unafraid to drop to raw SQL and execute that instead. It's been a pretty happy medium for us.
Right now I'm more on the opposite side: put all the logic in the database (Postgres is craaazy powerful these days), then use a commoditised backend (we use Hasura, but there is a bunch of good options out there).
Counterpoint: think of your database server as any other microservice, with an API. Putting a limited amount of logic there can vastly increase (not decrease) the overall application performance. One can also use stored procedures and views to allow the model to change while maintaining backwards compatibility with other services.
Alternatively, don't use SQL at all, and use a NoSQL database where you can't put any logic at all. If you aren't going to use the power of the database, then don't "pay" for it.
I've had a fair amount of low-effort "big wins" in my career, simply by reversing the technical decisions/implementations of people who took this idea too far. IMO the OP sits at a reasonable level on this particular "spectrum". The title is more controversial sounding than the post, which itself is quite reasonable.
The implications of being tightly coupled to the (transactional) database are less onerous in a microservices environment where databases are "private" to a single application/service.
I have mixed feelings about using stored procedures and triggers. I was once working on a Postgres app and started storing some logic in the DB. I had some triggers that would automatically set values when a user's state changed (e.g., when the user changed to 'inactive', the trigger would also update the tables related to subscriptions, API keys, etc).
As another "performance" trick, I was using multiple CTEs with the `RETURNING` clause to execute multiple operations in a single query.
Everything was OK when I was working on that app daily. But then I stopped working on it for a few months, and when I came back, I regretted using those tricks. For example, now I need to verify the triggers to make sure that changing a value won't change other tables that I forgot about. Also, I can't compose the SQL queries I wrote because each query does "everything at once". I would have rather paid the cost of doing 3 queries, and in exchange I could have reused some of those queries in different parts of the application [^1].
Of course, the app didn't even get close to the scale at which 1 query vs. 3 queries matter.
I still appreciate and like having some business logic in the DB, especially `CHECK` constraints. But the tooling for regular programming languages makes everything easier. Having the logic in the DB is a double-edged sword.
[^1]: This can become relevant when building an admin interface/CLI, since you may want to execute partial changes vs. the "everything at once" changes in the user-facing application.
"Completely" as in you'd put ALL your business logic in the db, or that you're open to pushing SOME to it? Most?
EDIT: It's a genuine question. I use views and write a decent amount of business logic as queries (well, in my language's query DSL). It's always bugged me when people do a lot of post-processing for stuff that the db could do faster.
There's perhaps an interesting 3rd option aside from full stored procedures vs pure application code transforming data pulled down from the database... where the processing and transformation is still handled with SQL, but this code and the interaction with the database reside in application code. So instead of triggers and stored procedures, you have database transactions defined in and called from application code.
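Concretely, the application might just send plain SQL like this inside a transaction it controls (hypothetical tables, just to show the shape of it):

    BEGIN;
    -- The set-based transformation happens in SQL, but this text lives in
    -- the application repo and is issued by ordinary application code.
    INSERT INTO daily_totals (day, product_id, total_quantity)
    SELECT order_date, product_id, SUM(quantity)
      FROM orders
     WHERE order_date = CURRENT_DATE
     GROUP BY order_date, product_id;
    COMMIT;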
SQL-style languages like plpgsql are crazy fast and super concise relative to the hurdles of doing it all in a host language.
I’m building a moderate sized application right now - and even though I consider myself a Go programmer, 100% of my code is either Svelte, Typescript or pl/pgsql + postgraphile. Not a line of Go to be seen.
I mean, the day will come when I need to write some Go code. But I’m so much more productive with plpgsql when it comes to anything within a single db.
Being able to implement a lot of business logic in SQL and pl/pgsql is another of the reasons I started developing Omnigres [1]. My initial motivation was to have something almost self-contained, just one thing to deploy infrastructure-wise.
But the more time I spend on it, the more I appreciate how having a single place for all the concerns significantly contributes to simplicity.
Sure, there are some hurdles to overcome. Some are more technical in nature. Some are more on the side of tooling inadequacy/outdatedness. I am trying to contribute to both with the project.
I think all logic should be in the db, but I think we just don't have the right databases.
In my ideal world, my database would know about all the queries I want to run, and it would choose query plans based on this knowledge to effectively cache things, and handle streaming changes to queries.
You also want to be able to visualize the full data dependency graph of all your data, i.e. when this value changes, what else changes.
We need to get rid of all these poor-fitting abstractions between the application and the database (REST, GraphQL, etc.). Your client code should work with a model exactly the same as how it's stored in the database. In fact, your entire database should be able to run client-side in the browser. This is what is holding us back. This would allow optimistic updates and easy local/offline apps. I think people usually avoid this because they like to choose a different language for their backend, and the thought of getting it to run in a browser is frightening.
Just think how much easier your life would be if you had direct database access in your browser from your client code, and you didn't have to worry about apis, orms, etc. and that it was secure, and synced automatically.
And now think about your current application you work on, and the difficulty in achieving such a thing.
You might enjoy this project, which tries to do basically exactly what you described: stick everything in a database and let it drive the app: https://riffle.systems/essays/prelude/
It’s still very much a research prototype but we should have some more writing out soon.
Good essay. Particularly with web development, there is complexity around bridging the gap between server and client, and data crosses this chasm through serialization, which exacerbates the problem and limits the expressiveness of server languages. If one wants the full power of client interactivity and APIs, it requires massive duplication of code simply to serialize, and to duplicate on the client, what is already present on the server.
This is a big value of recent server-centric frameworks like Phoenix LiveView that provide ability to have code and data co-located and not have to duplicate so much on client and server as with SPA's while attempting to maintain some base level of client interactivity. But seems always a tension between leveraging the full power of client and full power of server.
You might find this article [0][1] informative. It disputes the idea that UI's are "pure functions of the data/model" in a compelling way, and points to this incorrect assumption as having introduced some complexity/pain in how frameworks like React work.
> What is a (somewhat) pure mapping from the model is the data that is displayed in the UI, but not the entire UI.
This line was interesting to think about.
React added so much complexity and perf issues, and I really don’t know what was so bad about something like Backbone / Backbone.Marionette.
I find that in React I want all component props and data to come from an external store via subscription, instead of being passed down some tree. My UI is stable.
Certain operations use fewer resources when performed in the RDBMS than when performed in the application code.
As your data grows significantly, you may encounter new bottlenecks, and for some of these you may need to move some of the logic from your application into the RDBMS (ergo: write an SQL function or a stored procedure or a trigger, etc.).
I work at one of those "everything must be a stored procedure" companies. We are stuck with a single RDBMS vendor and the company is perfectly OK with this.
Different developers have different skill levels and specialties but they know how to call stored procedures.
We have DB admins and a team of SQL specialists who optimize things and make sure everything meets standards.
Stored procedure code is source controlled just like other application code, just in different repos.
No, there aren't unit tests on this, but integration tests at the application level provide some coverage.
I realize there are downsides but they are generally acceptable. We aren’t really putting much if any domain logic in these stored procedures.
Not making an argument for or against stored procedures, but just putting this link here since I'm seeing quite a few comments regarding the difficulty of testing stored procs.
It was a fair number of years ago, working on an enterprise data warehouse, but the linked testing library was a bit of a revelation when testing SQL Server stored procs. It basically made unit testing stored procs straightforward and natural.
Haven't used it since, but hopefully helpful for someone, and I do wonder if there are similar testing frameworks for other DBs?
I have worked on a system that was run by customers that were reluctant to make upgrades to their installations. We wanted to be able to evolve the database schema more frequently than we could roll out fleet-wide application upgrades.
We could have created a sort of proxy service to handle backwards compatibility between the database and applications, but since these applications needed only a very small number of endpoints, it was much, much easier on operations and maintenance to use the database itself as the compatibility layer through stored procedures.
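As a simplified sketch (not the real schema), older application versions kept calling the same function while the underlying tables changed between releases:

    -- Old clients keep calling get_customer(id) unchanged,
    -- even after the name was split into two columns.
    CREATE OR REPLACE FUNCTION get_customer(p_id bigint)
    RETURNS TABLE (id bigint, full_name text) AS $$
      SELECT c.id, c.first_name || ' ' || c.last_name
        FROM customers c
       WHERE c.id = p_id;
    $$ LANGUAGE sql;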
IMO it's best to start as clean as possible and start migrating logic into the database over time as performance or security needs arise. Same with clean code, start with it and if there are performance issues start adding shortcuts and conditions that might improve the performance but have a negative impact on readability and require much more comments.
This is a major pain point moving customers from Oracle to cloud services. PL/SQL encapsulates a lot of logic in legacy enterprise applications that handcuffs customers to the platform similarly to mainframe systems. Migration tools can get part of the way there, but it still requires a strangler-type pattern when migrating.
The mismatch between code fluidity and the rigidity of the RDBMS is why we created https://schemafreesql.com . We of course looked at NoSQL, but our fondness for SQL kept us from considering it.
Taking this approach to using a database is always a judgement call, but putting logic in the database is probably useful a lot more often than the "nine times out of ten" that the article claims. And the tradeoff in this approach contradicts the last line in the article: "Utilize the full extent of your database’s capabilities, but don’t put domain logic in it." Putting logic in your database is often necessary to make full use of its capabilities.
In applications with complex storage needs and/or performance-sensitive IO needs, I prefer to think about databases as an additional tier that provides complex storage services (as is precisely the case with most modern databases). As an example, I once worked on message routing and transformation server. From the perspective of the application logic, messages simply needed to be durably persisted and retrieved. At the level of the storage layer, I wanted message versioning, provenance, copy-on-write semantics to minimize on-disk size, and indexes to support at least two different retrieval patterns. All of the items at the storage layer were implemented in stored procedures. In other words, the database supplied a custom "storage API" to the business logic tier for persisting and retrieving messages that implemented those routines in a database-specific way that did not concern developers at the application layer.
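As a rough illustration of what such a "storage API" can look like (names invented, nothing like the actual system), the application only ever calls the function; the versioning is an implementation detail behind it:

    CREATE FUNCTION save_message(p_message_id bigint, p_body jsonb)
    RETURNS bigint AS $$
      -- Each save appends a new version row; callers never touch the table
      INSERT INTO message_versions (message_id, version, body)
      VALUES (p_message_id,
              COALESCE((SELECT MAX(version) + 1
                          FROM message_versions
                         WHERE message_id = p_message_id), 1),
              p_body)
      RETURNING version;
    $$ LANGUAGE sql;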
The main arguments against this approach are:
* It will tie you to a specific database. This is true, but almost irrelevant since anything other than the most trivial usage of a database will inevitably make use of a feature or syntax specific to the database that would require modification were one to migrate to another database. Further, the mere idea of database independence is kinda' silly. No one talks about how writing your application layer in one particular programming language will limit the ability to migrate to another programming language. We should make technology choices around programming languages, databases, etc. with the intention of matching their strengths to the problem at hand fully understanding that tradeoffs are being made and that rework will be necessary if those choices are ever revisited.
* SQL used for writing stored procedures is not as good (for some definition of "good") as other programming languages used in the application layer. SQL is certainly different than most imperative or functional programming languages, but it's expressive and well-suited for its purpose. If you really need to make use of the capabilities of a database, it would behoove you to develop some proficiency in SQL.
* Putting logic in the database breaks modern CI/CD processes. IMO, this is the most compelling argument against it. That said, there is tooling that exists for putting stored procedure and other database logic in version control, automatically deploying it to a database, and running tests on it. These tools are not as commonly used, but they do exist. I've also used tooling that introspected the database objects such as stored procedures, etc., and automatically generated type-safe application code to interact with those database objects. That provided compile-time guarantees that application and database code were in sync at least with respect to number and types of arguments, etc. Whether it makes sense to go to the effort of integrating this tooling into your development process is a judgement call, but it can be done and I've seen it work well for application with demanding database needs.
As a counterpoint, Derek Sivers says Simplify: move code into database functions <https://sive.rs/pg>
I think they are both right, Alex Kondov for keeping his domain logic mostly in the application layer, and Derek Sivers for mostly keeping his in the database layer.
Why? Because that's what they know well, and they can keep the domain logic mostly in one place. If Alex can keep most if his domain logic in the application layer that he knows better, that's the better choice for him. If Derek can keep most of his domain logic in the database layer that he knows how to utilize well, that's the better choice for him.
(I can't find it now but I thought someone very smartly reflected on the Why Perl? <https://news.ycombinator.com/item?id=35646612> post today that the main reason why anyone will choose any programming language is that they know how to use it well. I think the same applies here. This is why many programming languages are valid choices for different people and project, and this is why different architectures are valid choices too.)
When I was choosing jobs in the real world[1], I would avoid any company that depends heavily on stored procedures for business logic and where things were ruled by “database developers” and all the developers were doing were calling stored procedures.
Stored procedures are harder to unit test, do automated rollbacks, version control, etc.
I’ve never seen a system that 5 years in anyone said “I’m so glad we decided to use stored procedures everywhere”
[1] I work in consulting now and have no allusions that I face the same issues that day to day developers do. I get to lead development “solutions” and move on.
It's worse than that.
Your database will be the first major thing to cave under load.
The traditional tools most folks use (postgres/mysql/maria/etc) want to scale vertically, not horizontally (you can do it, but it's hard, and has lots of drawbacks).
Eventually - You will hit hardware limits. Providers only have instances that are so big. There is a ceiling on how much you can scale your DB without having to do major, major work.
I've seen this pattern play out now at 3 high growth companies I've worked at. The one that was using lots of stored procs and materialized views got hit the fastest and the hardest.
When it happens it is time to actually use the big boys databases.
I’ve worked with the big boys and seen plenty of times where databases fell over no matter how much data you through at it because of locking issues and other database contention issues.
It’s much easier to horizontally scale a bunch of application servers than database servers.
That is why big boys databases do clusters.
“clusters” only help when you can deal with eventual consistency (which is often admittedly more than most businesses realize). Sure you can scale your read only instances.
BTW: I work in the consulting department of a little company I am sure you have heard of that knows something about dealing with “the big boys” at scale.
Great, me too, several of them actually.
> When it happens it is time to actually use the big boys databases.
For years Oracle had a very small limit of how many CPU's stored procs could use baked into the DBMS engine. IIRC, it was something like 4.
Maybe that's changed now, I don't know. I do know Oracle qualifies for most as being one of "the big boys databases."
MySQL doesn't have materialized views, but it has better replication (at least using Aurora in AWS) than the big boys databases.
Adding more replicas and changing the instance size scales much better than fine-tuning the big complexity of Oracle and its materialized views.
So maybe it is not the tool, but how you use it, that matters. And fuck Oracle if by big boys databases you mean Oracle.
More generically, scaling stateful things is generally hard than scaling stateless things (because you have to figure out what to do with the state).
Even horizontally scaling systems you either have to redistribute the data on scale actions or keep the data in place and send new data to new infra (which may or may not be helpful)
I would agree, but I think it's worth checking out one of Siver's postgres repos where I thought he tested things pretty well[0]. Also things like Postgres are often run as a "server" on local dev or CI but could easily be run more ephemerally[1]. That should help with testing various states and behavior of the database.
I still would actually follow what you said thought because often times places which rely on stored procedures are, as you say, ruled by developers with the database access to manage those procedures and from what I've seen often do not have anything close to a reproducible setup. Besides the fact that often it's some proprietary database which is much more difficult to run locally than sqlite3 or PostgreSQL.
Still, I keep that separate from considering whether views/stored procedures are actually a good solution. I think sometimes they really are. Honestly, Sivers' experiments really convinced me of that.
[0]: https://github.com/sivers/store
[1]: https://jamey.thesharps.us/2019/05/29/per-project-postgres/
It is a matter of database tooling; Oracle and SQL Server provide quite good IDEs, graphical debugging, merge tooling, and testing infrastructure.
I can tell you that at least one pharmaceutical company is quite happy to have plenty of stored procedures in their Oracle databases, including making use of Apex, and it has been in production for at least several decades.
And when you get ready to roll back, can you just revert all of your code and deploy? Can you just do a "git branch" and work on your own isolated code in your dev account? How well does merging work when you do a pull request and have to merge your code and make sure your stored procedures are up to date?
Yes, because there is such a thing as CI/CD pipelines, and test environments, for database code as well.
If there are ten developers with 10 feature branches are they going to be running 10 instances of Oracle with their own “branches” of the stored procedures?
Yes, install the RDBMS on their computers just like any other development tool.
Yes, learn to use multiple schemas and instances, just like using multiple deployments on Apache, Tomcat, IIS, ...
Or even better, join the hype, and use containers with volumes configured for each feature branch.
And then copy “enough” data on their computers to have a realistic test and make sure you have some type of script so sensitive data isn’t on their laptops and ensure schema changes are constantly propagated to each developers workstation. This isn’t even mentioning if you have a commercial database like Oracle where you have to deal with licenses.
And how do you merge changes and do rollbacks?
Not to mention the whole
GetCustomer_1
GetCustomer_2
GetCustomer_3
Where code is copy and pasted and modified slightly.
No different from any other kind of programming workflow; the same kind of care is required, and it is DevOps' job to make sure it is taken care of.
Licenses are peanuts in enterprise projects.
It's much different. As someone who is a long-time developer who also deals with the "DevOps" side of the world, I never design architectures for the former (development) without thinking about the latter from day one.
You can't just wave your hands and throw it over to the "DevOps" folks. Doing so is kind of the opposite of what "DevOps" is supposed to be.
I’ve worked on a codebase that had 300K lines of pl/sql and I didn’t find it all that fun to work with. The problem is that you are working in a very constrained environment, starved of language features and libraries, and with a deployment model that is deeply tied to the data and therefore more cumbersome. The IDE’s helped but didn’t matter that much. Also, oracle has per-cpu licenses, so the more code runs in the database the more expensive it becomes.
I think the best design is an API in a language designed for building API’s, on top of a dumb database, with all access gated through the API.
If you're going that approach, I think a lot of databases have facilities for extending outside plain SQL. Afaik Oracle can bind to Java and Postgres has an extension interface
I do tend to agree with API-outside-the-DB pattern. I guess Oracle supports some form of QoS but I don't think MySQL and Postgres do. That means having multiple apps hit the same schema can cause starvation issues
Also not sure how hard it is to monitor user resource usage. That's more important for billing (even if it's "fake" money from 1 department to another)
The line of reasoning seems valid (do what you do best); anyone who has had to deal with stored procedures in legacy systems will, however, disagree. Stored procedures are to be avoided.
What is wrong with stored procedures? They are not fashionable nor especially expressive, but they perform reliably and they do pretty much anything that might be expected of relational logic.
But maybe that's my enterprisey bias.
They're Turing-complete and modular, so it's not really about what they can or cannot do.
Testability, tooling, and the open-source ecosystem are either bad or non-existent. Writing PL/SQL is the worst environment I've worked in. That database sent emails, processed CSVs, scheduled jobs, etc., yet there was still a web app to maintain next to it.
They're OK for certain things like essential triggers or performance-sensitive functions, but I would never deliberately put app logic in there. Major red flag.
> Testability, tooling, and the open-source ecosystem are either bad or non-existent
If you're properly testing the code in your application that exercises persistence, that means your test harness runs a real database like the one you're running in production and thus you can also write the database logic tests using your own application's testing facilities.
Of the things you listed, "the database sends e-mail" is the only one where I'd think you'd have to change the code at all, and have the database go through a mockable middle-man so that it becomes testable; but everything else can be comfortably tested from a test suite that is able to talk to a real database.
Yep. Releasing, testing, debugging, etc. are all more difficult in stored procs than in a "regular" language. Stored procs have other downsides as well.
Some of those concerns apply to any database. Your query could slowdown if the database picks a bad plan, so you could say you will never trust the db to scale. That's separate from scaling the stored proc - just using the db can run into a scaling issue.
No, what I mean is your code's scaling is now directly tied to how your DB scales. Your SP code can be impacting the rest of your DB, and vice-versa. I have seen large SP-based systems require Oracle boxes to be scaled up at enormous cost (hundreds of thousands or even millions of lines of SP).
Not because of slow queries, but just the cost of executing the stored procs themselves.
> That database sent emails, processed CSVs scheduled jobs, etc.
That's really a bad, very bad use of SP. They should only deal with and care about the data, not doing any interaction with any external systems.
I've never worked on a system where we didn't make an effort to make sure we were NOT dependent on a specific DB vendor/implementation and avoiding SP's has always been a part of that.
And in how many systems did you ever end up switching from one database engine to another?
We went through and got Oracle out of our enterprise after a particularly nasty licensing negotiation and audit with them some years ago. So it does happen.
I expect at some point we'll have a similar initiative around cloud providers.
Yes, a very commendable goal, I've seen this effort undertaken many times. Curiously enough, I've never actually seen anyone switch databases.
Depends on the product in question. For integration products, it's very common to support multiple database vendors with the same core engine. So any SQL written, and the db data types used, are generally SQL-92/99.
This is pretty good since one can generally fast unit test DB code with an embedded database.
One of the major lifts at my last job was going from Oracle to Postgres. The stored procedures were the worst part of it all.
I’ve done it twice. Mass migration off Oracle and before that DB2 and Sybase. Current system is also occasionally sold on premise to customers who have their own db preference and we can support them all.
I moved the same system from MSSQL server to Oracle, then to MySQL.
The Oracle version had a lot of logic written as SP. I migrated them to MySQL SP.
So the conclusion is: SP don't make it impossible to migrate from one database to another, and second: yes database migrations do happen.
Also: all database migrations I have known involve Oracle in one way to another. Oracle salespeople are way too nosy.
I used to deal with stored procedures in legacy systems and honestly it was not bad.
The first concern was getting the stored procedures into version control and creating a mechanism to update the systems based on the things in version control upon deployment. After that it was smooth sailing.
Why are stored procedures to be avoided?
I think most people are reacting to the impedance mismatch of normal software lifecycles versus sticky databases. Naive or legacy users and documentation use the database as the system of record for the custom code, rather than as an execution environment provisioned from a real system of record like a source control system.
Where it gets even more tricky is not just stored procedures but application-specific functions embedded in views, or triggers running custom functions. It's no longer just a library of functions you can choose to call (or not) during a query, but code that runs on its own based on clients' queries that never directly mention the functions.
The same goes for schema management, and I think that is a big reason why so many developers fixate on "schemaless" approaches. They want to pretend that the database exists in a static way outside the software lifecycle, just like they ignore the filesystem and operating system and treat it as an unchanging abstraction.
Stored procedures and functions are very good for many things. I am not claiming you should put all your business logic in the database, like old-time Oracle consultants wanted (for self-serving reasons), but some calculations are better handled in a single place, instead of being reimplemented in different parts of the stack, or stacks, given that it is common to write a project using one framework and language and then migrate to another one.
The only reason to avoid SP is that you don't know any SQL and your ORM can't write the function calls, so you can't call them.
Which is a general complaint I have about modern software development: many things are done in convoluted ways because some developers don't know SQL and don't want to learn it.
Tom Kyte, who for a long time was the "ambassador to the world" for Oracle, makes essentially the same arguments
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUEST...
For data that will either:
+ 'outlive' the user interface - this data will be used forever but we are not sure where e.g. financial docs, records, etc etc
+ 'extend' past the user interface: this data started on an installed desktop app / green screen but we are not sure how we may want to extend it - eg to an API or to web
the DB is the right place to put business logic.
Coupling the data with business logic, in the DB, allows almost complete flexibility in how it is accessed and interacted with, at the expense of complete lock-in to the DB platform. Which may or may not be a bad thing. But if you have a skilled SQL team and some resources to pay the DB licensing and support piper, this is a good direction to ensure app support longevity.
For example, you can do the same thing on desktop or phone or web client or even hardware switches: say you have a DB stored_procedure to indicate that the process of manufacturing this part has reached stage X
update_item_status( Item 11111, 'Manufacturing Stage X' )
(this is obviously really contrived, but it is an example to make things clear)
You can run this, if it is in the DB by:
- hitting this button on the assembly line
- scanning a barcode can do the same thing
- having a user change a status in a UI
- have a batch job run off a script with a list of parts to apply the status to
And even better, if you want to update how the stored proc works, you update once in the DB and it is available to all 'interfaces' instead of having to update hardware switch code + user screen UI code + batch script code etc etc...
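A sketch of what that procedure might look like, equally contrived, in Postgres syntax:

    CREATE PROCEDURE update_item_status(p_item_id bigint, p_status text) AS $$
    BEGIN
      UPDATE items SET status = p_status WHERE id = p_item_id;

      -- Every interface (button, barcode scanner, UI, batch job)
      -- gets the same audit trail for free.
      INSERT INTO item_status_history (item_id, status, changed_at)
      VALUES (p_item_id, p_status, now());
    END;
    $$ LANGUAGE plpgsql;

    CALL update_item_status(11111, 'Manufacturing Stage X');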
I know that an argument can be made that you have an app server layer separate from the DB so this is not a 'real' problem but this approach just puts the app server IN the DB
One closing thought: it is often easy to forget that, in most cases, the data is the product of software development and the tooling is just support. Only counterpoints I can think of here are games and interactive demo systems where the process ( of playing the game or using the software ) is the product.
Positioning the database and front-end as being similar is a mistake if business goals are thought about from a "the data is the product" perspective. If the rules about how to handle, manage and interpret the data are a core component of that data product, then the argument that the right place for business logic is in the DB is made even stronger.
Derek is right. Tooling is the main reason not to put code in DBs. It’s a pain, that’s all. Tools are like ‘70s calls to say never forget me’. But logically it makes eminent sense. At the end of the day, writing code in the DB is not that different than writing components for an app server.
There is also another reason. Designing effective schemas and bundled logic is not junior level work. Microservices, imo, all song and dance aside, really were trying to deconstruct the “monolith” of the schema: a schema that serves the requirements of today, is modular, and is possible to sanely extend in the future is not commodity work. So a practical reason not to do it is the required labor (expertise).
Both are implementations of the same idea: not splitting the logic into two services deployed separately. Either store data directly on disk or in a general-purpose piece of infrastructure that needs no configuration.
I think a really important aspect is keeping the logic in your head (and others learning it) is helped by mirroring this single source of truth with your implementation.
Okay, how about "Either put your domain logic in the database, or put it in the code, but don't randomly scatter your shit throughout the code and database"?
Some RDBMS features, such as stored procedures and views, can quite naturally be versioned, and become as ephemeral as your application code.
For example, have your application setup a schema to contain its version-specific database components. The schema will contain an immutable application version, such as its commit hash, in its name. This allows a given version of the application to only use its own set of sprocs and views. On deploy, run the SQL scripts to create the sprocs and views for that version.
You'll note that sprocs and views are not mutated under this strategy. This allows multiple versions to exist concurrently, which facilitates things like blue-green deploys.
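Roughly like this (Postgres-flavoured sketch; the commit hash in the schema name is the idea, not a literal recipe):

    -- Deploying application version abc123 creates its own schema
    CREATE SCHEMA app_abc123;

    CREATE VIEW app_abc123.active_users AS
      SELECT id, email FROM users WHERE status = 'active';

    CREATE FUNCTION app_abc123.deactivate_user(p_id bigint)
    RETURNS void AS $$
      UPDATE users SET status = 'inactive' WHERE id = p_id;
    $$ LANGUAGE sql;

    -- That application version then runs with:
    --   SET search_path = app_abc123, public;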
What you're describing can be done to great success with the right type of requirements. Say you're building an information warehouse for OLAP-style work. It totally makes sense to put all the heavy stuff in the DB, because the data model is going to be pretty rigid and it's the sort of asset that gets enough attention from the business to be supported correctly with a full staff.
On the other hand, defining everything in code and using the db as a dumb store that's fractionally better than using files is a pretty good strategy when you may be handing over support to a team that won't be giving full-time attention to the app. It also handles those key-person risks etc. by keeping things in one area of the app code.
I've only ever seen this work with "application managed databases" i.e. where the app truly owns and controls everything related to the database and can freely run ddl. The db code is then just part of the app and can be spun up against a completely empty db.
The death for this kind of thing is dbas administering the database separately because the coupling will kill you.
My two main reasons for (almost) never putting logic in a database are
1) it will tie you to a specific database. I know a handful of companies that have almost all of their IPR in the shape of stored procedures. Even if they wanted to escape from the database vendor they can't. Not without rewriting everything they have spent a decade or two building.
2) languages such as PL/SQL are much poorer than any modern programming language. If you have ever tried to write a business application in PL/SQL: it is no fun. At least not for me.
(What I mean by "almost never" is that I do use constraints quite a lot which you could argue isn't logic, but it is still asking the database to validate my data and model).
Even then, the implementation of consistency levels alone makes you have to rethink swaths of your code when you switch databases. This is why I don't really care; changing data stores is almost never trivial even if you just use them for data storage.
What if the type isn't supported, what limitations on reading and writing exist in THIS engine? Which types of queries are weirdly slow and amazingly fast?
The list goes on, even for things where you never added any additional logic.
> This is why I don't really care, changing data stores is almost never trivial even if you just use them for data storage
Agreed. It’s strange to hear the lock-in argument when in practice even a set of pretty regular queries and DDL is generally non-portable, neither syntax nor performance-wise. Personally I can’t think of saying “let’s move <projectname> to <servername>” seriously at work, unless it’s a prototype which doesn’t require support yet or is a really insignificant auxiliary db.
>It’s strange to hear the lock-in argument when in practice even a set of pretty regular queries and DDL is generally non-portable, neither syntax nor performance-wise.
Many ORM's can solve this problem.
>I can’t think of saying “let’s move <projectname> to <servername>” seriously at work, unless it’s a prototype which doesn’t require support yet or is a really insignificant auxiliary db.
I've had to do it three times in my career. It usually comes from management. It ain't pretty. Two were from M&A and one was to get rid of an obsolete database.
I think "solve" is too strong a word.
Many ORMs can hide the problem.
I don't think the expectation that you are going to write DDL statements that will work in all SQL databases is reasonable. Which is why I never do this. My take is: it isn't going to be easy so don't even bother.
I have separate DDLs for different databases. And while I've heard a lot of people theorize that this is a maintenance nightmare, it has never been that in my experience.
You are going to spend a lot more time being frustrated if you try to make one DDL work for all databases. Especially if a new database is added to the mix (which might not even be an SQL database).
I have written perhaps half a dozen applications in the past 3-4 years that supports more than one database. Typically for schemas somewhere in the region of 5 to 15 tables. It rarely takes me more than 5-10 minutes to take the SQLite DDL and adapt it to, say PostgreSQL, including rewriting the SQL statements in the storage API layer. And you can design with this in mind.
(I often start with SQLite as it is the least feature rich database. Anything I can express for SQLite is easy to express for other SQL databases, and possibly NoSQL stores. And I can make do with "lowest common denominator" because any fancy stuff tends to just make life harder, so I tend to avoid it unless I have very, very specific needs)
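To make it concrete, the adaptation is usually on this level (toy example):

    -- SQLite
    CREATE TABLE users (
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      email      TEXT NOT NULL UNIQUE,
      created_at TEXT NOT NULL DEFAULT (datetime('now'))
    );

    -- PostgreSQL
    CREATE TABLE users (
      id         BIGSERIAL PRIMARY KEY,
      email      TEXT NOT NULL UNIQUE,
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );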
Granted, it is a bit annoying to do schema updates across several DDLs, but so far, I've spent far less effort than a colleague of mine who always tries to maintain a single DDL and a single storage API layer implementation that is supposed to work across 2-3 SQL databases.
Also granted that it isn't elegant to maintain different DDL and store API implementations, but it turns out to not be as much work as people tend to claim.
There are tools that can handle this, Liquibase is one that I have used.
You write a change set that describes the schema you want to build and it generates DDL for whatever database you are using. It can also reverse engineer a change set from one database that you can use to generate DDL for a different database.
I supported an application that ran on half a dozen different DBs and it wasn't much of a problem. There were a few places where I had to specify change sets specific to one database implementation.
I was looking at a schema generator written in Go a while back, but I can't remember the name. It almost did everything I wanted except it didn't support SQLite (since no proper grammar exists for its peculiar variant of SQL).
What I'm looking for is something that can generate schemas for SQLite, PostgreSQL, MySQL and perhaps MS SQL Server that can be run from the command line (so I can use it from Make)
I have the opposite experience: that for almost all projects I have done in the past 20 years, switching data store is almost always nearly trivial. However, it does come at the cost of giving the design careful thought and sometimes forgoing useful functionality that it is tempting to make use of.
One reason for this is that I had a very bad experience with this. I was involved in two projects in the 1990s where we went all in and used a lot of features unique to two different databases. In one of the cases, the database vendor (a small niche player) got acquired by an automation company and their "real-time capable" database was no longer available for licensing as a separate product. This left us stranded. The other project involved loads of PL/SQL in an Oracle and what killed us was the combination of Oracle licensing fees, the scaling limits of the database and that it was really hard to find good PL/SQL developers.
Today we use several strategies to keep the data store choice flexible.
1) we use databases as pure data stores. Sure, we use constraints, but only as an (optional) extra safeguard that lets us know when we have screwed up. We aren't dependent on the constraints, but they are immensely useful for rooting out inconsistencies in the code and the model.
2) we try to model the data in ways that can be realized on all databases (or stores) of a certain class. Yes, this means that sometimes you have to use slightly different data types than you'd like, but that isn't the end of the world. In some cases what you store is a reasonably useful representation of a type, and then you create richer types to represent it in the application.
3) we tend to start development using SQL databases, but not all parts of the schema are expressed as a purely relational model. This way we can split the schema and handle parts of it using an entirely different class of database or store. This gives us the convenience of being able to start a project with a single database, exploit the availability of constraint enforcement to show us where we make mistakes, but as we need to scale up, parts of the schema can easily be migrated to, for instance, a timeseries database.
This both helps us off the ground quickly and it allows us to postpone actually building a scaled up implementation. (This has proven particularly useful on projects where we have been able to postpone handling a large scale case for years and then having more technology to choose from later).
4) to accomplish 3 we often spend a lot of time understanding what the key data access patterns for the application are going to be and keep this in mind when designing a schema and a scaling plan. It also helps if you can identify where you can decide that data is going to be immutable so that consistency becomes easier later.
5) the application logic never talks directly to the database. There is always a domain-specific store service API between the application and the database. We try to make this API as semantically narrow as possible, and try not to make any promises that would be hard to keep if we have to change storage technology. Again, it helps to partition this API into areas that can be realized with different classes of databases, so you can make fewer promises on the parts of the API that will handle the part of the schema that drives most of the operational cost.
6) all tests are written against the store service API, never directly against any database or storage technology. This way you already have a comprehensive test suite if you need to migrate to a new database entirely, or for when you start implementing different parts of the store using different database technologies.
In the beginning this approach will require a bit more work, but when you have established practices and it becomes a habit, it actually doesn't represent more work to implement than more direct ways of using databases. Yes, we sometimes forego using functionality offered by the database that would have saved us some work in the short term, but it pays off in the long run because you end up with simpler and more flexible designs.
A few years ago I did an experiment with two new hires. Their first task was to add support for a relatively new database to an existing application. They had zero prior experience with the application. I pointed them to the storage service API and the tests and I said "write an implementation of it that uses database X" (which was a NoSQL database). After about two days of work they had an implementation that passed all of the tests and could run all of the benchmarks without any problems. This included adding configuration options that allowed the application to initialize and run with the new store implementation. I think we discovered only 2-3 minor issues in their code over the next 12 months, which isn't bad for a first contribution to an unknown code base.
(Since we only had one set of tests which was leveraged across all implementations it also meant that every time we expanded and improved the tests, we'd be able to improve all implementations as the tests got better at finding subtle mistakes).
Because it was so cheap and easy to add support for new databases, I think we had support for something like 5 different databases. Including an embedded SQL database so that you could run the application locally for experimentation or integration testing (it is really nice to be able to fire up the application with an in-memory database for integration tests. It wasn't unusual that we ran integration tests that would create and destroy the storage a few hundred times in just a few seconds). After I left the project they added support for another database and migrated terabytes of production data with minimal downtime. The migration tools they wrote for this used the store service API as well, which means they developed one tool that would let them migrate between arbitrary databases. (So if, in the future, they chose to switch databases again, the migration tool already exists)
We also had adapters that allowed us to run different permutations of databases. So we could handle parts of the schema suited for relational databases using an SQL database, and handle the bulky bits with somewhat narrower requirements using a NoSQL database. This requires a lot of care when you design the API to have built-in "fault lines" where you can split the schema, but with a bit of practice it becomes a habit. (In some cases we've made "conditional" schemas - so for instance if you run the entire thing in one database you may have a bunch of constraints to help you ensure consistency, but if you run with mixed stores, you may not apply the constraints that would span across databases. Again, we try not to depend on the constraints, but they are a great help in designing the schema, the store service API and the application).
If you do #5 well then yeah, you can get away with it, because your database is just a repository pattern away.
I think people want to avoid this because it's annoying, or you have to think a decent bit up front, or it's just cheaper and easier to do it directly just this one time, until it's everywhere.
I think people over-estimate the cost of doing this and lack experience with the benefits. I've been doing this for ~20 years now, and when comparing our code to that of a few customers that use databases in a more "chaotic" manner, it is easy to see how quickly it pays off.
>If you have ever tried to write a business application in PL/SQL
I've seen it work well; however, it requires a proficiency with SQL that most developers aren't required to have, so they don't invest in it.
It is much harder to debug but there is an elegance to the solution. It's often very compact and performs well.
It performs well because the context switch between the application and the database is costly in terms of performance. If there is logic to be performed per row, and that logic also requires data that's in the DB, then the performance can be much faster with PL/SQL or similar DB code.
That context switch is routing queries and results between the client and server using IO requests.
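A typical example (invented tables): applying a per-row price adjustment by pulling each row into the application means one round trip per row, while the same work stays inside the engine as a single set-based statement:

    -- One statement, no per-row round trips between client and server
    UPDATE order_lines ol
       SET discounted_price = ol.price * (1 - d.rate)
      FROM discounts d
     WHERE d.product_id = ol.product_id;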
Most databases allow you to write extensions in your language of choice. This can be a good middle ground: you write a domain-specific DB extension in the language of your application, and that way you can share logic between your database and your application layer.
I do that in Go for chemical stuff, I am pleased with this approach.
Which database are you using and what kind of extensions are we talking about?
(I often create Go types which are then serialized to some useful representation in the database, and de-serialized to Go types when I query the database. I usually use SQLX for most interaction with SQL databases since it does a lot of the tedious work for me (without introducing ORM'y madness). On the rare occasion where this eats too much into performance I can always use the DB library directly)
Interestingly, there's an unofficial Go procedural language for PostgreSQL:
https://gitlab.com/microo8/plgo
Haven't tried it out myself though.
SQLite, I register specific functions written in Go if I want to not load the stuff in memory.
The argument is not that you should use application code where SQL is more efficient. The argument is that the sql shouldn’t be in the database itself as stored procedures separate from your application code.
Only DB portability would be a concern for not using stored procedures, that I can think of. If you want to write an application that can use multiple DBs, then you might have a problem.
Even if you don’t use stored procedures, simple stuff like how you limit the return results is different between databases like SQL Server and MySQL as is how you do bulk inserts.
How you specify locks, the support data types, casting, intrinsic functions, conditional statements (case functions) are all different between databases.
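For example, even just limiting a result set differs, before you get to locking hints, types, or bulk loading:

    -- MySQL / PostgreSQL / SQLite
    SELECT * FROM orders ORDER BY id LIMIT 10;

    -- SQL Server
    SELECT TOP 10 * FROM orders ORDER BY id;

    -- Oracle 12c+ / standard SQL
    SELECT * FROM orders ORDER BY id FETCH FIRST 10 ROWS ONLY;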
ORMs can abstract most things, but not stored procedures.
I used to write a lot of PL/SQL and your assertion is simplistic at best, and misleading at worst.
First off it depends on what processing you are doing on what kinds of data sizes. If you are doing relatively simple things, sure. But remember that PL/SQL isn't particularly fast if you need to do anything more complicated. Don't do anything complicated (we'll get to that in point three).
Second, someone has to write that code. Writing PL/SQL is laborious because the tooling is bad. Pick any half-decent language today and the tooling is going to be fantastic. You have language servers, you have optimizing compilers, you have profilers, you have debuggers, you have lots of ways to instrument, inspect, and verify code. Where do you plan to recruit PL/SQL developers and how much are you going to pay them? Because in my experience: there really aren't that many usefully competent PL/SQL developers available on the market. There are lots of people willing to "give it a go", but competence is rare.
Third, it complicates your database. Instead of just doing relational math and executing queries it now also runs code that might not be as easy to model on the same cores, eating into your database IO and memory budget and causing lots of variability. Sure you can hound your PL/SQL developers to adopt various tricks in order to lower the variability of database performance, but that gets expensive. And the code gets ugly.
Fourth: It makes no sense to apply extra load to your most expensive component, which also happens to be the component class that most often also ends up being the bottleneck of your entire system. (Cloud provider pricing is a somewhat useful proxy for the cost of different systems. Have a look at the cost of RDS capacity compared to EC2)
Fifth: did you measure or did you assume? In the cases where you measured a clear gain, was it still worth the cost of all of the above? Did you measure the right thing? Could you argue your case with numbers?
My experience is that no matter how good you make the database logic, a problem will remain: the logic is in the database.
Want to do nice things to reason about your data changes using your code, or maybe show someone a preview of their changes? Sorry! All the interesting logic is in the database. You need to write your changeset to the database to figure out what’s really in it.
> it will tie you to a specific database.
There are a lot of reasons not to use stored procedures heavily, and I personally wouldn't even work for a company that did, but this is a relatively poor one.
This is like all of the people who think using Terraform gives them "cloud agnosticism." Even how you do limits and bulk SQL is different between the major databases.
Being database agnostic is so hard. I have an app that I test on Postgres, MariaDB and SQLite, I thought I was doing well. Nope, turns out it doesn't work with MySQL or MsSQL and would need a big schema change to support them (something about diamonds in the cascade relationships).
> 1) it will tie you to a specific database.
As a counterpoint, tying yourself to a specific database is the only way to actually use any database to its full potential. I'm a big fan of Postgres and use it in basically all my projects. But if I was forced to write my code in a way that I could easily swap Postgres for MySQL or Oracle at a moment's notice, I would lose much of the value that I feel Postgres offers.
Only if your architecture is CRUD.
Under CQRS/ES you are DBMS independent, and you can still use RDBMS for its full potential on the read/querying side.
How? Would you mind expanding a little bit?
CQRS simply means "replicate the data", while ES refers to event sourcing. When used together, CQRS/ES entails replicating data on various storage systems by replaying events from the Event Store. This replication can occur continuously on any RDBMS, and you can utilize all of its features, including vendor-specific ones, for querying.
In this setup, your application houses domain-specific code, also known as the "core domain" in DDD terms, while generic code exists in generic tools like RDBMS, called the "generic sub-domain."
For example, report generation is in generic subdomain, and RDBMSes are great at that.
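A small sketch of the read side (hypothetical event and projection names): the RDBMS only ever sees plain tables built by replaying events, so vendor-specific reporting features are fair game there:

    -- Events are appended by the write side (the event store)
    CREATE TABLE order_events (
      id          BIGSERIAL PRIMARY KEY,
      order_id    BIGINT NOT NULL,
      event_type  TEXT   NOT NULL,
      payload     JSONB  NOT NULL,
      occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );

    -- Read-side projection, rebuilt or refreshed by replaying events
    CREATE MATERIALIZED VIEW orders_per_day AS
      SELECT occurred_at::date AS day, count(*) AS orders_placed
        FROM order_events
       WHERE event_type = 'OrderPlaced'
       GROUP BY occurred_at::date;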
In all my career (I started coding in 1986) I keep hearing about 1); I have worked on several RDBMS-agnostic products and never saw a customer actually migrating to another RDBMS.
PL/SQL is just an Ada dialect with a bit of SQL on top.
> In all my career (I started coding in 1986) I keep hearing about 1); I have worked on several RDBMS-agnostic products and never saw a customer actually migrating to another RDBMS.
no kidding, you hear about this way more than it actually comes up. especially for the folks who treat the RDBMS as a dumb store and don't use the features.
I've seen apps replaced/rewritten on top of a database multiple times. I've seen it be very convenient to have enough logic in the database that it was easy and low-risk to have multiple programs share a database, without sharing much or any code.
I've never actually seen a database system get swapped out, in real life. The only two ways I can imagine it happening are: 1) someone made a colossal fuck-up picking the DB software—and, it'd have to be really bad, I've seen a lot of applications stick with a mistake there and just deal with the pain; or 2) extreme hockey-stick growth (which, file under "good problem to have, and we can afford a proper migration")
In fact, #2 is usually addressable with supplementary databases, a little re-structuring of the existing database, and smarter caching, except in super-rare scenarios of truly wild growth or unusual access patterns. More often than not, even then, the original database sticks around in some form or another.
> I've never actually seen a database system get swapped out, in real life
https://aws.amazon.com/blogs/aws/migration-complete-amazons-...
Shows how different the software ecosystem is. For a product that I worked on - during a decade, the DB system got swapped out 4 times - all driven by customer demand. The app software evolved continuously but the DB was hard changed - with all relevant schema and data migration applied during the upgrade process.
The phrase "to have multiple programs share a database" says a lot about what school of thought one belongs to in terms of architecture.
Using the database as the integration point went out of fashion a bit over 20 years ago. And for good reason. It is not a good integration pattern. It makes the implementation the integration interface rather than using a tightly controlled abstraction that can be kept narrow and preserve degrees of freedom for future evolution. And as you allude to: it gets worse because you have to move more application knowledge and logic into the database to make sure applications don't mess it up.
In fact, this approach is so bad that Amazon's CEO explicitly forbade it at Amazon (Bezos API Mandate, 2002). How often do you see CEOs, even of tech companies, find an integration pattern so counterproductive and dangerous they announce to the company that they will fire anyone using it?
I would say that about 60-70% of projects I've worked on in the last decade have switched databases at least once. One project had support for 5-6 databases and used three different databases in production over a span of 5 years. Adding support for them had fairly low cost (because the internal design made it easy) and it gave us the opportunity to try several things in production, at scale, and then choose what produced the best results. This is a lot better than just choosing something up front and sticking with it because you'll have the data to make an informed choice.
OTOH one project I worked on switched between Oracle and MS SQL about once every six weeks (fickle client). We got pretty good at it eventually.
> 2) languages such as PL/SQL are much poorer than any modern programming language
True, but it is that way for a reason. For example, you can replace or upgrade any piece of code on the running system transactionally. In a modern programming language, say Java, that is not possible.
If PL/SQL were more like a "modern programming language", it would lose a lot of its perks.
Generally speaking, PL/SQL is not much different from DDL like "create table ..."; both are code describing "business logic".
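In Postgres, for instance, DDL is transactional, so a function body can be swapped on a live system inside a transaction. A minimal sketch, assuming a hypothetical apply_discount function:

    BEGIN;
    -- Replace the implementation atomically; other sessions keep seeing the
    -- old version until this commits.
    CREATE OR REPLACE FUNCTION apply_discount(price numeric, pct numeric)
    RETURNS numeric
    LANGUAGE sql
    AS $$ SELECT price * (1 - pct / 100) $$;
    COMMIT;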
I followed this approach once and ended up abandoning a week's worth of work because it was just far too slow (hours vs < 1 minute) when running on large quantities of data.
>This leaves more room for error and makes our logic harder to test since it can’t be easily mocked or executed with the rest of our application-level tests.
I ended up writing integration tests on the SQL queries. They were slower than the unit tests I had before but 1.5 seconds per test vs. 0.1 second per test was a price I was pretty happy to pay.
In general I would prefer to keep domain logic all in one place but practicality trumps ideology.
Yeah, I'm having trouble understanding this. This sentence is very confusing to me:
> While this gives you more flexibility, the tricky detail is that filtering happens only after your data is fetched. This is not different from fetching the results and filtering them inside your application. It just saves you some keystrokes.
> It just saves you some keystrokes.
No, it saves transferring tens of gigabytes to my application. I assume I'm missing something here, or maybe my data is different.
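To make the difference concrete, a sketch against a hypothetical orders table: the second form lets the database use an index and ship only the matching rows, instead of streaming everything to the application and filtering there.

    -- Hypothetical table and index; names are illustrative.
    CREATE INDEX IF NOT EXISTS idx_orders_status ON orders (status, created_at);

    -- "Filter in the application" equivalent: every row crosses the network.
    -- SELECT id, customer_id, total FROM orders;

    -- Filter in the database: only matching rows leave the server.
    SELECT id, customer_id, total
    FROM orders
    WHERE status = 'open'
      AND created_at >= now() - interval '30 days';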
Agree. This was written by someone writing code so inefficient, I would never want them to be near my code.
That's a valid viewpoint, but I've also seen people be very successful with SQL triggers, stored procedures and all of that.
Before it was fashionable to build out "service layers" inside a back end application or as an API I've seen people build the equivalent by making all database accesses in an application go through a set of stored procedures that, basically, provide an API.
It can work very well.
There is also the idea, less well developed, from the RDF world where you could have some system of inference that works like triggers but is based on logic and is potentially more disciplined. Let's say it hasn't caught on and I remember drinking in a hotel bar with the marketing head of one vendor, after a conference, who was frustrated that people didn't see the value.
Once upon a time, I administered a system that was built as a client application that just spoke SQL, plus a set of administrative tools running as the db superuser to create the db and its triggers. Permissions were handled entirely internally through functions; user management was just database user management. Most simple operations happened on views. More complex operations were functions.
It worked pretty well. The worst part was Microsoft licensing, as we needed a SQL Server client license for every connected user. The second worst was the upgrade path: shutting the db server down so that clients couldn't connect but the super admin still could to run the updates was tricky and failure-prone, mostly because of a lack of tooling.
Today, with features like row-level permissions and NOTIFY in Postgres, I think you could do great things. That said, I mostly recommend the article's advice to my customers: the database is a database, put logic in your application.
Needing a CAL just means the company was too cheap to license per core :D
MSSQL licensing is just all kinds of nuts. As far as I know, the official stance is that anyone accessing data from the DB (not just the DB itself) needs a CAL. This means that with CAL licensing, every user of a web app needs one. I'd feel sorry for people that have to deal with this if it wasn't 90% self-inflicted pain.
Microsoft introduced per-processor licensing for SQL Server more than two decades ago to handle public web apps.
So don't use the database for what it is intended because...it might be harder? I very much disagree with this article. It is not so simple to just declare no business logic in the DB when the DB schema itself IS a BIG part of the business logic. When working with data, the best place to work with the data is where it is stored, not pulled out and put back in. And your app is more likely to be replaced long before the data and/or database is. Stop treating the database like a black box and use it for what it is intended.
>So don't use the database for what it is intended because
The database is intended to store and retrieve data.
You missed the word "maintain", and it is a very important point. If I can only maintain the data through an app and not directly in the database, then the database is nothing more than a card file. Why not use the features the database was intended to provide? The app will be gone long before the data. Always. The next app will need to reproduce the previous app's logic. What if that was already done? It is better to have a mix, used judiciously, than to consider the database only for storing and retrieving data. What a waste of resources not to use those things!
Data integrity is maintained through FK, unique indexes, proper modeling, etc.
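For instance (a sketch with made-up table names), much of that integrity is declarative and needs no procedural code at all:

    CREATE TABLE customers (
        id    bigserial PRIMARY KEY,
        email text NOT NULL UNIQUE               -- unique index: one account per email
    );

    CREATE TABLE orders (
        id          bigserial PRIMARY KEY,
        customer_id bigint  NOT NULL REFERENCES customers (id),  -- FK: no orphaned orders
        quantity    integer NOT NULL CHECK (quantity > 0),       -- CHECK: basic domain rule
        status      text    NOT NULL DEFAULT 'open'
    );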
Also, an RDBMS is hard to scale horizontally, so the more CPU tasks you put on it, the sooner you will hit a bottleneck that is not easy to get out of. An application layer, if designed properly, can scale a lot more.
>The app will be gone long before the data.
When this happens at all, it happens rarely, and in my experience the data enters a read-only state during sunsetting. Most of the time, if a new system is required, a migration occurs to a new schema that the new app supports.
We use a mix of SQL and NoSQL databases for different and specific purposes. Your suggestion is to do the data manipulation outside of the database, which typically means pulling data sets out, doing something with them, and then putting them back in. That will never be more efficient than just doing it in the database. Our SQL databases are actually scaled horizontally using read replicas, since writes are much less frequent than reads and we can easily add/remove instances as demand grows and shrinks. IMO, it wasn't any more difficult to scale out our databases than it would have been to scale any other K8s workload (application or DB).
> Your suggestion is to do the data manipulation outside of the database which typically means pulling data sets out, doing something with them, and then putting them back in.
It doesn't mean that at all. You can still leverage SQL to get amazing performance for queries and mutations across large datasets without burying that in a stored procedure. Yes it's generally a bad idea to pull thousands of records across the network and into memory to manipulate them before putting them back. But that's a false dichotomy.
I think the problem here is one of total disregard of any performance issues.
To me these are the kind of people who only write SQL through an ORM, suffer constantly from the N+1 problem, and say that RDBMSes are slow compared to key/value stores.
I wonder how good ChatGPT can become at SQL and whether it can replace these people xD
On the other hand, there are good reasons to put logic in the database, chiefly:
- Performance
- Consistency
- Security
Basically, you want to construct a public API in the form of stored procedures, functions and views, and then funnel all clients through that API, by granting execute/select permissions only on the artifacts which are part of the public API.
That can be good for performance not just because computation is close to data so you don’t pay the communication cost, but also because it opens optimization opportunities that might be impractical on the client, e.g. by returning several interdependent resultsets at once, or by cleverly using temporary tables or cursors.
It also ensures that a rogue client can’t circumvent the consistency rules implemented by your API. If you care about your data, this is a big one!
And finally, the API may be a natural place for performing authorization in case your security needs cannot be readily met by the database’s built-in security.
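A minimal sketch of that pattern in Postgres, with hypothetical names: the application role gets execute rights on the API function but no direct access to the underlying table, so the consistency rule cannot be bypassed.

    -- The table itself is not accessible to application roles.
    REVOKE ALL ON accounts FROM app_role;

    -- The "public API": a function that enforces the consistency rule.
    CREATE OR REPLACE FUNCTION withdraw(acct bigint, amount numeric)
    RETURNS numeric
    LANGUAGE plpgsql
    SECURITY DEFINER            -- runs with the owner's rights, not the caller's
    AS $$
    DECLARE new_balance numeric;
    BEGIN
        UPDATE accounts
        SET balance = balance - amount
        WHERE id = acct AND balance >= amount
        RETURNING balance INTO new_balance;
        IF new_balance IS NULL THEN
            RAISE EXCEPTION 'insufficient funds or unknown account %', acct;
        END IF;
        RETURN new_balance;
    END $$;

    GRANT EXECUTE ON FUNCTION withdraw(bigint, numeric) TO app_role;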
—
Perhaps the biggest argument against it is the fact that most SQL-based languages are firmly stuck in the 80s. For the most part, they're just a pain to work with. But enduring that pain might very well be worth the end result.
"Performance"
Performance is my primary reason for using stored procedures. The thing that almost every high-performance app eventually runs up against is sheer query quantity. No matter how good you are with doing set operations rather than operating on one value at a time, the time inevitably comes when you need to do a query from your app, do something to the result, and generate the next query, and repeat this for an arbitrary number of times. If you can move that process to the database, you can get performance wins so massive your peers won't believe such wins were even possible.
"Perhaps the biggest argument against it is the fact that most SQL-based languages are firmly stuck in the 80s."
Postgres is chipping away at this at least. It's still a mess, but every time I leave for a couple of years and come back, the game has been upped. I don't know exactly how long it's been in there, but as a for-instance, Postgres stored procedures have print debugging now, which I described in a Go context here but should apply to any other language that has a good Postgres-specific driver: https://www.reddit.com/r/golang/comments/12k6c6m/persistent_... That's just a night & day change. I can live with print-debugging even if I don't get a true debugger, but working without even print debugging is a little too 1960s for my tastes. There's also been a lot of improvement in being able to intermix arrays, tables, and other such things with each other, which used to be a major hassle, though it's still pretty chaotic in terms of naming schemes and such.
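The print debugging mentioned above is essentially RAISE NOTICE in plpgsql; a trivial sketch, with a made-up function name:

    CREATE OR REPLACE FUNCTION debug_example(n integer)
    RETURNS integer
    LANGUAGE plpgsql
    AS $$
    BEGIN
        -- Shows up on the client as a notice (and, depending on settings, in the
        -- server log); a driver with notice support can surface it to the app.
        RAISE NOTICE 'debug_example called with n = %', n;
        RETURN n * 2;
    END $$;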
Can you test any of this logic?
Testing code is exactly where I'm doing the print debugging from. I'm sure not doing it on production unless I have no choice!
I don't take a dogmatic view of testing. There is no requirement that test code in X must be run in a test harness in X. There is no problem testing my database procedures with code being driven from my primary app language. What matters is that cases are covered, and that the desired outcomes are verified. My database code is perfectly well covered with testing. If yours isn't, that's a process problem on your end and is your problem, not mine.
I wasn't accusing you of anything, I was genuinely asking. Do you mean that your testing logic is done in the code? I don't really know what that has to do with print debugging. Are you reading output from the database and running assertions on that?
For what it’s worth, we implement tests in the client language (C# in our case) and just call the stored procedures/views (in SQL Server) as any client would. Most tests are written so that they rollback after execution, and to tolerate any residual data that might have been left after the tests that must commit. A very small number of tests require a clean database, which we accomplish by traversing INFORMATION_SCHEMA and just emptying all the tables (we also temporarily disable constraints and triggers).
These tests are just normal “unit” tests from the perspective of Visual Studio, which brings one enormous benefit: you can put a breakpoint in T-SQL code and just right-click on the test and choose Debug, and the breakpoint will be hit! That’s possible because Visual Studio debugger can switch from C# to T-SQL seamlessly.
This is a fantastic way to debug in exactly the right test case which exposes the bug or behavior that you want to test.
Very cool, thanks for sharing
> I can live with print-debugging even if I don't get a true debugger
There’s been a plpgsql debugger built into pgadmin for about a decade, FYI.
I agree that SQL and flavors of PL/SQL have a 1980s vibe attached to them (well, they were born in those times!), but even small adjustments can make them feel better: for example, dropping the TYPING IN CAPS tradition.
Queries like `select this from that`, in my opinion, feel different than `SELECT this FROM that`, and we can often use color to distinguish keywords from names.
---
More importantly, performance seems much better if you don't have to have a middleman between your data and the user. I wrote a bit about this earlier this year https://yrashk.com/blog/2023/02/16/what-happens-if-you-put-h...
I am also very excited about what you can do with row and column-level security in databases like Postgres.
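A quick sketch of Postgres row-level security, with hypothetical table and setting names; each session only sees its own tenant's rows:

    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

    -- Policy keyed off a session setting the application sets per connection.
    CREATE POLICY tenant_isolation ON documents
        USING (tenant_id = current_setting('app.tenant_id')::bigint);

    -- Application side, once per session or transaction:
    -- SET app.tenant_id = '42';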
If you want Dont-Repeat-Yourself (DRY), you have to decide to put the domain logic either in the database or in a service*. In the first case, two applications that directly talk to the database is feasible since the common logic is implemented at the database level.
In the latter case, you need to ensure that every application uses the service and no application directly talks to the database. If multiple apps directly talk to the database where the domain logic is in the service layer, you will end up with code duplication and hard to trace bugs (unless you take extraordinary steps).
It is an architectural trade-off. There are use cases where you may choose one over the other.
*: service or library
>> If multiple apps directly talk to the database where the domain logic is in the service layer, you will end up with code duplication and hard to trace bugs
This is more common than putting it in a service in my experience. I haven't seen it done in the database in a new application since the 90's.
In the DotCom era it wasn't unusual for the entire business to be implemented inside an Oracle database running on a Sun server (including the web server ;) How fun was it to debug such a system?
Persistence Ignorance [1], also known as PI, is a principle that states that the classes used to model a software application's business domain should not be affected by the way they are stored.
This principle applies to traditional application stacks, except for certain Big Data stacks such as kdb+/q, or situations where optimizing the data layer for performance and integrity is necessary.
For instance, when using kdb+/q, a separate CI pipeline is utilized alongside the main application's CI pipeline, including separate tests, linters, code or query formatters, etc.
When it is necessary to use such systems, the best approach is to isolate them as a separate bounded context, such as a microservice.
--
[1] https://deviq.com/principles/persistence-ignorance
Another one of the famous database holy wars, "don't use stored procedures for business logic", and at the moment HN rises to the occasion with a ratio of 28 points / 60 comments.
I think at this point issues like these are pretty much baked in already. The vast majority of applications don't use stored procedures, because they:
a. are extremely vendor specific;
b. are written imperatively, rather than declaratively, in vendor-specific languages that, let's face it, are about as user friendly as REXX (Google it, kids), which makes them very difficult to be expressive with;
c. are not very straightforward to keep versioned in source control (of course they can be versioned, but the pipeline from db environment -> source control typically has to be pretty custom, plus you have to get your DBA to use it);
d. introduce all kinds of novel problems in integrating with application-level constructs such as database drivers, SQL builders, and dare I say ORMs. While these are all potentially solvable areas (yes, including ORMs), they are not being solved, because the folks who swear by stored procedures pretty much despise all those other things.
So in some ways the way the stored procedure community is so opposed to application level constructs is kind of what keeps the community isolated, and in some cases, renders what might be useful technologies as completely unused (where I am referring to MySQL stored procedures, which...exist! But I wouldn't dare ever try to use them because who wants to be first, really).
MySQL stored procedures work fine when invoked from application code IME. The lack of native collection types is not ideal when you need to inject N values into a bit of data logic. As such, and for other reasons, I personally prefer raw parameterized SQL passed through a lightweight ORM that handles mapping for me and securely marshals a collection value into a parameterized query. But beyond that I'd say they are "usable".
Can you elaborate on the challenges you've faced with them?
for MySQL? only that they do not seem to be commonly used at all, within the already small set of modern applications that scale out on stored procedures successfully. MySQL's base of maturity is the PHP application that is using straight SQL.
I'm not an SP guy, so while an SP app on a platform with lots of widespread use and maturity for that style of programming, like Oracle or SQL Server, is already unpleasant for me, at least I'd know I was on well-trod ground. Doing it on MySQL, where any issues I hit would have very little precedent, workarounds, or community, is something I'd not want to get involved with for anything important.
Fair enough. To reiterate, I'm personally not a fan, but I would say they are generally usable.
One of the egregious examples of this is Oracle's Application Express technology - write APIs and screens in PL/SQL. It's such a horrifyingly, nakedly cynical lock-in strategy that it took me a while to come to terms with it when I first came across it.
I think the problem here is NoSQL; with an RDBMS, data integrity wouldn't be a problem, which is why I'd prefer it over NoSQL. While it can be easier to start prototyping without a defined structure, I'm more comfortable with the RDBMS.
Last year I tried a lot of those new document-oriented databases and, for me, they are awful: you can't be sure what's in each record, so I have to be more careful in my application logic not to mess things up. With an RDBMS you can easily define your structure and constraints and be confident that your data is always correct; with a document store, adding new fields is a nightmare without migrations.
This year I started thinking that we might need a database that lets us write proper stored procedures in a way that feels like programming (PL/SQL feels pretty hard, and it seems that not many devs are using it). For example, what I wanted was a trigger so that when a user buys something, it subtracts the quantity from the inventory and creates new entries in notifications and orders. That way my app just needs to check whether there are unsent notifications and show the user their orders, without that logic living in my app, and it feels easy to just edit those procedures with migrations.
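That kind of trigger is doable in plain Postgres today. A minimal sketch, with made-up table and column names:

    CREATE OR REPLACE FUNCTION on_purchase()
    RETURNS trigger
    LANGUAGE plpgsql
    AS $$
    BEGIN
        -- Decrement stock for the purchased item.
        UPDATE inventory
        SET quantity = quantity - NEW.quantity
        WHERE product_id = NEW.product_id;

        -- Queue a notification the app only has to read and send.
        INSERT INTO notifications (user_id, kind, created_at)
        VALUES (NEW.user_id, 'order_placed', now());

        RETURN NEW;
    END $$;

    -- EXECUTE FUNCTION is the Postgres 11+ spelling.
    CREATE TRIGGER purchases_after_insert
        AFTER INSERT ON purchases
        FOR EACH ROW EXECUTE FUNCTION on_purchase();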
I think there's a distinction to be drawn about logic being stored within the DBMS and logic being implemented and executed directly in the DB.
Actually storing logic in the DBMS is often challenging unless you want to put a lot of effort in - source control tools and abilities to deal with multiple versions of logic are far less mature in the DBMS world, and while it's possible to make these work, you'll be doing a lot on your own to come up with something bespoke.
But there's no reason you can't leverage the DB more directly from your backend code. The author's example code is already inefficient with a single row and would be a nightmare for multiple rows. It's perfectly reasonable to take code like that and have it directly execute SQL statements, where a lot of the logic is expressed as SQL but still colocated with your application code rather than in stored procedures, triggers, etc.
We do use an ORM for most of our DB interactions, but if something starts to stretch beyond the most basic of use cases, we're unafraid to drop to raw SQL and execute that instead. It's been a pretty happy medium for us.
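As a sketch of what that style looks like, the "logic in SQL, called from the application" approach is often just a single set-based statement, sent from app code, instead of a row-at-a-time loop (table and column names hypothetical):

    -- One statement, executed from application code, instead of fetching rows,
    -- looping over them in memory, and writing them back one by one.
    UPDATE subscriptions s
    SET status = 'expired'
    FROM customers c
    WHERE c.id = s.customer_id
      AND c.is_trial
      AND s.ends_at < now()
      AND s.status = 'active';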
Right now I'm more on the opposite side: put all the logic in the database (Postgres is craaazy powerful these days), then use a commoditised backend (we use Hasura, but there is a bunch of good options out there).
I like this viewpoint when you are a small company searching for PMF and your entire backend can fit into a small DB.
At some point, once that stage is over, you'll have to reckon with IO and storage costs, and then you have no choice but to exploit data locality more heavily.
Just saying, YMMV depending on how much data is in your database.
Counterpoint: think of your database server as any other microservice, with an API. Putting a limited amount of logic there can vastly increase (not decrease) the overall application performance. One can also use stored procedures and views to allow the model to change while maintaining backwards compatibility with other services.
Alternatively, don't use SQL at all, and use a NoSQL database where you can't put any logic at all. If you aren't going to use the power of the database, then don't "pay" for it.
I've had a fair amount of low-effort "big wins" in my career, simply by reversing the technical decisions/implementations of people who took this idea too far. IMO the OP sits at a reasonable level on this particular "spectrum". The title is more controversial-sounding than the post, which itself is quite reasonable.
The implications of being tightly coupled to the (transactional) database are less onerous in a microservices environment where databases are "private" to a single application/service.
I have mixed feelings about using stored procedures and triggers. I was once working on a Postgres app and started storing some logic in the DB. I had some triggers that would automatically set values when a user state changed (e.g: when the user changed to 'inactive', the trigger would also update the tables related to subscriptions, API keys, etc).
As another "performance" trick, I was using multiple CTEs with the `RETURNING` clause to execute multiple operations in a single query.
Everything was OK when I was working on that app daily. But then I stopped working on it for a few months, and when I came back, I regretted using those tricks. For example, now I need to verify the triggers to make sure that changing a value won't change other tables that I forgot about. Also, I can't compose the SQL queries I wrote because each query does "everything at once". I would have rather paid the cost of doing 3 queries, and in exchange I could have reused some of those queries in different parts of the application [^1].
Of course, the app didn't even get close to the scale at which 1 query vs. 3 queries matter.
I still appreciate and like having some business logic in the DB, especially `CHECK` constraints. But the tooling for regular programming languages makes everything easier. Having the logic in the DB is a double-edged sword.
[^1]: This can become relevant when building an admin interface/CLI, since you may want to execute partial changes vs. the "everything at once" changes in the user-facing application.
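For readers who haven't seen the multi-CTE `RETURNING` trick mentioned above, it looks roughly like this (hypothetical tables). It's compact, and in Postgres every data-modifying CTE runs exactly once even if nothing reads from it, but as noted, the individual pieces can no longer be reused on their own:

    WITH deactivated AS (
        UPDATE users
        SET state = 'inactive'
        WHERE id = 123
        RETURNING id
    ),
    revoked AS (
        -- Runs even though the final statement never references it.
        UPDATE api_keys
        SET revoked_at = now()
        WHERE user_id IN (SELECT id FROM deactivated)
        RETURNING user_id
    )
    UPDATE subscriptions
    SET status = 'cancelled'
    WHERE user_id IN (SELECT id FROM deactivated);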
This is hardly a new opinion. It's also an opinion I've come to completely disagree with.
"Completely" as in you'd put ALL your business logic in the db, or that you're open to pushing SOME to it? Most?
EDIT: It's a genuine question. I use views and write a decent amount of business logic as queries (well, in my language's query DSL). It's always bugged me when people do a lot of post-processing for stuff that the DB could do faster.
There's perhaps an interesting third option aside from full stored procedures vs. pure application code transforming data pulled down from the database: the processing and transformation is still handled with SQL, but this code and the interaction with the database reside in application code. So instead of triggers and stored procedures, you have database transactions defined in and called from application code.
I am with you 100%.
SQL-style languages like plpgsql are crazy fast and super concise relative to the hurdles of doing it all in a host language.
I’m building a moderate sized application right now - and even though I consider myself a Go programmer, 100% of my code is either Svelte, Typescript or pl/pgsql + postgraphile. Not a line of Go to be seen.
I mean, the day will come when I need to write some Go code. But I’m so much more productive with plpgsql when it comes to anything within a single db.
Being able to implement a lot of business logic in SQL and pl/pgsql is another of the reasons I started developing Omnigres [1]. My initial motivation was to have something almost self-contained, just one thing to deploy infrastructure-wise.
But the more time I spend on it, the more I appreciate how having a single place for all the concerns significantly contributes to simplicity.
Surely there are some hurdles to overcome. Some are more technical in nature. Some are more on the side of tooling inadequacy/outdatedness. I am trying to contribute to both with the project.
[1] https://omnigr.es/
Omnigres looks really intriguing! I will check it out when I have more time.
I think all logic should be in the db, but I think we just don't have the right databases.
In my ideal world, my database would know about all the queries I want to run, and it would choose query plans based on this knowledge to effectively cache things, and handle streaming changes to queries.
You also want to be able to visualize the full data dependency graph of all your data, i.e. when this value changes, what else changes.
We need to get rid of all these poor-fitting abstractions between the application and the database (REST, GraphQL, etc.). Your client code should work with a model exactly the same as how it's stored in the database. In fact, your entire database should be able to run client-side in the browser. This is what is holding us back. This would allow optimistic updates and easy local/offline apps. I think people usually avoid this because they like to choose a different language for their backend, and the thought of getting this to run in a browser is frightening.
Just think how much easier your life would be if you had direct database access in your browser from your client code, and you didn't have to worry about apis, orms, etc. and that it was secure, and synced automatically.
And now think about your current application you work on, and the difficulty in achieving such a thing.
(Disclaimer: I’m plugging my own work here ;-))
You might enjoy this project, which tries to do basically exactly what you described: stick everything in a database and let it drive the app: https://riffle.systems/essays/prelude/
It’s still very much a research prototype but we should have some more writing out soon.
Good essay. Particularly with web development, there is complexity around bridging the gap between server and client, and data crosses this chasm through serialization which exacerbates the problem and limits expressiveness of server languages, requiring massive duplication of code simply to serialize and duplicate on client what is present on server if one wants all the power of client interactivity and API's.
This is a big part of the value of recent server-centric frameworks like Phoenix LiveView, which provide the ability to have code and data co-located and avoid duplicating so much between client and server as with SPAs, while attempting to maintain some base level of client interactivity. But there always seems to be a tension between leveraging the full power of the client and the full power of the server.
You might find this article [0][1] informative. It disputes the idea that UI's are "pure functions of the data/model" in a compelling way, and points to this incorrect assumption as having introduced some complexity/pain in how frameworks like React work.
[0] https://blog.metaobject.com/2018/12/uis-are-not-pure-functio...
[1] https://news.ycombinator.com/item?id=31979347
That article was a great read.
> What is a (somewhat) pure mapping from the model is the data that is displayed in the UI, but not the entire UI.
This line was interesting to think about.
React added so much complexity and perf issues, and I really don’t know what was so bad about something like Backbone / Backbone.Marionette.
I find that in React I want all component props and data to come from an external store via subscription, instead of being passed down some tree. My UI is stable.
Certain operations use fewer resources when performed in the RDBMS than when performed in the application code.
As your data grows significantly, you may encounter new bottlenecks, and for some of these you may need to move some of the logic from your application into the RDBMS (ergo: write an SQL function or a stored procedure or a trigger, etc.).
I work at one of those "everything must be a stored procedure" companies. We are stuck with a single RDBMS vendor and the company is perfectly OK with this.
Different developers have different skill levels and specialties but they know how to call stored procedures.
We have DB admins and a team of SQL specialists who optimize things and make sure everything meets standards.
Stored procedure code is source controlled just like other application code, just in different repos.
No there aren’t unit tests on this, but integration tests at the application level provide some coverage.
I realize there are downsides but they are generally acceptable. We aren’t really putting much if any domain logic in these stored procedures.
https://tsqlt.org/
Not making an argument for, or against stored procedures, but just putting this link here as seeing quite a few comments regarding difficulty testing stored procs.
It was a fair number of years ago, working on an enterprise data warehouse, but the linked testing library was a bit of a revelation when testing SQL Server stored procs. It basically made unit testing stored procs straightforward and natural.
Haven't used it since, but hopefully helpful for someone, and I do wonder if there are similar testing frameworks for other DBs?
I have worked on a system that was run by customers that were reluctant to make upgrades to their installations. We wanted to be able to evolve the database schema more frequently than we could roll out fleet-wide application upgrades.
We could have created a sort of proxy service to handle backwards compatibility between the database and applications, but since these applications needed only a very small number of endpoints, it was much, much easier on operations and maintenance to use the database itself as the compatibility layer through stored procedures.
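A small sketch of using the database itself as the compatibility layer: the underlying schema can change while older clients keep calling the same view or function (all names here are illustrative, not the actual system described above).

    -- The v2 schema split "name" into two columns, but old clients still read the v1 shape.
    CREATE OR REPLACE VIEW customers_v1 AS
    SELECT id, first_name || ' ' || last_name AS name, email
    FROM customers;

    -- Writes from old clients go through a stable function signature.
    CREATE OR REPLACE FUNCTION create_customer_v1(p_name text, p_email text)
    RETURNS bigint
    LANGUAGE sql
    AS $$
        INSERT INTO customers (first_name, last_name, email)
        VALUES (split_part(p_name, ' ', 1), split_part(p_name, ' ', 2), p_email)
        RETURNING id;
    $$;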
IMO it's best to start as clean as possible and migrate logic into the database over time as performance or security needs arise. Same with clean code: start with it, and if there are performance issues, start adding shortcuts and conditions that might improve performance but hurt readability and require many more comments.
This is a major pain point moving customers from Oracle to cloud services. PL/SQL encapsulates a lot of logic in legacy enterprise applications that handcuffs customers to the platform similarly to mainframe systems. Migration tools can get part of the way there but still requires a strangler type pattern when migrating.
The mismatch between code fluidity and the rigidity of the RDBMS is why we created https://schemafreesql.com . We of course looked at NoSQL, but our fondness for SQL kept us from considering it.
Until your domain logic runs 1000x slower than it needs to and it actually becomes a problem.
The artificial division between the "database" and "real programming" is the original sin behind everything wrong with modern programming.
Taking this approach to using a database is always a judgement call, but putting logic in the database is probably useful a lot more often than the "nine times out of ten" that the article claims. And the tradeoff in this approach contradicts the last line in the article: "Utilize the full extent of your database’s capabilities, but don’t put domain logic in it." Putting logic in your database is often necessary to make full use of its capabilities.
In applications with complex storage needs and/or performance-sensitive IO needs, I prefer to think about databases as an additional tier that provides complex storage services (as is precisely the case with most modern databases). As an example, I once worked on message routing and transformation server. From the perspective of the application logic, messages simply needed to be durably persisted and retrieved. At the level of the storage layer, I wanted message versioning, provenance, copy-on-write semantics to minimize on-disk size, and indexes to support at least two different retrieval patterns. All of the items at the storage layer were implemented in stored procedures. In other words, the database supplied a custom "storage API" to the business logic tier for persisting and retrieving messages that implemented those routines in a database-specific way that did not concern developers at the application layer.
The main arguments against this approach are:
* It will tie you to a specific database. This is true, but almost irrelevant since anything other than the most trivial usage of a database will inevitably make use of a feature or syntax specific to the database that would require modification were one to migrate to another database. Further, the mere idea of database independence is kinda' silly. No one talks about how writing your application layer in one particular programming language will limit the ability to migrate to another programming language. We should make technology choices around programming languages, databases, etc. with the intention of matching their strengths to the problem at hand fully understanding that tradeoffs are being made and that rework will be necessary if those choices are ever revisited.
* SQL used for writing stored procedures is not as good (for some definition of "good") as other programming languages used in the application layer. SQL is certainly different than most imperative or functional programming languages, but it's expressive and well-suited for its purpose. If you really need to make use of the capabilities of a database, it would behoove you to develop some proficiency in SQL.
* Putting logic in the database breaks modern CI/CD processes. IMO, this is the most compelling argument against it. That said, there is tooling that exists for putting stored procedure and other database logic in version control, automatically deploying it to a database, and running tests on it. These tools are not as commonly used, but they do exist. I've also used tooling that introspected the database objects such as stored procedures, etc., and automatically generated type-safe application code to interact with those database objects. That provided compile-time guarantees that application and database code were in sync at least with respect to number and types of arguments, etc. Whether it makes sense to go to the effort of integrating this tooling into your development process is a judgement call, but it can be done and I've seen it work well for application with demanding database needs.
Nope, don't waste network bandwidth and client CPU on data that shouldn't have left the database in the first place.
I read that the first version of cloudflare was almost entirely a postgres extension.
https://news.ycombinator.com/item?id=22883548