RAM Is the New Disk

162 points by nikhilgarg28 8 years ago

Well yes, I think RAM has been the new disk for a while now, and not because (anecdote about database disk structures) or (any recent change to cost of RAM).

If you use Linux, the fastest way to test how much faster your application is off disk is to simply make a filesystem in RAM, and run the whole thing from there. Because library-chasing to build a chroot is a hassle, I would recommend simply putting a container on a RAM-backed block device, then installing your application on the container.
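A rough sketch of that kind of test, without building a container: write the same scratch file to a RAM-backed directory and a disk-backed one, then time reading it back. This assumes /dev/shm is a tmpfs mount (common on Linux, but an assumption here); the function names are made up for illustration. Note that a file you just wrote will usually be served from the page cache, so the "disk" number can look close to the RAM number unless caches are dropped first.

```python
import os
import tempfile
import time


def time_read(directory, size_bytes=8 * 1024 * 1024, repeats=3):
    """Write a scratch file into `directory`; return the best read time in seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size_bytes))
            f.flush()
            os.fsync(f.fileno())  # push the data to the backing store
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            with open(path, "rb") as f:
                while f.read(1024 * 1024):  # read in 1 MiB chunks until EOF
                    pass
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        os.remove(path)


def compare(ram_dir="/dev/shm", disk_dir=None):
    """Return {'ram': seconds, 'disk': seconds} for the two directories."""
    disk_dir = disk_dir or tempfile.gettempdir()
    # Assumption check: fall back to the disk directory if there is no
    # writable tmpfs mount at ram_dir (e.g. on non-Linux systems).
    if not (os.path.isdir(ram_dir) and os.access(ram_dir, os.W_OK)):
        ram_dir = disk_dir
    return {"ram": time_read(ram_dir), "disk": time_read(disk_dir)}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1000:.2f} ms")
```

On some distributions /tmp is itself a tmpfs, in which case pass an explicit disk_dir on a real filesystem.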

I have personally designed, built and managed large clusters of diskless machines and find that the mix of RAM-only and PXE[1] boot is an excellent one for maintaining state (and security) across well managed infrastructure. Disks be damned. For permanent storage, consider sharing a DRBD[2] cluster from dedicated nodes.

[1] https://en.wikipedia.org/wiki/Preboot_Execution_Environment

[2] https://en.wikipedia.org/wiki/DRBD

old-gregg 8 years ago

I don't recommend this anymore. With a typical developer machine containing 16GB of RAM, and especially on Linux, you will find that all of your daily-touched files are in the FS cache after a few minutes of work. Even with default kernel settings, Linux is pretty good at using all of your unused RAM to speed up disk access.
Here's my anecdote based on 16GB workstation with NVMe SSD (Samsung 960 Pro):
Watching my project compile I occasionally open iotop in another terminal and don't see anything above occasional write flushes. To confirm, I did create a tmpfs volume and did not observe any improvement. `free` reported my buffers to be at ~4.7GB, which is basically all of my /bin, /usr and all of Golang sources+libs.
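For anyone who wants to check the same thing on their own machine, here is a minimal sketch that reads the page-cache and buffer figures that `free` reports. It assumes the usual Linux /proc/meminfo layout ("Name: value kB"); the helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB as reported."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def cache_summary(info):
    """Return total memory, page cache and buffers in GiB."""
    kib_per_gib = 1024 * 1024
    return {
        "total_gib": info.get("MemTotal", 0) / kib_per_gib,
        "cached_gib": info.get("Cached", 0) / kib_per_gib,
        "buffers_gib": info.get("Buffers", 0) / kib_per_gib,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(cache_summary(parse_meminfo(f.read())))
    except FileNotFoundError:
        print("/proc/meminfo not available (probably not Linux)")
```

If the cached figure is already several GB after a normal work session, a tmpfs copy of the same files is unlikely to buy much.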
- FeepingCreature 8 years ago
  
  One memory leak and your cache is gone.
  [edit] Not sure if ramdisks are pinned though.
  
  ars 8 years ago
  
  > Not sure if ramdisks are pinned though.
  Ramdisks will go to swap. A memory leak will force the entire ramdisk into swap, and reading it back into memory afterward is 10 to 100 times slower than reading normal files off of a disk.
  
  majewsky 8 years ago
  
  > Ramdisks will go to swap.
  Assuming that you have swap. I don't; I want my SSD to stay alive.
  
  mavhc 8 years ago
  
  Does anyone have actual data on swap on SSDs in 2017?
  
  ars 8 years ago
  
  If you have a memory leak, and a ramdisk, and no swap then the OOM killer will trigger.
  Hopefully it will target the program with the memory leak, but this is not guaranteed.
  Swap is useful because you can shift unused memory onto disk. There are many programs that allocate (and write) a lot of memory that they never afterward use.
  By having swap you make more room for cache in memory.
  SSD doesn't matter here - this is not swap thrashing, but rather occasional writes.
fundabulousrIII 8 years ago

Every network access to disk-based storage that can't be cached|copied to your RAM filesystem for the lifetime of your container is a performance hit, and you can't scale with RAM-backed storage without more physical memory. As a matter of fact, when you run out of RAM in your (maybe incomplete) scenario you no longer have a working node. Zero sum.
DRBD is fine (used it for years) but it's not something that is one size fits all.
- contingencies 8 years ago
  
  > Every network access to disk based storage...
  Rare for most services. Stuff like logs can be shuffled off elsewhere for a write, requiring no commit validation. Only DB/fileservers really require permanent storage with commit validation, writes are typically rare, and 100Gbps+ LAN on a PXE-based diskless cluster is not going to be introducing massive latency, especially if you prioritize the VLAN or link multiple ports. Reads are typically cheap and cacheable.
  > that can't be cached|copied to your ram filesystem for the lifetime of your container
  IMHO most services and their dependencies will come in well under 512MB, so that's a non-issue.
  > you can't scale with RAM backed storage without more phys memory
  By definition, one could say the same about anything... although to be fair you could still scale via compression, sharding, or another established strategy.
  > when you run out of RAM...
  In a managed scenario a service container or VM would terminate or a significant degradation in response time would be detected, it would be taken out of the service pool and stop having traffic routed to it, be restarted, then be re-introduced to the pool. Ditto extra CPU load, broken network policies, anomalous block IO, etc. Leaving modern service-level architecture aside, basic heartbeat-style IP monitoring with reliable node-level failover has existed in open source since the 90s. There's really no excuse to wing this stuff on production systems today.
  > it's not something that is one size fits all
  Nothing fits all!
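The "take it out of the pool, restart, re-introduce" loop described above can be sketched in a few lines. This is only an illustration of heartbeat-style monitoring, not any particular product's behavior; ServicePool and its method names are invented for the example, and the 5 second timeout is an arbitrary assumption.

```python
import time


class ServicePool:
    """Tracks node heartbeats and routes traffic only to nodes seen recently."""

    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # node name -> time of last heartbeat

    def heartbeat(self, node):
        """Record that `node` reported in; this also re-admits a restarted node."""
        self.last_seen[node] = self.clock()

    def healthy_nodes(self):
        """Nodes whose last heartbeat is within the timeout (eligible for traffic)."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def nodes_to_restart(self):
        """Nodes that missed the timeout, e.g. after running out of RAM."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

A router would call healthy_nodes() before dispatching, and a supervisor would restart whatever nodes_to_restart() returns; the restarted node rejoins simply by sending a heartbeat again.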
  
  fundabulousrIII 8 years ago
  
  <sarcasm> I find your production model very attractive and your assumptions about other usage(s) persuasive. You must be an expert! </sarcasm>

arielweisberg 8 years ago

...

I was the third engineer at VoltDB and spent six years making that bet. It's not a good bet.

Maybe there are other factors, but if VoltDB could page out cold data to disk I think it would be at least 2x if not more successful. No one agreed with me so it never happened.

I saw so many use cases go out the door because hey you know what? RAM is expensive and it's cheaper to page out cold data. The scale where that cost starts to matter is not that big.
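To make "page out cold data" concrete, here is a toy sketch of a two-tier key-value store: a small in-memory hot tier with least-recently-used eviction into an on-disk cold tier. It is not how VoltDB or any real database does it; TieredStore is a hypothetical name, and Python's shelve module stands in for a proper on-disk format.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class TieredStore:
    """Hot keys stay in RAM; least recently used keys are paged out to disk.

    Keys must be strings because the cold tier is a shelve database.
    """

    def __init__(self, path, hot_capacity=2):
        self.hot = OrderedDict()        # in-memory tier, oldest entry first
        self.cold = shelve.open(path)   # on-disk tier
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.cold.pop(key, None)        # drop any stale on-disk copy
        self.hot[key] = value
        self.hot.move_to_end(key)       # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[key]          # raises KeyError for unknown keys
        self.put(key, value)            # promote back into RAM
        return value

    def _evict(self):
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def close(self):
        self.cold.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TieredStore(os.path.join(d, "cold"), hot_capacity=2)
        for k, v in [("a", 1), ("b", 2), ("c", 3)]:
            store.put(k, v)
        print("in RAM:", list(store.hot), "on disk:", list(store.cold))
        store.close()
```

The point is only that RAM is paid for in proportion to the hot set rather than the whole data set.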

nikhilgarg28 8 years ago

Curious - why did people not agree with you? Was it an ideological belief or were they betting on some assumptions that turned out to be incorrect?
- arielweisberg 8 years ago
  
  Do I really know? I'm not sure. I have opinions, but for the most part I was kept in the dark. There was a roadmap handed to me and what was on it was already decided.
  We did work on equally important things also, but we also split focus with IMO unimportant things.
  A combination of me not having a seat at the table (literally was told this after a year or so) and IMO non-technical leadership driving focus by chasing what they thought were the important factors.
  The company survives and does OK though.
manigandham 8 years ago

+1, this ended up being a major con when we did our comparison considering how much data we needed available and the cost overhead.
lathiat 8 years ago

this is a feature that MySQL Cluster had to add over time, and then it had limitations that were slowly lifted.
AndyNemmity 8 years ago

Have spent 6 years I think working on SAP HANA. The one feature I've always asked for is seamless paging of even warmish data to disk.
In memory is fast and awesome, but it doesn't have to be as mind-bogglingly expensive as it is. Why are we all making the same mistakes?
- Pamar 8 years ago
  
  I would really like to hear something about your experience with SAP HANA. Do you have a blog or anything you could share?
ericfrenkiel 8 years ago

Yes, it's why we at MemSQL added a column store on disk in 2014. Memory-only is too limiting and has evolved to a notion of "memory-first."
- AndyNemmity 8 years ago
  
  Really a sort of odd choice though. I'd have gone the other way around with column store in memory, and row store on disk.
  Not that I think that's ideal either, though. Having both in memory for hot data, and the rest on disk, is ideal, with an extremely easy to use setup that makes it essentially automatic but has rules engines for finer-grained tuning.
  
  manigandham 8 years ago
  
  MemSQL actually adds an in-memory rowstore to each columnstore for rapid ingest of new rows until they get compacted into a new segment. Columnstore data is pretty fast so it works well off disk compared to row stores which aren't as efficient.
  SQL Server similarly has the hekaton in-memory tables + columnstore indexes and the latest version allows combining both for in-memory columnstores.
  
  AndyNemmity 8 years ago
  
  I've used MemSQL, and its rapid ingest by default isn't ACID compliant, so it sort of depends on how you compare it.
  The columnstore results were pretty fast, and it's even faster in memory. Depends on what you're doing, and what the requirements are.
  Was really impressed by MemSQL, and loved the wire compatibility with MySQL, so don't take this as just a knock on MemSQL in any way.
  
  manigandham 8 years ago
  
  > by default isn't ACID compliant
  What do you mean?
  
  AndyNemmity 8 years ago
  
  I'm surprised you wouldn't understand this, as it's an absolute requirement given you are a user of MemSQL.
  But then I went to their docs to link you to the details, and it feels like they intentionally avoid stating clearly the problem.
  Essentially, they allow committed transactions to hit memory and not disk. They allow you to configure it so that's not the case, but it isn't the default, and looking over the current documentation they certainly aren't clear about it like an open source project would be.
  transaction-buffer needs to be set to 0 for durability, but the docs explain it in a way that tries to pass off not being durable as a different kind of durability.
  I'm not interested in getting into a long discussion about this though, but it's difficult to explain the literal issue when they do such a marketing job of trying to hide the specifics.
  Now I'm far less surprised a user wouldn't know this. Apologies for my forward initial statement.
  
  manigandham 8 years ago
  
  It's not a surprise, that's normal. Durability just means writes are safe, how it does so doesn't matter. All databases use a write-ahead log with small batch/async flushes and this is the same with MemSQL.
  The enterprise edition has HA, with data on 2 nodes for safety. Otherwise you lose the whole point of in-memory performance (for writes) if you're going to write every single bit to disk immediately.
  They explain it clearly on the durability page: http://docs.memsql.com/docs/using-durability-and-recovery
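The trade-off argued over in this sub-thread (fsync on every commit versus batching several commits per flush) can be shown with a toy write-ahead log. This is a generic sketch, not MemSQL's implementation, and it does not model the transaction-buffer setting mentioned above; WriteAheadLog and sync_every are hypothetical names.

```python
import os
import tempfile


class WriteAheadLog:
    """Append-only log; `sync_every` is how many commits share one fsync."""

    def __init__(self, path, sync_every=1):
        self.f = open(path, "ab")
        self.sync_every = sync_every
        self.unsynced = 0      # commits acknowledged but not yet forced to disk
        self.fsync_calls = 0

    def commit(self, record: bytes):
        self.f.write(record + b"\n")
        self.unsynced += 1
        if self.unsynced >= self.sync_every:
            self.sync()

    def sync(self):
        self.f.flush()              # user-space buffer -> OS
        os.fsync(self.f.fileno())   # OS cache -> storage device
        self.fsync_calls += 1
        self.unsynced = 0

    def close(self):
        if self.unsynced:
            self.sync()
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for every in (1, 4):
            log = WriteAheadLog(os.path.join(d, f"wal-{every}"), sync_every=every)
            for i in range(10):
                log.commit(b"txn %d" % i)
            print(f"sync_every={every}: {log.fsync_calls} fsyncs, "
                  f"{log.unsynced} commits at risk on power loss")
            log.close()
```

With sync_every=1 every acknowledged commit is on disk; with larger values the last few acknowledged commits can be lost in a crash unless something else (such as a replica on a second node) covers them.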
FractalNerve 8 years ago

Given a time-series history of DRAM/SSD prices, at what price would 1TB of RAM (or anything approaching RAM speeds) need to be available to make VoltDB and in-memory databases competitively advantageous?
So, given this insider knowledge of yours, can we predict by what date DRAM/SSD-NVMe prices may make in-memory database startups lucrative again?
--
Offtopic:
I feel empathy for you. Being overruled by decision makers, as an engineer in a field where you're the expert in both market and technology, can be heart-breaking.
EDIT:
removed irrelevant personal experience
- arielweisberg 8 years ago
  
  So what happened that isn't shown in most analysis of RAM costs is that RAM didn't go down in cost that much for many people. For instance RAM in the cloud is still very expensive.
  What is also not shown is that cold data is everywhere. You need to have it, but paying to put it in RAM generates zero value for a profit seeking business. So if you don't page out cold data you effectively throw yourself out of the running for a huge swath of use cases.
  For a small deployment sure it's dwarfed by engineering costs. But infrastructure per engineering head count is trending towards more infrastructure per head and infrastructure cost matters to more businesses.
  The other thing is that data volumes are also increasing at a rate competitive with the rate at which RAM is decreasing in price. This is because there are new opportunities to make money using more data and this is a trend you can't really beat. The more data you can have the more use cases and lines of business get invented.
  This is not a scientific analysis; it's just conjecture based on anecdata from my time in the industry.

tw04 8 years ago

The price of disk has dropped at nearly the same pace as ram. As has the cost of compute. At the same time data growth has increased faster than either has dropped... so I'm not really sure the price argument holds water. If I can buy ram at 1/100th the cost but I need to store 500x more data... that isn't a net win on cost.

From ~$1000.00/gb to $0.03/gb

http://www.mkomo.com/cost-per-gigabyte-update
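The arithmetic behind that, spelled out using only the figures quoted in this comment (1/100th the price, 500x the data, and the $1000/GB to $0.03/GB drop). The function name is made up.

```python
def relative_storage_bill(price_factor, data_growth_factor):
    """Total bill relative to before: (new price / old price) * (new data / old data)."""
    return price_factor * data_growth_factor


# RAM at 1/100th the cost, but 500x more data to store.
bill = relative_storage_bill(1 / 100, 500)
print(f"storage bill changes by a factor of about {bill:.1f}")

# The quoted disk price drop, expressed as a ratio.
drop = 1000.00 / 0.03
print(f"$/GB fell by a factor of about {drop:,.0f}")
```

So in that hypothetical the bill goes up roughly fivefold despite the cheaper RAM, which is the "not a net win" point.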

runeks 8 years ago

It would be interesting to see that chart updated to 2017 data. It appears the downward slope becomes significantly less steep around 2009 (looks like the price dropped as much from 2006-2008 as it did in the five years 2009-2014), and I’d be interested in seeing how recent SSD prices affect this. As far as I can see, rotational HDD technology is at the end of its S-curve, whereas SSD technology is still relatively new.
- notpeter 8 years ago
  
  Backblaze recently updated that data through Q2 2017 [1]. Hard drives have only dropped to $0.028/GB. For comparison SSDs are still ~$0.40/GB (~14x).
  [1]: https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/
  [2]: https://pcpartpicker.com/trends/internal-hard-drive/
ksec 8 years ago

Data growth is faster than the RAM price decrease, and at the moment the price is actually increasing.
While I don't believe in infinite growth of data, I still think a RAM-only DB isn't as good if we have SSDs that are ridiculously fast. My thinking is that RAM / SSD should always be 1:5 or 1:10.

sumanthvepa 8 years ago

The problem with asking a programmer to keep track of the locality of their data, is that most modern programming languages make reasoning about locality hard to do. With the exception of C and C++. Even for those languages, unless all relevant data is in simple arrays, making assertions about locality is hard.

For interpreted languages like Python or Javascript, figuring out RAM storage and access patterns of data is very hard. So we probably need programming language mechanisms to help with understanding the locality patterns of our programs and probably tooling to help change it.
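A small illustration of why this is hard in Python specifically. The sketch assumes CPython on a 64-bit platform: a list holds references to separately allocated int objects, while array.array stores raw 8-byte values in one contiguous buffer. The footprint helpers are hypothetical names for this example.

```python
import sys
from array import array

N = 1000
values = list(range(1_000_000, 1_000_000 + N))  # a list of boxed int objects
packed = array("q", values)                     # contiguous signed 64-bit values


def list_footprint(lst):
    """Bytes for the list's reference table plus every int object it points to."""
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)


def array_footprint(arr):
    """Bytes of payload in the array's single contiguous buffer."""
    return arr.itemsize * len(arr)


if __name__ == "__main__":
    print("list :", list_footprint(values), "bytes, spread across the heap")
    print("array:", array_footprint(packed), "bytes, contiguous:",
          memoryview(packed).contiguous)
```

Nothing in the language tells you where the list's int objects ended up, which is the reasoning problem described above; array (or NumPy) makes the layout explicit at the cost of less idiomatic code.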

humanrebar 8 years ago

Even for C++, most designs are OOP, which treats data layout as an afterthought.
hinkley 8 years ago

Lots of developers are blind to issues of local reasoning. It becomes a self fulfilling prophecy, because the code they write is often inscrutable. The people who generally can follow things like this can't anymore.
Const-me 8 years ago

> Even for those languages, unless all relevant data is in simple arrays, making assertions about locality is hard.
It depends on the libraries you use. Take a look: https://github.com/Const-me/CollectionMicrobench
As you see, in practice, a good linked list is the same or slightly faster than std::vector. And it’s consistently 2-3 times faster than equally linked std::list.
That’s not just synthetic tests. Recently, I got a 2.5x performance improvement in my app just by switching from std::unordered_map to CAtlMap with the same keys/values.
Theoretically, C++11 fixes that with stateful allocators. Practically, I’ve not seen good open source ones with performance comparable to CAtlPlex, which powers these ATL node-based collections. I’m not even sure it’s possible. STL is too standardized and too old. It might be that there’s no room in its allocator API for a sufficient level of integration between a collection and its backing stateful allocator.

0xbear 8 years ago

And 99% of developers spend their entire careers without giving as much as a passing thought to cache locality. Quick, how long, in cycles, does it take to retrieve data from RAM? About 200 cycles. 200 cycles is a very long time if you miss cache often. Scattered RAM reads can be _slower_ than sustained linear disk reads (that is, once the disk actually gets around to reading, which takes a while).
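A back-of-the-envelope version of that comparison. Only the 200 cycles figure comes from the comment; the 3 GHz clock, 8 useful bytes per miss, and 150 MB/s sustained disk throughput are assumptions for illustration, and the model ignores prefetching and overlapping misses, which help real hardware a lot.

```python
def scattered_read_bandwidth(cycles_per_miss=200, clock_hz=3.0e9,
                             useful_bytes_per_miss=8):
    """Bytes/second when every access misses cache and yields one small value."""
    seconds_per_miss = cycles_per_miss / clock_hz
    return useful_bytes_per_miss / seconds_per_miss


ASSUMED_DISK_BYTES_PER_S = 150e6  # hypothetical sustained linear disk read rate

if __name__ == "__main__":
    ram = scattered_read_bandwidth()
    print(f"scattered RAM reads: {ram / 1e6:.0f} MB/s")
    print(f"linear disk reads  : {ASSUMED_DISK_BYTES_PER_S / 1e6:.0f} MB/s (assumed)")
```

Under those assumptions the fully scattered case works out to about 120 MB/s of useful data, which is in the same range as a sequential disk stream.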

lathiat 8 years ago

It's even worse when you get into interpreted languages like Python and Ruby. Bad efficiency at that level translates to dreadful, if not totally broken, efficiency at the cache level.
- bitL 8 years ago
  
  Bbbut JIT compiler solves everything, right?
  BTW, this is a blind spot/elephant in the room of most *VM-based and functional languages nobody wants to talk about...
flukus 8 years ago

90% of developers are working in languages where you can't really do much about cache misses, or doing so will at least involve some very non-idiomatic code. If you can't do much about the problem it's not really helpful to be thinking about it much.
- 0xbear 8 years ago
  
  I don't disagree. And for 90% of them worrying about cache locality or branch mispredictions on a daily basis would be a waste of time. It's fine to deliberately ignore such concerns. It's somewhat less fine to know absolutely nothing about how programs are actually executed, and what makes them go fast.

adrianratnapala 8 years ago

If a terabyte is a lot of data to you -- and it is for many, many things -- then this post is right; you should buy as much RAM as you have data, and access it accordingly.

The commenters who are saying disk has a different price/performance trade-off that is still valuable are also right, but that applies to large data sets.

AndyNemmity 8 years ago

I worked on a petabyte in memory hana cluster. It all depends on what you're doing, and how important it is to you.
I don't even know what a large data set is anymore. I think my general definition is one you won't put into memory, whatever your threshold is for that.

vbezhenar 8 years ago

Maybe with very high-end servers it is, but generally it's not. I can buy a 4TB HDD for $200. I think I'd have to add 2-3 zeros for a machine with 4TB of RAM, and that's not only the RAM: I'd need a server motherboard and a server processor, while I can use a 4TB HDD with pretty much any computer. And SSDs aren't going to match HDDs on $/byte in the near future either. So optimizing software for HDDs isn't going away. But, of course, it's awesome to have some alternatives if you have money and need more performance.

zeusk 8 years ago

NVMe drives which are approaching RAM speeds are a good compromise if RAM and server components are outside of your financial reach.
- pvg 8 years ago
  
  For a very generous definition of 'approaching'.
- dis-sys 8 years ago
  
  Many NVMe drives on the market are useless jokes. Try some from what is now the biggest semiconductor company, test their fsync() performance, and don't have a heart attack when you see those ugly numbers. ;)
  
  olavgg 8 years ago
  
  This!
  https://forums.servethehome.com/index.php?threads/did-some-w...
  
  dis-sys 8 years ago
  
  Hi olavgg, I was searching for a low-cost NVMe SSD with fast fsync a few months ago, so I looked into consumer NVMe SSDs. The Samsung 960 Pro was the first I tested, and the results were just shockingly bad. It was so bad that I started to question whether my kernel/installation caused the slowness. I searched online and found a few of your posts describing the exact same problem. That saved me quite a bit of time. :)
  Yes, I totally agree with the conclusion in your link above: consumer SSDs (NVMe or not, high end or cheap) aren't worth a dime. Cheers!
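For anyone who wants to repeat this kind of test on their own drive, a minimal sketch: write a small block, fsync, repeat, and report the average latency. The 4 KiB block size and the write count are arbitrary assumptions, fsync_latency is a made-up name, and this is no substitute for a real benchmarking tool.

```python
import os
import tempfile
import time


def fsync_latency(directory, writes=50, block=b"x" * 4096):
    """Average seconds per (write 4 KiB + fsync) on the filesystem holding `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)   # block until the device reports the data as durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / writes


if __name__ == "__main__":
    latency = fsync_latency(tempfile.gettempdir())
    print(f"{latency * 1e6:.0f} us per fsync, ~{1 / latency:.0f} fsyncs/s")
```

Point it at a directory on the drive under test; if the directory is on tmpfs the numbers say nothing about the SSD.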
- FractalNerve 8 years ago
  
  @Zeusk is right, I wish costs of SSD-RAMs like Diablo Memory1 [1] were more affordable or transparent.
  Summary performance comparison by Storage Class: http://xitore.com/what-is-nvm-x
  NVM-X RAM ≈ DDR4-3200: ↧Availability | ↥Cost | ↥4TB @ 25.6 GB/s
  Diablo Memory1: ↥Availability | ↥Cost | ↱256GB @ 10 GB/s
  EDIT, added source: [1] http://www.diablo-technologies.com/memory1/

olegkikin 8 years ago

Are you sure latency stayed around 100ns?

http://pics.crucial.com/wcsstore/CrucialSAS/images/campaigns...

vvanders 8 years ago

Probably not but I think the point stands that latency hasn't made any major strides since the switch from SDR to DDR.
nikhilgarg28 8 years ago

That's very interesting. The estimate of 100ns came from here: https://people.eecs.berkeley.edu/~rcs/research/interactive_l.... and is probably not very precise (maybe because it is only capturing rough order of magnitude). I have now updated the post. Thanks for the feedback! Specific constant aside, the point about latency not improving much still holds.
- nayuki 8 years ago
  
  Your link got truncated with ellipses. Here we go: http://www.eecs.berkeley.edu/~rcs/research/interactive_laten...
everheardofc 8 years ago

There are these things called the speed of light and virtual memory. The latency of the physical RAM is completely irrelevant unless it's embedded directly on the CPU.

sand500 8 years ago

I've wondered, as an estimate, how much of "my" data is stored in various types of memory. CPU cache if I am actively browsing a website, those last couple of IMs stored in the RAM of some server? Any content I have ever uploaded to the internet is probably on a hard disk, ready to be brought to cache at a moment's notice. Then there are all the backups on tape.

WalterBright 8 years ago

Extensive use of "RAM" disks was commonplace in the 1980s.

slackingoff2017 8 years ago

And old is new again :). The ratio of RAM to disk cost has historically varied wildly. The pendulum will come around again in a few years when huge SSDs are cheap.
I have a feeling that with multi-terabyte SSDs at cheaper prices we'll be shuffling all our data back to "disk" again :).

toast0 8 years ago

I think the insight here may be that ram should be optimized for sequential access, just like the disks of old.

ajross 8 years ago

It is, that's what "fast page mode" was in the late 90's and what prefetch queueing is in DDR.
pjc50 8 years ago

That has been the case since about the mid-90s.

godelmachine 8 years ago

This might be the single most important article I have read in the past week, because Adrian Colyer is on vacation! May I add that even SAP HANA is designed for in-memory computing? As far as disks are concerned, NVM should soon replace them.

jlebrech 8 years ago

With "serverless" applications you can read the whole app into memory, run it, and then clear it for the next app, which I'm sure speeds things up.

icebraining 8 years ago

That's not how it works; programs are kept "warm" for some time after each request, or indefinitely (e.g. in App Engine you can choose dynamic or resident instances).
- kthejoker2 8 years ago
  
  Just chiming in to say this is also true for Lambda and Azure Functions.
  What I'd really like is for them to scale up to full-fledged VMs once some usage or performance threshold is hit.

amelius 8 years ago

If only programming languages supported "offsetted pointers", we could use mmapped files and store arbitrary data structures in them without hassle.
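To show what "offsetted pointers" buy you, here is a sketch of a linked list stored in an mmapped file where every link is a byte offset from the start of the file instead of a memory address, so the structure stays valid wherever the file gets mapped. The node layout and function names are invented for this example, and the packing is done by hand with struct, which is exactly the hassle language support would remove.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<q")   # offset of the first node (0 means empty list)
NODE = struct.Struct("<qq")    # (value, offset of next node; 0 means end)


def write_list(path, values):
    """Store `values` as a linked list whose links are file offsets."""
    size = HEADER.size + NODE.size * len(values)
    with open(path, "wb") as f:
        f.truncate(size)       # allocate the file, zero-filled
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as m:
        HEADER.pack_into(m, 0, HEADER.size if values else 0)
        for i, value in enumerate(values):
            offset = HEADER.size + i * NODE.size
            nxt = offset + NODE.size if i + 1 < len(values) else 0
            NODE.pack_into(m, offset, value, nxt)
        m.flush()


def read_list(path):
    """Map the file read-only and follow the offsets from the header."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        out = []
        (offset,) = HEADER.unpack_from(m, 0)
        while offset:
            value, offset = NODE.unpack_from(m, offset)
            out.append(value)
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "list.bin")
        write_list(path, [3, 1, 4, 1, 5])
        print(read_list(path))
```

Offset 0 is reserved for the header, so it can double as the null link.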

icebraining 8 years ago

Many programming languages use references to other objects liberally. Wouldn't it be hard to keep it all contained so that you could restore it later?

sriram_iyengar 8 years ago

Excellent insights. If cost is not a factor, how do improvements in the SSD space stack up against RAM? Thanks

jsudhams 8 years ago

Not sure I agree. On a server spec, the RAM costs about 10% more than the CPU: if the CPU costs 2000, the RAM would cost around 2200. It also doesn't scale with the amount of data. I'm not sure I agree for laptops either: 8GB of DDR3 is about $80, while I can get a 128GB SSD or a 1TB magnetic disk, so I really can't use memory instead of disk, except in a few cases.

drudru11 8 years ago

This post is 10 years late

rodgerd 8 years ago

> If anything, I would suspect that the developers have become costlier over time, at least in the last 10 years or so.

Really? Have developer costs actually increased in real terms in the last 10 years? Have your developer costs (if you're outside the VC/SV bubble) increased in real terms? And how much?

This seems like a terrible assumption.

jerf 8 years ago

The real point is that developer time has not kept up with the rate of RAM price decrease, and unless you plan on seriously defending the claim that developers only cost 1/6000th of what they used to twenty years ago, the points in the blog post stand.