πfs – A data-free filesystem

98 points by tosh 4 years ago

Previous discussions

PiFS – The Data-Free Filesystem (February 20, 2021 — 1 points, 1 comments) - https://news.ycombinator.com/item?id=26208704

Πfs: Never worry about data again (October 25, 2019 — 3 points, 1 comments) - https://news.ycombinator.com/item?id=21359338

The π Filesystem for FUSE: Store Your Data in π (February 21, 2019 — 1 points, 1 comments) - https://news.ycombinator.com/item?id=19223032

pifs - Avoid disk space usage by saving your files in the digits of Pi (December 14, 2018 — 3 points, 1 comments) - https://news.ycombinator.com/item?id=18687275

πfs – A data-free filesystem (March 14, 2017 — 285 points, 105 comments) - https://news.ycombinator.com/item?id=13869691

Πfs: Stores your data in π (January 6, 2016 — 2 points, 1 comments) - https://news.ycombinator.com/item?id=10856108

Πfs: Never worry about data again (January 5, 2016 — 5 points, 1 comments) - https://news.ycombinator.com/item?id=10847693

netflixandkill 4 years ago

I love that they ran with this far enough to get it working. We need a graph of the average number of bits to store an offset into pi versus size of stored data.

floren 4 years ago

Well, based on this sentence:
> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
I'd say best-case scenario, you're looking at 1:1 offset storage size vs. stored data size :)
- k__ 4 years ago
  
  How would the location search slow down with increased block size?
  And is the algorithm to do so faster on a quantum computer?
- netflixandkill 4 years ago
  
  that's way worse than 1:1 unless all integers between 0 and 255 occur in the first 256 digits of pi, which I'm 99.pi% sure is not the case.
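
The claim above is easy to check. Here is a minimal sketch, assuming offsets are counted in hexadecimal digits after the point and a byte is any two consecutive hex digits. The names `pi_hex_digits` and `first_offsets` are made up for illustration, and the digits come from Machin's formula with big integers, which is not necessarily how πfs extracts them.

```python
def pi_hex_digits(n):
    """Return the first n hex digits of the fractional part of pi (lowercase)."""
    guard = 8                          # extra hex digits to absorb truncation error
    scale = 16 ** (n + guard)

    def arctan_inv(x):
        # scale * arctan(1/x) via the alternating Taylor series, integers only
        total = term = scale // x
        x2 = x * x
        k, sign = 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # Machin: pi = 4 * (4 * arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    frac = (pi_scaled - 3 * scale) // 16 ** guard   # drop the "3." and the guard digits
    return format(frac, "x").zfill(n)


def first_offsets(hex_digits):
    """Map each byte value to the first hex-digit offset where it appears."""
    offsets = {}
    for i in range(len(hex_digits) - 1):
        value = int(hex_digits[i:i + 2], 16)
        offsets.setdefault(value, i)   # keep only the earliest offset
    return offsets


# 257 digits give exactly 256 two-digit windows, i.e. offsets 0..255,
# which are the only offsets that fit in a single byte.
digits = pi_hex_digits(257)
offsets = first_offsets(digits)
print(len(offsets), "of 256 byte values have an offset that fits in one byte")
```

Since the hex expansion starts 243f6a888..., the byte 0x88 already shows up at two adjacent offsets, so the first 256 offsets cannot cover all 256 byte values and some bytes must cost more than one byte to address.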

noxer 4 years ago

Wait until people find out all the CSAM is stored on there. They can't ban π soon enough. It's worse than bitcoin. /s

Stampo00 4 years ago

I had to look up the acronym. Are we not allowed to say "kiddie porn" on here?
- account-5 4 years ago
  
  Child sexual abuse material because that's what it is: sexual abuse. Calling it porn sends the message that it's legitimate, it isn't.
- seanw444 4 years ago
  
  It's the politically-correct version now, apparently. Hadn't heard it before the Apple phone-scanner debacle.
- noxer 4 years ago
  
  If you apply the same interpretation to "kiddie porn" as to "gay porn", then it would be something completely different. It would be porn performed (acted) by kids and possibly targeted at kids. That doesn't make much sense. They are not acting, they are abused, and people should name it like that.

pastrami_panda 4 years ago

Love it, the tone of the readme is amazing.

Protostome 4 years ago

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

I don't get it. We simply replace one byte (the data) with another byte, or even more than that (the index in pi). What am I missing?

Besides, why do you have to "search" pi? Why not just build a table mapping every possible 2-, 3-, or 4-byte sequence (256^2, 256^3, or 256^4 combinations) to its position in pi, so that every subsequent compression runs much more efficiently?

BTW, it is very easy to show that a simple Huffman-code-based compression yields a better compression ratio than this method.
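
A rough sketch of the lookup-table idea, and of why it cannot save space. Everything here is an assumption for illustration: `build_offset_table` and `bits_needed` are hypothetical names, and a seeded pseudo-random byte stream stands in for the digits of pi. The pigeonhole point does not depend on the stream: distinct blocks need distinct offsets, so a table covering every b-bit block must contain an offset of at least 2^b - 1, which itself takes b bits to write down.

```python
import random


def build_offset_table(stream: bytes, block_size: int) -> dict:
    """Map each block of block_size bytes to the first offset where it occurs."""
    table = {}
    for offset in range(len(stream) - block_size + 1):
        table.setdefault(stream[offset:offset + block_size], offset)
    return table


def bits_needed(offset: int) -> int:
    """Number of bits required to write the offset in binary (at least 1)."""
    return max(offset.bit_length(), 1)


# Stand-in for pi: a reproducible pseudo-random byte stream (assumption).
rng = random.Random(0)
stream = bytes(rng.getrandbits(8) for _ in range(1 << 16))

table = build_offset_table(stream, block_size=1)
worst = max(table.values())
average = sum(bits_needed(o) for o in table.values()) / len(table)

print("blocks found:", len(table))
print("largest offset:", worst, "needing", bits_needed(worst), "bits")
print("average bits per offset:", round(average, 2), "vs 8 bits per block")
```

Precomputing the table only speeds up the search. The offsets it stores are still, in the worst case, at least as long as the blocks they replace.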

_tom_ 4 years ago

Looks like "NFS" in the hacker news font.

redconfetti 4 years ago

I wonder if this concept can be used for Πcoin.

gfodor 4 years ago

I wonder how hard it would be to find offsets into pi that contain surprisingly legit bit sequences.

nickdothutton 4 years ago

This. Given a few million digits I wonder what the hit rate is.
- MauranKilom 4 years ago
  
  Assuming your data is much shorter than the number of digits to search, and that repeated digits do not appear often enough to matter, the expected number of hits is just numberOfDigitsToSearch / pow(10, numberOfDigitsInData). Same idea for any other base (if you then count digits of that base, not base 10 digits of course).
  That is, odds of finding a 6 digit datum in a million digits are fairly good. Finding longer data becomes exceedingly unlikely very fast.
  
  gfodor 4 years ago
  
  I think I'm asking the inverse question - instead of having a known-good datum, given the fact that Pi isn't random, I wonder what coincidentally you could stumble onto, given a broad enough heuristic to "discover" interesting sequences.
  
  MauranKilom 4 years ago
  
  For all we know, Pi is random (i.e. normal, although we haven't been able to prove it). That would mean any sequence appears eventually, with uniform odds. Hence any ("interesting") data you'd want to store does appear at some point, and I gave the odds of any (interesting or not) digit string appearing in the first x digits.
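
A back-of-the-envelope sketch of those odds, assuming the digits behave like independent uniform draws (which is what normality would suggest, but is unproven for pi). Under that assumption a k-digit string is expected to occur about N / 10^k times among N digits, and the chance of at least one occurrence is roughly 1 - exp(-N / 10^k). The function names are made up for illustration.

```python
import math


def expected_hits(search_len: int, data_len: int, base: int = 10) -> float:
    """Expected number of occurrences of one fixed string of data_len digits."""
    windows = max(search_len - data_len + 1, 0)   # number of starting positions
    return windows / base ** data_len


def hit_probability(search_len: int, data_len: int, base: int = 10) -> float:
    """Approximate chance of at least one occurrence (Poisson approximation)."""
    return -math.expm1(-expected_hits(search_len, data_len, base))


for k in (4, 6, 8, 10):
    p = hit_probability(1_000_000, k)
    print(f"{k}-digit datum in a million digits: about {p:.4%}")
```

The approximation ignores overlaps between neighbouring windows, so treat the numbers as estimates. They still show how quickly the odds collapse once the datum is longer than about log10 of the number of digits searched.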

betwixthewires 4 years ago

This is seriously a very interesting concept. Sounds like the Tower of Babel but somehow much more useful for its obvious purpose.

generalizations 4 years ago

Do you mean the library of babel?
- wizzwizz4 4 years ago
  
  No, the Tower of Babel.[0] With this revolutionary technology, we can keep track of information using only metadata; in the information age, such a “digital Tower of Babel” could let us attain ever-increasing heights of Knowledge, if we are not scattered as a result.
  [0]: https://xkcd.com/496/
  
  MauranKilom 4 years ago
  
  I don't get how the internet secretary thing relates to the tower of babel... Did you mean https://xkcd.com/2421/?
  
  wizzwizz4 4 years ago
  
  “You mean the fifth?”
  “No, the third.”
- betwixthewires 4 years ago
  
  Yes.

einpoklum 4 years ago

The last commit was made 5 years ago.

tmountain 4 years ago

I doubt Pi has changed much…
- pindab0ter 4 years ago
  
  To be fair, this is Hacker News.
- jrootabega 4 years ago
  
  pi was abandoned in favor of pi 2
  
  seanw444 4 years ago
  
  All hail Tau