It's fun to see a plaintext accounting view as the example...
I just switched from QuickBooks to Beancount+Fava for my sole proprietorship, and couldn't be happier. I've added a text-based simple invoice system, a text-based vehicle mileage tracker, and have validators that ensure that every expense with a tax status has a document attached to it.
It's far easier and faster to use than QuickBooks, I don't have to put up with ads, and with git + RFC3161 attestation of commits I can prove I made additions when I said I made them. There are no accidental erasures from lazy text edits, and it's a simple command to see exactly when each entry was made.
All based on plain text at the core, but I've now added Fava extensions so that I can do it all in the browser when I want. If there were a TUI Fava with graphs, great, but the web isn't so bad either. Now, let's see what my accountant thinks of this...
This is really good inspiration for some of my plain text accounting projects! Could you go into more detail about your RFC3161 attestation of commits? I'm assuming you're signing your commits with a GPG key to assert that it was in fact you who made the commit. Do you use an external timestamping service and an external CA, or do you build your own chains of trust? If you were asked to attest your accounting commits, what would that look like to the auditor?
The time stamp stuff is mostly a lark, because the bank statements and receipts are probably all that really matters, but it was fun!
I use freetsa.org and OpenSSL on the git commit hash to tie that commit to a particular point in time. I also added Bitcoin-based timestamping with opentimestamps-client, but even fewer auditors would believe that's of any value... Edit: I only timestamp after account reconciliation right now, and will do it when I close the books for a year. The files for attestation get attached to the commit with a git note, and also get added to a directory for easier browsing. An LLM can write scripts for this, probably from just copying and pasting this comment as direction. I installed them as git subcommands.
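For anyone curious, a minimal sketch of that freetsa.org + OpenSSL workflow. The file names, the stand-in hash, and the notes ref are my own invention; only the offline query-building step actually runs here, with the network and git steps shown as comments:

```shell
# Sketch: RFC 3161 timestamping of a git commit hash via freetsa.org.
set -eu

# The data to attest: a commit hash. Stand-in value here; in a real repo
# you would use `git rev-parse HEAD > commit.hash`.
echo "4f2a9c1d0b..." > commit.hash

# 1. Build an RFC 3161 timestamp query over the file (offline step).
openssl ts -query -data commit.hash -sha256 -cert -out request.tsq

# 2. Send the query to the TSA and save the signed response (network step):
#      curl -s -H 'Content-Type: application/timestamp-query' \
#           --data-binary @request.tsq https://freetsa.org/tsr -o response.tsr

# 3. Verify the response against the TSA's published CA certificate:
#      openssl ts -verify -in response.tsr -queryfile request.tsq \
#           -CAfile freetsa-cacert.pem

# 4. Attach the token to the commit for auditors, e.g. with a git note:
#      git notes --ref=timestamps add -f -F response.tsr HEAD
```

Putting these steps in an executable script named `git-timestamp` somewhere on $PATH is all it takes to call it as `git timestamp`, which I assume is the subcommand approach mentioned above.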
Other CAs offer for-fee time stamp attestation, and I hear it may hold value in the EU, but here in the US it's only for fun, and for very small values of fun!
Thank you for taking the time to reply! Agreed, it does seem like any auditor would be able to verify all this information from other sources, but I really like the idea of having multiple independent levels of attestation for my organization's important financial and legal documents.
I love this kind of thing so much. One thing I try to keep in mind when doing my own simple file formats is knowing what I would have to do to convert it to a more common format, if I need to for any reason. Knowing I have an "escape plan" if I need to get them to a different format for someone makes me feel good about it.
In your case, I bet you could easily convert it to a CSV that your accountant could tolerate.
That escape plan was my entire motivation. When I first started using QuickBooks, I did a data export and it seemed like maybe it had everything. After a year of a disastrous UI change and spammy ads everywhere, both throughout the UI and by email, for a product I pay a ton of money for, I started to investigate alternatives and realized that the QuickBooks export missed most of the work I had been putting into my books (receipts, invoices, and reconciliation). Once I knew that, I couldn't use it as a primary data store anymore.
Now that I control my data, all forms of transformation are almost trivial, as you say!
Thanks for mentioning Beancount; I didn't know about it before. I'm American but have a job in a different country, so I deal with two currencies routinely, and haven't found a good way of handling multiple currencies in Gnucash. So my wife and I have been keeping our records in text files. I'll look into what it would take to switch over to Beancount; I bet I could write a conversion script (or get an LLM to write most of it for me, even) that would do 95% of the work since we've been pretty consistent in our format, and throw up warning messages about any entries that it couldn't parse (we haven't necessarily been 100% consistent).
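For what it's worth, a skeleton of such a conversion script might look like the sketch below. The input line format ("YYYY-MM-DD payee amount currency") and the account names are made-up placeholders; the point is the parse-or-warn structure, not the specific regex:

```python
# Sketch: convert a simple one-line-per-transaction text format to
# Beancount syntax, warning about lines that don't parse.
import re
import sys

# Assumed input format, e.g. "2024-03-05 Grocery store 45.30 USD".
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?\d+(?:\.\d+)?)\s+([A-Z]{3})$")

def convert(lines, account="Expenses:Uncategorized", source="Assets:Checking"):
    entries, warnings = [], []
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        m = LINE.match(line)
        if not m:
            warnings.append(f"line {n}: could not parse: {line!r}")
            continue
        date, payee, amount, currency = m.groups()
        # Beancount lets you omit the amount on one posting; it is
        # interpolated to balance the transaction.
        entries.append(
            f'{date} * "{payee}"\n'
            f"  {account}  {amount} {currency}\n"
            f"  {source}\n"
        )
    return entries, warnings

entries, warnings = convert([
    "2024-03-05 Grocery store 45.30 USD",
    "2024-03-06 Rent 1200 EUR",
    "??? not a real entry",
])
print("\n".join(entries))
for w in warnings:
    print("WARNING:", w, file=sys.stderr)
```

The warnings list gives you exactly the "throw up a message about anything it couldn't parse" behavior, so the last 5% can be fixed by hand.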
Hey, I'm working on a Rust implementation of Beancount; if you get a chance, check it out: https://rustledger.github.io
I'm also hoping to put together a standards body that can formalize plain-text accounting standards, so that others can more easily implement their own systems that conform to the standard.
One small warning about Beancount: v3 is a huge change from v2, a lot of the documentation on the web is for v2, and it's not clearly marked as no longer applying.
I'd recommend looking up information with an LLM, and not trusting web pages. This is the first time I've made such an unusual suggestion, but LLMs are actually really good at information retrieval and translation. The Beancount community has a bit of cleaning up to do before web searches can be trusted again.
That said, Fava is such a huge step up from the hledger web ui that it's totally worth it, especially since I want quick and easy attachment of documents and viewing of documents. It's super easy to convert between hledger/ledger and beancount formats, so it's fairly easy to switch to the other if you already have your data in such clear and clean formats.
> Fun to see a contemporary take on something that peaked between 1970s–1980s
Maybe that was the peak, but there were some very good TUIs in the early 1990s for DOS apps, when Windows hadn't quite completely taken over yet. You very likely had a VGA-compatible graphics card and monitor, meaning a good, high-resolution, crisp text mode with configurable fonts, and you likely had a mouse too. This is the stuff I grew up with: QBASIC and EDIT.COM, for example. Bisqwit has a cool video about how some apps from that era could even have a proper mouse cursor: https://www.youtube.com/watch?v=7nlNQcKsj74
The peak of TUIs is now. Take a look at Omarchy, an entire operating system built around terminals and config files; it's nirvana. I can only imagine how much farther down this road things may go as we enter a world where the primary interface is conversation with the machine in text. I'm sure I'll get downvoted for that last part because Reddit (cough), I mean Hacker News, hates AI, but I'm genuinely excited for the future.
I hope it’s inevitable. Most users’ computing workloads by far are text oriented. The terminal is capable of flexbox now. Current GUIs create massive complexity and power draw relative to their value. Over a long enough arc, economic inefficiency is doomed.
That’s the tip of a conceivable iceberg but exactly. Also look at kitty graphics protocol.
Look at the amount of engineering resources we pour into OS GUI toolkits and then browsers. Those layers of complexity aren’t there because we stood back and asked, “given what we know in 2026, how should we design a GUI compositor?”. The majority of the stack is written the way it is by archeological happenstance. Each generation has added on top of the prior since the ’60s.
I’d say start from the terminal and fix the rendering limitations that drove the split away from the terminal and then to the browser. If we pin down an efficient GUI, we could cover non-graphics workloads, which are the vast majority, with solar power and the equivalent of a 6502.
The amount of energy wasted on modern stacks relative to the tasks being delivered is incalculable.
I 100% agree, and this is a big reason why I find the current state of education so suboptimal. Everyone just goes on to do webdev, completely ignoring the lower levels and taking it all for granted. The thing is, there's no real innovation to be done that high up the stack. When you're that high, you mostly just write glue code to stick together parts someone else wrote. Real innovation comes from quite a few levels down the stack, starting at the native-code level and going downwards.
Like you pointed out, the current stack is heavily unoptimized and has a terrible architecture; it's only the way it is because of happenstance and tides of the market (companies always reaching for faster over better). An actual "nirvana" in computing like the other guy said would require bulldozing a good chunk of our current stack, keeping only kernels and core utilities, if even.
I really wish we had a bigger focus on getting good foundation instead of making yet another JS framework and SaaS, but then again, who's paying developers to actually do something of quality nowadays?
Comments like this are proof that the old-school hacker spirit is alive and kicking. This kind of pride in efficient and artful use of computers is needed more than ever.
You easily have a 4K display's worth of pixels; why use a tiny subset of them in a very inefficient way? We have proper hardware to make a bunch of these computations actually fast, and yet we should be stuck with drawing relatively expensive text everywhere?
If you only care about the UX of TUIs, that I can stand behind (though mostly as a guideline, it doesn't fit every workflow), but you can do that with a proper GUI just as well.
> If you only care about the UX of TUIs, that I can stand behind
This is a confusing concession. Of course we love TUIs because of the UX, what other reason is there?
Constraint breeds consistency and consistency breeds coherence.
Take 1,000 random TUI designers and 1,000 random GUI designers and plot the variations between them (use any method you like)—the TUI designers will be more tightly clustered together because the TUI interface constrains what's reasonable.
Yes of course you CAN recreate TUI-like UX in a GUI, that's not the issue. People don't. In a TUI they must. I like that UX and like that if I seek out a TUI for whatever thing I want to do, I'm highly likely to find a UX that I enjoy. Whereas with GUIs it's a crapshoot. That's it.
> the TUI designers will be more tightly clustered together because the TUI interface constrains what's reasonable.
It constrains what’s possible, not what’s reasonable. For example, one could typically fit more text on a screen by compressing it, but most of the time, that’s not the reasonable thing to do.
I say “most of the time” because English Braille (https://en.wikipedia.org/wiki/English_Braille#System), which compresses frequently used words and character sequences such as ‘and’ and ‘ing’, shows that, if there is enough pressure to keep texts short, humans are willing to learn fairly idiosyncratic text compression schemes.
One could also argue Unix, which uses a widely inconsistent ad-hoc compression scheme, writing “move” as “mv”, “copy” as “cp” or “cpy” (as in “strcpy”), etc. also shows that, but I think that would be a weaker argument.
> It constrains what’s possible, not what’s reasonable.
Why do you say "constrains what’s possible, not what’s reasonable", as though it's one and not the other? Does possibility conflict with reasonability? I would think it's not an either/or, it's a both/and.
The set of reasonable things is bounded by the set of possible things. So if the constraints of TUI design make certain things impossible, surely they make those same things unreasonable at the same time.
Try a 300 baud modem for a few months and good money says something terribly modern like Get-MrParameterCount would get compressed, a lot. Here's Bill Joy on the topic:
> No. It took a long time. It was really hard to do because you've got to remember that I was trying to make it usable over a 300 baud modem. That's also the reason you have all these funny commands. It just barely worked to use a screen editor over a modem. It was just barely fast enough. A 1200 baud modem was an upgrade. 1200 baud now is pretty slow. — "Bill Joy's greatest gift to man – the vi editor". The Register. 2003.
I'm sorry, an excellent GUI in Blender? With the 2.5 interface things were ass-backwards, but you could do a bunch of stuff with only the mouse. With the 2.8 interface, suddenly a bunch of stuff was hidden behind arcane key combinations, options were disabled by default, and we lost important visual data like the bounding-box view and having both the UV and cursor coordinates in the same tab in the UV/image editor. On top of that, the controls are different in every sub-window type, and interface panels flip from top to bottom and left to right for best readability without a thought spared for consistency. There's a reason someone can learn FL Studio in a few weeks but take months or even over a year to become competent in Blender. I love its jank and have been using it for eleven years, but I would never call the UI more than serviceable.
When you are "drawing text everywhere", you end up not having to draw all that much text. 3d models have more and more polygons as graphics cards improve, but the 80x24 standard persists for terminals (and UX is better for it). And I'm not even that convinced of "relatively expensive". Grokking UTF-8 and finding grapheme cluster boundaries has a lot of business logic, but it isn't really that hard. And unless you're dealing with Indic or Arabic scripts that defy a reasonable monospace presentation, you can just cache the composed glyphs.
I'm curious: Do you have a nice set of GUI applications that come with the UX you'd expect of TUIs?
(I'm not actually sure what it is about the UX of TUIs that I love so much. Relative simplicity / focus on core features? Uff, Notepad wins that one over vim. Fast startup times? I use gomuks, and that takes a minute for the initial sync. No mouse? Moving around in TUI text editors with hjkl is slow; I either jump where I want to go with search or use the mouse. Lightness over SSH/the network is the only thing I can't come up with a counterexample for.)
Blender? There you have to use a mouse because you have a much much bigger state space to control.
Also, IntelliJ is perhaps a better example. You can fully control it via only the keyboard, yet no amount of plugins would turn (neo)vim into something as capable as it is. And it makes good use of the extra pixels: humans can take in much more information than a text grid.
You could double or quadruple the number of pixels, and it wouldn't make any difference in how much information humans comprehend easily. You would be using more computing power and more memory to deliver the same amount of useful information less efficiently.
A "proper GUI" is rarely better than a well-designed TUI for communicating textual information, IMO. And the TUI constraints keep the failure-states for badly-designed UI tightly bound, unlike GUI constraints.
What about a map, or an image? We can surely agree that humans can take in a lot more information than a readable letter-grid allows, depending on the type of information.
Sure, of course sometimes an image conveys things better than a thousand words. But a very large percentage of what most people do with computers is primarily text, with more images in ads than useful content. By and large GUIs don't use images to convey information better, they just make text worse.
Modern terminal software supports displaying images, for what it's worth.
> Modern terminal software supports displaying images, for what it's worth.
In a worse and dramatically overcomplicated way. It's kind of funny that largely the same people who are all for this supposed ultra-minimalism end up celebrating a Rube Goldberg way of doing graphical interfaces (because in the end it is a graphical interface).
One advantage TUIs have over GUIs (including modern web sites): all text can be selected and copied (you may need to use modifiers in some TUIs). It's a bit frustrating when a GUI shows text but I cannot select and copy it.
It's not only about buttons. A web app for a trading platform I use doesn't let you copy and paste a fund name (in both the web and the mobile app). I don't think they disallow this intentionally; it's likely an artefact of the GUI framework they use.
I was confused about why your comment was being downvoted; it sounded like an honest opinion... Until I got to the last sentence. You wrote a self-fulfilling prophecy.
Plain text coupled with non-deterministic interfaces (AI) is not great. It’s like a hybrid: some of the best of old school tech coupled with the most sketchy high tech.
I will now get to have Kafkaesque conversations with computers in Markdown.
We had "opinionated" TUIs with emacs, and Omarchy will never surpass emacs' ease, shallow learning curve, and configurability. Emacs is the operating system of the future, and you can already integrate AI with it. It provides everything you need or want or don't know you want except a decent text editor.
I used emacs for years, switched to neovim last month. It's just too old and crufty for modern use, and modernizing frameworks like Doom / Spacemacs only add to the complexity. Nice modal workflows just go against emacs' nature.
I'm imagining a TTY-like interface that you can simultaneously type into, speak and gesture at, and whatever else I'm not thinking of (maybe with the "shell" creating a list of suggestions/ anticipating future tasks in the background based on voice input?) Doubt it would be at all practical, if only because the keyboard as a primary input device might not be as much of a thing when you can generate most code/text, but kinda fun to think about.
Not just TUIs; the whole stack is converging back to text. I run ~15 personal tools, and every one that survived past the first month stores data as JSON/markdown in git repos.
Text in git gives you versioning, sync, grep, and you can hand the whole thing to an LLM with zero serialization. It's perfect for me.
Mainframe terminals were nasty, I remember them. Similar tall keys to the recent keyboards so liked by gamers… but I get it. My dentist had carpal tunnel syndrome that needed surgery.
If I don’t use mouse all day my hands are pretty much all right.
Good mouse and keyboard is a key to productivity.
Every time I see someone using a touchpad, I don't get it. My fingers would be on fire in 10 minutes. People don't know what they're getting themselves into.
Couldn't help riffing off on a tangent from the title (since the article is about diagramming tools)...
Dylan Beattie has a thought-provoking presentation for anyone who believes that "plain text" is a simple / solid substrate for computing: "There's no such thing as plain text" https://www.slideshare.net/slideshow/theres-no-such-thing-as... (you'll find many videos from different conferences)
Haven't watched the videos yet, but from the slides it looks like part of the issue he was talking about was encodings (there's a slide illustrating UTF-16LE vs. UTF-16BE, for example). Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again. It has a much larger set of valid characters, but if you receive a text file without knowing its encoding, you can just assume it's UTF-8 and have a 99.7% chance of being right.
> Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again.
Whenever I hear this, I hear "all text files should be 50% larger for no reason".
UTF-8 is pretty similar to the old code page system.
Hm? UTF-8 encodes all of ASCII with one byte per character, and is pretty efficient for everything else. I think the only advantage UTF-16 has over UTF-8 is that some ranges (such as Han characters I believe?) are often 3 bytes of UTF-8 while they're 2 bytes of UTF-16. Is that your use case? Seems weird to describe that as "all text files" though?
UTF-8 encodes European glyphs in two bytes and oriental glyphs in three bytes. This is due to the assumption that you're not going to be using oriental glyphs. If you are going to use them, UTF-8 is a very poor choice.
UTF-8 does not encode "European glyphs" in two bytes, no. Most European languages use variations of the latin alphabet, meaning most glyphs in European languages use the 1-byte ASCII subset of UTF-8. The occasional non-ASCII glyph becomes two bytes, that's correct, but that's a much smaller bloat than what you imply.
Anyway, what are you comparing it to, what is your preferred alternative? Do you prefer using code pages so that the bytes in a file have no meaning unless you also supply code page information and you can't mix languages in a text file? Or do you prefer using UTF-16, where all of ASCII is 2 bytes per character but you get a marginal benefit for Han texts?
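The per-character byte counts behind this trade-off are easy to check. A quick comparison (the character picks are mine; "utf-16-le" is used so the BOM doesn't skew the counts):

```python
# Compare per-character encoded sizes in UTF-8 vs UTF-16.
samples = ["a", "é", "д", "日"]  # ASCII, accented Latin, Cyrillic, Han
for ch in samples:
    print(ch, len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))

# ASCII is 1 byte in UTF-8 vs 2 in UTF-16; accented Latin and Cyrillic
# are 2 bytes in both; Han characters are 3 in UTF-8 vs 2 in UTF-16,
# which is the "marginal benefit" mentioned above.
```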
A file isn't meaningful unless you know how to interpret it; that will always be true. Assuming that all files must be in a preexisting format defeats the purpose of having file formats.
> Most European languages use variations of the latin alphabet
If you want to interpret "variations of Latin" really, really loosely, that's true.
Cyrillic and Greek characters get two bytes, even when they are by definition identical to ASCII characters. This bloat is actually worse than the bloat you get by using UTF-8 for Japanese; Cyrillic and Greek will easily fit into one byte.
UTF-8 does not require a byte order mark. The byte order mark is a technical necessity born from UTF-16 and a desire to store UTF-16 in a machine's native endianness.
The byte order mark has no relation to code pages.
I don't think you know what you're talking about and I do not think further engagement with you is fruitful. Bye.
EDIT: okay, since you edited your comment to add the part about Greek and Cyrillic after I responded, I'll respond to that too. Notice that I did not say "all European languages". Norwegian, Swedish, French, Danish, Spanish, German, English, Polish, Italian, and many other European languages have writing systems where typical texts are "mostly ASCII with a few special symbols and diacritics here and there". Yes, Greek and Cyrillic are exceptions. That does not invalidate my point.
As someone who has been using Cyrillic writing all my life, I've never noticed this bloat you're speaking of, honestly...
Maybe if you're one of those AI behemoths who work with exabytes of training data, it would make some sense to compress it down by less than 50% (since we're using lots of Latin terms and acronyms and punctuation marks, which all fit in one byte in UTF-8).
On the web and in other kinds of daily text processing, one poorly compressed image or one JavaScript-heavy webshite obliterates all "savings" you would have had in that week by encoding text in something more efficient.
It's the same with databases. I've never seen anyone pick anything other than UTF-8 in the last 10 years at least, even though 99% of what we store there is in Cyrillic. I sometimes run into old databases, which are usually Oracle, that were set up in the 90s and never really upgraded. The data is in some weird encoding that you haven't heard of for decades, and it's always a pain to integrate with them.
I remember the days of codepages. Seeing broken text was the norm. Technically advanced users would quickly learn to guess the correct text encoding by the shapes of glyphs we would see when opening a file. Do not want.
> A file isn't meaningful unless you know how to interpret it; that will always be true.
There are multiple levels of meaning, though; character encoding is just one part of it. For example, a text file might be plain text, or HTML, or JSON, or a C source code, etc; a binary file might be DER, or IFF, or ZIP, etc; and then there will be e.g. what kind of data a JSON or DER or IFF contains and how that level of the data is interpreted, etc.
> Cyrillic and Greek characters get two bytes, even when they are by definition identical to ASCII characters.
Whether or not they are identical to ASCII characters depends on the character set and on other things, such as what they are being used for; the definition of "identical" is not so simple as you make it seem. Unicode defines them as not identical, which is appropriate for some uses but is wrong for other uses. (Unicode also defines some characters as identical even though in some uses it would be more appropriate to treat them as not identical, too. So, Unicode is both ways bad.)
> This bloat is actually worse than the bloat you get by using UTF-8 for Japanese; Cyrillic and Greek will easily fit into one byte.
I agree with that (although I think UTF-8 should not be used for Japanese either), but it isn't because of which characters are considered "identical" or not. There are problems with Unicode in general regardless of which encoding you use.
> ... (although I think UTF-8 should not be used for Japanese either) ...
The people putting up websites in Japanese disagree with you, it would seem. According to Wikipedia (in the Shift JIS article), as of March 2026 99% of websites in the .jp domain were in UTF-8, with only 1% being in Shift JIS.
Japan used to have two different encodings in common use, Shift JIS (usually used on Windows) and EUC-JP (more common on Unix servers). This resulted in characters being misinterpreted often enough that they coined the word mojibake to describe the phenomenon of text coming out completely garbled. These days, it seems Japanese website makers are more than happy to accept a slight inefficiency in encoding size, because what they gain from that is never having to see mojibake again.
If they are misinterpreted, it is because the character encoding is not declared properly.
I still sometimes see mojibake in Japanese web pages, but sometimes it works; if it works, it is because the character encoding is declared properly.
In my opinion, EUC-JP is a generally better encoding of JIS (especially in e.g. C source code, which should not use Shift-JIS but EUC-JP is OK), but Shift-JIS does have some benefits in some circumstances (such as making a character grid with one byte per character cell; if using Shift-JIS for a Pascal source code then you should use (* *) instead of { } for comments please).
> If they are misinterpreted, it is because the character encoding is not declared properly.
OR because the software is buggy, or making assumptions about encoding and not checking them (which also counts as "buggy", of course). You can declare the encoding all you like, it won't protect you against the stupid decisions that other people make in writing their software. (See Excel, for example).
Yes, if you declare your encoding properly, things should work. Most of the time. And if you're using any encoding that is not the worldwide default (which these days is UTF-8), then you definitely should declare the encoding. But you'll still occasionally hit badly-written software that doesn't even think about other encodings and doesn't handle them properly. The only defense against that situation, where you declare your encoding properly and it still doesn't work, is to just use the encoding that the software was written to expect, which is almost certainly the worldwide default.
Yikes. That would lose the ability to know the meaning of the current bytes, or misinterpret them badly, if you happen to get one critical byte dropped or mangled in transmission. At least UTF-8 is self-syncing: if you end up starting to read in the middle of a non-rewindable stream whose beginning has already passed, you can identify the start of the next valid codepoint sequence unambiguously, and then end up being able to sync up with the stream, and you're guaranteed not to have to read more than 4 bytes (6 bytes when UTF-8 was originally designed) in order to find a sync point.
But if you have to rely on a byte that may have already gone past? No way to pick up in the middle of a stream and know what went before.
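The self-syncing property is mechanical to exploit: every UTF-8 continuation byte matches the bit pattern 10xxxxxx, so a reader can always skip forward to the next sequence start. A toy sketch (the function name is mine):

```python
# Sketch of UTF-8 self-synchronization: from any offset in a stream,
# skip continuation bytes (0b10xxxxxx) to find the next sequence start.
def resync(buf: bytes, start: int = 0) -> int:
    """Return the index of the first byte that begins a UTF-8 sequence."""
    i = start
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:  # continuation byte?
        i += 1
    return i

data = "héllo".encode("utf-8")  # b'h\xc3\xa9llo'
# Pretend we tuned in mid-stream, at the continuation byte of 'é':
assert resync(data, 2) == 3  # index 3 is 'l', a valid sequence start
assert data[resync(data, 2):].decode("utf-8") == "llo"
```

At worst you lose the one partial character you landed in; everything after it decodes cleanly, which is exactly the guarantee a stateful code-page scheme can't give.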
No, we haven't. You can start at any byte in a UTF-8 document and resume reading coherent text. If you start reading from the middle of a multi code point sequence, then the first couple of glyphs may be wrong, for example you may see a lone skin tone modifier rendered as a beige blob where the author intended a smiley face with that skin tone. But these multi code point sequences are short, and the garbled text is bounded to the rest of the multi code point sequence. The entire rest of the document will be perfectly readable.
Compare this to missing a code page indicator. It will garble the whole section until the next code page indicator, often the whole rest of the document. The fact that you're even comparing these two situations as if they're the same is frankly ridiculous.
Unicode had support for language tag codepoints. They still exist but have long been deprecated. They were intended to deal with glyph variants, especially with regards to Han unification.
UTF-8 may still be a good choice for Japanese text, though.
For one thing, pure text is often not the only thing in the file. Markup is often present, and most markup syntaxes (such as HTML or XML) use characters from the ASCII range for the markup, so those characters are one byte (but would be two bytes in UTF-16).

Back when the UTF-8 Everywhere manifesto (https://utf8everywhere.org/) was being written, they took the Japanese-language Wikipedia article on Japan and compared the size of its HTML source between UTF-8 and UTF-16 (scroll down to section 6 to see the results I'm about to cite). UTF-8 was 767 KB, UTF-16 was 1186 KB, a bit more than 50% larger than UTF-8. The space savings from the HTML markup outweighed the extra bytes from the less efficient encoding of Japanese text.

Then they copied and pasted just the Japanese text into a text file, to give UTF-16 the biggest win. There, the UTF-8 text was 222 KB while the UTF-16 encoding got it down to 176 KB, a 21% win for UTF-16, but not the 50% win you would have expected from a naive comparison, because Japanese text still uses many characters from the ASCII set (space, punctuation...) and so there are still some single-byte UTF-8 characters in there. And once the files were compressed, both were nearly the same size (83 KB vs 76 KB), which means there's little efficiency gain anyway if your content is being served over a gzip'ed connection.
So in theory, UTF-8 could be up to 50% larger than UTF-16 for Japanese, Chinese, or Korean text (or any of the other languages that fit into the higher part of the Basic Multilingual Plane). But in practice, even giving the UTF-16 text every possible advantage, they only saw a 20% improvement over UTF-8.
Which is not nearly enough to justify all the extra cost of suddenly not knowing what encoding your text file is in any more, not when we've finally reached the point of being able to open a text file and just know the encoding.
P.S. I didn't even mention the Shift JIS encoding, and there's a reason I didn't. I've never had to use it "for real", but I've read about it. No. No thank you. No. Shudder. I'm not knocking the cleverness of it, it was entirely necessary back when all you had was 8 bits to work with. But let me put it this way: it's not a coincidence that Japan invented a word (mojibake) to represent what happens when you see text interpreted in the wrong encoding. There were multiple variations of Shift JIS (and there was also EUC-JP just to throw extra confusion into the works), so Japanese people saw garbled text all the time as it moved from one computer running Windows, to an email server likely running Unix, to another computer running Windows... it was a big mess. It's also not a coincidence that (according to Wikipedia), 99.1% of Japanese websites (defined as "in the .jp domain") are encoded in UTF-8, while Shift JIS is used by only 1% (probably about 0.95% rounded up) of .jp websites.
So in practice, nearly everyone in Japan would rather have slightly less efficient encoding of text, but know for a fact that their text will be read correctly on the other end.
The point is, a lot of work went into making that happen. I.e., plain text as it is today is not some inherent property of computing. It is a binary protocol and displaying text through fonts is also not a trivial matter.
So my question is: what are we leaving on the table by over focusing on text? What about graphs and visual elements?
I was not very descriptive, but I was referring to the next layer up of building blocks. Instead of text, we could also express things in hybrid ways with text but also visual nodes that can carry more dense information. The usual response is that those things don't work with text-based tools, but that's my point. Text based tools needed invention and decades of refinement, and they're still not all that great.
I should have said "a text file with no byte-order mark". I would hope that Excel's CSV export, if it's writing UTF-16, is writing a byte-order mark first (though I don't have any Excel-exported CSVs lying around right now to check). The byte-order mark is necessary for UTF-16 since it has big-endian and little-endian variants, but unnecessary (and actually harmful in a few situations) for UTF-8. So naturally, if you assume something is UTF-8 but the first few bytes you encounter are FF FE or FE FF (both of which are illegal in UTF-8) then instead of throwing an error saying "Hey, that's illegal UTF-8, buddy!" you should just reparse in UTF-16 (and you now know the correct byte order to use). In fact, you should read four bytes just to make sure you're not seeing FF FE 00 00, because that would indicate a UTF-32LE document. (Which indicates an ambiguity in UTF-16, that UTF-8 doesn't have. A UTF-16 document that begins with a null byte is likely to be misinterpreted as UTF-32LE).
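The sniffing logic is only a few lines; here's a rough Python sketch (the function name and the UTF-8 fallback are my own choices). Note the 4-byte UTF-32 marks have to be checked before the 2-byte UTF-16 ones, for exactly the FF FE 00 00 ambiguity described above:

```python
# Guess an encoding from a byte-order mark, defaulting to UTF-8.
def detect_encoding(data: bytes) -> str:
    # Check four bytes first: FF FE 00 00 is the UTF-32LE BOM, which
    # would otherwise be mistaken for UTF-16LE (FF FE) plus a NUL.
    if data[:4] == b"\xff\xfe\x00\x00":
        return "utf-32-le"
    if data[:4] == b"\x00\x00\xfe\xff":
        return "utf-32-be"
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"
    # No BOM: assume UTF-8, the sensible modern default.
    return "utf-8"
```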
Before I go off on too much of a rabbit trail, I have two points I want to make:
1. Since UTF-8 should be the default assumption for any sensible software, a byte-order mark is not needed for UTF-8, but any non-UTF-8 encoding should use a byte-order mark. (And in fact needs a BOM, because both UTF-16 and UTF-32 have LE and BE variants).
2. Excel needs to fix its stupid CSV import/export defaults.
I can't tell what the argument is just from the slideshow. The main point appears to be that code pages, UTF-16, etc are all "plain text" but not really.
If that really was the argument, then it is, in 2026, obsolete; utf-8 is everywhere.
He has a YouTube channel, there's a talk on there.
He also discusses code pages etc.
I don't think the thesis is wrong. E.g., when I think plain text I think ASCII, so we're already disagreeing about what 'plain text' is. His point isn't that we don't have a standard; it's that we've had multiple standards over what we think is the most basic of formats, with lots of hidden complications.
I read that article a long time ago, and for me it's a hard disagree. A system as complex and quirky as Unicode can never be considered "plain", and even today it is common for something Unicode-related to break in many apps. ASCII is still the only text system that will really work well everywhere, which I consider a must for calling something plain text.
And yes, ASCII means mostly limiting things to English but for many environments that's almost expected. I would even defend this not being a native English speaker myself.
Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!). Then you need to convert numbers to bytes, so aside from Unicode you also need an encoding. And there are multiple choices. So what would be "plain text" then? UTF-16? UTF-8? If so, with or without BOM? It can't be all of them. For something to really be "plain text" it has to be the same thing to everyone...
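The "same character, multiple code point sequences" problem is easy to demonstrate with Python's standard library (NFC normalization is the usual workaround):

```python
import unicodedata

# "é" as one code point (U+00E9) vs. two (U+0065 "e" + U+0301 combining acute).
composed = "\u00e9"
decomposed = "e\u0301"

print(composed == decomposed)   # False: different code point sequences...
print(composed, decomposed)     # ...yet both render as "é"

# Normalizing to NFC (composed form) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```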
> Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!).
It is worse than that; you can also get different characters with the same code points, and also same code points and characters that should be different according to some uses, and also different code points and characters that should be same according to some uses, etc.
With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext.
Unencrypted information that may be input to an encryption operation. Note: Plain text is not a synonym for clear text. See clear text.
Intelligible data that has meaning and can be understood without the application of decryption.
I agree with you that Unicode is too complicated and messy, although it also shows that whether or not something is considered "plain" is itself too difficult.
Unicode has caused many problems (although it was common for m17n and i18n to be not working well before Unicode either). One problem is making some programs no longer 8-bit clean.
Unicode might be considered in two ways: (1) Unicode is an approximation of multiple other character sets, (2) All character sets are an encoding of a subset of Unicode. At best, if Unicode is used at all, it should be used as (1) (as a last resort), but it is too common for Unicode to be used as (2) (as a first resort), which is not good in my opinion.
(I mostly avoid Unicode in my software, although it is also often the case (and, in many (but not all) programs, should be the case) that it only cares about ASCII but does not prevent you from using any other character encodings that are compatible with ASCII.)
> ASCII is still the only text system that will really work well everywhere, which I consider a must for calling something plain text.
Yes, it does work well (almost) everywhere.
Supersets of ASCII are also common, including UTF-8, the PC character set, ISO 2022 (if ASCII is the initially selected G0 set, which it is in the ASN.1 GraphicString and GeneralString types, as well as in most terminal emulators), EUC-JP, etc. In these cases, ASCII will also usually work well.
However, as another comment mentions (and I agree with them), if you mean "ASCII" then that is what you should say, rather than "plain text", which does not tell you what the character encoding is. That other comment says:
> Plain text is text intended to be interpreted as bytes that map simply to characters.
However, it is not always so clear and simple what "characters" is, depending on the character sets and what language you are writing. And then, there are also control characters, to be considered, so it is again not quite so "plain".
> And yes, ASCII means mostly limiting things to English but for many environments that's almost expected. I would even defend this not being a native English speaker myself.
In my opinion, it depends on the context and usage. One character set (regardless of which one it is) cannot be suitable for all purposes. However, for many purposes, ASCII is suitable (including C source code; you might put l10n in a separate file).
You should have proper m17n (in the contexts where it is appropriate, which is not necessarily all files), but Unicode is not a good way to do it.
There's also Plutus https://github.com/nickjj/plutus for income and expense tracking. Couldn't be happier. All I do now is export my bank's CSV files and import them into Plutus, a few minutes later my books are done after I align some of the categories. I've done 2 years of taxes with this now.
"ascii" for some means "anything textlike", so for example you may see roguelike game developers saying "nice ascii" in response to a screenshot full of CP437, Unicode, or text-like glyphs, all very much not ASCII. Some will get defensive when called out on this, claiming that CP437 is okay to call ascii because it's "extended ASCII" (nevermind the many different and conflicting extensions), or others point out that they do not have a better term for something textlike.
Tangent to article: text character based charts for statistics. Decades ago I had an education version of MINITAB that ran under DOS and did scatter diagrams and dotplots and box and whisker plots from text characters (you could use pure text, I think proper ASCII, or you could set an option to use those DOS drawing characters). The idea was to encourage initial data exploration before launching on formal statistical tests.
Anyone know of a terminal program that can do proper dotplots?
Thanks to all posts above for engaging with my quest for minitab style text character dotplots!
Below is an example of what I'm on about (artisan construction in Mousepad) and apologies to anyone on a narrow screen where the text mode is going to get jumbled.
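If you'd rather generate one than hand-construct it, a MINITAB-style dotplot only takes a few lines. A minimal Python sketch of my own (integer data only; the function name and sample values are made up):

```python
from collections import Counter

def dotplot(data, mark="."):
    # Stack one mark per observation above each value on a number line.
    counts = Counter(data)
    lo, hi = min(counts), max(counts)
    height = max(counts.values())
    lines = []
    for row in range(height, 0, -1):
        lines.append("".join(mark if counts.get(v, 0) >= row else " "
                             for v in range(lo, hi + 1)))
    lines.append("-" * (hi - lo + 1))  # the axis
    return "\n".join(lines)

print(dotplot([1, 2, 2, 3, 3, 3, 4, 7]))
```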
Plain text is great as far as it goes, but when it comes to structure you start from zero for every file. There’s always someone getting wistful about ad-hoc combinations of venerable Unix tools to process “plain text”, and that’s fine when you’re in an ad-hoc situation, but it’s no substitute for a well-specified format.
XML, JSON, YAML, RDF, EDN, LaTeX, OrgMode, Markdown... Plenty of plaintext, but structured information formats that are "yes, and". Yes, I can process them as lines of plain text, and I can do structured data transformations on them too, and there are clients (or readers) that know how to render them in WYSIWYG style.
If that’s our definition of “plain text”, sure. I would still rather our tools were more advanced, such that printable and non-printable formats were on a more equal footing, though. I always process structured formats through something that understands the structure, if I can, so I feel that the only benefit I regularly get out of formats being printable is that I have to use tools that only cope with printable formats. The argument starts getting a bit circular for me.
Yes, I thought of what you mentioned too, and in my opinion, DER is a better format, and it is a binary format rather than text.
(In my ideas of an operating system design, there is a structured binary format (similar to DER but different) used for most files and data, so that the tools (and the command shell) would be usable consistently with most of them; and if some need special handling, you can use other programs and functions to convert them and/or handle them in a way that can be interoperable.)
Hm, you made me think about non-printing characters as metadata, which is of course immediately lost on printing and therefore does not round trip between digital and printed versions.
Many nonprinting characters imply some directive; line break (hard-wrap the text here, but this is not a paragraph), page break (let the rest of the page be blank, start the next paragraph overleaf), EOL (file over, bye bye), nonbreaking space (keep these two words together, always, till death do them part).
This is out-of-band information spliced in-band (with the text corpus), which a computer program can "see", but a person can't.
XML arguably isn’t plain text, but a binary format: If you add/change the encoding declaration on the first line, the remaining bytes will be interpreted differently. Unless you process it as a function of its declared (or auto-detected, see below) encoding, you have to treat it as a binary file.
In the absence of an encoding declaration, the encoding is in some cases detected automatically based on the first four bytes: https://www.w3.org/TR/xml/#sec-guessing-no-ext-info
Again, that means that XML is a binary format.
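The W3C auto-detection heuristic boils down to a lookup on the first four bytes, just enough to be able to read the `<?xml ... encoding="..."?>` declaration itself. A rough sketch covering only the common cases from that appendix (the function name is mine):

```python
# Guess enough about an XML document's encoding family, from its first
# four bytes, to parse the encoding declaration (per the W3C appendix).
def guess_xml_encoding(first4: bytes) -> str:
    table = {
        b"\x00\x00\xfe\xff": "utf-32-be",  # UTF-32BE BOM
        b"\xff\xfe\x00\x00": "utf-32-le",  # UTF-32LE BOM
        b"\x00\x3c\x00\x3f": "utf-16-be",  # "<?" in UTF-16BE, no BOM
        b"\x3c\x00\x3f\x00": "utf-16-le",  # "<?" in UTF-16LE, no BOM
        b"\x3c\x3f\x78\x6d": "utf-8",      # "<?xm" in any ASCII superset
    }
    # A bare UTF-16 BOM (not the longer UTF-32 patterns handled above).
    if first4[:2] in (b"\xfe\xff", b"\xff\xfe") and first4 not in table:
        return "utf-16-be" if first4[:2] == b"\xfe\xff" else "utf-16-le"
    return table.get(first4, "utf-8")
```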
Another way that the character encoding could be declared is ISO 2022. When using ISO 2022, the declaration of UTF-8 is <1B 25 47>, rather than the <EF BB BF> that XML and some other formats use.
However, whether you do it that way or another way, I think that the encoding declaration should not be omitted unless it is purely ASCII in which case the encoding declaration should be omitted.
Obviously, the site is html with hyperlinks etc ... but it's basically implementing a text interface that I can click on.
In a cryptographic sense, html itself is plaintext since it is encoded as ascii/utf-8. However, the MIME type of text/plain is distinct from text/html to describe the html encoding of document style/structure information.
A terminal is often considered plaintext but in reality there are escape sequences to encode meta-information which is otherwise unreadable to most humans.
On the other end of the spectrum, there are images with a few words transcribed on a lot of social media platforms. To blur this line, recent(ish) mobile platforms recognize text embedded in images and make it selectable. Given this context, is text in an image without anything else considered plaintext?
I guess my real question is, "where do we draw the line" for what we call plaintext, decades after the initial implementation.
The article mentioned that the use of 'ASCII' within the context of those tools should not be seen as the limited character set ASCII. Personally, I would avoid mentioning ASCII at all.
The title just talks of plain text though, and plain text usually means UTF-8 encoded text these days. Plain, as in conventional, standardised, portable, and editable with any text editor. I would be surprised if someone talked about plain text as being limited to just ASCII.
This, and just use it instead of a .pdf. Stay libre; make it work for all platforms, of all ages, all over the world.
No RTF; nothing needs to be fancy.
A canonical .txt resource is parseable and universal. Don't restrict your audience. For example:
If a repo has a .txt backup of some changelog, it's not tied to whatever platform. The repo is a living project and memory. Don't assume anything: if for some reason your account is gone, that information is not lost; it's saved forever.
Usually if I must reference a PDF it means a browser, and often I don't want a browser running.
Text and text files are simple. I think this is their #1 advantage.
There are limitations though. Compare a database of .yml files to a database in a DBMS. I wrote a custom forum via Ruby + YAML files. It also works. It also cannot compete anywhere with, e.g., Rails/ActiveRecord and so forth. Its sole advantage is simplicity. Everywhere else it loses without even a fight.
Manually, with spaces. Ironically each actual space took 3 spaces, which HN truncated to one space removing any concept of words. Technically this is incorrect because a real post would be 0x00 and not 0x20, but my point was even ‘basic’ text can be problematic if you’re not aware of encoding types, and it can really fukin bite if you’re not ready for it.
Other CAs offer for-fee time stamp attestation, and I hear it may hold value in the EU, but here in the US it's only for fun, and for very small values of fun!
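The whole flow is only a few commands. A hedged sketch (the `openssl ts` subcommand and freetsa.org's `/tsr` endpoint are real; the filenames and the note ref are arbitrary choices of mine, and you need freetsa's published cacert.pem and tsa.crt on hand for the verify step):

```shell
# RFC3161-timestamp the current git commit via freetsa.org.
COMMIT=$(git rev-parse HEAD)
printf '%s' "$COMMIT" > commit.txt

# Create the timestamp query over the commit hash.
openssl ts -query -data commit.txt -sha256 -cert -out commit.tsq

# Submit it to the TSA; the response is a DER-encoded timestamp token.
curl -s -H 'Content-Type: application/timestamp-query' \
     --data-binary @commit.tsq https://freetsa.org/tsr -o commit.tsr

# Verify the token against the TSA's certificate chain.
openssl ts -verify -in commit.tsr -queryfile commit.tsq \
     -CAfile cacert.pem -untrusted tsa.crt

# Attach the (base64ed) token to the commit as a git note.
base64 commit.tsr | git notes --ref=timestamps add -f -F - "$COMMIT"
```

Wrapping that in a script named git-timestamp on your PATH is all it takes to make it a git subcommand.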
Thank you for taking the time to reply! Agreed, it does seem like any auditor would be able to verify all this information from other sources, but I really like the idea of having multiple independent levels of attestation for my organization's important financial and legal documents.
I love this kind of thing so much. One thing I try to keep in mind when doing my own simple file formats is knowing what I would have to do to convert it to a more common format, if I need to for any reason. Knowing I have an "escape plan" if I need to get them to a different format for someone makes me feel good about it.
In your case, I bet you could easily convert it to a CSV that your accountant could tolerate.
That escape plan was my entire motivation. When I first started using QuickBooks I did a data export and it seemed like maybe it had everything. After a year of a disastrous UI change and spammy ads everywhere, throughout the UI and by email, for a product I pay a ton of money for, I started to investigate alternatives and realized that the QuickBooks export missed most of the work I had been putting into my books (receipts, invoices, and reconciliation). Once I knew that, I couldn't use it as a primary data store anymore.
Now that I control my data, all forms of transformation are almost trivial, as you say!
Thanks for mentioning Beancount; I didn't know about it before. I'm American but have a job in a different country, so I deal with two currencies routinely, and haven't found a good way of handling multiple currencies in Gnucash. So my wife and I have been keeping our records in text files. I'll look into what it would take to switch over to Beancount; I bet I could write a conversion script (or get an LLM to write most of it for me, even) that would do 95% of the work since we've been pretty consistent in our format, and throw up warning messages about any entries that it couldn't parse (we haven't necessarily been 100% consistent).
I just might join you in Beancount + Fava land.
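The core of such a converter is tiny. A hypothetical sketch, assuming a made-up "DATE AMOUNT CURRENCY DESCRIPTION" one-line-per-transaction format (the account names are placeholders; anything unparseable comes back as None for manual review):

```python
import re

# Matches e.g. "2024-03-05 12.50 USD Coffee beans"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(-?\d+(?:\.\d+)?)\s+([A-Z]{3})\s+(.*)$")

def to_beancount(line, account="Expenses:Misc", source="Assets:Checking"):
    m = LINE.match(line.strip())
    if not m:
        return None  # caller should warn and queue this line for review
    date, amount, currency, desc = m.groups()
    return (f'{date} * "{desc}"\n'
            f"  {account}  {amount} {currency}\n"
            f"  {source}")

print(to_beancount("2024-03-05 12.50 USD Coffee beans"))
```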
Hey I'm working on a rust implementation of Beancount, if you get a chance check it out. https://rustledger.github.io .
I'm also hoping to put together a standards body that can formalize plain-text accounting standards too so others can more easily implement their own systems that conform to the standard.
One small warning about Beancount: v3 is a huge change from v2, and a lot of the documentation on the web is in v2 and it's not clearly demarcated that it doesn't apply anymore.
I'd recommend looking up information with an LLM, and not trusting web pages. This is the first time I've made such an unusual suggestion, but LLMs are actually really good at information retrieval and translation. The Beancount community has a bit of cleaning up to do before web searches can be trusted again.
That said, Fava is such a huge step up from the hledger web ui that it's totally worth it, especially since I want quick and easy attachment of documents and viewing of documents. It's super easy to convert between hledger/ledger and beancount formats, so it's fairly easy to switch to the other if you already have your data in such clear and clean formats.
> Fun to see a contemporary take on something that peaked between 1970s–1980s
Maybe that was the peak, but you had some very good TUIs in the early 1990's for DOS apps, where Windows hadn't quite completely taken over yet, but you very likely had a VGA-compatible graphics card and monitor, meaning you had a good, high-resolution, crisp and configurable-font text mode available, and also likely had a mouse. This is the stuff I grew up with: QBASIC and EDIT.COM for example. Bisqwit has a cool video about how some apps from that era could have a proper mouse cursor, even: https://www.youtube.com/watch?v=7nlNQcKsj74
The peak of TUIs is now. Take a look at Omarchy, an entire operating system built around terminals and config files, it's nirvana. I can only imagine how much farther down this road things may go as we enter a world where the primary interface is conversation with the machine in text. I'm sure I'll get downvoted for that last part because Reddit -- (cough) I mean Hacker News -- hates AI, but I'm genuinely excited for the future.
I hope it’s inevitable. Most users’ computing workloads by far are text oriented. The terminal is capable of flexbox now. Current GUIs create massive complexity and power draw relative to their value. Over a long enough arc, economic inefficiency is doomed.
> The terminal is capable of flexbox now.
You mean like https://silvery.dev/examples/layout.html ? This is definitely not a UI development paradigm I would have expected to see.
That’s the tip of a conceivable iceberg but exactly. Also look at kitty graphics protocol.
Look at the amount of engineering resources we pour into OS GUI toolkits and then browsers. Those layers of complexity aren’t there because we stood back and said, “given what we know in 2026 how should we design a GUI compositor?”. The majority of the stack is written how it is by archeological happenstance. One generation adds on top of the prior since the 60s.
I’d say start from the terminal, fix the rendering limitations that drove the split from terminal and then to the browser. If we pin down efficient GUI, we could have machines that cover non graphics workloads which is the vast majority with solar and the equivalent of a 6502.
The amount of energy wasted on modern stacks relative to the tasks being delivered is incalculable.
I 100% agree, and this is a big reason why I find the current state of education so suboptimal. Everyone just goes on to do webdev, completely ignoring the lower levels and taking it all for granted. The thing is, there's no real innovation to be done that high up the stack. When you're that high you mostly just write glue code to stick parts someone else wrote together. Real innovation comes from quite a few levels down the stack, starting at the native code level downwards.
Like you pointed out, the current stack is heavily unoptimized and has a terrible architecture; it's only the way it is because of happenstance and tides of the market (companies always reaching for faster over better). An actual "nirvana" in computing like the other guy said would require bulldozing a good chunk of our current stack, keeping only kernels and core utilities, if even.
I really wish we had a bigger focus on getting good foundation instead of making yet another JS framework and SaaS, but then again, who's paying developers to actually do something of quality nowadays?
Comments like this is proof that the old-school hacker spirit is alive and kicking. This kind of pride in efficient and artful use of computers is needed more than ever.
But why?
You easily have 4K worth of pixels; why use a tiny subset of them in a very inefficient way? We have proper hardware to make a bunch of these computations actually fast, and yet we should be stuck with drawing relatively expensive text everywhere?
If you only care about the UX of TUIs, that I can stand behind (though mostly as a guideline, it doesn't fit every workflow), but you can do that with a proper GUI just as well.
> If you only care about the UX of TUIs, that I can stand behind
This is a confusing concession. Of course we love TUIs because of the UX, what other reason is there?
Constraint breeds consistency and consistency breeds coherence.
Take 1,000 random TUI designers and 1,000 random GUI designers and plot the variations between them (use any method you like)—the TUI designers will be more tightly clustered together because the TUI interface constrains what's reasonable.
Yes of course you CAN recreate TUI-like UX in a GUI, that's not the issue. People don't. In a TUI they must. I like that UX and like that if I seek out a TUI for whatever thing I want to do, I'm highly likely to find a UX that I enjoy. Whereas with GUIs it's a crapshoot. That's it.
> the TUI designers will be more tightly clustered together because the TUI interface constrains what's reasonable.
It constrains what’s possible, not what’s reasonable. For example, one could typically fit more text on a screen by compressing it, but most of the time, that’s not the reasonable thing to do.
I’m saying most of the time because English Braille (https://en.wikipedia.org/wiki/English_Braille#System), which uses a compression scheme for frequently used words and character sequences such as ‘and’ and ‘ing’, shows that, if there is enough pressure to keep texts short, humans are willing to learn fairly idiosyncratic text compression schemes.
colorforth (https://en.wikipedia.org/wiki/ColorForth) is another, way less popular example. It uses color to shorten program source code.
One could also argue Unix, which uses a widely inconsistent ad-hoc compression scheme, writing “move” as “mv”, “copy” as “cp” or “cpy” (as in “strcpy”), etc. also shows that, but I think that would be a weaker argument.
> It constrains what’s possible, not what’s reasonable.
Why do you say "constrains what’s possible, not what’s reasonable", as though it's one and not the other? Does possibility conflict with reasonability? I would think it's not an either/or, it's a both/and.
The set of reasonable things is bounded by the set of possible things. So if the constraints of TUI design make certain things impossible, surely they make those same things unreasonable at the same time.
Try a 300 baud modem for a few months and good money says something terribly modern like Get-MrParameterCount would get compressed, a lot. Here's Bill Joy on the topic:
> No. It took a long time. It was really hard to do because you've got to remember that I was trying to make it usable over a 300 baud modem. That's also the reason you have all these funny commands. It just barely worked to use a screen editor over a modem. It was just barely fast enough. A 1200 baud modem was an upgrade. 1200 baud now is pretty slow. — "Bill Joy's greatest gift to man – the vi editor". The Register. 2003.
Come on, my previous phone had more bandwidth via a goddamn satellite, sending emergency info.
> Constraint breeds consistency and consistency breeds coherence.
In principle I would agree, but there are plenty of bad citizens among TUIs, it's absolutely not true that you can just start using one.
The same way there are excellent GUI applications like blender or intellij.
I'm sorry, excellent GUI with Blender? With the 2.5 interface things were ass-backwards, but you had a bunch of stuff you could do with only the mouse. With the 2.8 interface, suddenly a bunch of stuff was hidden behind arcane key combinations, options were disabled by default, and we lost important visual data like the bounding-box view and having both the UV and cursor coordinates in the same tab in the UV/image editor. No matter what, the controls are different with every sub-window type, and interface panels flip from top to bottom and left to right for best readability without a thought spared for consistency. There's a reason someone can learn FL Studio in a few weeks but take months or even over a year to become competent in Blender. I love its jank and have been using it for eleven years, but I would never call the UI more than serviceable.
The gap between vi and emacs is larger than that of any GUI program I use as regularly as I use either of those.
The UX is the point.
When you are "drawing text everywhere", you end up not having to draw all that much text. 3d models have more and more polygons as graphics cards improve, but the 80x24 standard persists for terminals (and UX is better for it). And I'm not even that convinced of "relatively expensive". Grokking UTF-8 and finding grapheme cluster boundaries has a lot of business logic, but it isn't really that hard. And unless you're dealing with Indic or Arabic scripts that defy a reasonable monospace presentation, you can just cache the composed glyphs.
I'm curious: Do you have a nice set of GUI applications that come with the UX you'd expect of TUIs?
(I'm not actually sure what the UX of TUIs is I love so much. Relative simplicity / focus on core features? Uff, notepad wins this one on vim. Fast startup times? I use gomuks, that takes a minute for the initial sync. No mouse? Moving around in TUI text editors with hjkl is slow. I either jump where I want to go with search or use the mouse. Lightness over SSH/network is the only thing I can't come up with a counterexample for.)
Blender? There you have to use a mouse because you have a much much bigger state space to control.
Also, Intellij is perhaps a better example. You can fully control it via only the keyboard, yet no amount of plugins would turn (neo)vim into something as capable as it is. And it makes good use of the extra pixels - human can take in much more information than a text grid.
You could double or quadruple the number of pixels, and it wouldn't make any difference in how much information humans comprehend easily. You would be using more computing power and more memory to deliver the same amount of useful information less efficiently.
A "proper GUI" is rarely better than a well-designed TUI for communicating textual information, IMO. And the TUI constraints keep the failure-states for badly-designed UI tightly bound, unlike GUI constraints.
What about a map, or an image? We can surely agree that humans can take in a lot more information than a readable letter-grid allows, depending on the type of information.
Sure, of course sometimes an image conveys things better than a thousand words. But a very large percentage of what most people do with computers is primarily text, with more images in ads than useful content. By and large GUIs don't use images to convey information better, they just make text worse.
Modern terminal software supports displaying images, for what it's worth.
> Modern terminal software supports displaying images, for what it's worth.
In a worse, and dramatically overcomplicated way. Like it's kind of funny that largely the same people that is all for this supposed ultra minimalism would be celebrating a Rube Goldberg way of doing graphical interfaces? (Because in the end it is a graphical interface).
One of the advantages of TUIs over GUIs (including modern web sites): all text can be selected and copied (you may need to use modifiers in some TUIs). It's a bit frustrating when a GUI shows text but I cannot select and copy it.
That's a very good point. I hadn't thought about that aspect before.
Is that always beneficial? Do you ever want to select the text of a confirm button?
What if it just popped on top in a dialog to the content you were about to select?
It's not only about buttons. The trading platform web app I use doesn't allow copy-pasting a fund name (in both the web and mobile apps). I don't think they disallow this intentionally; it's likely an artefact of the GUI framework they use.
What's behind this new obsession with TUIs/CLIs anyway? You always had people obsessed with i3 and vim etc but this is something different.
It’s functionally focused, and because most apps are web based now while TUIs are generally local, TUIs seem relatively very fast.
I think part of it is Visual Studio Code doing most IDE things very well, creating a market niche for terminal tooling that handles the rest.
Certainly part of it is also people of my generation being nostalgic for the TUIs of DOS file managers and editors.
Get used to it, because with LLMs they're here to stay forever. (Bash will possibly be fossilized forever now, like the Latin alphabet.)
Nah. Folks will be building their own shells; I'm already halfway there.
I was confused about why your comment was being downvoted; it sounded like an honest opinion... Until I got to the last sentence. You wrote a self-fulfilling prophecy.
Apart from being wayland and a more modern look, why are you excited about omarchy and AI and you weren't with i3?
Plain text coupled with non-deterministic interfaces (AI) is not great. It’s like a hybrid: some of the best of old school tech coupled with the most sketchy high tech.
I will now get to have Kafkaesque conversations with computers in Markdown.
We had "opinionated" TUIs with emacs, and Omarchy will never surpass emacs' ease, shallow learning curve, and configurability. Emacs is the operating system of the future, and you can already integrate AI with it. It provides everything you need or want or don't know you want except a decent text editor.
I used emacs for years, switched to neovim last month. It's just too old and crufty for modern use, and modernizing frameworks like Doom / Spacemacs only add to the complexity. Nice modal workflows just go against emacs' nature.
> It provides everything you need or want or don't know you want except a decent text editor.
That's what evil is for :P
I'm imagining a TTY-like interface that you can simultaneously type into, speak and gesture at, and whatever else I'm not thinking of (maybe with the "shell" creating a list of suggestions/ anticipating future tasks in the background based on voice input?) Doubt it would be at all practical, if only because the keyboard as a primary input device might not be as much of a thing when you can generate most code/text, but kinda fun to think about.
Not just TUIs the whole stack is converging back to text. I run ~15 personal tools and every one that survived past the first month stores data as JSON/markdown in git repos.
Text in git gives you versioning, sync, grep, and you can hand the whole thing to an LLM with zero serialization. It's perfect for me.
Oh, the irony, I go to the website and the only thing it shows is a video to watch.
I always liked Borland's code editor (would you call it an IDE?) from that era. The one that you used in Turbo-C, Turbo-Pascal, etc.
Text-mode versions of Wordperfect, Wordstar, and Lotus 1-2-3 were pretty good too.
Well, everything text-based is somehow calming, no need to touch mouse (and have a carpal tunnel).
Carpal tunnel syndrome isn't exclusive to using a mouse. I dated a woman who had a severe case of it from using a mainframe terminal with no mouse.
Mainframe terminals were nasty, I remember them. Similar high keys to the recent keyboards so liked by gamers… but I get it. My dentist had carpal tunnel that needed a surgical procedure.
If I don’t use mouse all day my hands are pretty much all right.
A good mouse and keyboard are key to productivity.
Every time I see someone using a touchpad I don't get it. My fingers would be on fire in 10 minutes. People don't know what they're getting themselves into.
Norton/Midnight Commander might be close to peak TUI for me.
Couldn't help riffing off on a tangent from the title (since the article is about diagramming tools)...
Dylan Beattie has a thought-provoking presentation for anyone who believes that "plain text" is a simple / solid substrate for computing: "There's no such thing as plain text" https://www.slideshare.net/slideshow/theres-no-such-thing-as... (you'll find many videos from different conferences)
Haven't watched the videos yet, but from the slides, it looks like part of the issue he was talking about was encodings (there's a slide illustrating UTF-16LE vs UTF-16BE, for example). Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again. It has a much larger set of valid characters, but if you receive a text file without knowing its encoding, you can just assume it's UTF-8 and have a 99.7% chance of being right.
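In code form, that "just assume UTF-8" strategy is a one-liner with a fallback. A rough Python sketch — the latin-1 fallback is just my assumption for unknown legacy files, since it accepts any byte sequence:

```python
def read_text(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to an assumed legacy encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never raises;
        # the result may still be wrong if the real encoding was something else.
        return data.decode("latin-1")
```

Invalid UTF-8 is statistically very unlikely to occur by accident in real legacy text, which is why the try-first approach works so well in practice.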
FINALLY.
> Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again.
Whenever I hear this, I hear "all text files should be 50% larger for no reason".
UTF-8 is pretty similar to the old code page system.
Hm? UTF-8 encodes all of ASCII with one byte per character, and is pretty efficient for everything else. I think the only advantage UTF-16 has over UTF-8 is that some ranges (such as Han characters I believe?) are often 3 bytes of UTF-8 while they're 2 bytes of UTF-16. Is that your use case? Seems weird to describe that as "all text files" though?
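For concreteness, a quick Python check of per-character sizes under both encodings:

```python
# Encoded byte counts per character: ASCII, Latin accent, Cyrillic, Han, emoji
for ch in "a", "é", "Я", "漢", "😀":
    u8 = ch.encode("utf-8")
    u16 = ch.encode("utf-16-le")
    print(ch, len(u8), len(u16))
# a: 1 vs 2, é: 2 vs 2, Я: 2 vs 2, 漢: 3 vs 2, 😀: 4 vs 4
```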
UTF-8 encodes European glyphs in two bytes and oriental glyphs in three bytes. This is due to the assumption that you're not going to be using oriental glyphs. If you are going to use them, UTF-8 is a very poor choice.
UTF-8 does not encode "European glyphs" in two bytes, no. Most European languages use variations of the latin alphabet, meaning most glyphs in European languages use the 1-byte ASCII subset of UTF-8. The occasional non-ASCII glyph becomes two bytes, that's correct, but that's a much smaller bloat than what you imply.
Anyway, what are you comparing it to, what is your preferred alternative? Do you prefer using code pages so that the bytes in a file have no meaning unless you also supply code page information and you can't mix languages in a text file? Or do you prefer using UTF-16, where all of ASCII is 2 bytes per character but you get a marginal benefit for Han texts?
> Do you prefer using code pages so that the bytes in a file have no meaning unless you also supply code page information?
Yes. Note that this is already how Unicode is supposed to work. See e.g. https://en.wikipedia.org/wiki/Byte_order_mark .
A file isn't meaningful unless you know how to interpret it; that will always be true. Assuming that all files must be in a preexisting format defeats the purpose of having file formats.
> Most European languages use variations of the latin alphabet
If you want to interpret "variations of Latin" really, really loosely, that's true.
Cyrillic and Greek characters get two bytes, even when they are by definition identical to ASCII characters. This bloat is actually worse than the bloat you get by using UTF-8 for Japanese; Cyrillic and Greek will easily fit into one byte.
UTF-8 does not require a byte order mark. The byte order mark is a technical necessity born from UTF-16 and a desire to store UTF-16 in a machine's native endianness.
The byte order mark has no relation to code pages.
I don't think you know what you're talking about and I do not think further engagement with you is fruitful. Bye.
EDIT: okay, since you edited your comment to add the part about Greek and Cyrillic after I responded, I'll respond to that too. Notice how I did not say "all European languages". Norwegian, Swedish, French, Danish, Spanish, German, English, Polish, Italian, and many other European languages have writing systems where typical texts are "mostly ASCII with a few special symbols and diacritics here and there". Yes, Greek and Cyrillic are exceptions. That does not invalidate my point.
As someone who has been using Cyrillic writing all my life, I've never noticed this bloat you're speaking of, honestly...
Maybe if you're one of those AI behemoths who work with exabytes of training data, it would make some sense to compress it down by less than 50% (since we're using lots of Latin terms and acronyms and punctuation marks, which all fit in one byte in UTF-8).
On the web and in other kinds of daily text processing, one poorly compressed image or one JavaScript-heavy webshite obliterates all "savings" you would have had in that week by encoding text in something more efficient.
It's the same with databases. I've never seen anyone pick anything other than UTF-8 in the last 10 years at least, even though 99% of what we store there is in Cyrillic. I sometimes run into old databases, which are usually Oracle, that were set up in the 90s and never really upgraded. The data is in some weird encoding that you haven't heard of for decades, and it's always a pain to integrate with them.
I remember the days of codepages. Seeing broken text was the norm. Technically advanced users would quickly learn to guess the correct text encoding by the shapes of glyphs we would see when opening a file. Do not want.
> A file isn't meaningful unless you know how to interpret it; that will always be true.
There are multiple levels of meaning, though; character encoding is just one part of it. For example, a text file might be plain text, or HTML, or JSON, or a C source code, etc; a binary file might be DER, or IFF, or ZIP, etc; and then there will be e.g. what kind of data a JSON or DER or IFF contains and how that level of the data is interpreted, etc.
> Cyrillic and Greek characters get two bytes, even when they are by definition identical to ASCII characters.
Whether or not they are identical to ASCII characters depends on the character set and on other things, such as what they are being used for; the definition of "identical" is not so simple as you make it seem. Unicode defines them as not identical, which is appropriate for some uses but is wrong for other uses. (Unicode also defines some characters as identical even though in some uses it would be more appropriate to treat them as not identical, too. So, Unicode is both ways bad.)
> This bloat is actually worse than the bloat you get by using UTF-8 for Japanese; Cyrillic and Greek will easily fit into one byte.
I agree with that (although I think UTF-8 should not be used for Japanese either), but it isn't because of which characters are considered "identical" or not. There are problems with Unicode in general regardless of which encoding you use.
> ... (although I think UTF-8 should not be used for Japanese either) ...
The people putting up websites in Japanese disagree with you, it would seem. According to Wikipedia (in the Shift JIS article), as of March 2026 99% of websites in the .jp domain were in UTF-8, with only 1% being in Shift JIS.
Japan used to have two different encodings in common use, Shift JIS (usually used on Windows) and EUC-JP (more common on Unix servers). This resulted in characters being misinterpreted often enough that they coined the word mojibake to describe the phenomenon of text coming out completely garbled. These days, it seems Japanese website makers are more than happy to accept a slight inefficiency in encoding size, because what they gain from that is never having to see mojibake again.
If they are misinterpreted, it is because the character encoding is not declared properly.
I still sometimes see mojibake in Japanese web pages, but sometimes it works; if it works, it is because the character encoding is declared properly.
In my opinion, EUC-JP is a generally better encoding of JIS (especially in e.g. C source code, which should not use Shift-JIS but EUC-JP is OK), but Shift-JIS does have some benefits in some circumstances (such as making a character grid with one byte per character cell; if using Shift-JIS for a Pascal source code then you should use (* *) instead of { } for comments please).
> If they are misinterpreted, it is because the character encoding is not declared properly.
OR because the software is buggy, or making assumptions about encoding and not checking them (which also counts as "buggy", of course). You can declare the encoding all you like, it won't protect you against the stupid decisions that other people make in writing their software. (See Excel, for example).
Yes, if you declare your encoding properly, things should work. Most of the time. And if you're using any encoding that is not the worldwide default (which these days is UTF-8), then you definitely should declare the encoding. But you'll still occasionally hit badly-written software that doesn't even think about other encodings and doesn't handle them properly. The only defense against that situation, where you declare your encoding properly and it still doesn't work, is to just use the encoding that the software was written to expect, which is almost certainly the worldwide default.
Unicode could have just been encoded statefully with a "current code page" mark byte.
With UTF and emojis we can't have random access to characters anyways, so why not go the whole way?
A huge, central, part of UTF-8 design is that you can start decoding it from any arbitrary offset, it is self-aligning.
Yikes. That would lose the ability to know the meaning of the current bytes, or misinterpret them badly, if you happen to get one critical byte dropped or mangled in transmission. At least UTF-8 is self-syncing: if you end up starting to read in the middle of a non-rewindable stream whose beginning has already passed, you can identify the start of the next valid codepoint sequence unambiguously, and then end up being able to sync up with the stream, and you're guaranteed not to have to read more than 4 bytes (6 bytes when UTF-8 was originally designed) in order to find a sync point.
But if you have to rely on a byte that may have already gone past? No way to pick up in the middle of a stream and know what went before.
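The self-syncing property is easy to demonstrate in a few lines of Python (a toy example; real stream decoders do the same boundary scan):

```python
# UTF-8 continuation bytes all match 0b10xxxxxx, so from any offset you
# skip at most 3 bytes before landing on the start of a valid codepoint.
data = "héllo 漢字".encode("utf-8")

i = 2  # pretend we joined the stream mid-codepoint (inside the "é")
while i < len(data) and (data[i] & 0xC0) == 0x80:
    i += 1  # skip continuation bytes until we hit a lead byte

print(data[i:].decode("utf-8"))  # prints "llo 漢字" — resynced cleanly
```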
We've already lost all that with emojis and other characters in supplementary planes.
No, we haven't. You can start at any byte in a UTF-8 document and resume reading coherent text. If you start reading from the middle of a multi code point sequence, then the first couple of glyphs may be wrong, for example you may see a lone skin tone modifier rendered as a beige blob where the author intended a smiley face with that skin tone. But these multi code point sequences are short, and the garbled text is bounded to the rest of the multi code point sequence. The entire rest of the document will be perfectly readable.
Compare this to missing a code page indicator. It will garble the whole section until the next code page indicator, often the whole rest of the document. The fact that you're even comparing these two situations as if they're the same is frankly ridiculous.
Unicode had support for language tag codepoints. They still exist but have long been deprecated. They were intended to deal with glyph variants, especially with regards to Han unification.
UTF-8 may still be a good choice for Japanese text, though.
For one thing, pure text is often not the only thing in the file. Markup is often present, and most markup syntaxes (such as HTML or XML) use characters from the ASCII range for the markup, so those characters are one byte (but would be two bytes in UTF-16). Back when the UTF-8 Everywhere manifesto (https://utf8everywhere.org/) was being written, they took the Japanese-language Wikipedia article on Japan, and compared the size of its HTML source between UTF-8 and UTF-16. (Scroll down to section 6 to see the results I'm about to cite). UTF-8 was 767 KB, UTF-16 was 1186 KB, a bit more than 50% larger than UTF-8. The space savings from the HTML markup outweighed the extra bytes from having a less-efficient encoding of Japanese text. Then they did a copy-and-paste of just the Japanese text into a text file, to give UTF-16 the biggest win. There, the UTF-8 text was 222 KB while the UTF-16 encoding got it down to 176 KB, a 21% win for UTF-16 — but not the 50% win you would have expected from a naive comparison, because Japanese text still uses many characters from the ASCII set (space, punctuation...) and so there are still some single-byte UTF-8 characters in there. And once the files were compressed, both UTF-8 and UTF-16 were nearly the same size (83 KB vs 76 KB) which means there's little efficiency gain anyway if your content is being served over a gzip'ed connection.
So in theory, UTF-8 could be up to 50% larger than UTF-16 for Japanese, Chinese, or Korean text (or any of the other languages that fit into the higher part of the basic multilingual plane). But in practice, even giving the UTF-16 text every possible advantage, they only saw a 20% improvement over UTF-8.
Which is not nearly enough to justify all the extra cost of suddenly not knowing what encoding your text file is in any more, not when we've finally reached the point of being able to open a text file and just know the encoding.
P.S. I didn't even mention the Shift JIS encoding, and there's a reason I didn't. I've never had to use it "for real", but I've read about it. No. No thank you. No. Shudder. I'm not knocking the cleverness of it, it was entirely necessary back when all you had was 8 bits to work with. But let me put it this way: it's not a coincidence that Japan invented a word (mojibake) to represent what happens when you see text interpreted in the wrong encoding. There were multiple variations of Shift JIS (and there was also EUC-JP just to throw extra confusion into the works), so Japanese people saw garbled text all the time as it moved from one computer running Windows, to an email server likely running Unix, to another computer running Windows... it was a big mess. It's also not a coincidence that (according to Wikipedia), 99.1% of Japanese websites (defined as "in the .jp domain") are encoded in UTF-8, while Shift JIS is used by only 1% (probably about 0.95% rounded up) of .jp websites.
So in practice, nearly everyone in Japan would rather have slightly less efficient encoding of text, but know for a fact that their text will be read correctly on the other end.
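The markup effect is easy to reproduce with a throwaway snippet (toy numbers, not the UTF-8 Everywhere measurement):

```python
import gzip

# Japanese text wrapped in ASCII-heavy markup, as on a typical web page.
# The ASCII tags cost 1 byte each in UTF-8 but 2 bytes each in UTF-16.
text = "<p>日本語のテキスト example</p>" * 1000
u8 = text.encode("utf-8")
u16 = text.encode("utf-16-le")

print(len(u8), len(u16))  # the markup makes UTF-16 larger here
print(len(gzip.compress(u8)), len(gzip.compress(u16)))  # compressed sizes
```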
vaxocentrism, or “All the World’s a VAX”
http://www.catb.org/esr/jargon/html/V/vaxocentrism.html
The point is, a lot of work went into making that happen. I.e., plain text as it is today is not some inherent property of computing. It is a binary protocol and displaying text through fonts is also not a trivial matter.
So my question is: what are we leaving on the table by over focusing on text? What about graphs and visual elements?
And what do we gain by leaving things on the table?
TUIs can include these, see the kitty graphics protocol, implemented by most if not all modern terminals.
https://sw.kovidgoyal.net/kitty/graphics-protocol/
I was not very descriptive, but I was referring to the next layer up of building blocks. Instead of text, we could also express things in hybrid ways with text but also visual nodes that can carry more dense information. The usual response is that those things don't work with text-based tools, but that's my point. Text based tools needed invention and decades of refinement, and they're still not all that great.
Until you hit a CSV exported by Excel
I should have said "a text file with no byte-order mark". I would hope that Excel's CSV export, if it's writing UTF-16, is writing a byte-order mark first (though I don't have any Excel-exported CSVs lying around right now to check). The byte-order mark is necessary for UTF-16 since it has big-endian and little-endian variants, but unnecessary (and actually harmful in a few situations) for UTF-8. So naturally, if you assume something is UTF-8 but the first few bytes you encounter are FF FE or FE FF (both of which are illegal in UTF-8) then instead of throwing an error saying "Hey, that's illegal UTF-8, buddy!" you should just reparse in UTF-16 (and you now know the correct byte order to use). In fact, you should read four bytes just to make sure you're not seeing FF FE 00 00, because that would indicate a UTF-32LE document. (Which indicates an ambiguity in UTF-16, that UTF-8 doesn't have. A UTF-16 document that begins with a null byte is likely to be misinterpreted as UTF-32LE).
Before I go off on too much of a rabbit trail, I have two points I want to make:
1. Since UTF-8 should be the default assumption for any sensible software, a byte-order mark is not needed for UTF-8, but any non-UTF-8 encoding should use a byte-order mark. (And in fact needs a BOM, because both UTF-16 and UTF-32 have LE and BE variants).
2. Excel needs to fix its stupid CSV import/export defaults.
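The sniffing logic described above is only a few lines. A Python sketch of those checks — note the order matters, since FF FE 00 00 must be tested before FF FE:

```python
def sniff_encoding(data: bytes) -> str:
    """Guess a text file's encoding from its BOM, defaulting to UTF-8."""
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"   # FF FE 00 00: must check before UTF-16LE
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return "utf-8"           # sensible default; UTF-8 needs no BOM
```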
Another classic from Microsoft: the Language Server Protocol counts positions in UTF-16 code units. We could be paying that price for the rest of time.
I can't tell what the argument is just from the slideshow. The main point appears to be that code pages, UTF-16, etc are all "plain text" but not really.
If that really was the argument, then it is, in 2026, obsolete; utf-8 is everywhere.
He has a YouTube channel, there's a talk on there.
He also discusses code pages etc.
I don't think the thesis is wrong. Eg when I think plain text I think ASCII, so we're already disagreeing about what 'plain text' is. His point isn't that we don't have a standard, it's that we've had multiple standards over what we think is the most basic of formats, with lots of hidden complications.
I read that article a long time ago, and for me it's a hard disagree. A system as complex and quirky as Unicode can never be considered "plain", and even today it is common for something Unicode-related to break in many apps. ASCII is still the only text system that will really work well everywhere, which I consider a must for calling something plain text.
And yes, ASCII means mostly limiting things to English but for many environments that's almost expected. I would even defend this not being a native English speaker myself.
I feel like that isn’t exactly a very useful definition of plaintext. If you mean “ASCII” say ASCII.
Plain text is text intended to be interpreted as bytes that map simply to characters. Complexity is irrelevant.
Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!). Then you need to convert numbers to bytes, so aside from Unicode you also need an encoding. And there are multiple choices. So what would be "plain text" then? UTF-16? UTF-8? If so, with or without BOM? It can't be all of them. For something to really be "plain text" it has to be the same thing to everyone...
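A quick Python illustration of that multiple-sequences problem, using the standard unicodedata module:

```python
import unicodedata

a = "\u00e9"    # "é" as a single precomposed code point
b = "e\u0301"   # "e" followed by a combining acute accent

print(a == b)   # False: identical to a reader, different code point sequences
print(unicodedata.normalize("NFC", b) == a)  # True after NFC normalization
```

This is exactly why "compare the bytes" is not a reliable equality test for Unicode text without normalizing first.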
> Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!).
It is worse than that; you can also get different characters with the same code points, and also same code points and characters that should be different according to some uses, and also different code points and characters that should be same according to some uses, etc.
https://en.wikipedia.org/wiki/Plaintext
https://csrc.nist.gov/glossary/term/plaintext
I agree with you that Unicode is too complicated and messy, although it also shows that whether or not something is considered "plain" is itself too difficult.
Unicode has caused many problems (although it was common for m17n and i18n to be not working well before Unicode either). One problem is making some programs no longer 8-bit clean.
Unicode might be considered in two ways: (1) Unicode is an approximation of multiple other character sets, (2) All character sets are an encoding of a subset of Unicode. At best, if Unicode is used at all, it should be used as (1) (as a last resort), but it is too common for Unicode to be used as (2) (as a first resort), which is not good in my opinion.
(I mostly avoid Unicode in my software, although it is also often the case (and, in many (but not all) programs, should be the case) that it only cares about ASCII but does not prevent you from using any other character encodings that are compatible with ASCII.)
> ASCII is still the only text system that will really work well everywhere, which I consider a must for calling something plain text.
Yes, it does work well (almost) everywhere.
Supersets of ASCII are also common, including UTF-8, and PC character set, ISO 2022 (if ASCII is the initially selected G0 set, which it is in the ASN.1 Graphic string and General string types, as well in most terminal emulators), EUC-JP, etc. In these cases, ASCII will also usually work well.
However, as another comment mentions, and I agree with them, that if you mean "ASCII" then it is what you should say, rather than "plain text" which does not tell you what the character encoding is. That other comment says:
> Plain text is text intended to be interpreted as bytes that map simply to characters.
However, it is not always so clear and simple what "characters" is, depending on the character sets and what language you are writing. And then, there are also control characters, to be considered, so it is again not quite so "plain".
> And yes, ASCII means mostly limiting things to English but for many environments that's almost expected. I would even defend this not being a native English speaker myself.
In my opinion, it depends on the context and usage. One character set (regardless of which one it is) cannot be suitable for all purposes. However, for many purposes, ASCII is suitable (including C source code; you might put l10n in a separate file).
You should have proper m17n (in the contexts where it is appropriate, which is not necessarily all files), but Unicode is not a good way to do it.
Nice. I've used the phrase before, with the vague notion that a proper talk must already exist.
Plain text is great.
20+ years of my notes are in plain text, facilitated with https://github.com/nickjj/notes.
Also been doing plain text invoicing for around 7 years with https://github.com/nickjj/invoice.
There's also Plutus https://github.com/nickjj/plutus for income and expense tracking. Couldn't be happier. All I do now is export my bank's CSV files and import them into Plutus, a few minutes later my books are done after I align some of the categories. I've done 2 years of taxes with this now.
Text is Lindy. It has withstood the test of time and it's as ubiquitous as SQL or TCP/IP.
Reminds me of this decade old post (and discussion) by Graydon Hoare, "Always bet on text".
[1]: https://news.ycombinator.com/item?id=8451271
[2]: https://graydon2.dreamwidth.org/193447.html
The list at the top could be longer:
- https://asciiflow.com/
- https://asciidraw.github.io/
Anybody know more?
D2 https://d2lang.com/ added beta support for ASCII & Unicode output last year.
That would be interesting. I like D2 though the lack of control over the layout is a bit frustrating sometimes.
https://monosketch.io
I have a few more on my site under the bookmarks page. Link in bio.
https://xosh.org/text-to-diagram a list of lots of tools
how about a unicode art tool?
https://electroglyph.github.io/atheriz_draw/
https://github.com/TheoKVA/ascii-box-editor
A visual editor of UTF-8 BOX DRAWING characters, contrary to "ascii" in the name.
No server, no installation: browser-side Javascript only.
"ascii" for some means "anything textlike", so for example you may see roguelike game developers saying "nice ascii" in response to a screenshot full of CP437, Unicode, or text-like glyphs, all very much not ASCII. Some will get defensive when called out on this, claiming that CP437 is okay to call ascii because it's "extended ASCII" (nevermind the many different and conflicting extensions), or others point out that they do not have a better term for something textlike.
Tangent to article: text character based charts for statistics. Decades ago I had an education version of MINITAB that ran under DOS and did scatter diagrams and dotplots and box and whisker plots from text characters (you could use pure text, I think proper ASCII or you could set an option to use those DOS drawing characters). The idea was to encourage initial data exploration before launching on formal statistical tests.
Anyone know of a terminal program that can do proper dotplots?
Gnu plot dumb terminal mode?
That’s possible as well. I wish common terminals (the kind that is shipped with the OS) would do ReGIS, Tektronix, or even sixel (yuck!).
Thanks to all posts above for engaging with my quest for minitab style text character dotplots! Below is an example of what I'm on about (artisan construction in Mousepad) and apologies to anyone on a narrow screen where the text mode is going to get jumbled.
The example is typed out from
https://support.minitab.com/en-us/minitab/help-and-how-to/gr...
The `plotrix` package for R looks hopeful (mentioned on one of the links kindly provided above) as it includes a 'minitab style dotplot' function.
This stack overflow thread had a pretty good list of terminal plotting tools:
https://stackoverflow.com/questions/123378/command-line-unix...
gnuplot, feedgnuplot, eplot, asciichart, bashplotlib, ervy, ttyplot, youplot, visidata
And there's a lovely ASCII plot in the AWK book: https://dn790008.ca.archive.org/0/items/pdfy-MgN0H1joIoDVoIC...
Also: M-x artist-mode in emacs.
Plain text is great as far as it goes, but when it comes to structure you start from zero for every file. There’s always someone getting wistful about ad-hoc combinations of venerable Unix tools to process “plain text”, and that’s fine when you’re in an ad-hoc situation, but it’s no substitute for a well-specified format.
XML, JSON, YAML, RDF, EDN, LaTeX, OrgMode, Markdown... Plenty of plaintext, but structured information formats that are "yes, and". Yes, I can process them as lines of plain text, and I can do structured data transformations on them too, and there are clients (or readers) that know how to render them in WYSIWYG style.
If that’s our definition of “plain text”, sure. I would still rather our tools were more advanced, such that printable and non-printable formats were on a more equal footing, though. I always process structured formats through something that understands the structure, if I can, so I feel that the only benefit I regularly get out of formats being printable is that I have to use tools that only cope with printable formats. The argument starts getting a bit circular for me.
Yes, I thought of what you mentioned too, and in my opinion, DER is a better format, and it is a binary format rather than text.
(In my ideas of an operating system design, there is a structured binary format (similar to DER but different) used for most files and data, so that the tools (and the command shell) would be usable consistently with most of them; and if some need special handling, you can use other programs and functions to convert them and/or handle them in a way that can be interoperable.)
Hm, you made me think about non-printing characters as metadata, which is of course immediately lost on printing and therefore does not round trip between digital and printed versions.
Many nonprinting characters imply some directive; line break (hard-wrap the text here, but this is not a paragraph), page break (let the rest of the page be blank, start the next paragraph overleaf), EOF (file over, bye bye), nonbreaking space (keep these two words together, always, till death do them part).
This is out-of-band information spliced in-band (with the text corpus), which a computer program can "see", but a person can't.
XML arguably isn’t plain text, but a binary format: If you add/change the encoding declaration on the first line, the remaining bytes will be interpreted differently. Unless you process it as a function of its declared (or auto-detected, see below) encoding, you have to treat it as a binary file.
In the absence of an encoding declaration, the encoding is in some cases detected automatically based on the first four bytes: https://www.w3.org/TR/xml/#sec-guessing-no-ext-info Again, that means that XML is a binary format.
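A sketch of what that auto-detection looks like, based on my reading of the W3C appendix linked above (the byte patterns are from the spec's table; the function name is mine). Note that the sniffing happens on raw bytes, before any text decoding, which is the sense in which XML behaves like a binary format:

```python
def sniff_xml_encoding(first4: bytes) -> str:
    """Guess an XML document's encoding family from its first four bytes."""
    exact = {
        b"\x00\x00\xfe\xff": "UTF-32BE (with BOM)",
        b"\xff\xfe\x00\x00": "UTF-32LE (with BOM)",
        b"\x00\x00\x00\x3c": "UTF-32BE (no BOM, '<')",
        b"\x3c\x00\x00\x00": "UTF-32LE (no BOM, '<')",
        b"\x00\x3c\x00\x3f": "UTF-16BE (no BOM, '<?')",
        b"\x3c\x00\x3f\x00": "UTF-16LE (no BOM, '<?')",
        b"\x3c\x3f\x78\x6d": "UTF-8/ASCII-compatible ('<?xm')",
    }
    # Check the exact four-byte patterns first: the UTF-32LE BOM
    # begins with the same two bytes as the UTF-16LE BOM.
    if first4 in exact:
        return exact[first4]
    if first4.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (with BOM)"
    if first4.startswith(b"\xfe\xff"):
        return "UTF-16BE (with BOM)"
    if first4.startswith(b"\xff\xfe"):
        return "UTF-16LE (with BOM)"
    return "unknown; the spec says to fall back to UTF-8"

print(sniff_xml_encoding('<?xml version="1.0"?>'.encode("utf-16-le")[:4]))
```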
Another way that the character encoding could be declared is ISO 2022. When using ISO 2022, the declaration of UTF-8 is <1B 25 47>, rather than the <EF BB BF> that XML and some other formats use.
However, whichever way you do it, I think the encoding declaration should always be included, except when the data is purely ASCII, in which case it should be omitted.
Does the community here consider HN as plaintext?
Obviously, the site is html with hyperlinks etc ... but it's basically implementing a text interface that I can click on.
In a cryptographic sense, html itself is plaintext since it is encoded as ascii/utf-8. However, the MIME type of text/plain is distinct from text/html to describe the html encoding of document style/structure information.
A terminal is often considered plaintext but in reality there are escape sequences to encode meta-information which is otherwise unreadable to most humans.
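A small demonstration of that (an assumed example, not from the thread): an ANSI escape sequence is in-band bytes that a terminal interprets as styling, but dumped anywhere else it is just control-character noise.

```python
# "error" wrapped in an ANSI sequence for red text, then a reset.
red = "\x1b[31merror\x1b[0m"

print(red)                     # a colour terminal shows "error" in red
print(repr(red))               # the raw form: '\x1b[31merror\x1b[0m'
print(len("error"), len(red))  # 5 visible characters, 14 actual ones
```

Nine of the fourteen characters are meta-information that most humans never see, which is the sense in which terminal output is not quite plain text.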
On the other end of the spectrum, there are images with a few words transcribed on a lot of social media platforms. To blur this line, recent(ish) mobile platforms recognize text embedded in images and make it selectable. Given this context, is text in an image without anything else considered plaintext?
I guess my real question is, "where do we draw the line" for what we call plaintext, decades after the initial implementation.
HN is normal HTML, so I think that's fair to call 'plaintext'
I have a mixed opinion of unicode, but it's hard not to love the box-drawing / block-element chars.
The box-drawing characters pre-date unicode though:
https://en.wikipedia.org/wiki/Code_page_437
From the title, I was not expecting a bunch of extended ASCII characters.
The article mentions that 'ASCII' in the context of those tools should not be taken to mean the limited ASCII character set. Personally, I would avoid mentioning ASCII at all.
The title just talks of plain text though, and plain text usually means UTF-8 encoded text these days. Plain, as in conventional, standardised, portable, and editable with any text editor. I would be surprised if someone talked about plain text as being limited to just ASCII.
I would?
Would an emoji count as plain text?
What about right to left text? I have no idea how many editors handle that.
Plain text has been around for thousands of years, it just wasn't digital.
When you stop worrying about pixels and start focusing on the data structure, the friction between thought and execution just melts away
This, and use it instead of a .pdf. Stay libre; make it work on all platforms, of all ages, all over the world.
No RTF; nothing needs to be fancy.
A canonical .txt resource is parseable and universal. Don't restrict your audience. For example:
If a repo has a .txt backup of some changelog, it's not tied to any particular platform. The repo is a living project and memory. Don't assume anything (i.e., if for some reason your account is gone, that information is not lost; it's saved forever).
Usually if I must reference a PDF it means a browser, and often I don't want a browser running.
I'm all for it, but it's dangerously mixing ASCII with the meaning of plain-text...
Well, we are computer nerds, not the rock drawers.
Are ASCII diagrams painful for screen reader users, though?
Unsung is one of the best little blogs around. Well worth checking out the rest of the posts.
How could you forget draw.io
But draw.io doesn't produce text "drawings", it's a graphical drawing editor. Not the same thing at all.
Text and text files are simple. I think this is their #1 advantage.
There are limitations though. Compare a database of .yml files to a database in a DBMS. I wrote a custom forum with Ruby + YAML files. It works, but it can't compete anywhere with e.g. Rails/ActiveRecord and so forth. Its sole advantage is simplicity; everywhere else it loses without even a fight.
All plaintext bullshit should be eradicated. Fucking useless as a medium when displaying and handling complex tasks.
Counterpoint: Adding in a custom, proprietary interpretation of your data makes your complex task more complicated.
How so? :)
Plain text is great, but if you're holding a hammer ...
Plain text is the hammer. And the silver bullet.
It's good to see plain text here; it's been a while since people wanted it.
So many users want special fonts, but here, simple is special to the eyes and mind.
As a developer I agree. Sometimes simplicity is more special and powerful than complex formats.
* L a u g h s i n u t f 1 6 *
That’s cool! How did you do that?
Manually, with spaces. Ironically each actual space took 3 spaces, which HN truncated to one space removing any concept of words. Technically this is incorrect because a real post would be 0x00 and not 0x20, but my point was even ‘basic’ text can be problematic if you’re not aware of encoding types, and it can really fukin bite if you’re not ready for it.
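To unpack the joke (my own illustration): encode ASCII text as UTF-16LE and every other byte is 0x00, so a tool that assumes one byte per character sees something very much like "L a u g h s".

```python
# ASCII text encoded as UTF-16LE: each character becomes two bytes,
# the second of which is a NUL for characters in the ASCII range.
data = "Laughs".encode("utf-16-le")
print(data.hex(" "))  # 4c 00 61 00 75 00 67 00 68 00 73 00

# Naively decoding those bytes one-byte-per-character keeps the NULs
# in-band; many displays render them as gaps -- the 0x00-not-0x20 point.
print(data.decode("latin-1"))
```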