I work on a complex desktop application, and the number of bugs triggered over the years by spaces and other unusual characters in file names has been astounding. If you do anything with subprocesses or path processing, it's absurdly easy to hit in a thousand different ways, over and over again.
Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
Forces you to deal with this properly, and immediately ensures that every automated test checks this case without you having to remember every time. Hasn't been particularly inconvenient, since I'm autocompleting it 99% of the time anyway, and I haven't shipped a single path parsing bug since.
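To make the failure mode concrete, here is a minimal Python sketch (the directory and file names are made up for illustration). Passing arguments to a subprocess as a list keeps a path with spaces intact, while splitting a command string on whitespace tears it apart:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def count_bytes(path: Path) -> int:
    """Ask a child Python process for the size of `path`.

    The arguments are passed as a list, so no shell is involved and the
    path reaches the child as a single argument, spaces and all.
    """
    result = subprocess.run(
        [sys.executable, "-c",
         "import os, sys; print(os.path.getsize(sys.argv[1]))",
         str(path)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def naive_args(command: str) -> list:
    """The bug-prone approach: split a command string on whitespace."""
    return command.split()


# Hypothetical workspace name containing spaces and parentheses.
workdir = Path(tempfile.mkdtemp(prefix="my project (x86) "))
target = workdir / "some file.txt"
target.write_bytes(b"hello")

size = count_bytes(target)              # the path arrives as one argument
broken = naive_args(f"wc -c {target}")  # the path is split into several pieces
```

Running the test suite from a directory like `workdir` is what turns the second pattern into an immediate failure instead of a bug report years later.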
Imagine if they made programmers put 64 bit DLLs in a "System32" directory and 32 bit DLLs in a "SysWoW64" directory. That would really keep 'em on their toes!
You should look into the behavior of the /windows/sysnative link. It appears and disappears depending on whether your process is running as 32 bit or 64 bit.
Maybe it's from back before Windows had a built-in TCP/IP stack? If it were a third-party/optional driver, having files related to it in a path under system32\drivers would make sense.
Back around Win 95 when they added networking it was based off of (IIRC) BSD's TCP stack and related tools. They were an optional 'third party' driver of sorts, but shipped by the first party. I'm not positive about WinNT or Win3.11 (for workgroups?)
You know, this makes me wonder.. tangentially speaking- I wonder how hard it would be to rearrange the folder structure in linux so that I have something like this:
Wow, thanks for the reply, nice find! I did some poking around on my Linux system and even re-arranging the home folder was a task of its own because the system kept trying to replace folders in their original places. I will do some digging in to Gobo and see how they're handling this. Thanks again for pointing this out.
Afaik there is an option to change it, but it is not advisable as that will break the binary cache and you are left with compiling everything yourself. This is due to a technical limitation in that different packages can contain paths everywhere and thus they are inherently part of the resulting hash, on which other packages can depend.
Since Live CDs/Flash drives were invented, I wouldn't worry about this stuff any longer. Certainly have your personal files in a centralized location and backed up first.
Probably the easiest way to experiment these days is to create a VM and make a snapshot, then start knocking down walls, just to see when and where the house collapses. Then revert and try something new.
A couple of weeks after moving to UNIX from MSDOS, I thought I'd remove lots of unnecessary 'dot-directories' from the /tmp directory. I was root as I had no concept of being a 'normal user'.
At least that doesn't happen today anymore. From bash:
> When a pattern is used for pathname expansion, the character ``.'' at the start of a name or immediately following a slash must be matched explicitly, unless the shell option dotglob is set. The filenames ``.'' and ``..'' must always be matched explicitly, even if dotglob is set.
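Python's `glob` module follows a similar convention, which makes the rule easy to try out safely. This is a small sketch in a throwaway temp directory (the file names are made up), not a reproduction of bash itself:

```python
import glob
import os
import tempfile

# Throwaway directory with one visible file and two dot files.
root = tempfile.mkdtemp()
for name in ("visible.txt", ".hidden", ".cache"):
    open(os.path.join(root, name), "w").close()

# "*" skips names that start with a dot.
plain = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, "*")))

# ".*" matches the dot files explicitly; "." and ".." are never returned.
dotted = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, ".*")))
```

Here `plain` contains only `visible.txt`, and `dotted` contains the two dot files but neither `.` nor `..`.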
I did that on my NAS a few years ago. I had copied in a bunch of directories from a mac and they all had tons of dot files in each dir that were showing up on my windows machines. I popped open a terminal and did the exact same thing and wiped most of the NAS out =P Good thing I had it mirrored with my other synology.
GoboLinux symlinks everything into an FHS-ish structure under /System/Index/ so you still have a single place where binaries/libraries/includes/etc. live. (There are also symlinks from /usr/lib, /usr/bin, and others into /System/Index/ for compatibility with programs where those might be hardcoded.)
The only real holdouts are proc/sys/dev which are the kernel and mnt/media/opt/srv which are really for the user/sysadmin and aren't really used by the OS anymore.
Genuine question: on what systems is `/tmp` persistent? Both macOS and Ubuntu 20.04 clear `/tmp` on every reboot for me, and I haven't changed the defaults at all.
People don't reboot often. Persistent tmp basically means it will be cleared in an infrequent manner, so the likelihood of it going away 1s after you release your file handle is low.
It's not an oxymoron to have files which are temporary but not limited in scope to a single power cycle. For example, you could have a long-running process which you want to be able to resume if it's interrupted; /var/tmp would be an appropriate place for the state. The data is temporary because it will be deleted once the process is finished, but you wouldn't want it wiped out by a system reset. Generally /tmp is cleared at every reset, and is often a tmpfs mount, while files in /var/tmp are automatically cleaned up only when they reach a certain age.
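A minimal sketch of that resumable-process pattern in Python. The function names and the JSON format are assumptions for illustration; the only point taken from the comment is that the state lives under /var/tmp and may still be cleaned up by the site's policy, so the loader has to cope with it being gone:

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, name: str, directory: str = "/var/tmp") -> str:
    """Write `state` as JSON to directory/name, replacing any older copy atomically."""
    final_path = os.path.join(directory, name)
    # Write to a temporary file in the same directory, then rename over the
    # target, so a crash never leaves a half-written checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def load_checkpoint(name: str, directory: str = "/var/tmp"):
    """Return the saved state, or None if the file is missing (e.g. cleaned up)."""
    try:
        with open(os.path.join(directory, name)) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

A caller would treat `None` as "start from the beginning", which keeps the program correct even on systems where /var/tmp is wiped more aggressively than expected.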
Except that the FHS says that "data stored in /var/tmp is typically deleted in a site-specific manner", and as an application vendor you have no control over that site-specific clean frequency. On all my systems, /var/tmp is a symlink to /tmp and that has never caused any issue.
The FHS is not wrong; cleaning policies are indeed site-specific and files placed in any temp directory can in principle disappear at any time. (Though, in theory, it's not supposed to happen while the files are still "in use" by running programs.) Still, historically you could count on files in /var/tmp lasting longer than files in /tmp, including across reboots.
Nothing will immediately break because you linked /var/tmp to /tmp. Whether it causes issues depends on the programs that you (or your users) run and how they make use of /var/tmp. However, if someone did have to restart a long-running process from the beginning because recent state information in /var/tmp was not preserved across a reset, I would say that is a problem with the administration of the system and not the program that stored its state there.
Basically no one uses /var/tmp for anything (and nobody should either). World writable directories are a mistake and only continue to exist because apps assume they are available.
/tmp and friends are poorly named. They really should be /shared or /dmz or /freeforall or something.
* If you need service-specific tmp space use RuntimeDirectory or PrivateTmp if your app is hardcoded to /tmp.
* If you need service-specific persistent data that goes in /var/lib/your-app.
* If you need temp space for your user it's at /var/run/user/your-uid.
* If you need more than one user/service to share files but not everyone then god have mercy on your soul because all options are bad. There sure are a lot of them but none of them are at all satisfying.
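For the per-user case, a hedged Python sketch: on many systemd-based systems the per-user runtime directory is advertised through the `XDG_RUNTIME_DIR` environment variable, so a program can look there first. The function name and the fallback behaviour are assumptions of this sketch, not something the list above prescribes:

```python
import os
import tempfile


def user_runtime_dir(app_name: str, environ=os.environ) -> str:
    """Return a private scratch directory for `app_name`.

    Prefers $XDG_RUNTIME_DIR/<app_name>; if the variable is unset or does
    not point at a directory, falls back to a fresh private directory from
    tempfile.mkdtemp (the fallback is this sketch's own choice).
    """
    base = environ.get("XDG_RUNTIME_DIR")
    if base and os.path.isdir(base):
        path = os.path.join(base, app_name)
        os.makedirs(path, mode=0o700, exist_ok=True)
        return path
    return tempfile.mkdtemp(prefix=app_name + "-")
```

Either branch avoids dropping predictable file names straight into the world-writable /tmp.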
Right, /var/tmp is the "Persistent Temp" directory, and /tmp is "Ephemeral Temp". The /run directory is for runtime data such as PID files, Unix sockets, named FIFOs, and generated systemd units—it has a specific internal structure and shouldn't be used as a direct alternative to the relatively unstructured /tmp directory. While both are generally ephemeral tmpfs mounts, only /tmp is writable to all users.
Why not? That's how proper English text is written. Of course there are many programs that can't handle it properly (or handle it inconveniently) so in practice it might be problematic at times, but otherwise I see nothing wrong with it.
Generally just because typing it out with tab completion in zsh sucks, and I don't see a good solution (if it was solved nicely it'd be solved already)
Yeah, except that tells me nothing useful... The question is exactly the same: So where do I install this random binary I downloaded from the internet or compiled myself? Is it /opt, /usr/bin, /usr/local/bin, or /bin? Where do I put the dependencies I compiled for this software - /usr/lib, /usr/local/lib, /lib, /opt/lib, /opt/<app name>/lib, or what?
If I am not writing all of my installation scripts by hand, because that would be really intense, then every folder gets filled with random bits of software.
Offering too many similar choices leads to mess. There's nothing fundamentally different between using one or more of these options and using the only option, except that in the second case there isn't any opportunity to make mess.
> You can do `configure --prefix=/Program\ Files/<app>` if you want.
Thanks for the tip! Can't do that with distro repo software though :-/
Use Gnu Stow to keep the random bits contained in their own app directory that is symlinked into the /usr/local tree. Then you can manage everything without leaving orphan files behind.
Okay, but what about ProgramData? I have enough programs that put their junk in there instead of Program Files, and others that make their own directories on the root of the drive (driver installers are really bad about this).
I think the best model I've seen for consistent binary locations is the 'Applications' folder in Mac OS X, but it fails as well by retaining the /usr/bin elsewhere.
When you download a portable app (just a bare .exe), do you make a folder for it and drop it in program files? (quite possible, you'd just be unusual) If not, why does Windows get a free pass?
But why are many Windows programs under C:\Windows\System32 then, if Windows has only a single model? Why aren't all Steam-provided (for example) games in a single location? Or, if they are, does Windows really have a single model?
Yes, the Linux/POSIX model is confusing, but the split is to segregate administrative domains:
- / and /usr are the domain of the distribution. As a user, you should never install there. The administrative group is root.
- /usr/local is the domain of the machine admin. If the machine is yours to manage, you can install software there. The administrative group is staff.
- /opt/$vendor is the domain of third-party vendors. Each vendor (like Steam, Eclipse, Arduino Studio) can get its own subdirectory and its own administrative user group.
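The split above can be sketched as a small lookup in Python. The table and function name are illustrative assumptions; it only knows the prefixes mentioned here plus the usual root-level /bin, /sbin and /lib, and it compares whole path components rather than string prefixes:

```python
from pathlib import PurePosixPath

# Most specific prefix first, so /usr/local wins over /usr.
DOMAINS = [
    ("/usr/local", "machine admin"),
    ("/opt", "third-party vendor"),
    ("/usr", "distribution"),
    ("/bin", "distribution"),
    ("/sbin", "distribution"),
    ("/lib", "distribution"),
]


def administrative_domain(path: str) -> str:
    """Return who is expected to manage `path`, per the table above."""
    parts = PurePosixPath(path).parts
    for prefix, owner in DOMAINS:
        prefix_parts = PurePosixPath(prefix).parts
        # Compare component by component: "/usr/localfoo" is not under "/usr/local".
        if parts[:len(prefix_parts)] == prefix_parts:
            return owner
    return "other (not covered by this sketch)"
```

For example, `administrative_domain("/usr/local/bin/mytool")` gives `"machine admin"`, while `"/usr/localfoo/bin/x"` falls through to `"distribution"` because only whole components are matched.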
How would you achieve the same on Windows? How do you make sure the Adobe updater can only install new versions of CS, but not surreptitiously install a new (free!) spyware package under C:\Windows? How would you allow certain power users to share one Google Chrome installation, allow each of them to update it, but not let them install additional software system-wide?
To add: when you install software yourself you choose this, when you install software from e.g. a distribution package it is chosen by the package maintainers, and to a larger extent the maintainers of the distribution.
This is one of the big advantages of using a ready-made Linux distribution: beyond the convenience of having an installer or easy to install packages, you get some assurance that the system as a whole has been thoughtfully put together.
Arch Linux for example symlinks /bin and /sbin to /usr/bin and /lib to /usr/lib among other things.
Is your account the only account that's expected to run the binary? If so, then `$HOME/bin` is a perfectly acceptable (albeit not standard) place to put it.
If you expect other users to be able to execute the program, then you should put it in either `/usr/bin` or `/usr/local/bin`, depending on whether the former is already being used by a package manager. `/opt` is generally for self-contained software that doesn't play nicely with the rest of the system, but might still be installable through the default package manager.
I don’t think there’s any “official” word on that (the XDG spec that defines ~/.local/share doesn’t mention ~/.local/{bin,lib} IIRC, and the traditional per-user entry in PATH seems to be ~/bin), but a fair number of people use it this way, yes, including me.
I started out using $HOME/bin, but a fair amount of stuff assumes a /usr- or /usr/local-style folder structure when doing make install, so I've settled on using $HOME/usr/bin instead, so that programs can create $HOME/usr/include and $HOME/usr/share and whatever, without trampling on stuff in my home folder.
Can't remember the last time I had a problem arranging this. If using autotools, which covers 95+% of stuff, it's usually a question of something like "./configure --prefix=$HOME/usr".
(If I want to share stuff between users, /usr/local/ is of course a better place. macOS is a bit more restrictive, so I have a separate user for this, whose /usr folder is readable by everybody.)
On freedesktop systems there's the ~/.local directory which is supposed to be a mirror of the file system hierarchy. Seems like a good place for bin, lib, include directories.
The standard is, indeed, excessively vague because it was written to let many existing implementations be conformant as is, though I’d say it’s still more helpful than many other standards with that deficiency. There’s a method to it, however:
- Things installed in /, if it’s different from /usr, are generally not to be touched;
- Things installed in /usr are under the distro’s purview or otherwise under a package manager, any modifications are on pain of confusing it;
- Things installed in /usr/local are under the admin’s purview and unmanaged one-offs, there are always some but overuse will lead to anarchy;
- Things installed in /opt are for whatever is so foreign and hopeless in not conforming to the usual factoring that you just give up and put it in its own little padded cell (hello, Mathematica);
- Everything is generally configured using files in /etc, possibly with the exception of some of the special snowflakes in /opt; the package manager will put config files meant to be edited there and expect the admin to merge any changes in manually, and sometimes put default settings meant to be overridden by them in /usr/share (see below)—both approaches can be problematic, but the difficulty is with migrating configuration in general, not the FHS as such.
There used to be additional hierarchies like /usr/X11R6, and even a /usr/etc on some (non-Linux?) systems, but AFAIU everyone agrees their existence makes no sense (anymore?), so much that even FHS doesn’t lower itself to permitting them.
The distinction between / and /usr might appear to be pointless as well, and nowadays it might be (some distros symlink them together), but previously (especially before initial ramdisks were widespread) stuff in / was whatever was needed to bring up the system enough that it could netmount a shared /usr.
Inside each of /, /usr and /usr/local there is bin for things that are supposed to be directly executable, whether binary or a script and all in a single place; share and lib for other portable and non-portable (usually but not necessarily text and binary) shared files, respectively, segregated by application or purpose; finally, due to the dominance of C ABIs and APIs on Unices, the top level of lib also hosts C and C++ library files and there’s an additional directory called include for the headers required to use them. Some people also felt that putting auxiliary executables (things like cc1, the first pass of the C compiler) inside lib was awkward so they created libexec for that purpose, but I don’t think the distinction turned out to be particularly useful so not all distros maintain it.
That’s it, basically. There are subtler but logical points (files vs subdirectories in /etc) and things people haven’t found an obviously superior solution for (multilib and cross environments), and I made no attempt to be historically accurate (the original separation of / and /usr happened for intensely silly reasons), but those are the fundamental principles of the system, and I feel it does make sense as a coherent implementation of a particular design. Other designs are possible (separation by application or package not purpose, Plan 9-ish overlays, NixOS’s isolated environments), but that’s a discussion on a different level; the point is that this one is at the very least internally consistent.
Re the unfriendly names ... I honestly don’t know. Newbie-friendliness matters, but it’s not the only thing that does; particularly in a system intended for interactive text-mode use, concise names have a quality of their own. There’s a reason I’m more willing to reach for curl and jq rather than for httpx and lxml, for regular expressions rather than for Parsec, and even for cmd.exe, as miserable as it is, rather than for PowerShell.
I feel weird that no HCI people seem to have seriously considered the tension between interactive and programmatic environments and what the text-mode user’s experience in Unix says about it, but even Tcl, which is in many ways a Bourne shell done right, loses something in casual REPL use when it eliminates (as far as idiomatic libraries are concerned) short switches. Coming up with things like rsync -avz or objdump -Ctsr is not very pleasant initially, but I certainly wouldn’t want to type out the longhand form that would be the only possible one in most programming languages (even if I find their syntax beautiful, e.g. Smalltalk/Self).
>the original separation of / and /usr happened for intensely silly reasons
As I recall, there were very good reasons for separating / and /usr (as well as /home and /var). The biggest one was that various Unix kernels would panic[0] if / was full. But that issue was almost universally fixed by 1990 or so.
And netmounts of pretty much everything other than / were pretty common for many years, due to the high cost of storage.
So no, the reasons weren't silly, they just don't apply to more modern systems.
OK, I didn’t put this completely correctly. The original separation of /usr to hold user home directories (!) and / to hold everything else was because the first RK05 disk ran out, but it makes sense in any case. The additional hierarchy under /usr was created some time later when space on the first RK05 disk ran out again, and while this can be a perfectly sensible decision for a single installation on a single site, taking it seriously decades later is silly. Neither does that mean that there weren’t good reasons the split got preserved in subsequent systems, just that they couldn’t have been the same as the original ones; there are no netmounts in V6, after all.
(I have an old Unix intro book that describes /usr as user home directories, the rest is a second-hand retelling[1].)
Follow your distribution. For example Arch Linux provides PKGBUILDs for official repos and AUR. Most of the time someone has already published PKGBUILD, but if not I just patch accordingly.
And conditions that formed separation are long gone, Arch Linux symlinks most of it:
I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion. Even better, because there are so many similar places, you might choose completely different ones depending on the day of the week and your current mood.
Too much choice for things like this is harmful IMO. Deep down I truly couldn't care less where the files end up, as long as that place is the 'right' place. There are too many 'right' places which makes it hard to find random things at a later date or when on a box you're not super familiar with. It's also a complete waste of time to think about it at all.
It’s not just you: Every distro is its own special snowflake and patches the programs they distribute to store files in a different place.
The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
> The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
The FHS does actually answer both of those questions. Files inside /etc/ should be grouped in subdirectories[0] and the HTTP server should serve user-specified website files from /srv[1] and normal distro-provided files (such as the apache test page) from /var[2].
"use subdirectories" is probably the most handwavey answer possible, aside from maybe "just put it somewhere, lol". I feel like the standard could provide some sort of guidance on how to name folders or something.
> HTTP server should serve user-specified website files from /srv
I’ve never seen that in my life, but I’m sure someone does that. This is one of those cases where the people who follow the standard are increasing fragmentation
> I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion.
So, given some data, say a file and/or directory, maybe from saving a Web page, that is relevant to subjects A, K, T, and Z, where in the file system directory trees to put that data?
My solution: Put the data in a directory for one of the subjects A, K, T, or Z without thinking very hard about which of these. Then go to a file I call FACTS.DAT (right, an old idea with an old 8.3 file name!). I maintain that file with a few, simple editor macros. So, sure, the file is a catch-all for entries of random short facts. And each entry starts with a time-date stamp and a list of key words. So, in the case of subjects A, K, T, or Z, include the key words appropriate for each of those. Then in the body of the entry, put the tree name of the file/directory where I did store the data.
In a few seconds with my favorite text editor I can append an entry or search for an entry.
So far this year I have put 686 entries in the file FACTS.DAT for about 2.1 entries per day. For anything like current personal computers, handling such a file is trivial.
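The same idea can be sketched in a few lines of Python instead of editor macros. The entry layout below (a timestamp, a `|`, comma-separated key words, then the path on the next line, with a blank line between entries) is an assumption of this sketch, as are the example paths; the comment does not specify the actual format:

```python
import os
import tempfile
from datetime import datetime


def format_entry(keywords, body, when=None):
    """Build one entry: 'timestamp | key, words' header, then the body."""
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} | {', '.join(keywords)}\n{body}\n\n"


def append_entry(path, keywords, body, when=None):
    """Append an entry to the catch-all file (body must not contain blank lines)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_entry(keywords, body, when))


def search(path, keyword):
    """Return every entry whose key word list contains `keyword` (case-insensitive)."""
    with open(path, encoding="utf-8") as handle:
        entries = [e for e in handle.read().split("\n\n") if e.strip()]
    wanted = keyword.lower()
    hits = []
    for entry in entries:
        header = entry.splitlines()[0]
        keys = [k.strip().lower() for k in header.split("|", 1)[1].split(",")]
        if wanted in keys:
            hits.append(entry)
    return hits


# Hypothetical usage: data saved under subject A, findable via A, K, T or Z.
facts_path = os.path.join(tempfile.mkdtemp(), "FACTS.DAT")
append_entry(facts_path, ["A", "K", "T", "Z"], "/data/A/saved_page",
             when=datetime(2021, 11, 1, 12, 0, 0))
append_entry(facts_path, ["B"], "/data/B/other_notes")
hits = search(facts_path, "k")
```

Searching on any of the four key words returns the one entry, whose body says where the data actually went.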
I feel like it just highlights how antiquated and confusing Linux terminology is: so many of those reference "single-user mode", used to refer to booting into root, when the vast majority of computing devices a given user will interact with only have a single actual user, making this a confusing and almost meaningless distinction to someone not already intimately familiar with Linux.
Want to see true craziness? POSIX file names are just a bag of bytes. They don't even have to be text, they can be anything (almost), there's no standard text encoding:
And in typical Open Source fashion, someone actually claims it's a feature: https://lwn.net/Articles/325398/ because hey, you 99.999% percenters can suffer so that I, 0.001% percenter can implement my wacky system.
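A short Python sketch of what "bag of bytes" means in practice, assuming a POSIX system (the file name is made up). Python's `os.fsdecode` and `os.fsencode` exist precisely so that names which are not valid text can still round-trip:

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café.txt" as written by a Latin-1 system: the 0xE9 byte is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# On POSIX, fsdecode smuggles undecodable bytes through as lone surrogates,
# and fsencode turns them back into the original bytes.
as_text = os.fsdecode(raw_name)
round_tripped = os.fsencode(as_text)
```

The name survives the round trip byte for byte, but `as_text` is not something you can safely print, log as UTF-8, or send over JSON without extra handling.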
The directories that house your executables are read only to users other than root, to prevent attacks and overwriting them by non-root users.
/var stands for variable data--like log files, cache directories, spool directories, etc. You shouldn't put executables there. Ideally you should be able to set the noexec flag on it.
`/usr` actually exists because the original UNIX developers ran out of disk space and had to attach another disk. The difference between /bin and /usr/bin is not worth it and even Debian symlinks /bin to /usr/bin.
But your distribution's package manager should be putting stuff in /bin or /usr/bin, not you. Anything that matches the glob "*/local*" is something the system owner can do whatever with. So you should be using /usr/local/bin or $HOME/local/bin. I don't know why there's no /local off of the root. (One thing I do on my own systems is make and use an /etc/local although I think you're supposed to use something like /usr/local/etc).
/opt is for third party programs that aren't installed via your distro's package manager.
If you do this, any customizations you make to a system can be easily backed up by copying all dirs with local in the name.
There's multiple decades of tradition behind these names, but they do date back to the age where actual teletypes were used.
MacOS too. /usr/ and /dev/ and whatnot exist, they're just flagged as invisible in Finder. There's a command to globally unhide them for those who want to see them.
Is it coincidence that you almost exactly replicated what macOS has? Except that /Devices is /Volumes, .../Apps is .../Applications. and /Boot is handled differently.
Of course, that's not perfect either, because a) decades of changes vs. compatibility have made it less clean in certain places, and b) pretty much all the POSIX paths still exist for unix-y compatibility, but overall it's like that.
> I wonder how hard it would be to rearrange the folder structure in linux
Restructuring the directories is the easy part. You just delete the old tree and make a new one. You can also mount procfs and sysfs wherever you want.
The hard part is modifying existing software to work with the new tree. So many programs assume you have a "standard" file system tree. So many programs assume procfs is mounted at /proc. So many programs have hardcoded paths. Shared library locations can become part of the binaries when they're compiled. It's insane and you'd essentially be creating a new Linux distribution.
I know this is completely tangential. But you can Win-R and just type Documents and it will load your documents folder. Same for downloads, pictures, temp (windows temp), and I'm sure many others.
Works from File-Open dialogs and address bars and even in the command prompt you can even do "explorer documents".
Yeah, it's a junction point, but it's also useless. Open a command box and CD to it; now what? A file explorer and set it as the directory, again, now what?
Huh, spaces. There's way too much software, especially on Windows, that breaks when there are Cyrillic characters in a path. I'll let you guess how I found out.
The problem isn't the Cyrillic or the é but the fact that Windows lets you put those characters in file names in non-Unicode encodings which will create sequences of bytes which are invalid UTF-8. It's 2021, FFS, stop using legacy encodings.
All win32 functions that accept or return strings come in two varieties, with A and W suffixes, MessageBoxA/MessageBoxW. The A works with the system default 8-bit encoding (cp1251 in case of Cyrillic), the W works with unicode in wide chars. There shouldn't be much of a problem with string handling if you stick exclusively with W functions.
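The encoding side of this can be illustrated in Python without calling any Win32 function. This sketch only models the byte representations (the function names and sample file names are made up); note that the real A functions may substitute a placeholder character where Python raises an error:

```python
def ansi_bytes(name: str, codepage: str = "cp1251") -> bytes:
    """Bytes an 8-bit code page would use for `name` (models the A variants)."""
    return name.encode(codepage)


def wide_bytes(name: str) -> bytes:
    """UTF-16 little-endian bytes for `name` (models the W variants)."""
    return name.encode("utf-16-le")


def survives_codepage(name: str, codepage: str = "cp1251") -> bool:
    """Return True if `name` can be represented in `codepage` without loss."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        return False


cyrillic = "файл.txt"    # representable in cp1251
accented = "résumé.txt"  # "é" has no cp1251 code point

cyrillic_ansi = ansi_bytes(cyrillic)   # 8 bytes, one per character
cyrillic_wide = wide_bytes(cyrillic)   # 16 bytes, two per character
```

`survives_codepage(cyrillic)` is true and `survives_codepage(accented)` is false, which is the whole problem with the A variants: whether a name works depends on the machine's code page. The cp1251 bytes are also not valid UTF-8, so a program that assumes UTF-8 chokes on them.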
Using the W functions has been the advice from Microsoft's documentation for ages. But people still use the A functions because they're easier, especially when writing cross-platform software since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide.
Fortunately the future of the Windows API does look better, since Microsoft added proper UTF-8 support in Windows 10 version 1903. All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.
I would rather they added a U suffixed version and better still backported that all the way to Win 7. Now in 3-7 years people can write programs that use the A functions, but have to check the version of Windows and refuse to run if it isn't new enough.
It's possible! It seemed like a sensible choice back in the early 90s when the answer to making a system for global use was UCS-2. I know Java was another one that went with that decision.
If you have a username with your full name (plus point if you have special characters in your name), you will get the whole deal with shitty programs. I’m not sure if it’s me, but there were cases I simply could not use a program installed in such a location, to the point where at my previous (admittedly shitty) workplace, we often installed software in a root location…
And that, children, is when marginalia_nu unlocked the seventh circle of the inferno. Tomorrow we'll read the story of how our new demon overlords forced us all back to Windows 3.1.
I wonder how much global work could have been saved if Microsoft also provided a covered interface for all paths in the system. Not sure if there is any, but one good implementation might save thousands of poor implementations required to handle it.
On the other hand their case sensitivity behaviour means that “cross-platform” Java applications can break if they are run on a non-windows platform where opening files is case sensitive (unlike on windows)
Easier to add a flag to ignore case rather than fix bugs where files only differ by case and are therefore overwritten on a case-insensitive filesystem.
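A cheap guard against the overwrite case is to check a file list for names that differ only by case before shipping it. A minimal Python sketch, using `str.casefold` as an approximation (real filesystems apply their own folding rules, so treat this as a lint, not a guarantee):

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would collide on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, `case_collisions(["README.md", "readme.md", "Makefile", "src"])` reports the two README variants as one colliding group.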
And then to really mess you up and ensure you handle parens properly, threw “(x86)” into the mix. (A real pain on some REPLs as well as dealing with environment variables).
Except for programs that were too old / obscure to fix I guess. I think at least the Symbian Development Kit was such that builds would fail with strange errors if you installed it in any path other than the default immediate subdirectory of C:\, let alone under "Program Files".
It not only keeps people on their toes due to the whitespace. The folder name is even localized. E.g. with German settings there is C:\Programme and C:\Programme (x86).
I know that at least like, idk like 3-5 years ago, when I had gotten a new windows laptop (windows 7 or 8 I think), setting the main account to have the name "" (without the quotes), caused some problems with the basic functioning, including, I think, with some pre-installed programs,
So, some things were still being handled not quite right (whether that's because it shouldn't be allowed to be the username, or because programs should handle it being in the path, I'm not sure, but probably one of those.)
I just wish they had a decent way to execute programs with arguments that might include spaces. But no, every program can do argument delineation differently.
It doesn't even have to be complex, often basic automation tasks fail with spaces and special characters. Honestly, treating a file system like a natural language processor is a bad idea. Besides at this point with how digital we have all become who can't understand...
thisismyconfig.txt vs this is my config.txt or this_is_my_config.txt
...I've forced myself to stop using spaces, special characters, and even caps. They are all constructs that provide minimal value for the extra complexity.
SV echo chamber is on your side here - it is very in vogue to denounce anglocentrism. they were defending hieroglyphs and emoji in variable names in that thread about invisible javascript backdoor a day or two ago if you'd like a recent example
Could you please stop posting ideological battle comments to HN? We ban accounts that do that, regardless of their ideology, because it's (a) not what this site is for, and (b) destroys what it is for.
But Hacker News should do something about all of the anti-bitcoin and anti-anti-nuclear ideologies running around in here.
I don't really mind it that much but it'd be nice, it's really the only 2 extremisms I've experienced here, all other subjects are discussed in a fair manner.
I appreciate informed discussion about bitcoin & nuclear, as both topics are highly relevant to the technical, business, and hacker roots of HN. They seem distinctly different from, say, "anglocentrism" @dang was calling out.
What does "fair" mean in this usage? If it means one position attracts a lopsided balance of comments either for or against then surely that's always going to be the case?
Otherwise what is your proposition, don't state any opinion unless you find a counter opinion commenter to match with?
Lots of folk here are pro-privacy and lots of folk are anti-bitcoin (and some of them will be the latter because they're the former) so I don't understand how you'd extend your position in a way that leaves HN with any value.
cestmaconfig.txt vs cest ma config.txt vs cest_ma_config.txt
It's the same in any language.
Hugs who hurt you.
I'm also pretty sure most of us in any language use Slack, SMS or other forms of communication where text isn't necessarily presented in a grammatically correct manner and we all figure out what the person is saying.
I'm not sure, but my gut instinct is that it wouldn't help. Dyslexia rates are much lower in China, so I suppose we could start naming files with Chinese characters (on systems that support Unicode). It would take a bit to get used to, but eventually we'd develop a pidgin language for when we talk about software, much like how if you overhear Chinese or Vietnamese developers they will mix English words like "linked list" into their sentences, because there's not a more natural sounding alternative.
Switching to Chinese would also help eliminate the spaces issue.
tbh I'm not dyslexic and realized the spaces make it really difficult to know what the filename actually is. If you just take the second example, how would you know if the file was "this is my config.txt" versus "config.txt"?
Aside from parsing errors it just seems to lend itself to ambiguity.
This. People are saying spaces improve ergonomics. Unless everyone always quotes their paths in documentation, emails, etc -- which they won't -- I say it actually reduces readability.
Also, programs that automatically turn paths into links don't work with spaces.
I'm similar, but I would like to support labels intended for humans, along with various translations, as metadata on top of e.g. filesystem path components.
You nailed it - getting rid of spaces and dashes and underscores is extremely human-hostile. People added spaces to the English language for a reason, and that's because they make it way easier to read.
Your system is only intended for other programs to interact with? Go nuts, make hex UUIDs. Actual people are supposed to use it? You need separator characters.
I also don't see how those characters add "extra complexity" unless you're doing dumb things like text processing on paths and filenames (as opposed to using OS/library functions that handle paths correctly) - in which case, there's your problem.
Why stop there. A computer works more efficiently with numbers rather than strings, so let’s just give each file a number instead of a string. Besides, at this point with how digital we have all become who can’t understand… But wait, that already exists and is called an inode.
A file system has a human interface and a computer interface. Don’t mix them. Let users give file names in whichever way they please.
I set my nickname to U+FFFD at one point in one work system, resulting in a variety of bug reports and concerned emails. I think I dropped it since it was generating false reports from people who didn't check what character the page contained before reporting it.
One of the systems I built is being used by a group of younger people. I included an emoji in the superuser account name, just to make sure it would work. And to remind me to think more broadly about user input.
A related tip for CI: change the system time to be a time zone that is during your work hours in a different day already than UTC. Really helped getting failures earlier than 4pm PST.
At my last job we had a wild time-zone bug that only happened with your system location set to Mumbai. I left mine set to that for the rest of my time there.
Could you consider rephrasing this? It sounds like an interesting observation that I'd love to understand, but I'm genuinely not able to parse it.
My best guess is "change the system time to be a timezone for which, during your work hours, the other-timezone is in a different day than UTC is" - but I'm still not sure what effect that would have on CI failures.
I read that as "set your CI to run earlier in your workday so you don't get new error reports at the end of the day." Midnight UTC being 4 pm/16:00 PST.
Maybe an example of the failure this detects helps: when I used to work on Rails apps in the olden days it was easy to call Time.now and get the local time instead of Time.zone.now to get UTC time. This often led to wrong dates, but tests would only fail once it was a new day in UTC land but still the old day in the local time zone. Making the CI machine's system time something like Fiji time really helped in getting failures much sooner after changes were pushed.
To have such thoughtful coworkers. On an old team I had two coworkers named Chris, and once in a blue moon when they reviewed each other's code, master would start crashing because one of them accidentally left in an absolute path starting with "/home/chris/".
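A minimal sketch of that failure mode, in Python rather than Rails. The fixed UTC+12 offset is an assumption standing in for Fiji time (daylight saving ignored), and `instant` is an arbitrary example moment:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+12 offset approximates Fiji time (DST ignored).
FIJI_LIKE = timezone(timedelta(hours=12))

# One instant, early afternoon in UTC.
instant = datetime(2021, 11, 11, 13, 0, tzinfo=timezone.utc)

utc_date = instant.date()
local_date = instant.astimezone(FIJI_LIKE).date()

# The same instant falls on different calendar days in the two zones,
# so code that mixes "local today" and "UTC today" disagrees with itself.
print(utc_date, local_date)
```

With a zone that far ahead of UTC, the two dates differ for half of every day, which is why a mix-up tends to show up quickly instead of only late in the afternoon.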
I add a Japanese character into any .py, .js and .html file to ensure that Unicode is working properly through the entire chain. Mostly in form of a variable which gets passed along, even in URL parameters.
I used to have a space in my user name and even contemplated adding a bit of non-1252 Unicode. You find a lot of issues, but unfortunately often in tools you have little control over, and you end up not being able to work effectively at times. It ended up being more frustrating than helpful.
I once returned a printer because the Mac driver and support software expected and enforced case insensitive access and basically couldn't install properly on my case-sensitive HFS+ volume. It half installed and blatantly just didn't work in any way when installed.
Circa Y2k, I learned that the OSX Palm Pilot software didn't work with case-sensitive file systems. I've since given up and stuck with the default. (I'm anti-case folding in general, because of the ambiguity.)
I also enjoyed doing that, but had to make a DMG just for Steam because it straight-up refuses to run on a case sensitive FS (that's true on Windows, also, which I suspect is how we all got here). I think the most recent Steam versions either caught wind of my trickery or -- more likely -- run something from $HOME/Library/SomethingOrOther and thus the work-around no longer works.
When I got a new Mac, I just gave up and acquiesced to the case-retentive world :-(
I have coworkers on Mac that write node/JS code. Every once in awhile I'd pull down the latest code and it wouldn't run. I'm on Linux.
Sure enough, they had SomeFile and were importing Somefile and it works fine on Mac but not on Linux (which, of course, is what our production servers use). It amazes me that "works fine on my machine" is still a thing when I definitely worked at companies that solved this back in the 2000s. It was solved. It was done. Then devs became enamored with running everything locally. Even dozens of microservices or databases. Even though JS is fairly isolated, you still have NPM packages that need built against the local OS and C/C++ library and compilers, etc. Which also has caused issues in the past.
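One way to catch this before it reaches a case-sensitive server is to compare the requested name against the actual directory listing. A rough sketch (the helper name `exists_with_exact_case` is made up for illustration, and it only checks the last path component):

```python
import os
import tempfile

def exists_with_exact_case(path):
    """True only if the final path component exactly matches a directory entry."""
    parent, name = os.path.split(os.path.abspath(path))
    # os.listdir returns names as stored on disk, even on case-insensitive volumes.
    return name in os.listdir(parent)

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "SomeFile.js"), "w").close()
    exact = exists_with_exact_case(os.path.join(tmp, "SomeFile.js"))
    wrong_case = exists_with_exact_case(os.path.join(tmp, "Somefile.js"))
```

On a case-insensitive Mac volume a plain existence check would accept both spellings; the listing comparison rejects the second one on either kind of file system.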
Does Docker abstract filesystem behaviors like this? I always thought that it stopped at the libc level - that is, libc is included in the container, but it calls the host kernel's system calls, and so inherits the host kernel's behavior (including things like underlying filesystem case sensitivity).
I thought the name for the collection of kernel features was LXC, I didn't realize (until just now) that was the name only for the also-kernel-level wrapper for those features, which name does not cover the features themselves. That is, I didn't realize that LXC is to Cgroups+Namespaces as Libvirt is to KVM—I thought LXC, as a label, covered the whole feature-set—but regardless, it's still married to Linux kernel features and runs on other platforms under virtualization, no?
In my case, and for many people writing desktop software, and for absolutely everybody writing open-source tools or libraries, unfortunately you can't control the environment.
Non-ASCII paths are extremely common (e.g. the user's home directory on Windows, for the large majority of users outside the English-speaking world) and spaces, punctuation and weirder characters will definitely happen when you least expect it.
Yes if you can avoid it then absolutely that's great, but I don't think most people can.
It's also not usually very difficult to deal with, as long as you actually spot the issue in the first place.
Ugh, we have the 15 character Active Directory limit now with hostnames, and a previous IT administration has imposed a convention that every name had to follow [prod|dev]-[ph|vm]-[service]-[nn]. So basically every production service is prod-vm-owtf-01— you get exactly four characters to actually describe what the machine does. Works great when the service is "jira" or "wiki", but there are a lot that are pretty mystical-sounding, like jkns, jwrk, cntr, hrbr, etc, where you kind of just have to know.
I kind of like that honestly. No doubt you need some documentation so everyone knows what the service abbreviations are, but after you've been working there for a month you get it. Makes everything clean, consistent, and informational. You can quickly ascertain what a specific host is doing just from the name.
Oh absolutely it makes sense to have a standard, and being able to tell at a glance if something is a VM or physical machine is of value also. But dedicating 2/3s of the character budget to such a scheme is madness. If the prod-vm- prefix simply became pv-, then you'd at least be able to do pv-jenkins-01 again.
Anyway, all this was fine when we were on LDAP rather than Active Directory. So basically it's all Windows' fault.
Yes, and for many of the web-serving machines, that's what happens, they're jenkins.example.com or containers.example.com or similar. But often a singular service is backed by hidden worker nodes, databases, whatever else, and it seems silly to give those machines that level of indirection vs just using the hostname as their sole identifier.
only allow ASCII, maybe dashes, and up to twelve characters. Problem solved
...and only hire people from the exact same background as you, who will never have unusual characters or accents in their name. And also make sure not to have any users who aren't exactly like you, and conform to this very narrow requirement. Surely, excluding 90% of the world won't hurt revenue in any way.
Use strict schema for the hardware interface, networking, physical stuff the user never sees. Microservice names don't need to be non-Latin. Database replicas, infrastructures, etc. And you're not going to piss off employees by giving them ASCII ldap/email addresses.
Use utf8mb4 or similar for storing names. Don't state "first" or "last". I've been through this rodeo too many times. You're not surprising anyone.
You can use an "ASCII-fied" version of the name, only ~27% of mine can be typed in ASCII letters that look similar but the rest is just phonetically or visually close-enough letters. This is something people did for decades and nowadays even government IDs have an ASCII-fied (well, Latin-fied) version of the name.
I maintain a similar system, where a variety of companies submit files that get processed through multiple services - it is astounding how ridiculous people’s naming of files can be; spaces are the least concerning!
> it's been astounding the number of bugs that have appeared over the years triggered by spaces and other unusual characters in file names
If you consider spaces “unusual” I would say you haven’t encountered a single average user in your lifetime. Spaces in file-names is the single most common thing people have, outside programming environments.
As an x-plat developer, the only platforms where I (still) regularly encounter these kinds of bugs are those where solving problems through scripting is common, like Linux, where the primary means of operation is through stringly-typed statements getting parsed and processed in an untyped fashion. It's not very reliable.
On Windows people more often use “real APIs” (because scripting doesn't really work as well), and then these problems just go away.
It's especially funny that it affects Linux so much. Most file systems allow everything except `/` and NUL in file names. Early AT&T UNIX even allowed NULs! POSIX shells use the IFS variable to perform field splitting, and it defaults to <space>, <tab>, and <newline>. The choice to perform field splitting by default (particularly with spaces in the default IFS set) has caused no end of headaches for developers and users.
> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
This will also break any code in external tools that are called during the builds of your application and do not handle spaces correctly for whatever reason, thus making it so that you won't be able to successfully finish the build.
Then again, you probably shouldn't be relying on technologies like that, but when you're struggling to keep an old enterprise system alive, causing yourself more problems is not necessarily what you should do.
I spent hours trying to figure out why an entire folder suddenly stopped syncing. Turns out I accidentally added a hidden space to the end of a folder name.
Saw a few hacks where malware authors used the RTL feature (which is baked into Windows) to obfuscate file extensions. It looked like .exe.innocuous-document.docx, but was actually .docx.innocuous-document.exe
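For anyone who wants to flag such names, here is a small sketch that looks for explicit bidirectional control characters in a file name. The example name is hypothetical, and the code point list covers only the embedding, override and isolate controls:

```python
# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def has_bidi_controls(name):
    """True if the name contains any explicit bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in name)

# Hypothetical file name: U+202E makes the text after it display right to left,
# so the visible tail no longer matches the real ".exe" extension.
suspicious = "innocuous-document" + "\u202e" + "xcod.exe"
plain = "innocuous-document.docx"
```

The stored extension is still `.exe`; only the rendering changes, which is why checking the raw code points is more reliable than looking at the name.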
My favorite filename special character bug was when I implemented CD ripping in 2005, and one of our beta testers ripped a CD with a song called "Have You Ever?". My code wasn't prepared to filter out the question mark on Windows.
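A sketch of the kind of filter that would have helped, assuming the usual Windows rule that `< > : " / \ | ? *` and ASCII control characters are not allowed in file names. The function name and the underscore replacement are arbitrary choices, and reserved device names (CON, NUL, ...) and trailing dots or spaces are not handled:

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def sanitize_for_windows(name, replacement="_"):
    """Replace every reserved character in a single file name (not a full path)."""
    return _WINDOWS_RESERVED.sub(replacement, name)

track = sanitize_for_windows("Have You Ever?.mp3")
```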
I changed my username to not contain a space because it was too annoying to deal with all the random dev tools breaking. The worst offender was probably npx on Windows [1] (resolved after four years by deprecating npx), but it was far from the only one (though the JS ecosystem was somehow the worst in this regard of all languages I worked with).
I don't know if it's still a problem, but it used to break Python virtualenv badly. If your working directory had a space anywhere in the path, it would throw a huge fit and not work. Which is problematic when the expected name for a Mac's boot drive is "Macintosh HD" (if you ever had a reason to run a virtualenv outside of your home directory).
Even capitalization is a pain in the ass thanks to how OSes treat file names. I pretty much stick with either `file-name.ext` or `file_name.ext` exclusively now.
Somewhat related to injecting unusual characters, in my experience in localization efforts:
Inject a Turkish 'I'. I don't know how to type or paste it here, but picture an English lower case 'i' that is upper case. It is a splendid way among many to shake out some loc bugs.
That would only shake out anything if you'd also test in a Turkish locale, wouldn't it? Since Unicode casing rules are locale-dependent and en-US doesn't care much about dotless i or dotted i.
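Even without a Turkish locale, the characters alone can shake something out. Python's `str.lower()` and `str.upper()` ignore the locale, so this sketch only shows the default Unicode mapping, but that is already enough to break code that assumes case conversion keeps the length or round-trips:

```python
dotted_capital_i = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"    # LATIN SMALL LETTER DOTLESS I

# Lowercasing the dotted capital yields "i" plus a combining dot above.
lowered = dotted_capital_i.lower()

# Uppercasing that result does not give back the original character.
round_trip = lowered.upper()

# The dotless small i uppercases to a plain ASCII "I".
upper_dotless = dotless_small_i.upper()
```

Locale-aware casing (where "I" lowercases to the dotless form) needs a library that takes a locale argument; the built-in string methods never do that.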
Late '90s I worked on Java software that got installed on several Unix platforms, including Linux for IBM mainframes. When you deal with the default en/de-coding of Unicode to EBCDIC you never have trouble with Java byte encodings ever again.
> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
The problem with that is that YOUR code may handle it, but your tooling may not. If my code formatter breaks on spaces, I'm not going to change the formatter.
It's easy to tell users to make a folder with no spaces if you're setting up a global path, however if you have an application that runs in user directories things can become painful fast. Changing your user name is a pain and can leave things inconsistent, but having to handle all the variations in people's names with spaces, punctuation, international characters, can just be mind boggling.
I'm begging software developers to stop using subprocess APIs that take a string argument (system(), child_process.exec(), Process.Start(string)) and start using subprocess APIs that take an array of arguments (execvp(), child_process.execFile(), Process.Start(string, IEnumerable<string>).)
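The same distinction exists in Python: `subprocess.run` accepts a list of arguments. A small sketch, with a hypothetical path and a child process that only reports how many arguments it received:

```python
import shlex
import subprocess
import sys

# Hypothetical path containing spaces; it does not need to exist for this demo.
path = "/tmp/my project/input file.txt"

# A child process that just prints how many arguments it was given.
child = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]

# Array form: the path arrives in the child as exactly one argument.
result = subprocess.run(child + [path], capture_output=True, text=True, check=True)
arg_count = int(result.stdout)

# String form: a shell-style splitter breaks the unquoted path into three words,
# which is what happens when a command line is built by concatenation.
naive_words = shlex.split("cat " + path)
```

With the list form there is no quoting step to forget, because the command line is never flattened into a single string on the way to the child.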
The only common place where it doesn't work is in CMD for executing programs and as arguments for built-in commands. Everything else goes directly to the relevant APIs which don't care about / or \.
These days using CMD instead of PowerShell should be rare enough and PowerShell certainly doesn't mind the slashes.
More importantly than your source files, put your testing data on such a path as well. Nobody uses absolute paths in testing so it doesn't matter how many spaces your absolute path has if your input is "./tests/file1". Put those files in a folder with spaces too and throw in a unicode character for good measure.
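A minimal sketch of that idea as a test helper. The helper name and the directory prefix are made up, and it assumes the file system accepts non-ASCII names:

```python
import os
import shutil
import tempfile

def make_awkward_test_dir():
    """Create a scratch directory whose name has spaces and a non-ASCII character."""
    return tempfile.mkdtemp(prefix="test data \u00e9 ")

test_dir = make_awkward_test_dir()
try:
    sample = os.path.join(test_dir, "file 1.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("hello")
    with open(sample, encoding="utf-8") as f:
        content = f.read()
finally:
    # Remove the scratch directory even if the round trip fails.
    shutil.rmtree(test_dir)
```

Pointing every test's input files at a directory like this means the awkward path is exercised on each run without anyone having to remember it.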
I did something similar on accident. I used to keep all my development work synced with Dropbox and I had a work and a personal account. So any of my own projects would have /Dropbox (Personal)/ in the path which did catch some bugs. Dropbox renamed my folder to "Dropbox (Personal)" automatically when connecting a work account.
Overly aggressive is right! I don't know if this is genius or deranged! I'm leaning towards genius and stealing the idea.
By the way: what's your beef with en dashes? I mean, if it was "everything should be 'HYPHEN-MINUS' (U+002D)", then fine, but why specifically en dashes and not em dashes?
I totally agree that for some people, this could be a terrible command to have around. However, I know that it has been working for me for about 8+ years or so. I almost always run it in my ~/Downloads folder on files that I don't really care about. I download a lot of academic papers and books, and this just saves me a lot of time to put files in the format I like: author--paper-title.pdf. And that's part of the reason why I make all of the dashes the same, so if I'm opening something by an author, I can easily autocomplete and not have to remember how to make other sorts of dashes on the command line.
That's the most beautiful part! After running this script there are no more conflicts, because it just silently overwrites all but one version of the "cleaned" filename.
(Also—that entire function is super inefficient and could be replaced with a single invocation of "rename".)
Totally inefficient. But for me it's readable and practical. This is mostly just a convenience function to help me store files in a format I like rather than something I need optimized. If it ever started to feel slow, sure, I could optimize. But for now, when I occasionally download a file that has some weird character, I just prefer to add another line to my function.
Without changing the design too much, you could rearrange it like so to avoid renaming multiple times and still have the option to just "add another line":
Though I would at least take advantage of character classes to reduce the number of substitutions:
# Rename all files in a directory
rn() {
  rename \
    -e 's/[ _—]/-/g' \
    -e 's/[:\(\)\[\]",]//g' \
    -e "s/'//g" \
    -e 'y/A-Z/a-z/' \
    -e 's/--+/--/g' \
    *
}
(I'm using the `rename` command provided by the `rename` Debian package, a.k.a `file-rename`. The options may vary if you're using a different version.)
Yeah, the physical layout is the primary concern. I should have noted that since there is ambiguity because n and m also happen to be next to each other in the alphabet.
I use this snippet, to change spaces to underscore for directories and files in the current directory and below. Haven't made it a function yet, but should. I got it from stack overflow or somewhere, but no attribution. Thanks to whoever did it first:
find . -depth -name '* *' | while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
Yeah, sometimes I end up renaming things I don't want to, but it really doesn't happen all that often. And sometimes I throw caution to the wind, add some excitement to my life, and rename a bunch of files (not for anything professional) in some really old directory and hope I don't break anything. But I'm not aiming for perfect with this comment. I just mentioned in another comment, but the vast majority of times I run this is in my ~/Downloads folder on files I don't really worry about breaking.
I've thrown some edge cases at it, and it handles them super well. It deals with consecutive "_", removes leading garbage, normalizes Unicode, and even prevents naming conflicts by opting out early.
I agree. I am a developer and I know how to deal with special characters. But this isn't something I use professionally. I just prefer not to have to deal with special characters in the pdfs, m4as, txts, and other files that I use on a daily basis. When I write papers, I'll write ū or Ñ or ç or whatever (incidentally, I have a lot of shortcuts in my .vimrc for those). I would not say I am "afraid" to use spaces in filenames, but I get a certain satisfaction storing academic papers in the author--paper-title.pdf format and my notes in author--paper-title.md because it helps me find things.
Define "space". Is the Hangul filler we talked about yesterday a spacing character? Is the zero-width non-breaking space a spacing character? What about the typographic spacing characters?
You had better be very afraid of using spaces in filenames.
You should do everything you can to support them but you have to know you'll invariably encounter countless cases where you'll have this or that tool that won't work properly with them.
I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
Every Git repository in the world has that example: let that sink in.
> FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
I use Git for documents too, not only code. Why shouldn't I use my native language?
That's actually a good point. On the other hand, not all languages use IMEs. Mine just uses the AltGr modifier key, but is otherwise just a standard QWERTY layout without any features.
yup, I type in pronunciation and let it guess what I'm trying to say. Works okay in editors but doesn't work great with shells in a terminal emulator, so I just prefer not having to use it in shell operations.
You get all those space characters working and then some jerk comes along and uploads a file like this: ŗ̶̧̢͓̳͍͙͔̳̻̥͉̭͓̫̟͍̞̭͉͓͉̮̹͍͚̳̹̬͉͚̰͈̘̐̊̾̈̀̒͒̀͛̓̋̔͊̏͘̚ę̴̨̛̣͙̤̟̬̩̟͙͖̥̹̱̱̊͑͗̇̇͛̆̈́̃͋̓̀̔̍̍̌̐͊̎̓̅̀̕ͅģ̴̹̜̘͍̱̑͐̉̌̐̄̊͛̎́̐̌̅̈́͂͑̈́̋̔͂̊̊̒̒̔͛͆̚͘̕͠e̶̙͕̫̳̘͐̾́̑͆̓͂̿͊̊̍͛͐̌̆͗̌̅̅̔͊̂͛͗̅̕͝͝͝͝x̵̢̧̦̫͖̝̥̹͓̬͖̤̩͚̝̫̋̃̅̈́̆͋̌͑́̎̈́̊̾͒̀̒̎̓͛͊̿̓͊̀̍͐̆̚͝͝-̴̨̮̯͖͖̠̜̲̪͕̘͈͖̮̈́̓̐̃́̅̄̏́̍̉̐͌́̔̓̄͋͗̐̕͜͝ţ̴̢̧̖̗͖̞̮̫̦̼̝̺̼̱̳͓͉̜̟̤̲͖̻͙́̌̈̌̈͆̾̄͊̿̏̓͗̈́̕͜ͅh̶̢̧̨̥̭̼̟̣͖̯̗̤̖̙͉͕̙͎̰̠̝̖͈̻͙̪̮̘̯̻̼͕͓̖̣͈̽́͊̎͐͌̆̍̎̏̿͐̒́͋͑̍̿̎͆̑͆̄͂̀͐̄͑̀͗̿̽̎̾̊̕͝͝͝͝͝ͅi̴͚͈͍̫̮̝̣͖͉͓̯̠̙̭̟̖̘̾̓̄̈́̒̏̽̆̉̿͛̀́̃̋̒̈́͋̂̇̈́͛̕͜͠͠͝ͅs̶͇̖̳̞͉̱̞͓̖͔͔͍̗͇̖̮̹̅͊̔͋͊̈́̎̐̆̋̒̀̍̕͜ͅ.̴̧͎͇̰͉̼̱̰̦̟̑̋̏͌̍͊͑̄̀͌́̆̓͛̒̆̾̉͐̄̂̈́͆̒̃͗̐̂̎̈́̈͛̿́͛̾̚͘͜͝͝ͅȩ̷̡̲̪̱̪̥̳͍̼̰̘̗̹͙͙͓̣̟̩̥̥̖̠̪̮̹̞̥̻͎͖͍̯̂͑̏̑̆̍͋̎͛̅̑̑̏̎̓̀̓̒̈́͊͌̀̈́̒̌͐͂͛̊̍̐͂́̔̌̾͐̈́̋̇̏̚͜͝͝͝͠ͅx̶̧̛͚̗̜̪͍͖̘̙͎͚͇͙̬̱̟̭͓̺̙͍̖̱͚̣̘̪̭͔͔̮͎̬̪̤̹̟͔̩͍̬͕͔̩͐̈́̒̂͛̂̈̀̿̍̔̓̓̀̃̍͆̈́̍̓̌͐̈́̾̇̎̑͌͒̄̆̿̍͆̅͗͆͘͠͝͝ͅͅͅe̷̢̡̡̨̧̛͕͚̬̮̞̥̼͍͔̝̟̝̯͈̟̥͖̱̹̣̩̼̩̅̌͌̑̎̐̀̽̏́͐̋̏̎̎͛͌̀̊͊͒̑͌̎̎̑͊̌̉͆̾̚͘̚͜͠͠͠͝͝ͅͅͅ
It's been like three internets since I heard someone using "internet" as a measurement of time.
It's actually interesting to think about "generations" of internet, just like generations of people, and how the culture shifted between them.
There was a time in the early '00s when broadband was catching on, yet YouTube didn't exist. A time when Ebaumsworld and Newgrounds ruled the internet. When Homestar Runner was pop internet culture. Weebls Stuff. The frog blender.
I disagree with calling it "corrupted." We're not tricking the browser into trying to render garbage bytes that are actually the middle of a jpeg or something. It's actually valid Unicode. It's an edge-case which is not seen in regular usage, but it's technically following all of the rules.
In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).
Specifically _Vietnamese_ combining characters. The Vietnamese writing system uses multiple combining characters at a time, and stacks them vertically. Throw in a few that wrap around the character like t҉his, some 𝑎lternatᵉ lꬲttꬲr fᵒrms, disturbing imagery, and perhaps a few other tricks, and you have zalgo. See also https://stackoverflow.com/a/1732454/823846
Interestingly I get different behaviour per browser/OS. Firefox/Linux clips it to the bounding box of the parent element, Firefox/Mac and Safari/Mac clip it to the line height, and only Chrome/Mac lets it extend further.
Firefox and Safari on iOS 15 both render all the glyphs attached to the base character. Vivaldi, Chrome and Firefox on Win10 all render them stacked and overlapping the parent and child comments.
For anyone who is curious (and acolytes of Zalgo): "In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character. So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model."
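To make the combining-mark mechanics concrete, here is a rough Python sketch that decomposes text and drops every character in a Unicode Mark category. The function name is made up, and blindly stripping marks damages scripts that rely on them, so treat it as a demonstration only:

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD, then drop characters in the Mark categories (Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

# One precomposed code point versus base letter plus combining acute accent.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)

# A base letter with several stacked combining marks, zalgo style.
stacked = "e\u0301\u0302\u0303"
```

Here `strip_marks(stacked)` leaves only the base letter, and the same function turns an accented name into its ASCII-looking form, which is roughly what "save with ASCII-only names" options do.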
768 characters is too long for macOS it seems. (References online say HFS+ has a limit of 255 UTF-16 characters. Didn't find anything for APFS immediately... edit: same for APFS)
It would be one thing if it was making other comments difficult to read or causing browser issues, but I appreciated the demonstration that both would presumably be possible on certain browsers
Until now, I haven't actually thought of what would happen if zalgotext occurred anywhere other than a web browser. Looking forward to the five minutes of fun with the file manager and whatnot.
> I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
I have an Android phone and I tell MusicBrainz Picard to save all files with ASCII-only names and Windows-compatible names for the ones that get sent over to the phone. Basically for this reason. Sometimes it's players on Android itself, but even more frequently, whatever bluetooth radio I'm connected to freaking out with non-ASCII characters.
I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string - it's a sequence of path components and should be treated as such by our APIs. A path should be parsed once - on user input - and then used in its "sequence form" throughout the software stack.
And "path component" is not an arbitrary string either - e.g. appending a path component to the path should first require converting/parsing the string into the path component, and only if that's successful appending it to the path.
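A sketch of what "parse the component first, then append" could look like in Python. `parse_component` and `append_component` are hypothetical helpers and POSIX rules are assumed; `pathlib` itself does not validate components this way:

```python
from pathlib import PurePosixPath

def parse_component(text):
    """Validate a single path component (hypothetical helper, POSIX rules assumed)."""
    if text in ("", ".", "..") or "/" in text or "\0" in text:
        raise ValueError(f"not a valid path component: {text!r}")
    return text

def append_component(path, text):
    """Append one validated component; never reinterprets text as a whole path."""
    return path / parse_component(text)

base = PurePosixPath("/home/user/my project")
joined = append_component(base, "notes.txt")

# The path is held as a sequence of components, spaces and all.
parts = joined.parts
```

Without the validation step, joining an absolute string onto a `PurePosixPath` silently replaces the base, and a string containing `..` can walk out of the intended directory.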
"Path is not a string - it's a sequence of path components and should be treated as such by our APIs."
For maximum correctness, you want to turn it into a file handle as soon as possible, and do all operations through the variations of the file functions that end in "at", like: https://linux.die.net/man/2/openat
The downside of this approach is that you still technically have to carry the path around with you if you ever want to present it back to the user, because once you have a directory handle, you can get back to the root directory easily enough by following parent links and seeing what directories you end up in, but that may not be what the user "thinks" the path is, and they want to see their path, not a canonicalized one. And they're mostly right. And it's not easy to correctly track changes to their intended path from this basis either.
Basically, I don't know of a really solid, 100% correct way to handle this with any reasonable degree of effort.
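For reference, Python exposes the `openat` style through the `dir_fd` parameter of `os.open` on platforms that support it. A minimal sketch using a throwaway directory and a made-up file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.txt"), "w") as f:
        f.write("key=value")

    # Resolve the directory once and keep a handle to it.
    dir_fd = os.open(tmp, os.O_RDONLY)
    try:
        # Later lookups are relative to the handle, not to a path string.
        fd = os.open("config.txt", os.O_RDONLY, dir_fd=dir_fd)
        with os.fdopen(fd) as f:
            data = f.read()
    finally:
        os.close(dir_fd)
```

Note that the sketch still has to keep `tmp` around as a string if it ever wants to show the location to a user, which is exactly the downside described above.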
"you want to turn it into a file handle as soon as possible"
But no sooner.
For example, I've run into problems where I'm configuring program A server to talk to file location B... but I don't have access to file location B. But the client-side library for talking to the server tries to convert location B into a file handle and then freaks out because I can't access it. When I don't want to access it. I want that program to serve it.
If it was using simple "path" objects that didn't confirm that I have access to the path, everything would be hunky dory. But because it tried to convert it into a file handle unnecessarily, I get blocked.
Why not just hold onto both? The users representation and the file handle. Only ever "display" the representation, while you do all operations on the handle. (Not trying to be sarcastic, just curious).
This goes for most instances of user input. Timestamps are the other common one people get wrong. I've even seen programs that pass around timestamps as strings in multiple formats and as integers (Unix time).
If you need to keep the timezone with it, then use an ISO8601 [0] string: "2021-11-11T15:32:35-07:00".
Otherwise, use an integer Unix timestamp, the number of seconds since 1970-01-01T00:00:00Z: 1636673555. Use an unsigned 32-bit integer or a 64-bit integer to avoid the 2038 problem [1]. JSON itself sets no integer limit, but JavaScript's largest safe integer is 2^53 - 1, so if you're using HTTP JSON RPC, you'll have to check for overflow.
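A short Python sketch of both representations, parsing the ISO 8601 string from above. The `MAX_SAFE_INTEGER` name mirrors the JavaScript constant and is defined here only for illustration:

```python
from datetime import datetime, timezone

# Largest integer a JavaScript Number (IEEE 754 double) represents exactly.
MAX_SAFE_INTEGER = 2**53 - 1

iso = "2021-11-11T15:32:35-07:00"
parsed = datetime.fromisoformat(iso)      # aware datetime, keeps the -07:00 offset
unix_seconds = int(parsed.timestamp())    # seconds since 1970-01-01T00:00:00Z

# Going back from the integer: the instant survives, the original offset does not.
as_utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

fits_in_js_number = unix_seconds <= MAX_SAFE_INTEGER
```

The round trip shows the trade-off: the integer is compact and unambiguous, but the offset the user originally supplied has to be stored separately if it matters.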
ISO8601 is a serialisation format. You wouldn't want to use it in internal function calls simply for performance reasons. You also wouldn't want to pass it around as just a "string" type. I think the question was asking about internal function calls. For external data interchange, ISO8601 is the only sane option and deals with all known timezone and leap second bollocks.
> For maximum correctness, you want to turn it into a file handle as soon as possible
That's not right. You want to resolve a file/folder path to a file/folder at the exact point it makes sense.
It's a problem if you're using a path when you wanted the file. The file can be switched/modified out from underneath you.
It's also a problem if you've got the file when you only wanted a reference. Now you can't simply switch/modify the file independent of the reference. E.g., maybe you want config file changes to take effect immediately and transparently.
You can also have the hybrid case, e.g., where you want the folder directly, but have a relative path to a file that is resolved late.
If you're unsure, I'd err on the side of late resolution.
Another inconvenience with this approach is that you can keep thousands of paths in memory no problem. But thousands of FDs may cause you to exceed per-process limits.
> I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string
I guess that depends on what you mean by "string". `open` and `fopen` need a char* path to open a file. Whatever fancy Path abstraction you use eventually becomes a char* string, because that's what the kernel needs.
On POSIX systems file names are not strings, they are sequences of bytes. They might not be UTF-8 or have any meaning. Python3 had to hack around this, they thought they could force everything to Unicode and discovered that doesn't work.
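The workaround Python settled on is visible in `os.fsdecode` and `os.fsencode`. A small sketch, assuming a POSIX system where the file system encoding uses the `surrogateescape` error handler (the default there):

```python
import os

# A file name as raw bytes that is not valid UTF-8 (0xFF never occurs in UTF-8).
raw_name = b"file name\xff.txt"

# Undecodable bytes are smuggled into the str as lone surrogate code points ...
as_text = os.fsdecode(raw_name)

# ... so encoding the str again restores the exact original bytes.
round_trip = os.fsencode(as_text)
```

The resulting `str` can be passed back to `open` or `os.rename` unchanged, but printing it or encoding it as strict UTF-8 may raise an error, so display code still needs a fallback.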
At least for most Linux systems (not sure about other *nix, but I expect the same?), there is a system default encoding, defined by the locale, and I think decoding the filename in that encoding and displaying the resulting string, is probably the correct way to display a filename? That seems as good as you are likely to get on any system really.
I think for any POSIX system, either there is locale support defining the encoding, or it uses the POSIX locale, which defines the encoding (ASCII).
Of course you need to handle cases where filenames cannot be decoded in the system encoding (probably by replacing characters that cannot be decoded), because a filename in a different encoding, or even with no valid encoding, has been used on disk. While systems can say that file names containing bytes that are not valid characters in the system's encoding are not valid file names, that doesn't stop people mounting disks with them, so the problem never goes away if you support opening media from other systems.
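A small sketch of the "replace what cannot be decoded" idea, using only Python's built-in codecs. The file name is an invented example (Latin-1 bytes on a system assumed to use UTF-8), and the two error handlers shown are just two possible policies, not the only ones.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Strict decoding fails, which is why a display layer needs a fallback.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# For display: substitute U+FFFD for undecodable bytes (lossy).
display_name = raw_name.decode("utf-8", errors="replace")

# For round-tripping: surrogateescape keeps the original bytes recoverable.
internal_name = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = internal_name.encode("utf-8", errors="surrogateescape")
```

The lossy form is fine for showing a name to a user, but only the round-trippable form (or the raw bytes) can be handed back to `open`.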
What I am saying is that this is no more a Unix problem than it is a problem on any system that supports removable media.
On POSIX systems, file paths are C strings, which are sequences of bytes that cannot include the 0 byte. UTF-8 or other meaning is not required for something to be a string.
POSIX filenames allow all bytes except 0x2F (/) and 0x00 (NUL); the "portable filename character set" is a much smaller subset (letters, digits, period, underscore, hyphen). That means file names can include line feeds, backspaces, EOF, etc.
"This is `a
perfectly vali'd.\010! file name\377, despite the weirdness"
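A hedged sketch of why such names bite: it creates a file with a line feed, a quote and a backspace in its name (the `\377` byte is left out so the example stays plain text), then compares a structured directory listing with naive line-based processing. It assumes a POSIX filesystem; Windows rejects some of these characters.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
odd_name = "This is `a\nperfectly vali'd.\b! file name"  # newline, quote, backspace

with open(os.path.join(workdir, odd_name), "w") as f:
    f.write("hello\n")

# Structured listing: one entry, name intact.
entries = os.listdir(workdir)

# Naive text processing: join names with newlines, then split on lines.
# This is what line-oriented pipelines effectively do.
listing_text = "\n".join(entries)
naive_entries = listing_text.split("\n")
```

One file on disk becomes two "files" as soon as the listing passes through anything that assumes one name per line.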
Strings following certain rules are entirely valid representations of paths, just like sequences of path components in the chosen language/framework are. Similarly, the sequences of bits that make up the sequences of your language/framework in memory are an entirely valid representation of said sequences of components.
Yes, paths have structure, but saying "a path is not a string" is equivalent of saying "C source code is not a string". Both are strings, and both are something else, represented by strings according to rules. Different internal representations have different advantages and disadvantages. I fully agree that for things such as "adding components" an internal sequence/list representation is better, but strings can pass arbitrary IPC or even ABI boundaries much easier for example. (And you wouldn't bat an eye for example when you see FQDNs like "www.google.com" passed as a string instead of as ["www","google","com"] because the string representation works pretty well.)
C source code and paths are both representable by strings, true, but the fact that they're not actually strings is still important, because most people don't know that, and in the case of paths that leads to a lot of edge cases (in the case of source code it leads to a bunch of inefficient and weak tooling, which isn't quite as bad).
Because neither are strings, their native representation shouldn't be such - it should be something structured, and only when necessary (IPC, FFI, serdes) be serialized into a string representation. This would save people a lot of time and effort.
It really depends. Do you usually keep hostnames as strings? URLs? JPEGs? Why or why not?
Sure, a browser will hopefully quickly parse that URL and break it up, and an image viewer will do the same with a JPEG. But will anything that's only interested in opening/displaying that URL or JPEG, through a library or external program, do the same?
POSIX paths are actually remarkably simple in structure[1]. The only caveat is equality and normalization: Without normalization, a path a might be equal to a path b while their representations differ, e.g. "/etc/foo" and "/etc/bar/../foo". But this is the same whether you have a string or a list of strings, you need to normalize in whatever representation you choose to check for equality.
[1] Almost shocking myself, even Haskell defines its primary FilePath type literally as "String".
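The normalization point can be sketched with Python's `posixpath` module. The helper `normalized_parts` is made up for illustration. Note that `normpath` is purely lexical: if `/etc/bar` were a symlink, collapsing `bar/..` could change what the path refers to, so this only shows the representation argument, not a full equality check.

```python
import posixpath


def normalized_parts(path):
    """Lexically normalize a POSIX path and return its components."""
    norm = posixpath.normpath(path)
    return [part for part in norm.split("/") if part]


a = "/etc/foo"
b = "/etc/bar/../foo"

raw_equal = a == b                                             # differ as written
same_as_strings = posixpath.normpath(a) == posixpath.normpath(b)
same_as_lists = normalized_parts(a) == normalized_parts(b)
```

Either representation needs the same normalization step before comparison; the list form does not get it for free.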
things like this are why the Unix philosophy is so bad.
text processing is hard if you must support Unicode, and that means every Unix command line tool must implement or employ a text processor to handle input. it would be much easier if objects were passed back and forth. PowerShell got this right.
Everything seems to be going this way in Linux land. Longer names, harder to type names, camelcase names, spaces... I'm looking forward to an OS that treats command line ergonomics as a first class feature and where camelcase & spaces are verboten.
I find this attitude misguided. More descriptive names are more ergonomic for things you only use rarely but they need to be combined with much better autocompletion than most shells provide by default.
You state that as if that were objective.. but that's not my subjective experience at all. Somehow I have a hard time remembering these long names, (is it --conf or --config or --config-file or --config-path? -c would've done it for me. --set or --set-prop or --set-property or --prop or --property?), and I need to look them up in a man page anyway, and I make more typos typing them, and shell completion rarely works well if at all. I also find it harder to read and edit long lines that wrap.
Somehow these short letters stick much better for me, and the effort for finding them in the manual is the same, although in case of extra complexity as with xinput, it's even worse with the long names. I don't use either command often, but it's hard to forget xset m. The only thing I remember about xinput is that it's a horribly long litany of things which I need to look up every time, and the syntax still feels weird.
In properly written tools, the most-used options have both a short single-character form like -c and a long-form version like --config for when you need a verbose, self-describing option.
If you are using CLI tools from GitHub written by a random person, then no wonder you will see non-standard approaches to UX.
PowerShell takes an interesting approach in that it accepts any truncated variant of a long-form flag as a short form, provided it isn't ambiguous (i.e. if the interpreter can't decide which long-form flag to expand a short-form flag to.)
For example, if a command features a "-ConfigFile" flag, valid short-form variants include "-C", "-Co", "-Con", "-Conf", and so on. But if the command featured an additional flag "-ConfigURL" for example, the aforementioned short-form flags would be ambiguous.
getopt_long (and thus most GNU programs) works this way. I think it's probably a misfeature though, since it means that adding a new option can introduce ambiguity. Having both short (e.g. -x) and long (e.g. --exclude) options is a less problematic solution.
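The unique-prefix rule, and the way a new option can break an abbreviation that used to work, can be sketched in a few lines. `expand_flag` is a hypothetical helper, not PowerShell's or getopt_long's actual implementation; the case-insensitive matching and "exact match wins" rule are assumptions of this sketch.

```python
def expand_flag(prefix, known_flags):
    """Expand a truncated flag to the single known flag it prefixes.

    Raises ValueError if the prefix matches no flag or more than one.
    """
    wanted = prefix.casefold()
    # An exact match always wins, even if it is also a prefix of another flag.
    for flag in known_flags:
        if flag.casefold() == wanted:
            return flag
    matches = [f for f in known_flags if f.casefold().startswith(wanted)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown flag: {prefix}")
    raise ValueError(f"ambiguous flag {prefix}: could be {', '.join(matches)}")


flags_v1 = ["-ConfigFile", "-Verbose"]
flags_v2 = flags_v1 + ["-ConfigURL"]  # a later release adds one option

before = expand_flag("-Conf", flags_v1)  # unique prefix, expands fine
try:
    after = expand_flag("-Conf", flags_v2)
except ValueError as exc:
    after = str(exc)                     # same input is now rejected
```

A script that used `-Conf` against the first release stops working on the second, even though nothing it relied on was removed.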
The shell ought to be able to help with that. There's no need to remember if it's --conf or --config if you can press --conf<tab>.
One of the things I like about Fish is that by default it can tab-complete program options and also shows a one-line description of what each of them does. (It grabs that info from the man page).
I very much would if only that pesky State didn't persecute me for that. Apparently, when I refuse to acknowledge the copyright and software license terms, other people get upset to the point of bringing the wrath of that Leviathan of oppression upon me! The nerve of some people!
I just tried fish. xinput --set-[TAB] and nothing. Apparently it doesn't understand the standard long-option format that is supported by xinput and documented in the man page. You have to know to omit the dashes and then it'll complete. And it's downhill from there.
Yeah I used to have all kinds of simple as well as supposedly sophisticated completion setups with zsh years ago but I've given up on it since then. It's always half-assed and half the time causes more problems than it solves. Same with bash. There are some places where I must resist the urge to try complete a filename because the shell starts trying to figure out which target it can complete from a Makefile in a large build system and just freezes. The only practical way out is to interrupt and type the command again or wait a stupidly long time. There are other issues like completion trying to be smart and filtering out things it thinks you don't want to complete. Nothing is more frustrating than a shell refusing to complete a filename that you know is there.
I run fish. I was able to get long-option completion for gcc, polybar, firefox, man, emacs, xrandr, and fish itself. The only command I was not able to get long-option completion for was xinput. You just picked a bad program to try.
I could never overcome my repulsion for Java and ObjC because of that. On the other hand, I feel at home with crazy RegEx that look like line noise to most people.
I think shells could use something like a built-in eldoc[1], in addition to tab completion. It would make terse command line interfaces much more usable if you could see what the positional arguments were for.
I like the long-form version. It helps me remember what it does and why. E.g., `iptables --insert INPUT --protocol tcp --jump ACCEPT` was more helpful to me than `iptables -I INPUT -p tcp -j ACCEPT` when I was told how to allow TCP traffic.
For everyday command like `ls -l` I don't mind but anything more serious I take a more cautious approach.
The few scripts that I've written for personal use generally lack documentation or help commands of any sort; instead, they take all possible straightforward variants I can think of for each command (`--config`, `--config-file`, `--cfg`, `--conf`, etc). They usually convert everything to lowercase before processing, too. It's easier to fail safely on too much/too little input than it is to provide actual help.
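One way to get that "accept every reasonable spelling" behaviour is with `argparse` aliases. This is only a sketch of the idea: the helper names are invented, and it lowercases option names only, so values keep their case.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Every spelling is an alias for the same destination.
    parser.add_argument("--config", "--config-file", "--cfg", "--conf",
                        dest="config")
    return parser


def lowercase_option_names(argv):
    """Lowercase option names only; leave values untouched.

    Caveat: a value that itself starts with "--" would be lowercased too.
    """
    out = []
    for arg in argv:
        if arg.startswith("--"):
            name, sep, value = arg.partition("=")
            out.append(name.lower() + sep + value)
        else:
            out.append(arg)
    return out


def parse(argv):
    return build_parser().parse_args(lowercase_option_names(argv))
```

For example, `parse(["--CFG", "My File.ini"])` and `parse(["--Config-File=My File.ini"])` both end up with the same `config` value.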
Spaces don't make anything more descriptive, they just cause completely unnecessary quoting and escaping hassle.
The amount of time that has been wasted by Windows using "C:\Program Files" instead of "C:\Program_Files" far outweighs any highly questionable aesthetic benefit IMO.
On the other hand, how much broken code has been fixed to properly deal with paths just because of that? I'd argue that to be a major benefit. Same with Windows Vista forcing developers to write applications that work properly as a non-admin user.
Short option for interactive terminal. Long option in automation.
I’ll be damned if I have to remember or lookup what -n means to some obscure program, when reading someone else’s script. Exception given for super common tools where everybody knows like ls -la.
With the disclaimer that shell scripts, especially ls, aren’t exactly suitable for reliable automation in the first place.
Nocase (did I break a rule by writing it that way?) seems great when you're enmeshed in the domain and you can see the implicit separators, but then someone looks at your naming from the outside and you're guaranteed to have an 'expertsexchange' in there somewhere.
I'd argue it's at most a tiny bit harder to read, and a lot easier to type. On balance I'd rather avoid making a pinky key one of the keys I have to use the most.
At least it's where they sit naturally on the keyboard. And the shift key is wider specifically so you don't have to be accurate with your pinky when you're pressing it. The underscore is one of the least ergonomic keys there is. And you need both pinkies to do it
I might be misunderstanding. On all layouts I'm familiar with the underscore key is directly next to one of the shift keys, or left of backspace. Neither layout requires the Vulcan death grip. Shift should always be under your pinky fingers to avoid contortions.
Having used a lot of all the formats, I'd argue it's a lot easier to read and a tiny bit harder to type. For typing it's basically just an extra `-`, unless your alternative is nocase.
For reading, CamelCase has 2 significant ambiguity issues: similarity between I and l, and what do you do with acronyms. Acronyms wouldn't actually be a problem if everybody just wrote them as they would in snake_case (i.e. only capitalize the first letter), but they don't and so it's anyone's guess whether you're going to get "Id" or "ID".
There's also a minor issue where if you're on a case-insensitive file system it can be a little difficult to change casing, but adding/removing underscores is easy.
Adding an underscore everywhere is horrible! The spacebar is huge, and gets your thumbs basically to itself because space will be one of, if not the most commonly typed key. To replace that with one of the least ergonomic keys makes no sense.
And if CamelCase is so hard to read, why is it the norm for "high level languages"? Shouldn't those be optimized for ease of use?
> And if CamelCase is so hard to read, why is it the norm for "high level languages"
That's over-selling it a bit. It's more common, but not dramatically so. Outside of class names, CamelCase isn't the norm for Python, PHP, CSS, HTML. It's also not the norm for shell scripting, but shell scripting has horrible readability for other reasons.
I believe CamelCase is more common for languages like Go, C#, and Java because they grew up in large organizations where having god objects/classes with 400 methods is kinda normal and having aMethodWithAReallyLongName is pretty common. One of the advantages of CamelCase is that it does shorten really long names.
Sure. One could also make "move-down-one-line" be the incantation to move the cursor down a line in vi, but I prefer j.
Ergonomics isn't all about making everything self-descriptive for someone seeing the thing for the first time. It's about making things comfortable to actually use. If it's so long and complicated that you can't even remember how to do it, it's not very comfortable to use. Even if I could remember, xset m 0 0 is still far more comfortable.
And fwiw you still don't know what 0, 1 in accel profile do; you need to look that up or take a wild guess, and if you want to use that command, you'll also have to know how to look up the device because chances are yours is not the same as mine. So it's not any less magical in the end, just more verbose.
The "cool" thing about the xinput command is that you don't even find accel profile in the man page. You gotta look elsewhere if you want to understand what it is and what it does and what the parameters are.
Yeah well, given that mouse acceleration tends to be on by default, I need to turn it off every time I'm on a fresh install or computer I haven't used before. The last time I needed that was yesterday.
I don't want to waste time searching for a command to copy-paste when it could just be made short, simple, memorable and ergonomic. I could type xset m 0 0 faster than I could open a browser and ask google how to disable acceleration with libinput. And again: you can't just copy-paste the xinput command unless you're lucky enough that it matches your device. On my new computer, the device has a different name than on my old laptop even though it's the same damn mouse.
It should be, but how would you keep track of usage frequency?
At least it would push all the "This switch was added by someone playing with UNIX at a university in 1986 and hasn't been used since" options to the end of the list.
The less frequently I need something, the more frustrating it is if it's not short and memorable (or easy to look up in the synopsis or built-in help). Forgetting and googling a needlessly complicated command over and over again every year isn't fun.
xset achieves that perfectly. If I somehow didn't remember how to set mouse acceleration with it, a quick glance at the synopsis immediately tells me. Or I can just run the command and it'll tell me:
To set mouse acceleration and threshold:
m [acc_mult[/acc_div] [thr]] m default
Zero frustration, and the command is so short and simple that I end up remembering it without trying.
This is something I've observed more than once: I easily memorize useful sets of one-letter flags even if I can't remember or know what they all stand for. This just doesn't happen nearly as much with long options. Commands like ls -ctrl or ss -nap quickly become part of my repertoire even if I don't use them very often, but I really couldn't remember ss --numeric --all --processes (if I had written that from memory, it could've ended up as --num --all --pid or --numeric --any --process), and I don't even know what the corresponding long options for ls are. In the rare case when I have to deal with an option that has no short equivalent, I feel like I have to look it up every time if it's been longer than a few weeks.
You talk of optimization but I think this is just a very basic (and reasonably successful) attempt at sane design. It's not like someone had to go far out of their way to make this in a manner that isn't batshit insane.
But which case should software interfaces optimize for? Ergonomics of someone who uses a tool frequently, or interpretability for casual by-standers of some out-of-context shell command?
Needlessly long parameter/command names and the bizarre insistence on capital letters are the #1 and #2 reasons I detest PowerShell. Like GP, I resent that Linux tools are moving in that direction.
Long option names are more descriptive, more easily distinguished, and easier to remember. Your shell should be intelligent enough to provide tab completion for option names, assuming it is configured to.
Long option names are more difficult to remember because a long option name can be spelled multiple ways and it is difficult to remember which spelling is correct.
IMO, PowerShell got it right. Yeah, its syntax is strange, but it has standard flag usage with proper autocomplete, and you can shorten any flag the way you want (e.g. fuzzy match) if it is unambiguous.
Cue nmcli (CLI for Gnome's NetworkManager) which uses UUIDs for everything and (at least a while ago) did not accept partial-but-unique UUIDs. Basically goes "nmcli connection up 5095665a-d82c-4ae6-8964-283623387941".
Weird, I haven't had to do this. Most(/all?) connections have nice names you can see with `nmcli c`... and so I can do `nmcli c up id DroidNet` and that's pretty dang nice. Pretty sure this worked with Ubuntu 14.04 (though, nmcli has gotten much more featureful since then)
(The ability to shorthand connection->c and similar is great, too; obviously not unique to nmcli)
That may be a part of the problem but honestly I don't feel like all these new crazy interfaces are easy to learn either. I mean how do you come up with the litany xinput calls for? You need to understand the syntax for specifying a device. You need to know that you're to set a libinput property, and you need to know the name of that property, and it's not documented in the xinput man page, and of course you need to know the values to pass, which again are not documented in the xinput man page. You can play with --list-props and then take your search elsewhere because it is completely opaque and doesn't explain what the properties actually do.
I suspect the number of people who figured all that out without having to find it by googling / arch wiki / whatever is very very low.
Now I'm not gonna say xset is the easiest interface to figure out, but the syntax for setting mouse acceleration is right there in the synopsis, and if you search down the man page, you'll learn a little more (and also if you just run xset without arguments, it'll tell you how to set mouse acceleration). It might not be the best designed tool but it's something I learned back in the day as a teenager just by looking at the man page.
I think the real issue is that people nowadays are designing these interfaces to be consumed by interactive configuration tools, GUI apps, and desktop environments; they're more dynamic, more complex, more flexible, but not easier to figure out, not for you on the command line. The command line is just a last resort. Second class citizen if you will.
On some level it makes sense. The problem with the command line is familiarity.
How often do you reach for iptables? If you're like myself, and most home/desktop users, then probably once in a blue moon to set it up and then you leave it alone. But a system admin? Maybe they touch it a few times a week or month. Every time I use iptables I have to relearn how Linux networking works.
Similarly, the xset/xinput thing. When I need those tools I just create a script or throw it in .bashrc. I adjust the settings once and will not touch them again for a couple years. It makes sense to have long parameters that are readable. I can look at my .bashrc and see exactly what device is getting adjusted.
Well, if you think that's bad, behold the recent trend in network interface names on Linux.
We started out with 'eth0', 'eth1', etc. Which adapter was which could change when adding and removing a network card. That was bad, so that prompted the evolution.
Now we have 'enp1s0', 'enp0s31f6', 'enp13s0' and many similar variations. These are supposedly more stable across device changes. As it turns out, they weren't.
But wait, there is more! Now we have the "predictable names" scheme that produces interface names that are even longer, and not even slightly easier to remember.
I do get that it is not an easy problem to solve, especially in the face of removable network interfaces (like USB Ethernet / WLAN). But surely this is not the best we can do.
I was actually ranting about this on IRC last night (yeah now my laptop has two enp* interfaces and enx[MAC])..
One thing I like about OpenBSD is that buses are scanned and drivers probe in order and there's no race between drivers coming up. Unless your hardware is physically tampered with or broken, all interfaces come up with the same name across reboots. Linux isn't like that (even if you don't touch your hardware, interfaces could swap across reboots), so you need to do something about it.
As is typical on Linux, the default is unergonomic and if you want something nice, you're on your own to make it so.
If you already have userspace daemons responsible for device insertion and naming, it really wouldn't have been so hard for them to e.g. automatically add a config file / database entry for each interface the first time it is seen. So the devices that came up as eth0 and eth1 are still eth0 and eth1 on the next boot; if I unplug eth0 and add a new card, the new one would be eth2 because eth0 is still reserved for the first card I had.
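The reservation scheme described above is small enough to sketch. Everything here is an assumption for illustration: identifying cards by MAC address, the `assign_name` helper, and the JSON round-trip standing in for a config file on disk. It is not how any existing naming daemon works.

```python
import json


def assign_name(registry, mac, prefix="eth"):
    """Return the stable name for a MAC address, reserving a new one if needed.

    `registry` maps MAC address -> interface name and is never pruned,
    so a name stays reserved even while its card is unplugged.
    """
    if mac in registry:
        return registry[mac]
    used = set(registry.values())
    index = 0
    while f"{prefix}{index}" in used:
        index += 1
    registry[mac] = f"{prefix}{index}"
    return registry[mac]


registry = {}
first = assign_name(registry, "00:11:22:33:44:55")
second = assign_name(registry, "66:77:88:99:aa:bb")

# Simulate a reboot: persist and reload the registry (a file in practice).
registry = json.loads(json.dumps(registry))

# The first card is unplugged and a new one is installed.
third = assign_name(registry, "cc:dd:ee:ff:00:11")
first_again = assign_name(registry, "00:11:22:33:44:55")
```

The new card gets the next free name, and the original card gets its old name back whenever it reappears.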
These changes are meant to make it easier to read and understand command-line incantations (and to make them more explicit, which is always good), because the command-line paradigm, being text-based, imposes an unavoidable trade-off between ergonomics and understandability/ease-of-use. It sounds like you prefer ergonomics - although I wouldn't be surprised if most users would prefer ease-of-use.
Of course, if one doesn't write a CLI to begin with, this trade-off doesn't exist - you can have your cake and eat it too.
I am also that age, and kebab-case is the best case for filenames.
2021-01-01-some-important-document.pdf gives me the warm fuzzies. On the off chance that some more differentiation is needed, throw in an underscore and a whole new world opens up
Cut most of mine off in an unsupervised Halloween pumpkin carving accident when I was a kid. I think the lack of length actually allows me to type faster.
I see and applaud your use of the underscore there, but I must reject the premise!
work/client/project/2021-11-11-file.ext is more or less how I lay stuff out. I’d say client/project is a folder level distinction (arguably dates too).
[EDIT] Realistically most of the stuff under <project> is git repos and I usually make a “home” repo where I keep org files for tracking hours, notes, and resources related to the engagement.
work/client/project/2021-11-11-file.ext is great until you've got a '2021-11-11-project-status.txt' in a few directories and you need to find one quickly! I do a combination: clients/client/project/2021-11-11-client-project-update.txt
It sounds like what everyone in this thread needs is a database file system. This was always my favorite proposed feature of Windows Longhorn that never made the cut. Almost 2 decades later and Microsoft's latest OS still doesn't have this feature.
I wrote about what I perceived as deficiencies of hierarchical file systems, and proposed an alternative organization based on tags and hashes. It was discussed on Hacker News last week and many years ago.
I'll be the opposite voice: the file system isn't for precise organisation, it's just for storing. For organisation, the ideal thing to use is tags. Since most file systems don't have tags and using software for that would be a pain, the best way to do this is to list the tags in the file name.
I've always thought that personal files, photos, or any other kind of just needed more connections between them to improve my information retrieval experience. That's how I had become a Zettelkasten evangelist. I believed it would be the cure for the information overload disease of our era.
But life made me use Emacs org-mode more and more, and I'm now in love with tags. Retrieving information has become so easy, especially with org-mode's tags inheritance, that I hardly think making connections between headings or notes is necessary anymore[1]. And I believe that applying tags to filenames (a la Karl Voit [2]) will create the same effect
[1] A Zettelkasten-like system is still unbeatable imo when it comes to ideas repositories, i.e. a second brain you can talk to and get new insights. It's just not that great for personal knowledge management or project management.
> But underscore separates fields, hyphens for space replacement
But why not the other way, hyphen-minus for separating fields and underscore for space replacement? That seems to me more consistent with how underscores and dashes are used.
Kebab case is the often overlooked benefit of prefix notation and semantic white space in programming languages. Honestly the best case of all cases imo.
Forth does something like this, by virtue of its reverse Polish notation.
In Forth, 'words' (which are roughly analogous to functions and operators) must always be separated by whitespace, as Forth doesn't parse out operators the way most languages do. In exchange, you get the ability to use symbols in identifiers, as Forth has no reason to single out symbols like + as being syntactically special. You can even use a number for the first character. (For that matter, Forth will even let you override the usual interpretation of a numerical literal, but that's always struck me as going a bit far.)
It gives you a + word, analogous to the + operator of most languages [0]. It also gives you a 1+ word, as an (admittedly slight) abbreviation of the sequence 1 +. [1] If you wanted a 2+ word, you could easily define it yourself.
(This property of Forth evidently wasn't enough to get it to take over the world, but it's still neat.)
Are you trying to catch GP on differentiating hours, were it to be appended to his time format (1st @ 11 vs 11th @ 1am)?
Notably he didn't promise any, but presumably one'd need a separator... Maybe, per his "K" usage of the month, one'd use the alphabet again. 11am would be "K" again... or lowercase just for giggles?
I don't think it reads very well, but I also think one'd get used to it pretty quickly.
If you hadn't heard kebab-case called that before, there's a chance you haven't heard SCREAMING_SNAKE_CASE called that before, and I couldn't live with myself if I didn't let you know.
Awe, in turn I have never seen that particular xkcd—it's great! I learned to call it "feigning surprise" and I always try and be conscious of it (though I still catch myself doing it from time-to-time).
Years that end in a 1 are awful when doing this, especially in October and November. We've had 20211001, 20211010, 20211101, 20211110, and now today 20211111.
I've recently shifted sharply toward the dash from the underscore. I find it more readable, and it doesn't require the shift key. However, I do find it useful to use underscores to create groups, e.g. test-001_2021-10-11.log. Including hours, minutes, seconds is still awkward.
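A practical upside of "underscores between groups, hyphens within a group" is that the name parses with two plain splits. A minimal sketch, assuming exactly the `test-001_2021-10-11.log` shape; `parse_log_name` is a made-up helper.

```python
from datetime import date
from pathlib import PurePosixPath


def parse_log_name(filename):
    """Split 'test-001_2021-10-11.log' into (label, number, date)."""
    stem = PurePosixPath(filename).stem     # drop the extension
    test_part, date_part = stem.split("_")  # underscore separates groups
    label, number = test_part.split("-")    # hyphen separates within a group
    return label, int(number), date.fromisoformat(date_part)


parsed = parse_log_name("test-001_2021-10-11.log")
```

With a single separator for everything, the parser would have to know how many hyphens belong to the date.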
There's a customer for everything. I've just never liked the aesthetics of the underscore. Also if your underscored thing gets put in some document and then underlined the underscores can become invisible.
A lot of this is personal aesthetics, for sure. Personally, I am not a big fan of camel casing. In code, I only use it for class names, generally. I don't find it particularly readable, and for filenames, not all filesystems are case sensitive, so best not to rely on case to differentiate files. Camel case does have the nice property of being more compact, as no character is required. That's its main benefit.
R traditionally uses the . as a legal character in identifiers. Once you get used to it not being syntactic, I found I actually prefer them to underscores.
I'm of the opinion that kebab-case is the best case for all identifiers, because it's easy to read and to type. As always, Lispers were right all along.
I found that some_document_2021-01-01_v03.pdf works best because it keeps the same document next to its other versions alphabetically, keeps them in date order, and keeps them in a sub-day version order.
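The sorting claim is easy to check with a plain lexicographic sort, which is roughly what a file manager's alphabetical view gives you. The file names other than the one quoted above are invented for the example.

```python
names = [
    "some_document_2021-02-15_v01.pdf",
    "other_report_2021-01-01_v01.pdf",
    "some_document_2021-01-01_v10.pdf",
    "some_document_2021-01-01_v03.pdf",
]

ordered = sorted(names)

# Without zero padding the version order breaks: "v10" sorts before "v3".
unpadded = sorted(["doc_v3.pdf", "doc_v10.pdf"])
```

The scheme relies on fixed-width fields: ISO dates and zero-padded version numbers sort correctly as text, unpadded numbers do not.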
As a side note, in the good ol' times of ISO9660 level 1-4 and the various mkisofs parameters, an underscore _ which is a CAPITAL -, may have given issues, only for the record/as a curiosity:
One of the main reasons why Windows used "Program Files" and "Documents and Settings" was to force the programs (and programmers) to deal with paths with spaces. And you know, for the most part it kinda, more or less worked out although of course even today you will find programs that ask you to install them in a folder without spaces in the path.
Yes, I was doing code to quickly read FAT folders (on a micro controller) and got to the bit about filenames more than 8.3. I decided my life was too short (and processing time) to go and sort out what the "real" file name is. Enforced 8.3 as a requirement!
The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
Sadly such set includes loads of Java programs. If only SUN had shipped a standard way to generate isolated exe files in 1998... but they worked under the presumption that you'd have a JVM already there, because distributing that monster was difficult in dialup times, so you could just hand people a jar; and the enterprise market did not care, since they had webapp servers. Sadly it's an "optimization" that became obsolete very quickly but wasn't rectified until it was too late (java 9+).
> The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
Actually it isn't, just use double quotes and add a '~'. It's just about the only thing batch files handle better than shell scripts.
set "VARIABLE=%~PATH"
And that was a good idea, if only Microsoft also fixed the CreateProcess function, Windows would be somewhat sane in this regard. But somehow nobody seemed to think of it. Seriously, look at it:
The arguments are a single string. So you want to pass parameters with spaces in them? You've got to add quotes and stuff all of that into a single string. Instead of doing it in a more sane manner, like oh, the arguments to main().
The root cause is that argv isn't a first-class citizen like on linux, but an abstraction. The kernel only cares about a single string argument. If you use main instead of WinMain, the CRT will transform the single string into an argv for you.
Oh and cmd.exe uses a different escaping scheme than the CRT.
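To see what "stuff all of that into a single string" looks like in practice, Python's `subprocess` module has a helper, `list2cmdline`, that joins an argument list using the Microsoft C runtime's quoting rules. It is an internal-ish helper rather than a headline API, and it says nothing about cmd.exe's separate escaping, so treat this as an illustration only. The argument values are made up.

```python
import subprocess

argv = ["program.exe", "arg_1", "second argument",
        'argument with literal " symbol']

# Join argv into one command-line string the way the MS C runtime expects
# to split it again: quote arguments containing spaces, backslash-escape
# embedded double quotes.
command_line = subprocess.list2cmdline(argv)
```

Every program that wants an argv has to undo this encoding itself, which is why two programs can disagree about where one argument ends.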
Microsoft is in full control of the Windows kernel, so they can make it care about whatever they want to, and one would think better argument passing would be a nice quality of life improvement. Less nonsense for developers to deal with, and less weird bugs on the platform.
They can either add a new API which almost nobody would use, because everyone already learned to use the existing one and either reused or reimplemented the MSVCRT's logic so that most software parses command lines the same way; or they can literally break every single program in existence by breaking the interface of CreateProcess, which is just as likely as Linux breaking the interface of execve(2).
Giving CreateProcess a new flag so it would correctly accept "path\\to\\my\\program.exe\0arg_1\0second argument\0argument with literal \" symbol" (with an implicit \0 terminating it) as lpszCmdLine is the easy part; the hard part would be forcing everyone to switch to using it.
Also, I'm pretty certain this processing happens in the user space, and Win32 API is already bloated beyond any belief.
They may have thought that would happen but I saw just as much stuff end up in C:\Windows or \Users or (always my favorite) those “Documents” that are really just “whatever random crap every app wants to put there”.
That annoys me every time I use a Windows system. It was a terrible decision, especially since neither the command prompt nor the new PowerShell accepts a backslash before a space the way bash does; you have to quote the whole path! I get that most users on Windows don't use the shell, but as a developer I use it a lot, and every time it's a pain (no wonder they added WSL to Windows after the failure of PowerShell...)
Why would they accept a backslash? Backslash is a path separator on Windows. In most Windows programs, you don't even need to escape the space - arguments can contain spaces and it will understand it, like `notepad My file.txt`
The escape character on PowerShell is backtick, and on cmd it is caret. You don't need to quote everything.
Not exactly - the problem is mostly when doing variable expansion. The fact that bash treats "$x" and $x as different is a bit of a design flaw. Of course there's still an issue with evaluating dynamically generated code, but that problem is partly solved by working with arrays.
I mean how do you want shells to deal with file names with spaces in? Do you think we should have to quote and escape all file names all the time? If not then how do you think it should work?
Shells should treat data as data, and not have the default behaviour be treating it as code (i.e. you should need to do 'eval $x' or some equivalent if you actually want the string to be treated as a shell command). This would also mean having a real list type, instead of depending on arbitrary separators in strings. This is exactly how other languages treat it, and it is not a significant challenge for interactive use (in fact, it would substantially reduce the opportunity for surprises when running commands interactively as well).
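A rough illustration of "data as data" from the Python side, using the standard `shlex` module. The file name is a made-up hostile example, and `shlex.split` only approximates POSIX word splitting (it does not model operators like `;`), so treat this as a sketch of the idea rather than a model of any particular shell.

```python
import shlex

# Hypothetical hostile file name: spaces plus a command separator.
filename = "my notes; rm -rf ~.txt"

# Naive interpolation: the name is pasted into the command as if it were code.
naive = f"cat {filename}"

# shlex.quote wraps the name so a POSIX shell would see exactly one argument.
safe = f"cat {shlex.quote(filename)}"

# shlex.split lets us inspect the resulting words without running anything.
print(shlex.split(naive))  # the name falls apart into several words
print(shlex.split(safe))   # the name survives as a single argument
```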
Even if you're not using WSL, you've always been able to turn on case sensitivity via a registry key. This has not been recommended in the past due to possible issues with Windows itself as well as third-party software. This history is mentioned here[0]. Everywhere that mentions a registry key seems to be referring to the Windows NFS server, not to general file access; however, I know that the SFU (Services for Unix) installer had an option to do so, so it's certainly possible.
As of sometime in 2018, fsutil can set specific directory trees to be treated case sensitive in Windows 10 without setting it for the OS. This ability is mentioned here[1]
Was recently encoding my Stargate: SG-1 DVDs to move them to plex. I was encoding it on a system other than what was serving it, so I had to copy it. It's surprisingly difficult to "scp" a file with a colon in it directly.
I also love when you're using bash and you have a file with ! in the name, and you accidentally fail to correctly backslash it, you not only get "bash: !rest_of_filename: event not found", but it also fails to add that command line to the history, so you can't just hit up and fix it. You have to actually go to the mouse and copy and paste.
That sounds like... Puzzle time! I had to cheat, sort of, by looking at the man page:
> Local file names can be made explicit using absolute or relative pathnames to avoid scp treating file names containing ':' as host specifiers.
So `scp foo:bar user@host:~` fails because it tries to find the host foo. But `scp ./foo:bar user@host:~` works just fine. I feel kind of stupid for not guessing as much.
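If you are building scp command lines from a script, the same trick can be applied mechanically. A minimal Python sketch, assuming (per the man page excerpt above) that a colon before the first slash is what makes scp see a host; `scp_local_arg` is a hypothetical helper name, not part of any library.

```python
def scp_local_arg(path: str) -> str:
    """Return a local path in a form scp should not read as HOST:FILE."""
    # Only the part before the first slash can look like a host specifier.
    first_component = path.split("/", 1)[0]
    if ":" in first_component:
        # Making the path explicitly relative removes the ambiguity.
        return "./" + path
    return path


print(scp_local_arg("foo:bar"))        # ./foo:bar
print(scp_local_arg("/tmp/foo:bar"))   # /tmp/foo:bar (already explicit)
```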
That is just lazy programming. If the input "foo:bar" is ambiguous, the program should try both interpretations (HOST:FILE and FILE) and then present the user with a prompt that provides sufficient information.
"Does foo:bar refer to the local file `foo:bar' (size: 102kB, date: 2021-11-11) or to the file `bar' on host `foo' (FQDN: foo.example.com, IP address: 1.2.3.4)?"
I'm not young, but I've been using Macintosh computers regularly since 1990, and even back then file names could be up to 31 characters long, and could include any character except colon.¹ So I'm pretty comfortable using spaces, and sometimes even non-ASCII characters, in file names.
Also back then Mac file names typically did not include an extension, because the file's type was stored as part of the metadata in its resource fork. I remember one time a friend of mine was visiting and was playing around with a paint program on my Mac. Being used to DOS, when she went to save her file, she typed a very short name, and then asked me what the proper file extension should be. I smirked and said, "That's not how you name files on a Mac. THIS is how you name files on a Mac." And then I named her file "Ailsa's Cool Picture". Her mind was blown. :-)
¹This is because the colon was the path separator. But since the classic Mac OS had no command line interface, the typical user would never type or even see a file path written out.
Well, you should still be afraid! Be very afraid! Seriously: only a few months ago I was confronted with a video encoding tool that didn't work properly when the file names contained spaces - so yes, even in 2021 it's still safer not to use spaces in file names...
Looks like I'm in the minority. I always use spaces and non-ASCII characters in filenames.
In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics (fata, fată, fața, față, făta, făță, fâța, fâță).
Given that we have to use diacritics, spaces don't seem like a big deal.
Google Translate is a horrible tool for "translating" single words or lists of unrelated words.
Use a proper dictionary for that. The very nature of statistical models makes proper translation without context impossible for these systems, especially when uncommon words and diacritics are involved.
As you would assume: use ASCII and deduce from context. Many people still do that.
That has led to phantom diacritics: reading letters in unfamiliar words/names based on what you assume they are. For example some pronounce Chirica as Chirică because they assume someone forgot to type the breve in ă.
Not sure about Romanian, but for many other languages people essentially came up with transliteration schemes (multiple, incompatible, ambiguous) to squeeze your language into ascii.
The resulting text was understandable by the "computer people" but not the general population who did not use the networks back then, perhaps somewhat comparable to when some time ago USA parents encountered the "SMS slang" used by their teenagers.
Back in the day there were dozens of character sets that were alternatives to US-ASCII. Having once worked on an Email client, I needed to bake in a bunch of translation tables to convert stuff sent that way into UTF-8.
> Given that we have to use diacritics, spaces don't seem like a big deal.
There is one big difference: CLI utilities don't usually care about diacritics (though encoding issues can throw a wrench in that), but they care a lot about spaces. So putting spaces in filenames requires properly quoting or escaping parameters, whereas diacritics does not. That makes one-off shell snippets and scripts a lot more annoying (though TBH I tend to shy away from those anyway, these days).
We have a few words that depend on diacritics to be unique in Czech as well - though not as bad as this example - but people just manage without. Hell, I don't even bother installing the Czech keyboard, if I REALLY need it (like in names), I just google for words that have the character and copy it
There's a server at work that is named with a non-ASCII character. I've run into compatibility issues lots of times where I can't connect. I prefer to just use English with ASCII and be happy
Server names are different. They are by and large machine-facing identifiers, whereas filenames have a 50-50 split of whether they are machine-facing, human-facing, or both. That makes their support of Unicode a much more critical (and appealing) proposition.
I never put spaces, and won't go over 32 characters, preferably less than 16. even when sending a file to my grand mom. that's how deep rooted the trauma is. and yes, it remains an issue with some parsers and what not.
I have been trying to repro with a small nodejs server but either the server cut off the content-disposition filename or firefox truncates it. When I get that in the wild I'll post an update.
In the meantime:
$ touch 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
touch: impossible de faire un touch '1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111': Nom de fichier trop long
Same. For documents and stuff that I use in normiespace I give them friendly names with capitalization and spaces and such, but for anything I'm going to be working on via CLI I try to use filenames that will be easily chunked as "words" when doing things like double clicking it in terminal to select, ^w to erase it, tab completion etc.
Somehow the OneDrive clients still refuse to allow leading or trailing spaces in the filenames, along with a few other characters that are not allowed - seems to cause quite a bit of user friction at least with the non-tech guys that I work with who are confused about why OneDrive is one of the few file syncing clients that has these requirements....
This is a UI/UX problem that I only face when dealing with shells and shell scripts. Never had any issues when spawning processes from within languages/runtimes that support sane argument arrays.
sh, bash and cmd.exe are shit. The shell needs serious rethinking.
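For anyone who hasn't seen what "sane argument arrays" look like in practice, here is a small Python sketch. The file name is invented, and the child process is just the current Python interpreter echoing back its first argument, so nothing here depends on external tools.

```python
import subprocess
import sys

# Hypothetical file name with spaces and a shell metacharacter.
filename = "Holiday Schedule 2021 & notes.txt"

# The child simply prints the first argument it received.
child_code = "import sys; print(sys.argv[1])"

# Passing a list (and no shell=True) means no shell ever parses the name.
result = subprocess.run(
    [sys.executable, "-c", child_code, filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The name arrives in the child as one argument, with no quoting or escaping written by hand.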
I see that there are lots of comments about problems of TAB-completions with filenames with spaces in this comment section and I am frankly puzzled: both Bash and cmd.exe actually TAB-complete those perfectly fine, inserting quoting where it's needed.
I seem to remember bash losing preferred escaping when TAB-completing, but can't reproduce it now with 5.0.17.
Eg. you'd type `ls -l "Spaced [TAB]` and it would turn it into `ls -l Spaced\ Name`. I remember similar annoyances with other special shell characters (eg. single quotes, dollars, slashes), but that all seems to behave sane now.
I didn't even know this was a thing, but can't say I've ever preferred an escape style. I actually use backslashes a fair bit, usually just with spaces. I tend to reserve double quotes for variable or shell expansion, explicitly.
It's not so much about a preference, but your cursor would jump about and you'd need to be on the lookout if you wanted to edit the completion (eg. to change the extension).
You have to remind yourself to do this manually in scripts if you don't want to see lines full of "No such file or directory."
One of the reasons the shell is broken is because the character they use as an argument array member separator is something that regular people use to distinguish between two words, such as in a file name.
And where it isn't needed. If you have a path that contains a variable and a space, bash will happily escape the $, making the path invalid. See the following:
$ cd $HOME
$ mkdir my\ dir
$ ls my[tab]
$ cd /
$ ls $HOME/my[tab]
ls: cannot access '$HOME/my dir/': No such file or directory
That error is because when you press [tab], bash changed the path to \$HOME/my\ dir/ but that isn't obvious from the output and I couldn't find a proper way to include the tab-expanded result in the transcript.
(edit: this is on GNU bash, version 4.3.48(1)-release but I've seen this behaviour for years)
Depends on the Bash version, I guess? Mine is 4.4.20(1) and when I do "cd $HOME/my[TAB]", it replaces the input line with "cd /home/joker/my\ dir/", and pressing [ENTER] changes the directory to '/home/joker/my dir', as can be seen from the prompt.
This is a difference between $@ and "$@" (note the quotes):
$ cat proba.sh
#!/bin/sh
echo "Using quotes:"
for i in "$@"; do echo "$i"; done
echo "No quotes:"
for i in $@; do echo "$i"; done
$ ./proba.sh "ho ho ho"
Using quotes:
ho ho ho
No quotes:
ho
ho
ho
Every week, I encounter a user - just like I did in the 80's - who cannot explain the difference between a file and a folder.
"What do I use a folder for?", they ask, in the same breath that they request "some way to organize things logically".
The no-filesystem movement has worked hard to eradicate this scourge from user experiences, but I fear that this is the devil's work. Computer users should know what a file is, and what it's for - and they should know what a folder is for, and why they would want to create one to put their files into it ..
But yet: they don't.
It hasn't improved since the 80's. Taking away the user's responsibility to understand these things only makes computing worse. The fact that "special chars in paths" breaks things also helps keep this situation in place, imho.
Is that the movement to store all your data as an amorphous pile of crap, and then provide easy-to-use search tools to actually find the content you're looking for?
On one hand, I really like the search tools that come from this. But I still like to actually organize my data, so I can browse it if I want to. Also, these search tools seem to only work well enough on macOS and fall flat on their face in Windows. (and no idea where Linux falls on this)
You had me at "amorphous pile of crap", but lost me at 'actually find the content'... ;)
Meanwhile, I've got a single directory full of PDF files (over 60,000) which I routinely "ls -alF | grep <search term>" for, and I've also got some PyPDF scripts for doing deeper content search - but yet I yearn for a way to automatically parse the filenames and organize things categorically into a folder tree resembling a word cloud, symbolic links and all .. one of these days ..
You think spaces are bad (and yes I'm old enough that I don't use them)... We work with a company that has forward slashes "/" in their trading name and insist on shared cloud directories involving them to be prefixed with that trading name.
As soon as you do anything programmatic in/out of these drives it all hits the fan. So I'd add to the original statement - "Avoid 'technical' companies with special characters in their name", it's just not right...
If putting spaces in file names makes you queasy, try punctuation - especially punctuation like semicolon or ampersand or single quote that's meaningful to shells and such. <shudder>
Honestly, this still causes a lot of problems with some software. I've had friends asking for help with obscure errors that were ultimately caused by the files they were using being on a path that contains a space or special character.
Shells are indeed the main culprits for the continued fear of spaces, but not the only ones. A lot of programs that deal with "metadata" which will then generate database tables and stuff like that, still struggle when working with any sort of special character. And the same for anything that, behind the scenes, just feeds text into regexes.
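The regex case is easy to reproduce. A small Python sketch with a made-up file name and listing: used as a raw pattern, the parentheses and brackets in the name are read as regex syntax, while `re.escape` makes the name match literally.

```python
import re

# Hypothetical file name full of regex metacharacters.
filename = "report (final) [v2].txt"
listing = "old.txt\nreport (final) [v2].txt\n"

# Raw: "(final)" becomes a group and "[v2]" a character class,
# so the pattern no longer describes the literal name.
raw_match = re.search(filename, listing)

# Escaped: every metacharacter is matched as an ordinary character.
escaped_match = re.search(re.escape(filename), listing)

print(raw_match is None)          # True
print(escaped_match is not None)  # True
```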
Our local development environment has evolved to a complex enough sequence of steps to set up and troubleshoot that I spent 2 weeks creating tooling that you can simply point at source checkout locations and the tool will take care to setup that repo.
It broke on the first try on a jr hire's machine, the source checkout location was `C:\source code`.
Slightly off topic but I find myself stuck at being "please for the love of god don't use spaces in git branch names" old. Anno dazumal this might not even have been an issue and I'm just cargo culting.
And on that topic, git branches are case sensitive but the Windows filesystem API isn't. Git branches are materialized on the filesystem as files and directories.
If people actually abuse git branches being CS, odds are good they're also abusing CS in the repository content.
The Linux kernel is one of the offenders: if you check it out on Windows or macOS (which supports CS but remains CI by default), you'll immediately get garbage in netfilter, because it's a habitual user of files whose names are identical but for the casing, e.g. xt_TCPMSS.h and xt_tcpmss.h.
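A quick way to check a file list for this before it bites you, as a hedged Python sketch. `find_case_collisions` is a hypothetical helper, and `str.casefold` is only an approximation of what a case-insensitive filesystem does (NTFS and APFS each have their own folding rules).

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is an approximation of filesystem case folding.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["xt_TCPMSS.h", "xt_tcpmss.h", "Makefile"]
print(find_case_collisions(names))  # [['xt_TCPMSS.h', 'xt_tcpmss.h']]
```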
Yep, I recently got bit by this, someone checked in a branch named something like "x<-->y", Windows was unhappy. I think this is a git bug: git should escape these names for the native platform.
I enjoy choosing fun branch names from time to time. A few of them: Russian when a user reported a typo in a Russian translation; emoji (mostly added emoji rather than pure emoji); and my personal favourite, a ~250 character diatribe about a single-character bug I was fixing (~250 after I discovered that Git’s error messages when you cause it to try to use file names too long for the file system are fairly mediocre).
Where I used to work they had a risk system that created directories on the Windows server that matched the book name. They had a trader that named one of his books "COM1"...
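A system like that could have checked names up front. A minimal Python sketch, assuming the classic list of reserved DOS device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9); `is_reserved_windows_name` is a hypothetical helper, and the real rules have more corner cases than this covers.

```python
# Classic reserved device names on Windows.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(name: str) -> bool:
    """True if the name (ignoring case and any extension) is a device name."""
    # "COM1.txt" has traditionally been treated like "COM1", so drop the extension.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_NAMES


print(is_reserved_windows_name("COM1"))         # True
print(is_reserved_windows_name("Commodities"))  # False
```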
Spaces in file names break half of the shell scripts I have encountered.
And it is one of the biggest reason I hate Unix shells as programming languages, it is a minefield. In fact I think that after a dozen lines, Perl is a better option. It has most of what shells are good at (i.e. running commands), but saner and more powerful.
my god, I was simply trying to loop over every file in a dir and zip it in a bash one liner. Of course, some of the inputs had spaces in the file names. What an exercise in frustration!!!
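For comparison, the same "zip every file in a directory" job in Python, where file names never pass through word splitting. This is only a sketch: `zip_each_file` is a hypothetical helper, and it writes each archive next to the original as `<name>.zip`.

```python
import zipfile
from pathlib import Path


def zip_each_file(directory):
    """Create one .zip per regular file in `directory`; return the archives."""
    created = []
    # sorted() materialises the listing first, so the archives we create
    # are not picked up by the loop itself.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix == ".zip":
            continue
        archive = path.with_name(path.name + ".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            # arcname keeps only the file name inside the archive.
            zf.write(path, arcname=path.name)
        created.append(archive)
    return created
```

Calling `zip_each_file("some dir with spaces")` needs no quoting at all, because the name is a single string value from start to finish.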
Remember when we put + instead of %20? Spaces in URLs are still a nightmare IMO. I still get strange access log entries where some encoding went wrong, especially in heavy JavaScript environments.
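The two conventions still coexist, which is probably where some of those log entries come from. A short Python sketch with the standard `urllib.parse` functions; the file name is just an example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "some name.pdf"

# Path-style encoding: space becomes %20.
print(quote(name))                    # some%20name.pdf

# Form-style encoding: space becomes +.
print(quote_plus(name))               # some+name.pdf

# Decoding with the wrong convention leaves the + in place.
print(unquote("some+name.pdf"))       # some+name.pdf
print(unquote_plus("some+name.pdf"))  # some name.pdf
```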
Same goes for capitalisation. All filenames should be lowercase.
Maybe it's not strictly necessary, but it can avoid headaches.
But nice for testing. I spent a few months on Windows while doing a Django project and found a number of bugs no one else had discovered because they used Mac or Linux.
And a : in a file name at the GUI level gets turned into a dash! I just tried to name a text file "Foo/Bar 10:01.rtf" and it changed it to "Foo/Bar 10-01.rtf"!
Ah so that’s not really putting a slash in the name on disk - finder is just displaying the colon that way - it substitutes with a colon for historical reasons that have to do with pre OSX MacOS (but you can see if you create a file from a program or the command line with a colon in it, it will display as a slash in finder). It shouldn’t cause any problems on its own on the system - but the colon is troublesome if you have to interact with DOS/Windows lineage machines.
It's not a matter of being afraid, spaces in filenames are annoying.
I mostly use the shell and navigating in directories with spaces is annoying, you have either to quote it or put a \ before each space. You also have to remember to quote everything, and in bash that can become complex, you start adding quotes everywhere to solve problems caused by spaces (or other special characters like *) in filenames.
So I prefer to not use them, a simple _ is as readable as a space. Only thing is that spaces get rendered better on graphical file managers, but... that could have been solved (and can still be solved) by simply adding an option to render a _ as a space graphically if there is no ambiguity. I don't care that much since I don't use graphical file managers that much.
Maybe it's just me, but it always seemed like prohibiting spaces and other special characters was a reasonable way to avoid unnecessary complexity (and the bugs that accompany it) when parsing and navigating directory trees and files.
I'm old enough to remember working with 8.3 filenames in DOS, and while the length limitation was maddening, the space part never was. Then Windows 95 came out and all restrictions were thrown out.
Why couldn't we just have a file system that robustly supports long filenames, including variable length extensions, while prohibiting certain special characters - namely spaces, slashes or any directory denoting characters in files, and characters that have special meaning in regex context? (brackets, asterisk, etc.)
By coincidence, I found another reason just two days ago. A web app lists uploaded files’ names, and (in a rarely used context) lets the user search for them. One user has copied a file name from the web page, and pasted it into the search box, but got no results. Turned out that the file name contained two consecutive spaces, which the browser turns into a single space, hence no match. Every layer between the user and file system can do something unexpected.
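One possible fix on the search side is to compare names after collapsing whitespace the way the browser does when rendering. A minimal Python sketch with invented file names; `collapse_spaces` is a hypothetical helper, and it also trims leading and trailing whitespace, which may or may not be what you want.

```python
def collapse_spaces(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


stored_name = "Q3  report.pdf"   # two spaces, as stored on disk
pasted_name = "Q3 report.pdf"    # one space, as copied from the page

print(stored_name == pasted_name)                                    # False
print(collapse_spaces(stored_name) == collapse_spaces(pasted_name))  # True
```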
I nearly gave up on learning newer front-end JavaScript stuff like React & webpack and so on a few years ago because of spaces in paths.
node-gyp doesn't like it when there's a space anywhere in your working path. Stuff I was messing around with was all in ~/Code Projects at the time, and using npm install on some things just broke. Looking back, I definitely could have done a better job parsing the error messages but still...
Tangentially, I frequently add dates to filenames to keep things organized. And _always_ in the `YYYYMMDD` format for clarity and technical reasons; `DDMMYYYY` (or God forbid the Americans' `MMDDYYYY`) never made much sense to me.
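The technical reason is easy to show: with the year first, a plain alphabetical sort is also a chronological sort. A small Python sketch with made-up dates and a made-up `-report.txt` suffix.

```python
from datetime import date

dates = [date(2021, 11, 11), date(2020, 12, 31), date(2021, 2, 3)]

# Year-first names: string order equals date order.
ymd_names = sorted(d.strftime("%Y%m%d") + "-report.txt" for d in dates)

# Day-first names: string order is meaningless as a timeline.
dmy_names = sorted(d.strftime("%d%m%Y") + "-report.txt" for d in dates)

print(ymd_names)  # 2020-12-31 comes first, 2021-11-11 last
print(dmy_names)  # 2020-12-31 ends up last because it starts with "31"
```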
Or, one could claim that the poor parsing of a text interface shouldn't dictate the for-human names of files, especially when an exceedingly small percentage of users deal with that text interface.
But, of course, if you mix the abstractions of metadata (filename) with location, things won't be trivial.
Even if libraries all handled it, I’d still personally avoid spaces because spaces get semantically used to separate tokens and I see file names as tokens.
Hey look, HackerNews just broke a URL with spaces in it. And it's written in a Lisp dialect and all; it's not some Unix job cobbled together with shell, sed and awk. The language has a string data type, and strings are passed to functions without word-breaking interpolations taking place.
You know what else breaks on spaces? Basic everyday gui text manipulation.
Suppose that in a block of text we have the sentence:
> Please look for the Holiday Schedule 2021 file.
If you double click on any part of the name like Schedule, pretty much every text widget on the planet will just select only that word, and not the entire filename.
However, if you have:
> Please look for the holiday-schedule-2021 file.
There is at least a ghost of a chance that a semi-intelligent GUI can pick that out as a word.
There exist good reasons to keep identifiers as a clump beyond just command line shells.
It's why we need encoding like %20 in URLs that never pass through a shell script.
I don't use spaces, because I want to be able to run ad-hoc shell one-liners when working with my data without worrying about quotation and similar stuff.
I also don't use :, as I have run into problems with both Bash and its completion and FAT FS. Unfortunately, I routinely have timestamps in filenames, so I need to use +%F-%H-%M-%S instead of simple +%F-%T.
One thing has improved, though: I have not run into problems with ěščřžýáíé (which my language is full of) for maybe a decade, except on OpenWRT where space seems to be scarce to support non-ascii.
Edit: I now remember one problem, getting images for a website from an OS X user, which used combining characters instead of direct code points (https://en.wikipedia.org/wiki/Unicode_equivalence#Example), but HTTP requests got normalized in some browsers, leading to strange 404s.
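The combining-character problem can be reproduced in a few lines of Python with the standard `unicodedata` module. The file name is invented; the point is that the two strings look identical on screen but compare unequal until they are normalized to the same form.

```python
import unicodedata

# Same visible name, two encodings of the accented letter.
composed = "caf\u00e9.jpg"      # single code point for the accented e
decomposed = "cafe\u0301.jpg"   # plain e followed by a combining acute accent

print(composed == decomposed)   # False

# Normalizing both to NFC makes them comparable.
same = unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFC", composed)
print(same)                     # True
```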
That's funny because the first operating system I used (Apple DOS 3.3) was very liberal about file names. There was a 30-character limit which was a lot, and it didn't mind spaces in file names. Even control characters were fair game, which made things fun when you accidentally inserted a ^A in a SAVE command.
File names shouldn't have anything except a-z,0-9,_ and perhaps a -. No unicode, no spaces, no nulls.
It's not fear that keeps me from using spaces in file names, it's habit.
If we're going to play this dangerous game, from now on I'll figure out how to use nulls (\0) in my file names, and make all the C/C++ programmers cry.
I don't use spaces because it's so much faster to type filenames out (including with TAB-completion) in the terminal.
I do, however, use Cyrillic (UTF-8) in filenames, and I regularly try out if moving a file into ASCII-path will let some programs open it (half the time it's that when I am having trouble).
It's just such a pain in the butt to work with files with spaces. In a script it's fine b/c I just surround it in double quotes, but on the command line I hate having to escape the spaces.
This might already exist, but I wonder about a terminal that was really just a multi-line repl to a language. It would be preloaded with libraries that replicated all the features of the gnu core utils, but instead of calling grep like normal, you called a function like grep("args"). The advantage would be that you had access to a full blown programming language at all times. So when you needed to do something more complicated you would still have access to all the standard language features. And when you didn't need that, your canned core utils like functions would work
Coming from web-heavy and perl5 backgrounds, it's insane to me that people don't treat filenames and arguments and environment variables as tainted user input, and just blindly trust properties about them like "does not contain whitespace or control characters".
I had to move my development folders because you can't develop Android apps if your project path contains a space. Not sure where the issue is, if it's gradle or something else.
Edit: thinking about it again, it might not have even been the space but the exclamation mark in my path. Or both.
If any of you reading this have to deal with very large scale data pipelines for data science / ML type processing, and if "don't use spaces and weird chars in file names" hasn't become second nature by now, let me just say: you are very, very brave.
My first job as a SW Eng was in 1989 in the nuclear industry. Our folders and files were limited to 8 letters. So names were effectively acronyms. It was actually pretty awesome. Clean and concise. Years later, I still remembered the whole folder structure.
If you're in tech long enough, you can be traumatized by anything. Like the time a vendor-supplied system decided after an update that nothing could have a hyphen in the title, and a lot of existing content just... broke at once. Fun times.
I don’t think it’s fair to claim that any Make implementation supports spaces: there are too many fundamental bugs and breakages, so that lots of rather important Make functionality is off-limits if any of your file names will have spaces.
https://www.cmcrossroads.com/article/gnu-make-meets-file-nam... explains the situation in GNU Make in 2007 (and I don’t think it’s changed since then, though jgrahamc especially could correct me). Not being able to use such features as $^ and $(patsubst) is severely debilitating for all but the simplest of makefiles.
Not exactly spaces, but I have been bitten by something like this at my work quite recently. A Confluence page with special characters in the page title was working fine for a while. At some point there was a Confluence version update which made the page URL broken (and apparently unrecoverable, or at least not easily recoverable).
One way to look at it is that people of a certain generation eschew spaces because the tools of their formative years simply couldn't handle spaces - but another is that the olds have learned that generally erring on the side of KISS ("Keep it simple, stupid!") isn't a bad idea.
Software engineers - particularly of the more embedded variety - absolutely still have this problem.
The main culprit is GNU Make which does not cope with spaces in filenames. As far as it is concerned an array is a string separated by spaces so it gets very confused. Yes there are some partial workarounds, no none of them consistently work. You learn very quickly to check all code out in a file tree with no spaces in it, otherwise builds can randomly break in strange ways. It's not always clear up front whether Make is going to be involved somewhere in the build, so it's just easier to be safe.
My username has been my name which has an accented character and has broken countless Windows apps every year since forever, so I just keep a C:/Programs folder where I run stuff. You should never not fear filenames.
I am overly aggressive with spaces and special characters in filenames: I use them everywhere and report a bug when they cause errors, because they shouldn't in this UTF-8 age.
I still don't use the special character of my name in my username because that has caused me many hard to fix troubles. Think "cannot recover user password because this user doesn't exist".
I use c:\programs too, but for different reasons. C:\Programs is for portable applications that don't get installed, can be directly overwritten, and consist of at most two files with relevant names. As a bonus, I can run such programs directly from the run menu. C:\Programs\procexp for Process Explorer, for example.
I recently found out that a Windows folder name can't end with a space.
But with Python, for example, you can create the folder 'example ',
and every file you create in this folder will be inaccessible and impossible to delete.
I've never created a filesystem entry name with a space. Mainly out of fear, and even where the fear is unfounded, "\" looks so ugly. But I think I'm even worse: I dislike capital letters too.
Nothing old about that; lots of stuff is still broken. What are the odds Homebrew works if installed to a directory with a space in the name? Maybe the core brew manager itself, but all the packages?
I tend to follow a Postel-like system when it comes to this. When I write a script I'll usually get paranoid and make at least token efforts to handle spaces. Which I will then never, ever use.
I have come back to this thread, which I have spotted and forgotten something like two days ago, to say that just like a minute ago one of new Jenkins jobs that I added failed because I named the item using space and some custom Gradle/Maven magic tool failed to load one of its own auto generated files (I could tell that space was the culprit because error message printed only second half of item name).
How can I not be afraid of spaces if this happens like every other day with every other custom tool ...
Browsers will take http://example.com/some name.pdf and automagically turn it into http://example.com/some%20name.pdf, and deliver the goods without a problem. But having that space in the URL is still out of spec, and will cause your web page to fail validation, even though it works fine.
Depending on the specific issue, the `subst` command may help you. If the OneDrive folder itself has a space in the path, or a necessary subfolder does, you can give that folder a drive letter instead.
And honestly, it's a good fear to have; there are contexts where it still just doesn't work.
Last I checked, the standard answer for GNU make is "Spaces are expected to break the tool, that's working as intended, it will never be fixed." And because we build our towering edifices of software on the pillars of the past, I can't guarantee to you that a project of arbitrary complexity won't try to cram a list of filenames through a make script.
I don’t think this is so much an age thing as a programmer thing. Old people will still name files all sorts of things, and a lot of young programmers today avoid spaces.
If you're developing on Windows, I find a good way of dealing with this is to convert paths to short format before using them (e.g. GetShortPathName in kernel32.dll).
Not afraid, but typing a dash in the terminal is easier and shorter than typing a reverse slash and a space. Spaces are kind of a pain in the ass in the terminal, tbh.
This seems like a case for an axiom I hear infrequently, but I think comes up a lot - things that seem like they should be simple and easy, but are in fact difficult.
I must be a nightmare customer, because I've always been exploiting my ability to use filenames in full UTF-8. I'm that guy that sends .pdf to your website.
Why stop here? Why not put spaces in your variable names also? Allowing spaces only in file names and not in variable names is short-sighted when not inconsistent.
As a software engineer, I require testing of paths and files with spaces in them, and forbid spaces in any system-generated file name where possible, to make CLI use easier.
I do it the other way around. I used to be afraid of spaces. But I have come to realize that it is better to learn sooner rather than later which pieces of software are in such a bad state that they aren’t handling spaces correctly.
That being said, even after all these years I sometimes need to try a few times in order to get the quoting and the escapes right when communicating names of files with spaces through multiple layers of software.
I like to store data on USB flash drives. After being left to mature for a few years in a humidity and temperature environment, you get some really interesting and complex byte streams where your original file names used to be.
Often they are not even valid UTF8 which, when you uncork the filesystem for the first time in a decade causes the most delightful crashes. The more years the better the aroma.
Today, WSL will try to add PATH in Windows to PATH in Linux. So if you install something like NodeJS in Windows, and run node in Linux, it will try to call /mnt/c/Program Files/nodejs/node.exe and say "no such file or directory: /mnt/c/Program".
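I don't know what WSL or the shell does internally here, but the symptom looks like plain word splitting. A rough illustration in Python with the standard `shlex` module, using the path from above (the `--version` argument is just a placeholder):

```python
import shlex

path = "/mnt/c/Program Files/nodejs/node.exe"

# Naive splitting on whitespace cuts the path in two, as in the error above.
naive = f"{path} --version".split()

# Quoting the path first keeps it as one argument after shell-style parsing.
quoted = shlex.split(f"{shlex.quote(path)} --version")

print(naive)
print(quoted)
```

When spawning processes from code, passing an argument list instead of a single command string sidesteps this kind of splitting altogether.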
I had half a feeling that the warning against using spaces in names pre-dates computing, but after a little research into library call numbers and archive accession numbers, which turn out to have both historically included spaces, I have found no evidence to support this feeling.
It seems to me that many of the problems associated with spaces in filenames are due to the OS assuming that a space signals the end of a command or filename.
Maybe we ought to have a different character signify the end of a name? Or signify an option section, or the next option section of a command?
And I'm older than Google. If you want some hilarity, newlines are allowed in filenames as well (\n, \r, \r\n). Try getting bash to handle that! (It's possible, though annoying. Try redirecting to `while read line` in addition to find -print0 / xargs -0 hackery.)
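To show why NUL-delimited output (the kind `find -print0` produces) is the usual escape hatch, here is a small sketch in Python. The byte string is a made-up listing, not real tool output, and `split_nul` is a hypothetical helper.

```python
def split_nul(data: bytes) -> list[bytes]:
    """Split NUL-terminated records, as produced by e.g. `find -print0`."""
    return [name for name in data.split(b"\0") if name]

# Three file names; the second one contains a newline.
listing = b"plain.txt\0two\nlines.txt\0with space.txt\0"

# What a line-oriented reader would see if the same names were newline-separated.
by_line = listing.replace(b"\0", b"\n").splitlines()
by_nul = split_nul(listing)

print(len(by_line), len(by_nul))
```

NUL works as a separator because it is one of the two bytes (the other being `/`) that cannot appear inside a file name on the usual Unix filesystems.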
I've never had any problems with this. At this point, it's second nature for me to either use underscores for spaces, or camel caps if there aren't any single character words like 'i' or 'a' in my desired file name.
Yes, but working with filenames with spaces in them is a huge PITA in command-line tools, because you have to quote everything. The ergonomics is just really annoying.
Personally I wish console shells had chosen another delimiter than space, but here we are.
Yes, you have a point there, but in this case would being liberal in what you accept be to accept filenames with spaces or (arguably) doing filename handling correctly (ie accept filenames with spaces)?
I'm apparently in the minority of people who know how to write shell scripts that have a chance of working correctly with filenames with spaces in them... and that's not the only reason I avoid spaces in filenames. :)
Reminds me of the time I watched a coworker's head explode when he tried to extract an archive (from a 'Nix environment) on his Windows machine and was indignant about getting duplicate filename errors.
I work in Azure Data Factory, and there are places where a space in a name will cause you difficult to troubleshoot errors. But I can never remember where. It's not universal. So I just avoid them entirely.
So, born today, eh?—says the guy who still regularly runs into build scripts that cheerily command that they be run from directories without spaces, since that's easier than proper quoting in the script.
The meta point here is that spaces are the type of thing that work fine ... until they don't. This class of bug is best avoided entirely, especially if there is an easy workaround (not using spaces).
But it still breaks in so many situations and becomes a pain in the ass in so many other ones! I HATE people who use spaces in file names. For me it is a sign of a "deeply nontechnical person".
Except nowadays I worry more about user names that get fed into collaborating applications (with different edit criteria) and password characters (again for systems with differing, strange edit rules.)
No way I would put anything but a-z, 0-9, and underscore in any file name. Too many stupid ways it can go wrong. I guess I have very little trust in my fellow programmers!
i still get issues with old one-off scripts, that still work, and I forgot to properly quote stuff... plus the urls are pain in the ass with the %20;s.
Our tool has no issues with spaces in fields, but we still advise users not to do it because other systems OFTEN STILL DO, in the year of our lord 2021.
I try to avoid spaces and special characters because issues still happen to this day (just yesterday, I had an issue with a file with an accent in it).
It's worse than that. Whitespace is a hellish invention in the world of computers: there are multiple characters that may or may not render as whitespace with no way to distinguish them by just looking at the output.
Yet to the machine (script, shell, program, ...) it matters a lot, since u0020≠u0009≠u00A0≠u2000≠u2001, etc. whereas the aforementioned codepoints render like this: " "
(and yes, those are indeed the five codepoints in that order - at least I typed them that way).
(Ab)Using whitespace like that can lead to all sorts of funny business, not just when dealing with shell scripts and variable expansion.
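A quick way to see the difference that the eye cannot: a sketch in Python with the standard `unicodedata` module, using the five codepoints listed above.

```python
import unicodedata

codepoints = ["\u0020", "\u0009", "\u00A0", "\u2000", "\u2001"]

for ch in codepoints:
    # TAB is a control character with no Unicode name, hence the fallback.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>"), ch.isspace())

# All five count as whitespace to str.isspace(), yet none compares equal to another.
all_distinct = len(set(codepoints)) == len(codepoints)
print(all_distinct)
```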
Anyone else totally fine with spaces in filenames? I used to rip a lot of CDs back in the day, and never had an issue with the spaces in the file names.
01 - Metallica - Metallica - For Whom the Bell Tolls.mp3
I work on a complex desktop application, and it's been astounding the number of bugs that have appeared over the years triggered by spaces and other unusual characters in file names. If you do anything with subprocesses or path processing, it's absurdly easy to hit in a thousand different ways, over and over again.
Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
Forces you to deal with this properly, and immediately ensures that every automated test checks this case without you having to remember every time. Hasn't been particularly inconvenient, since I'm autocompleting it 99% of the time anyway, and I haven't shipped a single path parsing bug since.
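If renaming the real workspace is not an option, the same idea can be approximated inside the test suite. A minimal sketch in Python, assuming the tests are free to pick their own scratch directory; the directory prefix and file name are arbitrary choices for illustration.

```python
import os
import tempfile

def make_awkward_workspace() -> str:
    """Create a scratch directory whose name contains spaces and quotes."""
    # The prefix is an arbitrary choice for illustration.
    return tempfile.mkdtemp(prefix="work space 'q' ")

workspace = make_awkward_workspace()
target = os.path.join(workspace, "out put.txt")

# Any code under test that mangles the path will fail to create or find this file.
with open(target, "w") as fh:
    fh.write("ok")

print(target)
```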
While I agree that we should do this in the ideal world, doing so will inevitably break other necessary tools so it is unworkable for me :(
Seems like MS had the same idea according to an answer in the link:
> Microsoft intentionally made programs install to C:\Program Files on Windows 95+ to force programmers to deal with spaces in filenames.
And yet they introduced C:\ProgramData in later versions.
why "yet"?
one occurrence is enough to make devs care about it
Imagine if they made programmers put 64 bit DLLs in a "System32" directory and 32 bit DLLs in a "SysWoW64" directory. That would really keep 'em on their toes!
You should look into the behavior of the /windows/sysnative link. It appears and disappears depending on whether your process is running as 32 bit or 64 bit.
Programmers should never put DLLs in those folders... Or even ever touch them.
Except for \Windows\System32\drivers\etc\hosts, of course.
I occasionally try to search for the reasoning behind the location of the hosts file in Windows, and I always come up blank.
Maybe it's from back before Windows had a built-in TCP/IP stack? If it were a third-party/optional driver, having files related to it in a path under system32\drivers would make sense.
Back around Win 95 when they added networking it was based off of (IIRC) BSD's TCP stack and related tools. They were an optional 'third party' driver of sorts, but shipped by the first party. I'm not positive about WinNT or Win3.11 (for workgroups?)
I remember adding "trumpet winsock" to Win 3.x back in the day. Says '94 for that, and summer of '93 for NT 3.1 debut:
https://en.wikipedia.org/wiki/Trumpet_Winsock
https://en.wikipedia.org/wiki/Windows_NT_3.1
They originally copied BSD's network stack, IIRC
https://superuser.com/questions/355297/why-does-windows-have...
They already hamstrung themselves with LONG because DWORD just wasn't good enough and now long can't be 64-bit either.
I wish they did "User Files" instead of "Users" too, because so much software breaks on the home area having a space in it.
Not least, it makes writing scripts for various shells and getting the quoting rules right an absolute pain as well...
They used to. The folder was called `Documents and Settings` until Win7.
"Documents and Settings" still exists on Windows 10, as a soft link to "Users".
You know, this makes me wonder.. tangentially speaking- I wonder how hard it would be to rearrange the folder structure in linux so that I have something like this:
/Users/{root, user0, user1, ... }...
/System/{Logs, Apps/{opt, container, ...}, Temp, Conf ...}...
/Devices/{Mount, sda, sdb, null ...}...
/Boot/...
GoboLinux does exactly that: https://en.m.wikipedia.org/wiki/GoboLinux
Wow, thanks for the reply, nice find! I did some poking around on my Linux system and even re-arranging the home folder was a task of its own because the system kept trying to replace folders in their original places. I will do some digging in to Gobo and see how they're handling this. Thanks again for pointing this out.
This is the file that you want:
That helps, but be warned that there are still programs running around that just hardcode their paths
Cries in nixpkgs
(Anyone who tried to package a program that hardcodes the “usual” binary paths know the pain)
Doesn't nix itself hardcode the nix store path though?
Afaik there is an option to change it, but it is not advisable as that will break the binary cache and you are left with compiling everything yourself. This is due to a technical limitation in that different packages can contain paths everywhere and thus they are inherently part of the resulting hash, on which other packages can depend.
You’re clearly a more capable user than me, but even so, take care. The time I accidentally moved /etc has scarred me for life.
Early on in my Linux-using-life I made the mistake of deleting /etc. That was a learning experience like no other :)
Since Live CDs/Flash drives were invented, I wouldn't worry about this stuff any longer. Certainly have your personal files in a centralized location and backed up first.
Probably the easiest way to experiment these days is to create a VM and make snapshot, then start knocking down walls, just to see when and where the house collapses. Then revert and try something new.
There's a computer game that deletes random files when you make a mistake or lose.
There could be a competition!
A couple of weeks after moving to UNIX from MSDOS, I thought I'd remove lots of unnecessary 'dot-directories' from the /tmp directory. I was root as I had no concept of being a 'normal user'.
So I ran two simple commands:
and wondered why it was taking so long. <grin>
At least that doesn't happen today anymore. From bash:
> When a pattern is used for pathname expansion, the character ``.'' at the start of a name or immediately following a slash must be matched explicitly, unless the shell option dotglob is set. The filenames ``.'' and ``..'' must always be matched explicitly, even if dotglob is set.
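The underlying hazard is easy to reproduce with a plain pattern matcher. This sketch uses Python's `fnmatch`, which is not bash's matcher and has no special case for `.` and `..`, so it only illustrates why the explicit-match rule quoted above matters.

```python
import fnmatch
import os
import tempfile

# A plain pattern match treats ".." like any other name starting with a dot.
matches_parent = fnmatch.fnmatch("..", ".*")

# os.listdir() never returns "." or "..", so filtering its output is safe.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, ".hidden"), "w").close()
dot_entries = fnmatch.filter(os.listdir(scratch), ".*")

print(matches_parent, dot_entries)
```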
I did that on my NAS a few years ago. I had copied in a bunch of directories from a mac and they all had tons of dot files in each dir that were showing up on my windows machines. I popped open a terminal and did the exact same thing and wiped most of the NAS out =P Good thing I had it mirrored with my other synology.
Dammit, I wanted to be the one to mention gobo linux [HN deleted my laughing emoji ffs]
Beat the system ʕ•ᴥ•ʔ
How do you deal with lack of being able to just point to "/usr/lib/include" or other things when saying "here's my directory of shared libs"?
This is definitely interesting though, and an improvement I would say
GoboLinux symlinks everything into an FHS-ish structure under /System/Index/ so you still have a single place where binaries/libraries/includes/etc. live. (There are also symlinks from /usr/lib, /usr/bin, and others into /System/Index/ for compatibility with programs where those might be hardcoded.)
That actually seems like some low hanging fruit to go on a commit spree correcting code that hard codes paths
GoboLinux is old enough to vote in most countries.
So either those low hanging fruits are higher than they seem, or we're all just a bunch of dwarves.
My bet is on the second option.
It is low hanging fruit as far as the software is concerned. Simply parameterize all paths.
Will upstream accept such patches though? Sounds unlikely to me.
You monster!
Don't even get me started on /usr/local/bin..
You mean "Start Menu"?
macos does something like that.
I mean we're heading there with /usr being your /System. Redhat/Pottering are doing heroic work in this space.
The only real holdouts are proc/sys/dev which are the kernel and mnt/media/opt/srv which are really for the user/sysadmin and aren't really used by the OS anymore.
Genuine question: on what systems is `/tmp` persistent? Both macOS and Ubuntu 20.04 clear `/tmp` on every reboot for me, and I haven't changed the defaults at all.
All storage is temporary. You just gotta wait long enough.
People don't reboot often. Persistent tmp basically means it will be cleared in an infrequent manner, so the likelihood of it going away 1s after you release your file handle is low.
"Persistent Temp" should be /var/tmp. "Persistent Temp" is also an oxymoron.
> "Persistent Temp" is also an oxymoron.
It's not an oxymoron to have files which are temporary but not limited in scope to a single power cycle. For example, you could have a long-running process which you want to be able to resume if it's interrupted; /var/tmp would be an appropriate place for the state. The data is temporary because it will be deleted once the process is finished, but you wouldn't want it wiped out by a system reset. Generally /tmp is cleared at every reset, and is often a tmpfs mount, while files in /var/tmp are automatically cleaned up only when they reach a certain age.
Except that the FHS says that "data stored in /var/tmp is typically deleted in a site-specific manner", and as an application vendor you have no control over that site-specific clean frequency. On all my systems, /var/tmp is a symlink to /tmp and that has never caused any issue.
The FHS is not wrong; cleaning policies are indeed site-specific and files placed in any temp directory can in principle disappear at any time. (Though, in theory, it's not supposed to happen while the files are still "in use" by running programs.) Still, historically you could count on files in /var/tmp lasting longer than files in /tmp, including across reboots.
Nothing will immediately break because you linked /var/tmp to /tmp. Whether it causes issues depends on the programs that you (or your users) run and how they make use of /var/tmp. However, if someone did have to restart a long-running process from the beginning because recent state information in /var/tmp was not preserved across a reset, I would say that is a problem with the administration of the system and not the program that stored its state there.
Basically no one uses /var/tmp for anything (and nobody should either). World writable directories are a mistake and only continue to exist because apps assume they are available.
/tmp and friends are poorly named. They really should be /shared or /dmz or /freeforall or something.
* If you need service-specific tmp space use RuntimeDirectory or PrivateTmp if your app is hardcoded to /tmp.
* If you need service-specific persistent data that goes in /var/lib/your-app.
* If you need temp space for your user it's at /var/run/user/your-uid.
* If you need more than one user/service to share files but not everyone then god have mercy on your soul because all options are bad. There sure are a lot of them but none of them are at all satisfying.
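For the per-user case in the list above, a rough sketch in Python. It assumes the per-user runtime directory is exposed through the `XDG_RUNTIME_DIR` environment variable, as the XDG Base Directory spec describes; `user_scratch_dir` and the app name are made up, and the fallback is a private `mkdtemp` directory rather than anything shared.

```python
import os
import tempfile

def user_scratch_dir(app: str) -> str:
    """Return a private per-user scratch directory for `app` (hypothetical helper)."""
    base = os.environ.get("XDG_RUNTIME_DIR")
    if base and os.path.isdir(base) and os.access(base, os.W_OK):
        path = os.path.join(base, app)
        os.makedirs(path, mode=0o700, exist_ok=True)
        return path
    # Fallback: mkdtemp creates a fresh directory accessible only by this user.
    return tempfile.mkdtemp(prefix=app + "-")

print(user_scratch_dir("demo-app"))
```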
> Basically no one uses /var/tmp for anything
Gentoo does, at least by default: https://wiki.gentoo.org/wiki//etc/portage/make.conf#PORTAGE_...
Right, /var/tmp is the "Persistent Temp" directory, and /tmp is "Ephemeral Temp". The /run directory is for runtime data such as PID files, Unix sockets, named FIFOs, and generated systemd units—it has a specific internal structure and shouldn't be used as a direct alternative to the relatively unstructured /tmp directory. While both are generally ephemeral tmpfs mounts, only /tmp is writable to all users.
I'm not sure I'm a fan of the capitalization and spaces, other than that I'm all for more self-explanatory names.
Why not? That's how proper English text is written. Of course there are many programs that can't handle it properly (or handle it inconveniently) so in practice it might be problematic at times, but otherwise I see nothing wrong with it.
Generally just because typing it out with tab completion in zsh sucks, and I don't see a good solution (if it was solved nicely it'd be solved already)
Why compare with English? It's computer domain, it's not a book or a poem. It should be clear and unambiguous.
Caps are annoying to type, and difficult to remember (Do You Caps, or Do you caps, or DO YOU CAPS, etc).
Spaces are nuisances that bring no benefit. At best we should use non-breaking space for filenames, but that would be even more atrocious.
This is what I want from Linux. Sensible & guessable names for newcomers to figure out where to put files and programs.
It's frustrating having to spend time to decide whether I should install a program in /var or /opt or /usr. What do they even mean!
So, I disagree with this convention altogether and use /apps or ~/apps now.
Behold! https://en.m.wikipedia.org/wiki/Filesystem_Hierarchy_Standar...
Yeah, except that tells me nothing useful... The question is exactly the same: So where do I install this random binary I downloaded from the internet or compiled myself? Is it /opt, /usr/bin, /usr/local/bin, or /bin? Where do I put the dependencies I compiled for this software - /usr/lib, /usr/local/lib, /lib, /opt/lib, /opt/<app name>/lib, or what?
Wherever you want. All of the above, or none. It really is up to you.
That's exactly the problem. This leads to mess. The Windows model of C:\Program Files\<app name> is much better.
No, it frees you to pick whatever unmessy solution you want.
You can do `configure --prefix=/Program\ Files/<app>` if you want.
If I am not writing all of my installation scripts by hand, because that would be really intense, then every folder gets filled with random bits of software.
Offering too many similar choices leads to mess. There's nothing fundamentally different between using one or more of these options and using the only option, except that in the second case there isn't any opportunity to make mess.
> You can do `configure --prefix=/Program\ Files/<app>` if you want.
Thanks for the tip! Can't do that with distro repo software though :-/
> then every folder gets filled with random bits of software.
What does that even mean? When you install something, you put it where you want it.
If you don’t like where your distribution puts files, choose a different one. Not all of them use the same convention.
All (except aforementioned GoboLinux) use FHS.
Use Gnu Stow to keep the random bits contained in their own app directory that is symlinked into the /usr/local tree. Then you can manage everything without leaving orphan files behind.
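For anyone who hasn't seen the symlink-farm idea, here is a heavily simplified sketch in Python. It is not how Stow is implemented (real Stow also handles things like directory folding, conflict detection and unstowing); `link_package` and the directory names are made up for the demo.

```python
import os
import tempfile

def link_package(package_dir: str, target_dir: str) -> list[str]:
    """Symlink every file under package_dir into the same relative spot in target_dir."""
    package_dir = os.path.abspath(package_dir)
    created = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        dest_root = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            link = os.path.join(dest_root, name)
            os.symlink(os.path.join(root, name), link)
            created.append(link)
    return created

# Demo in a throwaway tree: one package with a single file in bin/.
base = tempfile.mkdtemp()
pkg = os.path.join(base, "stow", "hello-1.0")
os.makedirs(os.path.join(pkg, "bin"))
open(os.path.join(pkg, "bin", "hello"), "w").close()

links = link_package(pkg, os.path.join(base, "local"))
print(links)
```

Removing the package is then a matter of deleting the links that point into its directory, which is why nothing gets orphaned.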
Very cool
Okay, but what about ProgramData? I have enough programs that put their junk in there instead of Program Files, and others that make their own directories on the root of the drive (driver installers are really bad about this).
I think the best model I've seen for consistent binary locations is the 'Applications' folder in Mac OS X, but it fails as well by retaining the /usr/bin elsewhere.
When you download a portable app (just a bare .exe), do you make a folder for it and drop it in program files? (quite possible, you'd just be unusual) If not, why does Windows get a free pass?
But why are many Windows programs under C:\Windows\System32 then, if Windows has only a single model? Why aren't all Steam-provided (for example) games in a single location? Or, if they are, does Windows really have a single model?
Yes, the Linux/POSIX model is confusing, but the split is to segregate administrative domains:
- / and /usr are the domain of the distribution. As a user, you should never install there. The administrative group is root.
- /usr/local is the domain of the machine admin. If the machine is yours to manage, you can install software there. The administrative group is staff.
- /opt/$vendor is the domain of third-party vendors. Each vendor (like Steam, Eclipse, Arduino Studio) can get its own subdirectory and its own administrative user group.
How would you achieve the same on Windows? How do you make sure the Adobe updater can only install new versions of CS, but not surreptitiously install a new (free!) spyware package under C:\Windows? How would you allow certain power users to share one Google Chrome installation, allow each of them to update it, but not let them install additional software system-wide?
Except instead of config files, Windows has the registry.
Also, as mentioned by the siblings to this comment, the 'mess' has a purpose, and is less messy than it appears.
Want to manually install something? Into /usr/local it goes. Done.
The only way to handle this that I've been really impressed with is Mac's "Applications" folder. Unfortunately, I dislike most other things about Mac.
I was taught /usr/local/bin
/opt is for standalone packages, so if it’s a single file, no.
/bin is only for stuff needed in single-user mode, so probably not (unless that’s what the binary is for).
/usr/bin is going to typically contain files installed by your package manager and should probably be left unaltered by human hands.
The deps I would assume /usr/local/lib but it hasn’t ever come up for me.
Fun fact: Debian is working towards[1] and Arch already has merged / and /usr. /bin is a soft link to /usr/bin and similarly with sbin and lib.
[1]: https://wiki.debian.org/UsrMerge
To add: when you install software yourself you choose this, when you install software from e.g. a distribution package it is chosen by the package maintainers, and to a larger extent the maintainers of the distribution.
This is one of the big advantages of using a ready-made Linux distribution: beyond the convenience of having an installer or easy to install packages, you get some assurance that the system as a whole has been thoughtfully put together.
Arch Linux for example symlinks /bin and /sbin to /usr/bin and /lib to /usr/lib among other things.
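You can check whether a given system has this layout without reading distro docs. A small sketch in Python; `is_usr_merged` is a made-up helper that only looks at `bin`, and the demo runs against a throwaway directory tree so it does not depend on the host.

```python
import os
import tempfile

def is_usr_merged(root: str = "/") -> bool:
    """True if <root>/bin is a symlink that resolves to <root>/usr/bin."""
    bin_dir = os.path.join(root, "bin")
    usr_bin = os.path.join(root, "usr", "bin")
    return os.path.islink(bin_dir) and os.path.realpath(bin_dir) == os.path.realpath(usr_bin)

# Demo on a fake root: usr/bin is a real directory, bin is a relative symlink to it.
fake_root = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_root, "usr", "bin"))
os.symlink("usr/bin", os.path.join(fake_root, "bin"))

print(is_usr_merged(fake_root), is_usr_merged())
```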
Is your account the only account that's expected to run the binary? If so, then `$HOME/bin` is a perfectly acceptable (albeit not standard) place to put it.
If you expect other users to be able to execute the program, then you should put it in either `/usr/bin` or `/usr/local/bin`, depending on whether the former is already being used by a package manager. `/opt` is generally for self-contained software that doesn't play nicely with the rest of the system, but might still be installable through the default package manager.
$HOME/.local is the equivalent of /usr/local for per-user stuff.
I don’t think there’s any “official” word on that (the XDG spec that defines ~/.local/share doesn’t mention ~/.local/{bin,lib} IIRC, and the traditional per-user entry in PATH seems to be ~/bin), but a fair number of people use it this way, yes, including me.
I started out using $HOME/bin, but a fair amount of stuff assumes a /usr- or /usr/local-style folder structure when doing make install, so I've settled on using $HOME/usr/bin instead, so that programs can create $HOME/usr/include and $HOME/usr/share and whatever, without trampling on stuff in my home folder.
Can't remember the last time I had a problem arranging this. If using autotools, which covers 95+% of stuff, it's usually a question of something like "./configure --prefix=$HOME/usr".
(If I want to share stuff between users, /usr/local/ is of course a better place. macOS is a bit more restrictive, so I have a separate user for this, whose /usr folder is readable by everybody.)
Yeah, it definitely gets hairier when using anything that's more than just a drop-in binary.
> $HOME/bin
On freedesktop systems there's the ~/.local directory which is supposed to be a mirror of the file system hierarchy. Seems like a good place for bin, lib, include directories.
The standard is, indeed, excessively vague because it was written to let many existing implementations be conformant as is, though I’d say it’s still more helpful than many other standards with that deficiency. There’s a method to it, however:
- Things installed in /, if it’s different from /usr, are generally not to be touched;
- Things installed in /usr are under the distro’s purview or otherwise under a package manager, any modifications are on pain of confusing it;
- Things installed in /usr/local are under the admin’s purview and unmanaged one-offs, there are always some but overuse will lead to anarchy;
- Things installed in /opt are for whatever is so foreign and hopeless in not conforming to the usual factoring that you just give up and put it in its own little padded cell (hello, Mathematica);
- Everything is generally configured using files in /etc, possibly with the exception of some of the special snowflakes in /opt; the package manager will put config files meant to be edited there and expect the admin to merge any changes in manually, and sometimes put default settings meant to be overridden by them in /usr/share (see below)—both approaches can be problematic, but the difficulty is with migrating configuration in general, not the FHS as such.
There used to be additional hierarchies like /usr/X11R6, and even a /usr/etc on some (non-Linux?) systems, but AFAIU everyone agrees their existence makes no sense (anymore?), so much that even FHS doesn’t lower itself to permitting them.
The distinction between / and /usr might appear to be pointless as well, and nowadays it might be (some distros symlink them together), but previously (especially before initial ramdisks were widespread) stuff in / was whatever was needed to bring up the system enough that it could netmount a shared /usr.
Inside each of /, /usr and /usr/local there is bin for things that are supposed to be directly executable, whether binary or a script and all in a single place; share and lib for other portable and non-portable (usually but not necessarily text and binary) shared files, respectively, segregated by application or purpose; finally, due to the dominance of C ABIs and APIs on Unices, the top level of lib also hosts C and C++ library files and there’s an additional directory called include for the headers required to use them. Some people also felt that putting auxiliary executables (things like cc1, the first pass of the C compiler) inside lib was awkward so they created libexec for that purpose, but I don’t think the distinction turned out to be particularly useful so not all distros maintain it.
That’s it, basically. There are subtler but logical points (files vs subdirectories in /etc) and things people haven’t found an obviously superior solution for (multilib and cross environments), and I made no attempt to be historically accurate (the original separation of / and /usr happened for intensely silly reasons), but those are the fundamental principles of the system, and I feel it does make sense as a coherent implementation of a particular design. Other designs are possible (separation by application or package not purpose, Plan 9-ish overlays, NixOS’s isolated environments), but that’s a discussion on a different level; the point is that this one is at the very least internally consistent.
Re the unfriendly names ... I honestly don’t know. Newbie-friendliness matters, but it’s not the only thing that does; particularly in a system intended for interactive text-mode use, concise names have a quality of their own. There’s a reason I’m more willing to reach for curl and jq rather than for httpx and lxml, for regular expressions rather than for Parsec, and even for cmd.exe, as miserable as it is, rather than for PowerShell.
I feel weird that no HCI people seem to have seriously considered the tension between interactive and programmatic environments and what the text-mode user’s experience in Unix says about it, but even Tcl, which is in many ways a Bourne shell done right, loses something in casual REPL use when it eliminates (as far as idiomatic libraries are concerned) short switches. Coming up with things like rsync -avz or objdump -Ctsr is not very pleasant initially, but I certainly wouldn’t want to type out the longhand form that would be the only possible one in most programming languages (even if I find their syntax beautiful, e.g. Smalltalk/Self).
Thank you for the thoughtful reply, the point about netmounting shared usr makes it much easier to understand.
>the original separation of / and /usr happened for intensely silly reasons
As I recall, there were very good reasons for separating / and /usr (as well as /home and /var). The biggest one was that various Unix kernels would panic[0] if / was full. But that issue was almost universally fixed by 1990 or so.
And netmounts of pretty much everything other than / were pretty common for many years, due to the high cost of storage.
So no, the reasons weren't silly, they just don't apply to more modern systems.
[0] https://en.wikipedia.org/wiki/Kernel_panic
OK, I didn’t put this completely correctly. The original separation of /usr to hold user home directories (!) and / to hold everything else was because the first RK05 disk ran out, but it makes sense in any case. The additional hierarchy under /usr was created some time later when space on the first RK05 disk ran out again, and while this can be a perfectly sensible decision for a single installation on a single site, taking it seriously decades later is silly. Neither does that mean that there weren’t good reasons the split got preserved in subsequent systems, just that they couldn’t have been the same as the original ones; there are no netmounts in V6, after all.
(I have an old Unix intro book that describes /usr as user home directories, the rest is a second-hand retelling[1].)
[1] http://lists.busybox.net/pipermail/busybox/2010-December/074...
Interesting stuff. Thanks for sharing it!
> So where do I install this random binary I downloaded from the internet or compiled myself?
In your home directory.
Where?
See my post to this thread at
https://news.ycombinator.com/item?id=29198222
Follow your distribution. For example Arch Linux provides PKGBUILDs for official repos and AUR. Most of the time someone has already published PKGBUILD, but if not I just patch accordingly.
And conditions that formed separation are long gone, Arch Linux symlinks most of it:
I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion. Even better, because there are so many similar places, you might choose completely different ones depending on the day of the week and your current mood.
Too much choice for things like this is harmful IMO. Deep down I truly couldn't care less where the files end up, as long as that place is the 'right' place. There are too many 'right' places which makes it hard to find random things at a later date or when on a box you're not super familiar with. It's also a complete waste of time to think about it at all.
It’s not just you: Every distro is its own special snowflake and patches the programs they distribute to store files in a different place.
The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
> The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
The FHS does actually answer both of those questions. Files inside /etc/ should be grouped in subdirectories[0] and the HTTP server should serve user-specified website files from /srv[1] and normal distro-provided files (such as the apache test page) from /var[2].
[0]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s07.htm...
[1]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s17.htm...
[2]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05.html#p...
"use subdirectories" is probably the most handwavey answer possible, aside from maybe "just put it somewhere, lol". I feel like the standard could provide some sort of guidance on how to name folders or something.
> HTTP server should serve user-specified website files from /srv
I’ve never seen that in my life, but I’m sure someone does that. This is one of those cases where the people who follow the standard are increasing fragmentation
> I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion.
So, given some data, say a file and/or directory, maybe from saving a Web page, that is relevant to subjects A, K, T, and Z, where in the file system directory trees to put that data?
My solution: Put the data in a directory for one of the subjects A, K, T, or Z without thinking very hard about which of these. Then go to a file I call FACTS.DAT (right, an old idea with an old 8.3 file name!). I maintain that file with a few, simple editor macros. So, sure, the file is a catch-all for entries of random short facts. And each entry starts with a time-date stamp and a list of key words. So, in the case of subjects A, K, T, or Z, include the key words appropriate for each of those. Then in the body of the entry, put the tree name of the file/directory where I did store the data.
In a few seconds with my favorite text editor I can append an entry or search for an entry.
So far this year I have put 686 entries in the file FACTS.DAT for about 2.1 entries per day. For anything like current personal computers, handling such a file is trivial.
The idea works great!
I feel like it just highlights how antiquated and confusing Linux terminology is: so many of those reference "single-user mode", used to refer to booting into root, when the vast majority of computing devices a given user will interact with only have a single actual user, making this a confusing and almost meaningless distinction to someone not already intimately familiar with Linux.
Oh, my young friend, you have no idea what POSIX has done to you.
"While no one sane would put newlines in directory names, such corruption of the results could lead to exploitable vulnerabilities in scripts."
http://www.etalabs.net/sh_tricks.html
He he.
Want to see true craziness? POSIX file names are just a bag of bytes. They don't even have to be text, they can be anything (almost), there's no standard text encoding:
https://lwn.net/Articles/325304/
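To make the "bag of bytes" point concrete, here is a minimal Python sketch. The file name is made up for illustration, and the helper names `to_text` and `to_bytes` are hypothetical. It uses the `surrogateescape` error handler, which is the same handler Python's own `os.fsdecode` relies on for POSIX paths, so a name that is not valid UTF-8 (and even contains a newline) survives a round trip through `str`.

```python
# Assumption: we model a file name as the raw bytes a POSIX kernel stores.
raw_name = b"report-\xff\n2021.txt"  # invalid UTF-8 plus an embedded newline


def to_text(name: bytes) -> str:
    """Decode to str, carrying undecodable bytes through as lone surrogates."""
    return name.decode("utf-8", errors="surrogateescape")


def to_bytes(name: str) -> bytes:
    """Reverse of to_text: restores the original bytes exactly."""
    return name.encode("utf-8", errors="surrogateescape")


text_name = to_text(raw_name)
round_tripped = to_bytes(text_name)

# A strict decode refuses the same name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Anything that prints or logs `text_name` still has to cope with the lone surrogate and the newline, which is exactly the class of script bug the quote above warns about.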
And in typical Open Source fashion, someone actually claims it's a feature: https://lwn.net/Articles/325398/ because hey, you 99.999% percenters can suffer so that I, 0.001% percenter can implement my wacky system.
https://xkcd.com/1172/
This appears to demonstrate the full range of abuse.
Just because you can do something does not mean that you should.
It's software. Software's contract is the same as a legal contract. And a legal contract mostly says what you can't do.
So anything not directly blocked by the software is allowed.
Ergo, clear specifications, strict yet flexible types and APIs, etc.
Otherwise, it's just bad design.
It's basically the same on Windows with NTFS. Just a bag of 16-bit words instead of bytes.
The directories that house your executables are read only to users other than root, to prevent attacks and overwriting them by non-root users.
/var stands for variable data--like log files, cache directories, spool directories, etc. You shouldn't put executables there. Ideally you should be able to set the noexec flag on it.
`/usr` actually exists because the original UNIX developers ran out of disk space and had to attach another disk. The difference between /bin and /usr/bin is not worth it and even Debian symlinks /bin to /usr/bin.
But your distribution's package manager should be putting stuff in /bin or /usr/bin, not you. Anything that follows the regex "*/local*" is something the system owner can do whatever with. So you should be using /usr/local/bin or $HOME/local/bin. I don't know why there's no /local off of the root. (One thing I do on my own systems is make and use an /etc/local although I think you're supposed to use something like /usr/local/etc).
/opt is for third party programs that aren't installed via your distro's package manager.
If you do this, any customizations you make to a system can be easily backed up by copying all dirs with local in the name.
There's multiple decades of tradition behind these names, but they do date back to the age where actual teletypes were used.
You don't even need to rearrange the folders themselves, just show them like that in the file explorer. Same way the windows explorer does.
Do you have any docs on how to do that? Thanks for the reply, I look forward to trying that.
MacOS too. /usr/ and /dev/ and whatnot exist, they're just flagged as invisible in Finder. There's a command to globally unhide them for those who want to see them.
Why not just symlink them? You can have best of both worlds with relatively little effort.
Make the overlay of your dreams!
Is it coincidence that you almost exactly replicated what macOS has? Except that /Devices is /Volumes, .../Apps is .../Applications, and /Boot is handled differently.
Of course, that's not perfect either, because a) decades of changes vs. compatibility have made it less clean in certain places, and b) pretty much all the POSIX paths still exist for unix-y compatibility, but overall it's like that.
Couldn't you do it with plain old symlinks?
> I wonder how hard it would be to rearrange the folder structure in linux
Restructuring the directories is the easy part. You just delete the old tree and make a new one. You can also mount procfs and sysfs wherever you want.
The hard part is modifying existing software to work with the new tree. So many programs assume you have a "standard" file system tree. So many programs assume procfs is mounted at /proc. So many programs have hardcoded paths. Shared library locations can become part of the binaries when they're compiled. It's insane and you'd essentially be creating a new Linux distribution.
I know this is completely tangential. But you can Win-R and just type Documents and it will load your documents folder. Same for downloads, pictures, temp (windows temp), and I'm sure many others.
Works from File-Open dialogs and address bars and even in the command prompt you can even do "explorer documents".
Yeah, it's a junction point, but it's also useless. Open a command box and CD to it; now what? A file explorer and set it as the directory, again, now what?
Nothing says progress like renaming all your paths.
In the Win95 era, it was "C:\My Documents".
Huh, spaces. There's way too much software, especially on Windows, that breaks when there are Cyrillic characters in a path. I'll let you guess how I found out.
A friend had the username "Rubén" and jfc it broke everything other than windows itself xD
The problem isn't the Cyrillic or the é but the fact that Windows lets you put those characters in file names in non-Unicode encodings which will create sequences of bytes which are invalid UTF-8. It's 2021, FFS, stop using legacy encodings.
All win32 functions that accept or return strings come in two varieties, with A and W suffixes, MessageBoxA/MessageBoxW. The A works with the system default 8-bit encoding (cp1251 in case of Cyrillic), the W works with unicode in wide chars. There shouldn't be much of a problem with string handling if you stick exclusively with W functions.
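A rough Python sketch of why the 8-bit "A" world bites. The code pages and sample strings are picked for illustration and the helper names are hypothetical. It only shows the encoding arithmetic, not the Win32 calls themselves: a Cyrillic code page cannot represent "é", a Western one cannot represent Cyrillic, and bytes written in a legacy code page are generally not valid UTF-8.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of `text` exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


cyrillic = "Документы"
accented = "Rubén"

# What a program using the 8-bit encoding on a cp1251 system would hold.
legacy_bytes = cyrillic.encode("cp1251")

results = {
    "é in cp1251": encodable(accented, "cp1251"),
    "Cyrillic in cp1252": encodable(cyrillic, "cp1252"),
    "é in cp1252": encodable(accented, "cp1252"),
    "cp1251 bytes are UTF-8": valid_utf8(legacy_bytes),
}
```

Both strings encode without trouble as UTF-16, which is the practical argument for sticking to the W functions.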
Using the W functions has been the advice from Microsoft's documentation for ages. But people still use the A functions because they're easier, especially when writing cross-platform software since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide.
Fortunately the future of the Windows API does look better since Microsoft has now added proper UTF-8 support since Win 10 1903. All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.
I would rather they added a U suffixed version and better still backported that all the way to Win 7. Now in 3-7 years people can write programs that use the A functions, but have to check the version of Windows and refuse to run if it isn't new enough.
There’s been some talk of repurposing the A variants to work on UTF-8
> since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide
Apple OSes use something they call "unichar" inside NSStrings. I'm not 100% sure what it is, but it feels like it's the same 16-bit wide character.
It's possible! It seemed like a sensible choice back in the early 90s when the answer to making a system for global use was UCS-2. I know Java was another one that went with that decision.
> All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.
They really should have gone with WTF-8 [0] since the W functions generally accept WTF-16 and not just the valid UTF-16 subset.
[0] https://simonsapin.github.io/wtf-8/
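For a feel of what the lone-surrogate problem looks like, here is a small Python sketch. It is not a WTF-8 implementation; it uses Python's `surrogatepass` error handler, which happens to produce the same three-byte form for an unpaired surrogate that the WTF-8 spec describes. The sample string is an assumption chosen for illustration.

```python
# An unpaired high surrogate: legal in a sequence of 16-bit units,
# but not a valid Unicode scalar value.
lone = "name-\ud800"

# Strict UTF-8 has no encoding for it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Python's "surrogatepass" handler encodes it anyway, as generalized UTF-8.
generalized = lone.encode("utf-8", errors="surrogatepass")
restored = generalized.decode("utf-8", errors="surrogatepass")
```

The point is only that a strict UTF-8 API cannot represent every name the 16-bit API will accept, so something has to give at the boundary.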
I had a really odd one last year where a Grave I (well-known brand name) got converted by Office/Excel into a Double Grave I.
The double grave I is used by some obscure orthodox religious texts
If you have a username with your full name (plus point if you have special characters in your name), you will get the whole deal with shitty programs. I’m not sure if it’s me, but there were cases I simply could not use a program installed in such a location, to the point where at my previous (admittedly shitty) workplace, we often installed software in a root location…
Laughs in C:\PROGRA~1\ (try it, still works in Windows 10)
There is no guarantee that the short name has that. In fact on a lot of German Windows installations it was PROGRA~2.
Well, on my disk PROGRA~1 is "Program Files" and PROGRA~2 is "Program Files (x86)", so still works :)
That order is not guaranteed consistent across installations, however.
I wonder if code to this effect has ever been written before
And that, children, is when marginalia_nu unlocked the seventh circle of the inferno. Tomorrow we'll read the story of how our new demon overlords forced us all back to Windows 3.1.
Win 3.1? on DOS 6.22? Actually, this sounds like heaven. Just don't put it on the public 'tubes.
Or do. Can’t hack a Mac Classic web server!
Got to tweak HIMEM.SYS before the slumbering one can be awakened.
PEEK and POKE could break the HIMEM.
For whatever it’s worth, this is a terrible idea, for so many different reasons:
https://web.archive.org/web/20100107184218/http://blogs.msdn...
And so, yes, I’m certain someone must have done it, because it’s clearly bad idea jeans and so Murphy’s Law says it must exist.
Can that work for i > 9 ?
If you mkdir PROGRA~10, yes!
And on mine (Windows 11) "PROGRA~3" is "ProgramData"
Truly lifesaving for when she'll quoting gets in the way.
You've got a stray single quote in your shell. :)
That was a typo, but it seemed like a perfect illustration of my point, so I left it in.
Typo? I would guess it’s autocomplete at work. iOS does that all the time for me.
Apart from what others mentioned, that can only work if the file system automatically creates 8.3 names. NTFS does not necessarily do that (https://docs.microsoft.com/en-us/windows-server/administrati...)
I wonder how much global work could have been saved if Microsoft also provided a covered interface for all paths in the system. Not sure if there is any, but one good implementation might save thousands of poor implementations required to handle it.
You mean like the Environment.SpecialFolder enum?
https://docs.microsoft.com/en-us/dotnet/api/system.environme...
There are several other classes that take care of getting folders, least of which checking system variables.
You have %Appdata% and friends.
Then they made poor APIs so that you have to do this to get it correct:
https://docs.microsoft.com/en-gb/archive/blogs/twistylittlep...
In *nix at least you can call execve or other APIs that take a char *argv[] and the whole problem is largely solved and you don't need to quote things.
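The same idea is reachable from Python: pass `subprocess.run` a list instead of a command string and no shell ever re-parses the arguments. A minimal sketch, where the child one-liner and the helper name `child_argv` are just for illustration:

```python
import json
import subprocess
import sys

# Child program: report the arguments it actually received, as JSON.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def child_argv(args):
    """Run a child Python process with `args` and return what it saw."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD, *args],  # a list, so no shell is involved
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


tricky = ["my file.txt", "it's", 'say "hi"', "$HOME", "a;b"]
received = child_argv(tricky)
```

Each element arrives as exactly one argument, spaces, quotes and all, without any manual quoting.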
On the other hand their case sensitivity behaviour means that “cross-platform” Java applications can break if they are run on a non-windows platform where opening files is case sensitive (unlike on windows)
It's actually a feature.
Easier to add a flag to ignore case rather than fix bugs where files only differ by case and are therefore overwritten on a case-insensitive filesystem.
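A quick way to find the "files only differ by case" situation before it bites is to group names by a case-folded key. This is a sketch only: `str.casefold` is an approximation, real filesystems use their own folding tables, and the function name is hypothetical.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that would clash on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


clashes = case_collisions(["Makefile", "makefile", "README.md", "src"])
```

Run over a repository's file list in CI, something like this flags the problem on Linux before a Windows or macOS checkout silently overwrites one of the files.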
C:\PROGRA~1
Easy fix!
And then to really mess you up and ensure you handle parens properly, threw “(x86)” into the mix. (A real pain on some REPLs as well as dealing with environment variables).
Funny, in the Italian Win9x it is C:\Programmi, which I always thought was more convenient because of the lack of spaces :)
Sure. Microsoft only ever ships features
Shame it wasn't
> C:\P̷̧̽r̸̬͘ŏ̵̮g̷̜͘r̸̦̋a̴͎̒m̶̲̈́ ̷̠̉F̵͇̈ĩ̴̫l̶̨͗ë̵̦s̸͚͆\
Except for programs that were too old / obscure to fix I guess. I think at least the Symbian Development Kit was such that builds would fail with strange errors unless you installed it in any other path than the default immediate subdirectory of C:\, let alone under "Program Files".
It not only keeps people on their toes due to the whitespace. The folder name is even localized. E.g. with german settings there is C:\Programme and c:\Programme (x86).
You can still use the English names, though.
I know that at least like, idk like 3-5 years ago, when I had gotten a new windows laptop (windows 7 or 8 I think), setting the main account to have the name "" (without the quotes), caused some problems with the basic functioning, including, I think, with some pre-installed programs,
So, some things were still being handled not quite right (whether that's because it shouldn't be allowed to be the username, or because programs should handle it being in the path, I'm not sure, but probably one of those.)
I just wish they had a decent way to execute programs with arguments that might include spaces. But no, every program can do argument delineation differently.
And Microsoft even provides three different slightly incompatible ways to parse arguments: CommandLineToArgvW, the CRT and cmd.exe.
Could you please link the reference?
At one time there was no number 0. Half of binary was missing.
There was a short path name IIRC like prog~1
Pro tip2: Use std lib path processing utilities
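For example, in Python that means `pathlib` rather than string splitting. A minimal sketch with made-up paths; the "pure" path classes are used so it behaves the same on any OS:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Joining with "/" never needs quoting or escaping, spaces or not.
base = PurePosixPath("/home/me/house plans")
config = base / "this is my config.txt"

# Windows-style path with spaces and parentheses, parsed on any platform.
exe = PureWindowsPath(r"C:\Program Files (x86)\App\app.exe")

summary = {
    "full": str(config),
    "stem": config.stem,
    "suffix": config.suffix,
    "parent": config.parent.name,
    "exe_name": exe.name,
    "exe_dir": exe.parent.name,
    "exe_drive": exe.drive,
}
```

No component is ever split on whitespace, because nothing here treats the path as a flat string.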
It doesn't even have to be complex, often basic automation tasks fail with spaces and special characters. Honestly, treating a file system like a natural language processor is a bad idea. Besides at this point with how digital we have all become who can't understand...
thisismyconfig.txt vs this is my config.txt or this_is_my_config.txt
...i've forced myself to stop using spaces, character, and even cap. They are all constructs that provide minimal value for the extra complexity.
> thisismyconfig.txt vs this is my config.txt or this_is_my_config.txt
Just wondering, what is the readability of this for people who are dyslexic?
Or in my case, people for whom English is a second language, or have low education levels.
Saying, "who can't understand..." is arrogant, selfish, and an example of why normal people hate people in the SV echo chamber.
> Saying, "who can't understand..." is arrogant, selfish, and an example of why normal people hate people in the SV echo chamber
Exactly how I feel every time Economics is brought up on HN.
SV echo chamber is on your side here - it is very in vogue to denounce anglocentrism. they were defending hieroglyphs and emoji in variable names in that thread about invisible javascript backdoor a day or two ago if you'd like a recent example
Could you please stop posting ideological battle comments to HN? We ban accounts that do that, regardless of their ideology, because it's (a) not what this site is for, and (b) destroys what it is for.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
Agreed.
But Hacker News should do something about all of the anti-bitcoin and anti-anti-nuclear ideologies running around in here.
I don't really mind it that much but it'd be nice, it's really the only 2 extremisms I've experienced here, all other subjects are discussed in a fair manner.
I appreciate informed discussion about bitcoin & nuclear, as both topics are highly relevant to the technical, business, and hacker roots of HN. They seem distinctly different from, say, "anglocentrism" @dang was calling out.
> discussion
There's no such thing as fair discussion about those topics here.
What does "fair" mean in this usage? If it means one position attracts a lopsided balance of comments either for or against then surely that's always going to be the case?
Otherwise what is your proposition, don't state any opinion unless you find a counter opinion commenter to match with?
Lots of folk here are pro-privacy and lots of folk are anti-bitcoin (and some of them will be the latter because they're the former) so I don't understand how you'd extend your position in a way that leaves HN with any value.
cestmaconfig.txt vs cest ma config.txt vs cest_ma_config.txt
It's the same in any language.
Hugs who hurt you.
I'm also pretty sure most of us in any language use Slack, SMS or other forms of communication where text isn't necessarily presented in a grammatical correct manner and we all figure out what the person is saying.
I'm not sure, but my gut instinct is that it wouldn't help. Dyslexia rates are much lower in China, so I suppose we could start naming files with Chinese characters (on systems that support Unicode). It would take a bit to get used to, but eventually we'd develop a pidgin language for when we talk about software, much like how if you overhear Chinese or Vietnamese developers they will mix in English words like "linked list" into their sentences, because there's not a more natural sounding alternative.
Switching to Chinese would also help eliminate the spaces issue.
tbh I'm not dyslexic and realized the spaces make it really difficult to know what the filename actually is. If you just take the second example, how would you know if the file was "this is my config.txt" versus "config.txt"?
Aside from parsing errors it just seems to lend itself to ambiguity.
This. People are saying spaces improve ergonomics. Unless everyone always quotes their paths in documentation, emails, etc -- which they won't -- I say it actually reduces readability.
Also, programs that automatically turn paths into links don't work with spaces.
> treating a file system like a natural language processor is a bad idea
could you please explain what you mean by that?
I'm similar, but I would like to support labels intended for humans, along with various translations, as metadata on top of e.g. filesystem path components.
You nailed it - getting rid of spaces and dashes and underscores is extremely human-hostile. People added spaces to the English language for a reason, and that's because they make it way easier to read.
Your system is only intended for other programs to interact with? Go nuts, make hex UUIDs. Actual people are supposed to use it? You need separator characters.
I also don't see how those characters add "extra complexity" unless you're doing dumb things like text processing on paths and filenames (as opposed to using OS/library functions that handle paths correctly) - in which case, there's your problem.
Why stop there. A computer works more efficiently with numbers rather than strings, so let’s just give each file a number instead of a string. Besides, at this point with how digital we have all become who can’t understand… But wait, that already exists and is called an inode.
A file system has a human interface and a computer interface. Don’t mix them. Let users give file names in whichever way they please.
> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
A former co-worker changed his name in our auth system to include an apostrophe, so that whenever we handled names wrong he'd find it.
Oh, I like this!
I set my nickname to U+FFFD at one point in one work system, resulting in a variety of bug reports and concerned emails. I think I dropped it since it was generating false reports from people who didn't check what character the page contained before reporting it.
Áčçëñts hęlp tóø
For anyone curious, this is called Pseudo-localization (https://en.wikipedia.org/wiki/Pseudolocalization). I first stumbled across this in Raymond Chen's blog.
One of the systems I built is being used by a group of younger people. I included an emoji in the superuser account name, just to make sure it would work. And to remind me to think more broadly about user input.
A related tip for CI: change the system time to be a time zone that is during your work hours in a different day already than UTC. Really helped getting failures earlier than 4pm PST.
At my last job we had a wild time-zone bug that only happened with your system location set to Mumbai. I left mine set to that for the rest of my time there.
Related: here's a recent Firefox bug about a test that failed during the daylight saving time change:
https://bugzilla.mozilla.org/show_bug.cgi?id=1739847
Could you consider rephrasing this? It sounds like an interesting observation that I'd love to understand, but I'm genuinely not able to parse it.
My best guess is "change the system time to be a timezone for which, during your work hours, the other-timezone is in a different day than UTC is" - but I'm still not sure what effect that would have on CI failures.
I read that as "set your CI to run earlier in your workday so you don't get new error reports at the end of the day." Midnight UTC being 4 pm/16:00 PST.
Maybe an example of the failure this detects helps: when I used to work on Rails apps in the olden days it was easy to call Time.now and get the local time instead of Time.zone.now to get UTC time. This often led to wrong dates but tests would only fail once it was a new day in UTC land but still the old day in the local time zone. Making the CI machine's system time something like Fiji time really helped in getting failures much sooner after changes were pushed.
To have such thoughtful coworkers. On an old team I had two coworkers named Chris and once in a blue moon when they reviewed each other's code master would start crashing because one of them accidentally left in an absolute path starting with "/home/chris/".
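A small Python sketch of that effect, under simplifying assumptions: fixed offsets of UTC-8 for PST and UTC+12 for Fiji, no daylight saving time, and an arbitrary example date. The helper names are hypothetical. It counts how many hours per day the local calendar date disagrees with the UTC date, and checks one mid-morning instant.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")
FIJI = timezone(timedelta(hours=12), "FJT")  # assumption: fixed +12, DST ignored


def dates_disagree(instant, local_tz):
    """True if the local calendar date differs from the UTC date at `instant`."""
    local_date = instant.astimezone(local_tz).date()
    utc_date = instant.astimezone(timezone.utc).date()
    return local_date != utc_date


def hours_of_disagreement(local_tz):
    """Count the UTC hours in one day during which the two dates differ."""
    start = datetime(2021, 11, 15, tzinfo=timezone.utc)
    return sum(
        dates_disagree(start + timedelta(hours=h), local_tz) for h in range(24)
    )


# 10:00 on a PST workday: a PST machine sees no mismatch, a Fiji one does.
mid_morning = datetime(2021, 11, 15, 10, 0, tzinfo=PST)
pst_sees_bug = dates_disagree(mid_morning, PST)
fiji_sees_bug = dates_disagree(mid_morning, FIJI)
```

With the CI clock on PST the mismatch only exists from 16:00 local time onward, so a local-date-vs-UTC-date bug stays hidden for most of the workday; on Fiji time the same instant already lands on a different day.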
the proper name of the glorious sultan of slack, j. r. "bob" dobbs, has the quotation marks and therefore is a great subject for this
my test accounts always have emojis + accents + other weird characters.
it keeps everybody on their toes lol.
I add a Japanese character into any .py, .js and .html file to ensure that Unicode is working properly through the entire chain. Mostly in form of a variable which gets passed along, even in URL parameters.
Obligatory xkcd https://xkcd.com/327/
I used to have a space in my user name and even contemplated adding a bit of non-1252 Unicode. You find a lot of issues, but unfortunately often in tools you have little control over and end up not being able to work effectively at times. It ended up being more frustrating than helpful.
My Mac is formatted case sensitive when the default is case insensitive. This will also catch a ton of import related bugs.
League of legends doesn’t run until I sed files for instance.
I once returned a printer because the Mac driver and support software expected and enforced case insensitive access and basically couldn't install properly on my case-sensitive HFS+ volume. It half installed and blatantly just didn't work in any way when installed.
Adobe software used to refuse to install on case sensitive file systems back in the not too distant past.
Circa Y2k, I learned that the OSX Palm Pilot software didn't work with case sensitive. I've since given up and stuck with the default. (I'm anti-case folding in general, because of the ambiguity.)
I also enjoyed doing that, but had to make a DMG just for Steam because it straight-up refuses to run on a case sensitive FS (that's true on Windows, also, which I suspect is how we all got here). I think the most recent Steam versions either caught wind of my trickery or -- more likely -- run something from $HOME/Library/SomethingOrOther and thus the work-around no longer works
When I got a new Mac, I just gave up and acquiesced to the case-retentive world :-(
I have coworkers on Mac that write node/JS code. Every once in awhile I'd pull down the latest code and it wouldn't run. I'm on Linux.
Sure enough, they had SomeFile and were importing Somefile and it works fine on Mac but not on Linux (which, of course, is what our production servers use). It amazes me that "works fine on my machine" is still a thing when I definitely worked at companies that solved this back in the 2000s. It was solved. It was done. Then devs became enamored with running everything locally. Even dozens of microservices or databases. Even though JS is fairly isolated, you still have NPM packages that need built against the local OS and C/C++ library and compilers, etc. Which also has caused issues in the past.
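One cheap guard for the SomeFile vs Somefile case is to check the spelling against the directory listing rather than asking the filesystem whether the path exists. A Python sketch of the idea (the helper name is hypothetical, and a real lint would walk the import graph):

```python
import os
import tempfile


def exists_with_exact_case(directory, name):
    """True only if `name` appears in `directory` with exactly this spelling."""
    # os.listdir returns names as stored on disk, so a plain membership test
    # is case-exact even when the filesystem itself is case-insensitive.
    return name in os.listdir(directory)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "SomeFile.js"), "w").close()
    right_case = exists_with_exact_case(tmp, "SomeFile.js")
    wrong_case = exists_with_exact_case(tmp, "Somefile.js")
```

On a case-insensitive Mac volume `os.path.exists` would happily accept the wrong spelling, while this check gives the same answer as the Linux production box.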
Good news, we have solutions. You could use continuous integration and software containers like Docker.
Does Docker abstract filesystem behaviors like this? I always thought that it stopped at the libc level - that is, libc is included in the container, but it calls the host kernel's system calls, and so inherits the host kernel's behavior (including things like underlying filesystem case sensitivity).
Docker relies on LXC, so it's Linux-only. On other platforms it runs in a Linux VM. The host for Docker, then, is Linux no matter where you are.
> Docker relies on LXC, so it's Linux-only.
Docker hasn't supported LXC since 2016, and stopped relying on it in 2014
https://www.docker.com/blog/docker-0-9-introducing-execution...
I thought the name for the collection of kernel features was LXC, I didn't realize (until just now) that was the name only for the also-kernel-level wrapper for those features, which name does not cover the features themselves. That is, I didn't realize that LXC is to Cgroups+Namespaces as Libvirt is to KVM—I thought LXC, as a label, covered the whole feature-set—but regardless, it's still married to Linux kernel features and runs on other platforms under virtualization, no?
> it's still married to Linux kernel features and runs on other platforms under virtualization, no?
Actually no. At least on Windows Docker can do native Windows containers too
https://poweruser.blog/lightweight-windows-containers-using-...
my favorite is often being the only developer on linux and giving two files with different casing and watching their systems crash and burn.
Better solution: only allow ASCII, maybe dashes, and up to twelve characters. Problem solved.
Enforce this in LDAP.
Strict convention is better than flexibility and predicting obscure edge cases that can fail.
In my case, and for many people writing desktop software, and for absolutely everybody writing open-source tools or libraries, unfortunately you can't control the environment.
Non-ASCII paths are extremely common (e.g. the user's home directory on Windows, for the large majority of users outside the English-speaking world) and spaces, punctuation and weirder characters will definitely happen when you least expect it.
Yes if you can avoid it then absolutely that's great, but I don't think most people can.
It's also not usually very difficult to deal with, as long as you actually spot the issue in the first place.
Ah, that's the enterprise edition.
But then your program will crash hard and unexpectedly when a user decides to save under "~/house plans" or ~/Téléchargements.
I think it's better to exercise this in CI, that's what CI is for.
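If renaming the CI workspace isn't an option, a test helper can get most of the way there by running file-handling tests inside a deliberately awkward directory. A minimal Python sketch; the prefix and the helper name are made up, and the characters were picked to be legal on the common desktop filesystems:

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: a space, parentheses, an accent, a quote and a dollar sign
# cover the usual suspects; extend to taste.
AWKWARD_PREFIX = "house plans (x86) é'$ "


def make_awkward_workspace():
    """Create a temp directory whose name is hostile to sloppy path handling."""
    return Path(tempfile.mkdtemp(prefix=AWKWARD_PREFIX))


workspace = make_awkward_workspace()
try:
    target = workspace / "this is my config.txt"
    target.write_text("ok", encoding="utf-8")
    round_trip = target.read_text(encoding="utf-8")
finally:
    shutil.rmtree(workspace)
```

Any code under test that shells out or splits paths by hand then fails in the test run instead of on a user's machine.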
Ugh, we have the 15 character Active Directory limit now with hostnames, and a previous IT administration has imposed a convention that every name had to follow [prod|dev]-[ph|vm]-[service]-[nn]. So basically every production service is prod-vm-owtf-01— you get exactly four characters to actually describe what the machine does. Works great when the service is "jira" or "wiki", but there are a lot that are pretty mystical-sounding, like jkns, jwrk, cntr, hrbr, etc, where you kind of just have to know.
I kind of like that honestly. No doubt you need some documentation so everyone knows what the service abbreviations are, but after you've been working there for a month you get it. Makes everything clean, consistent, and informational. You can quickly ascertain what a specific host is doing just from the name.
Oh absolutely it makes sense to have a standard, and being able to tell at a glance if something is a VM or physical machine is of value also. But dedicating 2/3s of the character budget to such a scheme is madness. If the prod-vm- prefix simply became pv-, then you'd at least be able to do pv-jenkins-01 again.
Anyway, all this was fine when we were on LDAP rather than Active Directory. So basically it's all Windows' fault.
Do they at least allow you to set up CNAMEs?
Yes, and for many of the web-serving machines, that's what happens, they're jenkins.example.com or containers.example.com or similar. But often a singular service is backed by hidden worker nodes, databases, whatever else, and it seems silly to give those machines that level of indirection vs just using the hostname as their sole identifier.
> only allow ASCII, maybe dashes, and up to twelve characters. Problem solved
...and only hire people from the exact same background as you, who will never have unusual characters or accents in their name. And also make sure not to have any users who aren't exactly like you, and conform to this very narrow requirement. Surely, excluding 90% of the world won't hurt revenue in any way.
Snarky, but I'll take it.
Use strict schema for the hardware interface, networking, physical stuff the user never sees. Microservice names don't need to be non-Latin. Database replicas, infrastructures, etc. And you're not going to piss off employees by giving them ASCII ldap/email addresses.
Use utf8mb4 or similar for storing names. Don't state "first" or "last". I've been through this rodeo too many times. You're not surprising anyone.
UTF-8 strings aren’t reproducible anyways. User ID should be strictly for identification, be alphanumeric random string if necessary.
This is not excluding? I just use an ascii canonicalized version of my name and works fine.
You can use an "ASCII-fied" version of the name, only ~27% of mine can be typed in ASCII letters that look similar but the rest is just phonetically or visually close-enough letters. This is something people did for decades and nowadays even government IDs have an ASCII-fied (well, Latin-fied) version of the name.
I maintain a similar system, where a variety of companies submit files that get processed through multiple services - it is astounding how ridiculous people’s naming of files can be; spaces are the least concerning!
> it's been astounding the number of bugs that have appeared over the years triggered by spaces and other unusual characters in file names
If you consider spaces “unusual” I would say you haven’t encountered a single average user in your lifetime. Spaces in file-names is the single most common thing people have, outside programming environments.
As an x-plat developer, the only platforms where I (still) regularly encounter these kinds of bugs are platforms where solving problems through scripting is common, like Linux, where the primary means of operation is through stringly-typed statements getting parsed and processed in an untyped fashion. It's not very reliable.
On Windows people more often use “real APIs” (because scripting doesn't really work as well), but then these problems just go away.
Pros and cons, I guess.
It's especially funny that it affects Linux so much. Most file systems allow everything except `/` and NULL in file names. Early AT&T UNIX even allowed NULLs! POSIX shells use the IFS variable to perform field splitting, and it defaults to <space>, <tab>, and <newline>. The choice to perform field splitting by default (particularly with spaces in the default IFS set) has caused no end of headaches for developers and users.
> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
This will also break any code in external tools that are called during the builds of your application and do not handle spaces correctly for whatever reason, thus making it so that you won't be able to successfully finish the build.
Then again, you probably shouldn't be relying on technologies like that, but when you're struggling to keep an old enterprise system alive, causing yourself more problems is not necessarily what you should do.
Still a good idea in most cases, though.
And yet OneDrive won't allow for spaces before or after a file name.
I spent hours trying to figure out why an entire folder suddenly stopped syncing. Turns out I accidentally added a hidden space to the end of a folder name.
Yup, their UI sucks when it comes to sync errors.
> other unusual characters in file names
Saw a few hacks where malware authors used the RTL feature (which is baked into Windows) to obfuscate file extensions. It looked like .exe.innocuous-document.docx, but was actually .docx.innocuous-document.exe
This exact vulnerability in most modern code editors just made the rounds, allowing smuggling malicious code right through review.
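For anyone who wants to see the RTL trick in code, here is a rough Python sketch. It assumes the attacker uses U+202E (RIGHT-TO-LEFT OVERRIDE); the file name and the `has_bidi_control` helper are made up for illustration and are not taken from any real malware sample or scanner.

```python
import os
import unicodedata

# Bidi classes of the explicit directional formatting characters
# (embeddings, overrides, isolates and their terminators).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def has_bidi_control(name: str) -> bool:
    """Return True if the name contains an explicit bidi formatting character."""
    return any(unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES for ch in name)


# Hypothetical malicious name: a bidi-aware UI displays the characters after
# U+202E in reverse order, so the name appears to end in "exe.docx".
name = "innocuous-document" + "\u202E" + "xcod.exe"

# The extension the OS actually acts on comes from the stored order.
real_ext = os.path.splitext(name)[1]

print(real_ext)                         # .exe
print(has_bidi_control(name))           # True
print(has_bidi_control("report.docx"))  # False
```

Rejecting or escaping such characters before showing a name to a user is one possible mitigation; whether that is acceptable depends on whether you need to support legitimate right-to-left file names.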
My favorite filename special character bug was when I implemented CD ripping in 2005, and one of our beta testers ripped a CD with a song called "Have You Ever?". My code wasn't prepared to filter out the question mark on Windows.
I just hit the one where an album folder ends in a period. Rsync copies every time because the period is dropped by the filesystem silently. :-/
Let's not forget carriage returns in filenames within apps...
> Pro tip: rename your development directory
I changed my username to not contain a space because it was too annoying to deal with all the random dev tools breaking. The worst offender was probably npx on Windows [1] (resolved after four years by deprecating npx), but it was far from the only one (though the JS ecosystem was somehow the worst in this regard of all languages I worked with).
1: https://github.com/zkat/npx/issues/100
Same, even I had to rename my user folder to not have a space because so many tools were breaking.
Or not, which when bugs crop up will teach the businessy types to stop putting spaces in their filenames.
The beatings will continue until morale improves?
Spaces are very useful for readability.
depends entirely what you're using to browse files
And add an emoji, a character in a right to left language ( א) and perhaps 太. Maybe italicize one of those too...
For those purposes I've found hyphen to be a nice substitute.
I don't know if it's still a problem, but it used to break Python virtualenv badly. If your working directory had a space anywhere in the path, it would throw a huge fit and not work. Which is problematic when the expected name for a Mac's boot drive is "Macintosh HD" (if you ever had a reason to run a virtualenv outside of your home directory).
Even capitalization is a pain in the ass thanks to how OSes treat file names. I pretty much stick with either `file-name.ext` or `file_name.ext` exclusively now.
Someone should provide the OneDrive/SharePoint people some of this religion.
Mysterious character requirements that do not conform with Microsoft’s OS limits, limits on the fully qualified pathname length, etc.
Somewhat related to injecting unusual characters, in my experience in localization efforts:
Inject a Turkish 'I'. I don't know how to type or paste it here, but picture an English lower case 'i' that is upper case. It is a splendid way among many to shake out some loc bugs.
İ
From https://en.wikipedia.org/wiki/%C4%B0
That would only shake out anything if you'd also test in a Turkish locale, wouldn't it? Since Unicode casing rules are locale-dependent and en-US doesn't care much about dotless i or dotted i.
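A small Python illustration of both points, as a sketch rather than a loc test suite. Python's `str.lower()` and `str.upper()` ignore the locale and apply Unicode's default case mappings, so the Turkish-specific I/ı pairing never shows up, but the dotted capital İ still breaks the "lowercasing keeps the length" and "upper(lower(x)) == x" assumptions that a lot of code quietly makes.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı

# Default (locale-independent) lowercasing turns İ into "i" followed by
# U+0307 COMBINING DOT ABOVE, so one code point becomes two.
lowered = DOTTED_CAPITAL_I.lower()
round_trip = lowered.upper()

print(len(DOTTED_CAPITAL_I), len(lowered))  # 1 2
print(round_trip == DOTTED_CAPITAL_I)       # False
print(DOTLESS_SMALL_I.upper() == "I")       # True
print("I".lower() == DOTLESS_SMALL_I)       # False: a Turkish locale would expect ı here
```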
Late '90s I worked on Java software that got installed on several Unix platforms, including Linux for IBM mainframes. When you deal with the default en/de-coding of Unicode to EBCDIC you never have trouble with Java byte encodings ever again.
> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
The problem with that is that YOUR code may handle it, but your tooling may not. If my code formatter breaks on spaces, I'm not going to change the formatter.
You could submit a PR to their repo.
I could submit a PR to 5 tools a week on average. I actually have the time and resources to do it once a year.
Last week I opened a ticket for a Firefox bug. Following up on the bug took me 2 hours in total.
FOSS is not free, you pay it with your time. And as with everything you pay for, we all have a budget.
It's easy to tell users to make a folder with no spaces if you're setting up a global path, however if you have an application that runs in user directories things can become painful fast. Changing your user name is a pain and can leave things inconsistent, but having to handle all the variations in people's names with spaces, punctuation, international characters, can just be mind boggling.
Spaces are a pain in the ass when you're using CLI so I'd rather enforce a no space policy
Most shells will behave just fine if you put a quote (single or double) before anything that has a space.
A small extra step but something you get used to if you spend a lot of time in the cli.
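If the command line is being built by a program rather than typed, the same quoting can be done mechanically. A minimal Python sketch using the standard `shlex` module (POSIX shell rules only; the file name is just an example):

```python
import shlex

filename = "Dropbox (Personal)/my notes.txt"

# Naive whitespace splitting tears the name apart.
naive = f"cat {filename}".split()
print(len(naive))  # 4

# shlex.quote wraps the name so a POSIX shell sees it as one word.
quoted = shlex.quote(filename)
command_line = f"cat {quoted}"
print(command_line)  # cat 'Dropbox (Personal)/my notes.txt'

# shlex.split applies shell-like parsing and recovers the original name.
print(shlex.split(command_line) == ["cat", filename])  # True
```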
Escaping spaces is a pain. I have to do it every day.
I set up symlinks which help navigating around but then the relative paths are wrong for git.
No thanks.
Friends don't let friends put spaces in paths
See the recent article about unicode invisible glyphs in JavaScript or bash.
Naming freedom needs a stdlib module
In that case, be thorough and insert a Chinese and an Arabic character to enforce a Unicode check.
> anything with subprocesses
I'm begging software developers to stop using subprocess APIs that take a string argument (system(), child_process.exec(), Process.Start(string)) and start using subprocess APIs that take an array of arguments (execvp(), child_process.execFile(), Process.Start(string, IEnumerable<string>).)
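The Python equivalent of the array-style APIs listed above is `subprocess.run` with a list and without `shell=True`. A minimal sketch, where the child process is just a Python one-liner that echoes back the arguments it received and the file name is made up:

```python
import subprocess
import sys

# Hypothetical file name containing spaces and shell metacharacters.
filename = "Have You Ever?; echo pwned.txt"

# Child program that only reports the arguments it was given.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# Array form: every list element arrives as exactly one argument and
# no shell gets a chance to interpret the space, the "?" or the ";".
result = subprocess.run(child + [filename], capture_output=True, text=True, check=True)
received = result.stdout.strip()

print(received)  # ['Have You Ever?; echo pwned.txt']
```

With the string form (`shell=True` and an interpolated command line), the same name would need correct quoting at every call site, which is exactly where the bugs come from.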
Sometimes / works as a path separator in Windows, sometimes it doesn't. It's not predictable.
I never use / on Windows as a result.
The only common place where it doesn't work is in CMD for executing programs and as arguments for built-in commands. Everything else goes directly to the relevant APIs which don't care about / or \.
These days using CMD instead of PowerShell should be rare enough and PowerShell certainly doesn't mind the slashes.
Today I learned that you cannot install Tailscale on Windows if the installer is inside a path with non-Latin chars.
More importantly than your source files, put your testing data on such a path as well. Nobody uses absolute paths in testing so it doesn't matter how many spaces your absolute path has if your input is "./tests/file1". Put those files in a folder with spaces too and throw in a unicode character for good measure.
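One way to get this for free in tests is to generate the fixture directory with an awkward name. A hedged Python sketch, assuming the filesystem and locale can represent these characters; the directory and file names are arbitrary examples:

```python
import tempfile
from pathlib import Path

# The temp dir itself gets a prefix with spaces, parentheses and non-ASCII
# characters, so even relative test paths end up under an awkward parent.
with tempfile.TemporaryDirectory(prefix="test data (ü 太) ") as tmp:
    data_dir = Path(tmp) / "input files"
    data_dir.mkdir()

    sample = data_dir / "file 1 太.txt"
    sample.write_text("hello", encoding="utf-8")

    # Code under test would be pointed at `sample` here.
    content = sample.read_text(encoding="utf-8")
    parent_has_space = " " in str(sample.parent)

print(content)           # hello
print(parent_has_space)  # True
```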
I did something similar on accident. I used to keep all my development work synced with Dropbox and I had a work and a personal account. So any of my own projects would have /Dropbox (Personal)/ in the path which did catch some bugs. Dropbox renamed my folder to "Dropbox (Personal)" automatically when connecting a work account.
I have an overly-aggressive function in my .bashrc to rename all files in the current directory:
I use this all the time, especially when I download files.
Overly aggressive is right! I don't know if this is genius or deranged! I'm leaning towards genius and stealing the idea.
By the way: what's your beef with en dashes? I mean, if it was "everything should be 'HYPHEN-MINUS' (U+002D)", then fine, but why specifically en dashes and not em dashes?
> By the way: what's your beef with en dashes?
Of all the changes in that list, removing the character that doesn't appear on a standard keyboard seems like the least controversial...
To add, it's a character that gets magically inserted for no reason in various situations.
It's up there with those damn angled quotes.
A better question might be "how did it get there in the first place?"
Presume all inputs are hostile.
Whether people or processes, something is likely to introduce the character at some point.
Sw which converts -- and __ on the fly. Same sw converts quote pairs "for your convenience"
Opt+- if you use macOS, long press on - if you use any Apple touch OS.
I totally agree that for some people, this could be a terrible command to have around. However, I know that it has been working for me for about 8+ years or so. I almost always run it in my ~/Downloads folder on files that I don't really care about. I download a lot of academic papers and books, and this just saves me a lot of time to put files in the format I like: author--paper-title.pdf. And that's part of the reason why I make all of the dashes the same, so if I'm opening something by an author, I can easily autocomplete and not have to remember how to make other sorts of dashes on the command line.
For a download folder in particular, this does sound like a great idea. You'd break the list in the browser or whatever, but who cares about that?
Surely you must run into conflicts now and then?
That's the most beautiful part! After running this script there are no more conflicts, because it just silently overwrites all but one version of the "cleaned" filename.
(Also—that entire function is super inefficient and could be replaced with a single invocation of "rename".)
Totally inefficient. But for me it's readable and practical. This is mostly just a convenience function to help me store files in a format I like rather than something I need optimized. If it ever started to feel slow, sure, I could optimize. But for now, when I occasionally download a file that has some weird character, I just prefer to add another line to my function.
Without changing the design too much, you could rearrange it like so to avoid renaming multiple times and still have the option to just "add another line":
Though I would at least take advantage of character classes to reduce the number of substitutions:
(I'm using the `rename` command provided by the `rename` Debian package, a.k.a `file-rename`. The options may vary if you're using a different version.)
https://github.com/dharple/detox is a nice tool for this. Sane defaults but configurable.
In addition to CLI I use it from emacs dired-mode too:
I bind it to "_" in dired-mode.
Word of warning from hard experience: rn is a really dangerous thing to name a function because it is one char away from rm.
ren would be better than rn. :)
Looks like it's typically run without any arguments, so it's probably fine.
A typo can go the other way, like "rn somefile" where it was meant to remove a file but instead it renames all files.
One char away also physically on the keyboard (maybe that's what you meant?).
Yeah, the physical layout is the primary concern. I should have noted that since there is ambiguity because n and m also happen to be next to each other in the alphabet.
laughs in dvorak
cries in colemak
I once ran “crontab -r” instead of “crontab -e” and also thought that was terrible design for the same reason.
Note to self: snag “notTerseAtAllMoreVerboseIdentifiersForGreatGood.js” on NPM
Agree. Having this function exit if any arguments are passed to it seems like a good safety measure.
I use this snippet, to change spaces to underscore for directories and files in the current directory and below. Haven't made it a function yet, but should. I got it from stack overflow or somewhere, but no attribution. Thanks to whoever did it first:
Nice but how do you prevent overwrites? What about directories/folders and the files in that directory/folder?
I have:
But also:
Would not like to lose files like the srt.
rename will stop and output an error.
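For comparison, here is roughly what "refuse to overwrite" looks like when spelled out by hand in Python. This is only a sketch: `clean_name` and its rules (lowercase, whitespace and underscores collapsed to one hyphen) are invented for the example and are not the functions posted in this thread. Note that `Path.rename` silently replaces an existing file on POSIX, so the explicit check matters.

```python
import re
from pathlib import Path


def clean_name(name: str) -> str:
    """Hypothetical rule set: lowercase, runs of whitespace/underscores become one hyphen."""
    return re.sub(r"[\s_]+", "-", name.strip()).lower()


def rename_all(directory: Path) -> list:
    """Rename entries in `directory`; return the names skipped because of a conflict."""
    skipped = []
    # sorted() snapshots the listing so renames do not disturb the iteration.
    for path in sorted(directory.iterdir()):
        target = path.with_name(clean_name(path.name))
        if target == path:
            continue  # already clean
        if target.exists():
            skipped.append(path.name)  # refuse to overwrite an existing file
            continue
        path.rename(target)
    return skipped
```

Calling `rename_all(Path.home() / "Downloads")` would return the names it left alone. There is still a small window between the `exists()` check and the `rename()`, so this is fine for a personal downloads folder but not a guarantee under concurrent writers.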
Yeah, sometimes I end up renaming things I don't want to, but it really doesn't happen all that often. And sometimes I throw caution to the wind, add some excitement to my life, and rename a bunch of files (not for anything professional) in some really old directory and hope I don't break anything. But I'm not aiming for perfect with this comment. I just mentioned in another comment, but the vast majority of times I run this is in my ~/Downloads folder on files I don't really worry about breaking.
Thanks to all the comments in this thread, I now have "sudo apt install rename detox" in my install script, and:
in my .bashrc.
I've thrown some edge cases at it, and it handles it super well. It deals with consecutive "_", remove leading garbage, normalize unicode, and even prevents naming conflicts by opting out early.
Thank you.
You might be interested in detox:
https://github.com/dharple/detox
I wonder if rename has an -e flag like sed. It might be worth baking this into one monolithic regex if you call this often
You missed ~ You really don't want to create a directory named "~".....
If you're a developer you're doing yourself a big disservice by not learning how to deal with special characters.
I agree. I am a developer and I know how to deal with special characters. But this isn't something I use professionally. I just prefer not to have to deal with special characters in the pdfs, m4as, txts, and other files that I use on a daily basis. When I write papers, I'll write ū or Ñ or ç or whatever (incidentally, I have a lot of shortcuts in my .vimrc for those). I would not say I am "afraid" to use spaces in filenames, but I get a certain satisfaction storing academic papers in the author--paper-title.pdf format and my notes in author--paper-title.md because it helps me find things.
Define "space". Is the Hangul filler we talked about yesterday a spacing character? Is the zero-width non-breaking space a spacing character? What about the typographic spacing characters?
You should better be very afraid of using spaces in filenames.
You should do everything you can to support them but you have to know you'll invariably encounter countless cases where you'll have this or that tool that won't work properly with them.
I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
Every Git repository in the world has that example: let that sink in.
> FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
I use Git for documents too, not only code. Why shouldn't I use my native language?
Tab completion doesn’t work well for languages that require IME. That is one reason why I don’t.
That's actually a good point. On the other hand, not all languages use IMEs. Mine just uses the AltGr modifier key, but is otherwise just a standard QWERTY layout without any features.
IME == Input method editor?
https://en.wikipedia.org/wiki/Input_method
yup, I type in pronunciation and let it guess what I'm trying to say. Works okay in editors but don't work great with shells in a terminal emulator, so I just prefer not having to use it in shell operations.
Tab completion works just fine for me with a Japanese IME.
non-ascii characters cause annoying hard to fix problems. If you're willing to deal with that - kudos. Personally I don't find it worthwhile
I haven't had problems yet. Spaces, punctuation, and quotes are the main offenders, most of the time.
You get all those space characters working and then some jerk comes along and uploads a file like this: ŗ̶̧̢͓̳͍͙͔̳̻̥͉̭͓̫̟͍̞̭͉͓͉̮̹͍͚̳̹̬͉͚̰͈̘̐̊̾̈̀̒͒̀͛̓̋̔͊̏͘̚ę̴̨̛̣͙̤̟̬̩̟͙͖̥̹̱̱̊͑͗̇̇͛̆̈́̃͋̓̀̔̍̍̌̐͊̎̓̅̀̕ͅģ̴̹̜̘͍̱̑͐̉̌̐̄̊͛̎́̐̌̅̈́͂͑̈́̋̔͂̊̊̒̒̔͛͆̚͘̕͠e̶̙͕̫̳̘͐̾́̑͆̓͂̿͊̊̍͛͐̌̆͗̌̅̅̔͊̂͛͗̅̕͝͝͝͝x̵̢̧̦̫͖̝̥̹͓̬͖̤̩͚̝̫̋̃̅̈́̆͋̌͑́̎̈́̊̾͒̀̒̎̓͛͊̿̓͊̀̍͐̆̚͝͝-̴̨̮̯͖͖̠̜̲̪͕̘͈͖̮̈́̓̐̃́̅̄̏́̍̉̐͌́̔̓̄͋͗̐̕͜͝ţ̴̢̧̖̗͖̞̮̫̦̼̝̺̼̱̳͓͉̜̟̤̲͖̻͙́̌̈̌̈͆̾̄͊̿̏̓͗̈́̕͜ͅh̶̢̧̨̥̭̼̟̣͖̯̗̤̖̙͉͕̙͎̰̠̝̖͈̻͙̪̮̘̯̻̼͕͓̖̣͈̽́͊̎͐͌̆̍̎̏̿͐̒́͋͑̍̿̎͆̑͆̄͂̀͐̄͑̀͗̿̽̎̾̊̕͝͝͝͝͝ͅi̴͚͈͍̫̮̝̣͖͉͓̯̠̙̭̟̖̘̾̓̄̈́̒̏̽̆̉̿͛̀́̃̋̒̈́͋̂̇̈́͛̕͜͠͠͝ͅs̶͇̖̳̞͉̱̞͓̖͔͔͍̗͇̖̮̹̅͊̔͋͊̈́̎̐̆̋̒̀̍̕͜ͅ.̴̧͎͇̰͉̼̱̰̦̟̑̋̏͌̍͊͑̄̀͌́̆̓͛̒̆̾̉͐̄̂̈́͆̒̃͗̐̂̎̈́̈͛̿́͛̾̚͘͜͝͝ͅȩ̷̡̲̪̱̪̥̳͍̼̰̘̗̹͙͙͓̣̟̩̥̥̖̠̪̮̹̞̥̻͎͖͍̯̂͑̏̑̆̍͋̎͛̅̑̑̏̎̓̀̓̒̈́͊͌̀̈́̒̌͐͂͛̊̍̐͂́̔̌̾͐̈́̋̇̏̚͜͝͝͝͠ͅx̶̧̛͚̗̜̪͍͖̘̙͎͚͇͙̬̱̟̭͓̺̙͍̖̱͚̣̘̪̭͔͔̮͎̬̪̤̹̟͔̩͍̬͕͔̩͐̈́̒̂͛̂̈̀̿̍̔̓̓̀̃̍͆̈́̍̓̌͐̈́̾̇̎̑͌͒̄̆̿̍͆̅͗͆͘͠͝͝ͅͅͅe̷̢̡̡̨̧̛͕͚̬̮̞̥̼͍͔̝̟̝̯͈̟̥͖̱̹̣̩̼̩̅̌͌̑̎̐̀̽̏́͐̋̏̎̎͛͌̀̊͊͒̑͌̎̎̑͊̌̉͆̾̚͘̚͜͠͠͠͝͝ͅͅͅ
Honest question - what the heck are those characters?
Zalgo text: https://zalgo.org/
It was a great joke for a couple weeks two internets ago.
> two internets ago
It's been like three internets since I heard someone using "internet" as a measurement of time.
It's actually interesting to think about "generations" of internet, just like generations of people, and how the culture shifted between them.
There was a time in the early '00s when broadband was catching on, yet YouTube didn't exist. A time when Ebaumsworld and Newgrounds ruled the internet. When Homestar Runner was pop internet culture. Weebls Stuff. The frog blender.
Combining diacritic marks.
It's corrupted text or "Zalgo" text; it relies on diacritics.
See this answer on stackoverflow:
https://stackoverflow.com/questions/1732348/regex-match-open...
I disagree with calling it "corrupted." We're not tricking the browser into trying to render garbage bytes that are actually the middle of a jpeg or something. It's actually valid Unicode. It's an edge-case which is not seen in regular usage, but it's technically following all of the rules.
It's just the way it's called, not a statement of fact.
In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).
https://en.wikipedia.org/wiki/Combining_character
Specifically _Vietnamese_ combining characters. The Vietnamese writing system uses multiple combining characters at a time, and stacks them vertically. Throw in a few that wrap around the character like t҉his, some 𝑎lternatᵉ lꬲttꬲr fᵒrms, disturbing imagery, and perhaps a few other tricks, and you have zalgo. See also https://stackoverflow.com/a/1732454/823846
This legitimately made me laugh out loud in my office.
The characters reach up off the screen as I reply to this. They overlay the comment above you. Amazing. How?
It's usually called Zalgo text, and it's what you get when you start stacking all kinds of Unicode diacritics on poor unsuspecting characters.
https://en.wikipedia.org/wiki/Zalgo_text
Interestingly I get different behaviour per browser/OS. Firefox/Linux clips it to the bounding box of the parent element, Firefox/Mac and Safari/Mac clip it to the line height, and only Chrome/Mac lets it extend further.
Huh, I tried it in Chrome to see how it reacted here and it maintained about the same position as it did in my usual browser, Firefox.
Firefox and Safari on iOS 15 both render all the glyphs attached to the base character. Vivaldi, Chrome and Firefox on Win10 all render them stacked and overlapping the parent and child comments.
This is the best generator I found: https://lingojam.com/GlitchTextGenerator
I find http://animalswithinanimals.com/generator/generator.html much more controllable.
For anyone who is curious (and acolytes of Zalgo): "In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character. So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model."
[https://stackoverflow.com/questions/6579844/how-does-zalgo-t...]
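The "base character plus any number of combining marks" model quoted above is easy to poke at from Python's `unicodedata` module. A rough sketch; the helper names are made up, and stripping marks like this is lossy for real text (it also removes legitimate accents), so treat it as a demonstration rather than a sanitizer:

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for nonspacing (Mn) and enclosing (Me) combining marks."""
    return unicodedata.category(ch) in ("Mn", "Me")


def count_marks(text: str) -> int:
    return sum(1 for ch in text if is_mark(ch))


def strip_marks(text: str) -> str:
    # NFD first, so precomposed letters such as "é" split into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", "".join(ch for ch in decomposed if not is_mark(ch)))


# One base letter carrying 15 stacked marks: 16 code points, one visible "character".
zalgo = "e" + "\u0301\u0302\u0303" * 5

print(len(zalgo), count_marks(zalgo))  # 16 15
print(strip_marks(zalgo))              # e
```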
Hah, lucky for me Chrome on Ubuntu didn't implement the spec correctly. ;)
768 characters is too long for macOS it seems. (References online say HFS+ has a limit of 255 UTF-16 characters. Didn't find anything for APFS immediately... edit: same for APFS)
Glad you didn’t choose a sequence that crashes my browser.
regex this, bravo
Please don't Zalgo on HN. It's enough to speak its name.
It would be one thing if it was making other comments difficult to read or causing browser issues, but I appreciated the demonstration that both would presumably be possible on certain browsers
Until now, I haven't actually thought of what would happen if zalgotext occurred anywhere other than a web browser. Looking forward to the five minutes of fun with the file manager and whatnot.
> I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
I have an Android phone and I tell MusicBrainz Picard to save all files with ASCII-only names and Windows-compatible names for the ones that get sent over to the phone. Basically for this reason. Sometimes it's players on Android itself, but even more frequently, whatever bluetooth radio I'm connected to freaking out with non-ASCII characters.
What do you mean, display garbage?
L'imp?ratrice? L'imp�ratrice? L'impératrice? L'imp‚ratrice? L'impÚratrice?
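Most of those variants can be reproduced by encoding the name with one codec and decoding it with another. A small Python sketch; which pair of encodings a particular car stereo actually mixes up is a guess, the codec pairs below are just ones that happen to produce these shapes:

```python
name = "L'imp\u00e9ratrice"  # é written as an escape so the source file's encoding doesn't matter

variants = {
    "ascii with replacement": name.encode("ascii", errors="replace").decode("ascii"),
    "latin-1 bytes read as utf-8": name.encode("latin-1").decode("utf-8", errors="replace"),
    "utf-8 bytes read as latin-1": name.encode("utf-8").decode("latin-1"),
    "cp850 bytes read as cp1252": name.encode("cp850").decode("cp1252"),
    "latin-1 bytes read as cp850": name.encode("latin-1").decode("cp850"),
}

for how, shown in variants.items():
    print(f"{how:30} {shown}")
```

The five results are `L'imp?ratrice`, `L'imp�ratrice`, `L'impÃ©ratrice`, `L'imp‚ratrice` and `L'impÚratrice`, in that order.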
I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string - it's a sequence of path components and should be treated as such by our APIs. A path should be parsed once - on user input - and then used in its "sequence form" throughout the software stack.
And "path component" is not an arbitrary string either - e.g. appending a path component to the path should first require converting/parsing the string into the path component, and only if that's successful appending it to the path.
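A minimal Python sketch of that idea, using `pathlib.PurePosixPath` as the "sequence form". `parse_component` is a hypothetical helper written for this example, not part of pathlib, and it only covers POSIX rules:

```python
from pathlib import PurePosixPath


def parse_component(raw: str) -> str:
    """Validate a single POSIX path component before it may be appended."""
    if raw in ("", ".", ".."):
        raise ValueError(f"not a plain component: {raw!r}")
    if "/" in raw or "\0" in raw:
        raise ValueError(f"separator or NUL in component: {raw!r}")
    return raw


base = PurePosixPath("/srv/uploads")

# A validated component can only ever add one level below `base`.
safe = base / parse_component("report 2021.pdf")
print(safe.parts)  # ('/', 'srv', 'uploads', 'report 2021.pdf')

# Without validation, joining an absolute string silently replaces the base.
print(base / "/etc/passwd")  # /etc/passwd

try:
    parse_component("../secrets")
except ValueError as exc:
    print("rejected:", exc)
```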
"Path is not a string - it's a sequence of path components and should be treated as such by our APIs."
For maximum correctness, you want to turn it into a file handle as soon as possible, and do all operations through the variations of the file functions that end in "at", like: https://linux.die.net/man/2/openat
The downside of this approach is that you still technically have to carry the path around with you if you ever want to present it back to the user, because once you have a directory handle, you can get back to the root directory easily enough by following parent links and seeing what directories you end up in, but that may not be what the user "thinks" the path is, and they want to see their path, not a canonicalized one. And they're mostly right. And it's not easy to correctly track changes to their intended path from this basis either.
Basically, I don't know of a really solid, 100% correct way to handle this with any reasonable degree of effort.
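For reference, the `*at` style mentioned above is reachable from Python through the `dir_fd` parameter of the `os` functions. A sketch under the assumption of a Linux-like platform (`os.O_DIRECTORY` and `dir_fd` are not available everywhere); the file name is arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the directory path once; afterwards only the handle is used.
    dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Equivalent of openat(dir_fd, "notes.txt", ...): the name is looked up
        # relative to the open directory, not relative to a path string.
        fd = os.open("notes.txt", os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600, dir_fd=dir_fd)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("hello")

        size = os.stat("notes.txt", dir_fd=dir_fd).st_size  # fstatat
        names = os.listdir(dir_fd)                           # list via the handle
    finally:
        os.close(dir_fd)

print(size, names)  # 5 ['notes.txt']
```

This does not solve the display problem described above: the original user-supplied path still has to be kept alongside the handle if it needs to be shown later.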
"you want to turn it into a file handle as soon as possible"
But no sooner.
For example, I've run into problems where I'm configuring program A server to talk to file location B... but I don't have access to file location B. But the client-side library for talking to the server tries to convert location B into a file handle and then freaks out because I can't access it. When I don't want to access it. I want that program to serve it.
If it was using simple "path" objects that didn't confirm that I have access to the path, everything would be hunky dory. But because it tried to convert it into a file handle unnecessarily, I get blocked.
Why not just hold onto both? The users representation and the file handle. Only ever "display" the representation, while you do all operations on the handle. (Not trying to be sarcastic, just curious).
This goes for most instances of user input. Timestamps are the other common one people get wrong. I've even seen programs that pass around timestamps as strings in multiple formats and as integers (Unix time).
As a programming noob, I'm wondering what would be the better way to pass or return a unix time value as opposed to an integer?
Depends on the language but most high-level languages have a timestamp or datetime abstraction which you should be using.
If it's being serialized, consider fully qualified iso8601.
If you need to keep the timezone with it, then use an ISO8601 [0] string: "2021-11-11T15:32:35-07:00".
Otherwise, use an integer unix timestamp, the number of seconds since 1970-01-01T00:00:00Z: 1636673555. Use an unsigned 32-bit integer or a 64-bit integer to avoid the 2038 problem [1]. JavaScript's maximum safe integer is 2^53 - 1 [2], so if you're using HTTP JSON RPC, you'll have to check for overflow.
[0] https://en.wikipedia.org/wiki/ISO_8601
[1] https://en.wikipedia.org/wiki/Year_2038_problem
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
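A short Python sketch of the options above: parse the ISO 8601 string into a timezone-aware `datetime`, convert to Unix seconds at the serialization boundary, and check the integer limits mentioned. `fits_in_json_number` is a made-up helper name:

```python
from datetime import datetime, timezone

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript number holds exactly
INT32_MAX = 2**31 - 1         # where a signed 32-bit time_t runs out

# Keep the offset by parsing into an aware datetime instead of passing strings around.
stamp = datetime.fromisoformat("2021-11-11T15:32:35-07:00")
unix_seconds = int(stamp.timestamp())
as_utc = stamp.astimezone(timezone.utc).isoformat()

# The last second representable in a signed 32-bit Unix timestamp.
rollover = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc).isoformat()


def fits_in_json_number(value: int) -> bool:
    """True if a JavaScript consumer can represent the integer exactly."""
    return -MAX_SAFE_INTEGER <= value <= MAX_SAFE_INTEGER


print(unix_seconds)  # 1636669955
print(as_utc)        # 2021-11-11T22:32:35+00:00
print(rollover)      # 2038-01-19T03:14:07+00:00
print(fits_in_json_number(unix_seconds * 1_000_000_000))  # False: nanoseconds overflow
```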
ISO8601 is a serialisation format. You wouldn't want to use it in internal function calls simply for performance reasons. You also wouldn't want to pass it around as just a "string" type. I think the question was asking about internal function calls. For external data interchange, ISO8601 is the only sane option and deals with all known timezone and leap second bollocks.
> For maximum correctness, you want to turn it into a file handle as soon as possible
This is why I get stressed out when I see paths turned into special objects encoding separators and such.
It tells me the path is living for way too long compared to the file handle.
I only want to see path-specific objects if we're modifying the path, and even then I want that to happen as late as possible.
> For maximum correctness, you want to turn it into a file handle as soon as possible
That's not right. You want to resolve a file/folder path to a file/folder at the exact point it makes sense.
It's a problem if you're using a path when you wanted the file. The file can be switched/modified out from underneath you.
It's also a problem if you've got the file when you only wanted a reference. Now you can't simply switch/modify the file independent of the reference. E.g., maybe you want config file changes to take effect immediately and transparently.
You can also have the hybrid case, e.g., where you want the folder directly, but have a relative path to a file that is resolved late.
If you're unsure, I'd err on the side of late resolution.
Another inconvenience with this approach is that you can keep thousands of paths in memory no problem. But thousands of FDs may cause you to exceed per-process limits.
doesn't this lock the file?
> I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string
I guess that depends on what you mean by "string". `open` and `fopen` need a char* path to open a file. Whatever fancy Path abstraction you use eventually becomes a char* string, because that's what the kernel needs.
yeah. it's a string.
On POSIX systems file names are not strings, they are sequences of bytes. They might not be UTF-8 or have any meaning. Python3 had to hack around this, they thought they could force everything to Unicode and discovered that doesn't work.
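The workaround Python 3 ended up with is the `surrogateescape` error handler (PEP 383), which `os.fsdecode` and `os.fsencode` use on POSIX. A minimal sketch with explicit codecs so it does not depend on the machine's locale; the byte string is an invented Latin-1 file name:

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 is not valid UTF-8 here

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Undecodable bytes are smuggled through as lone surrogates (0xE9 -> U+DCE9)...
as_text = raw.decode("utf-8", errors="surrogateescape")

# ...and encoding with the same handler restores the exact original bytes.
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)           # False
print(ascii(as_text))      # 'caf\udce9.txt'
print(round_trip == raw)   # True
```

The resulting `str` can be passed back to the OS, but it is not valid Unicode text, so printing it or encoding it strictly can still fail.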
Which makes for fun issues like there's no standard way to display a filename in Unix. A system that's, you know, all about files.
Unix: everything is a file, including file names!
That's probably because paths aren't properties of the file itself, they're helpers to reference the file.
At least for most Linux systems (not sure about other *nix, but I expect the same?), there is a system default encoding, defined by the locale, and I think decoding the filename in that encoding and displaying the resulting string, is probably the correct way to display a filename? That seems as good as you are likely to get on any system really.
I think for any POSIX system, either there is locale support defining the encoding, or it uses the POSIX locale, which defines the encoding (ASCII).
Of course you need to handle cases where filenames cannot be decoded in the system encoding (probably by replacing characters that cannot be decoded), because a filename in a different encoding, or even with no valid encoding, has been used on disk. While systems can say that file names containing bytes that are not valid characters in the system's encoding are not valid file names, that doesn't stop people mounting disks with them, so the problem never goes away if you support opening media from other systems.
What I am saying is that this is no more a Unix problem than it is a problem on any system that supports removable media.
On POSIX systems file paths are C strings, which are sequences of bytes that cannot include the 0 character. UTF-8 or other meaning is not required for something to be a string.
POSIX file names allow every byte except 0x2F (/) and 0x00 (NUL); only the "portable filename character set" is narrower. That means file names can include line feeds, backspaces, EOF, etc.
"This is `a
perfectly vali'd.\010! file name\377, despite the weirdness"
Strings following certain rules are entirely valid representations of paths, just like sequences of path components in the chosen language/framework are. Similarly, the sequences of bits that make up the sequences of your language/framework in memory are an entirely valid representation of said sequences of components.
Yes, paths have structure, but saying "a path is not a string" is the equivalent of saying "C source code is not a string". Both are strings, and both are something else, represented by strings according to rules. Different internal representations have different advantages and disadvantages. I fully agree that for things such as "adding components" an internal sequence/list representation is better, but strings can pass arbitrary IPC or even ABI boundaries much more easily, for example. (And you wouldn't bat an eye when you see FQDNs like "www.google.com" passed as a string instead of as ["www","google","com"], because the string representation works pretty well.)
C source code and paths are both representable by strings, true, but the fact that they're not actually strings is still important, because most people don't know that, and in the case of paths that leads to a lot of edge cases (in the case of source code it leads to a bunch of inefficient and weak tooling, which isn't quite as bad).
Because neither are strings, their native representation shouldn't be such - it should be something structured, and only when necessary (IPC, FFI, serdes) be serialized into a string representation. This would save people a lot of time and effort.
It really depends. Do you usually keep hostnames as strings? URLs? JPEGs? Why or why not?
Sure, a browser will hopefully quickly parse that URL and break it up, and an image viewer will do the same with a JPEG. But will anything that's only interested in opening or displaying that URL or JPEG through a library or external program do the same?
POSIX paths are actually remarkably simple in structure[1]. The only caveat is equality and normalization: Without normalization, a path a might be equal to a path b while their representations differ, e.g. "/etc/foo" and "/etc/bar/../foo". But this is the same whether you have a string or a list of strings, you need to normalize in whatever representation you choose to check for equality.
[1] Almost shocking myself, even Haskell defines its primary FilePath type literally as "String".
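The normalization point can be shown with Python's `posixpath` module. This is a minimal sketch; `components` is a helper made up for the example, and `normpath` works purely lexically, so it gives the wrong answer if "/etc/bar" happens to be a symlink.

```python
import posixpath

a = "/etc/foo"
b = "/etc/bar/../foo"

# Raw string comparison sees two different paths.
same_as_strings = a == b

# normpath collapses "..", "." and repeated slashes without touching the disk.
same_after_normalizing = posixpath.normpath(a) == posixpath.normpath(b)


def components(path: str) -> list:
    """Hypothetical list-of-components representation of an absolute path."""
    return [part for part in path.split("/") if part]


# The list representation has exactly the same problem until it is normalized.
same_as_lists = components(a) == components(b)
```

Either representation needs the same normalization step before an equality check means anything.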
things like this are why the Unix philosophy is so bad.
text processing is hard if you must support Unicode, and that means every Unix command line tool must implement or employ a text processor to handle input. it would be much easier if objects were passed back and forth. PowerShell got this right.
I'm hardly afraid but I just think it's poor ergonomics. Same as the move from
to
Everything seems to be going this way in Linux land. Longer names, harder to type names, camelcase names, spaces... I'm looking forward to an OS that treats command line ergonomics as a first class feature and where camelcase & spaces are verboten.
I find this attitude misguided. More descriptive names are more ergonomic for things you only use rarely but they need to be combined with much better autocompletion than most shells provide by default.
You state that as if that were objective.. but that's not my subjective experience at all. Somehow I have a hard time remembering these long names, (is it --conf or --config or --config-file or --config-path? -c would've done it for me. --set or --set-prop or --set-property or --prop or --property?), and I need to look them up in a man page anyway, and I make more typos typing them, and shell completion rarely works well if at all. I also find it harder to read and edit long lines that wrap.
Somehow these short letters stick much better for me, and the effort for finding them in the manual is the same, although in case of extra complexity as with xinput, it's even worse with the long names. I don't use either command often, but it's hard to forget xset m. The only thing I remember about xinput is that it's a horribly long litany of things which I need to look up every time, and the syntax still feels weird.
the most used options for properly written tools have both short single char option like -c and long-form version --config if you need verbose self-describing option.
If you are using CLI tools off GitHub written by a random person, then no wonder you will see non-standard approaches to UX.
PowerShell takes an interesting approach in that it accepts any truncated variant of a long-form flag as a short form, provided it isn't ambiguous (i.e. if the interpreter can't decide which long-form flag to expand a short-form flag to.)
For example, if a command features a "-ConfigFile" flag, valid short-form variants include "-C", "-Co", "-Con", "-Conf", and so on. But if the command featured an additional flag "-ConfigURL" for example, the aforementioned short-form flags would be ambiguous.
getopt_long (and thus most GNU programs) works this way. I think it's probably a misfeature though since it means that adding a new option can introduce ambiguity. Having both short (ex. -x) and long (ex. --exclude) options is a less problematic solution.
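The "new option introduces ambiguity" problem is easy to reproduce with a toy prefix matcher. This is a sketch of the general idea only, not how PowerShell or getopt_long implement it; `resolve_flag` and the flag lists are invented for the example, and unlike PowerShell it is case-sensitive.

```python
def resolve_flag(typed: str, known_flags: list) -> str:
    """Expand an abbreviated flag to the unique known flag it is a prefix of."""
    # An exact match always wins, even if it is also a prefix of another flag.
    if typed in known_flags:
        return typed
    matches = [flag for flag in known_flags if flag.startswith(typed)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown flag: {typed}")
    raise ValueError(f"ambiguous flag {typed}: could be {', '.join(sorted(matches))}")


flags_v1 = ["-ConfigFile", "-Verbose"]
flags_v2 = flags_v1 + ["-ConfigURL"]  # a later release adds one option

expanded = resolve_flag("-Conf", flags_v1)  # unique prefix in v1

try:
    resolve_flag("-Conf", flags_v2)
    broke_in_v2 = False
except ValueError:
    broke_in_v2 = True
```

A script that relied on "-Conf" working in the first version stops working in the second, which is a decent argument for spelling flags out in anything that gets saved to a file.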
The shell ought to be able to help with that. There's no need to remember if it's --conf or --config if you can press --conf<tab>.
One of the things I like about Fish is that by default it can tab-complete program options and also shows a one-line description of what each of them does. (It grabs that info from the man page).
So much of computing is dedicated to solving problems that could be omitted.
I mean, that's precisely my thoughts on copyright and licensing in general but what can you realistically do?
Realistically, on an individual scale, you can pretend it doesn't exist and go on with living your life?
I very much would if only that pesky State didn't persecute me for that. Apparently, when I refuse to acknowledge the copyright and software license terms, other people get upset to the point of bringing the wrath of that Leviathan of oppression upon me! The nerve of some people!
Seriously. Just get up from the computer and go do something else. /s
We computer people are truly an odd bunch.
> and shell completion rarely works well if at all
I just tried fish. xinput --set-[TAB] and nothing. Apparently it doesn't understand the standard long-option format that is supported by xinput and documented in the man page. You have to know to omit the dashes and then it'll complete. And it's downhill from there.
Yeah I used to have all kinds of simple as well as supposedly sophisticated completion setups with zsh years ago but I've given up on it since then. It's always half-assed and half the time causes more problems than it solves. Same with bash. There are some places where I must resist the urge to try to complete a filename because the shell starts trying to figure out which target it can complete from a Makefile in a large build system and just freezes. The only practical way out is to interrupt and type the command again or wait a stupidly long time. There are other issues like completion trying to be smart and filtering out things it thinks you don't want to complete. Nothing is more frustrating than a shell refusing to complete a filename that you know is there.
I run fish. I was able to get long-option completion for gcc, polybar, firefox, man, emacs, xrandr, and fish itself. The only command I was not able to get long-option completion for was xinput. You just picked a bad program to try.
I'm with you. Terseness is paramount.
I could never overcome my repulsion for Java and ObjC because of that. On the other hand, I feel at home with crazy regexes that look like line noise to most people.
I think shells could use something like a built-in eldoc[1], in addition to tab completion. It would make terse command line interfaces much more usable if you could see what the positional arguments were for.
[1]: https://docs.cider.mx/cider/config/eldoc.html
I hate .methodNameAsLongAsMyArm as well, but there's the opposite extreme:
As a beginner, I liked short variable names. When I came back a few months later, I learned my lesson. Years later? easier to just start over.
I like the long-form version. It helps me remember what it does and why. E.g. `iptables --insert INPUT --protocol tcp --jump ACCEPT` was more helpful to me than `iptables -I INPUT -p tcp -j ACCEPT` when told how to allow TCP traffic.
For everyday command like `ls -l` I don't mind but anything more serious I take a more cautious approach.
The few scripts that I've written for personal use generally lack documentation or help commands of any sort; instead, they take all possible straightforward variants I can think of for each command (`--config`, `--config-file`, `--cfg`, `--conf`, etc). They usually convert everything to lowercase before processing, too. It's easier to fail safely on too much/too little input than it is to provide actual help.
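A rough Python sketch of that "accept every spelling" approach, using `argparse`. The option names are the ones listed above; `parse` and `lower_option_name` are names made up for the example. One known gap: a value that itself starts with "--" would also get lowercased.

```python
import argparse


def lower_option_name(arg: str) -> str:
    """Lowercase only the option name, never the value after '='."""
    if not arg.startswith("--"):
        return arg
    name, sep, value = arg.partition("=")
    return name.lower() + sep + value


def parse(argv: list) -> argparse.Namespace:
    # allow_abbrev=False keeps argparse from guessing at prefixes on its own.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Every spelling maps to the same destination.
    parser.add_argument("--config", "--config-file", "--cfg", "--conf", dest="config")
    return parser.parse_args([lower_option_name(arg) for arg in argv])


args = parse(["--CFG", "/tmp/My Settings.ini"])
```

Values are left alone because paths are case-sensitive on most Unix filesystems.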
Spaces don't make anything more descriptive, they just cause completely unnecessary quoting and escaping hassle.
The amount of time that has been wasted by Windows using "C:\Program Files" instead of "C:\Program_Files" far outweighs any highly questionable aesthetic benefit IMO.
On the other hand, how much broken code has been fixed to properly deal with paths just because of that? I'd argue that to be a major benefit. Same with Windows Vista forcing developers to write applications that work properly as a non-admin user.
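The classic way this goes wrong is building a command line by string concatenation. A minimal Python sketch, using `shlex` for POSIX shell style quoting (cmd.exe follows different rules); the path and the "run" command are made up.

```python
import shlex

path = "C:/Program Files/My App/app.exe"  # made-up path containing spaces

# Naive string building: a shell-style splitter sees three words, not one path.
naive = shlex.split("run " + path)

# Quoting keeps the path together as a single argument.
quoted = shlex.split("run " + shlex.quote(path))

# Better still: never build a command string at all. Pass an argument list
# (for example to subprocess.run) so no splitting ever happens.
argv = ["run", path]
```

Code that passes argument lists everywhere never has to care whether the path contains spaces, which is probably why the "Program Files" decision flushed out so many bugs.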
Short option for interactive terminal. Long option in automation.
I’ll be damned if I have to remember or lookup what -n means to some obscure program, when reading someone else’s script. Exception given for super common tools where everybody knows like ls -la.
With the disclaimer that shell scripts, especially ls, aren’t exactly suitable for reliable automation in the first place.
What's wrong with camelCase? It's easier to type than snake
There's a tendency away from snake_case and towards kebab-case in things you interact with via CLI. Even moreso towards nocase.
Programs like PowerShell eschew ease of use in CLI for readability in scripts.
Snake_case is problematic for including filenames in TeX also. This is a big no for me, even if I find it more readable than the other.
> Even moreso towards nocase.
Nocase (did I break a rule by writing it that way?) seems great when you're enmeshed in the domain and you can see the implicit separators, but then someone looks at your naming from the outside and you're guaranteed to have an 'expertsexchange' in there somewhere.
oh, fsck
Powershell is case-insensitive, so camelCase is only a writing preference
It's still verbose in places
camelCase is objectively harder to read than snake_case or kebab-case, though familiarity can mitigate that.
I'd argue it's at most a tiny bit harder to read, and a lot easier to type. On balance I'd rather avoid making a pinky key one of the keys I have to use the most.
"On balance I'd rather avoid making a pinky key one of the keys I have to use the most."
And you use something else than your pinky finger for the shift key specifically when typing capitalized letters for camelCase?
At least it's where they sit naturally on the keyboard. And the shift key is wider specifically so you don't have to be accurate with your pinky when you're pressing it. The underscore is one of the least ergonomic keys there is. And you need both pinkies to do it
I might be misunderstanding. On all layouts I'm familiar with the underscore key is directly next to one of the shift keys, or left of backspace. Neither layout requires the Vulcan death grip. Shift should always be under your pinky fingers to avoid contortions.
On the US layout it is next to the zero key on the top row.
Having used a lot of all the formats, I'd argue it's a lot easier to read and a tiny bit harder to type. For typing it's basically just an extra `-`, unless your alternative is nocase.
For reading, CamelCase has 2 significant ambiguity issues: similarity between I and l, and what do you do with acronyms. Acronyms wouldn't actually be a problem if everybody just wrote them as they would in snake_case (i.e. only capitalize the first letter), but they don't and so it's anyone's guess whether you're going to get "Id" or "ID".
There's also a minor issue where if you're on a case-insensitive file system it can be a little difficult to change casing, but adding/removing underscores is easy.
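The acronym problem shows up as soon as you try to convert between the styles mechanically. A naive converter in Python, as a sketch (`camel_to_snake` is a made-up helper, and real converters use more elaborate rules):

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert "_" wherever a lowercase letter or digit is followed by a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Acronyms written like ordinary words convert cleanly.
clean = camel_to_snake("parseHttpResponse")

# Fully capitalized acronyms swallow the word boundary that follows them.
mangled = camel_to_snake("parseHTTPResponse")
```

Here `clean` comes out as "parse_http_response" while `mangled` comes out as "parse_httpresponse", so the "Id" versus "ID" choice is not purely cosmetic once tooling gets involved.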
Adding an underscore everywhere is horrible! The spacebar is huge, and gets your thumbs basically to itself because space will be one of, if not the most commonly typed key. To replace that with one of the least ergonomic keys makes no sense.
And if CamelCase is so hard to read, why is it the norm for "high level languages"? Shouldn't those be optimized for ease of use?
> And if CamelCase is so hard to read, why is it the norm for "high level languages"
That's over-selling it a bit. It's more common, but not dramatically so. Outside of class names, CamelCase isn't the norm for Python, PHP, CSS, HTML. It's also not the norm for shell scripting, but shell scripting has horrible readability for other reasons.
I believe CamelCase is more common for languages like Go, C#, and Java because they grew up in large organizations where having god objects/classes with 400 methods is kinda normal and having aMethodWithAReallyLongName is pretty common. One of the advantages of CamelCase is that it does shorten really long names.
I could infer a lot about the second and what those params mean and what they do.
The first one is some magical incantation.
Sure. One could also make "move-down-one-line" be the incantation to move the cursor down a line in vi, but I prefer j.
Ergonomics isn't all about making everything self-descriptive for someone seeing the thing for the first time. It's about making things comfortable to actually use. If it's so long and complicated that you can't even remember how to do it, it's not very comfortable to use. Even if I could remember, xset m 0 0 is still far more comfortable.
And fwiw you still don't know what 0, 1 in accel profile do; you need to look that up or take a wild guess, and if you want to use that command, you'll also have to know how to look up the device because chances are yours is not the same as mine. So it's not any less magical in the end, just more verbose.
The "cool" thing about the xinput command is that you don't even find accel profile in the man page. You gotta look elsewhere if you want to understand what it is and what it does and what the parameters are.
xset m? Yes, that is documented in the man page.
It should be based on frequency of usage. I can tell you that moving down a line in vim is a little more common than toggling the mouse acceleration.
I would never even type such a command. I would just copy paste it once.
Yeah well, given that mouse acceleration tends to be on by default, I need to turn it off every time I'm on a fresh install or computer I haven't used before. The last time I needed that was yesterday.
I don't want to waste time searching for a command to copy-paste when it could just be made short, simple, memorable and ergonomic. I could type xset m 0 0 faster than I could open a browser and ask google how to disable acceleration with libinput. And again: you can't just copy-paste the xinput command unless you're lucky enough that it matches your device. On my new computer, the device has a different name than on my old laptop even though it's the same damn mouse.
It should be, but how would you keep track of usage frequency?
At least it would push all the "This switch was added by someone playing with UNIX at a university in 1986 and hasn't been used since" options to the end of the list.
> Ergonomics isn't all about making everything self-descriptive for someone seeing the thing for the first time.
We're talking about `xset`. It doesn't make sense to optimize that for usage of more than once a year.
The less frequently I need something, the more frustrating it is if it's not short and memorable (or easy to look up in the synopsis or built-in help). Forgetting and googling a needlessly complicated command over and over again every year isn't fun.
xset achieves that perfectly. If I somehow didn't remember how to set mouse acceleration with it, a quick glance at the synopsis immediately tells me. Or I can just run the command and it'll tell me:
Zero frustration, and the command is so short and simple that I end up remembering it without trying.
This is something I've observed more than once: I easily memorize useful sets of one-letter flags even if I can't remember or know what they all stand for. This just doesn't happen nearly as much with long options. Commands like ls -ctrl or ss -nap quickly become part of my repertoire even if I don't use them very often, but I really couldn't remember ss --numeric --all --processes (if I had written that from memory, it could've ended up as --num --all --pid or --numeric --any --process), and I don't even know what the corresponding long options for ls are. In the rare case when I have to deal with an option that has no short equivalent, I feel like I have to look it up every time if it's been longer than a few weeks.
You talk of optimization but I think this is just a very basic (and reasonably successful) attempt at sane design. It's not like someone had to go far out of their way to make this in a manner that isn't batshit insane.
But which case should software interfaces optimize for? Ergonomics of someone who uses a tool frequently, or interpretability for casual by-standers of some out-of-context shell command?
Another interpretation is:
On the first, you think you know what it does, but you're not sure. So maybe it gets looked up.
On the second, you know you don't know what it does. So you know to look it up.
Personally, I'll take the second. Assumptions during debugging are dangerous things.
I've a feeling you will hate powershell
Needlessly long parameter/command names and the bizarre insistence on capital letters are the #1 and #2 reasons I detest PowerShell. Like GP, I resent that Linux tools are moving in that direction.
Long option names are more descriptive, more easily distinguished, and easier to remember. Your shell should be intelligent enough to provide tab completion for option names, assuming it is configured to.
> Long option names are ... easier to remember ... Your shell should be intelligent enough to provide tab completion
They are so easy to remember that you need to configure your shell to remember them for you?
Long option names are more difficult to remember because a long option name can be spelled multiple ways and it is difficult to remember which spelling is correct.
>Your shell should be intelligent enough to provide tab completion for option names, assuming it is configured to.
Wait, are you saying that I need to change my shell or config to make up for another tool's poor design?
No, thanks.
IMO, PowerShell got it right. Yeah, its syntax is strange, but it has standard flag usage with proper autocomplete, and you can shorten any flag the way you want (e.g. fuzzy match) if it is unambiguous.
Cue nmcli (CLI for Gnome's NetworkManager) which uses UUIDs for everything and (at least a while ago) did not accept partial-but-unique UUIDs. Basically goes "nmcli connection up 5095665a-d82c-4ae6-8964-283623387941".
By this point, I'm pretty sure there are people at gnome who compete to see who will make the stupidest suggestion that gets put in production.
It's a Gnomespiracy to determine whether worse is actually better.
Fits right in with COP26. (Could Of Punted?)
Weird, I haven't had to do this. Most(/all?) connections have nice names you can see with `nmcli c`... and so I can do `nmcli c up id DroidNet` and that's pretty dang nice. Pretty sure this worked with Ubuntu 14.04 (though, nmcli has gotten much more featureful since then)
(The ability to shorthand connection->c and similar is great, too; obviously not unique to nmcli)
apt-get install nmtui # it's better
nmtui is a life saver tbh
The problem is we're optimizing for "easy to learn" rather than "easy to use".
In a world of broken promises and tool churn, minimizing tooling investment isn't laziness, it's a defense mechanism.
This is a lesson I had to learn the hard way, multiple times.
I've learned this lesson too, and I now avoid using any tools that have broken backwards compatibility in the past 20 years.
That may be a part of the problem but honestly I don't feel like all these new crazy interfaces are easy to learn either. I mean how do you come up with the litany xinput calls for? You need to understand the syntax for specifying a device. You need to know that you're to set a libinput property, and you need to know the name of that property, and it's not documented in xinput man page, and of course you need to know the values to pass which again are not documented in xinput man page. You can play with --list-props and then take your search elsewhere because it is completely opaque and doesn't explain what the properties actually do.
I suspect the number of people who figured all that out without having to find it by googling / arch wiki / whatever is very very low.
Now I'm not gonna say xset is the easiest interface to figure out, but the syntax for setting mouse acceleration is right there in the synopsis, and if you search down the man page, you'll learn a little more (and also if you just run xset without arguments, it'll tell you how to set mouse acceleration). It might not be the best designed tool but it's something I learned back in the day as a teenager just by looking at the man page.
I think the real issue is that people nowadays are designing these interfaces to be consumed by interactive configuration tools, GUI apps, and desktop environments; they're more dynamic, more complex, more flexible, but not easier to figure out, not for you on the command line. The command line is just a last resort. Second class citizen if you will.
Kind of ridiculous if you ask me.
It is, but they actually have a shortcut for that (--disable, --enable).
Direct quote from my console:
On some level it makes sense. The problem with the command line is familiarity.
How often do you reach for iptables? If you're like myself, and most home/desktop users, then probably once in a blue moon to set it up and then you leave it alone. But a system admin? Maybe they touch it a few times a week or month. Every time I use iptables I have to relearn how Linux networking works.
Similarly, the xset/xinput thing. When I need those tools I just create a script or throw it in .bashrc. I adjust the settings once and will not touch them again for a couple years. It makes sense to have long parameters that are readable. I can look at my .bashrc and see exactly what device is getting adjusted.
imho, the fundamental problem is using space as a delimiter. Also, case-sensitivity is a disaster for ergonomics.
If you had comma-delimiting like in an algol-derived language, you wouldn't need to quote things with spaces.
edit: also, code is read more times than it is written, so optimizing for readability over brevity is generally a good move.
Well, if you think that's bad, behold the recent trend in network interface names on Linux.
We started out with 'eth0', 'eth1', etc. Which adapter was which could change when adding and removing a network card. That was bad, so that prompted the evolution.
Now we have 'enp1s0', 'enp0s31f6', 'enp13s0' and many similar variations. These are supposedly more stable across device changes. As it turns out, they weren't.
But wait, there is more! Now we have the "predictable names" scheme that produces interface names that are even longer, and not even slightly easier to remember.
Read about the whole sorry saga here:
https://wiki.debian.org/NetworkInterfaceName
I do get that it is not an easy problem to solve, especially in the face of removable network interfaces (like USB Ethernet / WLAN). But surely this is not the best we can do.
Missed the 's', it's:
https://wiki.debian.org/NetworkInterfaceNames
I was actually ranting about this on IRC last night (yeah now my laptop has two enp* interfaces and enx[MAC])..
One thing I like about OpenBSD is that buses are scanned and drivers probe in order and there's no race between drivers coming up. Unless your hardware is physically tampered with or broken, all interfaces come up with the same name across reboots. Linux isn't like that (even if you don't touch your hardware, interfaces could swap across reboots), so you need to do something about it.
As is typical on Linux, the default is unergonomic and if you want something nice, you're on your own to make it so.
If you already have userspace daemons responsible for device insertion and naming, it really wouldn't have been so hard for it to e.g. automatically add a config file / database entry for each interface the first time it is seen. So the devices that came up as eth0 and eth1 are still eth0 and eth1 on the next boot; if I unplug eth0 and add a new card, the new one would be eth2 because eth0 is still reserved for the first card I had.
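The reservation scheme described here is simple enough to sketch in a few lines of Python. Assumptions for the example: cards are identified by MAC address (the comment doesn't say how), the registry lives in memory rather than in a config file, and the class name and MAC values are made up.

```python
class InterfaceRegistry:
    """Hands out ethN names on first sight and never reuses them."""

    def __init__(self) -> None:
        self.names = {}      # MAC address -> assigned name
        self.next_index = 0  # next unused ethN number

    def name_for(self, mac: str) -> str:
        mac = mac.lower()
        if mac not in self.names:
            # First time this card is seen: reserve the next free name for it.
            self.names[mac] = f"eth{self.next_index}"
            self.next_index += 1
        return self.names[mac]


registry = InterfaceRegistry()
first = registry.name_for("AA:BB:CC:00:00:01")
second = registry.name_for("AA:BB:CC:00:00:02")

# The first card is unplugged and a new one is added: it gets a fresh name.
replacement = registry.name_for("AA:BB:CC:00:00:03")

# Plugging the first card back in later still yields its old name.
first_again = registry.name_for("aa:bb:cc:00:00:01")
```

The obvious cost is that the table only ever grows, and a replaced card leaves a gap in the numbering.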
> add a config file / database entry for each interface the first time it is seen.
Ubuntu did that with their persistent-net.rules udev rule. That was a part of the PITA of the old naming.
> These are supposedly more stable across device changes.
No. These are stable across reboots. The old eth? weren't. And yes, that had been a PITA.
If network interfaces were files we could just have both short names and stable names, like what we have for block devices.
These changes are meant to make it easier to read and understand command-line incantations (and to make them more explicit, which is always good), because the command-line paradigm, being text-based, imposes an unavoidable trade-off between ergonomics and understandability/ease-of-use. It sounds like you prefer ergonomics - although I wouldn't be surprised if most users would prefer ease-of-use.
Of course, if one doesn't write a CLI to begin with, this trade-off doesn't exist - you can have your cake and eat it too.
A lot of my stuff is cross platform so making filenames portable means avoiding spaces.
Ironically, even NASA doesn't like space.
https://www.nas.nasa.gov/hecc/support/kb/portable-file-names...
Touché my friend, had a good laugh
I am also that age, and kebab-case is the best case for filenames.
2021-01-01-some-important-document.pdf gives me the warm fuzzies. On the off chance that some more differentiation is needed, throw in an underscore and a whole new world opens up
I go one step further: 2021-11-11_client_project-name.ext
2021-11-11_client_projectName.ext is also OK. But underscore separates fields, hyphens for space replacement.
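One practical upside of "underscore separates fields, hyphen replaces space" is that the names become trivially machine-readable. A small Python sketch, assuming exactly three fields and an ISO date; `parse_name` and the sample filename are invented for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def parse_name(filename: str) -> dict:
    """Split DATE_client_project-name.ext into its three fields."""
    stem = PurePosixPath(filename).stem
    day, client, project = stem.split("_")  # raises ValueError on other shapes
    return {
        "date": date.fromisoformat(day),
        "client": client,
        # Hyphens stand in for spaces inside a field.
        "project": project.replace("-", " "),
    }


info = parse_name("2021-11-11_acme_website-redesign.pdf")
```

This only works because the hyphens in the date never collide with the field separator.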
this is the way
but the extra Shifts, no thank you
you gotta involve your pinky or it'll atrophy
Cut most mine off in an unsupervised Halloween pumpkin carving accident when I was a kid. I think the lack of length actually allows me to type faster.
I see and applaud your use of the underscore there, but I must reject the premise!
work/client/project/2021-11-11-file.ext is more or less how I lay stuff out. I’d say client/project is a folder level distinction (arguably dates too).
[EDIT] Realistically most of the stuff under <project> is git repos and I usually make a “home” repo where I keep org files for tracking hours, notes, and resources related to the engagement.
work/client/project/2021-11-11-file.ext is great until you've got a '2021-11-11-project-status.txt' in a few directories and you need to find one quickly! I do a combination: clients/client/project/2021-11-11-client-project-update.txt
I just store it as a content hash and then when I want to find the file, I just have to recreate its content and I can then just get the hash.
It sounds like what everyone in this thread needs is a database file system. This was always my favorite proposed feature of Windows Longhorn that never made the cut. Almost 2 decades later and Microsoft's latest OS still doesn't have this feature.
I wrote about what I perceived as deficiencies of hierarchical file systems, and proposed an alternative organization based on tags and hashes. It was discussed on Hacker News last week and many years ago.
https://www.nayuki.io/page/designing-better-file-organizatio... ; https://news.ycombinator.com/item?id=29141800
Have you used BeOS?
For sure! I actually used Be before I ever used Linux.
I'll be the opposite voice: the file system isn't for precise organisation, it's just for storing. For organisation, the ideal thing to use is tags. Since most file systems don't have tags and using software for that would be a pain, the best way to do this is to list the tags in the file name.
I've always thought that personal files, photos, or any other kind of data just needed more connections between them to improve my information retrieval experience. That's how I had become a Zettelkasten evangelist. I believed it would be the cure for the information overload disease of our era.
But life made me use Emacs org-mode more and more, and I'm now in love with tags. Retrieving information has become so easy, especially with org-mode's tags inheritance, that I hardly think making connections between headings or notes is necessary anymore[1]. And I believe that applying tags to filenames (a la Karl Voit [2]) will create the same effect
[1] A Zettelkasten-like system is still unbeatable imo when it comes to ideas repositories, i.e. a second brain you can talk to and get new insights. It's just not that great for personal knowledge management or project management.
[2] https://github.com/novoid/filetags
Maybe you mean `2021-11-11_client_project-name_v2_final.ext`
2021-11-11_client_project-name_v2_final_ridaj(1).ext
Copy (2) of 2021-11-11_client_project-name_v2_final_ridaj(1)__FINAL-v2.ext
> But underscore separates fields, hyphens for space replacement
But why not the other way, hyphen-minus for separating fields and underscore for space replacement? That seems to me more consistent with how underscores and dashes are used.
I fully agree, that's how I do it :)
my_project-some_activity-this_document-20210923-v02.txt
Kebab case is the often overlooked benefit of prefix notation and semantic white space in programming languages. Honestly the best case of all cases imo.
One glorious day we'll accept programming languages that require spaces around infix arithmetic operators so that we can make kebab case a reality!
Lisps, especially Scheme with its `x->something-else` convention, have ruined naming in other languages for me.
Maybe Raku[1] is for you!
[1] https://raku.guide/#_syntax_overview (see section 1.7.1)
Forth does something like this, by virtue of its reverse Polish notation.
In Forth, 'words' (which are roughly analogous to functions and operators) must always be separated by whitespace, as Forth doesn't parse out operators the way most languages do. In exchange, you get the ability to use symbols in identifiers, as Forth has no reason to single out symbols like + as being syntactically special. You can even use a number for the first character. (For that matter, Forth will even let you override the usual interpretation of a numerical literal, but that's always struck me as going a bit far.)
It gives you a + word, analogous to the + operator of most languages [0]. It also gives you a 1+ word, as an (admittedly slight) abbreviation of the sequence 1 +. [1] If you wanted a 2+ word, you could easily define it yourself.
(This property of Forth evidently wasn't enough to get it to take over the world, but it's still neat.)
[0] https://www.complang.tuwien.ac.at/forth/ansforth-cvs/documen...
[1] https://www.complang.tuwien.ac.at/forth/ansforth-cvs/documen...
In my work, today's date would be 21K11, to save space over the longer date.
How do you distinguish 21K111 and 21K111?
Are you trying to catch GP on differentiating hours, were it to be appended to his time format (1st @ 11 vs 11th @ 1am)?
Notably he didn't promise any, but presumably one'd need a separator... Maybe, per his "K" usage of the month, one'd use the alphabet again. 11am would be "K" again... or lowercase just for giggles?
I don't think it reads very well, but I also think one'd get used to it pretty quickly.
I was thinking January 11th vs November 1st. Maybe their "date" doesn't need/support day-of-month? Or they typo'd and I should just focus on my work.
I imagine January is A and November is K, so 21A11 vs. 21K1 (or maybe 21K01).
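A small sketch of the scheme as guessed in this comment: two-digit year, a letter for the month (A = January through L = December), then the day. The original poster never spelled out the rules, so the function name and the optional zero-padding are assumptions.

```python
from datetime import date
import string


def compact_date(d, pad_day=False):
    """Encode a date as YY + month letter (A=Jan ... L=Dec) + day of month."""
    month_letter = string.ascii_uppercase[d.month - 1]
    day = f"{d.day:02d}" if pad_day else str(d.day)
    return f"{d.year % 100:02d}{month_letter}{day}"


print(compact_date(date(2021, 11, 11)))               # 21K11
print(compact_date(date(2021, 1, 11)))                # 21A11
print(compact_date(date(2021, 11, 1)))                # 21K1
print(compact_date(date(2021, 11, 1), pad_day=True))  # 21K01
```

With a letter for the month there is no ambiguity between January 11th and November 1st, padded or not.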
Ah yes, I missed that K was a month.
Are you working in some embedded system with tiny memory space or something? What's the use of saving one character? Just make it YYMMDD!
> kebab-case
I hadn't heard that before and I love it.
If you hadn't heard kebab-case called that before there's a chance you haven't heard SCREAMING_SNAKE_CASE called that before, and I couldn't live with myself if I didn't let you know.
that's hilarious, thanks for sharing that.
Perennially relevant xkcd: https://xkcd.com/1053/
Aw, in turn I have never seen that particular xkcd—it's great! I learned to call it "feigning surprise" and I always try and be conscious of it (though I still catch myself doing it from time-to-time).
Google considers it too violent apparently. In one of their recent changes to their style guide, they started recommending "dash-case" instead.
https://developers.google.com/style/word-list#letter-k
This guide is stupid. They recommend not using “janky.”
Tbf, dash-case is more descriptive. Kebab doesn't mean skewer everywhere
Same. I had tears in my eyes from laughing. For some inexplicable reason it seems incredibly funny.
> 2021-01-01
Yes on the date format.
Saves you so much time.
Agreed on dates ordering problem but 20210101 is so much easier to type.
But much less easy to read!
Years that end in a 1 are awful when doing this, especially in October and November. We've had 20211001, 20211010, 20211101, 20211110, and now today 20211111.
I just tend to use $(date -Is) so I don't need to think what date it happens to be today. I guess -Id would work if you don't want the time part.
I don't bother with the century or the dashes, saves time...
211111_foobar_v1.txt
I am old enough that I still save before printing. I think it was Lotus 123 that engrained it for me.
I've recently shifted sharply toward the dash from the underscore. I find it more readable, and it doesn't require the shift key. However, I do find it useful to use underscores to create groups, e.g. test-001_2021-10-11.log. Including hours, minutes, seconds is still awkward.
Burn the witch!
Brother in arms. I just posted similar thing below.
There's a customer for everything. I've just never liked the aesthetics of the underscore. Also if your underscored thing gets put in some document and then underlined the underscores can become invisible.
A lot of this is personal aesthetics, for sure. Personally, I am not a big fan of camel casing. In code, I only use it for class names, generally. I don't find it particularly readable, and for filenames, not all filesystems are case sensitive, so best not to rely on case to differentiate files. Camel case does have the nice property of being more compact, as no character is required. That's its main benefit.
R traditionally uses the . as a legal character in identifiers. Once you get used to it not being syntactic, I found I actually prefer them to underscores.
I'm of the opinion that kebab-case is the best case for all identifiers, because it's easy to read and to type. As always, Lispers were right all along.
I use this style:
2021-01-01_what-happened_who-did-it_possible-reason
I found that some_document_2021-01-01_v03.pdf works best because it keeps the same document next to its other versions alphabetically, keeps them in date order, and keeps them in a sub-day version order.
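As a quick illustration of why that naming pattern sorts well: with an ISO-style date and a zero-padded version number, plain string sorting groups documents and orders them by date and then version. The file names below are invented for the example.

```python
names = [
    "some_document_2021-11-11_v01.pdf",
    "some_document_2021-01-01_v03.pdf",
    "other_report_2021-10-10_v01.pdf",
    "some_document_2021-01-01_v02.pdf",
]

# Plain lexicographic sort: no date parsing needed.
for name in sorted(names):
    print(name)
```

This only holds while every field is fixed-width; `v3` and `v10`, or a day written as `1` instead of `01`, would sort out of order.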
As a side note, in the good ol' times of ISO9660 level 1-4 and the various mkisofs parameters, an underscore _ which is a CAPITAL -, may have given issues, only for the record/as a curiosity:
https://web.archive.org/web/20151007005513/http://www.911cd....
P.S. should anyone want to see/run the actual batch, a copy has been uploaded here:
http://reboot.pro/index.php?showtopic=18962&page=29#entry204...
This used to be my default, and then I used Matlab, and "-" was interpreted as subtraction.
One of the main reasons why Windows used "Program Files" and "Documents and Settings" was to force the programs (and programmers) to deal with paths with spaces. And you know, for the most part it kinda, more or less worked out although of course even today you will find programs that ask you to install them in a folder without spaces in the path.
VFAT and stuff like that actually provided alternate names like PROGRA~1
Yes, I was doing code to quickly read FAT folders (on a micro controller) and got to the bit about filenames more than 8.3. I decided my life was too short (and processing time) to go and sort out what the "real" file name is. Enforced 8.3 as a requirement!
The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
Sadly that set includes loads of Java programs. If only Sun had shipped a standard way to generate isolated exe files in 1998... but they worked under the presumption that you'd have a JVM already there, because distributing that monster was difficult in dialup times, so you could just hand people a jar; and the enterprise market did not care, since they had webapp servers. Sadly it's an "optimization" that became obsolete very quickly but wasn't rectified until it was too late (java 9+).
> The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
Actually it isn't, just use double quotes and add a '~'. It's just about the only thing batch files handle better than shell scripts. set "VARIABLE=%~PATH"
And that was a good idea, if only Microsoft also fixed the CreateProcess function, Windows would be somewhat sane in this regard. But somehow nobody seemed to think of it. Seriously, look at it:
https://docs.microsoft.com/en-us/windows/win32/api/processth...
The arguments are a single string. So you want to pass parameters with spaces in them? You've got to add quotes and stuff all of that into a single string. Instead of doing it in a more sane manner, like oh, the arguments to main().
The root cause is that argv isn't a first-class citizen like on linux, but an abstraction. The kernel only cares about a single string argument. If you use main instead of WinMain, the CRT will transform the single string into an argv for you.
Oh and cmd.exe uses a different escaping scheme than the CRT.
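For a feel of what "stuff it all into a single string" means in practice, here is a sketch using `subprocess.list2cmdline`, the helper CPython's `subprocess` module uses on Windows to flatten an argument list into one command line following the Microsoft C runtime's quoting rules. The argument values are invented, and this says nothing about cmd.exe's separate escaping scheme.

```python
import subprocess

args = ["program.exe", "arg_1", "second argument", 'say "hi"']

# Flatten the list into the single string that CreateProcess expects.
# Arguments containing spaces get wrapped in double quotes, and embedded
# double quotes get backslash-escaped.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

A program whose `main()` uses the CRT's parser should get the original four arguments back; a program that parses the raw string itself is free to disagree, which is where many of the bugs come from.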
Microsoft is in full control of the Windows kernel, so they can make it care about whatever they want to, and one would think better argument passing would be a nice quality of life improvement. Less nonsense for developers to deal with, and less weird bugs on the platform.
Sure, but MS values backwards compatibility a lot.
They aren't going to break existing API or bloat the kernel with a bunch of functions that do the same thing.
They can either add a new API which almost nobody would use (because everyone already learned to use the existing one and either reused or reimplemented the MSVCRT's logic, so that most software parses command lines the same way), or they can literally break every single program in existence by breaking the interface of CreateProcess, which is just as likely as Linux breaking the interface of execve(2).
Giving CreateProcess a new flag so it would correctly accept "path\\to\\my\\program.exe\0arg_1\0second argument\0argument with literal \" symbol" (with an implicit \0 terminating it) as lpszCmdLine is the easy part; the hard part would be forcing everyone to switch to using it.
Also, I'm pretty certain this processing happens in the user space, and Win32 API is already bloated beyond any belief.
maintaining backwards compatibility means maintaining silly decisions, and Microsoft does both.
They may have thought that would happen but I saw just as much stuff end up in C:\Windows or \Users or (always my favorite) those “Documents” that are really just “whatever random crap every app wants to put there”.
Yet in Microsoft's own cmd tool I need to put quotes around my path if I want to refer to any files/folders below those folders.
That annoys me every time I use a Windows system. It was a terrible decision, especially since neither the command prompt nor the new PowerShell accepts, the way bash does, a backslash before a space; you have to quote the whole path! I get that most users on Windows don't use the shell, but as a developer I do a lot, and every time it's a pain (no wonder they added the WSL in Windows after the failure of PowerShell...)
Why would they accept a backslash? Backslash is a path separator on Windows. In most Windows programs, you don't even need to escape the space - arguments can contain spaces and it will understand it, like `notepad My file.txt`
The escape character on PowerShell is backtick, and on cmd it is caret. You don't need to quote everything.
Still way too many libraries and programs can't handle spaces in filenames.
And shells and other programs still have problems with perfectly legal characters in filenames too, like '!' or ':'.
> And shells and other programs still have problems with perfectly legal characters in filenames too, like '!' or ':'.
Without asking you to always quote and escape every file name - what alternative is there? If they tried this you'd probably find you didn't like it.
Not exactly - the problem is mostly when doing variable expansion. The fact that bash treats "$x" and $x as different is a bit of a design flaw. Of course there's still an issue with evaluating dynamically generated code, but that problem is partly solved by working with arrays.
I mean how do you want shells to deal with file names with spaces in? Do you think we should have to quote and escape all file names all the time? If not then how do you think it should work?
Shells should treat data as data, and not have the default behaviour be treating it as code (i.e. you should need to do 'eval $x' or some equivalent if you actually want the string to be treated as a shell command). This would also mean having a real list type, instead of depending on arbitrary separators in strings. This is exactly how other languages treat it, and it is not a significant challenge for interactive use (in fact, it would substantially reduce the opportunity for surprises when running commands interactively as well).
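The difference between "string with separators" and "real list" can be sketched with Python's `shlex` module, which approximates POSIX shell word splitting. This is only an emulation for illustration, not what bash does internally, and the file name is made up.

```python
import shlex

filename = "my file.txt"

# Pasting the name into a command string: the space splits it in two,
# much like an unquoted $x in a shell.
unquoted = shlex.split("ls " + filename)
print(unquoted)  # ['ls', 'my', 'file.txt']

# Quoting it first keeps it as one word, like "$x".
quoted = shlex.split("ls " + shlex.quote(filename))
print(quoted)    # ['ls', 'my file.txt']

# A real list never needs quoting: each element is one argument.
argv = ["ls", filename]
print(argv == quoted)  # True
```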
> Still way too many libraries and programs can't handle spaces in filenames.
"It's nothing."
"What do you mean?"
"It's nothing... It's empty space. I never taught the computer how to read empty space!"
"I never taught Virgil how to fly."
yep, I still don't use spaces. I also don't use uppercase characters. Just underscores or hyphens.
Sometimes I break the rule and use uppercase but never spaces.
I've had issues when moving between Windows/*nix file systems, where Windows file names are case insensitive and *nix systems are case sensitive.
Build script works fine locally on Windows, but then chokes in *nix test server, as it's effectively a different path.
file names aren't case insensitive, it's the windows API that is
I assume you mean that the Windows API (standard for Windows apps) is case insensitive, but if using the WSL (Windows Subsystem for Linux) it's possible to get case sensitivity: https://docs.microsoft.com/en-us/windows/wsl/case-sensitivit...
Even if you're not using WSL, you've always been able to turn on case sensitivity via a registry key. This has not been recommended in the past due to possible issues with windows itself as well as third party software. This history is mentioned here[0]. Everywhere that mentioned a registry key seems to be referring to windows nfs server, not to general file access, however I know that SFU (Services for Unix) installer had an option to do so, so it's certainly possible.
As of sometime in 2018, fsutil can set specific directory trees to be treated case sensitive in Windows 10 without setting it for the OS. This ability is mentioned here[1]
[0]: https://devblogs.microsoft.com/commandline/per-directory-cas... [1]: https://docs.microsoft.com/en-us/windows/wsl/case-sensitivit...
I’ve had issues with git when changing a filename, if the only change is the casing.
Was recently encoding my Stargate: SG-1 DVDs to move them to plex. I was encoding it on a system other than what was serving it, so I had to copy it. It's surprisingly difficult to "scp" a file with a colon in it directly.
I also love when you're using bash and you have a file with ! in the name, and you accidentally fail to correctly backslash it, you not only get "bash: !rest_of_filename: event not found", but it also fails to add that command line to the history, so you can't just hit up and fix it. You have to actually go to the mouse and copy and paste.
It's almost like in-band signaling isn't a good idea or something.
That sounds like... Puzzle time! I had to cheat, sort of, by looking at the man page:
> Local file names can be made explicit using absolute or relative pathnames to avoid scp treating file names containing ':' as host specifiers.
So `scp foo:bar user@host:~` fails because it tries to find the host foo. But `scp ./foo:bar user@host:~` works just fine. I feel kind of stupid for not guessing as much.
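If you build scp command lines from a script, the same trick can be applied mechanically. A minimal sketch, assuming that prefixing `./` to a relative local name is enough (which is what the man page excerpt suggests); the helper name is made up.

```python
import os


def scp_safe_local(path):
    """Make a local path unambiguous for scp-style HOST:PATH parsing.

    Assumption: a relative name containing ':' only needs a leading "./";
    absolute paths and names already starting with "./" are left alone.
    """
    if ":" in path and not os.path.isabs(path) and not path.startswith("./"):
        return "./" + path
    return path


print(scp_safe_local("foo:bar"))    # ./foo:bar
print(scp_safe_local("plain.txt"))  # plain.txt
```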
Can't you usually just put quotes around the filename and/or path to prevent all those issues?
Edit: nope, just tried it and scp still sees the quoted filename as a host + path
That is just lazy programming. If the input "foo:bar" is ambiguous, the program should try both interpretations (HOST:FILE and FILE) and then present the user with a prompt that provides sufficient information.
"Does foo:bar refer to the local file `foo:bar' (size: 102kB, date: 2021-11-11) or to the file `bar' on host `foo' (FQDN: foo.example.com, IP address: 1.2.3.4)?
1: local file `foo:bar'
2: file `bar' on remote host `foo'
Your selection: "
WinSCP currently has a bug that crashes if it tries to sync a folder with a space in the name
If you suspect that the file might be handed to a bash script at any point, being afraid of spaces is very healthy for sure.
Colons are a problem on Windows, so it's reasonable to discourage creating files with colons in the name.
Is "!" legal in Windows? I'm pretty sure it is not, but I'm not on a Windows machine to test.
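For reference, Microsoft's file-naming documentation lists `< > : " / \ | ? *` and the ASCII control characters as reserved; `!` is not on that list. A small checker along those lines (the function name is made up, and it deliberately ignores other rules such as trailing dots and spaces):

```python
# Characters listed as reserved in Microsoft's file-naming documentation,
# plus ASCII control characters 0-31.
WINDOWS_RESERVED = set('<>:"/\\|?*') | {chr(c) for c in range(32)}


def illegal_windows_chars(name):
    """Return the characters in `name` that a Windows file name cannot contain."""
    return sorted(set(name) & WINDOWS_RESERVED)


print(illegal_windows_chars("Stargate: SG-1.mkv"))  # [':']
print(illegal_windows_chars("wow!.txt"))            # []
```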
I'm not young, but I've been using Macintosh computers regularly since 1990, and even back then file names could be up to 31 characters long, and could include any character except colon.¹ So I'm pretty comfortable using spaces, and sometimes even non-ASCII characters, in file names.
Also back then Mac file names typically did not include an extension, because the file's type was stored as part of the metadata in its resource fork. I remember one time a friend of mine was visiting and was playing around with a paint program on my Mac. Being used to DOS, when she went to save her file, she typed a very short name, and then asked me what the proper file extension should be. I smirked and said, "That's not how you name files on a Mac. THIS is how you name files on a Mac." And then I named her file "Ailsa's Cool Picture". Her mind was blown. :-)
¹This is because the colon was the path separator. But since the classic Mac OS had no command line interface, the typical user would never type or even see a file path written out.
All of that was very cool and impressive and extremely user-friendly.
However, I found the lack of a command-line to be restricting.
On the other hand Mac had some great GUI programs.
Sometimes I think that the command-line is a crutch that keeps programmers from learning how to make good UIs.
True, but most Mac apps were virtually inaccessible by keyboard, and with the slow cursor rate made them a nightmare for the wrist.
Well, you should still be afraid! Be very afraid! Seriously: only a few months ago I was confronted with a video encoding tool that didn't work properly when the file names contained spaces - so yes, even in 2021 it's still safer not to use spaces in file names...
Not to mention most naively written bash scripts!
Looks like I'm in the minority. I always use spaces and non-ASCII characters in filenames.
In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics (fata, fată, fața, față, făta, făță, fâța, fâță).
Given that we have to use diacritics, spaces don't seem like a big deal.
Hmmm, I thought I was fluent in Romanian (born there and lived there for 26 years), but I only know 5 of those 8 words...
That doesn't seem unusual. Only the first 5 are very common.
According to Google Translate the first two are "girl" and the rest are "face". =)
* fata - the girl
* fată - girl
* fața - the face
* față - face
* făta - was giving birth
* făță - a small fish, or a child who won't sit still
* fâța - was fussing
* fâță - variant of făță
As you might infer from the first 4, Romanian uses postfix "the" and for singular feminine words you can't tell the difference if you use only ASCII.
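The collision is easy to reproduce. The sketch below strips diacritics the usual lossy way (Unicode NFD decomposition, then dropping combining marks) and shows all eight words collapsing into one; the helper name is made up.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD and drop combining marks (a lossy ASCII fallback)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["fata", "fată", "fața", "față", "făta", "făță", "fâța", "fâță"]
collapsed = {strip_diacritics(w) for w in words}
print(len(set(words)), "distinct words ->", collapsed)  # 8 distinct words -> {'fata'}
```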
Google Translate is a horrible tool for "translating" single words or lists of unrelated words.
Use a proper dictionary for that. The very nature of statistical models makes proper translation without context impossible for these systems, especially when uncommon words and diacritics are involved.
So how did you deal with it in the 80s/90s?
As you would assume: use ASCII and deduce from context. Many people still do that.
That has led to phantom diacritics: reading letters in unfamiliar words/names based on what you assume they are. For example some pronounce Chirica as Chirică because they assume someone forgot to type the breve in ă.
I call it the habanero trap. There is no ñ in "habanero", yet a lot of people say "habanyero", probably by analogy with "jalapeño".
Not sure about Romanian, but for many other languages people essentially came up with transliteration schemes (multiple, incompatible, ambiguous) to squeeze your language into ascii.
The resulting text was understandable by the "computer people" but not the general population who did not use the networks back then, perhaps somewhat comparable to when some time ago USA parents encountered the "SMS slang" used by their teenagers.
Back in the day there were dozens of character sets that were alternatives to US-ASCII. Having once worked on an Email client, I needed to bake in a bunch of translation tables to convert stuff sent that way into UTF-8.
> Given that we have to use diacritics, spaces don't seem like a big deal.
There is one big difference: CLI utilities don't usually care about diacritics (though encoding issues can throw a wrench in that), but they care a lot about spaces. So putting spaces in filenames requires properly quoting or escaping parameters, whereas diacritics does not. That makes one-off shell snippets and scripts a lot more annoying (though TBH I tend to shy away from those anyway, these days).
We have a few words that depend on diacritics to be unique in Czech as well - though not as bad as this example - but people just manage without. Hell, I don't even bother installing the Czech keyboard, if I REALLY need it (like in names), I just google for words that have the character and copy it
>In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics
That is what context is for.
So do I. I have a language, and I'm not afraid to use it. My computer should speak it just as well as I do.
There's a server at work that is named with a non-ASCII character. I've run into compatibility issues lots of times where I can't connect. I prefer to just use English with ASCII and be happy
Server names are different. They are by and large machine-facing identifiers, whereas filenames have a 50-50 split of whether they are machine-facing, human-facing, or both. That makes their support of Unicode a much more critical (and appealing) proposition.
everything is a file
I never put spaces, and won't go over 32 characters, preferably less than 16. even when sending a file to my grand mom. that's how deep rooted the trauma is. and yes, it remains an issue with some parsers and what not.
I still find files on the internet that my browser can't download because too many characters :(.
Edit: can't save, downloading works.
This is a Windows-only issue AFAIK. It's the same reason why people decide to put their projects in something like C:\dev
Apparently it's quite easy to reach the 260 chars limit
No, it's also a Linux issue.
Too many characters on Linux? Quite difficult to reach to be fair. Do you have an example?
I have been trying to repro with a small nodejs server but either the server cut off the content-disposition filename or firefox truncates it. When I get that in the wild I'll post an update.
In the meantime:
https://serverfault.com/questions/9546/filename-length-limit... 255 bytes it is then.
Firefox cut off at ~217, httpie at 255.
But it's _per file name_. The path could be waaay longer than Windows' one
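Note that the 255 figure is bytes per name, not characters, so names with diacritics hit it sooner. A sketch of the check, assuming UTF-8 file names and the 255-byte limit quoted above (actual limits depend on the file system):

```python
NAME_MAX_BYTES = 255  # per-component limit cited above; varies by file system


def fits_name_limit(name, limit=NAME_MAX_BYTES):
    """Check a single file name (not a whole path) against a byte limit."""
    return len(name.encode("utf-8")) <= limit


ascii_name = "a" * 255
accented_name = "ă" * 128  # 128 characters, but 2 bytes each in UTF-8

print(fits_name_limit(ascii_name))     # True
print(fits_name_limit(accented_name))  # False (256 bytes)
```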
Ah, my bad. I thought we were still talking filenames, not paths.
Probably my mistake. You only discussed file names and I threw in the path thing :)
If I'm going to use the file in the command line, I won't use spaces, since I don't know what sick bug I might encounter.
I avoid spaces because they make tab completion more cumbersome in bash.
100% this
Same. For documents and stuff that I use in normiespace I give them friendly names with capitalization and spaces and such, but for anything I'm going to be working on via CLI I try to use filenames that will be easily chunked as "words" when doing things like double clicking it in terminal to select, ^w to erase it, tab completion etc.
Somehow the OneDrive clients still refuse to allow leading or trailing spaces in the filenames, along with a few other characters that are not allowed - seems to cause quite a bit of user friction at least with the non-tech guys that I work with who are confused about why OneDrive is one of the few file syncing clients that has these requirements....
I have had to deal with that nightmare multiple times this year! It was a real head scratcher at first.
Gdrive has the same "issue". I think it's on purpose to avoid files that seem to have exactly the same name.
This can cause user confusion
This is a UI/UX problem that I only face when dealing with shells and shell scripts. Never had any issues when spawning processes from within languages/runtimes that support sane argument arrays.
sh, bash and cmd.exe are shit. The shell needs serious rethinking.
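A sketch of what "sane argument arrays" looks like in practice: the child process below is just the Python interpreter echoing its first argument, so the example runs anywhere. The file name is made up and deliberately full of characters a shell would interpret.

```python
import subprocess
import sys

filename = "my file; with $pecial & chars.txt"

# Each list element becomes exactly one argv entry in the child process.
# No shell is involved, so nothing needs quoting or escaping here.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.rstrip("\r\n")
print(echoed == filename)  # True
```

The problems tend to come back as soon as `shell=True` or string concatenation enters the picture.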
I see that there are lots of comments about problems of TAB-completions with filenames with spaces in this comment section and I am frankly puzzled: both Bash and cmd.exe actually TAB-complete those perfectly fine, inserting quoting where it's needed.
I seem to remember bash losing preferred escaping when TAB-completing, but can't reproduce it now with 5.0.17.
Eg. you'd type `ls -l "Spaced [TAB]` and it would turn it into `ls -l Spaced\ Name`. I remember similar annoyances with other special shell characters (eg. single quotes, dollars, slashes), but that all seems to behave sane now.
I didn't even know this was a thing, but can't say I've ever preferred an escape style. I actually use backslashes a fair bit, usually just with spaces. I tend to reserve double quotes for variable or shell expansion, explicitly.
It's not so much about a preference, but your cursor would jump about and you'd need to be on the lookout if you wanted to edit the completion (eg. to change the extension).
> inserting quoting where it's needed
You have to remind yourself to do this manually in scripts if you don't want to see lines full of "No such file or directory."
One of the reasons the shell is broken is because the character they use as an argument array member separator is something that regular people use to distinguish between two words, such as in a file name.
Well, writing scripts would be much less painful if $VARNAME did not explode into pieces by default. Alas, this ship has sailed long ago.
IMHO, it's possible to add a flag to bash, which will turn on this behavior, so problem can be fixed, but it will diverge bash from POSIX sh a lot.
And where it isn't needed. If you have a path that contains a variable and a space, bash will happily escape the $, making the path invalid. See the following:
That error is because when you press [tab], bash changed the path to \$HOME/my\ dir/ but that isn't obvious from the output and I couldn't find a proper way to include the tab-expanded result in the transcript.
(edit: this is on GNU bash, version 4.3.48(1)-release but I've seen this behaviour for years)
Depends on the Bash version, I guess? Mine is 4.4.20(1) and when I do "cd $HOME/my[TAB]", it replaces the input line with "cd /home/joker/my\ dir/", and pressing [ENTER] changes the directory to '/home/joker/my dir', as can be seen from the prompt.
The variable escaping behavior has existed for a while https://stackoverflow.com/questions/32463052/bash-tabbing-fo... https://askubuntu.com/questions/70750/how-to-get-bash-to-sto... https://askubuntu.com/questions/41891/bash-auto-complete-for...
And I experience the problematic behavior on my Ubuntu VM. However, I can get the expansion behavior described above if I run: shopt -s direxpand
This is a difference between $@ and "$@" (note the quotes):
Damn I didn’t know that. Thanks
Posix makefiles don't support spaces in dependency names. Not sure about gmake.
Cmake doesn't support semicolons, because everything in cmake is a string, and ; is the list item separator.
PATH is separated by colons, so you can't add directories containing : to it.
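The PATH case is easy to demonstrate: the variable is a single string with no escaping mechanism, so a directory containing the separator cannot survive the round trip. The directory names are invented; CMake's `;`-separated lists fail in the same way.

```python
def split_path_var(value, sep=":"):
    """Split a PATH-style string on the separator only (there is no escaping)."""
    return value.split(sep)


dirs = ["/usr/bin", "/opt/odd:name/bin"]
joined = ":".join(dirs)

# Two directories went in, three come out.
print(split_path_var(joined))  # ['/usr/bin', '/opt/odd', 'name/bin']
```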
Every week, I encounter a user - just like I did in the 80's - who cannot explain the difference between a file and a folder.
"What do I use a folder for?", they ask, in the same breath that they request "some way to organize things logically".
The no-filesystem movement has worked hard to eradicate this scourge from user experiences, but I fear that this is the devil's work. Computer users should know what a file is, and what it's for - and they should know what a folder is for, and why they would want to create one to put their files into it ..
But yet: they don't.
It hasn't improved since the 80's. Taking away the user's responsibility to understand these things only makes computing worse. The fact that "special chars in paths" breaks things also holds this factor in place, imho.
> The no-filesystem movement
Is that the movement to store all your data as an amorphous pile of crap, and then provide easy-to-use search tools to actually find the content you're looking for?
On one hand, I really like the search tools that come from this. But I still like to actually organize my data, so I can browse it if I want to. Also, these search tools seem to only work well enough on macOS and fall flat on their face in Windows. (and no idea where Linux falls on this)
You had me at "amorphous pile of crap", but lost me at 'actually find the content'... ;)
Meanwhile, I've got a single directory full of PDF files (over 60,000+) which I routinely "ls -alF | grep <search term>" for, and I've also got some PyPDF scripts for doing deeper content search - but yet I yearn for a way to automatically parse the filenames and organize things categorically into a folder tree resembling a word cloud, symbolic links and all .. one of these days ..
You think spaces are bad (and yes I'm old enough that I don't use them)... We work with a company that has forward slashes "/" in their trading name and insist on shared cloud directories involving them to be prefixed with that trading name.
As soon as you do anything programmatic in/out of these drives it all hits the fan. So I'd add to the original statement - "Avoid 'technical' companies with special characters in their name", it's just not right...
There was some prior discussion about a generational shift here at https://news.ycombinator.com/item?id=28615884 -- there's an idea that people no longer need to know what files or folders are in order to get things done day-to-day with software ( https://www.theverge.com/22684730/students-file-folder-direc... ).
I'm wondering when the first generation of college students will start who have never used a physical keyboard to input text.
If putting spaces in file names makes you queasy, try punctuation - especially punctuation like semicolon or ampersand or single quote that's meaningful to shells and such. <shudder>
Also, emoji.
You don’t name your files with extensions && rm -rf?
Or for more fun, use language specific characters, like äöüß...
And even more fun is, when it mostly works, but then it doesn't and you notice too late.
Honestly, this still causes a lot of problems with some Software. I've had friends asking for help with obscure errors that were ultimately caused by the files they were using being on a path that contains a space or special character.
I've been stuck for years with a bug in my commercial Electron application where images do not get displayed if the folder path has spaces in it :'(
https://github.com/whyboris/Video-Hub-App/issues/667
Any help would be really appreciated!
Shells are indeed the main culprits for the continued fear of spaces, but not the only ones. A lot of programs that deal with "metadata" which will then generate database tables and stuff like that, still struggle when working with any sort of special character. And the same for anything that, behind the scenes, just feeds text into regexes.
Just this weekend I learned that the Espressif Framework doesn't like it either.
Our local development environment has evolved to a complex enough sequence of steps to set up and troubleshoot that I spent 2 weeks creating tooling that you can simply point at source checkout locations and the tool will take care of setting up that repo.
It broke on the first try on a jr hire's machine, the source checkout location was `C:\source code`.
Slightly off topic but I find myself stuck at being "please for the love of god don't use spaces in git branch names" old. Anno dazumal this might not even have been an issue and I'm just cargo culting.
And on that topic, git branches are case sensitive but windows filesystem API isn't. Git branches are materialized on the filesystem as files and directories.
If people actually abuse git branches being CS, odds are good they're also abusing CS in the repository content.
The linux kernel is one of the offenders, if you check it out on Windows or macOS (which supports CS but remains CI by default) you'll immediately get garbage in netfilter, because it's an habitual user of having different files with names identical but for the casing e.g. xt_TCPMSS.h and xt_tcpmss.h.
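A repository can be checked for this before it bites someone on Windows or macOS. The sketch below groups names that differ only by case; `str.casefold()` is used as an approximation, since real file systems apply their own case-folding tables. The function name is made up and `xt_mark.h` is a filler name for the example.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would collide on a case-insensitive file system."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["xt_TCPMSS.h", "xt_tcpmss.h", "xt_mark.h"]))
# [['xt_TCPMSS.h', 'xt_tcpmss.h']]
```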
The Windows filesystem API supports CS file- and directory names just fine.
It can be enabled on a per-directory basis like so:
> fsutil.exe file setCaseSensitiveInfo C:\folder enable
NTFS had support for this for decades now - it was designed that way to be POSIX-compliant.
It's shoddy software that lacks support for it, not the OS or the file system.
Yep, I recently got bit by this, someone checked in a branch named something like "x<-->y", Windows was unhappy. I think this is a git bug: git should escape these names for the native platform.
https://stackoverflow.com/questions/1976007/what-characters-...
I enjoy choosing fun branch names from time to time. A few of them: Russian when a user reported a typo in a Russian translation; emoji (mostly added emoji rather than pure emoji); and my personal favourite, a ~250 character diatribe about a single-character bug I was fixing (~250 after I discovered that Git’s error messages when you cause it to try to use file names too long for the file system are fairly mediocre).
Spaces breaking tab completion is still an issue, so, yeah.
ETA: not broken in a technical sense, but having to escape them isn't the best experience. So it's just easier for me to avoid spaces.
Where? It works fine in bash and I think most shells ….
That was a bit of hyperbole on my end, my bad. But you do have to escape the space, which I'm counting as a minor break.
Where I used to work they had a risk system that created directories on the Windows server that matched the book name. They had a trader that named one of his books "COM1"...
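For anyone who hasn't been bitten by that one: Windows reserves a handful of device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) in every directory. A minimal Python sketch of a guard the risk system could have used; `is_reserved_windows_name` is a hypothetical helper, not anything from that system.

```python
# Device names that Windows reserves in every directory.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(name):
    """True if `name` would hit a reserved device name such as COM1."""
    # The reservation ignores case and also applies when an extension
    # follows, so "com1.txt" is as troublesome as "COM1".
    stem = name.split(".", 1)[0].rstrip(" ")
    return stem.upper() in RESERVED
```

A book called "COMET" passes; "COM1" and "com1.txt" do not.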
I saw this and felt old, but then the comments in here made me realize that the fear\ is%20real.
I still find them annoying, doing lots of work on the command line. I use this hack:
Spaces in file names break half of the shell scripts I have encountered.
And it is one of the biggest reasons I hate Unix shells as programming languages: it is a minefield. In fact I think that after a dozen lines, Perl is a better option. It has most of what shells are good at (i.e. running commands), but saner and more powerful.
my god, I was simply trying to loop over every file in a dir and zip it in a bash one liner. Of course, some of the inputs had spaces in the file names. What an exercise in frustration!!!
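Not the bash one-liner being asked for, but for comparison, here is roughly what the same job looks like in Python, where a file name is a single value and never gets word-split. `zip_each_file` is just an illustrative name, and I'm assuming "zip it" means one archive per file.

```python
import zipfile
from pathlib import Path


def zip_each_file(directory):
    """Create <name>.zip next to every regular file in `directory`."""
    created = []
    # sorted() takes a snapshot first, so the new archives are not revisited.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix == ".zip":
            continue
        archive = path.with_name(path.name + ".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            # arcname keeps only the file name, spaces and all.
            zf.write(path, arcname=path.name)
        created.append(archive)
    return created
```

Spaces in the inputs need no special handling here because nothing ever passes through a shell.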
Yes, spaces in filenames introduce edge cases and bugs that people are not always aware of.
E.g. here's a random StackOverflow Q&A about a Git pre-commit hook where the top-voted answer does not properly handle filenames with spaces: https://stackoverflow.com/questions/2412450/git-pre-commit-h...
However, the 2nd and 3rd most upvoted answers do mention the "-z" option to handle spaces: https://stackoverflow.com/questions/2412450/git-pre-commit-h...
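To make the "-z" point concrete: with that flag git terminates each name with a NUL byte instead of a newline, so the consumer splits on NUL. A small Python sketch, not taken from either answer; `split_nul` and `staged_files` are names I made up, and `staged_files` assumes it runs inside a git work tree.

```python
import os
import subprocess


def split_nul(data):
    """Split NUL-terminated output into a list of file names."""
    # NUL cannot occur inside a file name, so spaces and even newlines
    # in the names survive this split intact.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


def staged_files():
    """Names of staged files; assumes it is run inside a git work tree."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        check=True,
        capture_output=True,
    ).stdout
    return split_nul(out)
```

The same splitting works for anything else that emits NUL-terminated names, such as `find -print0`.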
Remember when we put + instead of %20? Spaces in URLs are still a nightmare IMO. I still get strange access log entries where some encoding went wrong, especially in heavy JavaScript environments.
Same goes for capitalisation. All filenames should be lowercase.
Maybe it's not strictly necessary, but it can avoid headaches.
Plus sign actually came from https://en.wikipedia.org/wiki/Query_string#Indexed_search
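The two conventions still coexist, which is where a lot of the odd log entries come from. A quick Python illustration using the standard `urllib.parse` functions; the file name is just an example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "my silly webpage.html"

path_form = quote(name)        # percent-encoding, as used in URL paths
query_form = quote_plus(name)  # form encoding, as used in query strings

print(path_form)
print(query_form)
# Decoding with the wrong convention leaves a literal "+" behind.
print(unquote(query_form))
print(unquote_plus(query_form))
```

Mixing the pairs up (encoding with one, decoding with the other) is an easy way to end up with plus signs in file names.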
I had a guy in my team use forward slashes in filenames. Terrible idea, caused all sorts of weird issues.
But nice for testing. I spent a few months on Windows while doing a Django project and found a number of bugs no one else discovered because they used Mac or Linux.
Did you mean backslashes? I don't know if any filesystem/OS supports forward slashes in filenames
OS X does in the GUI; they're isomorphic to ‘:’ at the UNIX level. (The Mac used ‘:’ as the directory separator.)
And a : in a file name at the GUI level gets turned into a dash! I just tried to name a text file "Foo/Bar 10:01.rtf" and it changed it to "Foo/Bar 10-01.rtf"!
In that case the GUI is merely changing the file name you type; in a shell you'll see it as "Foo:Bar 10-01.rtf".
How was this possible? None of the mainstream operating systems allow this.
via GUI in OS X.
Ah, so that’s not really putting a slash in the name on disk. Finder is just displaying the colon that way; the substitution exists for historical reasons that have to do with pre-OS X Mac OS (you can see this if you create a file with a colon in it from a program or the command line: it will display as a slash in Finder). It shouldn’t cause any problems on its own on the system, but the colon is troublesome if you have to interact with DOS/Windows lineage machines.
It's not a matter of being afraid, spaces in filenames are annoying.
I mostly use the shell and navigating in directories with spaces is annoying, you have either to quote it or put a \ before each space. You also have to remember to quote everything, and in bash that can become complex, you start adding quotes everywhere to solve problems caused by spaces (or other special characters like *) in filenames.
So I prefer to not use them; a simple _ is as readable as a space. The only thing is that spaces get rendered better in graphical file managers, but... that could have been solved (and can still be solved) by simply adding an option to render a _ as a space graphically if there is no ambiguity. I don't care that much since I don't use graphical file managers that much.
Maybe it's just me, but it always seemed like prohibiting spaces and other special characters was a reasonable way to avoid unnecessary complexity (and the bugs that accompany it) when parsing and navigating directory trees and files.
I'm old enough to remember working with 8.3 filenames in DOS, and while the length limitation was maddening, the space part never was. Then Windows 95 came out and all restrictions were thrown out.
Why couldn't we just have a file system that robustly supports long filenames, including variable length extensions, while prohibiting certain special characters - namely spaces, slashes or any directory denoting characters in files, and characters that have special meaning in regex context? (brackets, asterisk, etc.)
By coincidence, I found another reason just two days ago. A web app lists uploaded files’ names, and (in a rarely used context) lets the user search for them. One user has copied a file name from the web page, and pasted it into the search box, but got no results. Turned out that the file name contained two consecutive spaces, which the browser turns into a single space, hence no match. Every layer between the user and file system can do something unexpected.
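One possible fix, sketched in Python and assuming the search is allowed to treat runs of whitespace as equivalent: compare names the way the browser renders them. Both helper names are hypothetical.

```python
def collapse_spaces(text):
    """Collapse runs of whitespace to one space and trim the ends."""
    return " ".join(text.split())


def matches(stored_name, query):
    """Compare two names the way the rendered page would show them."""
    return collapse_spaces(stored_name) == collapse_spaces(query)
```

The trade-off is that two files differing only in the number of spaces become indistinguishable in search, which may or may not be acceptable.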
Related: David Wheeler's Fixing Unix/Linux/POSIX Filenames
https://dwheeler.com/essays/fixing-unix-linux-filenames.html
I nearly gave up on learning newer front-end JavaScript stuff like React & webpack and so on a few years ago because of spaces in paths.
node-gyp doesn't like it when there's a space anywhere in your working path. Stuff I was messing around with was all in ~/Code Projects at the time, and using npm install on some things just broke. Looking back, I definitely could have done a better job parsing the error messages but still...
There's an issue but it was closed in 2018 as "The workaround is to use a path without blanks" https://github.com/nodejs/node-gyp/issues/439
Tangentially, I frequently add dates to filenames to keep things organized. And _always_ in the `YYYYMMDD` format for clarity and technical reasons; `DDMMYYYY` (or God forbid the Americans' `MMDDYYYY`) never made much sense to me.
I do this so often that I have an emacs macro or two that helps me out:
That inserts the "proper" date format (e.g., 2021-11-11) at the current point.
Then to create a date-stamped file name:
And a few others.
Nobody seems to misunderstand this date format. US folks might find it annoying, but understand what it means.
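This is not the emacs macro mentioned above, just a rough Python equivalent of the same habit. `date_stamped` is a made-up helper, and the underscore separator is my assumption.

```python
from datetime import date


def date_stamped(name, day=None):
    """Prefix `name` with an ISO 8601 date such as 2021-11-11."""
    day = day or date.today()
    return f"{day.isoformat()}_{name}"
```

A side benefit of year-first dates is that a plain alphabetical sort of the file names is also a chronological sort.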
I have had a huge music library on my RAID, and naturally it had a lot of spaces, and non-ASCII, in the file names.
It's cumbersome-ish, but can be made to work.
Then there's shell injection via files containing a newline character in their name...
POSIX portable file names were defined not to have spaces, and to contain just '[[:alnum:]._-]' (with '/' as the path separator).
The findnl script as part of fslint identifies problematic patterns, and has 4 levels of stringency, with "POSIX" being the most stringent. https://github.com/pixelb/fslint/blob/master/fslint/findnl
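This is not how findnl does it, but the strictest check is easy to approximate. A Python sketch assuming the portable filename character set as POSIX lists it (letters, digits, period, underscore, hyphen); `is_portable_filename` is a hypothetical name.

```python
import re

# POSIX portable filename character set: letters, digits, ".", "_", "-".
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_filename(name):
    """True if `name` (a single path component) uses only portable characters."""
    # POSIX also advises against a leading hyphen, since such a name
    # is easily mistaken for a command-line option.
    return bool(_PORTABLE.fullmatch(name)) and not name.startswith("-")
```

So "holiday-schedule-2021.txt" passes, while "Holiday Schedule 2021.txt" and "-rf" do not.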
Spaces in filenames were a mistake to begin with.
Spaces are used to separate parameters in the command line. There's also no real need for filenames to support spaces.
Or, one could claim that the poor parsing of a text interface shouldn't dictate the for-human names of files, especially when an exceedingly small percentage of users deal with that text interface.
But, of course, if you mix the abstractions of metadata (filename) with location, things won't be trivial.
The filename belongs to the user. Therefore, it is incumbent on the computer to adapt, not the other way around.
Even if libraries all handled it, I’d still personally avoid spaces because spaces get semantically used to separate tokens and I see file names as tokens.
acme.sh - a shell script that I use to create "Let's Encrypt" SSL certificates - creates and maintains directories with asterisks in them:
https://github.com/acmesh-official/acme.sh/issues/1408
This is the sysadmin equivalent of piercing your nose just to make your parents mad.
Spaces in file names are a poor idea. File names are identifiers, not titles.
Let's test something: http://example.com/my silly webpage.html.
Hey look, HackerNews just broke a URL with spaces in it. And it's written in a Lisp dialect and all; it's not some Unix job cobbled together with shell, sed and awk. The language has a string data type, and strings are passed to functions without word-breaking interpolations taking place.
You know what else breaks on spaces? Basic everyday gui text manipulation.
Suppose that in a block of text we have the sentence:
> Please look for the Holiday Schedule 2021 file.
If you double click on any part of the name like Schedule, pretty much every text widget on the planet will just select only that word, and not the entire filename.
However, if you have:
> Please look for the holiday-schedule-2021 file.
There is at least a ghost of a chance that a semi-intelligent GUI can pick that out as a word.
There exist good reasons to keep identifiers as a clump beyond just command line shells.
It's why we need encoding like %20 in URLs that never pass through a shell script.
I don't use spaces, because I want to be able to run ad-hoc shell one-liners when working with my data without worrying about quotation and similar stuff.
I also don't use :, as I have run into problems with both Bash and its completion and FAT FS. Unfortunately, I routinely have timestamps in filenames, so I need to use +%F-%H-%M-%S instead of simple +%F-%T.
One thing has improved, though: I have not run into problems with ěščřžýáíé (which my language is full of) for maybe a decade, except on OpenWRT where space seems too scarce to support non-ASCII.
Edit: I now remember one problem, getting images for a website from an OS X user, which used combining characters instead of direct code points (https://en.wikipedia.org/wiki/Unicode_equivalence#Example), but HTTP requests got normalized in some browsers, leading to strange 404s.
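For anyone who hasn't seen this failure mode, the two spellings look identical on screen but are different strings. A minimal Python demonstration with the standard `unicodedata` module; `same_name` is a hypothetical helper, and normalizing to NFC is one reasonable choice among several.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def same_name(a, b):
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)
print(same_name(composed, decomposed))
```

A server that looks files up by the raw bytes of the request path will treat the two forms as different files, hence the 404s.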
That's funny because the first operating system I used (Apple DOS 3.3) was very liberal about file names. There was a 30-character limit which was a lot, and it didn't mind spaces in file names. Even control characters were fair game, which made things fun when you accidentally inserted a ^A in a SAVE command.
File names shouldn't have anything except a-z,0-9,_ and perhaps a -. No unicode, no spaces, no nulls.
It's not fear that keeps me from using spaces in file names, it's habit.
If we're going to play this dangerous game, from now on I'll figure out how to use nulls (\0) in my file names, and make all the C/C++ programmers cry.
I don't use spaces because it's so much faster to type filenames out (including with TAB-completion) in the terminal.
I do, however, use Cyrillic (UTF-8) in filenames, and I regularly try out if moving a file into ASCII-path will let some programs open it (half the time it's that when I am having trouble).
It's just such a pain in the butt to work with files with spaces. In a script it's fine b/c I just surround it in double quotes, but on the command line I hate having to escape the spaces.
This might already exist, but I wonder about a terminal that was really just a multi-line repl to a language. It would be preloaded with libraries that replicated all the features of the gnu core utils, but instead of calling grep like normal, you called a function like grep("args"). The advantage would be that you had access to a full blown programming language at all times. So when you needed to do something more complicated you would still have access to all the standard language features. And when you didn't need that, your canned core-utils-like functions would still work.
Coming from web-heavy and perl5 backgrounds, it's insane to me that people don't treat filenames and arguments and environment variables as tainted user input, and just blindly trust properties about them like "does not contain whitespace or control characters".
I had to move my development folders because you can't develop Android apps if your project path contains a space. Not sure where the issue is, if it's gradle or something else.
Edit: thinking about it again, it might not have even been the space but the exclamation mark in my path. Or both.
If any of you reading this have to deal with very large scale data pipelines for data science / ML type processing, and if "don't use spaces and weird chars in file names" hasn't become second nature by now, let me just say: you are very, very brave.
My first job as a SW Eng was in 1989 in the nuclear industry. Our folders and files were limited to 8 letters. So names were effectively acronyms. It was actually pretty awesome. Clean and concise. Years later, I still remembered the whole folder structure.
If you're in tech long enough, you can be traumatized by anything. Like the time a vendor-supplied system decided after an update that nothing could have a hyphen in the title, and a lot of existing content just... broke at once. Fun times.
Spaces in file names are a nightmare in Makefiles.
Not if you are careful (a bit like "$@" vs $@ in shell scripts).
Edit: replace $@ with quoted version which actually changes the behavior (I was wrong that the difference is between $* and $@).
I don’t think it’s fair to claim that any Make implementation supports spaces: there are too many fundamental bugs and breakages, so that lots of rather important Make functionality is off-limits if any of your file names will have spaces.
https://www.cmcrossroads.com/article/gnu-make-meets-file-nam... explains the situation in GNU Make in 2007 (and I don’t think it’s changed since then, though jgrahamc especially could correct me). Not being able to use such features as $^ and $(patsubst) is severely debilitating for all but the simplest of makefiles.
That's a fair point, thanks!
Not exactly spaces, but I have been bitten by something like this at my work quite recently. A Confluence page with special characters in the page title was working fine for a while. At some point there was a Confluence version update which made the page URL broken (and apparently unrecoverable, or at least not easily recoverable).
One way to look at it is that people of a certain generation eschew spaces because the tools of their formative years simply couldn't handle spaces - but another is that the olds have learned that generally erring on the side of KISS ("Keep it simple, stupid!") isn't a bad idea.
Software engineers - particularly of the more embedded variety - absolutely still have this problem.
The main culprit is GNU Make which does not cope with spaces in filenames. As far as it is concerned an array is a string separated by spaces so it gets very confused. Yes there are some partial workarounds, no none of them consistently work. You learn very quickly to check all code out in a file tree with no spaces in it, otherwise builds can randomly break in strange ways. It's not always clear up front whether Make is going to be involved somewhere in the build, so it's just easier to be safe.
My username has been my name which has an accented character and has broken countless Windows apps every year since forever, so I just keep a C:/Programs folder where I run stuff. You should never not fear filenames.
I am overly aggressive with spaces and special characters in filenames: I use them everywhere and report a bug when they cause errors, because they shouldn't in this UTF-8 age.
I still don't use the special character of my name in my username because that has caused me many hard to fix troubles. Think "cannot recover user password because this user doesn't exist".
I use c:\programs too, but for different reasons. C:\Programs is for portable applications that don't get installed, can be directly overwritten, and consist of at most two files with relevant names. As a bonus, I can run such programs directly from the run menu. C:\Programs\procexp for Process Explorer, for example.
I recently found out that a Windows folder name can't end with a space. But from Python, for example, you can create a folder named 'example ', and every file you create in that folder will be inaccessible and impossible to delete.
I've never created a filesystem entry name with a space. Mainly out of fear, and even where the fear turns out to be unfounded, "\" looks so ugly. But I think I'm even worse: I dislike capital letters too.
On a similar note: “it makes sense to add a date to a file name” years old.
Nothing old about that; lots of stuff is still broken. What are the odds Homebrew works if installed to a directory with a space in the name? Maybe the core brew manager itself, but all the packages?
I tend to follow a Postel-like system when it comes to this. When I write a script I'll usually get paranoid and make at least token efforts to handle spaces. Which I will then never, ever use.
I have come back to this thread, which I spotted and then forgot about something like two days ago, to say that just a minute ago one of the new Jenkins jobs that I added failed because I named the item using a space and some custom Gradle/Maven magic tool failed to load one of its own auto-generated files (I could tell that the space was the culprit because the error message printed only the second half of the item name).
How can I not be afraid of spaces if this happens like every other day with every other custom tool ...
Spaces are still not "permitted" in URLs.
Browsers will take http://example.com/some name.pdf and automagically turn it into http://example.com/some%20name.pdf, and deliver the goods without a problem. But having that space in the URL is still out of spec, and will cause your web page to fail validation, even though it works fine.
Let me tell you how much of a pain in the ass it is that my employer forces spaces in the corporate OneDrive directory.
PS-Microsoft is horrible about stupidly named folders being created and dumped in there.
Depending on the specific issue, the `subst` command may help you. If the OneDrive folder itself has a space in the path, or a necessary subfolder does, you can give that folder a drive letter instead.
I'm still afraid of any non-8.3 filename.
https://en.wikipedia.org/wiki/8.3_filename
And honestly, it's a good fear to have; there are contexts where it still just doesn't work.
Last I checked, the standard answer for GNU make is "Spaces are expected to break the tool, that's working as intended, it will never be fixed." And because we build our towering edifices of software on the pillars of the past, I can't guarantee to you that a project of arbitrary complexity won't try to cram a list of filenames through a make script.
I don’t think this is so much an age thing as a programmer thing. Old people will still name files all sorts of things, and a lot of young programmers today avoid spaces.
If you're developing on Windows, I find a good way of dealing with this is to convert paths to short format before using them (e.g. GetShortPathName in kernel32.dll).
Not afraid, but typing a dash in the terminal is easier and shorter than typing a reverse slash and a space. Spaces are kind of a pain in the ass in the terminal, tbh.
Quotes around the path is easier and avoids any issues - but tab completion and drag and drop files into terminal handles most cases for me.
This seems like a case for an axiom I hear infrequently, but I think comes up a lot - things that seem like they should be simple and easy, but are in fact difficult.
I must be nightmare customer, because I've always been exploiting my ability to use filenames in full UTF-8. I'm that guy that sends .pdf to your website.
Why stop here? Why not put spaces in your variable names also? Allowing spaces only in file names and not in variable names is short-sighted when not inconsistent.
My proposal for a shell on the Mac, in the late 80s, was:
- Spaces in filenames get transformed to non-breaking spaces by the filesystem;
- The filesystem treats nbsp as equal to space (just as case-folding treats A=a, B=b, etc.)
Now, argument parsing, mouse double-clicks, etc. all respect filenames as "words", and the output from things like 'ls' just work.
(Yes, I'm well aware that there are case-sensitive filesystems out there. I'd forgotten that iOS was one of those).
As a software engineer, I require testing of paths and files with spaces, and forbid the use of spaces in any system-generated file name where possible, to make CLI use easier.
It messes with tab completion in bash is why I avoid spaces
I do it the other way around. I used to be afraid of spaces. But I have come to realize that it is better to learn sooner rather than later which pieces of software are in such a bad state that they aren’t handling spaces correctly.
That being said, even after all these years I sometimes need to try a few times in order to get the quoting and the escapes right when communicating names of files with spaces through multiple layers of software.
I like to store data on USB flash drives. After being left to mature for a few years in a humidity- and temperature-controlled environment, you get some really interesting and complex byte streams where your original file names used to be.
Often they are not even valid UTF8 which, when you uncork the filesystem for the first time in a decade causes the most delightful crashes. The more years the better the aroma.
I'm »still tempted to write umlauts like 'Mot"orhead' old.«
But also a "use a font that has a proper capital ß" hipster.
Spaces in file names are a bad idea because spaces delimit the names of separate, distinct files.
At least in my crazy old illogical head anyway.
File names should be long enough to clearly communicate meaning/purpose/context, no more no less.
.doc
och my emojis didn't display, sorry
Hahah how ironic.
I'm hoping to one day be "Windows adds user root folder to the quick links in explorer by default" years old.
I always format my filesystems (macOS) as case sensitive and I'm surprised by the software that has a hard time with that.
On Unix/Linux we've grown up with case sensitive by default but everywhere else it still seems to be a problem now and again.
I should qualify this...I'm en-US so I have no idea what the experience is like for anyone else.
You need them for URL's. Running a stand-alone web page maker using Rust. Document structure:
Crashed on trying to deal with building html when there are spaces in the file name. It is still an issue.
I have been following the guidelines from this presentation for all my filenames, everywhere and it has been working well so far - https://speakerdeck.com/jennybc/how-to-name-files
Today, WSL will try to add PATH in Windows to PATH in Linux. So if you install something like NodeJS in Windows, and run node in Linux, it will try to call /mnt/c/Program Files/nodejs/node.exe and say "no such file or directory: /mnt/c/Program".
I had half a feeling that the warning against using spaces in names pre-dates computing, but after a little research into library call numbers and archive accession numbers, which turn out to have both historically included spaces, I have found no evidence to support this feeling.
It seems to me that many of the problems associated with spaces in filenames are due to the OS assuming that a space signals the end of a command or filename.
Maybe we ought to have a different character signify the end of a name? Or signify an option section, or the next option section of a command?
In the shell spaces have to be escaped which is annoying. This doesn't change with age I think
And I'm older than Google. If you want some hilarity, newlines are allowed in filenames as well (\n, \r, \r\n). Try getting bash to handle that! (It's possible, though annoying. Try redirecting to `while read line` in addition to find -print0 / xargs -0 hackery.)
I've never had any problems with this. At this point, it's second nature for me to either use underscores for spaces, or camel caps if there aren't any single character words like 'i' or 'a' in my desired file name.
Yes, but working with filenames with spaces in them is a huge PITA in command-line tools, because you have to quote everything. The ergonomics is just really annoying.
Personally I wish console shells had chosen another delimiter than space, but here we are.
Not obeying the "Robustness Principle" in software is just poor engineering.
https://en.wikipedia.org/wiki/Robustness_principle
Definitely applicable here. There's no way we're going to eliminate all problems with spaces etc, so why invite trouble.
I wouldn't say it's always poor engineering though, especially the 'liberal in what you accept' half.
Yes, you have a point there, but in this case would being liberal in what you accept be to accept filenames with spaces or (arguably) doing filename handling correctly (ie accept filenames with spaces)?
I'm apparently in the minority of people who know how to write shell scripts that have a chance of working correctly with filenames with spaces in them... and that's not the only reason I avoid spaces in filenames. :)
I have experienced a person using a space in a password for Windows login.
I still don't know how to process this emotionally. Either it is somehow naively really genius, or stupid.
In any case, it scares me, mostly because it is a non-IT person.
Reminds me of the time I watched a coworker's head explode when he tried to extract an archive (from a 'Nix environment) on his Windows machine and was indignant about getting duplicate filename errors.
As a Windows guy case still seems like a weird thing to worry about.
I work in Azure Data Factory, and there are places where a space in a name will cause difficult-to-troubleshoot errors. But I can never remember where. It's not universal. So I just avoid them entirely.
I still use the "web safe palette" when choosing color codes for CSS
So, born today, eh?—says the guy who still regularly runs into build scripts that cheerily command that they be run from directories without spaces, since that's easier than proper quoting in the script.
I'm newly afraid to use emojis in domain names: https://tinyprojects.dev/projects/mailoji
The meta point here is that spaces are the type of thing that work fine ... until they don't. This class of bug is best avoided entirely, especially if there is an easy workaround (not using spaces).
But it still breaks in so many situations and becomes a pain in the ass in so many other ones! I HATE people who use spaces in file names. For me it is a sign of a "deeply nontechnical person".
Oh, yeah. Me too!
Except nowadays I worry more about user names that get fed into collaborating applications (with different edit criteria) and password characters (again for systems with differing, strange edit rules.)
I name almost everything with underlines still. I think it’s a programming habit.
Although lately I have started saving my Logic Pro files with spaces, simply because I prefer it to be the name of the song as-is.
I still use the Netbios limitations (15 Characters) when naming servers
I'm “still afraid to use spaces in file names” wise, dammit!
I would say I'm "wise enough to not use spaces in filenames".
It's not about fear, it's about making good decisions, and avoiding unnecessary complication.
No way I would put anything but a-z, 0-9, and underscore in any file name. Too many stupid ways it can go wrong. I guess I have very little trust in my fellow programmers!
Spaces in path are a pain for the shell autocompletion, since you have to escape them by using either "" for the whole string or use the "\ " instead.
Me too. Afraid of dashes too as they might be interpreted as minus. I use a lot of underscores __ _____ _ _ _
Weirdly, my friend hates underscores. But he's a baseball fan
I know I can put spaces in file names, but \ is one of the characters I still can't touch type, so I still hate dealing with them in the terminal.
ascii, no spaces for me
I still get issues with old one-off scripts that still work but where I forgot to properly quote stuff... plus the URLs are a pain in the ass with the %20s.
[0-9A-Za-z_-]+ for me.
Same here, and most of the time it's even just [0-9a-z_]+. It's simple and there are no surprises around the corner.
I wonder why "space" wasn't always simply treated as another character. To save a couple bytes back in the 50s (when it mattered) I assume?
Any shell script that uses files should use double quotes for at least the variables: `mv $1 $2` is not safe, should be `mv "$1" "$2"`
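The same rule applies when a command line gets built as a string in another language. A small Python sketch of the difference; `str.split()` only approximates the shell's word splitting, and the file name is just an example.

```python
import shlex

src = "Holiday Schedule 2021.txt"

# Unquoted, the name falls apart into three separate arguments.
naive = f"mv {src} backup/".split()

# shlex.quote() plays the role of the double quotes around "$1".
quoted = shlex.split(f"mv {shlex.quote(src)} backup/")

print(naive)
print(quoted)
```

Better still is to skip the string entirely and hand `subprocess.run` a list such as `["mv", src, "backup/"]`, so no shell ever sees the name.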
I'm 19 now and learned this advice from my dad growing up. Still run into situations in my IT work and programming stuff where it makes a difference.
Our tool has no issues with spaces in fields, but we still advise users not to do it because other systems OFTEN STILL DO, in the year of our lord 2021.
I try to avoid spaces and special characters because issues still happen to this day (just yesterday, I had an issue with a file with an accent in it).
My coworkers still don't quote strings in their bash scripts, even when they're paths... and yet they wonder why everything falls apart.
There was a Discussion yesterday at work about allowing quotation marks and semicolons in some user-set titles. We use Mongo. But I empathize.
I'm not "afraid" of it, I just think it's unnecessarily complicated to work with spaces in filenames on the command line.
You should be still afraid. Many commands such as Unix "xargs" don't work properly with spaces if the right flag is omitted.
If you're working on cli this is reasonable
Why stop at spaces?
An old prof of mine used to send emails where the subject line was always a valid identifier in C.
Hello_dear_students_where_are_your_reports_
That identifier is clearly too long.
MISRA C:2004, 5.1 - Identifiers (internal and external) shall not rely on the significance of more than 31 characters.
All because we programmatically use interfaces that were intended for humans to write: command line, SQL, HTML, email headers.
It's worse than that. Whitespace is a hellish invention in the world of computers: there are multiple characters that may or may not render as whitespace with no way to distinguish them by just looking at the output.
Yet to the machine (script, shell, program, ...) it matters a lot, since u0020≠u0009≠u00A0≠u2000≠u2001, etc. whereas the aforementioned codepoints render like this: " " (and yes, that's indeed the five codepoints in that order - at least I typed them that way).
(Ab)Using whitespace like that can lead to all sorts of funny business, not just when dealing with shell scripts and variable expansion.
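Since the rendered text can't show the difference, here is a short Python sketch that makes those five codepoints visible again. `describe` is a made-up helper; the tab has no Unicode name, so a placeholder is used for it.

```python
import unicodedata

# The five codepoints from the comment above, in the same order.
SPACES = ["\u0020", "\u0009", "\u00a0", "\u2000", "\u2001"]


def describe(ch):
    """Render a character as 'U+XXXX NAME' so it can be told apart."""
    # Control characters such as TAB have no name, hence the fallback.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}"


for ch in SPACES:
    print(describe(ch))
```

All five count as whitespace for `str.isspace()`, yet no two of them compare equal, which is exactly the trap.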
This is why \Program Files and \Program Files (x86) exist as they do: with spaces, and strange characters, in the name.
Can someone convince me to not use spaces in music, film, and book files where they have a "standard title"?
Some react scripts freaked out on me recently because my login (and thus user folder) in windows contained a space.
I think people who use a terminal interface, regardless of OS, don't like spaces in file names. I avoid them.
Then you must also be "still afraid to write Python instead of Bash scripts" years old, too.
I dislike constantly having to backslash escape files on the command line, so I use dashes instead.
I won't use a space if I think I may need to address that file from the command line...
I'm "still afraid to use more than 8.3 characters in file names" years old!
I'm "8 characters max plus a 3 character extension in your file names" old.
Kids these days will say “What’s a file name?” and mean it. Typing? That’s for the olds.
Never use spaces in file names. It shouldn't depend on age, it's common sense.
Yep. Me too. Early bad experience with spaces in file name and Unix cured me of that.
Sort of related, but here's a joke: Windows 95 does support long filena~1
I\ am\ not\ afraid,\ I\ just\ do\ not\ see\ how\ it\ benefits\ my\ quality\ of\ life.
The nice thing about spaces is there are so many to choose from, thanks to Unicode.
I still feel slight unease sometimes when using more characters than 8.3
Damn, I feel old now :P
Maybe if we'd do it more software would actually learn to deal with it.
"You need to add -print0 to your find call and -0 to your xargs."
Literally just fixed a bug in our software because of an issue with spaces.
I'm still afraid to use national specific characters in file names...
Keep%20the%20names%40and%20links%20readable%20or%20submit%20to%20encoding
Without exception, I never ever ever use spaces in filenames. Ever.
2021-11-11_I_have_absolutely_no_idea_what_you_are_talking_about.txt
I don't know that this is really hacker news material guys...
This is a general issue to this day. So that isn't very old.
I'm "whitespace as syntax is stupid" years old
Python has made me afraid to use hyphens in file names
The Amiga supported spaces in filenames in 1985... =-)
Admittedly trite/unhelpful comment: avoid xargs
I\'m%20still%20afraid%20to%20use%spaces%20too.
Years of Java has me seeing the world in camel case
If a filename doesn't match \w+\.\w+ I hate it
I love it when characters like | break OneDrive
I don’t even use spaces in csv column names
Yet another reason to ditch Make /s
Heck, I'm still afraid to use caps!
Anything more than 8.3 is for sissies.
What about long filenames and paths?
Instead of spaces I just use U+2215
You mean, I'm linux years old?
This is much older than linux or gnu.
Well, I'm using makefiles old
Wait, what's a file? :P
Base64 is your best friend!
I_promise_I'm_not.
Me too, I never do it.
I’m 15. I am as well.
I can relate! :)
me too… still use underscore all the time.
Yep. Yep yep.
You can now?
or Capital letters
i feel so seen
tbh using spaces in file names is still stupid.
Anyone else totally fine with spaces in filenames? I used to rip a lot of CDs back in the day, and never had an issue with the spaces in the file names.
01 - Metallica - Metallica - For Whom the Bell Tolls.mp3
Names like that were common, and had many spaces.