I'm “still afraid to use spaces in file names” years old

1356 points by dario_satu 4 years ago

pimterry 4 years ago

I work on a complex desktop application, and it's been astounding the number of bugs that have appeared over the years triggered by spaces and other unusual characters in file names. If you do anything with subprocesses or path processing, it's absurdly easy to hit in a thousand different ways, over and over again.

Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.

Forces you to deal with this properly, and immediately ensures that every automated test checks this case without you having to remember every time. Hasn't been particularly inconvenient, since I'm autocompleting it 99% of the time anyway, and I haven't shipped a single path parsing bug since.
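For the subprocess case specifically, a minimal sketch of what "dealing with this properly" can look like, assuming Python. The helper names `count_lines` and `make_awkward_workspace` are made up for illustration. The command is passed as an argument list, so no shell ever gets a chance to split the path on spaces:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in `path` using a child process.

    The command is an argument list, so no shell splits the path on
    spaces or interprets quotes inside it.
    """
    child_code = "import sys; print(sum(1 for _ in open(sys.argv[1])))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def make_awkward_workspace() -> Path:
    """Create a scratch directory whose name contains spaces and quotes."""
    root = Path(tempfile.mkdtemp(prefix="my work 'space' "))
    sample = root / "notes file.txt"
    sample.write_text("one\ntwo\nthree\n")
    return sample
```

Running the test suite from a directory like the one `make_awkward_workspace` creates is the same trick as renaming the CI workspace, just on a smaller scale.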

lifthrasiir 4 years ago

While I agree that we should do this in the ideal world, doing so will inevitably break other necessary tools so it is unworkable for me :(

alpaca128 4 years ago

Seems like MS had the same idea according to an answer in the link:
> Microsoft intentionally made programs install to C:\Program Files on Windows 95+ to force programmers to deal with spaces in filenames.
- lifthrasiir 4 years ago
  
  And yet they introduced C:\ProgramData in later versions.
  
  kitkat_new 4 years ago
  
  why "yet"?
  one occurrence is enough to make devs care about it
  
  jjoonathan 4 years ago
  
  Imagine if they made programmers put 64 bit DLLs in a "System32" directory and 32 bit DLLs in a "SysWoW64" directory. That would really keep 'em on their toes!
  
  eyegor 4 years ago
  
  You should look into the behavior of the /windows/sysnative link. It appears and disappears depending on whether your process is running as 32 bit or 64 bit.
  
  Karuma 4 years ago
  
  Programmers should never put DLLs in those folders... Or even ever touch them.
  
  mastax 4 years ago
  
  Except for \Windows\System32\drivers\etc\hosts, of course.
  
  jaywalk 4 years ago
  
  I occasionally try to search for the reasoning behind the location of the hosts file in Windows, and I always come up blank.
  
  blincoln 4 years ago
  
  Maybe it's from back before Windows had a built-in TCP/IP stack? If it were a third-party/optional driver, having files related to it in a path under system32\drivers would make sense.
  
  mjevans 4 years ago
  
  Back around Win 95 when they added networking it was based off of (IIRC) BSD's TCP stack and related tools. They were an optional 'third party' driver of sorts, but shipped by the first party. I'm not positive about WinNT or Win3.11 (for workgroups?)
  
  mixmastamyk 4 years ago
  
  I remember adding "trumpet winsock" to Win 3.x back in the day. Says '94 for that, and summer of '93 for NT 3.1 debut:
  https://en.wikipedia.org/wiki/Trumpet_Winsock
  https://en.wikipedia.org/wiki/Windows_NT_3.1
  
  HideousKojima 4 years ago
  
  They originally copied BSD's network stack, IIRC
  
  jve 4 years ago
  
  https://superuser.com/questions/355297/why-does-windows-have...
  
  kevin_thibedeau 4 years ago
  
  They already hamstrung themselves with LONG because DWORD just wasn't good enough and now long can't be 64-bit either.
- ealexhudson 4 years ago
  
  I wish they did "User Files" instead of "Users" too, because so much software breaks on the home area having a space in it.
  Not least, it makes writing scripts for various shells and getting the quoting rules right an absolute pain as well...
  
  the_mitsuhiko 4 years ago
  
  They used to. The folder was called `Documents and Settings` until Win7.
  
  323 4 years ago
  
  "Documents and Settings" still exists on Windows 10, as a soft link to "Users".
  
  0des 4 years ago
  
  You know, this makes me wonder.. tangentially speaking- I wonder how hard it would be to rearrange the folder structure in linux so that I have something like this:
  /Users/{root, user0, user1, ... }...
  /System/{Logs, Apps/{opt, container, ...}, Temp, Conf ...}...
  /Devices/{Mount, sda, sdb, null ...}...
  /Boot/...
  
  selfhoster11 4 years ago
  
  GoboLinux does exactly that: https://en.m.wikipedia.org/wiki/GoboLinux
  
  0des 4 years ago
  
  Wow, thanks for the reply, nice find! I did some poking around on my Linux system and even re-arranging the home folder was a task of its own because the system kept trying to replace folders in their original places. I will do some digging in to Gobo and see how they're handling this. Thanks again for pointing this out.
  
  dotancohen 4 years ago
  
  > the system kept trying to replace folders in their original places.
  This is the file that you want:
  $ cat ~/.config/user-dirs.dirs
  XDG_DESKTOP_DIR="$HOME/"
  XDG_DOWNLOAD_DIR="$HOME/Downloads"
  XDG_TEMPLATES_DIR="$HOME/"
  XDG_PUBLICSHARE_DIR="$HOME/"
  XDG_DOCUMENTS_DIR="$HOME/"
  XDG_MUSIC_DIR="$HOME/"
  XDG_PICTURES_DIR="$HOME/"
  XDG_VIDEOS_DIR="$HOME/"
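If you want to check what that file resolves to before moving folders around, here is a rough sketch of a reader for it, assuming Python. `parse_user_dirs` is a hypothetical helper, and it only handles the simple `KEY="$HOME/..."` form shown above, not everything a shell could express:

```python
import os
import re

# Matches lines such as XDG_DOWNLOAD_DIR="$HOME/Downloads"
_LINE = re.compile(r'^(XDG_[A-Z]+_DIR)="(.*)"$')


def parse_user_dirs(text: str, home: str) -> dict:
    """Map each XDG_*_DIR key to a path, expanding a leading $HOME."""
    dirs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match is None:
            continue  # ignore anything that is not a KEY="value" entry
        key, value = match.groups()
        if value.startswith("$HOME"):
            value = home + value[len("$HOME"):]
        dirs[key] = os.path.normpath(value)
    return dirs
```

With the file above, every key except `XDG_DOWNLOAD_DIR` resolves to the home directory itself.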
  
  yjftsjthsd-h 4 years ago
  
  That helps, but be warned that there are still programs running around that just hardcode their paths
  
  kaba0 4 years ago
  
  Cries in nixpkgs
  (Anyone who tried to package a program that hardcodes the “usual” binary paths knows the pain)
  
  account42 4 years ago
  
  Doesn't nix itself hardcode the nix store path though?
  
  kaba0 4 years ago
  
  Afaik there is an option to change it, but it is not advisable as that will break the binary cache and you are left with compiling everything yourself. This is due to a technical limitation in that different packages can contain paths everywhere and thus they are inherently part of the resulting hash, on which other packages can depend.
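A toy model of why that happens, assuming Python. This is not Nix's real hashing scheme, and `toy_output_hash` is a made-up function. It only shows that once the store prefix feeds into one hash, every dependent hash changes too, so prebuilt binaries keyed on the default prefix no longer match:

```python
import hashlib


def toy_output_hash(store_prefix: str, name: str, dep_hashes: list) -> str:
    """Hash a package's inputs, including the store prefix and dependency hashes."""
    h = hashlib.sha256()
    for part in [store_prefix, name, *sorted(dep_hashes)]:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent fields cannot run together
    return h.hexdigest()[:16]
```

Hashing a leaf package under two different prefixes, then hashing a package that depends on it, gives two entirely different chains of hashes.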
  
  lostlogin 4 years ago
  
  You’re clearly a more capable user than me, but even so, take care. The time I accidentally moved /etc has scarred me for life.
  
  tech2 4 years ago
  
  Early on in my Linux-using-life I made the mistake of deleting /etc. That was a learning experience like no other :)
  
  mixmastamyk 4 years ago
  
  Since Live CDs/Flash drives were invented, I wouldn't worry about this stuff any longer. Certainly have your personal files in a centralized location and backed up first.
  Probably the easiest way to experiment these days is to create a VM and make snapshot, then start knocking down walls, just to see when and where the house collapses. Then revert and try something new.
  
  genewitch 4 years ago
  
  There's a computer game that deletes random files when you make a mistake or lose.
  There could be a competition!
  
  simonblack 4 years ago
  
  A couple of weeks after moving to UNIX from MSDOS, I thought I'd remove lots of unnecessary 'dot-directories' from the /tmp directory. I was root as I had no concept of being a 'normal user'.
  So I ran two simple commands:
  cd /tmp
  rm -fr .*
  and wondered why it was taking so long. <grin>
  
  account42 4 years ago
  
  At least that doesn't happen today anymore. From bash:
  > When a pattern is used for pathname expansion, the character ``.'' at the start of a name or immediately following a slash must be matched explicitly, unless the shell option dotglob is set. The filenames ``.'' and ``..'' must always be matched explicitly, even if dotglob is set.
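Outside the shell, one way to sidestep the question entirely is to enumerate the directory yourself. A small sketch, assuming Python, with `dot_entries` as a hypothetical helper. `os.scandir` never yields the special `.` and `..` entries, so the parent directory cannot be selected by accident:

```python
import os


def dot_entries(directory: str) -> list:
    """Return the hidden entries directly inside `directory`, sorted by name.

    os.scandir never yields "." or "..", so only real entries whose
    names start with a dot are returned.
    """
    with os.scandir(directory) as it:
        return sorted(entry.name for entry in it if entry.name.startswith("."))
```

Anything that deletes from that list only ever touches names that live inside the directory it was given.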
  
  acquacow 4 years ago
  
  I did that on my NAS a few years ago. I had copied in a bunch of directories from a mac and they all had tons of dot files in each dir that were showing up on my windows machines. I popped open a terminal and did the exact same thing and wiped most of the NAS out =P Good thing I had it mirrored with my other synology.
  
  DarkWiiPlayer 4 years ago
  
  Dammit, I wanted to be the one to mention gobo linux [HN deleted my laughing emoji ffs]
  
  andai 4 years ago
  
  Beat the system ʕ•ᴥ•ʔ
  
  gavinray 4 years ago
  
  How do you deal with lack of being able to just point to "/usr/lib/include" or other things when saying "here's my directory of shared libs"?
  This is definitely interesting though, and an improvement I would say
  
  JonathonW 4 years ago
  
  GoboLinux symlinks everything into an FHS-ish structure under /System/Index/ so you still have a single place where binaries/libraries/includes/etc. live. (There are also symlinks from /usr/lib, /usr/bin, and others into /System/Index/ for compatibility with programs where those might be hardcoded.)
  
  short12 4 years ago
  
  That actually seems like some low hanging fruit to go on a commit spree correcting code that hard codes paths
  
  oblio 4 years ago
  
  GoboLinux is old enough to vote in most countries.
  So either those low hanging fruits are higher than they seem, or we're all just a bunch of dwarves.
  My bet is on the second option.
  
  matheusmoreira 4 years ago
  
  It is low hanging fruit as far as the software is concerned. Simply parameterize all paths.
  Will upstream accept such patches though? Sounds unlikely to me.
  
  caymanjim 4 years ago
  
  You monster!
  
  0des 4 years ago
  
  Don't even get me started on /usr/local/bin..
  
  ThaJay 4 years ago
  
  You mean "Start Menu"?
  
  Bad_CRC 4 years ago
  
  macos does something like that.
  
  Spivak 4 years ago
  
  I mean we're heading there with /usr being your /System. Redhat/Pottering are doing heroic work in this space.
  /Users -> /home
  /System -> /usr
  /Data -> /var
  /Config -> /etc
  /Boot -> /boot
  /Ephemeral Temp -> /run
  /Persistent Temp -> /tmp
  The only real holdouts are proc/sys/dev which are the kernel and mnt/media/opt/srv which are really for the user/sysadmin and aren't really used by the OS anymore.
  
  woodruffw 4 years ago
  
  Genuine question: on what systems is `/tmp` persistent? Both macOS and Ubuntu 20.04 clear `/tmp` on every reboot for me, and I haven't changed the defaults at all.
  
  earthboundkid 4 years ago
  
  All storage is temporary. You just gotta wait long enough.
  
  novok 4 years ago
  
  People don't reboot often. Persistent tmp basically means it will be cleared in an infrequent manner, so the likelihood of it going away 1s after you release your file handle is low.
  
  mike_hock 4 years ago
  
  "Persistent Temp" should be /var/tmp. "Persistent Temp" is also an oxymoron.
  
  nybble41 4 years ago
  
  > "Persistent Temp" is also an oxymoron.
  It's not an oxymoron to have files which are temporary but not limited in scope to a single power cycle. For example, you could have a long-running process which you want to be able to resume if it's interrupted; /var/tmp would be an appropriate place for the state. The data is temporary because it will be deleted once the process is finished, but you wouldn't want it wiped out by a system reset. Generally /tmp is cleared at every reset, and is often a tmpfs mount, while files in /var/tmp are automatically cleaned up only when they reach a certain age.
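A minimal sketch of that resumable-process pattern, assuming Python. The function names and the `job-state.json` file name are invented for illustration, and the caller decides which directory to use (something under /var/tmp would fit the description above):

```python
import json
import os


def save_checkpoint(state: dict, state_dir: str, name: str = "job-state.json") -> str:
    """Write `state` atomically so a crash never leaves a half-written file."""
    os.makedirs(state_dir, exist_ok=True)
    final_path = os.path.join(state_dir, name)
    tmp_path = final_path + ".partial"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before renaming
    os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    return final_path


def load_checkpoint(state_dir: str, name: str = "job-state.json"):
    """Return the saved state, or None when there is nothing to resume."""
    try:
        with open(os.path.join(state_dir, name), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

The process calls `load_checkpoint` at startup and deletes the file once it finishes, which is what makes the data temporary without tying it to a single boot.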
  
  tremon 4 years ago
  
  Except that the FHS says that "data stored in /var/tmp is typically deleted in a site-specific manner", and as an application vendor you have no control over that site-specific clean frequency. On all my systems, /var/tmp is a symlink to /tmp and that has never caused any issue.
  
  nybble41 4 years ago
  
  The FHS is not wrong; cleaning policies are indeed site-specific and files placed in any temp directory can in principle disappear at any time. (Though, in theory, it's not supposed to happen while the files are still "in use" by running programs.) Still, historically you could count on files in /var/tmp lasting longer than files in /tmp, including across reboots.
  Nothing will immediately break because you linked /var/tmp to /tmp. Whether it causes issues depends on the programs that you (or your users) run and how they make use of /var/tmp. However, if someone did have to restart a long-running process from the beginning because recent state information in /var/tmp was not preserved across a reset, I would say that is a problem with the administration of the system and not the program that stored its state there.
  
  Spivak 4 years ago
  
  Basically no one uses /var/tmp for anything (and nobody should either). World writable directories are a mistake and only continue to exist because apps assume they are available.
  /tmp and friends are poorly named. They really should be /shared or /dmz or /freeforall or something.
  * If you need service-specific tmp space use RuntimeDirectory or PrivateTmp if your app is hardcoded to /tmp.
  * If you need service-specific persistent data that goes in /var/lib/your-app.
  * If you need temp space for your user it's at /var/run/user/your-uid.
  * If you need more than one user/service to share files but not everyone then god have mercy on your soul because all options are bad. There sure are a lot of them but none of them are at all satisfying.
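For the per-user case in that list, a hedged sketch assuming Python. On systemd-based systems the per-user runtime directory is usually exposed through the `XDG_RUNTIME_DIR` environment variable. `user_runtime_dir` is a hypothetical helper that falls back to a private directory when the variable is missing:

```python
import os
import tempfile


def user_runtime_dir(env=None) -> str:
    """Return the per-user runtime directory, or a private fallback.

    Uses $XDG_RUNTIME_DIR when it points at an existing directory;
    otherwise creates a fresh directory with tempfile.mkdtemp, which is
    accessible only to the creating user.
    """
    env = os.environ if env is None else env
    runtime = env.get("XDG_RUNTIME_DIR")
    if runtime and os.path.isdir(runtime):
        return runtime
    return tempfile.mkdtemp(prefix="runtime-")
```

The fallback still lands under the shared temp directory, but the directory itself is private, which avoids most of the world-writable problems described above.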
  
  account42 4 years ago
  
  > Basically no one uses /var/tmp for anything
  Gentoo does, at least by default: https://wiki.gentoo.org/wiki//etc/portage/make.conf#PORTAGE_...
  
  nybble41 4 years ago
  
  Right, /var/tmp is the "Persistent Temp" directory, and /tmp is "Ephemeral Temp". The /run directory is for runtime data such as PID files, Unix sockets, named FIFOs, and generated systemd units—it has a specific internal structure and shouldn't be used as a direct alternative to the relatively unstructured /tmp directory. While both are generally ephemeral tmpfs mounts, only /tmp is writable to all users.
  
  lillecarl 4 years ago
  
  I'm not sure I'm a fan of the capitalization and spaces, other than that I'm all for more self-explanatory names.
  
  johnlorentzson 4 years ago
  
  Why not? That's how proper English text is written. Of course there are many programs that can't handle it properly (or handle it inconveniently) so in practice it might be problematic at times, but otherwise I see nothing wrong with it.
  
  lillecarl 4 years ago
  
  Generally just because typing it out with tab completion in zsh sucks, and I don't see a good solution (if it was solved nicely it'd be solved already)
  
  xeyownt 4 years ago
  
  Why compare with English? It's computer domain, it's not a book or a poem. It should be clear and unambiguous.
  Caps are annoying to type, and difficult to remember (Do You Caps, or Do you caps, or DO YOU CAPS, etc).
  Spaces are nuisances that bring no benefit. At best we should use non-breaking space for filenames, but that would be even more atrocious.
  
  abdusco 4 years ago
  
  This is what I want from Linux. Sensible & guessable names for newcomers to figure out where to put files and programs.
  It's frustrating having to spend time to decide whether I should install a program in /var or /opt or /usr. What do they even mean!
  So, I disagree with this convention altogether and use /apps or ~/apps now.
  
  0des 4 years ago
  
  Behold! https://en.m.wikipedia.org/wiki/Filesystem_Hierarchy_Standar...
  
  emteycz 4 years ago
  
  Yeah, except that tells me nothing useful... The question is exactly the same: So where do I install this random binary I downloaded from the internet or compiled myself? Is it /opt, /usr/bin, /usr/local/bin, or /bin? Where do I put the dependencies I compiled for this software - /usr/lib, /usr/local/lib, /lib, /opt/lib, /opt/<app name>/lib, or what?
  
  db48x 4 years ago
  
  Wherever you want. All of the above, or none. It really is up to you.
  
  emteycz 4 years ago
  
  That's exactly the problem. This leads to mess. The Windows model of C:\Program Files\<app name> is much better.
  
  db48x 4 years ago
  
  No, it frees you to pick whatever unmessy solution you want.
  You can do `configure --prefix=/Program\ Files/<app>` if you want.
  
  emteycz 4 years ago
  
  If I am not writing all of my installation scripts by hand, because that would be really intense, then every folder gets filled with random bits of software.
  Offering too many similar choices leads to mess. There's nothing fundamentally different between using one or more of these options and using the only option, except that in the second case there isn't any opportunity to make mess.
  > You can do `configure --prefix=/Program\ Files/<app>` if you want.
  Thanks for the tip! Can't do that with distro repo software though :-/
  
  db48x 4 years ago
  
  > then every folder gets filled with random bits of software.
  What does that even mean? When you install something, you put it where you want it.
  If you don’t like where your distribution puts files, choose a different one. Not all of them use the same convention.
  
  emteycz 4 years ago
  
  All (except aforementioned GoboLinux) use FHS.
  
  kevin_thibedeau 4 years ago
  
  Use Gnu Stow to keep the random bits contained in their own app directory that is symlinked into the /usr/local tree. Then you can manage everything without leaving orphan files behind.
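The symlink-farm idea can be sketched in a few lines, assuming Python. This is not how Stow itself is implemented (it also folds directories and checks for conflicts); `link_package` and `unlink_package` are invented names that only show the principle:

```python
import os


def link_package(package_dir: str, target_dir: str) -> list:
    """Symlink every file under package_dir into the same relative spot in target_dir.

    Returns the created link paths so they can be removed later.
    """
    created = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        dest_root = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            link = os.path.join(dest_root, name)
            os.symlink(os.path.join(root, name), link)  # fails if link exists
            created.append(link)
    return sorted(created)


def unlink_package(created: list) -> None:
    """Remove the links recorded by link_package; real files are left alone."""
    for link in created:
        if os.path.islink(link):
            os.unlink(link)
```

Because the real files stay in the package's own directory, uninstalling is just removing the links and then deleting that one directory.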
  
  emteycz 4 years ago
  
  Very cool
  
  drewzero1 4 years ago
  
  Okay, but what about ProgramData? I have enough programs that put their junk in there instead of Program Files, and others that make their own directories on the root of the drive (driver installers are really bad about this).
  I think the best model I've seen for consistent binary locations is the 'Applications' folder in Mac OS X, but it fails as well by retaining the /usr/bin elsewhere.
  
  yjftsjthsd-h 4 years ago
  
  When you download a portable app (just a bare .exe), do you make a folder for it and drop it in program files? (quite possible, you'd just be unusual) If not, why does Windows get a free pass?
  
  tremon 4 years ago
  
  But why are many Windows programs under C:\Windows\System32 then, if Windows has only a single model? Why aren't all Steam-provided (for example) games in a single location? Or, if they are, does Windows really have a single model?
  Yes, the Linux/POSIX model is confusing, but the split is to segregate administrative domains:
  - / and /usr are the domain of the distribution. As a user, you should never install there. The administrative group is root.
  - /usr/local is the domain of the machine admin. If the machine is yours to manage, you can install software there. The administrative group is staff.
  - /opt/$vendor is the domain of third-party vendors. Each vendor (like Steam, Eclipse, Arduino Studio) can get its own subdirectory and its own administrative user group.
  How would you achieve the same on Windows? How do you make sure the Adobe updater can only install new versions of CS, but not surreptitiously install a new (free!) spyware package under C:\Windows? How would you allow certain power users to share one Google Chrome installation, allow each of them to update it, but not let them install additional software system-wide?
  
  Shared404 4 years ago
  
  Except instead of config files, Windows has the registry.
  Also, as mentioned by the siblings to this comment, the 'mess' has a purpose, and is less messy than it appears.
  Want to manually install something? Into /usr/local it goes. Done.
  The only way to handle this that I've been really impressed with is Mac's "Applications" folder. Unfortunately, I dislike most other things about Mac.
  
  aranchelk 4 years ago
  
  I was taught /usr/local/bin
  /opt is for standalone packages, so if it’s a single file, no.
  /bin is only for stuff needed on single user mode, so probably not (unless that’s what the binary is for).
  /usr/bin is going to typically contain files installed by your package manager and should probably be left unaltered by human hands.
  The deps I would assume /usr/local/lib but it hasn’t ever come up for me.
  
  krinchan 4 years ago
  
  Fun fact: Debian is working towards[1] and Arch already has merged / and /usr. /bin is a soft link to /usr/bin and similarly with sbin and lib.
  [1]: https://wiki.debian.org/UsrMerge
  
  nsv 4 years ago
  
  To add: when you install software yourself you choose this; when you install software from e.g. a distribution package it is chosen by the package maintainers, and to a larger extent the maintainers of the distribution.
  This is one of the big advantages of using a ready-made Linux distribution: beyond the convenience of having an installer or easy to install packages, you get some assurance that the system as a whole has been thoughtfully put together.
  Arch Linux for example symlinks /bin and /sbin to /usr/bin and /lib to /usr/lib among other things.
  
  woodruffw 4 years ago
  
  Is your account the only account that's expected to run the binary? If so, then `$HOME/bin` is a perfectly acceptable (albeit not standard) place to put it.
  If you expect other users to be able to execute the program, then you should put it in either `/usr/bin` or `/usr/local/bin`, depending on whether the former is already being used by a package manager. `/opt` is generally for self-contained software that doesn't play nicely with the rest of the system, but might still be installable through the default package manager.
  
  megous 4 years ago
  
  $HOME/.local is the equivalent of /usr/local for per-user stuff.
  
  mananaysiempre 4 years ago
  
  I don’t think there’s any “official” word on that (the XDG spec that defines ~/.local/share doesn’t mention ~/.local/{bin,lib} IIRC, and the traditional per-user entry in PATH seems to be ~/bin), but a fair number of people use it this way, yes, including me.
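A short sketch of both halves of that, assuming Python. `xdg_data_home` follows the documented default for `XDG_DATA_HOME` (`~/.local/share`, ignoring relative values), while `local_bin_on_path` only checks the `~/.local/bin` convention, which as noted is not something the spec defines. Both helper names are hypothetical:

```python
import os


def xdg_data_home(env=None) -> str:
    """Return $XDG_DATA_HOME, falling back to ~/.local/share when unset or relative."""
    env = os.environ if env is None else env
    value = env.get("XDG_DATA_HOME", "")
    if os.path.isabs(value):
        return value
    home = env.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".local", "share")


def local_bin_on_path(env=None) -> bool:
    """Report whether ~/.local/bin appears as an entry in PATH."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    wanted = os.path.join(home, ".local", "bin")
    return wanted in env.get("PATH", "").split(os.pathsep)
```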
  
  tom_ 4 years ago
  
  I started out using $HOME/bin, but a fair amount of stuff assumes a /usr- or /usr/local-style folder structure when doing make install, so I've settled on using $HOME/usr/bin instead, so that programs can create $HOME/usr/include and $HOME/usr/share and whatever, without trampling on stuff in my home folder.
  Can't remember the last time I had a problem arranging this. If using autotools, which covers 95+% of stuff, it's usually a question of something like "./configure --prefix=$HOME/usr".
  (If I want to share stuff between users, /usr/local/ is of course a better place. macOS is a bit more restrictive, so I have a separate user for this, whose /usr folder is readable by everybody.)
  
  woodruffw 4 years ago
  
  Yeah, it definitely gets hairier when using anything that's more than just a drop-in binary.
  
  matheusmoreira 4 years ago
  
  > $HOME/bin
  On freedesktop systems there's the ~/.local directory which is supposed to be a mirror of the file system hierarchy. Seems like a good place for bin, lib, include directories.
  
  mananaysiempre 4 years ago
  
  The standard is, indeed, excessively vague because it was written to let many existing implementations be conformant as is, though I’d say it’s still more helpful than many other standards with that deficiency. There’s a method to it, however:
  - Things installed in /, if it’s different from /usr, are generally not to be touched;
  - Things installed in /usr are under the distro’s purview or otherwise under a package manager, any modifications are on pain of confusing it;
  - Things installed in /usr/local are under the admin’s purview and unmanaged one-offs, there are always some but overuse will lead to anarchy;
  - Things installed in /opt are for whatever is so foreign and hopeless in not conforming to the usual factoring that you just give up and put it in its own little padded cell (hello, Mathematica);
  - Everything is generally configured using files in /etc, possibly with the exception of some of the special snowflakes in /opt; the package manager will put config files meant to be edited there and expect the admin to merge any changes in manually, and sometimes put default settings meant to be overridden by them in /usr/share (see below)—both approaches can be problematic, but the difficulty is with migrating configuration in general, not the FHS as such.
  There used to be additional hierarchies like /usr/X11R6, and even a /usr/etc on some (non-Linux?) systems, but AFAIU everyone agrees their existence makes no sense (anymore?), so much that even FHS doesn’t lower itself to permitting them.
  The distinction between / and /usr might appear to be pointless as well, and nowadays it might be (some distros symlink them together), but previously (especially before initial ramdisks were widespread) stuff in / was whatever was needed to bring up the system enough that it could netmount a shared /usr.
  Inside each of /, /usr and /usr/local there is bin for things that are supposed to be directly executable, whether binary or a script and all in a single place; share and lib for other portable and non-portable (usually but not necessarily text and binary) shared files, respectively, segregated by application or purpose; finally, due to the dominance of C ABIs and APIs on Unices, the top level of lib also hosts C and C++ library files and there’s an additional directory called include for the headers required to use them. Some people also felt that putting auxiliary executables (things like cc1, the first pass of the C compiler) inside lib was awkward so they created libexec for that purpose, but I don’t think the distinction turned out to be particularly useful so not all distros maintain it.
  That’s it, basically. There are subtler but logical points (files vs subdirectories in /etc) and things people haven’t found an obviously superior solution for (multilib and cross environments), and I made no attempt to be historically accurate (the original separation of / and /usr happened for intensely silly reasons), but those are the fundamental principles of the system, and I feel it does make sense as a coherent implementation of a particular design. Other designs are possible (separation by application or package not purpose, Plan 9-ish overlays, NixOS’s isolated environments), but that’s a discussion on a different level; the point is that this one is at the very least internally consistent.
  Re the unfriendly names ... I honestly don’t know. Newbie-friendliness matters, but it’s not the only thing that does; particularly in a system intended for interactive text-mode use, concise names have a quality of their own. There’s a reason I’m more willing to reach for curl and jq rather than for httpx and lxml, for regular expressions rather than for Parsec, and even for cmd.exe, as miserable as it is, rather than for PowerShell.
  I feel weird that no HCI people seem to have seriously considered the tension between interactive and programmatic environments and what the text-mode user’s experience in Unix says about it, but even Tcl, which is in many ways a Bourne shell done right, loses something in casual REPL use when it eliminates (as far as idiomatic libraries are concerned) short switches. Coming up with things like rsync -avz or objdump -Ctsr is not very pleasant initially, but I certainly wouldn’t want to type out the longhand form that would be the only possible one in most programming languages (even if I find their syntax beautiful, e.g. Smalltalk/Self).
  
  emteycz 4 years ago
  
  Thank you for the thoughtful reply, the point about netmounting shared usr makes it much easier to understand.
  
  nobody9999 4 years ago
  
  >the original separation of / and /usr happened for intensely silly reasons
  As I recall, there were very good reasons for separating / and /usr (as well as /home and /var). The biggest one was that various Unix kernels would panic[0] if / was full. But that issue was almost universally fixed by 1990 or so.
  And netmounts of pretty much everything other than / were pretty common for many years, due to the high cost of storage.
  So no, the reasons weren't silly, they just don't apply to more modern systems.
  [0] https://en.wikipedia.org/wiki/Kernel_panic
  
  mananaysiempre 4 years ago
  
  OK, I didn’t put this completely correctly. The original separation of /usr to hold user home directories (!) and / to hold everything else was because the first RK05 disk ran out, but it makes sense in any case. The additional hierarchy under /usr was created some time later when space on the first RK05 disk ran out again, and while this can be a perfectly sensible decision for a single installation on a single site, taking it seriously decades later is silly. Neither does that mean that there weren’t good reasons the split got preserved in subsequent systems, just that they couldn’t have been the same as the original ones; there are no netmounts in V6, after all.
  (I have an old Unix intro book that describes /usr as user home directories, the rest is a second-hand retelling[1].)
  [1] http://lists.busybox.net/pipermail/busybox/2010-December/074...
  
  nobody9999 4 years ago
  
  Interesting stuff. Thanks for sharing it!
  
  matheusmoreira 4 years ago
  
  > So where do I install this random binary I downloaded from the internet or compiled myself?
  In your home directory.
  
  graycat 4 years ago
  
  Where?
  See my post to this thread at
  https://news.ycombinator.com/item?id=29198222
  
  sergeykish 4 years ago
  
  Follow your distribution. For example Arch Linux provides PKGBUILDs for official repos and AUR. Most of the time someone has already published PKGBUILD, but if not I just patch accordingly.
  And conditions that formed separation are long gone, Arch Linux symlinks most of it:
  /bin -> /usr/bin
  /sbin -> /usr/bin
  /usr/sbin -> /usr/bin
  /lib -> /usr/lib
  /lib64 -> /usr/lib
  /usr/lib64 -> /usr/lib
  
  somehnguy 4 years ago
  
  I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion. Even better, because there are so many similar places, you might choose completely different ones depending on the day of the week and your current mood.
  Too much choice for things like this is harmful IMO. Deep down I truly couldn't care less where the files end up, as long as that place is the 'right' place. There are too many 'right' places which makes it hard to find random things at a later date or when on a box you're not super familiar with. It's also a complete waste of time to think about it at all.
  
  NavinF 4 years ago
  
  It’s not just you: Every distro is its own special snowflake and patches the programs they distribute to store files in a different place.
  The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
  
  stryan 4 years ago
  
  > The “standard” doesn’t tell you what directory structure to use inside /etc to group related config files. The “standard” doesn’t tell you where an HTTP server should serve its files. Everyone just does their own thing which makes upstream docs incorrect and useless for newcomers.
  The FHS does actually answer both of those questions. Files inside /etc/ should be grouped in subdirectories[0] and the HTTP server should serve user-specified website files from /srv[1] and normal distro-provided files (such as the apache test page) from /var[2].
  [0]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s07.htm...
  [1]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s17.htm...
  [2]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05.html#p...
  
  elevader 4 years ago
  
  "use subdirectories" is probably the most handwavey answer possible, aside from maybe "just put it somewhere, lol". I feel like the standard could provide some sort of guidance on how to name folders or something.
  
  NavinF 4 years ago
  
  > HTTP server should serve user-specified website files from /srv
  I’ve never seen that in my life, but I’m sure someone does that. This is one of those cases where the people who follow the standard are increasing fragmentation
  
  graycat 4 years ago
  
  > I've read that a handful of times (whenever trying to figure out where to put some new random thing), and still have never come to a clear conclusion.
  So, given some data, say a file and/or directory, maybe from saving a Web page, that is relevant to subjects A, K, T, and Z, where in the file system directory trees to put that data?
  My solution: Put the data in a directory for one of the subjects A, K, T, or Z without thinking very hard about which of these. Then go to a file I call FACTS.DAT (right, an old idea with an old 8.3 file name!). I maintain that file with a few, simple editor macros. So, sure, the file is a catch-all for entries of random short facts. And each entry starts with a time-date stamp and a list of key words. So, in the case of subjects A, K, T, or Z, include the key words appropriate for each of those. Then in the body of the entry, put the tree name of the file/directory where I stored the data.
  In a few seconds with my favorite text editor I can append an entry or search for an entry.
  So far this year I have put 686 entries in the file FACTS.DAT for about 2.1 entries per day. For anything like current personal computers, handling such a file is trivial.
  The idea works great!
  
  LordDragonfang 4 years ago
  
  I feel like it just highlights how antiquated and confusing Linux terminology is that so many of those reference "single-user mode", used to refer to booting into root, when the vast majority of computing devices a given user will interact with only have a single actual user, making this a confusing and almost meaningless distinction to someone not already intimately familiar with Linux.
  
  chasil 4 years ago
  
  Oh, my young friend, you have no idea what POSIX has done to you.
  "While no one sane would put newlines in directory names, such corruption of the results could lead to exploitable vulnerabilities in scripts."
  http://www.etalabs.net/sh_tricks.html
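A rough sketch of the failure mode that quote describes, in Python rather than shell. The file names are made up for illustration. A listing split on newlines miscounts a name that contains one, while a NUL-separated listing (the idea behind `find -print0 | xargs -0`) does not, since NUL can never appear in a POSIX file name.

```python
# Hypothetical file names; the second one contains an embedded newline.
names = ["report.txt", "evil\nname.txt", "notes.txt"]

# Newline-delimited listing, the format many shell loops implicitly assume.
newline_listing = "\n".join(names)
broken = newline_listing.split("\n")

# NUL-delimited listing: NUL cannot occur inside a file name, so it is unambiguous.
nul_listing = "\0".join(names)
intact = nul_listing.split("\0")

print(len(broken))  # 4: "evil" and "name.txt" now look like two separate files
print(len(intact))  # 3: the original names survive
```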
  
  oblio 4 years ago
  
  He he.
  Want to see true craziness? POSIX file names are just a bag of bytes. They don't even have to be text, they can be anything (almost), there's no standard text encoding:
  https://lwn.net/Articles/325304/
  And in typical Open Source fashion, someone actually claims it's a feature: https://lwn.net/Articles/325398/ because hey, you 99.999% percenters can suffer so that I, 0.001% percenter can implement my wacky system.
  https://xkcd.com/1172/
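To make the "bag of bytes" point concrete, here is a small Python sketch. The file name is invented: Latin-1 bytes that are not valid UTF-8. Strict decoding fails, while the `surrogateescape` error handler (the one Python itself uses for file names on POSIX) lets the name round-trip without loss.

```python
raw = b"caf\xe9.txt"  # hypothetical name written by a Latin-1 program; not valid UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles each undecodable byte through as a lone surrogate code point...
name = raw.decode("utf-8", errors="surrogateescape")
# ...and turns it back into the original byte when encoding again.
round_tripped = name.encode("utf-8", errors="surrogateescape")

print(strict_ok)             # False
print(round_tripped == raw)  # True
```

The price is that `name` is not printable or storable as real UTF-8, which is more or less the complaint above.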
  
  chasil 4 years ago
  
  This appears to demonstrate the full range of abuse.
  $ mkdir hold
  $ cd hold
  $ cat ../wildname.c
  #include <stdio.h>
  int main(int argc, char **argv)
  {
    char n[256];
    int i,j=0;
    FILE *fp;
    for(i=1; i<256; i++) if(i!=47) n[j++] = i;
    n[j] = 0;
    if(fp = fopen(n, "w")) { fprintf(fp, "hello world!"); fclose(fp); }
  }
  $ cc ../wildname.c
  $ ./a.out
  $ ls -l
  total 16
  -rw-r--r--. 1 luser lgroup 12 Nov 11 16:32 ??????????????????????????????? !"#$%&'()*+,-.0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
  -rwxr-xr-x. 1 luser lgroup 8464 Nov 11 16:32 a.out
  Just because you can do something does not mean that you should.
  
  oblio 4 years ago
  
  It's software. Software's contract is the same as a legal contract. And a legal contract mostly says what you can't do.
  So anything not directly blocked by the software is allowed.
  Ergo, clear specifications, strict yet flexible types and APIs, etc.
  Otherwise, it's just bad design.
  
  ygra 4 years ago
  
  It's basically the same on Windows with NTFS. Just a bag of 16-bit words instead of bytes.
  
  tenebrisalietum 4 years ago
  
  The directories that house your executables are read only to users other than root, to prevent attacks and overwriting them by non-root users.
  /var stands for variable data--like log files, cache directories, spool directories, etc. You shouldn't put executables there. Ideally you should be able to set the noexec flag on it.
  `/usr` actually exists because the original UNIX developers ran out of disk space and had to attach another disk. The difference between /bin and /usr/bin is not worth it and even Debian symlinks /bin to /usr/bin.
  But your distribution's package manager should be putting stuff in /bin or /usr/bin, not you. Anything that matches the glob "*/local*" is something the system owner can do whatever with. So you should be using /usr/local/bin or $HOME/local/bin. I don't know why there's no /local off of the root. (One thing I do on my own systems is make and use an /etc/local although I think you're supposed to use something like /usr/local/etc).
  /opt is for third party programs that aren't installed via your distro's package manager.
  If you do this, any customizations you make to a system can be easily backed up by copying all dirs with local in the name.
  There's multiple decades of tradition behind these names, but they do date back to the age where actual teletypes were used.
  
  biryani_chicken 4 years ago
  
  You don't even need to rearrange the folders themselves, just show them like that in the file explorer. Same way the windows explorer does.
  
  0des 4 years ago
  
  Do you have any docs on how to do that? Thanks for the reply, I look forward to trying that.
  
  post-it 4 years ago
  
  MacOS too. /usr/ and /dev/ and whatnot exist, they're just flagged as invisible in Finder. There's a command to globally unhide them for those who want to see them.
  
  riccardomc 4 years ago
  
  Why not just symlink them? You can have best of both worlds with relatively little effort.
  Make the overlay of your dreams!
  
  anyfoo 4 years ago
  
  Is it coincidence that you almost exactly replicated what macOS has? Except that /Devices is /Volumes, .../Apps is .../Applications, and /Boot is handled differently.
  Of course, that's not perfect either, because a) decades of changes vs. compatibility have made it less clean in certain places, and b) pretty much all the POSIX paths still exist for unix-y compatibility, but overall it's like that.
  
  e0a74c 4 years ago
  
  Couldn't you do it with plain old symlinks?
  
  matheusmoreira 4 years ago
  
  > I wonder how hard it would be to rearrange the folder structure in linux
  Restructuring the directories is the easy part. You just delete the old tree and make a new one. You can also mount procfs and sysfs wherever you want.
  The hard part is modifying existing software to work with the new tree. So many programs assume you have a "standard" file system tree. So many programs assume procfs is mounted at /proc. So many programs have hardcoded paths. Shared library locations can become part of the binaries when they're compiled. It's insane and you'd essentially be creating a new Linux distribution.
  
  sixothree 4 years ago
  
  I know this is completely tangential. But you can Win-R and just type Documents and it will load your documents folder. Same for downloads, pictures, temp (windows temp), and I'm sure many others.
  Works from File-Open dialogs and address bars and even in the command prompt you can even do "explorer documents".
  
  thedday 4 years ago
  
  Yeah, it's a junction point, but it's also useless. Open a command box and CD to it; now what? A file explorer and set it as the directory, again, now what?
  
  WalterBright 4 years ago
  
  Nothing says progress like renaming all your paths.
  
  gary_0 4 years ago
  
  In the Win95 era, it was "C:\My Documents".
  
  grishka 4 years ago
  
  Huh, spaces. There's way too much software, especially on Windows, that breaks when there are Cyrillic characters in a path. I'll let you guess how I found out.
  
  DarkWiiPlayer 4 years ago
  
  A friend had the username "Rubén" and jfc it broke everything other than windows itself xD
  
  dhosek 4 years ago
  
  The problem isn't the Cyrillic or the é but the fact that Windows lets you put those characters in file names in non-Unicode encodings which will create sequences of bytes which are invalid UTF-8. It's 2021, FFS, stop using legacy encodings.
  
  grishka 4 years ago
  
  All win32 functions that accept or return strings come in two varieties, with A and W suffixes, MessageBoxA/MessageBoxW. The A works with the system default 8-bit encoding (cp1251 in case of Cyrillic), the W works with unicode in wide chars. There shouldn't be much of a problem with string handling if you stick exclusively with W functions.
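A quick way to see why the A variants bite, sketched in Python with its built-in codecs standing in for the system code page (the two names are just examples from this thread). Each 8-bit code page can represent one of them but not the other, while UTF-16, which the W functions use, can represent both.

```python
def encodable(text, encoding):
    """Return True if `text` can be represented in `encoding` without loss."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

names = ["Rubén", "Привет"]  # example strings only

for n in names:
    # cp1251 = Cyrillic code page, cp1252 = Western European code page
    print(n, encodable(n, "cp1251"), encodable(n, "cp1252"), encodable(n, "utf-16-le"))
```

So a program calling the A functions works for whichever users happen to share its code page and breaks for everyone else.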
  
  ziml77 4 years ago
  
  Using the W functions has been the advice from Microsoft's documentation for ages. But people still use the A functions because they're easier, especially when writing cross-platform software since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide.
  Fortunately the future of the Windows API does look better since Microsoft added proper UTF-8 support in Win 10 1903. All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.
  
  mjevans 4 years ago
  
  I would rather they added a U suffixed version and better still backported that all the way to Win 7. Now in 3-7 years people can write programs that use the A functions, but have to check the version of Windows and refuse to run if it isn't new enough.
  
  colejohnson66 4 years ago
  
  There’s been some talk of repurposing the A variants to work on UTF-8
  
  grishka 4 years ago
  
  > since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide
  Apple OSes use something they call "unichar" inside NSStrings. I'm not 100% sure what it is, but it feels like it's the same 16-bit wide character.
  
  ziml77 4 years ago
  
  It's possible! It seemed like a sensible choice back in the early 90s when the answer to making a system for global use was UCS-2. I know Java was another one that went with that decision.
  
  account42 4 years ago
  
  > All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.
  They really should have gone with WTF-8 [0] since the W functions generally accept WTF-16 and not just the valid UTF-16 subset.
  [0] https://simonsapin.github.io/wtf-8/
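For anyone wondering what the difference looks like in practice, here is a small Python illustration. To be clear, this is not a WTF-8 implementation; Python's `surrogatepass` error handler is only borrowed here to show the kind of value (an unpaired surrogate, which a Windows file name may contain) that strict UTF-8 refuses to represent.

```python
lone = "\ud800"  # an unpaired high surrogate

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encode the surrogate code point as if it were an ordinary code point.
generalized = lone.encode("utf-8", errors="surrogatepass")

print(strict_ok)    # False: strict UTF-8 has no encoding for it
print(generalized)  # b'\xed\xa0\x80'
```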
  
  DnDGrognard 4 years ago
  
  I had a really odd one last year where a Grave I (a well-known brand name) got converted by Office/Excel into a Double Grave I.
  The double grave I is used by some obscure Orthodox religious texts
  
  kaba0 4 years ago
  
  If you have a username with your full name (plus point if you have special characters in your name), you will get the whole deal with shitty programs. I’m not sure if it’s me, but there were cases I simply could not use a program installed in such a location, to the point where at my previous (admittedly shitty) workplace, we often installed software in a root location…
- 323 4 years ago
  
  Laughs in C:\PROGRA~1\ (try it, still works in Windows 10)
  
  the_mitsuhiko 4 years ago
  
  There is no guarantee that the short name has that. In fact on a lot of German Windows installations it was PROGRA~2.
  
  323 4 years ago
  
  Well, on my disk PROGRA~1 is "Program Files" and PROGRA~2 is "Program Files (x86)", so still works :)
  
  floatingatoll 4 years ago
  
  That order is not guaranteed consistent across installations, however.
  
  marginalia_nu 4 years ago
  
  I wonder if code to this effect has ever been written before
  for (int i = 1; i < INT_MAX; i++) { if (dirExists("C:\\PROGRA~%d\\ProgramName", i)) {
  
  gmfawcett 4 years ago
  
  And that, children, is when marginalia_nu unlocked the seventh circle of the inferno. Tomorrow we'll read the story of how our new demon overlords forced us all back to Windows 3.1.
  
  jagged-chisel 4 years ago
  
  Win 3.1? on DOS 6.22? Actually, this sounds like heaven. Just don't put it on the public 'tubes.
  
  floatingatoll 4 years ago
  
  Or do. Can’t hack a Mac Classic web server!
  
  marginalia_nu 4 years ago
  
  Got to tweak HIMEM.SYS before the slumbering one can be awakened.
  
  aksss 4 years ago
  
  PEEK and POKE could break the HIMEM.
  
  floatingatoll 4 years ago
  
  For whatever it’s worth, this is a terrible idea, for so many different reasons:
  https://web.archive.org/web/20100107184218/http://blogs.msdn...
  And so, yes, I’m certain someone must have done it, because it’s clearly bad idea jeans and so Murphy’s Law says it must exist.
  
  benibela 4 years ago
  
  Can that work for i > 9 ?
  
  floatingatoll 4 years ago
  
  If you mkdir PROGRA~10, yes!
  
  TowerTall 4 years ago
  
  And on mine (Windows 11) "PROGRA~3" is "ProgramData"
  
  selfhoster11 4 years ago
  
  Truly lifesaving for when she'll quoting gets in the way.
  
  kijin 4 years ago
  
  You've got a stray single quote in your shell. :)
  
  selfhoster11 4 years ago
  
  That was a typo, but it seemed like a perfect illustration of my point, so I left it in.
  
  Someone 4 years ago
  
  Typo? I would guess it’s autocomplete at work. iOS does that all the time for me.
  
  Someone 4 years ago
  
  Apart from what others mentioned, that can only work if the file system automatically creates 8.3 names. NTFS does not necessarily do that (https://docs.microsoft.com/en-us/windows-server/administrati...)
- hetspookjee 4 years ago
  
  I wonder how much global work could have been saved if Microsoft also provided a covered interface for all paths in the system. Not sure if there is any, but one good implementation might save thousands of poor implementations required to handle it.
  
  moontear 4 years ago
  
  You mean like the Environment.SpecialFolders enum?
  https://docs.microsoft.com/en-us/dotnet/api/system.environme...
  There are several other classes that take care of getting folders, least of which checking system variables.
  
  Too 4 years ago
  
  You have %Appdata% and friends.
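As a sketch of what "ask the environment instead of hardcoding" can look like, here is a tiny Python helper. The function name and the `windows` flag are made up for this example. It uses %APPDATA% on Windows and the XDG_CONFIG_HOME convention (falling back to ~/.config) elsewhere, and takes the environment as a parameter so it can be exercised without touching the real one.

```python
from pathlib import PurePosixPath, PureWindowsPath

def user_config_dir(app, env, windows):
    """Hypothetical helper: per-user config directory for `app`, derived from `env`."""
    if windows:
        # %APPDATA% already points at the right per-user folder, whatever its name or locale.
        return PureWindowsPath(env["APPDATA"]) / app
    base = env.get("XDG_CONFIG_HOME") or str(PurePosixPath(env["HOME"]) / ".config")
    return PurePosixPath(base) / app

win = user_config_dir("My App", {"APPDATA": r"C:\Users\Rubén\AppData\Roaming"}, windows=True)
nix = user_config_dir("My App", {"HOME": "/home/rubén"}, windows=False)
print(win)
print(nix)
```

In real code you would pass `os.environ`; the point is only that the user name, spaces and locale never get typed into the program.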
- lamontcg 4 years ago
  
  Then they made poor APIs so that you have to do this to get it correct:
  https://docs.microsoft.com/en-gb/archive/blogs/twistylittlep...
  In *nix at least you can call execve or other APIs that take a char *argv[] and the whole problem is largely solved and you don't need to quote things.
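The same argv-style approach is available from most languages. A minimal Python sketch, with an invented file name: passing a list and no shell means each element reaches the child as exactly one argument, so nothing needs quoting.

```python
import subprocess
import sys

awkward = "my file (final) 'v2'.txt"  # hypothetical name with spaces, parens and quotes

# A list plus no shell: each element becomes one argv entry in the child process.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", awkward],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.rstrip("\r\n")
print(received == awkward)  # True
```

On Windows Python still has to join the list into one command-line string behind the scenes, which is the very problem being complained about here.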
- dan-robertson 4 years ago
  
  On the other hand their case sensitivity behaviour means that “cross-platform” Java applications can break if they are run on a non-windows platform where opening files is case sensitive (unlike on windows)
  
  908B64B197 4 years ago
  
  It's actually a feature.
  Easier to add a flag to ignore case rather than fix bugs where files only differ by case and are therefore overwritten on a case-insensitive filesystem.
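If you want to catch that class of bug before it reaches a case-insensitive machine, a check along these lines is cheap to run in CI. This is a sketch: the function name is made up, and `str.casefold()` only approximates what NTFS or APFS actually do when comparing names.

```python
from collections import defaultdict

def case_collisions(names):
    """Group names that differ only by case (approximated with str.casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

print(case_collisions(["README.md", "readme.md", "Makefile"]))
# [['README.md', 'readme.md']]
```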
- henrikschroder 4 years ago
  
  C:\PROGRA~1
  Easy fix!
- billti 4 years ago
  
  And then to really mess you up and ensure you handle parens properly, threw “(x86)” into the mix. (A real pain on some REPLs as well as dealing with environment variables).
- gattilorenz 4 years ago
  
  Funny, in the Italian Win9x it is C:\Programmi, which I always thought was more convenient because of the lack of spaces :)
- cerved 4 years ago
  
  Sure. Microsoft only ever ships features
- antihero 4 years ago
  
  Shame it wasn't
  > C:\P̷̧̽r̸̬͘ŏ̵̮g̷̜͘r̸̦̋a̴͎̒m̶̲̈́ ̷̠̉F̵͇̈ĩ̴̫l̶̨͗ë̵̦s̸͚͆\
- vesinisa 4 years ago
  
  Except for programs that were too old / obscure to fix I guess. I think at least the Symbian Development Kit was such that builds would fail with strange errors if you installed it in any path other than the default immediate subdirectory of C:\, let alone under "Program Files".
- Matthias247 4 years ago
  
  It not only keeps people on their toes due to the whitespace. The folder name is even localized. E.g. with German settings there is C:\Programme and C:\Programme (x86).
  
  spacechild1 4 years ago
  
  You can still use the English names, though.
- drdeca 4 years ago
  
  I know that at least like, idk like 3-5 years ago, when I had gotten a new windows laptop (windows 7 or 8 I think), setting the main account to have the name "" (without the quotes), caused some problems with the basic functioning, including, I think, with some pre-installed programs,
  So, some things were still being handled not quite right (whether that's because it shouldn't be allowed to be the username, or because programs should handle it being in the path, I'm not sure, but probably one of those.)
- anarazel 4 years ago
  
  I just wish they had a decent way to execute programs with arguments that might include spaces. But no, every program can do argument delineation differently.
  
  account42 4 years ago
  
  And Microsoft even provides three different slightly incompatible ways to parse arguments: CommandLineToArgvW, the CRT and cmd.exe.
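To get a feel for one of those rule sets, Python's subprocess module carries a helper, `list2cmdline` (not part of its documented public API, so treat this as a curiosity rather than something to build on), that joins an argument list using the MS C runtime conventions. The paths are just examples.

```python
import subprocess

args = ["tool.exe", "C:\\Program Files\\App", "plain"]

# Joins the list into one command-line string following the MS C runtime quoting rules.
cmdline = subprocess.list2cmdline(args)
print(cmdline)  # tool.exe "C:\Program Files\App" plain
```

A program that parses its command line differently (cmd.exe itself, for one) may not split that string back into the same three arguments.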
- zerr 4 years ago
  
  Could you please link the reference?
- 8bitsrule 4 years ago
  
  At one time there was no number 0. Half of binary was missing.
- zaphirplane 4 years ago
  
  There was a short path name IIRC like prog~1
kitkat_new 4 years ago

Pro tip2: Use std lib path processing utilities
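A short sketch of what that looks like in Python with pathlib. The paths are invented, and PurePosixPath is used so the example behaves the same on any OS. The path object keeps spaces and parentheses intact, while naive string splitting tears the path apart.

```python
from pathlib import PurePosixPath

project = PurePosixPath("/home/dev/my project (v2)")  # hypothetical directory
config = project / "this is my config.txt"

print(config.name)    # this is my config.txt
print(config.suffix)  # .txt
print(config.parent)  # /home/dev/my project (v2)

# The naive alternative: treating the path as whitespace-separated text.
naive_pieces = str(config).split()
print(len(naive_pieces))  # 6 fragments instead of one path
```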
jeffwask 4 years ago

It doesn't even have to be complex, often basic automation tasks fail with spaces and special characters. Honestly, treating a file system like a natural language processor is a bad idea. Besides at this point with how digital we have all become who can't understand...
thisismyconfig.txt vs this is my config.txt or this_is_my_config.txt
...I've forced myself to stop using spaces, special characters, and even caps. They are all constructs that provide minimal value for the extra complexity.
- long_time_gone 4 years ago
  
  > thisismyconfig.txt vs this is my config.txt or this_is_my_config.txt
  Just wondering, what is the readability of this for people who are dyslexic?
  
  reaperducer 4 years ago
  
  Or in my case, people for whom English is a second language, or have low education levels.
  Saying, "who can't understand..." is arrogant, selfish, and an example of why normal people hate people in the SV echo chamber.
  
  long_time_gone 4 years ago
  
  > Saying, "who can't understand..." is arrogant, selfish, and an example of why normal people hate people in the SV echo chamber
  Exactly how I feel every time Economics is brought up on HN.
  
  throwaway2077 4 years ago
  
  SV echo chamber is on your side here - it is very in vogue to denounce anglocentrism. they were defending hieroglyphs and emoji in variable names in that thread about invisible javascript backdoor a day or two ago if you'd like a recent example
  
  dang 4 years ago
  
  Could you please stop posting ideological battle comments to HN? We ban accounts that do that, regardless of their ideology, because it's (a) not what this site is for, and (b) destroys what it is for.
  If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
  
  danlugo92 4 years ago
  
  Agreed.
  But Hacker News should do something about all of the anti-bitcoin and anti-anti-nuclear ideologies running around in here.
  I don't really mind it that much but it'd be nice, it's really the only 2 extremisms I've experienced here, all other subjects are discussed in a fair manner.
  
  beambot 4 years ago
  
  I appreciate informed discussion about bitcoin & nuclear, as both topics are highly relevant to the technical, business, and hacker roots of HN. They seem distinctly different from, say, "anglocentrism" @dang was calling out.
  
  danlugo92 4 years ago
  
  > discussion
  There's no such thing as fair discussion about those topics here.
  
  Hallucinaut 4 years ago
  
  What does "fair" mean in this usage? If it means one position attracts a lopsided balance of comments either for or against then surely that's always going to be the case?
  Otherwise what is your proposition, don't state any opinion unless you find a counter opinion commenter to match with?
  Lots of folk here are pro-privacy and lots of folk are anti-bitcoin (and some of them will be the latter because they're the former) so I don't understand how you'd extend your position in a way that leaves HN with any value.
  
  jeffwask 4 years ago
  
  cestmaconfig.txt vs cest ma config.txt vs cest_ma_config.txt
  It's the same in any language.
  Hugs who hurt you.
  I'm also pretty sure most of us in any language use Slack, SMS or other forms of communication where text isn't necessarily presented in a grammatically correct manner and we all figure out what the person is saying.
  
  JCharante 4 years ago
  
  I'm not sure, but my gut instinct is that it wouldn't help. Dyslexia rates are much lower in China, so I suppose we could start naming files with Chinese characters (on systems that support Unicode). It would take a bit to get used to, but eventually we'd develop a pidgin language for when we talk about software, much like how if you overhear Chinese or Vietnamese developers they will mix in English words like "linked list" into their sentences, because there's not a more natural sounding alternative.
  Switching to Chinese would also help eliminate the spaces issue.
  
  teorema 4 years ago
  
  tbh I'm not dyslexic and realized the spaces make it really difficult to know what the filename actually is. If you just take the second example, how would you know if the file was "this is my config.txt" versus "config.txt"?
  Aside from parsing errors it just seems to lend itself to ambiguity.
  
  vertere 4 years ago
  
  This. People are saying spaces improve ergonomics. Unless everyone always quotes their paths in documentation, emails, etc -- which they won't -- I say it actually reduces readability.
  Also, programs that automatically turn paths into links don't work with spaces.
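One workaround when a path has to survive auto-linking is to write it as a percent-encoded file URI. A minimal Python sketch, with a made-up path:

```python
from urllib.parse import quote, unquote

path = "/home/me/house plans/this is my config.txt"  # hypothetical path

# quote() leaves "/" alone by default and turns each space into %20.
uri = "file://" + quote(path)
print(uri)  # file:///home/me/house%20plans/this%20is%20my%20config.txt

# Decoding gets the original path back.
recovered = unquote(uri[len("file://"):])
```

It is unambiguous for machines, though arguably even less readable for humans than underscores.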
- 400thecat 4 years ago
  
  > treating a file system like a natural language processor is a bad idea
  could you please explain what you mean by that?
- rch 4 years ago
  
  I'm similar, but I would like to support labels intended for humans, along with various translations, as metadata on top of e.g. filesystem path components.
  
  fouric 4 years ago
  
  You nailed it - getting rid of spaces and dashes and underscores is extremely human-hostile. People added spaces to the English language for a reason, and that's because they make it way easier to read.
  Your system is only intended for other programs to interact with? Go nuts, make hex UUIDs. Actual people are supposed to use it? You need separator characters.
  I also don't see how those characters add "extra complexity" unless you're doing dumb things like text processing on paths and filenames (as opposed to using OS/library functions that handle paths correctly) - in which case, there's your problem.
- Too 4 years ago
  
  Why stop there. A computer works more efficiently with numbers rather than strings, so let’s just give each file a number instead of a string. Besides, at this point with how digital we have all become who can’t understand… But wait, that already exists and is called an inode.
  A file system has a human interface and a computer interface. Don’t mix them. Let users give file names in whichever way they please.
Izkata 4 years ago

> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
A former co-worker changed his name in our auth system to include an apostrophe, so that whenever we handled names wrong he'd find it.
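One place an apostrophe like that tends to surface is hand-built SQL. A sketch using Python's built-in sqlite3 (the table and the name are invented, and this is only one of many places it can bite): the parameterized insert stores the name untouched, while the string-pasting version breaks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "Miles O'Brien"  # hypothetical user with an apostrophe

# Parameterized query: the driver passes the value separately from the SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

# String pasting: the apostrophe ends the SQL literal early and the statement no longer parses.
try:
    conn.execute("INSERT INTO users (name) VALUES ('" + name + "')")
    pasted_ok = True
except sqlite3.OperationalError:
    pasted_ok = False

print(stored == name, pasted_ok)  # True False
```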
- geoduck14 4 years ago
  
  Oh, I like this!
- floatingatoll 4 years ago
  
  I set my nickname to U+FFFD at one point in one work system, resulting in a variety of bug reports and concerned emails. I think I dropped it since it was generating false reports from people who didn't check what character the page contained before reporting it.
- ridaj 4 years ago
  
  Áčçëñts hęlp tóø
  
  ygra 4 years ago
  
  For anyone curious, this is called Pseudo-localization (https://en.wikipedia.org/wiki/Pseudolocalization). I first stumbled across this in Raymond Chen's blog.
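A toy version is only a few lines. This Python sketch is an assumption about what a minimal pseudo-localizer might do, not how any particular tool does it: it swaps some ASCII letters for accented look-alikes and wraps the result in brackets so truncated strings are easy to spot.

```python
# Map a handful of ASCII letters to accented look-alikes (an arbitrary choice for this sketch).
ACCENTED = str.maketrans("aceinouACEINOU", "áçéíñóúÁÇÉÍÑÓÚ")

def pseudolocalize(text):
    """Return a still-readable, non-ASCII variant of `text`, bracketed to reveal truncation."""
    return "[" + text.translate(ACCENTED) + "]"

print(pseudolocalize("Accents help too"))
```

Run your UI strings through something like this in a test build and any code path that assumes ASCII shows up quickly.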
- reaperducer 4 years ago
  
  One of the systems I built is being used by a group of younger people. I included an emoji in the superuser account name, just to make sure it would work. And to remind me to think more broadly about user input.
- ajmurmann 4 years ago
  
  A related tip for CI: change the system time to be a time zone that is during your work hours in a different day already than UTC. Really helped getting failures earlier than 4pm PST.
  
  brundolf 4 years ago
  
  At my last job we had a wild time-zone bug that only happened with your system location set to Mumbai. I left mine set to that for the rest of my time there.
  
  cpeterso 4 years ago
  
  Related: here's a recent Firefox bug about a test that failed during the daylight saving time change:
  https://bugzilla.mozilla.org/show_bug.cgi?id=1739847
  
  scubbo 4 years ago
  
  Could you consider rephrasing this? It sounds like an interesting observation that I'd love to understand, but I'm genuinely not able to parse it.
  My best guess is "change the system time to be a timezone for which, during your work hours, the other-timezone is in a different day than UTC is" - but I'm still not sure what effect that would have on CI failures.
  
  Teknoman117 4 years ago
  
  I read that as "set your CI to run earlier in your workday so you don't get new error reports at the end of the day." Midnight UTC being 4 pm/16:00 PST.
  
  ajmurmann 4 years ago
  
  Maybe an example of the failure this detects helps: when I used to work on Rails apps in the olden days it was easy to call Time.now and get the local time instead of Time.zone.now to get UTC time. This often led to wrong dates, but tests would only fail once it was a new day in UTC land but still the old day in the local time zone. Making the CI machine's system time something like Fiji time really helped in getting failures much sooner after changes were pushed.
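The effect is easy to reproduce outside Rails. A Python sketch with a fixed instant and fixed UTC offsets standing in for real time zones (so no DST, and the date is arbitrary): at 10am Pacific the Pacific and UTC calendar dates still agree, so a local-vs-UTC mix-up goes unnoticed, whereas a machine on Fiji-like time already disagrees with UTC.

```python
from datetime import datetime, timedelta, timezone

# 18:00 UTC is 10:00 in a UTC-8 zone: mid-morning for a West Coast developer.
instant = datetime(2021, 11, 11, 18, 0, tzinfo=timezone.utc)

pacific_like = timezone(timedelta(hours=-8))  # fixed offset standing in for PST
fiji_like = timezone(timedelta(hours=12))     # fixed offset standing in for Fiji

print(instant.date())                           # 2021-11-11
print(instant.astimezone(pacific_like).date())  # 2021-11-11, same as UTC: bug hidden
print(instant.astimezone(fiji_like).date())     # 2021-11-12, differs from UTC: bug visible
```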
- enragedcacti 4 years ago
  
  To have such thoughtful coworkers. On an old team I had two coworkers named Chris and once in a blue moon when they reviewed each other's code, master would start crashing because one of them accidentally left in an absolute path starting with "/home/chris/".
- curuinor 4 years ago
  
  the proper name of the glorious sultan of slack, j. r. "bob" dobbs, has the quotation marks and therefore is a great subject for this
- fernandotakai 4 years ago
  
  my test accounts always have emojis + accents + other weird characters.
  it keeps everybody on their toes lol.
- qwertox 4 years ago
  
  I add a Japanese character into any .py, .js and .html file to ensure that Unicode is working properly through the entire chain. Mostly in form of a variable which gets passed along, even in URL parameters.
- soheil 4 years ago
  
  Obligatory xkcd https://xkcd.com/327/
- ygra 4 years ago
  
  I used to have a space in my user name and even contemplated adding a bit of non-1252 Unicode. You find a lot of issues, but unfortunately often in tools you have little control over and end up not being able to work effectively at times. It ended up being more frustrating than helpful.
shane_b 4 years ago

My Mac is formatted case sensitive when the default is case insensitive. This will also catch a ton of import related bugs.
League of legends doesn’t run until I sed files for instance.
- memsom 4 years ago
  
  I once returned a printer because the Mac driver and support software expected and enforced case insensitive access and basically couldn't install properly on my case-sensitive HFS+ volume. It half installed and blatantly just didn't work in any way when installed.
  
  NegativeLatency 4 years ago
  
  Adobe software used to refuse to install on case sensitive file systems back in the not too distant past.
- dunham 4 years ago
  
  Circa Y2k, I learned that the OSX Palm Pilot software didn't work with case sensitive. I've since given up and stuck with the default. (I'm anti-case folding in general, because of the ambiguity.)
- mdaniel 4 years ago
  
  I also enjoyed doing that, but had to make a DMG just for Steam because it straight-up refuses to run on a case sensitive FS (that's true on Windows, also, which I suspect is how we all got here). I think the most recent Steam versions either caught wind of my trickery or -- more likely -- run something from $HOME/Library/SomethingOrOther and thus the work-around no longer works
  When I got a new Mac, I just gave up and acquiesced to the case-retentive world :-(
- deckard1 4 years ago
  
  I have coworkers on Mac that write node/JS code. Every once in awhile I'd pull down the latest code and it wouldn't run. I'm on Linux.
  Sure enough, they had SomeFile and were importing Somefile and it works fine on Mac but not on Linux (which, of course, is what our production servers use). It amazes me that "works fine on my machine" is still a thing when I definitely worked at companies that solved this back in the 2000s. It was solved. It was done. Then devs became enamored with running everything locally. Even dozens of microservices or databases. Even though JS is fairly isolated, you still have NPM packages that need built against the local OS and C/C++ library and compilers, etc. Which also has caused issues in the past.
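A check like this can flag the problem even on a case-insensitive machine. It is a Python sketch with a made-up helper name: the directory listing reports each name with the casing it was stored under, so comparing against the listing is stricter than asking "does this path open?".

```python
import os
import tempfile

def exists_with_exact_case(directory, filename):
    """True only if `filename` appears in `directory` with identical casing."""
    return filename in os.listdir(directory)

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "SomeFile.js"), "w").close()
    right = exists_with_exact_case(tmp, "SomeFile.js")
    wrong = exists_with_exact_case(tmp, "Somefile.js")

print(right, wrong)  # True False, whether or not the filesystem is case sensitive
```

Wired into a lint step over import paths, something like this would catch the SomeFile/Somefile mismatch before it reaches a Linux box.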
  
  speedgoose 4 years ago
  
  Good news, we have solutions. You could use continuous integration and software containers like Docker.
  
  fouric 4 years ago
  
  Does Docker abstract filesystem behaviors like this? I always thought that it stopped at the libc level - that is, libc is included in the container, but it calls the host kernel's system calls, and so inherits the host kernel's behavior (including things like underlying filesystem case sensitivity).
  
  handrous 4 years ago
  
  Docker relies on LXC, so it's Linux-only. On other platforms it runs in a Linux VM. The host for Docker, then, is Linux no matter where you are.
  
  zokier 4 years ago
  
  > Docker relies on LXC, so it's Linux-only.
  Docker hasn't supported LXC since 2016, and stopped relying on it in 2014
  https://www.docker.com/blog/docker-0-9-introducing-execution...
  
  handrous 4 years ago
  
  I thought the name for the collection of kernel features was LXC, I didn't realize (until just now) that was the name only for the also-kernel-level wrapper for those features, which name does not cover the features themselves. That is, I didn't realize that LXC is to Cgroups+Namespaces as Libvirt is to KVM—I thought LXC, as a label, covered the whole feature-set—but regardless, it's still married to Linux kernel features and runs on other platforms under virtualization, no?
  
  zokier 4 years ago
  
  > it's still married to Linux kernel features and runs on other platforms under virtualization, no?
  Actually no. At least on Windows Docker can do native Windows containers too
  https://poweruser.blog/lightweight-windows-containers-using-...
  
  jatone 4 years ago
  
  my favorite is often being the only developer on linux and giving two files with different casing and watching their systems crash and burn.
echelon 4 years ago

Better solution: only allow ASCII, maybe dashes, and up to twelve characters. Problem solved.
Enforce this in LDAP.
Strict convention is better than flexibility and predicting obscure edge cases that can fail.
- pimterry 4 years ago
  
  In my case, and for many people writing desktop software, and for absolutely everybody writing open-source tools or libraries, unfortunately you can't control the environment.
  Non-ASCII paths are extremely common (e.g. the user's home directory on Windows, for the large majority of users outside the English-speaking world) and spaces, punctuation and weirder characters will definitely happen when you least expect it.
  Yes if you can avoid it then absolutely that's great, but I don't think most people can.
  It's also not usually very difficult to deal with, as long as you actually spot the issue in the first place.
- MayeulC 4 years ago
  
  Ah, that's the enterprise edition.
  But then your program will crash hard and unexpectedly when a user decides to save under "~/house plans" or ~/Téléchargements.
  I think it's better to exercise this in CI, that's what CI is for.
- mikepurvis 4 years ago
  
  Ugh, we have the 15 character Active Directory limit now with hostnames, and a previous IT administration has imposed a convention that every name had to follow [prod|dev]-[ph|vm]-[service]-[nn]. So basically every production service is prod-vm-owtf-01— you get exactly four characters to actually describe what the machine does. Works great when the service is "jira" or "wiki", but there are a lot that are pretty mystical-sounding, like jkns, jwrk, cntr, hrbr, etc, where you kind of just have to know.
  
  HNo 4 years ago
  
  I kind of like that honestly. No doubt you need some documentation so everyone knows what the service abbreviations are, but after you've been working there for a month you get it. Makes everything clean, consistent, and informational. You can quickly ascertain what a specific host is doing just from the name.
  
  mikepurvis 4 years ago
  
  Oh absolutely it makes sense to have a standard, and being able to tell at a glance if something is a VM or physical machine is of value also. But dedicating 2/3s of the character budget to such a scheme is madness. If the prod-vm- prefix simply became pv-, then you'd at least be able to do pv-jenkins-01 again.
  Anyway, all this was fine when we were on LDAP rather than Active Directory. So basically it's all Windows' fault.
  
  icedchai 4 years ago
  
  Do they at least allow you to set up CNAMEs?
  
  mikepurvis 4 years ago
  
  Yes, and for many of the web-serving machines, that's what happens, they're jenkins.example.com or containers.example.com or similar. But often a singular service is backed by hidden worker nodes, databases, whatever else, and it seems silly to give those machines that level of indirection vs just using the hostname as their sole identifier.
- reaperducer 4 years ago
  
  only allow ASCII, maybe dashes, and up to twelve characters. Problem solved
  ...and only hire people from the exact same background as you, who will never have unusual characters or accents in their name. And also make sure not to have any users who aren't exactly like you, and conform to this very narrow requirement. Surely, excluding 90% of the world won't hurt revenue in any way.
  
  echelon 4 years ago
  
  Snarky, but I'll take it.
  Use strict schema for the hardware interface, networking, physical stuff the user never sees. Microservice names don't need to be non-Latin. Database replicas, infrastructures, etc. And you're not going to piss off employees by giving them ASCII ldap/email addresses.
  Use utf8mb4 or similar for storing names. Don't state "first" or "last". I've been through this rodeo too many times. You're not surprising anyone.
  
  numpad0 4 years ago
  
  UTF-8 strings aren’t reproducible anyways. User ID should be strictly for identification, be alphanumeric random string if necessary.
  
  stopagephobia 4 years ago
  
  This is not excluding? I just use an ascii canonicalized version of my name and works fine.
  
  badsectoracula 4 years ago
  
  You can use an "ASCII-fied" version of the name, only ~27% of mine can be typed in ASCII letters that look similar but the rest is just phonetically or visually close-enough letters. This is something people did for decades and nowadays even government IDs have an ASCII-fied (well, Latin-fied) version of the name.
achn 4 years ago

I maintain a similar system, where a variety of companies submit files that get processed through multiple services - it is astounding how ridiculous people’s naming of files can be; spaces are the least concerning!
josteink 4 years ago

> it's been astounding the number of bugs that have appeared over the years triggered by spaces and other unusual characters in file names
If you consider spaces “unusual” I would say you haven’t encountered a single average user in your lifetime. Spaces in file names are the single most common thing people have, outside programming environments.
As an x-plat developer, the only platforms where I (still) regularly encounter these kinds of bugs are those where solving problems through scripting is common, like Linux, where the primary means of operation is stringly-typed statements getting parsed and processed in an untyped fashion. It's not very reliable.
On Windows people more often use “real APIs” (because scripting doesn't really work as well), but then these problems just go away.
Pros and cons, I guess.
- SAI_Peregrinus 4 years ago
  
  It's especially funny that it affects Linux so much. Most file systems allow everything except `/` and NULL in file names. Early AT&T UNIX even allowed NULLs! POSIX shells use the IFS variable to perform field splitting, and it defaults to <space>, <tab>, and <newline>. The choice to perform field splitting by default (particularly with spaces in the default IFS set) has caused no end of headaches for developers and users.
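A rough illustration of the field-splitting problem described above, sketched in Python rather than shell. Splitting a command string on whitespace is only an approximation of what an unquoted expansion does under the default IFS, and the file name is made up for the example:

```python
import shlex

filename = "house plans.txt"  # hypothetical file name containing a space

# Naive whitespace splitting, similar in spirit to unquoted shell expansion
# with the default IFS (space, tab, newline): the name becomes two fields.
naive_fields = f"cat {filename}".split()

# Quoting the name first keeps it as a single field when parsed shell-style.
quoted_command = f"cat {shlex.quote(filename)}"
quoted_fields = shlex.split(quoted_command)

print(naive_fields)   # ['cat', 'house', 'plans.txt']
print(quoted_fields)  # ['cat', 'house plans.txt']
```

The second form is what quoting `"$f"` buys you in a shell script: the name survives as one argument.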
KronisLV 4 years ago

> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
This will also break any code in external tools that are called during the builds of your application and do not handle spaces correctly for whatever reason, thus making it so that you won't be able to successfully finish the build.
Then again, you probably shouldn't be relying on technologies like that, but when you're struggling to keep an old enterprise system alive, causing yourself more problems is not necessarily what you should do.
Still a good idea in most cases, though.
chris_wot 4 years ago

And yet OneDrive won't allow for spaces before or after a file name.
- alx__ 4 years ago
  
  I spent hours trying to figure out why an entire folder suddenly stopped syncing. Turns out I accidentally added a hidden space to the end of a folder name.
  
  chris_wot 4 years ago
  
  Yup, their UI sucks when it comes to sync errors.
sysadm1n 4 years ago

> other unusual characters in file names
Saw a few hacks where malware authors used the RTL feature (which is baked into Windows) to obfuscate file extensions. It looked like .exe.innocuous-document.docx, but was actually .docx.innocuous-document.exe
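A minimal sketch of how such a name could be flagged, assuming the trick relies on U+202E (RIGHT-TO-LEFT OVERRIDE) or a related bidi control character. The helper name and the sample file name are hypothetical:

```python
# U+202E RIGHT-TO-LEFT OVERRIDE and related bidi control characters.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
    "\u200e", "\u200f",                                # LRM / RLM marks
}


def has_bidi_controls(name: str) -> bool:
    """Return True if the file name contains any bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in name)


# Hypothetical malicious name: the real extension is ".exe", but the text after
# U+202E is displayed reversed, so a bidi-aware UI may show it ending in "docx".
suspicious = "innocuous-document\u202excod.exe"

print(has_bidi_controls(suspicious))     # True
print(has_bidi_controls("report.docx"))  # False
print(suspicious.endswith(".exe"))       # True
```

Rejecting these characters outright would also reject some legitimate right-to-left file names, so warning is probably safer than blocking.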
- masklinn 4 years ago
  
  This exact vulnerability in most modern code editors just made the rounds, allowing smuggling malicious code right through review.
mwcampbell 4 years ago

My favorite filename special character bug was when I implemented CD ripping in 2005, and one of our beta testers ripped a CD with a song called "Have You Ever?". My code wasn't prepared to filter out the question mark on Windows.
- mixmastamyk 4 years ago
  
  I just hit the one where an album folder ends in a period. Rsync copies every time because the period is dropped by the filesystem silently. :-/
Foobar8568 4 years ago

Let's not forget return carriages in filenames within apps...
wongarsu 4 years ago

> Pro tip: rename your development directory
I changed my username to not contain a space because it was too annoying to deal with all the random dev tools breaking. The worst offender was probably npx on Windows [1] (resolved after four years by deprecating npx), but it was far from the only one (though the JS ecosystem was somehow the worst in this regard of all languages I worked with).
1: https://github.com/zkat/npx/issues/100
- kermire 4 years ago
  
  Same, even I had to rename my user folder to not have a space because so many tools were breaking.
dheera 4 years ago

Or not, which when bugs crop up will teach the businessy types to stop putting spaces in their filenames.
- macintux 4 years ago
  
  The beatings will continue until morale improves?
  Spaces are very useful for readability.
  
  cerved 4 years ago
  
  depends entirely what you're using to browse files
cduzz 4 years ago

And add an emoji, a character in a right to left language ( א) and perhaps 太. Maybe italicize one of those too...
5faulker 4 years ago

For those purposes I've found hyphen to be a nice substitute.
redwall_hp 4 years ago

I don't know if it's still a problem, but it used to break Python virtualenv badly. If your working directory had a space anywhere in the path, it would throw a huge fit and not work. Which is problematic when the expected name for a Mac's boot drive is "Macintosh HD" (if you ever had a reason to run a virtualenv outside of your home directory).
wldcordeiro 4 years ago

Even capitalization is a pain in the ass thanks to how OSes treat file names. I pretty much stick with either `file-name.ext` or `file_name.ext` exclusively now.
Spooky23 4 years ago

Someone should provide the OneDrive/SharePoint people some of this religion.
Mysterious character requirements that do not conform with Microsoft’s OS limits, limits on the fully qualified pathname length, etc.
idatum 4 years ago

Somewhat related to injecting unusual characters, in my experience in localization efforts:
Inject a Turkish 'I'. I don't know how to type or paste it here, but picture an English lower case 'i' that is upper case. It is a splendid way among many to shake out some loc bugs.
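For anyone who wants to poke at this, a small sketch in Python. It assumes only the built-in `str` methods, which apply the default (not Turkish-tailored) Unicode case mappings, so it shows the kind of surprise the character causes rather than locale-aware behavior:

```python
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# Python's str methods use the default Unicode case mappings, not Turkish rules.
lowered = DOTTED_CAPITAL_I.lower()  # "i" followed by U+0307 COMBINING DOT ABOVE
round_trip = lowered.upper()        # "I" followed by U+0307, not U+0130

print(len(lowered))                    # 2
print(round_trip == DOTTED_CAPITAL_I)  # False
print(DOTLESS_SMALL_I.upper() == "I")  # True
```

Code that assumes lowercasing preserves length, or that upper/lower round-trips, tends to trip over this one character.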
- gus_massa 4 years ago
  
  İ
  From https://en.wikipedia.org/wiki/%C4%B0
- ygra 4 years ago
  
  That would only shake out anything if you'd also test in a Turkish locale, wouldn't it? Since Unicode casing rules are locale-dependent and en-US doesn't care much about dotless i or dotted i.
ralphc 4 years ago

Late '90s I worked on Java software that got installed on several Unix platforms, including Linux for IBM mainframes. When you deal with the default en/de-coding of Unicode to EBCDIC you never have trouble with Java byte encodings ever again.
BiteCode_dev 4 years ago

> Pro tip: rename your development directory (or even better: the workspace path in CI) to put a space and/or special characters in it.
The problem with that is that YOUR code may handle it, but your tooling may not. If my code formatter breaks on spaces, I'm not going to change the formatter.
- ChrisSD 4 years ago
  
  You could submit a PR to their repo.
  
  BiteCode_dev 4 years ago
  
  I could submit a PR to 5 tools a week on average. I actually have the time and resources to do it once a year.
  Last week I opened a ticket for a Firefox bug. Following up on the bug took me 2 hours in total.
  FOSS is not free, you pay it with your time. And as with everything you pay for, we all have a budget.
InfiniteRand 4 years ago

It's easy to tell users to make a folder with no spaces if you're setting up a global path, however if you have an application that runs in user directories things can become painful fast. Changing your user name is a pain and can leave things inconsistent, but having to handle all the variations in people's names with spaces, punctuation, international characters, can just be mind boggling.
cerved 4 years ago

Spaces are a pain in the ass when you're using CLI so I'd rather enforce a no space policy
- reayn 4 years ago
  
  Most shells will behave just fine if you put a quote (single or double) before anything that has a space.
  A small extra step but something you get used to if you spend a lot of time in the cli.
  
  cerved 4 years ago
  
  Escaping spaces is a pain. I have to do it every day.
  I set up symlinks which help navigating around but then the relative paths are wrong for git.
  No thanks.
  Friends don't let friends put spaces in paths
agumonkey 4 years ago

See the recent article about unicode invisible glyphs in JavaScript or bash.
Naming freedom needs a stdlib module
qwertox 4 years ago

In that case, be thorough and insert a Chinese and an Arabic character to enforce a Unicode check.
rossy 4 years ago

> anything with subprocesses
I'm begging software developers to stop using subprocess APIs that take a string argument (system(), child_process.exec(), Process.Start(string)) and start using subprocess APIs that take an array of arguments (execvp(), child_process.execFile(), Process.Start(string, IEnumerable<string>).)
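The same idea sketched in Python, where `subprocess.run` accepts a list of arguments. The file name and the tiny child program are made up for the demonstration:

```python
import subprocess
import sys

filename = "house plans.txt"  # hypothetical argument containing a space

# The child simply echoes back its first argument. Passing a list means no
# shell parses the command line, so the space needs no quoting or escaping.
child_code = "import sys; print(sys.argv[1])"
result = subprocess.run(
    [sys.executable, "-c", child_code, filename],
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout.strip())  # house plans.txt
```

With the string form plus a shell, the same name would need quoting, and every caller has to remember to do it.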
WalterBright 4 years ago

Sometimes / works as a path separator in Windows, sometimes it doesn't. It's not predictable.
I never use / on Windows as a result.
- ygra 4 years ago
  
  The only common place where it doesn't work is in CMD for executing programs and as arguments for built-in commands. Everything else goes directly to the relevant APIs which don't care about / or \.
  These days using CMD instead of PowerShell should be rare enough and PowerShell certainly doesn't mind the slashes.
AlfeG 4 years ago

Today I learned that You cannot install Tailscale on windows if installer is inside path with non-latin chars.
franga2000 4 years ago

More importantly than your source files, put your testing data on such a path as well. Nobody uses absolute paths in testing so it doesn't matter how many spaces your absolute path has if your input is "./tests/file1". Put those files in a folder with spaces too and throw in a unicode character for good measure.
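One way this could look in a Python test setup, as a sketch only. The directory prefix and file name are arbitrary examples, and it assumes the filesystem accepts non-ASCII names:

```python
import tempfile
from pathlib import Path

# Create the fixture inside a directory whose name has spaces and non-ASCII
# characters, so every test that touches it exercises awkward paths.
with tempfile.TemporaryDirectory(prefix="test data é 太 ") as tmp:
    fixture = Path(tmp) / "file 1.txt"
    fixture.write_text("hello", encoding="utf-8")

    # Code under test should be able to read the fixture back unchanged.
    content = fixture.read_text(encoding="utf-8")
    had_space = " " in str(fixture)

print(content)    # hello
print(had_space)  # True
```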
uberswe 4 years ago

I did something similar on accident. I used to keep all my development work synced with Dropbox and I had a work and a personal account. So any of my own projects would have /Dropbox (Personal)/ in the path which did catch some bugs. Dropbox renamed my folder to "Dropbox (Personal)" automatically when connecting a work account.

mtift 4 years ago

I have an overly-aggressive function in my .bashrc to rename all files in the current directory:

  # Rename all files in a directory
  rn() {
    rename "s/ /-/g" *
    rename "s/_/-/g" *
    rename "s/–/-/g" *
    rename "s/://g" *
    rename "s/\(//g" *
    rename "s/\)//g" *
    rename "s/\[//g" *
    rename "s/\]//g" *
    rename 's/"//g' *
    rename "s/'//g" *
    rename "s/,//g" *
    rename "y/A-Z/a-z/" *
    rename "s/---/--/g" *
    rename "s/-‎--/--/g" *
  }

I use this all the time, especially when I download files.

OskarS 4 years ago

Overly aggressive is right! I don't know if this is genius or deranged! I'm leaning towards genius and stealing the idea.
By the way: what's your beef with en dashes? I mean, if it was "everything should be 'HYPHEN-MINUS' (U+002D)", then fine, but why specifically en dashes and not em dashes?
- michaelt 4 years ago
  
  > By the way: what's your beef with en dashes?
  Of all the changes in that list, removing the character that doesn't appear on a standard keyboard seems like the least controversial...
  
  mywittyname 4 years ago
  
  To add, it's a character that gets magically inserted for no reason in various situations.
  It's up there with those damn angled quotes.
  
  jedimastert 4 years ago
  
  A better question might be "how did it get there in the first place?"
  
  dredmorbius 4 years ago
  
  Presume all inputs are hostile.
  Whether people or processes, something is likely to introduce the character at some point.
  
  ggm 4 years ago
  
  Sw which converts -- and __ on the fly. Same sw converts quote pairs "for your convenience"
  
  eyelidlessness 4 years ago
  
  Opt+- if you use macOS, long press on - if you use any Apple touch OS.
- mtift 4 years ago
  
  I totally agree that for some people, this could be a terrible command to have around. However, I know that it has been working for me for about 8+ years or so. I almost always run it in my ~/Downloads folder on files that I don't really care about. I download a lot of academic papers and books, and this just saves me a lot of time to put files in the format I like: author--paper-title.pdf. And that's part of the reason why I make all of the dashes the same, so if I'm opening something by an author, I can easily autocomplete and not have to remember how to make other sorts of dashes on the command line.
  
  OskarS 4 years ago
  
  For a download folder in particular, this does sound like a great idea. You'd break the list in the browser or whatever, but who cares about that?
Tempest1981 4 years ago

Surely you must run into conflicts now and then?
- nybble41 4 years ago
  
  That's the most beautiful part! After running this script there are no more conflicts, because it just silently overwrites all but one version of the "cleaned" filename.
  (Also—that entire function is super inefficient and could be replaced with a single invocation of "rename".)
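A sketch of how the conflicts could be detected before anything is renamed. The cleanup rules here are a simplified stand-in, not the original regex list, and the file names are made-up examples:

```python
from collections import defaultdict

# Spaces and underscores become hyphens; some punctuation is deleted.
_CLEANUP_TABLE = str.maketrans(" _", "--", ":()[]\"',")


def clean_name(name: str) -> str:
    """Simplified stand-in for the cleanup rules discussed in the thread."""
    return name.translate(_CLEANUP_TABLE).lower()


def find_collisions(names):
    """Map each cleaned name to its originals; keep only names with conflicts."""
    groups = defaultdict(list)
    for original in names:
        groups[clean_name(original)].append(original)
    return {new: olds for new, olds in groups.items() if len(olds) > 1}


names = ["Movie Bla (2020).mp4", "Movie_Bla_(2020).mp4", "Movie_Bla_(2020).srt"]
print(find_collisions(names))
# {'movie-bla-2020.mp4': ['Movie Bla (2020).mp4', 'Movie_Bla_(2020).mp4']}
```

Running a check like this first and refusing to rename when the result is non-empty avoids the silent overwrite.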
  
  mtift 4 years ago
  
  Totally inefficient. But for me it's readable and practical. This is mostly just a convenience function for me to help store files in a format I like rather than something I need optimized. If it ever started to feel slow, sure I could optimize. But for now, when I occasionally download a file that has some weird character, I just prefer to add another line to my function.
  
  nybble41 4 years ago
  
  Without changing the design too much, you could rearrange it like so to avoid renaming multiple times and still have the option to just "add another line":
    # Rename all files in a directory
    rn() {
      rename \
        -e "s/ /-/g" \
        -e "s/_/-/g" \
        -e "s/–/-/g" \
        -e "s/://g" \
        -e "s/\(//g" \
        -e "s/\)//g" \
        -e "s/\[//g" \
        -e "s/\]//g" \
        -e 's/"//g' \
        -e "s/'//g" \
        -e "s/,//g" \
        -e "y/A-Z/a-z/" \
        -e "s/---/--/g" \
        -e "s/-‎--/--/g" \
        *
    }
  Though I would at least take advantage of character classes to reduce the number of substitutions:
    # Rename all files in a directory
    rn() {
      rename \
        -e 's/[ _—]/-/g' \
        -e 's/[:\[\]",]//g' \
        -e "s/'//g" \
        -e 'y/A-Z/a-z/' \
        -e 's/--+/--/g' \
        *
    }
  (I'm using the `rename` command provided by the `rename` Debian package, a.k.a `file-rename`. The options may vary if you're using a different version.)
donio 4 years ago
https://github.com/dharple/detox is a nice tool for this. Sane defaults but configurable.
In addition to CLI I use it from emacs dired-mode too:
```
    (defun my-dired-detox ()
      (interactive)
      (dired-do-shell-command "detox" nil (dired-get-marked-files))
      (revert-buffer))
```
I bind it to "_" in dired-mode.
tgbugs 4 years ago

Word of warning from hard experience: rn is a really dangerous thing to name a function because it is one char away from rm.
- TheSkyHasEyes 4 years ago
  
  ren would be better than rn. :)
- post-it 4 years ago
  
  Looks like it's typically run without any arguments, so it's probably fine.
  
  lioeters 4 years ago
  
  A typo can go the other way, like "rn somefile" where it was meant to remove a file but instead it renames all files.
- spurgu 4 years ago
  
  One char away also physically on the keyboard (maybe that's what you meant?).
  
  tgbugs 4 years ago
  
  Yeah, the physical layout is the primary concern. I should have noted that since there is ambiguity because n and m also happen to be next to each other in the alphabet.
  
  jatone 4 years ago
  
  laughs in dvorak
  
  kataklasm 4 years ago
  
  cries in colemak
  
  Extigy 4 years ago
  
  I once ran “crontab -r” instead of “crontab -e” and also thought that was terrible design for the same reason.
- eyelidlessness 4 years ago
  
  Note to self: snag “notTerseAtAllMoreVerboseIdentifiersForGreatGood.js” on NPM
- itsbenweeks 4 years ago
  
  Agree. Having this function exit if any arguments are passed to it seems like a good safety measure.
niccl 4 years ago
I use this snippet, to change spaces to underscore for directories and files in the current directory and below. Haven't made it a function yet, but should. I got it from stack overflow or somewhere, but no attribution. Thanks to whoever did it first:
```
   find . -depth -name '* *' | while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
```
theshowmustgo 4 years ago
Nice but how do you prevent overwrites? What about directories/folders and the files in that directory/folder?
I have:
```
  Movie Bla (2020)
    Movie Bla (2020).mp4
```
But also:
```
  Movie_Bla_(2020)
    Movie_Bla_(2020).mp4
    Movie_Bla_(2020).srt
```
Would not like to lose files like the srt.
- BiteCode_dev 4 years ago
  
  rename will stop and output an error.
- mtift 4 years ago
  
  Yeah, sometimes I end up renaming things I don't want to, but it really doesn't happen all that often. And sometimes I throw caution to the wind, add some excitement to my life, and rename a bunch of files (not for anything professional) in some really old directory and hope I don't break anything. But I'm not aiming for perfect with this comment. I just mentioned in another comment, but the vast majority of times I run this is in my ~/Downloads folder on files I don't really worry about breaking.
BiteCode_dev 4 years ago
Thanks to all the comments in this thread, I now have "sudo apt install rename detox" in my install script, and:
```
    normalize_names() {
        rename "s/-/_/g" *
        detox -s lower *
    }
```
in my .bashrc.
I've thrown some edge cases at it, and it handles them super well. It deals with consecutive "_", removes leading garbage, normalizes unicode, and even prevents naming conflicts by opting out early.
Thank you.
mrzool 4 years ago

You might be interested in detox:
https://github.com/dharple/detox
cerved 4 years ago

I wonder if rename has an -e flag like sed. It might be worth baking this into one monolithic regex if you call this often
IX-103 4 years ago

You missed ~. You really don't want to create a directory named "~".....
l0b0 4 years ago

If you're a developer you're doing yourself a big disservice by not learning how to deal with special characters.
- mtift 4 years ago
  
  I agree. I am a developer and I know how to deal with special characters. But this isn't something I use professionally. I just prefer not to have to deal with special characters in the pdfs, m4as, txts, and other files that I use on a daily basis. When I write papers, I'll write ū or Ñ or ç or whatever (incidentally, I have a lot of shortcuts in my .vimrc for those). I would not say I am "afraid" to use spaces in filenames, but I get a certain satisfaction storing academic papers in the author--paper-title.pdf format and my notes in author--paper-title.md because it helps me find things.

TacticalCoder 4 years ago

Define "space". Is the Hangul filler we talked about yesterday a spacing character? Is the zero-width non-breaking space a spacing character? What about the typographic spacing characters?
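To make the "define space" question concrete, here is a small sketch that asks Python's Unicode tables about a few candidates. The selection of characters is arbitrary, and other languages or regex engines may classify them differently:

```python
import unicodedata

candidates = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "ZERO WIDTH SPACE": "\u200b",
    "HANGUL FILLER": "\u3164",
    "ZERO WIDTH NO-BREAK SPACE": "\ufeff",
}

for label, ch in candidates.items():
    # category() gives the Unicode general category, e.g. "Zs" = space separator.
    print(f"{label}: category={unicodedata.category(ch)}, isspace={ch.isspace()}")
```

On a current Python the Hangul filler comes back as a letter (category Lo) and is not whitespace according to `str.isspace()`, so a "replace all whitespace" pass would leave it in place.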

You should better be very afraid of using spaces in filenames.

You should do everything you can to support them but you have to know you'll invariably encounter countless cases where you'll have this or that tool that won't work properly with them.

I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).

FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.

Every Git repository in the world has that example: let that sink in.

selfhoster11 4 years ago

> FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
I use Git for documents too, not only code. Why shouldn't I use my native language?
- numpad0 4 years ago
  
  Tab completion doesn’t work well for languages that require IME. That is one reason why I don’t.
  
  selfhoster11 4 years ago
  
  That's actually a good point. On the other hand, not all languages use IMEs. Mine just uses the AltGr modifier key, but is otherwise just a standard QWERTY layout without any features.
  
  dredmorbius 4 years ago
  
  IME == Input method editor?
  https://en.wikipedia.org/wiki/Input_method
  
  numpad0 4 years ago
  
  yup, I type in pronunciation and let it guess what I'm trying to say. Works okay in editors but doesn't work great with shells in a terminal emulator, so I just prefer not having to use it in shell operations.
  
  glandium 4 years ago
  
  Tab completion works just fine for me with a Japanese IME.
- cerved 4 years ago
  
  non-ascii characters cause annoying hard to fix problems. If you're willing to deal with that - kudos. Personally I don't find it worthwhile
  
  selfhoster11 4 years ago
  
  I haven't had problems yet. Spaces, punctuation, and quotes are the main offenders, most of the time.
kingcharles 4 years ago

You get all those space characters working and then some jerk comes along and uploads a file like this: ŗ̶̧̢͓̳͍͙͔̳̻̥͉̭͓̫̟͍̞̭͉͓͉̮̹͍͚̳̹̬͉͚̰͈̘̐̊̾̈̀̒͒̀͛̓̋̔͊̏͘̚ę̴̨̛̣͙̤̟̬̩̟͙͖̥̹̱̱̊͑͗̇̇͛̆̈́̃͋̓̀̔̍̍̌̐͊̎̓̅̀̕ͅģ̴̹̜̘͍̱̑͐̉̌̐̄̊͛̎́̐̌̅̈́͂͑̈́̋̔͂̊̊̒̒̔͛͆̚͘̕͠e̶̙͕̫̳̘͐̾́̑͆̓͂̿͊̊̍͛͐̌̆͗̌̅̅̔͊̂͛͗̅̕͝͝͝͝x̵̢̧̦̫͖̝̥̹͓̬͖̤̩͚̝̫̋̃̅̈́̆͋̌͑́̎̈́̊̾͒̀̒̎̓͛͊̿̓͊̀̍͐̆̚͝͝-̴̨̮̯͖͖̠̜̲̪͕̘͈͖̮̈́̓̐̃́̅̄̏́̍̉̐͌́̔̓̄͋͗̐̕͜͝ţ̴̢̧̖̗͖̞̮̫̦̼̝̺̼̱̳͓͉̜̟̤̲͖̻͙́̌̈̌̈͆̾̄͊̿̏̓͗̈́̕͜ͅh̶̢̧̨̥̭̼̟̣͖̯̗̤̖̙͉͕̙͎̰̠̝̖͈̻͙̪̮̘̯̻̼͕͓̖̣͈̽́͊̎͐͌̆̍̎̏̿͐̒́͋͑̍̿̎͆̑͆̄͂̀͐̄͑̀͗̿̽̎̾̊̕͝͝͝͝͝ͅi̴͚͈͍̫̮̝̣͖͉͓̯̠̙̭̟̖̘̾̓̄̈́̒̏̽̆̉̿͛̀́̃̋̒̈́͋̂̇̈́͛̕͜͠͠͝ͅs̶͇̖̳̞͉̱̞͓̖͔͔͍̗͇̖̮̹̅͊̔͋͊̈́̎̐̆̋̒̀̍̕͜ͅ.̴̧͎͇̰͉̼̱̰̦̟̑̋̏͌̍͊͑̄̀͌́̆̓͛̒̆̾̉͐̄̂̈́͆̒̃͗̐̂̎̈́̈͛̿́͛̾̚͘͜͝͝ͅȩ̷̡̲̪̱̪̥̳͍̼̰̘̗̹͙͙͓̣̟̩̥̥̖̠̪̮̹̞̥̻͎͖͍̯̂͑̏̑̆̍͋̎͛̅̑̑̏̎̓̀̓̒̈́͊͌̀̈́̒̌͐͂͛̊̍̐͂́̔̌̾͐̈́̋̇̏̚͜͝͝͝͠ͅx̶̧̛͚̗̜̪͍͖̘̙͎͚͇͙̬̱̟̭͓̺̙͍̖̱͚̣̘̪̭͔͔̮͎̬̪̤̹̟͔̩͍̬͕͔̩͐̈́̒̂͛̂̈̀̿̍̔̓̓̀̃̍͆̈́̍̓̌͐̈́̾̇̎̑͌͒̄̆̿̍͆̅͗͆͘͠͝͝ͅͅͅe̷̢̡̡̨̧̛͕͚̬̮̞̥̼͍͔̝̟̝̯͈̟̥͖̱̹̣̩̼̩̅̌͌̑̎̐̀̽̏́͐̋̏̎̎͛͌̀̊͊͒̑͌̎̎̑͊̌̉͆̾̚͘̚͜͠͠͠͝͝ͅͅͅ
- hnuser847 4 years ago
  
  Honest question - what the heck are those characters?
  
  sdenton4 4 years ago
  
  Zalgo text: https://zalgo.org/
  It was a great joke for a couple weeks two internets ago.
  
  Sohcahtoa82 4 years ago
  
  > two internets ago
  It's been like three internets since I heard someone using "internet" as a measurement of time.
  It's actually interesting to think about "generations" of internet, just like generations of people, and how the culture shifted between them.
  There was a time in the early '00s when broadband was catching on, yet YouTube didn't exist. A time when Ebaumsworld and Newgrounds ruled the internet. When Homestar Runner was pop internet culture. Weebls Stuff. The frog blender.
  
  grishka 4 years ago
  
  Combining diacritic marks.
  
  Valgrim 4 years ago
  
  It's corrupted text or "Zalgo" text, it relies on diacritics.
  See this answer on stackoverflow:
  https://stackoverflow.com/questions/1732348/regex-match-open...
  
  lmkg 4 years ago
  
  I disagree with calling it "corrupted." We're not tricking the browser into trying to render garbage bytes that are actually the middle of a jpeg or something. It's actually valid Unicode. It's an edge-case which is not seen in regular usage, but it's technically following all of the rules.
  
  Valgrim 4 years ago
  
  It's just the way it's called, not a statement of fact.
  
  orangepurple 4 years ago
  
  In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).
  https://en.wikipedia.org/wiki/Combining_character
  
  db48x 4 years ago
  
  Specifically _Vietnamese_ combining characters. The Vietnamese writing system uses multiple combining characters at a time, and stacks them vertically. Throw in a few that wrap around the character like t҉his, some 𝑎lternatᵉ lꬲttꬲr fᵒrms, disturbing imagery, and perhaps a few other tricks, and you have zalgo. See also https://stackoverflow.com/a/1732454/823846
- Loughla 4 years ago
  
  This legitimately made me laugh out loud in my office.
  The characters reach up off the screen as I reply to this. They overlay the comment above you. Amazing. How?
  
  pxndxx 4 years ago
  
  It's usually called Zalgo text, and it's what you get when you start stacking all kinds of Unicode diacritics on poor unsuspecting characters.
  https://en.wikipedia.org/wiki/Zalgo_text
  
  Macha 4 years ago
  
  Interestingly I get different behaviour per browser/OS. Firefox/Linux clips it to the bounding box of the parent element, Firefox/Mac and Safari/Mac clip it to the line height, and only Chrome/Mac lets it extend further.
  
  detritus 4 years ago
  
  Huh, I tried it in Chrome to see how it reacted here and it maintained about the same position as it did in my usual browser, Firefox.
  
  terr-dav 4 years ago
  
  Firefox and Safari on iOS 15 both render all the glyphs attached to the base character. Vivaldi, Chrome and Firefox on Win10 all render them stacked and overlapping the parent and child comments.
  
  kingcharles 4 years ago
  
  This is the best generator I found: https://lingojam.com/GlitchTextGenerator
  
  nyanpasu64 4 years ago
  
  I find http://animalswithinanimals.com/generator/generator.html much more controllable.
- Liquix 4 years ago
  
  For anyone who is curious (and acolytes of Zalgo): "In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character. So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model."
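A minimal sketch of the base-character-plus-marks model described in that quote, using Python's `unicodedata`. The helper names and the sample string are hypothetical, and treating every category "M" character as removable is an assumption that would also strip legitimate accents and marks:

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for characters in the Unicode mark categories (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def strip_marks(text: str) -> str:
    """Decompose text (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_mark(ch))


def count_marks(text: str) -> int:
    """Count combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed if is_mark(ch))


zalgo_like = "e\u0301\u0302\u0303x\u0316\u0317"  # hypothetical sample
print(strip_marks(zalgo_like))  # ex
print(count_marks(zalgo_like))  # 5
```

A threshold on marks per base character is probably a gentler filter than stripping, since names like "Téléchargements" should survive.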
  [https://stackoverflow.com/questions/6579844/how-does-zalgo-t...]
  
  shadowgovt 4 years ago
  
  Hah, lucky for me Chrome on Ubuntu didn't implement the spec correctly. ;)
- jagged-chisel 4 years ago
  
  768 characters is too long for macOS it seems. (References online say HFS+ has a limit of 255 UTF-16 characters. Didn't find anything for APFS immediately... edit: same for APFS)
- quantified 4 years ago
  
  Glad you didn’t choose a sequence that crashes my browser.
- meepmorp 4 years ago
  
  regex this, bravo
- dang 4 years ago
  
  Please don't Zalgo on HN. It's enough to speak its name.
  
  allemagne 4 years ago
  
  It would be one thing if it was making other comments difficult to read or causing browser issues, but I appreciated the demonstration that both would presumably be possible on certain browsers
- aasasd 4 years ago
  
  Until now, I haven't actually thought of what would happen if zalgotext occurred anywhere other than a web browser. Looking forward to the five minutes of fun with the file manager and whatnot.
chungy 4 years ago

> I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
I have an Android phone and I tell MusicBrainz Picard to save all files with ASCII-only names and Windows-compatible names for the ones that get sent over to the phone. Basically for this reason. Sometimes it's players on Android itself, but even more frequently, whatever bluetooth radio I'm connected to freaking out with non-ASCII characters.
- torstenvl 4 years ago
  
  What do you mean, display garbage?
  L'imp?ratrice? L'imp�ratrice? L'impÃ©ratrice? L'imp‚ratrice? L'impÚratrice?

branko_d 4 years ago

I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string - it's a sequence of path components and should be treated as such by our APIs. A path should be parsed once - on user input - and then used in its "sequence form" throughout the software stack.

And "path component" is not an arbitrary string either - e.g. appending a path component to the path should first require converting/parsing the string into the path component, and only if that's successful appending it to the path.
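A sketch of what that could look like with Python's `pathlib`, which already stores a path as a sequence of parts. The validation rules in `parse_component` are a hypothetical, POSIX-flavored example rather than a complete definition of a valid component:

```python
from pathlib import PurePosixPath


def parse_component(raw: str) -> str:
    """Validate a single path component (hypothetical rules for POSIX paths)."""
    if raw in ("", ".", "..") or "/" in raw or "\0" in raw:
        raise ValueError(f"not a valid path component: {raw!r}")
    return raw


def append_component(base: PurePosixPath, raw: str) -> PurePosixPath:
    """Append a component only after it has been parsed successfully."""
    return base / parse_component(raw)


base = PurePosixPath("/home/user/house plans")  # parsed once, on input
print(base.parts)  # ('/', 'home', 'user', 'house plans')
print(append_component(base, "floor 1.pdf"))  # /home/user/house plans/floor 1.pdf
```

Because the space lives inside one component, nothing downstream ever has to split or quote it; an input like "../etc" is rejected before it reaches the path.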

jerf 4 years ago

"Path is not a string - it's a sequence of path components and should be treated as such by our APIs."
For maximum correctness, you want to turn it into a file handle as soon as possible, and do all operations through the variations of the file functions that end in "at", like: https://linux.die.net/man/2/openat
The downside of this approach is that you still technically have to carry the path around with you if you ever want to present it back to the user, because once you have a directory handle, you can get back to the root directory easily enough by following parent links and seeing what directories you end up in, but that may not be what the user "thinks" the path is, and they want to see their path, not a canonicalized one. And they're mostly right. And it's not easy to correctly track changes to their intended path from this basis either.
Basically, I don't know of a really solid, 100% correct way to handle this with any reasonable degree of effort.
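For what it's worth, a rough sketch of the handle-based approach in Python, which exposes openat-style calls through the `dir_fd` parameter. It assumes a POSIX system where `os.open` supports `dir_fd`; the directory and file names are invented, and the user-facing path is simply kept alongside the handle:

```python
import os
import tempfile

# Sketch for POSIX systems; dir_fd support varies by platform
# (os.supports_dir_fd lists the functions that accept it).
with tempfile.TemporaryDirectory(prefix="dir with space ") as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("hello")

    display_path = tmp                  # kept only for showing to the user
    dir_fd = os.open(tmp, os.O_RDONLY)  # handle used for actual operations
    try:
        # Equivalent in spirit to openat(dir_fd, "notes.txt", O_RDONLY).
        file_fd = os.open("notes.txt", os.O_RDONLY, dir_fd=dir_fd)
        with os.fdopen(file_fd, "r", encoding="utf-8") as f:
            content = f.read()
    finally:
        os.close(dir_fd)

print(content)  # hello
```

This does not solve the "what does the user think the path is" problem; it only keeps the display string and the handle separate.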
- Pxtl 4 years ago
  
  "you want to turn it into a file handle as soon as possible"
  But no sooner.
  For example, I've run into problems where I'm configuring server program A to talk to file location B... but I don't have access to file location B. But the client-side library for talking to the server tries to convert location B into a file handle and then freaks out because I can't access it. When I don't want to access it. I want that program to serve it.
  If it was using simple "path" objects that didn't confirm that I have access to the path, everything would be hunky dory. But because it tried to convert it into a file handle unnecessarily, I get blocked.
- aspaceman 4 years ago
  
  Why not just hold onto both? The user's representation and the file handle. Only ever "display" the representation, while you do all operations on the handle. (Not trying to be sarcastic, just curious).
- globular-toast 4 years ago
  
  This goes for most instances of user input. Timestamps are the other common one people get wrong. I've even seen programs that pass around timestamps as strings in multiple formats and as integers (Unix time).
  
  aqfamnzc 4 years ago
  
  As a programming noob, I'm wondering what would be the better way to pass or return a unix time value as opposed to an integer?
  
  globular-toast 4 years ago
  
  Depends on the language but most high-level languages have a timestamp or datetime abstraction which you should be using.
  
  joe_guy 4 years ago
  
  If it's being serialized, consider fully qualified iso8601.
  
  mleonhard 4 years ago
  
  If you need to keep the timezone with it, then use an ISO8601 [0] string: "2021-11-11T15:32:35-07:00".
  Otherwise, use an integer unix timestamp, the number of seconds since 1970-01-01T00:00:00Z: 1636673555. Use an unsigned 32-bit integer or a 64-bit integer to avoid the 2038 problem [1]. JSON numbers are usually parsed as doubles, whose maximum safe integer [2] is 2^53 - 1, so if you're using HTTP JSON RPC, you'll have to check for overflow.
  [0] https://en.wikipedia.org/wiki/ISO_8601
  [1] https://en.wikipedia.org/wiki/Year_2038_problem
  [2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
  
  globular-toast 4 years ago
  
  ISO8601 is a serialisation format. You wouldn't want to use it in internal function calls simply for performance reasons. You also wouldn't want to pass it around as just a "string" type. I think the question was asking about internal function calls. For external data interchange, ISO8601 is the only sane option and deals with all known timezone and leap second bollocks.
- BoorishBears 4 years ago
  
  > For maximum correctness, you want to turn it into a file handle as soon as possible
  This is why I get stressed out when I see paths turned into special objects encoding separators and such.
  It tells me the path is living for way too long compared to the file handle.
  I only want to see path-specific objects if we're modifying the path, and even then I want that to happen as late as possible.
- jmull 4 years ago
  
  > For maximum correctness, you want to turn it into a file handle as soon as possible
  That's not right. You want to resolve a file/folder path to a file/folder at the exact point it makes sense.
  It's a problem if you're using a path when you wanted the file. The file can be switched/modified out from underneath you.
  It's also a problem if you've got the file when you only wanted a reference. Now you can't simply switch/modify the file independent of the reference. E.g., maybe you want config file changes to take effect immediately and transparently.
  You can also have the hybrid case, e.g., where you want the folder directly, but have a relative path to a file that is resolved late.
  If you're unsure, I'd err on the side of late resolution.
- tmerr 4 years ago
  
  Another inconvenience with this approach is that you can keep thousands of paths in memory no problem. But thousands of FDs may cause you to exceed per-process limits.
- cerved 4 years ago
  
  doesn't this lock the file?
dahfizz 4 years ago

> I have an uneasy feeling whenever I see a path parameter declared as string. Path is not a string
I guess that depends on what you mean by "string". `open` and `fopen` need a char* path to open a file. Whatever fancy Path abstraction you use eventually becomes a char* string, because that's what the kernel needs.
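A small Python illustration of that last step, as a sketch only (the example path is made up): `os.fspath` and `os.fsencode` are roughly the point where a structured path object is flattened back into a plain string and then into the bytes the OS-level call receives.

```python
import os
from pathlib import PurePosixPath

# A structured path object; the path itself is a made-up example.
p = PurePosixPath("/tmp") / "my dir" / "report.txt"

as_str = os.fspath(p)      # what most APIs accept: a plain str
as_bytes = os.fsencode(p)  # roughly what ends up in the char* argument

print(repr(as_str))
print(repr(as_bytes))
```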
- VWWHFSfQ 4 years ago
  
  yeah. it's a string.
  
  dwheeler 4 years ago
  
  On POSIX systems file names are not strings, they are sequences of bytes. They might not be UTF-8 or have any meaning. Python 3 had to hack around this: they thought they could force everything to Unicode and discovered that doesn't work.
  
  guntars 4 years ago
  
  Which makes for fun issues like there's no standard way to display a filename in Unix. A system that's, you know, all about files.
  
  warkdarrior 4 years ago
  
  Unix: everything is a file, including file names!
  
  duped 4 years ago
  
  That's probably because paths aren't properties of the file itself, they're helpers to reference the file.
  
  sipos 4 years ago
  
  At least for most Linux systems (not sure about other *nix, but I expect the same?), there is a system default encoding, defined by the locale, and I think decoding the filename in that encoding and displaying the resulting string, is probably the correct way to display a filename? That seems as good as you are likely to get on any system really.
  I think for any POSIX system, either there is locale support defining the encoding, or it uses the POSIX locale, which defines the encoding (ASCII).
  Of course you need to handle cases where filenames cannot be decoded in the system encoding (probably by replacing characters that cannot be decoded), because a filename in a different encoding, or even with no valid encoding, has been used on disk. While systems can say that file names containing bytes that are not valid characters in the system's encoding are not valid file names, that doesn't stop people mounting disks with them, so the problem never goes away if you support opening media from other systems.
  What I am saying is that this is no more a Unix problem than it is a problem on any system that supports removable media.
  
  account42 4 years ago
  
  On POSIX systems, file paths are C strings, which are sequences of bytes that cannot include the 0 character. UTF-8 or other meaning is not required for something to be a string.
SAI_Peregrinus 4 years ago

POSIX file names allow all bytes except 0x2F (/) and 0x00 (NUL). That means file names can include line feeds, backspaces, EOF, etc.
"This is `a
perfectly vali'd.\010! file name\377, despite the weirdness"
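Those two byte-level rules are simple enough to check mechanically. A hedged Python sketch follows; `is_legal_posix_name` is a hypothetical helper, and it deliberately ignores length limits such as NAME_MAX and anything a particular file system adds on top:

```python
def is_legal_posix_name(name: bytes) -> bool:
    """True if `name` passes the two byte-level rules for a single file name.

    Only "non-empty, no '/', no NUL" is checked; NAME_MAX and
    file-system-specific restrictions are out of scope for this sketch.
    """
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


# The example name above, with \010 (backspace) and \377 written as hex escapes.
weird = b"This is `a\nperfectly vali'd.\x08! file name\xff, despite the weirdness"

print(is_legal_posix_name(weird))
print(is_legal_posix_name(b"a/b"))
```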
anyfoo 4 years ago

Strings following certain rules are entirely valid representations of paths, just like sequences of path components in the chosen language/framework are. Similarly, the sequences of bits that make up the sequences of your language/framework in memory are an entirely valid representation of said sequences of components.
Yes, paths have structure, but saying "a path is not a string" is the equivalent of saying "C source code is not a string". Both are strings, and both are something else, represented by strings according to rules. Different internal representations have different advantages and disadvantages. I fully agree that for things such as "adding components" an internal sequence/list representation is better, but strings can pass arbitrary IPC or even ABI boundaries much more easily, for example. (And you wouldn't bat an eye, for example, when you see FQDNs like "www.google.com" passed as a string instead of as ["www","google","com"] because the string representation works pretty well.)
- fouric 4 years ago
  
  C source code and paths are both representable by strings, true, but the fact that they're not actually strings is still important, because most people don't know that, and in the case of paths that leads to a lot of edge cases (in the case of source code it leads to a bunch of inefficient and weak tooling, which isn't quite as bad).
  Because neither are strings, their native representation shouldn't be such - it should be something structured, and only when necessary (IPC, FFI, serdes) be serialized into a string representation. This would save people a lot of time and effort.
  
  anyfoo 4 years ago
  
  It really depends. Do you usually keep hostnames as strings? URLs? JPEGs? Why or why not?
  Sure, a browser will hopefully quickly parse that URL and break it up, an image viewer will do the same with a JPEG. Will anything that's only interested in opening/displaying that URL or JPEG, through a library or external program?
  POSIX paths are actually remarkably simple in structure[1]. The only caveat is equality and normalization: Without normalization, a path a might be equal to a path b while their representations differ, e.g. "/etc/foo" and "/etc/bar/../foo". But this is the same whether you have a string or a list of strings, you need to normalize in whatever representation you choose to check for equality.
  [1] Almost shocking myself, even Haskell defines its primary FilePath type literally as "String".
naikrovek 4 years ago

things like this are why the Unix philosophy is so bad.
text processing is hard if you must support Unicode, and that means every Unix command line tool must implement or employ a text processor to handle input. it would be much easier if objects were passed back and forth. PowerShell got this right.

foxfluff 4 years ago

I'm hardly afraid but I just think it's poor ergonomics. Same as the move from

    xset m 0 0

to

    xinput --set-prop 'pointer:Logitech USB Receiver' 'libinput Accel Profile Enabled' 0, 1

Everything seems to be going this way in Linux land. Longer names, harder to type names, camelcase names, spaces... I'm looking forward to an OS that treats command line ergonomics as a first class feature and where camelcase & spaces are verboten.

martin-t 4 years ago

I find this attitude misguided. More descriptive names are more ergonomic for things you only use rarely but they need to be combined with much better autocompletion than most shells provide by default.
- foxfluff 4 years ago
  
  You state that as if that were objective.. but that's not my subjective experience at all. Somehow I have a hard time remembering these long names, (is it --conf or --config or --config-file or --config-path? -c would've done it for me. --set or --set-prop or --set-property or --prop or --property?), and I need to look them up in a man page anyway, and I make more typos typing them, and shell completion rarely works well if at all. I also find it harder to read and edit long lines that wrap.
  Somehow these short letters stick much better for me, and the effort for finding them in the manual is the same, although in case of extra complexity as with xinput, it's even worse with the long names. I don't use either command often, but it's hard to forget xset m. The only thing I remember about xinput is that it's a horribly long litany of things which I need to look up every time, and the syntax still feels weird.
  
  me_me_me 4 years ago
  
  the most used options for properly written tools have both a short single-char form like -c and a long-form version like --config if you need a verbose, self-describing option.
  If you are using CLI tools off GitHub written by a random person, then no wonder you will see non-standard approaches to UX.
  
  sidpatil 4 years ago
  
  PowerShell takes an interesting approach in that it accepts any truncated variant of a long-form flag as a short form, provided it isn't ambiguous (i.e. as long as the interpreter can decide which long-form flag to expand the short form to).
  For example, if a command features a "-ConfigFile" flag, valid short-form variants include "-C", "-Co", "-Con", "-Conf", and so on. But if the command featured an additional flag "-ConfigURL" for example, the aforementioned short-form flags would be ambiguous.
  
  Mindless2112 4 years ago
  
  getopt_long (and thus most GNU programs) work this way. I think it's probably a misfeature though since it means that adding a new option can introduce ambiguity. Having both short (ex. -x) and long (ex. --exclude) options is a less problematic solution.
  
  ufo 4 years ago
  
  The shell ought to be able to help with that. There's no need to remember if it's --conf or --config if you can press --conf<tab>.
  One of the things I like about Fish is that by default it can tab-complete program options and also shows a one-line description of what each of them does. (It grabs that info from the man page).
  
  ori_b 4 years ago
  
  So much of computing is dedicated to solving problems that could be omitted.
  
  Joker_vD 4 years ago
  
  I mean, that's precisely my thoughts on copyright and licensing in general but what can you realistically do?
  
  forgotmypw17 4 years ago
  
  Realistically, on an individual scale, you can pretend it doesn't exist and go on with living your life?
  
  Joker_vD 4 years ago
  
  I very much would if only that pesky State didn't persecute me for that. Apparently, when I refuse to acknowledge the copyright and software license terms, other people get upset to the point of bringing the wrath of that Leviathan of oppression upon me! The nerve of some people!
  
  salawat 4 years ago
  
  Seriously. Just get up from the computer and go do something else. /s
  We computer people are truly an odd bunch.
  
  fouc 4 years ago
  
  > and shell completion rarely works well if at all
  
  foxfluff 4 years ago
  
  I just tried fish. xinput --set-[TAB] and nothing. Apparently it doesn't understand the standard long-option format that is supported by xinput and documented in the man page. You have to know to omit the dashes and then it'll complete. And it's downhill from there.
  Yeah I used to have all kinds of simple as well as supposedly sophisticated completion setups with zsh years ago but I've given up on it since then. It's always half-assed and half the time causes more problems than it solves. Same with bash. There are some places where I must resist the urge to try complete a filename because the shell starts trying to figure out which target it can complete from a Makefile in a large build system and just freezes. The only practical way out is to interrupt and type the command again or wait a stupidly long time. There are other issues like completion trying to be smart and filtering out things it thinks you don't want to complete. Nothing is more frustrating than a shell refusing to complete a filename that you know is there.
  
  throw10920 4 years ago
  
  I run fish. I was able to get long-option completion for gcc, polybar, firefox, man, emacs, xrandr, and fish itself. The only command I was not able to get long-option completion for was xinput. You just picked a bad program to try.
  
  tambourine_man 4 years ago
  
  I'm with you. Terseness is paramount.
  I could never overcome my repulsion for Java and ObjC because of that. On the other hand, I feel at home with crazy RegEx that look like line noise to most people.
  
  yepguy 4 years ago
  
  I think shells could use something like a built-in eldoc[1], in addition to tab completion. It would make terse command line interfaces much more usable if you could see what the positional arguments were for.
  [1]: https://docs.cider.mx/cider/config/eldoc.html
  
  8bitsrule 4 years ago
  
  I hate .methodNameAsLongAsMyArm as well, but there's the opposite extreme:
  As a beginner, I liked short variable names. When I came back a few months later, I learned my lesson. Years later? easier to just start over.
  
  johnchristopher 4 years ago
  
  I like the long-form version. It helps me remember what it does and why. E.g.: `iptables --insert INPUT --protocol tcp --jump ACCEPT` was more helpful to me than `iptables -I INPUT -p tcp -j ACCEPT` when told how to allow TCP traffic.
  For everyday commands like `ls -l` I don't mind, but for anything more serious I take a more cautious approach.
  
  efreak 4 years ago
  
  The few scripts that I've written for personal use generally lack documentation or help commands of any sort; instead, they take all possible straightforward variants I can think of for each command (`--config`, `--config-file`, `--cfg`, `--conf`, etc). They usually convert everything to lowercase before processing, too. It's easier to fail safely on too much/too little input than it is to provide actual help.
- omnicognate 4 years ago
  
  Spaces don't make anything more descriptive, they just cause completely unnecessary quoting and escaping hassle.
  The amount of time that has been wasted by Windows using "C:\Program Files" instead of "C:\Program_Files" far outweighs any highly questionable aesthetic benefit IMO.
  
  ygra 4 years ago
  
  On the other hand, how much broken code has been fixed to properly deal with paths just because of that? I'd argue that to be a major benefit. Same with Windows Vista forcing developers to write applications that work properly as a non-admin user.
- Too 4 years ago
  
  Short option for interactive terminal. Long option in automation.
  I’ll be damned if I have to remember or look up what -n means to some obscure program when reading someone else’s script. Exception given for super common tools that everybody knows, like ls -la.
  With the disclaimer that shell scripts, especially ls, aren’t exactly suitable for reliable automation in the first place.
skohan 4 years ago

What's wrong with camelCase? It's easier to type than snake
- thrwyoilarticle 4 years ago
  
  There's a tendency away from snake_case and towards kebab-case in things you interact with via CLI. Even moreso towards nocase.
  Programs like Powershell eschew ease of use in CLI for readability in scripts.
  
  pvaldes 4 years ago
  
  Snake_case is problematic for including filenames in TeX also. This is a big no for me, even if I find it more readable than the other.
  
  JadeNB 4 years ago
  
  > Even moreso towards nocase.
  Nocase (did I break a rule by writing it that way?) seems great when you're enmeshed in the domain and you can see the implicit separators, but then someone looks at your naming from the outside and you're guaranteed to have an 'expertsexchange' in there somewhere.
  
  thrwyoilarticle 4 years ago
  
  oh, fsck
  
  rk06 4 years ago
  
  Powershell is case-insensitive, so camelCase is only a writing preference
  
  thrwyoilarticle 4 years ago
  
  It's still verbose in places
- chrismorgan 4 years ago
  
  camelCase is objectively harder to read than snake_case or kebab-case, though familiarity can mitigate that.
  
  skohan 4 years ago
  
  I'd argue it's at most a tiny bit harder to read, and a lot easier to type. On balance I'd rather avoid making a pinky key one of the keys I have to use the most.
  
  daneel_w 4 years ago
  
  "On balance I'd rather avoid making a pinky key one of the keys I have to use the most."
  And you use something else than your pinky finger for the shift key specifically when typing capitalized letters for camelCase?
  
  skohan 4 years ago
  
  At least it's where they sit naturally on the keyboard. And the shift key is wider specifically so you don't have to be accurate with your pinky when you're pressing it. The underscore is one of the least ergonomic keys there is. And you need both pinkies to do it
  
  daneel_w 4 years ago
  
  I might be misunderstanding. On all layouts I'm familiar with the underscore key is directly next to one of the shift keys, or left of backspace. Neither layout requires the Vulcan death grip. Shift should always be under your pinky fingers to avoid contortions.
  
  skohan 4 years ago
  
  On the US layout it is next to the zero key on the top row.
  
  frenchyatwork 4 years ago
  
  Having used a lot of all the formats, I'd argue it's a lot easier to read and a tiny bit harder to type. For typing it's basically just an extra `-`, unless your alternative is nocase.
  For reading, CamelCase has 2 significant ambiguity issues: similarity between I and l, and what do you do with acronyms. Acronyms wouldn't actually be a problem if everybody just wrote them as they would in snake_case (i.e. only capitalize the first letter), but they don't and so it's anyone's guess whether you're going to get "Id" or "ID".
  There's also a minor issue where if you're on a case-insensitive file system it can be a little difficult to change casing, but adding/removing underscores is easy.
  
  skohan 4 years ago
  
  Adding an underscore everywhere is horrible! The spacebar is huge, and gets your thumbs basically to itself because space will be one of, if not the most commonly typed key. To replace that with one of the least ergonomic keys makes no sense.
  And if CamelCase is so hard to read, why is it the norm for "high level languages"? Shouldn't those be optimized for ease of use?
  
  frenchyatwork 4 years ago
  
  > And if CamelCase is so hard to read, why is it the norm for "high level languages"
  That's over-selling it a bit. It's more common, but not dramatically so. Outside of class names, CamelCase isn't the norm for Python, PHP, CSS, HTML. It's also not the norm for shell scripting, but shell scripting has horrible readability for other reasons.
  I believe CamelCase is more common for languages like Go, C#, and Java because they grew up in large organizations where having god objects/classes with 400 methods is kinda normal and having aMethodWithAReallyLongName is pretty common. One of the advantages of CamelCase is that it does shorten really long names.
Dudeman112 4 years ago

I could infer a lot about the second and what those params mean and what they do.
The first one is some magical incantation.
- foxfluff 4 years ago
  
  Sure. One could also make "move-down-one-line" be the incantation to move the cursor down a line in vi, but I prefer j.
  Ergonomics isn't all about making everything self-descriptive for someone seeing the thing for the first time. It's about making things comfortable to actually use. If it's so long and complicated that you can't even remember how to do it, it's not very comfortable to use. Even if I could remember, xset m 0 0 is still far more comfortable.
  And fwiw you still don't know what 0, 1 in accel profile do; you need to look that up or take a wild guess, and if you want to use that command, you'll also have to know how to look up the device because chances are yours is not the same as mine. So it's not any less magical in the end, just more verbose.
  The "cool" thing about the xinput command is that you don't even find accel profile in the man page. You gotta look elsewhere if you want to understand what it is and what it does and what the parameters are.
  xset m? Yes, that is documented in the man page.
  
  Gigachad 4 years ago
  
  It should be based on frequency of usage. I can tell you that moving down a line in vim is a little more common than toggling the mouse acceleration.
  I would never even type such a command. I would just copy paste it once.
  
  foxfluff 4 years ago
  
  Yeah well, given that mouse acceleration tends to be on by default, I need to turn it off every time I'm on a fresh install or computer I haven't used before. The last time I needed that was yesterday.
  I don't want to waste time searching for a command to copy-paste when it could just be made short, simple, memorable and ergonomic. I could type xset m 0 0 faster than I could open a browser and ask google how to disable acceleration with libinput. And again: you can't just copy-paste the xinput command unless you're lucky enough that it matches your device. On my new computer, the device has a different name than on my old laptop even though it's the same damn mouse.
  
  TheOtherHobbes 4 years ago
  
  It should be, but how would you keep track of usage frequency?
  At least it would push all the "This switch was added by someone playing with UNIX at a university in 1986 and hasn't been used since" options to the end of the list.
  
  ReleaseCandidat 4 years ago
  
  > Ergonomics isn't all about making everything self-descriptive for someone seeing the thing for the first time.
  We're talking about `xset`. It doesn't make sense to optimize that for usage of more than once a year.
  
  foxfluff 4 years ago
  
  The less frequently I need something, the more frustrating it is if it's not short and memorable (or easy to look up in the synopsis or built-in help). Forgetting and googling a needlessly complicated command over and over again every year isn't fun.
  xset achieves that perfectly. If I somehow didn't remember how to set mouse acceleration with it, a quick glance at the synopsis immediately tells me. Or I can just run the command and it'll tell me:
  To set mouse acceleration and threshold:
      m [acc_mult[/acc_div] [thr]]    m default
  Zero frustration, and the command is so short and simple that I end up remembering it without trying.
  This is something I've observed more than once: I easily memorize useful sets of one-letter flags even if I can't remember or know what they all stand for. This just doesn't happen nearly as much with long options. Commands like ls -ctrl or ss -nap quickly become part of my repertoire even if I don't use them very often, but I really couldn't remember ss --numeric --all --processes (if I had written that from memory, it could've ended up as --num --all --pid or --numeric --any --process), and I don't even know what the corresponding long options for ls are. In the rare case when I have to deal with an option that has no short equivalent, I feel like I have to look it up every time if it's been longer than a few weeks.
  You talk of optimization but I think this is just a very basic (and reasonably successful) attempt at sane design. It's not like someone had to go far out of their way to make this in a manner that isn't batshit insane.
- eloisius 4 years ago
  
  But which case should software interfaces optimize for? Ergonomics of someone who uses a tool frequently, or interpretability for casual by-standers of some out-of-context shell command?
- zsmi 4 years ago
  
  Another interpretation is:
  On the first, you think you know what it does, but you're not sure. So maybe it gets looked up.
  On the second, you know you don't know what it does. So you know to look it up.
  Personally, I'll take the second. Assumptions during debugging are dangerous things.
zibzab 4 years ago

I've a feeling you will hate powershell
- akersten 4 years ago
  
  Needlessly long parameter/command names and the bizarre insistence on capital letters are the #1 and #2 reasons I detest PowerShell. Like GP, I resent that Linux tools are moving in that direction.
nomorecommas 4 years ago

Long option names are more descriptive, more easily distinguished, and easier to remember. Your shell should be intelligent enough to provide tab completion for option names, assuming it is configured to.
- Angostura 4 years ago
  
  > Long option names are ... easier to remember ... Your shell should be intelligent enough to provide tab completion
  They are so easy to remember that you need to configure your shell to remember them for you?
- Jiro 4 years ago
  
  Long option names are more difficult to remember because a long option name can be spelled multiple ways and it is difficult to remember which spelling is correct.
- forgotmypw17 4 years ago
  
  >Your shell should be intelligent enough to provide tab completion for option names, assuming it is configured to.
  Wait, are you saying that I need to change my shell or config to make up for another tool's poor design?
  No, thanks.
- kaba0 4 years ago
  
  IMO, powershell got it right. Yeah, its syntax is strange, but it has standard flag usage with proper autocomplete, and you can shorten any flag the way you want (eg. fuzzy match) if it is unambiguous.
formerly_proven 4 years ago

Cue nmcli (CLI for Gnome's NetworkManager) which uses UUIDs for everything and (at least a while ago) did not accept partial-but-unique UUIDs. Basically goes "nmcli connection up 5095665a-d82c-4ae6-8964-283623387941".
- apricot 4 years ago
  
  By this point, I'm pretty sure there are people at gnome who compete to see who will make the stupidest suggestion that gets put in production.
  
  MonkeyClub 4 years ago
  
  It's a Gnomespiracy to determine whether worse is actually better.
  
  8bitsrule 4 years ago
  
  Fits right in with COP26. (Could Of Punted?)
- gertlex 4 years ago
  
  Weird, I haven't had to do this. Most(/all?) connections have nice names you can see with `nmcli c`... and so I can do `nmcli c up id DroidNet` and that's pretty dang nice. Pretty sure this worked with Ubuntu 14.04 (though, nmcli has gotten much more featureful since then)
  (The ability to shorthand connection->c and similar is great, too; obviously not unique to nmcli)
- prionassembly 4 years ago
  
  apt-get install nmtui # it's better
  
  O_H_E 4 years ago
  
  nmtui is a life saver tbh
apricot 4 years ago

The problem is we're optimizing for "easy to learn" rather than "easy to use".
- jjoonathan 4 years ago
  
  In a world of broken promises and tool churn, minimizing tooling investment isn't laziness, it's a defense mechanism.
  This is a lesson I had to learn the hard way, multiple times.
  
  forgotmypw17 4 years ago
  
  I've learned this lesson too, and I now avoid using any tools that have broken backwards compatibility in the past 20 years.
- foxfluff 4 years ago
  
  That may be a part of the problem but honestly I don't feel like all these new crazy interfaces are easy to learn either. I mean how do you come up with the litany xinput calls for? You need to understand the syntax for specifying a device. You need to know that you're to set a libinput property, and you need to know the name of that property, and it's not documented in xinput man page, and of course you need to know the values to pass which again are not documented in xinput man page. You can play with --list-props and then take your search elsewhere because it is completely opaque and doesn't explain what the properties actually do.
  I suspect the number of people who figured all that out without having to find it by googling / arch wiki / whatever is very very low.
  Now I'm not gonna say xset is the easiest interface to figure out, but the syntax for setting mouse acceleration is right there in the synopsis, and if you search down the man page, you'll learn a little more (and also if you just run xset without arguments, it'll tell you how to set mouse acceleration). It might not be the best designed tool but it's something I learned back in the day as a teenager just by looking at the man page.
  I think the real issue is that people nowadays are designing these interfaces to be consumed by interactive configuration tools, GUI apps, and desktop environments; they're more dynamic, more complex, more flexible, but not easier to figure out, not for you on the command line. The command line is just a last resort. Second class citizen if you will.
  
  forgotmypw17 4 years ago
  
  alias mouseoff='xinput set-prop 11 "Device Enabled" 0'
  alias mouseon='xinput set-prop 11 "Device Enabled" 1'
  Kind of ridiculous if you ask me.
  
  foxfluff 4 years ago
  
  It is, but they actually have a shortcut for that (--disable, --enable).
  
  forgotmypw17 4 years ago
  
  Direct quote from my console:
  $ xinput --disable
  Segmentation fault (core dumped)
- deckard1 4 years ago
  
  On some level it makes sense. The problem with the command line is familiarity.
  How often do you reach for iptables? If you're like myself, and most home/desktop users, then probably once in a blue moon to set it up and then you leave it alone. But a system admin? Maybe they touch it a few times a week or month. Every time I use iptables I have to relearn how Linux networking works.
  Similarly, the xset/xinput thing. When I need those tools I just create a script or throw it in .bashrc. I adjust the settings once and will not touch them again for a couple years. It makes sense to have long parameters that are readable. I can look at my .bashrc and see exactly what device is getting adjusted.
Pxtl 4 years ago

imho, the fundamental problem is using space as a delimiter. Also, case-sensitivity is a disaster for ergonomics.
If you had comma-delimiting like in an algol-derived language, you wouldn't need to quote things with spaces.
edit: also, code is read more times than it is written, so optimizing for readability over brevity is generally a good move.
ansible 4 years ago

Well, if you think that's bad, behold the recent trend in network interface names on Linux.
We started out with 'eth0', 'eth1', etc. Which adapter was which could change when adding and removing a network card. That was bad, so that prompted the evolution.
Now we have 'enp1s0', 'enp0s31f6', 'enp13s0' and many similar variations. These are supposedly more stable across device changes. As it turns out, they weren't.
But wait, there is more! Now we have the "predictable names" scheme that produces interface names that are even longer, and not even slightly easier to remember.
Read about the whole sorry saga here:
https://wiki.debian.org/NetworkInterfaceName
I do get that it is not an easy problem to solve, especially in the face of removable network interfaces (like USB Ethernet / WLAN). But surely this is not the best we can do.
- nocman 4 years ago
  
  Missed the 's', it's:
  https://wiki.debian.org/NetworkInterfaceNames
- foxfluff 4 years ago
  
  I was actually ranting about this on IRC last night (yeah now my laptop has two enp* interfaces and enx[MAC])..
  One thing I like about OpenBSD is that buses are scanned and drivers probe in order and there's no race between drivers coming up. Unless your hardware is physically tampered with or broken, all interfaces come up with the same name across reboots. Linux isn't like that (even if you don't touch your hardware, interfaces could swap across reboots), so you need to do something about it.
  As is typical on Linux, the default is unergonomic and if you want something nice, you're on your own to make it so.
  If you already have userspace daemons responsible for device insertion and naming, it really wouldn't have been so hard for it to e.g. automatically add a config file / database entry for each interface the first time it is seen. So the devices that came up as eth0 and eth1 are still eth0 and eth1 on the next boot; if I unplug eth0 and add a new card, the new one would be eth2 because eth0 is still reserved for the first card I had.
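For illustration, the first-seen reservation idea could be sketched roughly like this in Python. The `InterfaceRegistry` class, the choice of the MAC address as the key, and the in-memory dict (standing in for a config file or database) are all assumptions made for the example; this is not how udev or any real daemon does it.

```python
class InterfaceRegistry:
    """Hypothetical registry that reserves an ethN name per device forever."""

    def __init__(self):
        # Maps a stable hardware identifier (assumed: MAC address) to a name.
        self._names = {}

    def name_for(self, mac):
        """Return the reserved name for mac, allocating the next index
        the first time the device is seen."""
        key = mac.lower()
        if key not in self._names:
            # Entries are never deleted, so a removed card keeps its slot.
            self._names[key] = f"eth{len(self._names)}"
        return self._names[key]


registry = InterfaceRegistry()
first = registry.name_for("00:11:22:33:44:55")   # first card seen
second = registry.name_for("00:11:22:33:44:66")  # second card seen
# The first card is unplugged and a new one is added: it gets a new name.
third = registry.name_for("00:11:22:33:44:77")
print(first, second, third)
```

Because entries are never removed, plugging the first card back in later would give it its old name again.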
  
  ReleaseCandidat 4 years ago
  
  > add a config file / database entry for each interface the first time it is seen.
  Ubuntu did that with their persistent-net.rules udev rule. That was a part of the PITA of the old naming.
- ReleaseCandidat 4 years ago
  
  > These are supposedly more stable across device changes.
  No. These are stable across reboots. The old eth? weren't. And yes, that had been a PITA.
- account42 4 years ago
  
  If network interfaces were files we could just have both short names and stable names, like what we have for block devices.
throw10920 4 years ago

These changes are meant to make it easier to read and understand command-line incantations (and to make them more explicit, which is always good), because the command-line paradigm, being text-based, imposes an unavoidable trade-off between ergonomics and understandability/ease-of-use. It sounds like you prefer ergonomics - although I wouldn't be surprised if most users would prefer ease-of-use.
Of course, if one doesn't write a CLI to begin with, this trade-off doesn't exist - you can have your cake and eat it too.

ourmandave 4 years ago

A lot of my stuff is cross platform so making filenames portable means avoiding spaces.

Ironically, even NASA doesn't like space.

https://www.nas.nasa.gov/hecc/support/kb/portable-file-names...

zibzab 4 years ago

Touché my friend, had a good laugh

hardwaresofton 4 years ago

I am also that age, and kebab-case is the best case for filenames.

2021-01-01-some-important-document.pdf gives me the warm fuzzies. On the off chance that some more differentiation is needed, throw in an underscore and a whole new world opens up

tambourine_man 4 years ago

I go one step further: 2021-11-11_client_project-name.ext
2021-11-11_client_projectName.ext is also OK. But underscore separates fields, hyphens for space replacement.
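One practical upside of this convention is that the names become trivially machine-parseable. A minimal Python sketch, assuming exactly three underscore-separated fields (date, client, project); the `parse_name` helper is hypothetical:

```python
from datetime import date
from pathlib import PurePath


def parse_name(filename):
    """Split 'DATE_client_project-name.ext' into its underscore-separated fields."""
    stem = PurePath(filename).stem          # drop the extension
    day, client, project = stem.split("_")  # underscores separate fields
    return {
        "date": date.fromisoformat(day),       # hyphens inside the date field
        "client": client,
        "project": project.replace("-", " "),  # hyphens stand in for spaces
    }


info = parse_name("2021-11-11_client_project-name.ext")
print(info)
```

A name with more or fewer than three fields would raise a `ValueError` here, which is arguably what you want from a strict convention.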
- pluc 4 years ago
  
  this is the way
  
  tomcam 4 years ago
  
  but the extra Shifts, no thank you
  
  pluc 4 years ago
  
  you gotta involve your pinky or it'll atrophy
  
  reaperducer 4 years ago
  
  Cut most mine off in an unsupervised Halloween pumpkin carving accident when I was a kid. I think the lack of length actually allows me to type faster.
- hardwaresofton 4 years ago
  
  I see and applaud your use of the underscore there, but I must reject the premise!
  work/client/project/2021-11-11-file.ext is more or less how I lay stuff out. I’d say client/project is a folder level distinction (arguably dates too).
  [EDIT] Realistically most of the stuff under <project> is git repos and I usually make a “home” repo where I keep org files for tracking hours, notes, and resources related to the engagement.
  
  cmg 4 years ago
  
  work/client/project/2021-11-11-file.ext is great until you've got a '2021-11-11-project-status.txt' in a few directories and you need to find one quickly! I do a combination: clients/client/project/2021-11-11-client-project-update.txt
  
  renewiltord 4 years ago
  
  I just store it as a content hash and then when I want to find the file, I just have to recreate its content and I can then just get the hash.
  
  ModernMech 4 years ago
  
  It sounds like what everyone in this thread needs is a database file system. This was always my favorite proposed feature of Windows Longhorn that never made the cut. Almost 2 decades later and Microsoft's latest OS still doesn't have this feature.
  
  nayuki 4 years ago
  
  I wrote about what I perceived as deficiencies of hierarchical file systems, and proposed an alternative organization based on tags and hashes. It was discussed on Hacker News last week and many years ago.
  https://www.nayuki.io/page/designing-better-file-organizatio... ; https://news.ycombinator.com/item?id=29141800
  
  tambourine_man 4 years ago
  
  Have you used BeOS?
  
  ModernMech 4 years ago
  
  For sure! I actually used Be before I ever used Linux.
  
  Zababa 4 years ago
  
  I'll be the opposite voice: the file system isn't for precise organisation, it's just for storing. For organisation, the ideal thing to use is tags. Since most file systems don't have tags and using software for that would be a pain, the best way to do this is to list the tags in the file name.
  
  soldeace 4 years ago
  
  I've always thought that personal files, photos, or any other kind of just needed more connections between them to improve my information retrieval experience. That's how I had become a Zettelkasten evangelist. I believed it would be the cure for the information overload disease of our era.
  But life made me use Emacs org-mode more and more, and I'm now in love with tags. Retrieving information has become so easy, especially with org-mode's tags inheritance, that I hardly think making connections between headings or notes is necessary anymore[1]. And I believe that applying tags to filenames (a la Karl Voit [2]) will create the same effect
  [1] A Zettelkasten-like system is still unbeatable imo when it comes to ideas repositories, i.e. a second brain you can talk to and get new insights. It's just not that great for personal knowledge management or project management.
  [2] https://github.com/novoid/filetags
- ridaj 4 years ago
  
  Maybe you mean `2021-11-11_client_project-name_v2_final.ext`
  
  whatusername 4 years ago
  
  2021-11-11_client_project-name_v2_final_ridaj(1).ext
  
  eurasiantiger 4 years ago
  
  Copy (2) of 2021-11-11_client_project-name_v2_final_ridaj(1)__FINAL-v2.ext
- zajio1am 4 years ago
  
  > But underscore separates fields, hyphens for space replacement
  But why not the other way, hyphen-minus for separating fields and underscore for space replacement? That seems to me more consistent with how underscores and dashes are used.
  
  zepearl 4 years ago
  
  I fully agree, that's how I do it :)
  my_project-some_activity-this_document-20210923-v02.txt
ModernMech 4 years ago

Kebab case is the often overlooked benefit of prefix notation and semantic white space in programming languages. Honestly the best case of all cases imo.
- kibwen 4 years ago
  
  One glorious day we'll accept programming languages that require spaces around infix arithmetic operators so that we can make kebab case a reality!
  
  JasonFruit 4 years ago
  
  Lisps, especially Scheme with its `x->something-else` convention, have ruined naming in other languages for me.
  
  eCa 4 years ago
  
  Maybe Raku[1] is for you!
  [1] https://raku.guide/#_syntax_overview (see section 1.7.1)
  
  MaxBarraclough 4 years ago
  
  Forth does something like this, by virtue of its reverse Polish notation.
  In Forth, 'words' (which are roughly analogous to functions and operators) must always be separated by whitespace, as Forth doesn't parse out operators the way most languages do. In exchange, you get the ability to use symbols in identifiers, as Forth has no reason to single out symbols like + as being syntactically special. You can even use a number for the first character. (For that matter, Forth will even let you override the usual interpretation of a numerical literal, but that's always struck me as going a bit far.)
  It gives you a + word, analogous to the + operator of most languages [0]. It also gives you a 1+ word, as an (admittedly slight) abbreviation of the sequence 1 +. [1] If you wanted a 2+ word, you could easily define it yourself.
  (This property of Forth evidently wasn't enough to get it to take over the world, but it's still neat.)
  [0] https://www.complang.tuwien.ac.at/forth/ansforth-cvs/documen...
  [1] https://www.complang.tuwien.ac.at/forth/ansforth-cvs/documen...
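To make the tokenizing point concrete, here is a toy whitespace-split RPN evaluator in Python. It is only a sketch of the idea, not Forth: the `run_forth_like` function and its three-word dictionary are invented for the example.

```python
def run_forth_like(source):
    """Evaluate a whitespace-separated RPN program and return the final stack."""
    stack = []
    # Any non-whitespace run is a word, so names like "1+" are legal.
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "1+": lambda: stack.append(stack.pop() + 1),
        "2+": lambda: stack.append(stack.pop() + 2),  # the "define it yourself" case
    }
    for token in source.split():
        if token in words:            # try the dictionary first
            words[token]()
        else:
            stack.append(int(token))  # otherwise treat the token as a number
    return stack


result = run_forth_like("2 3 + 1+ 2+")
print(result)
```

Since tokens are only ever split on whitespace, `1+` never gets confused with the number `1` followed by `+`.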
Raineer 4 years ago

In my work, today's date would be 21K11, to save space over the longer date.
- blackboxlogic 4 years ago
  
  How do you distinguish 21K111 and 21K111?
  
  inanutshellus 4 years ago
  
  Are you trying to catch GP on differentiating hours, were it to be appended to his time format (1st @ 11 vs 11th @ 1am)?
  Notably he didn't promise any, but presumably one'd need a separator... Maybe, per his "K" usage of the month, one'd use the alphabet again. 11am would be "K" again... or lowercase just for giggles?
  I don't think it reads very well, but I also think one'd get used to it pretty quickly.
  
  blackboxlogic 4 years ago
  
  I was thinking January 11th vs November 1st. Maybe their "date" doesn't need/support day-of-month? Or they typod and I should just focus on my work.
  
  apricot 4 years ago
  
  I imagine January is A and November is K, so 21A11 vs. 21K1 (or maybe 21K01).
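Under that reading (letters A to L for the months), the encoding might look like the following Python sketch. The zero-padded day is an assumption; the original poster did not say which variant they use.

```python
from datetime import date
import string


def compact_date(d):
    """Encode a date as YY + month letter (A=January ... L=December) + DD."""
    month_letter = string.ascii_uppercase[d.month - 1]
    return f"{d.year % 100:02d}{month_letter}{d.day:02d}"


print(compact_date(date(2021, 11, 11)))
print(compact_date(date(2021, 1, 11)), compact_date(date(2021, 11, 1)))
```

With a fixed-width day, January 11th and November 1st no longer collide.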
  
  blackboxlogic 4 years ago
  
  Ah yes, I missed that K was a month.
- onychomys 4 years ago
  
  Are you working in some embedded system with tiny memory space or something? What's the use of saving one character? Just make it YYMMDD!
jjoonathan 4 years ago

> kebab-case
I hadn't heard that before and I love it.
- sodapopcan 4 years ago
  
  If you hadn't heard kebob-case called that before there's a chance you haven't heard SCREAMING_SNAKE_CASE called that before, and I couldn't live with myself if I didn't let you know.
  
  inanutshellus 4 years ago
  
  that's hilarious, thanks for sharing that.
  Perennially relevant xkcd: https://xkcd.com/1053/
  
  sodapopcan 4 years ago
  
  Awe, in turn I have never seen that particular xkcd—it's great! I learned to call it "feigning surprise" and I always try and be conscious of it (though I still catch myself doing it from time-to-time).
- Asraelite 4 years ago
  
  Google considers it too violent apparently. In one of their recent changes to their style guide, they started recommending "dash-case" instead.
  https://developers.google.com/style/word-list#letter-k
  
  prepend 4 years ago
  
  This guide is stupid. They recommend not using “janky.”
  
  cerved 4 years ago
  
  Tbf, dash-case is more descriptive. Kebab doesn't mean skewer everywhere
- FpUser 4 years ago
  
  Same. I had tears in my eyes from laughing. For some inexplicable reason it seems incredibly funny.
ur-whale 4 years ago

> 2021-01-01
Yes on the date format.
Saves you so much time.
- zz865 4 years ago
  
  Agreed on dates ordering problem but 20210101 is so much easier to type.
  
  nicoburns 4 years ago
  
  But much less easy to read!
  
  testplzignore 4 years ago
  
  Years that end in a 1 are awful when doing this, especially in October and November. We've had 20211001, 20211010, 20211101, 20211110, and now today 20211111.
  
  zokier 4 years ago
  
  I just tend to use $(date -Is) so I don't need to think what date it happens to be today. I guess -Id would work if you don't want the time part.
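The same trick outside the shell, as a small Python sketch. The `dated_name` helper and the `notes` slug are made up for the example; the point is only that ISO 8601 dates sort correctly as plain strings.

```python
from datetime import date


def dated_name(d, slug, ext="txt"):
    """Build 'YYYY-MM-DD-slug.ext' (the slug and extension are placeholders)."""
    return f"{d.isoformat()}-{slug}.{ext}"


days = [date(2021, 11, 11), date(2021, 1, 1), date(2021, 10, 10)]
names = [dated_name(d, "notes") for d in days]
# Plain string sorting matches chronological order for ISO 8601 dates.
print(sorted(names))
print(dated_name(date.today(), "notes"))
```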
- hnburnsy 4 years ago
  
  I don't bother with the century or the dashes, saves time...
  211111_foobar_v1.txt
  I am old enough that I still save before printing. I think it was Lotus 123 that engrained it for me.
jonnycomputer 4 years ago

I've recently shifted sharply toward the dash from the underscore. I find it more readable, and it doesn't require the shift key. However, I do find it useful to use underscores to create groups, e.g. test-001_2021-10-11.log. Including hours, minutes, seconds is still awkward.
- kingcharles 4 years ago
  
  Burn the witch!
- FpUser 4 years ago
  
  Brother in arms. I just posted similar thing below.
- discreteevent 4 years ago
  
  There's a customer for everything. I've just never liked the aesthetics of the underscore. Also if your underscored thing gets put in some document and then underlined the underscores can become invisible.
  
  jonnycomputer 4 years ago
  
  A lot of this is personal aesthetics, for sure. Personally, I am not a big fan of camel casing. In code, I only use it for class names, generally. I don't find it particularly readable, and for filenames, not all filesystems are case sensitive, so best not to rely on case to differentiate files. Camel case does have the nice property of being more compact, as no separator character is required. That's its main benefit.
  R traditionally uses the . as a legal character in identifiers. Once you get used to it not being syntactic, I found I actually prefer them to underscores.
apricot 4 years ago

I'm of the opinion that kebab-case is the best case for all identifiers, because it's easy to read and to type. As always, Lispers were right all along.
FpUser 4 years ago

I use this style:
2021-01-01_what-happened_who-did-it_possible-reason
jerry1979 4 years ago

I found that some_document_2021-01-01_v03.pdf works best because it keeps the same document next to its other versions alphabetically, keeps them in date order, and keeps them in a sub-day version order.
jaclaz 4 years ago

As a side note, in the good ol' times of ISO9660 level 1-4 and the various mkisofs parameters, an underscore _ which is a CAPITAL -, may have given issues, only for the record/as a curiosity:
https://web.archive.org/web/20151007005513/http://www.911cd....
P.S. should anyone want to see/run the actual batch, a copy has been uploaded here:
http://reboot.pro/index.php?showtopic=18962&page=29#entry204...
ravel-bar-foo 4 years ago

This used to be my default, and then I used Matlab, and "-" was interpreted as subtraction.

Joker_vD 4 years ago

One of the main reasons why Windows used "Program Files" and "Documents and Settings" was to force the programs (and programmers) to deal with paths with spaces. And you know, for the most part it kinda, more or less worked out although of course even today you will find programs that ask you to install them in a folder without spaces in the path.

Rerarom 4 years ago

VFAT and stuff like that actually provided alternate names like PROGRA~1
- beardyw 4 years ago
  
  Yes, I was doing code to quickly read FAT folders (on a micro controller) and got to the bit about filenames more than 8.3. I decided my life was too short (and processing time) to go and sort out what the "real" file name is. Enforced 8.3 as a requirement!
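Enforcing 8.3 names up front can be a one-line check. A rough Python sketch follows; the allowed character set in the pattern is a simplified assumption, not the full FAT rule set, and `is_8_3` is a hypothetical helper.

```python
import re

# Simplified 8.3 pattern: 1-8 name characters, then an optional dot plus
# 1-3 extension characters. The character set here is an assumption.
_CHARS = r"[A-Z0-9_~!#$%&'()@^{}-]"
SHORT_NAME = re.compile(rf"{_CHARS}{{1,8}}(\.{_CHARS}{{1,3}})?")


def is_8_3(name):
    """Return True if name fits the simplified 8.3 pattern above."""
    return SHORT_NAME.fullmatch(name) is not None


for candidate in ["PROGRA~1", "README.TXT", "Program Files", "LONGFILENAME.TXT"]:
    print(candidate, is_8_3(candidate))
```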
toyg 4 years ago

The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
Sadly such set includes loads of Java programs. If only SUN had shipped a standard way to generate isolated exe files in 1998... but they worked under the presumption that you'd have a JVM already there, because distributing that monster was difficult in dialup times, so you could just hand people a jar; and the enterprise market did not care, since they had webapp servers. Sadly it's an "optimization" that became obsolete very quickly but wasn't rectified until it was too late (java 9+).
- ReleaseCandidat 4 years ago
  
  > The main culprit for space issues is stuff relying on BAT or CMD files, where escaping variables seems to be a black art.
  Actually it isn't, just use double quotes and add a '~'. It's just about the only thing batch files handle better than shell scripts.
  set "VARIABLE=%~PATH"
dale_glass 4 years ago

And that was a good idea, if only Microsoft also fixed the CreateProcess function, Windows would be somewhat sane in this regard. But somehow nobody seemed to think of it. Seriously, look at it:
https://docs.microsoft.com/en-us/windows/win32/api/processth...
The arguments are a single string. So you want to pass parameters with spaces in them? You've got to add quotes and stuff all of that into a single string. Instead of doing it in a more sane manner, like oh, the arguments to main().
- IiydAbITMvJkqKf 4 years ago
  
  The root cause is that argv isn't a first-class citizen like on linux, but an abstraction. The kernel only cares about a single string argument. If you use main instead of WinMain, the CRT will transform the single string into an argv for you.
  Oh and cmd.exe uses a different escaping scheme than the CRT.
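To see what "stuffing argv into a single string" looks like in practice, Python's `subprocess` module happens to carry a helper, `list2cmdline`, that joins an argument list using the quoting rules the Microsoft C runtime expects. It is an internal-ish helper rather than a prominently documented API, so treat this as an illustration only; the argument values are made up.

```python
import subprocess

args = ["program.exe", "arg_1", "second argument", 'argument with literal " symbol']

# list2cmdline applies the quoting rules the Microsoft C runtime expects when
# it later splits the single command-line string back into argv.
command_line = subprocess.list2cmdline(args)
print(command_line)
```

Arguments containing spaces get wrapped in double quotes, and an embedded double quote is escaped with a backslash. A program that parses its command line differently from the CRT (cmd.exe, for one) will not necessarily read the string the same way.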
  
  dale_glass 4 years ago
  
  Microsoft is in full control of the Windows kernel, so they can make it care about whatever they want to, and one would think better argument passing would be a nice quality of life improvement. Less nonsense for developers to deal with, and less weird bugs on the platform.
  
  exciteabletom 4 years ago
  
  Sure, but MS values backwards compatibility a lot.
  They aren't going to break existing API or bloat the kernel with a bunch of functions that do the same thing.
  
  Joker_vD 4 years ago
  
  They can either add a new API which almost nobody would use ― because everyone already learned to use the existing one and either reused or reimplemented the MSVCRT's logic so that most of the software parse the command lines the same way; or they can literally break every single program in existence by breaking the interface of CreateProcess ― which is just as likely as Linux breaking the interface of execve(2).
  Giving CreateProcess a new flag so it would correctly accept "path\\to\\my\\program.exe\0arg_1\0second argument\0argument with literal \" symbol" (with an implicit \0 terminating it) as lpszCmdLine is the easy part; the hard part would be forcing everyone to switch to using it.
  Also, I'm pretty certain this processing happens in the user space, and Win32 API is already bloated beyond any belief.
- naikrovek 4 years ago
  
  maintaining backwards compatibility means maintaining silly decisions, and Microsoft does both.
makecheck 4 years ago

They may have thought that would happen but I saw just as much stuff end up in C:\Windows or \Users or (always my favorite) those “Documents” that are really just “whatever random crap every app wants to put there”.
Avalaxy 4 years ago

Yet in Microsoft's own cmd tool I need to put quotes around my path if I want to refer to any files/folders below those folders.
alerighi 4 years ago

That annoys me every time I use a Windows system. It was a terrible decision, especially since neither the command prompt nor the new PowerShell accepts a backslash before a space the way bash does; you have to quote the whole path! I get that most users on Windows don't use the shell, but as a developer I do a lot, and every time it's a pain (no wonder they added the WSL in Windows after the failure of PowerShell...)
- rashil2000 4 years ago
  
  Why would they accept a backslash? Backslash is a path separator on Windows. In most Windows programs, you don't even need to escape the space - arguments can contain spaces and it will understand it, like `notepad My file.txt`
  The escape character on PowerShell is backtick, and on cmd it is caret. You don't need to quote everything.

ReleaseCandidat 4 years ago

Still way too many libraries and programs can't handle spaces in filenames.

And shells and other programs still have problems with perfectly legal characters in filenames too, like '!' or ':'.

chrisseaton 4 years ago

> And shells and other programs still have problems with perfectly legal characters in filenames too, like '!' or ':'.
Without asking you to always quote and escape every file name - what alternative is there? If they tried this you'd probably find you didn't like it.
- zeroimpl 4 years ago
  
  Not exactly - the problem is mostly when doing variable expansion. The fact that bash treats "$x" and $x as different is a bit of a design flaw. Of course there's still an issue with evaluating dynamically generated code, but that problem is partly solved by working with arrays.
  
  chrisseaton 4 years ago
  
  I mean how do you want shells to deal with file names with spaces in? Do you think we should have to quote and escape all file names all the time? If not then how do you think it should work?
  
  rcxdude 4 years ago
  
  Shells should treat data as data, and not have the default behaviour be treating it as code (i.e. you should need to do 'eval $x' or some equivalent if you actually want the string to be treated as a shell command). This would also mean having a real list type, instead of depending on arbitrary separators in strings. This is exactly how other languages treat it, and it is not a significant challenge for interactive use (in fact, it would substantially reduce the opportunity for surprises when running commands interactively as well).
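For comparison, this is roughly what "data as data" looks like in a language with a real list type. A minimal Python sketch: the awkward file name is invented, and the child process is just the Python interpreter echoing back its first argument.

```python
import subprocess
import sys

# A name that would be split or reinterpreted if pasted unquoted into a shell.
awkward_name = "my file; echo oops $HOME!.txt"

# Passing a list (and no shell=True) hands each element to the child process
# as one argument; nothing re-parses the string as code.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", awkward_name]
completed = subprocess.run(child, capture_output=True, text=True, check=True)
received = completed.stdout.rstrip("\n")
print(received == awkward_name)
```

No quoting or escaping appears anywhere, because the string is never handed to a shell in the first place.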
HenryKissinger 4 years ago

> Still way too many libraries and programs can't handle spaces in filenames.
"It's nothing."
"What do you mean?"
"It's nothing... It's empty space. I never taught the computer how to read empty space!"
"I never taught Virgil how to fly."
danielvaughn 4 years ago

yep, I still don't use spaces. I also don't use uppercase characters. Just underscores or hyphens.
- boringg 4 years ago
  
  Sometimes I break the rule and use uppercase but never spaces.
  
  ptha 4 years ago
  
  I've had issues when moving between Windows/*nix file systems, where Windows file names are case insensitive and *nix systems are case sensitive.
  Build script works fine locally on Windows, but then chokes on the *nix test server, as it's effectively a different path.
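A cheap guard against this is to scan the tree for names that differ only by case before they ever reach the other platform. A Python sketch, with the caveat that `str.casefold` only approximates how any particular file system folds case; `case_collisions` and the sample paths are hypothetical.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [names for names in groups.values() if len(names) > 1]


tree = ["src/Utils.js", "src/utils.js", "README.md", "build.sh"]
print(case_collisions(tree))
```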
  
  cerved 4 years ago
  
  file names aren't case insensitive, it's the windows API that is
  
  ptha 4 years ago
  
  I assume you mean that the Windows API (standard for Windows apps) is case insensitive, but if using the WSL (Windows Subsystem for Linux) it's possible to get case sensitivity: https://docs.microsoft.com/en-us/windows/wsl/case-sensitivit...
  
  efreak 4 years ago
  
  Even if you're not using WSL, you've always been able to turn on case sensitivity via a registry key. This has not been recommended in the past due to possible issues with windows itself as well as third party software. This history is mentioned here[0]. Everywhere that mentioned a registry key seems to be referring to windows nfs server, not to general file access, however I know that SFU (Services for Unix) installer had an option to do so, so it's certainly possible.
  As of sometime in 2018, fsutil can set specific directory trees to be treated case sensitive in Windows 10 without setting it for the OS. This ability is mentioned here[1]
  [0]: https://devblogs.microsoft.com/commandline/per-directory-cas...
  [1]: https://docs.microsoft.com/en-us/windows/wsl/case-sensitivit...
  
  danielvaughn 4 years ago
  
  I’ve had issues with git when changing a filename, if the only change is the casing.
jerf 4 years ago

Was recently encoding my Stargate: SG-1 DVDs to move them to plex. I was encoding it on a system other than what was serving it, so I had to copy it. It's surprisingly difficult to "scp" a file with a colon in it directly.
I also love when you're using bash and you have a file with ! in the name, and you accidentally fail to correctly backslash it, you not only get "bash: !rest_of_filename: event not found", but it also fails to add that command line to the history, so you can't just hit up and fix it. You have to actually go to the mouse and copy and paste.
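When a command line has to be built as a string, a quoting helper avoids hand-escaping. A small Python sketch using the standard `shlex` module; the file name is invented for the example.

```python
import shlex

filename = "Stargate: SG-1 - Season 1!.mkv"

# shlex.quote wraps the name in single quotes when it contains characters
# that a POSIX shell would treat specially.
quoted = shlex.quote(filename)
print(f"ls -l {quoted}")

# Splitting the quoted form with POSIX rules gives back the original name.
round_trip = shlex.split(quoted)
print(round_trip == [filename])
```

Bash does not perform history expansion inside single quotes, so the `!` should be safe in the quoted form. This does nothing about scp's own host:path parsing, which is a separate layer.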
- AnIdiotOnTheNet 4 years ago
  
  It's almost like in-band signaling isn't a good idea or something.
- kerblang 4 years ago
  
  That sounds like... Puzzle time! I had to cheat, sort of, by looking at the man page:
  > Local file names can be made explicit using absolute or relative pathnames to avoid scp treating file names containing ':' as host specifiers.
  So `scp foo:bar user@host:~` fails because it tries to find the host foo. But `scp ./foo:bar user@host:~` works just fine. I feel kind of stupid for not guessing as much.
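If the scp call is scripted, the same workaround can be applied automatically. A Python sketch under the assumption that scp only looks for a colon before the first slash; `scp_local_arg` is a hypothetical helper, not part of any library.

```python
def scp_local_arg(path):
    """Prefix './' when scp could mistake a local path for host:path."""
    first_component = path.split("/", 1)[0]
    if ":" in first_component:
        return "./" + path
    return path


for example in ["foo:bar", "/tmp/foo:bar", "dir/foo:bar", "plain.txt"]:
    print(example, "->", scp_local_arg(example))
```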
- philote 4 years ago
  
  Can't you usually just put quotes around the filename and/or path to prevent all those issues?
  Edit: nope, just tried it and scp still sees the quoted filename as a host + path
  
  warkdarrior 4 years ago
  
  That is just lazy programming. If the input "foo:bar" is ambiguous, the program should try both interpretations (HOST:FILE and FILE) and then present the user with a prompt that provides sufficient information.
  "Does foo:bar refer to the local file `foo:bar' (size: 102kB, date: 2021-11-11) or to the file `bar' on host `foo' (FQDN: foo.example.com, IP address: 1.2.3.4)?
  1: local file `foo:bar'
  2: file `bar' on remote host `foo'
  Your selection: "
- koheripbal 4 years ago
  
  WinSCP currently has a bug that crashes if it tries to sync a folder with a space in the name
remram 4 years ago

If you suspect that the file might be handed to a bash script at any point, being afraid of spaces is very healthy for sure.
Pxtl 4 years ago

Colons are a problem on Windows, so it's reasonable to discourage creating files with colons in the name.
mywittyname 4 years ago

Is "!" legal in Windows? I'm pretty sure it is not, but I'm not on a Windows machine to test.

doodpants 4 years ago

I'm not young, but I've been using Macintosh computers regularly since 1990, and even back then file names could be up to 31 characters long, and could include any character except colon.¹ So I'm pretty comfortable using spaces, and sometimes even non-ASCII characters, in file names.

Also back then Mac file names typically did not include an extension, because the file's type was stored as part of the metadata in its resource fork. I remember one time a friend of mine was visiting and was playing around with a paint program on my Mac. Being used to DOS, when she went to save her file, she typed a very short name, and then asked me what the proper file extension should be. I smirked and said, "That's not how you name files on a Mac. THIS is how you name files on a Mac." And then I named her file "Ailsa's Cool Picture". Her mind was blown. :-)

¹This is because the colon was the path separator. But since the classic Mac OS had no command line interface, the typical user would never type or even see a file path written out.

forgotmypw17 4 years ago

All of that was very cool and impressive and extremely user-friendly.
However, I found the lack of a command-line to be restricting.
- badsectoracula 4 years ago
  
  On the other hand Mac had some great GUI programs.
  Sometimes i think that the command-line is a crutch that keeps programmers from learning how to make good UIs.
  
  forgotmypw17 4 years ago
  
  True, but most Mac apps were virtually inaccessible by keyboard, and with the slow cursor rate made them a nightmare for the wrist.

rob74 4 years ago

Well, you should still be afraid! Be very afraid! Seriously: only a few months ago I was confronted with a video encoding tool that didn't work properly when the file names contained spaces - so yes, even in 2021 it's still safer not to use spaces in file names...

nojs 4 years ago

Not to mention most naively written bash scripts!

xdennis 4 years ago

Looks like I'm in the minority. I always use spaces and non-ASCII characters in filenames.

In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics (fata, fată, fața, față, făta, făță, fâța, fâță).

Given that we have to use diacritics, spaces don't seem like a big deal.

rob74 4 years ago

Hmmm, I thought I was fluent in Romanian (born there and lived there for 26 years), but I only know 5 of those 8 words...
- xdennis 4 years ago
  
  That doesn't seem unusual. Only the first 5 are very common.
- theshrike79 4 years ago
  
  According to Google Translate the first two are "girl" and the rest are "face". =)
  
  xdennis 4 years ago
  
  * fata - the girl
  * fată - girl
  * fața - the face
  * față - face
  * făta - was giving birth
  * făță - a small fish, or a child who won't sit still
  * fâța - was fussing
  * fâță - variant of făță
  As you might infer from the first 4, Romanian uses postfix "the" and for singular feminine words you can't tell the difference if you use only ASCII.
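The collision is easy to reproduce. A short Python sketch that strips combining marks after Unicode decomposition (one common way of "ASCII-fying" text; the `strip_diacritics` helper is written for this example):

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["fata", "fată", "fața", "față", "făta", "făță", "fâța", "fâță"]
collapsed = {strip_diacritics(word) for word in words}
print(len(words), "words ->", collapsed)
```

All eight spellings end up as the same ASCII string.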
  
  qayxc 4 years ago
  
  Google Translate is a horrible tool for "translating" single words or lists of unrelated words.
  Use a proper dictionary for that. The very nature of statistical models makes proper translation without context impossible for these systems, especially when uncommon words and diacritics are involved.
hdjjhhvvhga 4 years ago

So how did you deal with it in the 80s/90s?
- xdennis 4 years ago
  
  As you would assume: use ASCII and deduce from context. Many people still do that.
  That has led to phantom diacritics: reading letters in unfamiliar words/names based on what you assume they are. For example some pronounce Chirica as Chirică because they assume someone forgot to type the breve in ă.
  
  apricot 4 years ago
  
  I call it the habanero trap. There is no ñ in "habanero", yet a lot of people say "habanyero", probably by analogy with "jalapeño".
- PeterisP 4 years ago
  
  Not sure about Romanian, but for many other languages people essentially came up with transliteration schemes (multiple, incompatible, ambiguous) to squeeze your language into ascii.
  The resulting text was understandable by the "computer people" but not the general population who did not use the networks back then, perhaps somewhat comparable to when some time ago USA parents encountered the "SMS slang" used by their teenagers.
- octorian 4 years ago
  
  Back in the day there were dozens of character sets that were alternatives to US-ASCII. Having once worked on an Email client, I needed to bake in a bunch of translation tables to convert stuff sent that way into UTF-8.
masklinn 4 years ago

> Given that we have to use diacritics, spaces don't seem like a big deal.
There is one big difference: CLI utilities don't usually care about diacritics (though encoding issues can throw a wrench in that), but they care a lot about spaces. So putting spaces in filenames requires properly quoting or escaping parameters, whereas diacritics do not. That makes one-off shell snippets and scripts a lot more annoying (though TBH I tend to shy away from those anyway, these days).
yread 4 years ago

We have a few words that depend on diacritics to be unique in Czech as well - though not as bad as this example - but people just manage without. Hell, I don't even bother installing the Czech keyboard, if I REALLY need it (like in names), I just google for words that have the character and copy it
vadfa 4 years ago

>In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics
That is what context is for.
selfhoster11 4 years ago

So do I. I have a language, and I'm not afraid to use it. My computer should speak it just as well as I do.
- cerved 4 years ago
  
  There's a server at work that's named with a non-ASCII character. I've run into compatibility issues lots of times where I can't connect. I prefer to just use English with ASCII and be happy
  
  selfhoster11 4 years ago
  
  Server names are different. They are by and large machine-facing identifiers, whereas filenames have a 50-50 split of whether they are machine-facing, human-facing, or both. That makes their support of Unicode a much more critical (and appealing) proposition.
  
  cerved 4 years ago
  
  everything is a file

hirako2000 4 years ago

I never put spaces, and won't go over 32 characters, preferably less than 16. even when sending a file to my grand mom. that's how deep rooted the trauma is. and yes, it remains an issue with some parsers and what not.

johnchristopher 4 years ago

I still find files on the internet that my browser can't download because too many characters :(.
Edit: can't save, downloading works.
- denysvitali 4 years ago
  
  This is a Windows-only issue AFAIK. It's the same reason why people decide to put their projects in something like C:\dev
  Apparently it's quite easy to reach the 260 chars limit
  
  johnchristopher 4 years ago
  
  No, it's also a Linux issue.
  
  denysvitali 4 years ago
  
  Too many characters on Linux? Quite difficult to reach to be fair. Do you have an example?
  
  johnchristopher 4 years ago
  
  I have been trying to repro with a small nodejs server but either the server cut off the content-disposition filename or firefox truncates it. When I get that in the wild I'll post an update.
  In the meantime:
  $ touch 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 touch: impossible de faire un touch '1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111': Nom de fichier trop long
  https://serverfault.com/questions/9546/filename-length-limit... 255 bytes it is then.
  Firefox cut off at ~217, httpie at 255.
  
  denysvitali 4 years ago
  
  But it's _per file name_. The path could be waaay longer than Windows' one
  
  johnchristopher 4 years ago
  
  Ah, my bad. I thought we were still talking filenames, not paths.
  
  denysvitali 4 years ago
  
  Probably my mistake. You only discussed file names and I threw in the path thing :)

DeathArrow 4 years ago

If I'm going to use the file in the command line, I won't use spaces, since I don't know what sick bug I might encounter.

alpaca128 4 years ago

I avoid spaces because they make tab completion more cumbersome in bash.
- cerved 4 years ago
  
  100% this
eloisius 4 years ago

Same. For documents and stuff that I use in normiespace I give them friendly names with capitalization and spaces and such, but for anything I'm going to be working on via CLI I try to use filenames that will be easily chunked as "words" when doing things like double clicking it in terminal to select, ^w to erase it, tab completion etc.

3guk 4 years ago

Somehow the OneDrive clients still refuse to allow leading or trailing spaces in the filenames, along with a few other characters that are not allowed - seems to cause quite a bit of user friction at least with the non-tech guys that I work with who are confused about why OneDrive is one of the few file syncing clients that has these requirements....

luckman212 4 years ago

I have had to deal with that nightmare multiple times this year! It was a real head scratcher at first.
icefo 4 years ago

Gdrive has the same "issue". I think it's on purpose to avoid files that seem to have exactly the same name.
This can cause user confusion

sieve 4 years ago

This is a UI/UX problem that I only face when dealing with shells and shell scripts. Never had any issues when spawning processes from within languages/runtimes that support sane argument arrays.

sh, bash and cmd.exe are shit. The shell needs serious rethinking.
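As a sketch of what "sane argument arrays" looks like in practice, here is Python's `subprocess.run` called with a list instead of a command string. No shell is involved, so each list element arrives as exactly one argument. The helper `run_with_args` and the file names are hypothetical; the child process just echoes back the arguments it received.

```python
import subprocess
import sys


def run_with_args(args: list[str]) -> list[str]:
    """Run a child Python process and return the arguments it received."""
    # The child prints each of its arguments on its own line.
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],  # a list, not a shell string
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Spaces, ampersands and semicolons are passed through untouched.
print(run_with_args(["my file.txt", "a&b;c.txt"]))
```

Nothing here needs quoting or escaping, because the argument boundaries are never encoded as text in the first place.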

Joker_vD 4 years ago

I see that there are lots of comments about problems of TAB-completions with filenames with spaces in this comment section and I am frankly puzzled: both Bash and cmd.exe actually TAB-complete those perfectly fine, inserting quoting where it's needed.
- necovek 4 years ago
  
  I seem to remember bash losing preferred escaping when TAB-completing, but can't reproduce it now with 5.0.17.
  Eg. you'd type `ls -l "Spaced [TAB]` and it would turn it into `ls -l Spaced\ Name`. I remember similar annoyances with other special shell characters (eg. single quotes, dollars, slashes), but that all seems to behave sane now.
  
  xyzzy_plugh 4 years ago
  
  I didn't even know this was a thing, but can't say I've ever preferred an escape style. I actually use backslashes a fair bit, usually just with spaces. I tend to reserve double quotes for variable or shell expansion, explicitly.
  
  necovek 4 years ago
  
  It's not so much about a preference, but your cursor would jump about and you'd need to be on the lookout if you wanted to edit the completion (eg. to change the extension).
- sieve 4 years ago
  
  > inserting quoting where it's needed
  You have to remind yourself to do this manually in scripts if you don't want to see lines full of "No such file or directory."
  One of the reasons the shell is broken is because the character they use as an argument array member separator is something that regular people use to distinguish between two words, such as in a file name.
  
  Joker_vD 4 years ago
  
  Well, writing scripts would be much less painful if $VARNAME did not explode into pieces by default. Alas, this ship has sailed long ago.
  
  goohle 4 years ago
  
  IMHO, it's possible to add a flag to bash, which will turn on this behavior, so problem can be fixed, but it will diverge bash from POSIX sh a lot.
- tremon 4 years ago
  
  And where it isn't needed. If you have a path that contains a variable and a space, bash will happily escape the $, making the path invalid. See the following:
  $ cd $HOME
  $ mkdir my\ dir
  $ ls my[tab]
  $ cd /
  $ ls $HOME/my[tab]
  ls: cannot access '$HOME/my dir/': No such file or directory
  That error is because when you press [tab], bash changed the path to \$HOME/my\ dir/ but that isn't obvious from the output and I couldn't find a proper way to include the tab-expanded result in the transcript.
  (edit: this is on GNU bash, version 4.3.48(1)-release but I've seen this behaviour for years)
  
  Joker_vD 4 years ago
  
  Depends on the Bash version, I guess? Mine is 4.4.20(1) and when I do "cd $HOME/my[TAB]", it replaces the input line with "cd /home/joker/my\ dir/", and pressing [ENTER] changes the directory to '/home/joker/my dir', as can be seen from the prompt.
  
  akovaski 4 years ago
  
  The variable escaping behavior has existed for a while https://stackoverflow.com/questions/32463052/bash-tabbing-fo... https://askubuntu.com/questions/70750/how-to-get-bash-to-sto... https://askubuntu.com/questions/41891/bash-auto-complete-for...
  And I experience the problematic behavior on my Ubuntu VM. However, I can get the above-described expansion behavior if I run: shopt -s direxpand

necovek 4 years ago

This is a difference between $@ and "$@" (note the quotes):

  $ cat proba.sh 
  #!/bin/sh
  echo "Using quotes:"
  for i in "$@"; do echo "$i"; done
  echo "No quotes:"
  for i in $@; do echo "$i"; done
  $ ./proba.sh "ho ho ho"
  Using quotes:
  ho ho ho
  No quotes:
  ho
  ho
  ho

tomcam 4 years ago

Damn I didn’t know that. Thanks

IiydAbITMvJkqKf 4 years ago

Posix makefiles don't support spaces in dependency names. Not sure about gmake.

CMake doesn't support semicolons, because everything in CMake is a string, and ; is the list item separator.

PATH is separated by colons, so you can't add directories containing : to it.
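The PATH case is easy to reproduce. This is a minimal sketch, assuming the POSIX convention of `:` as the separator (the helper names and directories are invented); the variable format has no escape mechanism, so a directory containing the separator does not survive a round trip.

```python
def join_search_path(directories: list[str], sep: str = ":") -> str:
    """Build a PATH-style string; the format offers no way to escape sep."""
    return sep.join(directories)


def split_search_path(value: str, sep: str = ":") -> list[str]:
    """Split a PATH-style string back into directories."""
    return value.split(sep)


original = ["/usr/bin", "/opt/odd:name/bin"]
round_tripped = split_search_path(join_search_path(original))
print(round_tripped)  # the second directory comes back as two entries
```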

boffinAudio 4 years ago

Every week, I encounter a user - just like I did in the 80's - who cannot explain the difference between a file and a folder.

"What do I use a folder for?", they ask, in the same breath that they request "some way to organize things logically".

The no-filesystem movement has worked hard to eradicate this scourge from user experiences, but I fear that this is the devil's work. Computer users should know what a file is, and what it's for - and they should know what a folder is for, and why they would want to create one to put their files into it ..

But yet: they don't.

It hasn't improved since the 80's. Taking away the user's responsibility to understand these things only makes computing worse. The fact that "special chars in paths" breaks things also holds this factor in place, imho.

octorian 4 years ago

> The no-filesystem movement
Is that the movement to store all your data as an amorphous pile of crap, and then provide easy-to-use search tools to actually find the content you're looking for?
On one hand, I really like the search tools that come from this. But I still like to actually organize my data, so I can browse it if I want to. Also, these search tools seem to only work well enough on macOS and fall flat on their face in Windows. (and no idea where Linux falls on this)
- boffinAudio 4 years ago
  
  You had me at "amorphous pile of crap", but lost me at 'actually find the content'... ;)
  Meanwhile, I've got a single directory full of PDF files (over 60,000+) which I routinely "ls -alF | grep <search term>" for, and I've also got some PyPDF scripts for doing deeper content search - but yet I yearn for a way to automatically parse the filenames and organize things categorically into a folder tree resembling a word cloud, symbolic links and all .. one of these days ..

davidjgraph 4 years ago

You think spaces are bad (and yes I'm old enough that I don't use them)... We work with a company that has forward slashes "/" in their trading name and insist on shared cloud directories involving them to be prefixed with that trading name.

As soon as you do anything programmatic in/out of these drives it all hits the fan. So I'd add to the original statement - "Avoid 'technical' companies with special characters in their name", it's just not right...

mherdeg 4 years ago

There was some prior discussion about a generational shift here at https://news.ycombinator.com/item?id=28615884 -- there's an idea that people no longer need to know what files or folders are in order to get things done day-to-day with software ( https://www.theverge.com/22684730/students-file-folder-direc... ).

I'm wondering when the first generation of college students will start who have never used a physical keyboard to input text.

notacoward 4 years ago

If putting spaces in file names makes you queasy, try punctuation - especially punctuation like semicolon or ampersand or single quote that's meaningful to shells and such. <shudder>

Also, emoji.

sokoloff 4 years ago

You don’t name your files with extensions && rm -rf?
hutzlibu 4 years ago

Or for more fun, use language specific characters, like äöüß...
And even more fun is, when it mostly works, but then it doesn't and you notice too late.

tiagod 4 years ago

Honestly, this still causes a lot of problems with some software. I've had friends asking for help with obscure errors that were ultimately caused by the files they were using being on a path that contains a space or special character.

yboris 4 years ago

I've been stuck for years with a bug in my commercial Electron application where images do not get displayed if the folder path has spaces in it :'(

https://github.com/whyboris/Video-Hub-App/issues/667

Any help would be really appreciated!

toyg 4 years ago

Shells are indeed the main culprits for the continued fear of spaces, but not the only ones. A lot of programs that deal with "metadata" which will then generate database tables and stuff like that, still struggle when working with any sort of special character. And the same for anything that, behind the scenes, just feeds text into regexes.

frzj 4 years ago

Just this weekend I learned that the Espressif Framework doesn't like it either.

crescentfresh 4 years ago

Our local development environment has evolved to a complex enough sequence of steps to set up and troubleshoot that I spent 2 weeks creating tooling that you can simply point at source checkout locations and the tool will take care of setting up that repo.

It broke on the first try on a jr hire's machine, the source checkout location was `C:\source code`.

kreeben 4 years ago

Slightly off topic but I find myself stuck at being "please for the love of god don't use spaces in git branch names" old. Anno dazumal this might not even have been an issue and I'm just cargo culting.

jrimbault 4 years ago

And on that topic, git branches are case sensitive but windows filesystem API isn't. Git branches are materialized on the filesystem as files and directories.
- masklinn 4 years ago
  
  If people actually abuse git branches being CS, odds are good they're also abusing CS in the repository content.
  The linux kernel is one of the offenders, if you check it out on Windows or macOS (which supports CS but remains CI by default) you'll immediately get garbage in netfilter, because it habitually has different files whose names are identical but for the casing, e.g. xt_TCPMSS.h and xt_tcpmss.h.
- qayxc 4 years ago
  
  The Windows filesystem API supports CS file- and directory names just fine.
  It can be enabled on a per-directory basis like so:
  > fsutil.exe file setCaseSensitiveInfo C:\folder enable
  NTFS had support for this for decades now - it was designed that way to be POSIX-compliant.
  It's shoddy software that lacks support for it, not the OS or the file system.
- jhallenworld 4 years ago
  
  Yep, I recently got bit by this, someone checked in a branch named something like "x<-->y", Windows was unhappy. I think this is a git bug: git should escape these names for the native platform.
  https://stackoverflow.com/questions/1976007/what-characters-...
chrismorgan 4 years ago

I enjoy choosing fun branch names from time to time. A few of them: Russian when a user reported a typo in a Russian translation; emoji (mostly added emoji rather than pure emoji); and my personal favourite, a ~250 character diatribe about a single-character bug I was fixing (~250 after I discovered that Git’s error messages when you cause it to try to use file names too long for the file system are fairly mediocre).

andreareina 4 years ago

Spaces breaking tab completion is still an issue, so, yeah.

ETA: not broken in a technical sense, but having to escape them isn't the best experience. So it's just easier for me to avoid spaces.

JadeNB 4 years ago

Where? It works fine in bash and I think most shells ….
- andreareina 4 years ago
  
  That was a bit of hyperbole on my end, my bad. But you do have to escape the space, which I'm counting as a minor break.

gadders 4 years ago

Where I used to work they had a risk system that created directories on the window server that matched the book name. They had a trader that named one of his books "COM1"...
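A guard against that kind of book name could look roughly like this. It is only a sketch based on the classic list of reserved DOS device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); the function name is hypothetical, and the check deliberately treats a reserved name followed by an extension as reserved too, since Windows has historically done the same.

```python
# Classic reserved device names on Windows (case-insensitive).
WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def is_reserved_windows_name(name: str) -> bool:
    """Return True if name (with or without extension) is a device name."""
    # Only the part before the first dot matters; trailing spaces are ignored.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in WINDOWS_RESERVED


for candidate in ["COM1", "com1.txt", "COM10", "common"]:
    print(candidate, is_reserved_windows_name(candidate))
```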

stavros 4 years ago

I saw this and felt old, but then the comments in here made me realize that the fear\ is%20real.

wglb 4 years ago

I still find them annoying, doing lots of work on the command line. I use this hack:

  #!/usr/local/bin/sbcl --script
  (load "~/.sbclrc2")
  (require 'replace-all)
  (in-package :replace-all)


  (format t "file is ~s" (second sb-ext:*posix-argv*) (probe-file (second sb-ext:*posix-argv*)))
  (let* ((args sb-ext:*posix-argv*)
    (orig (second args) )
    (newfn (if orig
      (replace-all orig "(" "-") 
      orig))
    (newfn1 (replace-all newfn ")" "_"))
    (newfn2 (replace-all newfn1 " " "-"))
    (newfn3 (replace-all newfn2 "&" "-"))
    (newfn4 (replace-all newfn3 ":" "-")))
    (when orig
 (format t "renaming \"~a\" to \"~a\"~%" orig newfn4)
 (multiple-value-bind (new-name old-truename true-newname)
  (rename-file orig newfn4)
   (format nil "new-name ~a old-true ~a new true ~a" new-name old-truename true-newname))))

GuB-42 4 years ago

Spaces in file names break half of the shell scripts I have encountered.

And it is one of the biggest reasons I hate Unix shells as programming languages, it is a minefield. In fact I think that after a dozen lines, Perl is a better option. It has most of what shells are good at (i.e. running commands), but saner and more powerful.

ndesaulniers 4 years ago

my god, I was simply trying to loop over every file in a dir and zip it in a bash one liner. Of course, some of the inputs had spaces in the file names. What an exercise in frustration!!!
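For comparison, the same job outside the shell never has to think about spaces, because file names stay as single values instead of being re-parsed as text. A minimal sketch, assuming "zip it" means one archive per file; `zip_each_file` and `demo` are made-up names, and the demo works in a throwaway temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_each_file(directory: Path) -> list[Path]:
    """Create name.zip next to every regular file in directory."""
    archives = []
    # sorted() snapshots the listing before any archives are created.
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix == ".zip":
            continue
        archive = path.with_name(path.name + ".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(path, arcname=path.name)
        archives.append(archive)
    return archives


def demo() -> list[str]:
    """Zip a file with spaces in its name inside a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        (root / "my holiday photo.txt").write_text("hello")
        return [archive.name for archive in zip_each_file(root)]


print(demo())
```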

jasode 4 years ago

Yes, spaces in filenames introduce edge cases and bugs that people are not always aware of.

E.g. Here's a random StackOverflow q&a about a Git pre-commit hook where the top-voted answer does not properly handle filenames with spaces : https://stackoverflow.com/questions/2412450/git-pre-commit-h...

However, the 2nd and 3rd most upvoted answers do mention "-z" option to handle spaces.: https://stackoverflow.com/questions/2412450/git-pre-commit-h...
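The reason "-z" helps is that NUL is one of the few bytes that cannot appear in a file name, so it is an unambiguous separator. A sketch of consuming such output from a hook written in Python (the function name is hypothetical; the bytes could come from something like `git diff --cached --name-only -z`, which is not executed here):

```python
import os


def parse_nul_separated(output: bytes) -> list[str]:
    """Split NUL-terminated file names, keeping spaces and newlines intact."""
    return [os.fsdecode(chunk) for chunk in output.split(b"\0") if chunk]


# Simulated tool output: one name with a space, one with an embedded newline.
sample = b"docs/holiday schedule.txt\0weird\nname.txt\0"
print(parse_nul_separated(sample))
```

Splitting the same output on whitespace or newlines would break both of these names apart.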

dncornholio 4 years ago

Remember when we put + instead of %20? Spaces in URLs are still a nightmare IMO. I still get strange access log entries where some encoding went wrong, especially in heavy JavaScript environments.

Same goes for capitalisation. All filenames should be lowercase.

Maybe it's not strictly necessary, but it can avoid headaches.
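The two conventions still coexist, which is where a lot of the mangled log entries come from. A short sketch with Python's `urllib.parse` (the file name is invented): `quote` produces the %20 form used in paths, `quote_plus` the + form used in form-encoded query strings, and decoding with the wrong counterpart silently leaves a stray plus sign.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "holiday schedule+draft.txt"

path_form = quote(name)        # space -> %20, literal plus -> %2B
query_form = quote_plus(name)  # space -> +,   literal plus -> %2B
print(path_form)
print(query_form)

# Decoding with the matching function restores the original name.
print(unquote(path_form) == name, unquote_plus(query_form) == name)

# Decoding the + form with plain unquote leaves the plus in place.
print(unquote(query_form))
```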

necovek 4 years ago

Plus sign actually came from https://en.wikipedia.org/wiki/Query_string#Indexed_search

distant_hat 4 years ago

I had a guy in my team use forward slashes in filenames. Terrible idea, caused all sorts of weird issues.

mrweasel 4 years ago

But nice for testing. I spent a few months on Windows while doing a Django project and found a number of bugs no one else discovered because they used Mac or Linux.
zokier 4 years ago

Did you mean backslashes? I don't know if any filesystem/OS supports forward slashes in filenames
- kps 4 years ago
  
  OS X does in the GUI; they're isomorphic to ‘:’ at the UNIX level. (The Mac used ‘:’ as the directory separator.)
  
  rootbear 4 years ago
  
  And a : in a file name at the GUI level gets turned into a dash! I just tried to name a text file "Foo/Bar 10:01.rtf" and it changed it to "Foo/Bar 10-01.rtf"!
  
  kps 4 years ago
  
  In that case the GUI is merely changing the file name you type; in a shell you'll see it as "Foo:Bar 10-01.rtf".
danachow 4 years ago

How was this possible? None of the mainstream operating systems allow this.
- distant_hat 4 years ago
  
  via GUI in OS X.
  
  danachow 4 years ago
  
  Ah so that’s not really putting a slash in the name on disk - finder is just displaying the colon that way - it substitutes with a colon for historical reasons that have to do with pre OSX MacOS (but you can see if you create a file from a program or the command line with a colon in it, it will display as a slash in finder). It shouldn’t cause any problems on its own on the system - but the colon is troublesome if you have to interact with DOS/Windows lineage machines.

alerighi 4 years ago

It's not a matter of being afraid, spaces in filenames are annoying.

I mostly use the shell and navigating in directories with spaces is annoying, you have either to quote it or put a \ before each space. You also have to remember to quote everything, and in bash that can become complex, you start adding quotes everywhere to solve problems caused by spaces (or other special characters like *) in filenames.

So I prefer to not use them, a simple _ is as readable as a space. Only thing is that spaces get rendered better on graphical file managers, but... that could have been solved (and can still be solved) by simply adding an option to render a _ as a space graphically if there is no ambiguity. I don't care that much since I don't use graphical file managers that much.

shockeychap 4 years ago

Maybe it's just me, but it always seemed like prohibiting spaces and other special characters was a reasonable way to avoid unnecessary complexity (and the bugs that accompany it) when parsing and navigating directory trees and files.

I'm old enough to remember working with 8.3 filenames in DOS, and while the length limitation was maddening, the space part never was. Then Windows 95 came out and all restrictions were thrown out.

Why couldn't we just have a file system that robustly supports long filenames, including variable length extensions, while prohibiting certain special characters - namely spaces, slashes or any directory denoting characters in files, and characters that have special meaning in regex context? (brackets, asterisk, etc.)

tgv 4 years ago

By coincidence, I found another reason just two days ago. A web app lists uploaded files’ names, and (in a rarely used context) lets the user search for them. One user has copied a file name from the web page, and pasted it into the search box, but got no results. Turned out that the file name contained two consecutive spaces, which the browser turns into a single space, hence no match. Every layer between the user and file system can do something unexpected.
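One way to make such a search tolerant is to compare names the way the browser displays them. This is only a sketch of that idea with hypothetical helper names, not a description of the app in question: both the stored name and the pasted query are passed through the same whitespace collapsing before comparison.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Mimic default HTML rendering: runs of whitespace become one space."""
    return re.sub(r"\s+", " ", text).strip()


def find_by_displayed_name(stored_names: list[str], displayed: str) -> list[str]:
    """Return stored names that look identical to displayed once rendered."""
    key = collapse_whitespace(displayed)
    return [name for name in stored_names if collapse_whitespace(name) == key]


stored = ["report  final.pdf", "other.pdf"]  # note the double space
pasted = "report final.pdf"                   # what the user copied

print(pasted in stored)                        # exact match fails
print(find_by_displayed_name(stored, pasted))  # tolerant match succeeds
```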
kasabali 4 years ago

Related: David Wheeler's Fixing Unix/Linux/POSIX Filenames
https://dwheeler.com/essays/fixing-unix-linux-filenames.html

cmg 4 years ago

I nearly gave up on learning newer front-end JavaScript stuff like React & webpack and so on a few years ago because of spaces in paths.

node-gyp doesn't like it when there's a space anywhere in your working path. Stuff I was messing around with was all in ~/Code Projects at the time, and using npm install on some things just broke. Looking back, I definitely could have done a better job parsing the error messages but still...

There's an issue but it was closed in 2018 as "The workaround is to use a path without blanks" https://github.com/nodejs/node-gyp/issues/439

nvilcins 4 years ago

Tangentially, I frequently add dates to filenames to keep things organized. And _always_ in the `YYYYMMDD` format for clarity and technical reasons; `DDMMYYYY` (or God forbid the Americans' `MMDDYYYY`) never made much sense to me.
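The technical reason is that year-first stamps sort chronologically as plain strings, while day-first ones do not. A quick sketch with made-up dates and a hypothetical `stamped` helper:

```python
from datetime import date


def stamped(day: date, fmt: str, base: str = "notes.txt") -> str:
    """Prefix a base file name with a date rendered using fmt."""
    return f"{day.strftime(fmt)}-{base}"


days = [date(2021, 11, 11), date(2020, 12, 31), date(2021, 1, 2)]

ymd = sorted(stamped(d, "%Y%m%d") for d in days)
dmy = sorted(stamped(d, "%d%m%Y") for d in days)

print(ymd)  # alphabetical order matches chronological order
print(dmy)  # alphabetical order puts 31 Dec 2020 last
```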

wglb 4 years ago
I do this so often that I have an emacs macro or two that helps me out:
```
  (defun mdy ()
    (interactive)
    (insert (format-time-string "%04Y-%02m-%02d")))
```
That inserts the "proper" date format (e.g., 2021-11-11) at the current point.
Then to create a date-stamped file name:
```
  (defun file-mdy (file-name)
    (interactive "sbasename: ")
    (find-file (format "%s-%s.org" (format-time-string "%04Y-%02m-%02d") file-name))
    (save-buffer))
```
And a few others.
Nobody seems to misunderstand this date format. US folks might find it annoying, but understand what it means.

thriftwy 4 years ago

I have had a huge music library on my RAID, and naturally it had a lot of spaces, and non-ASCII, in the file names.

It's cumbersome-ish, but can be made to work.

Then there's shell injection via files containing a newline character in their name...

pixelbeat__ 4 years ago

POSIX portable file names were defined not to have spaces, and just contain '[[:alnum:]._-]' (plus '/' as the path separator).

The findnl script as part of fslint identifies problematic patterns, and has 4 levels of stringency, with "POSIX" being the most stringent. https://github.com/pixelb/fslint/blob/master/fslint/findnl
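For a rough idea of what the strictest level amounts to, here is a sketch of a check against the POSIX portable filename character set (ASCII letters, digits, period, underscore, hyphen, with no leading hyphen). It is written from the POSIX definition, not taken from findnl, and the function name is hypothetical.

```python
import re

# ASCII letters, digits, period, underscore and hyphen only.
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_filename(name: str) -> bool:
    """Check a single path component against the portable character set."""
    return bool(_PORTABLE.fullmatch(name)) and not name.startswith("-")


for candidate in ["holiday-schedule-2021.txt", "Holiday Schedule 2021.txt", "-rf", "fată.txt"]:
    print(repr(candidate), is_portable_filename(candidate))
```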

snvzz 4 years ago

Spaces in filenames were a mistake to begin with.

Spaces are used to separate parameters in the command line. There's also no real need for filenames to support spaces.

nomel 4 years ago

Or, one could claim that the poor parsing of a text interface shouldn't dictate the for-human names of files, especially when an exceedingly small percentage of users deal with that text interface.
But, of course, if you mix the abstractions of metadata (filename) with location, things won't be trivial.
jfb 4 years ago

The filename belongs to the user. Therefore, it is incumbent on the computer to adapt, not the other way around.

Waterluvian 4 years ago

Even if libraries all handled it, I’d still personally avoid spaces because spaces get semantically used to separate tokens and I see file names as tokens.

rsync 4 years ago

acme.sh - a shell script that I use to create "Let's Encrypt" SSL certificates - creates and maintains directories with asterisks in them:

https://github.com/acmesh-official/acme.sh/issues/1408

This is the sysadmin equivalent of piercing your nose just to make your parents mad.

kazinator 4 years ago

Spaces in file names are a poor idea. File names are identifiers, not titles.

Let's test something: http://example.com/my silly webpage.html.

Hey look, HackerNews just broke a URL with spaces in it. And it's written in a Lisp dialect and all; it's not some Unix job cobbled together with shell, sed and awk. The language has a string data type, and strings are passed to functions without word-breaking interpolations taking place.

You know what else breaks on spaces? Basic everyday gui text manipulation.

Suppose that in a block of text we have the sentence:

> Please look for the Holiday Schedule 2021 file.

If you double click on any part of the name like Schedule, pretty much every text widget on the planet will just select only that word, and not the entire filename.

However, if you have:

> Please look for the holiday-schedule-2021 file.

There is at least a ghost of a chance that a semi-intelligent GUI can pick that out as a word.

There exist good reasons to keep identifiers as a clump beyond just command line shells.

It's why we need encoding like %20 in URLs that never pass through a shell script.

Jenda_ 4 years ago

I don't use spaces, because I want to be able to run ad-hoc shell one-liners when working with my data without worrying about quotation and similar stuff.

I also don't use :, as I have run into problems with both Bash and its completion and FAT FS. Unfortunately, I routinely have timestamps in filenames, so I need to use +%F-%H-%M-%S instead of simple +%F-%T.

One thing has improved, though: I have not run into problems with ěščřžýáíé (which my language is full of) for maybe a decade, except on OpenWRT where space seems to be scarce to support non-ascii.

Edit: I now remember one problem, getting images for a website from an OS X user, which used combining characters instead of direct code points (https://en.wikipedia.org/wiki/Unicode_equivalence#Example), but HTTP requests got normalized in some browsers, leading to strange 404s.
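That combining-character mismatch is easy to reproduce. A sketch with Python's `unicodedata` (the file name and the helper `same_name` are made up): the precomposed and decomposed spellings render identically but compare unequal until both are normalized to the same form.

```python
import unicodedata

composed = "\u00e9.png"     # single code point: e with acute accent
decomposed = "e\u0301.png"  # 'e' followed by a combining acute accent


def same_name(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)          # raw comparison fails
print(len(composed), len(decomposed))  # the lengths differ by one code point
print(same_name(composed, decomposed)) # normalized comparison succeeds
```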

apricot 4 years ago

That's funny because the first operating system I used (Apple DOS 3.3) was very liberal about file names. There was a 30-character limit which was a lot, and it didn't mind spaces in file names. Even control characters were fair game, which made things fun when you accidentally inserted a ^A in a SAVE command.

mikewarot 4 years ago

File names shouldn't have anything except a-z,0-9,_ and perhaps a -. No unicode, no spaces, no nulls.

It's not fear that keeps me from using spaces in file names, it's habit.

If we're going to play this dangerous game, from now on I'll figure out how to use nulls (\0) in my file names, and make all the C/C++ programmers cry.

necovek 4 years ago

I don't use spaces because it's so much faster to type filenames out (including with TAB-completion) in the terminal.

I do, however, use Cyrillic (UTF-8) in filenames, and I regularly try out if moving a file into ASCII-path will let some programs open it (half the time it's that when I am having trouble).

Decabytes 4 years ago

It's just such a pain in the butt to work with files with spaces. In a script it's fine b/c I just surround it in double quotes, but on the command line I hate having to escape the spaces.

This might already exist, but I wonder about a terminal that was really just a multi-line repl to a language. It would be preloaded with libraries that replicated all the features of the gnu core utils, but instead of calling grep like normal, you called a function like grep("args"). The advantage would be that you had access to a full blown programming language at all times. So when you needed to do something more complicated you would still have access to all the standard language features. And when you didn't need that, your canned coreutils-like functions would work.

floatingatoll 4 years ago

Coming from web-heavy and perl5 backgrounds, it's insane to me that people don't treat filenames and arguments and environment variables as tainted user input, and just blindly trust properties about them like "does not contain whitespace or control characters".

Aulig 4 years ago

I had to move my development folders because you can't develop Android apps if your project path contains a space. Not sure where the issue is, if it's gradle or something else.

Edit: thinking about it again, it might not have even been the space but the exclamation mark in my path. Or both.

ur-whale 4 years ago

If any of you reading this have to deal with very large scale data pipelines for data science / ML type processing, and if "don't use spaces and weird chars in file names" hasn't become second nature by now, let me just say: you are very, very brave.

intrasight 4 years ago

My first job as a SW Eng was in 1989 in the nuclear industry. Our folders and files were limited to 8 letters. So names were effectively acronyms. It was actually pretty awesome. Clean and concise. Years later, I still remembered the whole folder structure.

Vrondi 4 years ago

If you're in tech long enough, you can be traumatized by anything. Like the time a vendor-supplied system decided after an update that nothing could have a hyphen in the title, and a lot of existing content just... broke at once. Fun times.

glandium 4 years ago

Spaces in file names are a nightmare in Makefiles.

necovek 4 years ago

Not if you are careful (a bit like "$@" vs $@ in shell scripts).
Edit: replace $@ with quoted version which actually changes the behavior (I was wrong that the difference is between $* and $@).
- chrismorgan 4 years ago
  
  I don’t think it’s fair to claim that any Make implementation supports spaces: there are too many fundamental bugs and breakages, so that lots of rather important Make functionality is off-limits if any of your file names will have spaces.
  https://www.cmcrossroads.com/article/gnu-make-meets-file-nam... explains the situation in GNU Make in 2007 (and I don’t think it’s changed since then, though jgrahamc especially could correct me). Not being able to use such features as $^ and $(patsubst) is severely debilitating for all but the simplest of makefiles.
  
  necovek 4 years ago
  
  That's a fair point, thanks!

xenocyon 4 years ago

Not exactly spaces, but I have been bitten by something like this at my work quite recently. A Confluence page with special characters in the page title was working fine for a while. At some point there was a Confluence version update which made the page URL broken (and apparently unrecoverable, or at least not easily recoverable).

One way to look at it is that people of a certain generation eschew spaces because the tools of their formative years simply couldn't handle spaces - but another is that the olds have learned that generally erring on the side of KISS ("Keep it simple, stupid!") isn't a bad idea.

rkangel 4 years ago

Software engineers - particularly of the more embedded variety - absolutely still have this problem.

The main culprit is GNU Make which does not cope with spaces in filenames. As far as it is concerned an array is a string separated by spaces so it gets very confused. Yes there are some partial workarounds, no none of them consistently work. You learn very quickly to check all code out in a file tree with no spaces in it, otherwise builds can randomly break in strange ways. It's not always clear up front whether Make is going to be involved somewhere in the build, so it's just easier to be safe.
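To illustrate why a space-separated list loses track of file names, here is a minimal Python sketch. It is not Make itself, only an analogy that assumes a list is stored as one string and split on whitespace; the file names are made up.

```python
# Hypothetical source files; "my module.c" contains a space.
sources = ["main.c", "my module.c", "util.c"]

# A Make-style "list" is a single string with words separated by spaces.
flattened = " ".join(sources)

# A word-based function can only split that string on whitespace again.
recovered = flattened.split()

print(recovered)  # ['main.c', 'my', 'module.c', 'util.c']
```

Three files went in and four words came out, which is roughly the confusion described above.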

phreack 4 years ago

My username has been my name which has an accented character and has broken countless Windows apps every year since forever, so I just keep a C:/Programs folder where I run stuff. You should never not fear filenames.

ASalazarMX 4 years ago

I am overly aggressive with spaces and special characters in filenames: I use them everywhere and report a bug when they cause errors, because they shouldn't in this UTF-8 age.
I still don't use the special character of my name in my username because that has caused me many hard to fix troubles. Think "cannot recover user password because this user doesn't exist".

efreak 4 years ago

I use c:\programs too, but for different reasons. C:\Programs is for portable applications that don't get installed, can be directly overwritten, and consist of at most two files with relevant names. As a bonus, I can run such programs directly from the run menu. C:\Programs\procexp for Process Explorer, for example.

student2k 4 years ago

I recently found out that a Windows folder name can't end with a space. But with Python, for example, you can create a folder named 'example ', and every file you create in that folder will be inaccessible and impossible to delete.

alephan 4 years ago

I've never created a filesystem entry name with a space. Mainly out of fear, and even where the fear turns out to be unfounded, "\" looks so ugly. But I think I'm even worse: I dislike capital letters too.

shoto_io 4 years ago

On a similar note: “it makes sense to add a date to a file name” years old.

NelsonMinar 4 years ago

Nothing old about that; lots of stuff is still broken. What are the odds Homebrew works if installed to a directory with a space in the name? Maybe the core brew manager itself, but all the packages?

jrootabega 4 years ago

I tend to follow a Postel-like system when it comes to this. When I write a script I'll usually get paranoid and make at least token efforts to handle spaces. Which I will then never, ever use.

v-erne 4 years ago

I spotted this thread about two days ago and then forgot about it, but I have come back to say that just a minute ago one of the new Jenkins jobs I added failed because I named the item using a space, and some custom Gradle/Maven magic tool failed to load one of its own auto-generated files. (I could tell that the space was the culprit because the error message printed only the second half of the item name.)

How can I not be afraid of spaces if this happens like every other day with every other custom tool ...

reaperducer 4 years ago

Spaces are still not "permitted" in URLs.

Browsers will take http://example.com/some name.pdf and automagically turn it into http://example.com/some%20name.pdf, and deliver the goods without a problem. But having that space in the URL is still out of spec, and will cause your web page to fail validation, even though it works fine.
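The percent-encoding step can be sketched in Python with the standard library's `urllib.parse`. This only shows the encoding itself, using the example path from the comment; it makes no claim about what any particular browser does internally.

```python
from urllib.parse import quote, unquote

# Path portion of the example URL, with a literal space.
raw_path = "/some name.pdf"

# quote() percent-encodes the space; "/" is left alone by default.
encoded = quote(raw_path)
print(encoded)  # /some%20name.pdf

# Decoding restores the original path.
print(unquote(encoded) == raw_path)  # True
```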

prepend 4 years ago

Let me tell you how much of a pain in the ass that my employer forces spaces in the corporate OneDrive directory.

PS-Microsoft is horrible about stupidly named folders being created and dumped in there.

efreak 4 years ago

Depending on the specific issue, the `subst` command may help you. If the OneDrive folder itself has a space in the path, or a necessary subfolder does, you can give that folder a drive letter instead.

1970-01-01 4 years ago

I'm still afraid of any non-8.3 filename.

https://en.wikipedia.org/wiki/8.3_filename

shadowgovt 4 years ago

And honestly, it's a good fear to have; there are contexts where it still just doesn't work.

Last I checked, the standard answer for GNU make is "Spaces are expected to break the tool, that's working as intended, it will never be fixed." And because we build our towering edifices of software on the pillars of the past, I can't guarantee to you that a project of arbitrary complexity won't try to cram a list of filenames through a make script.

uncomputation 4 years ago

I don’t think this is so much an age thing as a programmer thing. Old people will still name files all sorts of things, and a lot of young programmers today avoid spaces.

sclangdon 4 years ago

If you're developing on Windows, I find a good way of dealing with this is to convert paths to short format before using them (e.g. GetShortPathName in kernel32.dll).

maydup-nem 4 years ago

Not afraid, but typing a dash in the terminal is easier and shorter than typing a reverse slash and a space. Spaces are kind of a pain in the ass in the terminal, tbh.

ezfe 4 years ago

Quoting the path is easier and avoids any issues - but tab completion and dragging and dropping files into the terminal handle most cases for me.

sixdimensional 4 years ago

This seems like a case for an axiom I hear infrequently, but I think comes up a lot - things that seem like they should be simple and easy, but are in fact difficult.

foxrider 4 years ago

I must be nightmare customer, because I've always been exploiting my ability to use filenames in full UTF-8. I'm that guy that sends .pdf to your website.

enriquto 4 years ago

Why stop here? Why not put spaces in your variable names also? Allowing spaces only in file names and not in variable names is short-sighted when not inconsistent.

kabdib 4 years ago

My proposal for a shell on the Mac, in the late 80s, was:

- Spaces in filenames get transformed to non-breaking spaces by the filesystem;

- The filesystem treats nbsp as equal to space (just as case-folding treats A=a, B=b, etc.)

Now, argument parsing, mouse double-clicks, etc. all respect filenames as "words", and the output from things like 'ls' just work.

(Yes, I'm well aware that there are case-sensitive filesystems out there. I'd forgotten that iOS was one of those).

meshaneian 4 years ago

As a software engineer, I require testing of paths and files with spaces, and forbid spaces in any system-generated file name where possible, to make CLI use easier.

totetsu 4 years ago

It messes with tab completion in bash is why I avoid spaces

codetrotter 4 years ago

I do it the other way around. I used to be afraid of spaces. But I have come to realize that it is better to learn sooner than later which pieces of software are in such a bad state that they aren’t handling spaces correctly.

That being said, even after all these years I sometimes need to try a few times in order to get the quoting and the escapes right when communicating names of files with spaces through multiple layers of software.

gorgoiler 4 years ago

I like to store data on USB flash drives. After being left to mature for a few years in a humidity and temperature environment, you get some really interesting and complex byte streams where your original file names used to be.

Often they are not even valid UTF8 which, when you uncork the filesystem for the first time in a decade causes the most delightful crashes. The more years the better the aroma.
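For what it's worth, a name that is not valid UTF-8 does not have to crash the reader. A minimal Python sketch, assuming a made-up file name whose bytes are Latin-1 rather than UTF-8: strict decoding fails, while the `surrogateescape` error handler (the one Python's `os.fsdecode` relies on for POSIX paths) keeps the bad byte so the name round-trips.

```python
# Hypothetical raw name from a directory entry: 0xE9 is Latin-1 "é", invalid as UTF-8.
raw = b"caf\xe9.txt"

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # strict decoding rejects the name

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler gives back the exact original bytes.
round_trip = name.encode("utf-8", errors="surrogateescape")
print(strict_ok, round_trip == raw)  # False True
```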

mhd 4 years ago

I'm »still tempted to write umlauts like 'Mot"orhead' old.«

But also a "use a font that has a proper capital ß" hipster.

vbg 4 years ago

Spaces in file names are a bad idea because spaces delimit the name of separate distinct files,

At least in my crazy old illogical head anyway.

vbg 4 years ago

File names should be long enough to clearly communicate meaning/purpose/context, no more no less.
- koziserek 4 years ago
  
  .doc
  
  koziserek 4 years ago
  
  och my emojis didn't display, sorry
  
  iknowstuff 4 years ago
  
  Hahah how ironic.

cabaalis 4 years ago

I'm hoping to one day be "Windows adds user root folder to the quick links in explorer by default" years old.

jonathanoliver 4 years ago

I always format my filesystems (macOS) as case sensitive and I'm surprised by the software that has a hard time with that.

On Unix/Linux we've grown up with case sensitive by default but everywhere else it still seems to be a problem now and again.

I should qualify this...I'm en-US so I have no idea what the experience is like for anyone else.

pseingatl 4 years ago

You need them for URL's. Running a stand-alone web page maker using Rust. Document structure:

    [Introduction](./Introduction.md)\\
    [Chapter One](./chapter one.md)\\

Crashed on trying to deal with building html when there are spaces in the file name. It is still an issue.

harshadwaj 4 years ago

I have been following the guidelines from this presentation for all my filenames, everywhere and it has been working well so far - https://speakerdeck.com/jennybc/how-to-name-files

alanhaha 4 years ago

Today, WSL will try to add PATH in Windows to PATH in Linux. So if you install something like NodeJS in Windows, and run node in Linux, it will try to call /mnt/c/Program Files/nodejs/node.exe and say "no such file or directory: /mnt/c/Program".

jl6 4 years ago

I had half a feeling that the warning against using spaces in names pre-dates computing, but after a little research into library call numbers and archive accession numbers, which turn out to have both historically included spaces, I have found no evidence to support this feeling.

hackbinary 4 years ago

It seems to me that many of the problems associated with spaces in filenames are due to the OS assuming that a space signals the end of a command or filename.

Maybe we ought to have a different character signify the end of a name? Or signify an option section, or the next option section of a command?

oytis 4 years ago

In the shell spaces have to be escaped which is annoying. This doesn't change with age I think

fragmede 4 years ago

And I'm older than Google. If you want some hilarity, newlines are allowed in filenames as well (\n, \r, \r\n). Try getting bash to handle that! (It's possible, though annoying. Try redirecting to `while read line` in addition to `find -print0` / `xargs -0` hackery.)
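Why NUL-delimited output helps can be shown with a small Python sketch. The byte string below is a made-up stand-in for what `find . -print0` might emit; nothing is actually run against a filesystem.

```python
# Hypothetical NUL-delimited listing; the third name contains a newline.
stream = b"./plain.txt\0./has space.txt\0./has\nnewline.txt\0"

# Reading line by line cuts the third name in half.
by_line = stream.split(b"\n")

# Splitting on NUL is safe, because NUL cannot appear inside a file name.
by_nul = [name for name in stream.split(b"\0") if name]

print(len(by_line), len(by_nul))  # 2 3
print(by_nul[2])  # b'./has\nnewline.txt'
```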

grae_QED 4 years ago

I've never had any problems with this. At this point, it's second nature for me to either use underscores for spaces, or camel caps if there aren't any single character words like 'i' or 'a' in my desired file name.

Pxtl 4 years ago

Yes, but working with filenames with spaces in them is a huge PITA in command-line tools, because you have to quote everything. The ergonomics is just really annoying.

Personally I wish console shells had chosen another delimiter than space, but here we are.

bborud 4 years ago

Not obeying the "Robustness Principle" in software is just poor engineering.

https://en.wikipedia.org/wiki/Robustness_principle

vertere 4 years ago

Definitely applicable here. There's no way we're going to eliminate all problems with spaces etc, so why invite trouble.
I wouldn't say it's always poor engineering though, especially the 'liberal in what you accept' half.
- bborud 4 years ago
  
  Yes, you have a point there, but in this case would being liberal in what you accept be to accept filenames with spaces or (arguably) doing filename handling correctly (ie accept filenames with spaces)?

neilv 4 years ago

I'm apparently in the minority of people who know how to write shell scripts that have a chance of working correctly with filenames with spaces in them... and that's not the only reason I avoid spaces in filenames. :)

qwertox 4 years ago

I have experienced a person using a space in a password for Windows login.

I still don't know how to process this emotionally. Either it is somehow naively really genius, or stupid.

In any case, it scares me, mostly because it is a non-IT person.

anodyne33 4 years ago

Reminds me of the time I watched a coworker's head explode when he tried to extract an archive (from a 'Nix environment) on his Windows machine and was indignant about getting duplicate filename errors.
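A rough way to spot such clashes before extracting is to group names by a case-insensitive key. This Python sketch uses `str.casefold()` as an approximation; real filesystems apply their own case-mapping rules, and the names listed are hypothetical.

```python
from collections import defaultdict

def case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical archive listing from a case-sensitive system.
listing = ["README", "readme", "Makefile", "src/Main.c", "src/main.c"]
print(case_collisions(listing))
# [['README', 'readme'], ['src/Main.c', 'src/main.c']]
```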

anodyne33 4 years ago

As a Windows guy case still seems like a weird thing to worry about.

neogodless 4 years ago

I work in Azure Data Factory, and there are places where a space in a name will cause difficult-to-troubleshoot errors. But I can never remember where. It's not universal. So I just avoid them entirely.

goto11 4 years ago

I still use the "web safe palette" when choosing color codes for CSS

JadeNB 4 years ago

So, born today, eh?—says the guy who still regularly runs into build scripts that cheerily command that they be run from directories without spaces, since that's easier than proper quoting in the script.

Pensacola 4 years ago

I'm newly afraid to use emojis in domain names: https://tinyprojects.dev/projects/mailoji

AdamN 4 years ago

The meta point here is that spaces are the type of thing that work fine ... until they don't. This class of bug is best avoided entirely, especially if there is an easy workaround (not using spaces).

anovikov 4 years ago

But it still breaks in so many situations and becomes a pain in the ass in so many other ones! I HATE people who use spaces in file names. For me it is a sign of a "deeply nontechnical person".

RickJWagner 4 years ago

Oh, yeah. Me too!

Except nowadays I worry more about user names that get fed into collaborating applications (with different edit criteria) and password characters (again for systems with differing, strange edit rules.)

lostgame 4 years ago

I name almost everything with underlines still. I think it’s a programming habit.

Although lately I have started saving my Logic Pro files with spaces, simply because I prefer it to be the name of the song as-is.

123pie123 4 years ago

I still use the Netbios limitations (15 Characters) when naming servers

bryanrasmussen 4 years ago

I'm “still afraid to use spaces in file names” wise, dammit!

nocman 4 years ago

I would say I'm "wise enough to not use spaces in filenames".
It's not about fear, it's about making good decisions, and avoiding unnecessary complication.

fallingfrog 4 years ago

No way I would put anything but a-z, 0-9, and underscore in any file name. Too many stupid ways it can go wrong. I guess I have very little trust in my fellow programmers!

xvilka 4 years ago

Spaces in path are a pain for the shell autocompletion, since you have to escape them by using either "" for the whole string or use the "\ " instead.

swayvil 4 years ago

Me too. Afraid of dashes too as they might be interpreted as minus. I use a lot of underscores __ _____ _ _ _

Weirdly, my friend hates underscores. But he's a baseball fan

chrisBob 4 years ago

I know I can put spaces in file names, but \ is one of the characters I still can't touch type, so I still hate dealing with them in the terminal.

ajsnigrutin 4 years ago

ascii, no spaces for me

i still get issues with old one-off scripts, that still work, and I forgot to properly quote stuff... plus the urls are pain in the ass with the %20;s.

vbezhenar 4 years ago

[0-9A-Za-z_-]+ for me.
- lkuty 4 years ago
  
  Same here and most of the time it's even just [0-9a-z_]+ It's simple and there are no surprises around the corner.

rapind 4 years ago

I wonder why "space" wasn't always simply treated as another character. To save a couple bytes back in the 50s (when it mattered) I assume?

shaoner 4 years ago

Any shell script that uses files should use double quotes for at least the variables: `mv $1 $2` is not safe, should be `mv "$1" "$2"`
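The effect of the missing quotes can be approximated in Python with `shlex`, which follows POSIX-style word splitting. It is not a real shell, and the two paths are made up, so treat this as a sketch of the idea only.

```python
import shlex

# Hypothetical arguments, standing in for $1 and $2.
src, dst = "my report.txt", "backup dir"

# Unquoted expansion: the command line is split on every space.
unquoted = f"mv {src} {dst}"
print(shlex.split(unquoted))
# ['mv', 'my', 'report.txt', 'backup', 'dir']

# Quoted expansion: each argument survives as a single word.
quoted = f"mv {shlex.quote(src)} {shlex.quote(dst)}"
print(shlex.split(quoted))
# ['mv', 'my report.txt', 'backup dir']
```

When calling programs from Python directly, passing an argument list to `subprocess.run` sidesteps the quoting question altogether.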

morpheuskafka 4 years ago

I'm 19 now and learned this advice from my dad growing up. Still run into situations in my IT work and programming stuff where it makes a difference.

ubermonkey 4 years ago

Our tool has no issues with spaces in fields, but we still advise users not to do it because other systems OFTEN STILL DO, in the year of our lord 2021.

authed 4 years ago

I try to avoid spaces and special characters because issues still happen to this day (just yesterday, I had an issue with a file with an accent in it).

msoucy 4 years ago

My coworkers still don't quote strings in their bash scripts, even when they're paths... and yet they wonder why everything falls apart.

rvense 4 years ago

There was a Discussion yesterday at work about allowing quotation marks and semicolons in some user-set titles. We use Mongo. But I empathize.

spurgu 4 years ago

I'm not "afraid" of it, I just think it's unnecessarily complicated to work with spaces in filenames on the command line.

amelius 4 years ago

You should be still afraid. Many commands such as Unix "xargs" don't work properly with spaces if the right flag is omitted.

darepublic 4 years ago

If you're working on cli this is reasonable

zibzab 4 years ago

Why stop at spaces?

An old prof of mine used to send emails where the subject line was always a valid identifier in C.

Hello_dear_students_where_are_your_reports_

wruza 4 years ago

That identifier is clearly too long.
MISRA C:2004, 5.1 - Identifiers (internal and external) shall not rely on the significance of more than 31 characters.

deepsun 4 years ago

All because we programmatically use interfaces that were intended for humans to write: command line, sql, html, email headers.

qayxc 4 years ago

It's worse than that. Whitespace is a hellish invention in the world of computers: there are multiple characters that may or may not render as whitespace with no way to distinguish them by just looking at the output.
Yet to the machine (script, shell, program, ...) it matters a lot, since u0020≠u0009≠u00A0≠u2000≠u2001, etc. whereas the aforementioned codepoints render like this: " " (and yes, that's indeed the five codepoints in that order - at least I typed them that way).
(Ab)Using whitespace like that can lead to all sorts of funny business, not just when dealing with shell scripts and variable expansion.
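The five code points mentioned can be inspected with Python's `unicodedata` module. A small sketch: it prints each code point with its Unicode name (control characters such as tab have none, hence the placeholder default) and shows that two strings that look alike can still compare unequal. The file name used is made up.

```python
import unicodedata

# The code points listed in the comment: space, tab, no-break space, en quad, em quad.
spaces = ["\u0020", "\u0009", "\u00a0", "\u2000", "\u2001"]

for ch in spaces:
    # unicodedata.name() takes a default for characters without a name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"), ch.isspace())

# Visually similar names, different strings.
with_space = "my file.txt"
with_nbsp = "my\u00a0file.txt"
print(with_space == with_nbsp)  # False
```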

imchillyb 4 years ago

This is why \Program Files, and \Program Files (x86) exist as they do. With spaces, and strange characters, in the name.

slmjkdbtl 4 years ago

Can someone convince me to not use spaces in music, film, and book files where they have a "standard title"?

duxup 4 years ago

Some react scripts freaked out on me recently because my login (and thus user folder) in windows contained a space.

fortran77 4 years ago

I think people who use a terminal interface, regardless of OS, don't like spaces in file names. I avoid them.

DonHopkins 4 years ago

Then you must also be "still afraid to write Python instead of Bash scripts" years old, too.

hajile 4 years ago

I dislike constantly having to backslash escape files on the command line, so I use dashes instead.

jimnotgym 4 years ago

I won't use a space if I think I may need to address that file from the command line...

dukoid 4 years ago

I'm "still afraid to use more than 8.3 characters in file names" years old!

armandososa 4 years ago

I'm "8 characters max plus a 3 character extension in your file names" old.

adfm 4 years ago

Kids these days will say “What’s a file name?” and mean it. Typing? That’s for the olds.

shmerl 4 years ago

Never use spaces in file names. It shouldn't depend on age, it's common sense.

douglaswelch 4 years ago

Yep. Me too. Early bad experience with spaces in file name and Unix cured me of that.

mrb 4 years ago

Sort of related, but here's a joke: Windows 95 does support long filena~1

mindslight 4 years ago

I\ am\ not\ afraid,\ I\ just\ do\ not\ see\ how\ it\ benefits\ my\ quality\ of\ life.

kazinator 4 years ago

The nice thing about spaces is there are so many to choose from, thanks to Unicode.

rndgermandude 4 years ago

I still feel slight unease sometimes when using more characters than 8.3

Damn, I feel old now :P

timakro 4 years ago

Maybe if we'd do it more software would actually learn to deal with it.

billpg 4 years ago

"You need to add -print0 to your find call and -0 to your xargs."

hknapp 4 years ago

Literally just fixed a bug in our software because of an issue with spaces.

pulse7 4 years ago

I'm still afraid to use national specific characters in file names...

matchagaucho 4 years ago

Keep%20the%20names%40and%20links%20readable%20or%20submit%20to%20encoding

comeonseriously 4 years ago

Without exception, I never ever ever use spaces in filenames. Ever.

MisterTea 4 years ago

2021-11-11_I_have_absolutely_no_idea_what_you_are_talking_about.txt

joshlemer 4 years ago

I don't know that this is really hacker news material guys...

jmull 4 years ago

This is a general issue to this day. So that isn't very old.

forgotmypw17 4 years ago

I'm "whitespace as syntax is stupid" years old

canjobear 4 years ago

Python has made me afraid to use hyphens in file names

bcrl 4 years ago

The Amiga supported spaces in filenames in 1985... =-)

bravetraveler 4 years ago

Admittedly trite/unhelpful comment: avoid xargs

analog31 4 years ago

I\'m%20still%20afraid%20to%20use%spaces%20too.

mgdv 4 years ago

Years of Java has me seeing the world in camel case

throwawayffffas 4 years ago

If a filename doesn't match \w+\.\w+ I hate it
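For the curious, here is how that pattern behaves in Python's `re` module. The comment does not say whether the match is anchored; this sketch assumes the whole name must match (`fullmatch`), and the sample names are hypothetical.

```python
import re

# The pattern from the comment: word characters, a dot, word characters.
PATTERN = re.compile(r"\w+\.\w+")

def is_plain(name: str) -> bool:
    """True if the entire name matches the pattern (assumed anchoring)."""
    return PATTERN.fullmatch(name) is not None

for name in ["report.txt", "my report.txt", "archive.tar.gz", "Makefile"]:
    print(name, is_plain(name))
# report.txt True
# my report.txt False
# archive.tar.gz False
# Makefile False
```

Note that `\w` in Python also matches non-ASCII letters, so an accented name still passes unless `re.ASCII` is set.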

NoblePublius 4 years ago

I love it when characters like | break OneDrive

adulion 4 years ago

I don’t even use spaces in csv column names

oshiar53-0 4 years ago

Yet another reason to ditch Make /s

mindvirus 4 years ago

Heck, I'm still afraid to use caps!

zwieback 4 years ago

Anything more than 8.3 is for sissies.

sva_ 4 years ago

What about long filenames and paths?

ineedasername 4 years ago

Instead of spaces I just use U+2215

antiquark 4 years ago

You mean, I'm linux years old?

meepmorp 4 years ago

This is much older than linux or gnu.

makapuf 4 years ago

Well, I'm using makefiles old

glitcher 4 years ago

Wait, what's a file? :P

cbushko 4 years ago

Base64 is your best friend!

Havoc 4 years ago

I_promise_I'm_not.

LennyHenrysNuts 4 years ago

Me too, I never do it.

luke2m 4 years ago

I’m 15. I am as well.

amitaibu 4 years ago

I can relate! :)

roody15 4 years ago

me too… still use underscore all the time.

douglaswelch 4 years ago

Yep. Yep yep.

ricardobayes 4 years ago

You can now?

uwagar 4 years ago

or Capital letters

ctur 4 years ago

i feel so seen

trudler 4 years ago

tbh using spaces in file names is still stupid.

HNo 4 years ago

Anyone else totally fine with spaces in filenames? I used to rip a lot of CDs back in the day, and never had an issue with the spaces in the file names.

01 - Metallica - Metallica - For Whom the Bell Tolls.mp3

Names like that were common, and had many spaces.