config BINFMT_SCRIPT
    tristate "Kernel support for scripts starting with #!"
    default y
    help
      Say Y here if you want to execute interpreted scripts starting with
      #! followed by the path to an interpreter.
I would agree, but so much of day-to-day Kubernetes is arcane CLI commands to begin with. Other stuff that is non-trivial to do on the CLI but comes up in most reasonable production deployments:
* Rotating secrets without exposing the secret to the shell history file (hint: kubectl apply -f can take - to signify stdin, but not kubectl patch!)
* Ensuring your edits to a ConfigMap pass application-level validation (i.e. your configuration changes won't crash your app, not just that it's a valid ConfigMap)
* Anything to do with user auth or RBAC
* Scaling the default persistent volume size of a StatefulSet
The truth is that Kubernetes is a platform, and just like how most people don't want to run a bare copy of Bash or VIM on their laptop, people will figure out aliases, one-liners, and other functions to help make them effective. So some of working effectively with Kubernetes means, yes, building your own custom debug containers, and writing your own helper shell stuff.
I suspect lots of people have written a "use tcpserver or inetd and feed stdout to a shell script" antics.
The thing is, shell can't cope with nulls -- if you do something like
n=$(gzip -9 < /etc/passwd)
gzip -9 < /etc/passwd | sum
echo "$n" | sum
This falls apart because shell just can't deal with nulls.
You can probably hack around all those issues, and may not run into this too much, at first, in a web server, but golly you'll pretty quickly fall into a pit.
Will sanitize strings of non-printable characters. While it is true that you can't have nulls inside bash variables, your example actually contains the correct syntax if you just remove the first and last lines.
well, sure, you can use some external program to process the stdout; then it's no longer "pure bash" which is fine, nobody grades ingots of script based on if they're 90% or 99% or 70% "pure" shell.
But -- importantly -- running
n=$(gzip -9 < /etc/passwd | tr -dc '[[:print:]]')
may process the nulls, but is it reversible? Can I now send $n into gzip -d and get whatever I put into it out?
I can do things that are reversible --
n=$(gzip -9 < /etc/passwd | base64 )
But now I can't process the output "natively" except by calling base64 every time.
And maybe I've gotten myself into this hole because sometimes the contents of $n have nulls and other times not?
Pure shell is a road to madness. Don't ask me how I know...
the shell's not actually doing anything but forking things and connecting file descriptors of processes to each other
gzip -9 < /dev/random | while read line ; do echo "$line" ; done > /tmp/gibberish
The stdout of gzip is being processed by shell, and will make all the nulls go away.
(edited to add another example:)
Similarly - it isn't printing that you can't do -- it's anything -- consider:
case $(cat /bin/sh)
in
$(cat /bin/bash)) echo "they're the same!" ;;
*) echo "they're not the same!" ;;
esac
This is obviously an insane way to see if two files are identical, but worse -- it's going to fail for two different files whose only difference is how many nulls are in the file.
For your consideration, the jukebox from my old CS club:
To queue up a song, we'd find it on youtube and prepend "http://jukebox.local:10000/" to the url
More nc, cut, grep, head, mkfifo, youtube-dl, and mplayer than anything about Bash.
If bash is passing and handling input from an http request that’s pretty much good enough for me.
It's not really doing that though. Nc+head/cut/grep are and youtube-dl is grabbing the data. Bash in this case is just orchestrating the order of communication between the tools doing the actual work, as a normal Bash script.
The post is about doing all of this in pure bash builtins like /dev/tcp and bash functions. Not about gluing together tools which do the work.
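To make the distinction concrete, here is a minimal sketch of what "bash itself handling the request" could look like: the request line and headers are parsed with nothing but the `read` builtin and parameter expansion. The function name `parse_request` and the `req_*` variables are made up for illustration, this is not the project's code, and it assumes bash 4.2 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse an HTTP/1.x request from stdin using builtins only.
# Results land in req_method, req_path, req_version and the req_headers array.
parse_request() {
  local line name value
  declare -gA req_headers=()

  # Request line, e.g. "GET /index.html HTTP/1.1\r"
  IFS=' ' read -r req_method req_path req_version || return 1
  req_version=${req_version%$'\r'}

  # Header lines until the first empty line
  while IFS= read -r line; do
    line=${line%$'\r'}
    [[ -z $line ]] && break
    name=${line%%:*}
    value=${line#*:}
    value=${value#"${value%%[![:space:]]*}"}   # trim leading whitespace
    req_headers[${name,,}]=$value              # header names are case-insensitive
  done
}
```

Feeding it `$'GET /index.html HTTP/1.1\r\nHost: example.test\r\n\r\n'` on stdin should leave `GET` in `req_method` and `example.test` in `req_headers[host]`, with no nc, head, cut, or grep involved.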
Nonetheless pretty wicked app made with bash
Yeah, it's a nifty little tool. On the other hand when a really interesting post on pure bash exposition comes up the comments section turns to talking about plain bash scripts they've made or know of instead. There are several normal scripts replied in this post and a few others via links, more than actual conversation around what the post is actually about (so far). That's all I was commenting on, not the value of the tool. It's a hill :).
your hill is valid, sorry
Even the post isn't using /dev/tcp, but compiled a C file into bash "loadable builtin" (which is something I learned today). It still feels kind of cheating to me tbh.. But cool enough!
It is a bit of a cheat, to be fair. Official bash loadable module, but not necessarily a part of the static bash binary.
That's not the rules used in this submission though.
> A purely bash web server, no socat, netcat, etc...
as always
Yeah that's how bash scripting works.
See the original GitHub post for how this post is not about scripting other binaries with Bash but pure Bash implementations of functionality. The posted script e.g. does not call nc or grep to make the socket or process the text. What makes this post interesting and upvoted is that it's esoteric.
It's like replying to a thread about implementing a web server in C macros with how you implemented a music jukebox as a standard C program. Great, but also not the point or really even related.
Projects like this reiterate just how important it is, from a security perspective, to ensure your production services are running in containers without an included shell. If an attacker can get a shell, they can do pretty much anything.
Debug containers are now a stable feature in Kubernetes. It honestly boggles my mind how companies will throw so much time, money, and effort into the cybersecurity product du jour when they can get the vast majority of the value by moving everything into distroless, shell-less containers running on managed VMs that are optimized for container workloads.
Out of curiosity, how would you run Python or R workloads in kubernetes without a distro or shell?
Python needs an ld.so and libc (minimally) but not a shell or other external utilities. Shebang scripts are loaded by ld.so, not the shell.
Shebang scripts are supported directly by the kernel via the exec family of system calls, so ld.so shouldn't be involved.
https://github.com/torvalds/linux/blob/master/fs/Kconfig.bin...
Yeah, I remember reading the code in the kernel that handles shebang a long time ago. ld.so is not involved.
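A quick way to see the interpreter line being honoured at exec time is to point a shebang at something that is not a shell at all. This is only a sketch: it assumes a Linux-like system where `cat` is a real executable and the temp directory is not mounted noexec, and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Build a tiny "script" whose interpreter is cat rather than a shell.
cat_path=$(command -v cat)
script=$(mktemp)
printf '#!%s\nthis line is data, not shell code\n' "$cat_path" > "$script"
chmod +x "$script"

# Executing the file ends up running "<cat_path> <script>",
# so cat simply prints the file, shebang line included.
output=$("$script")
rm -f "$script"

printf '%s\n' "$output"
```

If the second line were being interpreted by a shell, you would get a "command not found" error instead of the file's contents.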
Python with batteries included, doesn't that mean exploit tools included?
No personal attack intended, I am wondering this about my own embedded product which contains Python.
https://stackoverflow.com/questions/62581924/is-there-a-way-...
Python is better than shell, so intruder will use it first.
Please do a write-up of these debug features. I'd love to learn about them.
Have you read the docs? https://kubernetes.io/docs/tasks/debug/debug-application/deb...
I work on multi-tenant k8s clusters at CDNs and used to work at Rancher and have seen just about every multi-tenant / federated deployment there is, nasa, meta, etc. Stuff where even the hardware vfio paths mounting the nic or gpu channels keep users apart from one another, and the entire path out of the cluster are completely apart from k8s and any userspace and would be something like multus as a shim- what exactly are you referring to that can cgroup hop via a shell that we're not currently mitigating? It's the hardware being infected by something we worry about at this level.
A lot of the CDNs even use tools like kubevirt where the segmentation is even further. And then we have gvisor, firecracker, etc.
I admittedly haven't touched k8s code since 1.18 but I can't think of anything like you're referring to and I definitely would like to know about it.
Thanks.
Also can go even simpler and use apparmor and/or systemd hardening instead of containers
Debug containers have been a bit of a let down in UX. It's rather difficult to do sort of basic stuff, and a lot of stuff is hidden behind flags.
E.g., if you just sort of roll with the defaults, you're dropped into a pod in a very confused state: `ps -ef` says nothing in your debuggee is running, and the filesystem of the debuggee is nowhere to be found.
You can work around both of those (the first is --target, but the latter requires an intricate SO answer¹) but it's the sort of thing that would be nicer out of the box?
The node debugging mode is a bit better: by default, puts you in at least the host pidns, and mounts the host FS.
¹https://stackoverflow.com/questions/73355970/how-to-get-acce...
As someone who doesn't do much (any) k8s...
Seems only marginally better than using nsenter on a privileged container to just go muck with the host
I've also been disappointed with debug containers. They are often not useful for debugging tricky production-only issues because so many of those issues are related to container state, which can be (often is) different inside the container. Certain languages/platforms and developer discipline are better about this than others, like if you're using functional/immutable languages then it's less of an issue.
For applications that aren't super high security, I've been really appreciating using immutable hosts (that get regularly updated/rotated), along with CI/CD that is constantly rebuilding from source, applying latest software updates, and deploying the latest version of the app. Combined with other tools like scanners, and de-bloating your images, it really raises the height of the fruit.
> UX
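I would agree, but so much of day-to-day Kubernetes is arcane CLI commands to begin with. Other stuff that is non-trivial to do on the CLI but comes up in most reasonable production deployments:
* Rotating secrets without exposing the secret to the shell history file (hint: kubectl apply -f can take - to signify stdin, but not kubectl patch!)
* Ensuring your edits to a ConfigMap pass application-level validation (i.e. your configuration changes won't crash your app, not just that it's a valid ConfigMap)
* Anything to do with user auth or RBAC
* Scaling the default persistent volume size of a StatefulSet
The truth is that Kubernetes is a platform, and just like how most people don't want to run a bare copy of Bash or VIM on their laptop, people will figure out aliases, one-liners, and other functions to help make them effective. So some of working effectively with Kubernetes means, yes, building your own custom debug containers, and writing your own helper shell stuff.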
> without exposing the secret to the shell history file
Any command in shell with a space before it will be omitted from history.
Agree it should take input from stdin.
This requires `setopt HIST_IGNORE_SPACE` (zsh) or `HISTCONTROL=ignorespace`/`HISTCONTROL=ignoreboth` (bash/ksh) to be set and may not be enabled by default in many distros (e.g. NixOS doesn't, Alpine doesn’t, etc).
Always check your shell before assuming this will work!
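For bash, "check your shell" can be as small as the sketch below. `HISTCONTROL` is a colon-separated list, so the check looks for `ignorespace` or `ignoreboth` among its entries. The function name is made up, and zsh's `HIST_IGNORE_SPACE` option is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a leading space keeps a command out of
# bash history. With no argument it inspects the current $HISTCONTROL.
space_prefix_hidden() {
  local setting=${1-${HISTCONTROL-}} item items
  IFS=: read -ra items <<< "$setting"
  for item in "${items[@]}"; do
    [[ $item == ignorespace || $item == ignoreboth ]] && return 0
  done
  return 1
}

if space_prefix_hidden; then
  echo "space-prefixed commands are kept out of history"
else
  echo "space-prefixed commands WILL be saved to history"
fi
```

Run it in the same interactive shell you type secrets into, since a script started separately may not see the same `HISTCONTROL` value.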
Creator here:
I'm really happy to see somebody shared it on hackernews :D If you have some questions, feel free to ask me
Hey, great work! Reading through, I started to wonder how necessary the loadables are. It'd be fun to have one that's not dependent on loadables, even if it's not as clean. E.g. could mktemp be replaced with a timestamp named directory or something? Can rm be avoided by just allowing garbage to pile up? Is finfo something that can be worked around in some way?
Hello,
You could avoid loadables.
Finfo <- load file inside a variable and get the size
Mktemp <- like you said with timestamp
Rm <- with a fifo or variable
Ah "stick it in a variable" seems obvious now, good point!
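A rough sketch of how two of those substitutes might look with builtins only, under loud assumptions: bash 4.2+ for `printf '%(...)T'`, files that contain no NUL bytes for the size trick, and a temp *file* rather than a directory, since bash has no builtin that creates directories. Both function names are invented here and are not taken from the project.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for finfo: byte size of a file that contains no NULs.
# read -d '' slurps up to the first NUL (or EOF) and keeps trailing newlines.
byte_size() {
  local LC_ALL=C content
  IFS= read -r -d '' content < "$1"
  printf '%s\n' "${#content}"
}

# Hypothetical stand-in for mktemp: timestamp plus $RANDOM in the name,
# created with noclobber so an already existing file is never reused.
make_temp_file() {
  local path attempt
  for attempt in 1 2 3 4 5; do
    printf -v path '%s/tmp.%(%s)T.%s' "${TMPDIR:-/tmp}" -1 "$RANDOM"
    if ( set -o noclobber; : > "$path" ) 2>/dev/null; then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1
}
```

The noclobber redirection is what makes the creation step safe against name collisions; the timestamp alone would not be.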
I suspect lots of people have written "use tcpserver or inetd and feed stdout to a shell script" antics.
The thing is, shell can't cope with nulls -- if you do something like
    n=$(gzip -9 < /etc/passwd)
    gzip -9 < /etc/passwd | sum
    echo "$n" | sum
This falls apart because shell just can't deal with nulls.
You can probably hack around all those issues, and may not run into this too much, at first, in a web server, but golly you'll pretty quickly fall into a pit.
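The effect is easy to reproduce on a tiny input. In this sketch a five-byte stream with two NULs is counted once through a pipe and once after being captured in a variable. Variable names are arbitrary, and newer bash versions also print an "ignored null byte" warning, which is silenced here.

```shell
#!/usr/bin/env bash
# Five bytes go through the pipe untouched: a NUL b NUL c
raw_bytes=$(( $(printf 'a\0b\0c' | wc -c) ))

# Capturing the same stream in a variable silently drops the NULs.
{ captured=$(printf 'a\0b\0c'); } 2>/dev/null
kept_bytes=${#captured}

printf 'piped: %d bytes, captured: %d bytes\n' "$raw_bytes" "$kept_bytes"
```

Anything that later depends on the exact bytes, such as a checksum or a decompressor, will disagree with the original stream.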
Will sanitize strings of non-printable characters. While it is true that you can't have nulls inside bash variables, your example actually contains the correct syntax if you just remove the first and last lines.
well, sure, you can use some external program to process the stdout; then it's no longer "pure bash" which is fine, nobody grades ingots of script based on if they're 90% or 99% or 70% "pure" shell.
But -- importantly -- running
    n=$(gzip -9 < /etc/passwd | tr -dc '[[:print:]]')
may process the nulls, but is it reversible? Can I now send $n into gzip -d and get whatever I put into it out?
I can do things that are reversible --
    n=$(gzip -9 < /etc/passwd | base64 )
But now I can't process the output "natively" except by calling base64 every time.
And maybe I've gotten myself into this hole because sometimes the contents of $n have nulls and other times not?
Pure shell is a road to madness. Don't ask me how I know...
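For what it's worth, one builtin-only way to stay reversible is to never store the NULs at all: split the stream on them with `read -d ''`, keep the pieces in an array, and put a NUL back after each piece when writing. This is a sketch under assumptions (bash, not POSIX sh; helper names are invented; it is slow on large inputs), and it rather supports the point that this gets awkward quickly.

```shell
#!/usr/bin/env bash
# Read stdin into the global array `chunks` (one element per NUL-terminated
# piece) and the global string `rest` (whatever followed the last NUL).
read_nul_chunks() {
  local LC_ALL=C chunk
  chunks=()
  while IFS= read -r -d '' chunk; do
    chunks+=("$chunk")
  done
  rest=$chunk   # read fails at EOF but still leaves the partial data here
}

# Write the data back out, re-inserting a NUL after each stored chunk.
write_nul_chunks() {
  if (( ${#chunks[@]} > 0 )); then
    printf '%s\0' "${chunks[@]}"
  fi
  printf '%s' "$rest"
}
```

Something like `printf 'a\0b\0\0c' | { read_nul_chunks; write_nul_chunks; } | cksum` should then give the same checksum as the unprocessed stream.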
You are in this mess because you are supposed to be piping binary data or using temp files, not putting it in variables. Also bash scripts are just glue between external programs.
I mentioned the sanitization in the context of taking user input (since we are talking about a bash web server) because I thought you were pointing out a user could do bad things by feeding in nulls.
I'm just pointing out the insanity / inanity of "pure" shell anything. There are lots of other gotchas hiding in shell that you wouldn't encounter in other languages.
As glue, shell's wonderful. Reading from /dev/tcp/ and such is a cute trick but ultimately a dead dead dead end.
As with any language there are going to be foot-guns, gotchas, and edge cases. If you don't feel comfortable with a language (including bash) then don't use it.
Before I really knew how to program I was a systems administrator and used nothing but bash via CGI to build a $2k/month revenue site, so obviously your claims of it being a dead end are just hyperbole.
It's not a matter of comfort or discomfort with a tool; A long time ago I wrote: a web server in shell (using either inetd or tcp_server) in 1996, and wrote a mail user agent / web server to read mail out of a maildir and display it as html (that one had both a y2k bug and a "time_t is 11 digits now?" bug); that one only used tcp_server... I also wrote a web server (in shell) to manage a wireless captive portal. Some of this was on solaris, some on ultrix, some on freebsd....
I'm proud of you for making a ton of money on bash using CGI; "it's not dumb if it works" but ... doing complex stuff like this in shell is ... dumb.
At least, I certainly know better. You do you, though.
> "it's not dumb if it works"
It's not dumb if you know what you are doing. I and others have pointed out how to properly handle binary data in shell scripts.
This might be a good start if you'd like to learn more: https://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
I'm simply pointing out that a "pure bash" or other shell program is folly once you get past a trivial level of complexity.
That How-to doesn't mention that shell eats nulls. If you're using shell as glue, that doesn't matter, but if you're using shell to process (not pass to another program) raw tcp connections, you'll need to manage binary data, which is full of nulls.
Perhaps you're not even aware of these issues? Anyhow, go on about your life grand troll, you're the winner.
Yeah you can pipe nulls between processes just fine. You just can't print them in a shell.
In the case of
the shell's not actually doing anything but forking things and connecting file descriptors of processes to each other
    gzip -9 < /dev/random | while read line ; do echo "$line" ; done > /tmp/gibberish
The stdout of gzip is being processed by shell, and will make all the nulls go away.
(edited to add another example:)
Similarly - it isn't printing that you can't do -- it's anything -- consider:
    case $(cat /bin/sh)
    in
      $(cat /bin/bash)) echo "they're the same!" ;;
      *) echo "they're not the same!" ;;
    esac
This is obviously an insane way to see if two files are identical, but worse -- it's going to fail for two different files whose only difference is how many nulls are in the file.
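A small reproduction of that failure mode, as a sketch: two files that differ only by one NUL byte compare equal once their contents pass through shell variables, while `cmp`, which reads the bytes directly, reports a difference. File and variable names are arbitrary.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'ab'   > "$dir/one"   # 2 bytes
printf 'a\0b' > "$dir/two"   # 3 bytes, differs only by a NUL

# Comparing via variables: the NUL is dropped, so both sides become "ab".
{ a=$(<"$dir/one"); b=$(<"$dir/two"); } 2>/dev/null
if [[ $a == "$b" ]]; then string_verdict=same; else string_verdict=different; fi

# Comparing the files byte by byte with an external tool.
if cmp -s "$dir/one" "$dir/two"; then cmp_verdict=same; else cmp_verdict=different; fi

rm -rf "$dir"
printf 'string compare: %s, cmp: %s\n' "$string_verdict" "$cmp_verdict"
```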
or use vis(1) on both sides of bash-land
Related:
Show HN: A pure bash web server. No netcat, socat, etc. - https://news.ycombinator.com/item?id=29794979 - Jan 2022 (97 comments)
The Conectiva Linux distro had a programmer who wrote a book on shell scripting in which he implemented a bash server, but as Apache CGI scripts. I learned to properly program with that book.
https://www.amazon.com.br/Script-Profissional-Aurelio-Marinh...
When socat is around a simple server can also be constructed with it:
# test: curl -v http://localhost:8000/server
There are other such tiny web server tricks out there too, but his GitHub README says:
A long time ago I made a similarly pure bash version of something like tcpdump just parsing various packets and protocols off a raw socket. I wish I still had that code somewhere. It was pretty much the slowest and least-robust thing of all time but was kind of fun to play around with.
cool project
Of course 9front ships a rc based http server
https://werc.cat-v.org/docs/web-server-setup/rc-httpd
Related to this?
https://news.ycombinator.com/item?id=7614718
see also https://bashsta.cc
Your scientists were so preoccupied with whether they could, they didn't stop to think if they should
theres literally c files in that repo how is that pure bash
did you even bother reading the readme?
Shouldn't a program's name be sufficient?
They're digging through code. The least they could do is read the readme.
In the same way that Hacker News is not news for hackers, no.
It's not?
As a hacker, this is news to me.
The project patches accept to handle multiple requests. However, the project works without the patch being applied, so it's fair to say that this is a pure bash web server.
Yep, there's a .csv file as well so clearly Microsoft Excel is in play too. /s
and it's not even a CSV, not a comma in sight!
I feel lied to and I want a refund.
The C file is a patch to Bash to fix a bug in an existing Bash builtin that will be included in the next Bash release.