I have followed this project for quite a long time, and have even used it once or twice for fun (in the floppy days). Makes me wonder why nobody tried to write a similar thing for the Raspberry Pi. I mean, Linux is sometimes too much, and this could have spun off a complete genre of OS for this kind of device, for education as well.
But your question could probably be answered in one word: Broadcom. By keeping secret the details of their SoC, it's far harder for "from scratch" projects to appear, despite it being one platform. On the other hand, the PC, while far more diverse, still relies plenty on documented, de-facto standard interfaces and there is no shortage of articles on things like "how to write a bootsector".
And yet, as far as I have seen, it is still much easier to run a variety of OS from a RPi than a PC. Simply swap mini SD cards. While PCs can usually boot from USB, I have seen ones that cannot also boot from SD card.
(Bias disclosure: For many years I've been compiling my own kernels with an embedded filesystem and minimal utilities and then running from removable media, even on the PC, so the RPi was an easy transition. Further, I stopped using disk or swap years ago, even on the PC, again making the RPi transition painless. Finally, I am more interested in non-graphical tasks so I can live without the use of a GPU.)
The greatest value to me of the RPi is the ability to run a variety of OSes. One can readily find other boards that are better than the RPi in many ways, but in each case running the variety of OSes that an RPi allows is not as easy, if it is even possible. Despite the disadvantage of the RPi's closed bootloader and GPU, the flexibility to run a variety of OSes is a great advantage over many alternatives, for me.
The way I think about it, user freedom in choosing an OS is actually constrained by "too much choice" in hardware. I reason that this is because it is difficult for OS projects other than Linux to timely add support for each and every different item of hardware coming off the assembly line in each new year. Whereas with the RPi, the "moving target" of hardware support moves less. This IMO opens up possibilities for more OS projects to add support for RPi hardware.
Whether this is the reason a variety of OSes are supported on the RPi I do not know. But I am glad that this variety exists.
The 32-bit version has a fully open source kernel, the 64-bit version's kernel is closed. Light details and strong hinting of drama here: https://board.flatassembler.net/topic.php?t=16822 (just found this to cite the 32/64 open/closed thing and TIL'd)
What IS interesting is that it uses flat assembler. fasm is kind of interesting. Anybody looking to play with it on Linux may find this (very) high-level/vague/rough idea of where to start helpful: https://news.ycombinator.com/item?id=15282073
The comment you linked says that nasm is "tied to gcc", "indirectly". What kind of tie does it have? All I really know about it is that it uses a different syntax than GCC's assembler.
nasm cannot generate ELF binaries directly, only .o files which must be formally linked. So you have to use `ld` on the generated object file(s) to produce a binary. (I've not (yet) tested using LLVM's new linker or the linker stages from other compilers.)
Some extremely anecdotal and unscientific testing I did a little while back seemed to show that fasm's "hello world" example is considerably smaller than nasm's. I decided to be a bit more scientific and re-do my test with similar programs.
fasm includes a hello-world program in the distribution:
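    format ELF64 executable 3       ; 3 ends up in the OS ABI byte of the header
    segment readable executable
    entry $
        mov edx, len                ; arg 3: byte count
        lea rsi, [msg]              ; arg 2: buffer
        mov edi, 1                  ; arg 1: fd 1 (stdout)
        mov eax, 1                  ; syscall 1 = write
        syscall
        xor edi, edi                ; exit status 0
        mov eax, 60                 ; syscall 60 = exit
        syscall
    segment readable writeable
    msg db 'Hello world!', 10
    len = $ - msg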
Here I pretty much converted the fasm example to match nasm as closely as possible (using various online references to get an idea of how it should look for nasm):
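    section .text
    global _start
    _start:
        mov edx, len                ; arg 3: byte count
        mov rsi, msg                ; arg 2: buffer
        mov eax, 1                  ; syscall 1 = write
        mov edi, 1                  ; arg 1: fd 1 (stdout)
        syscall
        xor edi, edi                ; exit status 0
        mov eax, 60                 ; syscall 60 = exit
        syscall
    section .data
    msg db "hello, world!", 10
    len equ $ - msg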
What isn't really possible with nasm is to remove the symbol table and _start export. This makes a phenomenal difference in file size.
Here's nasm:
    $ ls -lh nasm64 nasm64.o
    -rwxr-xr-x 1 i336 users 512 Oct 9 10:10 nasm64
    -rw-r--r-- 1 i336 users 880 Oct 9 10:10 nasm64.o
Because fasm doesn't need to mess around with exports or symbols and can just write binaries directly, it absolutely kills nasm:
    $ ls -lh hello64
    -rwxr-xr-x 1 i336 users 222 Oct 9 10:29 hello64
One of my major interests in assembly language is in having tight control over output file size, so fasm is super interesting to me.
I am very curious as to the file format differences though. Apparently nasm generates SYSV-format ELF files, while fasm generates a Linux-specific variant...?
That's interesting. I'm genuinely curious what this means.
As far as the size differences are concerned, http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm... shows that to make smaller stuff than ld generates, you have to start hand-composing the ELF file. So fasm is pretty much the only way to produce the smallest ELFs with minimal fuss on Linux.
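For anyone curious what "hand-composing the ELF file" amounts to, here is a minimal sketch in Python. It is not what fasm or the linked article does internally, just an illustration under some assumptions: x86-64 Linux, a single read+execute PT_LOAD segment, a load address of 0x400000, and a made-up helper name `build_tiny_elf`. The instruction bytes are a hand-encoded version of the hello-world above.

```python
import struct

BASE = 0x400000   # assumed load address (a common choice for non-PIE x86-64 binaries)
EHDR_SIZE = 64    # size of the ELF64 file header
PHDR_SIZE = 56    # size of one ELF64 program header


def build_tiny_elf(message: bytes, osabi: int = 0) -> bytes:
    """Return an ELF64 image that writes `message` to stdout and exits with 0."""
    code_len = 33  # total size of the instructions below
    code = b"".join([
        b"\xba" + struct.pack("<I", len(message)),            # mov edx, len
        b"\x48\x8d\x35" + struct.pack("<i", code_len - 12),   # lea rsi, [rip+disp] -> message
        b"\xbf\x01\x00\x00\x00",                              # mov edi, 1   (stdout)
        b"\xb8\x01\x00\x00\x00",                              # mov eax, 1   (write)
        b"\x0f\x05",                                          # syscall
        b"\x31\xff",                                          # xor edi, edi
        b"\xb8\x3c\x00\x00\x00",                              # mov eax, 60  (exit)
        b"\x0f\x05",                                          # syscall
    ])
    assert len(code) == code_len

    total = EHDR_SIZE + PHDR_SIZE + len(code) + len(message)
    # e_ident: magic, 64-bit class, little-endian, version 1, OS ABI byte, padding
    ident = b"\x7fELF" + bytes([2, 1, 1, osabi]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,                                # e_type: ET_EXEC
        0x3E,                             # e_machine: x86-64
        1,                                # e_version
        BASE + EHDR_SIZE + PHDR_SIZE,     # e_entry: first code byte
        EHDR_SIZE,                        # e_phoff: program header follows the file header
        0,                                # e_shoff: no section headers at all
        0,                                # e_flags
        EHDR_SIZE, PHDR_SIZE, 1,          # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                          # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: read + execute
        0,            # p_offset: map the file from its start
        BASE, BASE,   # p_vaddr, p_paddr
        total, total, # p_filesz, p_memsz
        0x1000,       # p_align
    )
    return ehdr + phdr + code + message
```

Writing the returned bytes to a file and marking it executable should give a runnable binary on x86-64 Linux. With a 13-byte message the image is 166 bytes: 120 bytes of headers, 33 of code, and the message, with no section headers or symbol table at all.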
I've hand-written a couple of ELF binaries just for kicks. The Wikipedia page on ELF has a list of current conventions for byte 7 (offset 0x07) of the ELF header (e_ident[EI_OSABI] in the spec) [1]:
0x00 System V
0x01 HP-UX
0x02 NetBSD
0x03 Linux
0x04 GNU Hurd
0x06 Solaris
0x07 AIX
0x08 IRIX
0x09 FreeBSD
0x0A Tru64
0x0B Novell Modesto
0x0C OpenBSD
0x0D OpenVMS
0x0E NonStop Kernel
0x0F AROS
0x10 Fenix OS
0x11 CloudABI
0x53 Sortix
I don't know enough about linker/loader details, though, to tell you how different ones actually make use of this information. IIRC, the glibc loader just checks that it's 0x0 or 0x3 and leaves it at that.
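A small sketch of reading that byte in Python, for anyone who wants to poke at their own binaries. The table is only a subset of the list above, and `elf_osabi` / `elf_osabi_of_file` are names made up for this example; this is not how any particular loader does it.

```python
EI_OSABI = 7  # index of the OS ABI byte inside e_ident

# Subset of the values listed above; anything else is reported as unknown.
OSABI_NAMES = {
    0x00: "System V",
    0x01: "HP-UX",
    0x02: "NetBSD",
    0x03: "Linux",
    0x09: "FreeBSD",
    0x0C: "OpenBSD",
}


def elf_osabi(header: bytes) -> str:
    """Return a name for the OS ABI byte of an ELF header (first 16 bytes suffice)."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, f"unknown (0x{value:02X})")


def elf_osabi_of_file(path: str) -> str:
    """Convenience wrapper: read e_ident from a file on disk."""
    with open(path, "rb") as f:
        return elf_osabi(f.read(16))
```

Something like `elf_osabi_of_file("hello64")` would then report which of the two conventions a given assembler or linker used.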
Thanks for posting this. I was rather interested until you alerted me to the drama surrounding the 64-bit source code. Life is too short to waste time on childish bickering and overly aggressive “fixes” for the issue causing the drama.
> The 32-bit version has a fully open source kernel, the 64-bit version's kernel is closed. Light details and strong hinting of drama here...
Huh. That thread really makes it sound like KolibriOS has violated if not law (which may not apply everywhere) then certainly the ethics of authorship credit on derivative works. You don't just remove and replace copyright statements on someone else's licensed work. https://www.gnu.org/licenses/gpl-howto.en.html specifically says: "If you have copied code from other programs covered by the same license, copy their copyright notices too."
It used to, back in the days when every cycle counted.
Not today: maintainability of the source code matters more, because compiled C is fast enough.
MenuetOS may be fast, but it also does less. For those who would compare it to current OSes, it's not an apples versus apples comparison, no pun intended.
Maintainability is a high-ranked concern for the project maintenance, for sure. But for the sake of the project's applicability and longevity, an even more important reason to write it in C is portability. It's an OS, after all.
Porting Menuet to another architecture would likely be a significant amount of work. Whereas most OSs have very limited arch-specific portions.
At this point in the evolution of compilers, assembly does not make it faster in the sense of "running faster", though it does in the sense of "it's harder to write, hence bloat is kept smaller".
It is still very possible to hand write assembly that’s faster than optimized C. The compiler can’t just magically assume the same pre/post conditions that you can. Also register allocation is an NP problem (graph coloring).
Can a human keep that level of optimization up for the length of an OS? No idea. Would it even be worth it? Probably not.
Actually compilers are much better at register allocation than humans and have been for many years; the fact that it is NP-complete is irrelevant, because NP-completeness is just as difficult for humans to deal with. Other than taking advantage of special instructions that are hard for compilers to use, there are few cases where the performance of hand-written assembly is better than compiler-generated code. Compilers cannot assume pre/post conditions, but they can analyze much larger amounts of code than humans and will often detect things that humans miss -- which constants can be propagated, which expressions can be reduced, etc. Compilers are a lot better at deciding which of two equivalent instruction sequences will be faster on a given CPU architecture.
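To make "constants that can be propagated, expressions that can be reduced" concrete, here is a toy sketch in Python. The tuple-based instruction format and the name `propagate_constants` are invented for the example, and it only handles straight-line code; real compilers do this across branches and loops on a proper intermediate representation.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(program):
    """Fold and propagate constants through straight-line code.

    `program` is a list of (dest, op, left, right) tuples where the operands
    are either ints or variable names (str). Returns a rewritten list in which
    fully constant instructions become (dest, "const", value, None).
    """
    known = {}   # variable name -> constant value discovered so far
    result = []
    for dest, op, left, right in program:
        # Replace operands whose value is already known at this point.
        if isinstance(left, str):
            left = known.get(left, left)
        if isinstance(right, str):
            right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPS[op](left, right)      # fold at "compile time"
            result.append((dest, "const", known[dest], None))
        else:
            known.pop(dest, None)                   # dest is no longer a known constant
            result.append((dest, op, left, right))
    return result


program = [
    ("a", "+", 2, 3),
    ("b", "*", "a", 4),
    ("c", "+", "b", "x"),   # x is a runtime input, so this cannot be folded
]
optimized = propagate_constants(program)
```

Here `a` and `b` collapse to the constants 5 and 20, and only the addition involving the unknown `x` survives as a real instruction.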
And yet the most efficient programs are written in the low level languages, and the slowest programs are written in high level languages where optimization is the responsibility of the compiler.
Compilers can definitely do better than humans in some situations, but you can also use an optimizing assembler that helps with register allocation (not as important for x64) and all sorts of peephole optimizations. The compiler doesn't reason about the data, which means the compiler has to make worst-case assumptions that the programmer knows won't occur. This is a big source of inefficiency. Compilers will never be able to make strategic optimizations because the compiler doesn't understand what it's compiling.
Like with chess, computer + human is the winning combination.
He's probably referring to Centaur Chess, where a team of a human assisted by a chess computer (or perhaps more accurately a computer assisted by a human) competes against another similar team.
They're probably saying that a human assisted by a chess program (during training, not during actual game) can on average do better than a human who has practised chess without any artificial analysis/help. They could also perhaps mean that a chess program that takes human input during processing can do better (on average) than one that doesn't. Essentially they imply human + AI is superior to either of them in isolation.
I am not sure how you are defining the "compiler" here; the compiler back end that emits code is where register allocation and peephole optimizations happen.
"The compiler doesn't reason about the data"
It certainly can; it is hard in a language like C due to the deficiencies of the type system, but in other languages the programmer can more explicitly state constraints on values. Compilers can also make useful inferences about constraints based on e.g. conditional statements, loop bounds, etc.
Where humans win is in designing more efficient algorithms. Compilers are great at reducing the constants but still cannot improve the asymptotics. In other words, human effort is better spent on the higher-level concerns, and yes, some languages are better than others for that task.
"Like with chess, computer + human is the winning combination."
Chess-playing programs are so much better than humans that it is not even interesting to compare them anymore.
Are you trying to make the “sufficiently smart compiler” argument here?
The reality is that the fastest compilers for the fastest languages still require babysitting and hand written ASM for peak performance.
> human effort is better spent on the higher-level concerns
For most cases, yes I (and likely most people) agree. Although there are times when we do care and it is important to know that our tools are not infallible.
> Actually compilers are much better at register allocation than humans and have been for many years
Having inspected compiler output for many years, I very much disagree. They've gotten a little better, but I still see plenty of unnecessary data shuffling and other things no sane human Asm programmer would ever do. A human has a far better understanding of what he/she wants the program to do, and this also includes things like which variables should go in registers, and when. A compiler can analyse dataflow and somewhat understand, but especially when external functions are involved, the knowledge is still incomplete.
> Compilers are a lot better at deciding which of two equivalent instruction sequences will be faster on a given CPU architecture.
This ironically makes them worse overall, since (when told to) they always attempt to optimise for speed, even at the expense of size. They don't know that making one part of the code smaller, although somewhat slower, could allow another more critical part to fit completely into cache and improve performance overall.
Speaking as a long-time Asm programmer myself, rarely do we act like HLL compilers and literally transcribe statement-for-statement what the HLL (e.g. C or C++) source code does. We start with a high-level understanding of the algorithm; maybe pseudocode, maybe flowcharts, maybe something derived from HLL source. Then we think at the register and instruction level and decide how to implement it. It's a somewhat creative process, unlike the literal pattern-matching compilers do. Example: https://news.ycombinator.com/item?id=15423674
"I still see plenty of unnecessary data shuffling and other things no sane human Asm programmer would ever do."
OK, but in the general case, at scale, in a complex program, compilers are better able to make register allocation decisions than humans. Some compilers may not be as good as others, and I would love to see some examples, but it has been a long time since this debate was settled.
"A compiler can analyse dataflow and somewhat understand, but especially when external functions are involved, the knowledge is still incomplete."
Sure, dynamic linking throws a wrench into things, but equally true for humans who may not understand how the function they are calling really works.
"making one part of the code smaller, although somewhat slower, could allow another more critical part to fit completely into cache and improve performance overall."
I am sure it happens. It is far more common for that sort of optimization to be irrelevant due to the size and complexity of the system. Compilers are also capable of optimizing for code size, it is similar to optimizing for speed but not important in most cases.
> OK, but in the general case, at scale, in a complex program, compilers are better able to make register allocation decisions than humans. Some compilers may not be as good as others, and I would love to see some examples, but it has been a long time since this debate was settled.
You keep repeating that assumption, but offer zero proof, or even any argument really, as to why it might be true.
Back in the late 80s/early 90s, I did a lot of assembly language programming---6809, 68000, 8088 and MIPS, so it's not unfamiliar to me. I haven't really done any assembly language programming since ... eh ... 1995, so okay, I could be a bit rusty.
Anyway, a few weeks ago, I picked up one of the projects I worked on in the early 90s, a mildly math heavy program that dealt with a pair of functions that I thought could easily be vectorized and oh! The Pentiums of today have vector instructions, and in reading over the programming manual, yes, it looked like it would be straightforward to write some vector code, in assembly (something I haven't done---should be fun).
My code beat clang (using the C code) on its highest optimization setting, though clang got pretty close. GCC? It (again, using the C version of the code) was smoking my code at -O2 by an embarrassing amount. And it wasn't even using the vector instructions! And to make sure, I ran the code on several different systems, so I don't think it was a case of this version of the Pentium having a worse MMX implementation than that Pentium.
I haven't taken the time to figure out what GCC did (perhaps it knows better how to schedule MMX code than I do) as I was doing this for fun, not profit. All the data I was working with was in registers in my code.
> Also register allocation is an NP problem (graph coloring).
No, this strongly depends on the form of the program. SSA form permits polynomial solution of register allocation. (SSA is the most widely used form in compilers; LLVM, GCC and even MSVC use it, and many more.)
My understanding is that the number of colors is just the maximum number of live values at any point, which is easy to determine, so the theoretically NP-complete part of graph-coloring is solved before you even start trying to pick registers. After that, it gets tricky to even define optimality, because you have to trade off the likely runtime costs of different sets of moves (which in my mind includes which variables to spill to the stack when). If there's a formal framework that decisively handles that stuff, someone please let me know.
OTOH, GP only actually said "NP problem", of which P is a subset, so they're not wrong. :)
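A sketch of the "number of colors is the maximum number of live values" idea, in Python. It assumes the simplest possible case: straight-line code where each value is live over a single half-open interval [start, end), so the interference graph is an interval graph (a special case of the chordal graphs mentioned for SSA). The names `allocate_registers` and `max_live` are invented here, and spilling and move costs, the genuinely hard part, are not modeled at all.

```python
import heapq


def max_live(live_ranges):
    """Maximum number of values live at the same time."""
    events = []
    for start, end in live_ranges.values():
        events.append((start, 1))
        events.append((end, -1))
    live = peak = 0
    # Sorting puts (p, -1) before (p, 1), so a range ending at p frees its
    # register before a range starting at p needs one.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


def allocate_registers(live_ranges):
    """Greedy scan in order of start point; returns (assignment, registers_used)."""
    free = []      # min-heap of register numbers that can be reused
    active = []    # min-heap of (end, register) for values still live
    assignment = {}
    next_reg = 0
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release registers of values whose live range has ended.
        while active and active[0][0] <= start:
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_reg
            next_reg += 1
        assignment[name] = reg
        heapq.heappush(active, (end, reg))
    return assignment, next_reg


ranges = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 5)}
assignment, used = allocate_registers(ranges)
```

For these four values at most two are live at once, and the greedy scan gets by with exactly two registers. Once there are fewer registers than `max_live` reports, something has to be spilled, and that is where the trade-offs described above start.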
My tired brain initially thought it saw "GPU" in the acronym soup "GPL GUI PC OS" and started to think "Weird, what must an OS running on a GPU be like?... Oh. Nevermind."
His argument was that while assembly allowed for more optimized local routines, they've lost the perspective on how the system should behave at large (the handling of window drawing routines, etc.)
Folks might forget that before Unix, most OSs were written in assembly; the obvious important one not written in asm was Multics (which used PL/I). I worked on the Tops-10 and -20 OSs and they were pure assembly. People seem to think that it's not possible to write maintainable, reliable code in assembly; that's just not true. The DEC source was well-documented and well-tested. Sure, it had bugs fixed each release, as every code base has bugs. Still, I was never flummoxed as to what a routine did: after all, it was executing one machine instruction after another, and once you knew the operands, you knew the result. Debugging was at the machine level, instruction by instruction.
Building higher-level code was accomplished just like it's done today: from smaller modules / subroutines. PUSHJ - Push PC and Jump, or 'Call'.
These days, writing software is almost magic, meaning there is a whole 'chain of trust' that must be invoked -- all based on belief that it's all correct -- before proceeding. With assembly, magic stops at the hardware / software boundary; that is the actual machine instructions.
Thanks, that is a fantastic comment. I particularly appreciate the "These days, writing software is almost magic, meaning there is a whole 'chain of trust' that must be invoked".
Even as late as the 80s there was large business software, like telecom switches (ITT's Metaconta or Alcatel's E10), that was written only in assembly. That software had to be reliable.
Assembly language can be reasonably structured and have comprehensive comments to name subroutines, for example. There is some value to that vs what you would get by de-assembling a binary.
Depending on the GPL, this would also ensure this OS doesn't run on restrictive platforms like iOS.
That’s not the GPL, that’s just the usage of assembly. As a side note, I remember years ago (around iOS 5 or earlier IIRC), there was a version of Bochs compiled for iOS and available on Cydia. It came with an IMG of an installed Windows 95 box and was very slow, but it was “running”[0] x86 on iOS.
[0] Quotes around “running” because Bochs isn’t so much of an emulator as it is an interpreter. It’s still an emulator, but it runs by interpreting each instruction instead of binary translation like “modern” emulators.
To add to this: the reason is that you still have function names (the assembler or linker might remove all but `_start'[0]). If the assembly used any macros, you have those. And the biggest IMO: comments are still there.
Don't let the perl extension fool you. It's mostly ASM, the perl is there to help with the build.
It was pretty easy to find someone that had copied it and violated their license by removing the copyright text, and further claiming to release it into the public domain. Perhaps not on purpose, but...
Just because you can view the source code, doesn't mean you're allowed to modify it and then use the modified version for yourself or publish the modified version.
GPL ensures that you're allowed these things and that anyone, who does modify it, has to then again allow these things to anyone who uses their version.
Yes, a good chunk of GPL – that the source code has to be published along with the software, is pointless here, so a simpler license could have been chosen, but I guess, they went with GPL anyways, as it's a proven license which has already had a few courts look over it.
The common misconception you seem to be displaying is that the advantage of "open source" software is that it comes with the source code.
This is exactly why Richard Stallman objects to the term "open source" and prefers "free software": because the point of the thing is that you are legally free to modify and distribute it. Access to source code is just a prerequisite to be able to practically exercise those freedoms. (I'm making no commentary on whether he's right about this. Personally I tend to say "open source" more often, because that term has basically won out and I'm more likely to be understood.)
By the way, years ago when Microsoft was apparently going to war against open source/free software, there was a bit from some Microsoft executive who said something like, "Open source is pointless. If our customers want to see our source code, they can just ask. But nobody does. Nobody cares about source code." He was exploiting people's misconceptions about what "open source" means. Indeed, nobody cares much about source code if they're not allowed to do anything with it.
(Plus, disassembling a binary is a massive pain compared to just using the commented and labelled assembly file.)
Assembler and machine code aren't the same thing. Your computer does not run assembly, it runs machine code. Assembly languages are still "compiled" into machine code, except that this process is called "assembling" and is much simpler than compiling other languages due to the structure of assembly and machine language being so close.
Whilst that similarity makes it easier to disassemble (i.e. decompile) back to working assembly, the structure of the code can be lost, especially if code is organised in a modular way (using macros, etc...). As modularity improves maintainability, there is a technical benefit to having code written in assembly protected under the GPL.
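To show how close assembly and machine code are, here is a toy "assembler" in Python covering just three x86-64 instruction forms. It is only a sketch: the function names are made up, and a real assembler handles vastly more (labels, addressing modes, macros, operand sizes).

```python
import struct

# Register numbers used in x86 instruction encodings.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def assemble_line(line: str) -> bytes:
    """Encode one instruction: `mov r32, imm32`, `xor r32, r32` or `syscall`."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    if mnemonic == "syscall" and not operands:
        return b"\x0f\x05"
    if mnemonic == "mov" and len(operands) == 2 and operands[0] in REG32:
        # Opcode B8+r, followed by the 32-bit immediate in little-endian order.
        return bytes([0xB8 + REG32[operands[0]]]) + struct.pack("<I", int(operands[1], 0))
    if mnemonic == "xor" and len(operands) == 2 and all(op in REG32 for op in operands):
        dst, src = operands
        # Opcode 31 /r with a register-to-register ModRM byte.
        return bytes([0x31, 0xC0 | (REG32[src] << 3) | REG32[dst]])
    raise ValueError(f"unsupported instruction: {line!r}")


def assemble(lines) -> bytes:
    return b"".join(assemble_line(line) for line in lines)


machine_code = assemble(["xor edi, edi", "mov eax, 60", "syscall"])
```

Each line maps to a fixed handful of bytes with no analysis in between, which is why disassembly recovers the instructions easily but not the labels, macros, or comments.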
I think our goal should be to develop even better programming languages that can express high-level concepts succinctly, and that provide even larger sets of compile-time guarantees (e.g. static typing).
I don't see the benefit in writing anything that doesn't need to be super-optimized in assembly language. Compilers are so good at optimizing right now that assembly is only justified for very small pieces of code that, for example, need to use special new CPU instructions (e.g. in video decoding), or for particular segments of code that a profiler shows the compiler can't optimize well enough.
Assembly language is hard to write, read, understand, and reason about. It shares the same memory model as C, with all its flaws. There is nothing impressive about this. It's like someone saying they built a house with their bare hands without using any machines or technology -- machines and modern technology could have helped them build a house faster, and that was of better quality.
Even outside of UB and just in terms of implicit actions, I've also been bitten by destructors/finalizers. They can especially cause mysterious problems if they invoke any synchronization magic (join a thread, exchange state through a pipe/FIFO, acquire locks, etc.).
> assembly is only justified for very small pieces of code
It's also justified if you're doing a side project mostly for fun and you like writing assembly.
If you were a manager or executive and you found out your employee had implemented a new project in assembly language, it really might be reasonable for you to be dismayed at that choice. But that's not the situation here. At all.
Hand-written assembly is still the king when it comes to maximising CPU performance. You can see that most clearly on older machines, but the potential is still there on newer machines too. I'd suggest that there's a chance of a small resurgence in assembly after Moore's Law has truly bitten the dust.
Also, despite being very verbose, it's actually very easy to reason about. That's because you've removed all the layers of abstraction found with higher-level languages that are involved in translating source code to machine code. In assembly, your computer has to run the algorithms in the way you designed them, as it's necessary to spell out in exact detail how the algorithms should run. There's still layers of abstraction available through subroutines and macros, as well as comments for describing what the code does. The issue with assembly isn't with clarity per se, it's with portability and verboseness. Every part of the program has to be written out in painstaking detail, and the work involved in porting to a different architecture is much more than with languages like C.
The ability to trade off productivity for more control over the program is valuable in certain situations. The only problem I see with programming directly in assembly is that it's not compatible with different architectures. Perhaps someone should make a proper high-level assembler similar to LLVM IR but with a stable instruction set that runs on several architectures.
> why didn't anyone tried to write similar thing for Raspberry
This is close: https://news.ycombinator.com/item?id=7926004
For the Pi, there is RISC OS.
https://www.riscosopen.org/
Beat me to it!
Almost GPL.
MenuetOS has both 64-bit and 32-bit variants.
http://kolibrios.org/en/
The open-source fork
KolibriOS is the fork of the 32-bit version of MenuetOS, which is already open source, so they're both valid for that matter.
GCC uses GAS.
`file` isn't smart enough to read much into the actual ELF format; all it's doing is parsing part of the header.
What that difference means is that fasm put a 3 in byte 7 of the header, while nasm put a 0 there.
And apparently putting a 0 there, no matter what the target OS ABI is, is common practice.
https://en.wikipedia.org/wiki/Executable_and_Linkable_Format...
Oh! I see!
Thanks so much for this little detail - you helped me figure out that in the `format ELF64 executable 3` line at the top, that 3 is what gets put into the binary!
Change it to 0, and sure enough it shows as SYSV. That is cool.
...change it to 2 and it shows as NetBSD
...and 1 makes it show as HP-UX
...4 makes GNU/Hurd
O.o
5 is 86Open
6 is Solaris
WAT
7 is "Monterey"?!?!
(Yup, still executes just fine)
8 is IRIX
9 is FreeBSD
10 is Tru64
O.o
11 is Novell Modesto
12 is OpenBSD
13 and 14 don't display anything, and I presume that's because that's where this madness ends.
That was extremely interesting...
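The same experiment can be approximated without reassembling, by patching byte 7 of an existing binary. A rough Python sketch follows; `with_osabi` is a made-up helper, and it assumes that reassembling with a different number changes nothing but that one byte, which I have not verified.

```python
EI_OSABI = 7  # index of the OS ABI byte in the ELF header


def with_osabi(image: bytes, osabi: int) -> bytes:
    """Return a copy of an ELF image with its OS ABI byte replaced."""
    if len(image) < 16 or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if not 0 <= osabi <= 0xFF:
        raise ValueError("OS ABI value must fit in one byte")
    patched = bytearray(image)
    patched[EI_OSABI] = osabi
    return bytes(patched)


def osabi_variants(image: bytes, values=range(0, 15)):
    """Yield (value, patched image) pairs, one per OS ABI value to try."""
    for value in values:
        yield value, with_osabi(image, value)
```

Writing each variant to its own file and running `file` on it should reproduce the list above, if the assumption holds.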
[1] https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
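To make the byte-poking above concrete, here is a minimal Python sketch that reads and patches e_ident[EI_OSABI]. The helper names (`minimal_ident`, `read_osabi`, `patch_osabi`, `describe`) are made up for illustration. The name table only repeats the values reported in this thread and is not a complete registry. The 16-byte header it builds is just the identification block, not a runnable executable.

```python
EI_OSABI = 7  # offset of the OS/ABI byte inside the 16-byte e_ident block

# Names as reported in this thread for values 0 through 12 (illustrative, not exhaustive).
OSABI_NAMES = {
    0: "SYSV",
    1: "HP-UX",
    2: "NetBSD",
    3: "Linux",
    4: "GNU/Hurd",
    5: "86Open",
    6: "Solaris",
    7: "Monterey",
    8: "IRIX",
    9: "FreeBSD",
    10: "Tru64",
    11: "Novell Modesto",
    12: "OpenBSD",
}


def minimal_ident(osabi=0):
    """Build a 16-byte e_ident for a 32-bit little-endian ELF (hypothetical helper)."""
    # magic (4 bytes), EI_CLASS=1 (32-bit), EI_DATA=1 (little-endian), EI_VERSION=1, EI_OSABI
    return bytes([0x7F, ord("E"), ord("L"), ord("F"), 1, 1, 1, osabi]) + bytes(8)


def read_osabi(header):
    """Return the OS/ABI byte after checking the ELF magic number."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    return header[EI_OSABI]


def patch_osabi(header, value):
    """Return a copy of the header with the OS/ABI byte replaced."""
    if not 0 <= value <= 255:
        raise ValueError("OS/ABI must fit in one byte")
    read_osabi(header)  # validates the magic number
    patched = bytearray(header)
    patched[EI_OSABI] = value
    return bytes(patched)


def describe(header):
    """Map the OS/ABI byte to one of the names listed in the thread."""
    value = read_osabi(header)
    return OSABI_NAMES.get(value, f"unknown ({value})")


header = minimal_ident(3)
print(describe(header))                   # Linux
print(describe(patch_osabi(header, 0)))   # SYSV
print(describe(patch_osabi(header, 13)))  # unknown (13)
```

The same read and patch would work on the first 16 bytes of a real binary opened in binary mode. Whether a given loader accepts the patched value is a separate question this sketch does not answer.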
Thanks for posting this. I was rather interested until you alerted me to the drama surrounding the 64bit source code. Life is too short to waste time on childish bickering and overly aggressive “fixes” for the issue causing the drama.
> The 32-bit version has a fully open source kernel, the 64-bit version's kernel is closed. Light details and strong hinting of drama here...
Huh. That thread really makes it sound like KolibriOS has violated if not law (which may not apply everywhere) then certainly the ethics of authorship credit on derivative works. You don't just remove and replace copyright statements on someone else's licensed work. https://www.gnu.org/licenses/gpl-howto.en.html specifically says: "If you have copied code from other programs covered by the same license, copy their copyright notices too."
You're cherry picking one comment from one side of the story. That's not the picture I got reading the entire discussion.
For example somebody pointed out that the original copyright has actually been kept: http://websvn.kolibrios.org/filedetails.php?repname=Kolibri+...
This sort of thing no longer makes sense.
It used to, back in the days when every cycle counted.
Today, maintainability of the source code matters more, because compiled C is fast enough.
MenuetOS may be fast, but it also does less. For those who would compare it to current OSes, it's not an apples-to-apples comparison, no pun intended.
Maintainability is a high-ranked concern for the project maintenance, for sure. But for the sake of the project's applicability and longevity, an even more important reason to write it in C is portability. It's an OS, after all.
Porting Menuet to another architecture would likely be a significant amount of work. Whereas most OSs have very limited arch-specific portions.
I suppose you can run it in Bochs or QEMU or one of those things if you want to use it on non-x86 hardware.
At this point in the evolution of compilers, assembly does not make it faster in the sense of "running faster" (though it does in the sense that "it's harder to write, hence bloat is kept smaller").
> assembly does not make it faster
It is still very possible to hand write assembly that’s faster than optimized C. The compiler can’t just magically assume the same pre/post conditions that you can. Also register allocation is an NP problem (graph coloring).
Can a human keep that level of optimization up for the length of an OS? No idea. Would it even be worth it? Probably not.
Actually compilers are much better at register allocation than humans and have been for many years; the fact that it is NP complete is irrelevant, because NP completeness is just as difficult for humans to deal with. Other than taking advantage of special instructions that are hard for compilers to use, there are few cases where the performance of hand-written assembly is better than compiler-generated code. Compilers cannot assume pre/post conditions, but they can analyze much larger amounts of code than humans and will often detect things that humans miss -- which constants can be propagated, expressions that can be reduced, etc. Compilers are a lot better at deciding which of two equivalent instruction sequences will be faster on a given CPU architecture.
And yet the most efficient programs are written in the low level languages, and the slowest programs are written in high level languages where optimization is the responsibility of the compiler.
Compilers can definitely do better than humans in some situations, but you can also use an optimizing assembler that helps with register allocation (not as important for x64) and all sorts of peephole optimizations. The compiler doesn't reason about the data, which means the compiler has to make worst-case assumptions that the programmer knows won't occur. This is a big source of inefficiency. Compilers will never be able to make strategic optimizations because the compiler doesn't understand what it's compiling.
Like with chess, computer + human is the winning combination.
"Like with chess, computer + human is the winning combination."
Curious what you mean here. My layman's observation is that computers have far surpassed humans in that space.
Not starting from scratch. Modern chess programs leverage a lot of human design and analysis work.
Are there any domains in which computers start from scratch?
There are various abstract learning systems that work just fine for simple 2d video games. The problem is they have a lot of overhead.
He's probably referring to Centaur Chess, where a team of a human assisted by a chess computer (or perhaps more accurately a computer assisted by a human) competes against another similar team.
https://en.wikipedia.org/wiki/Advanced_Chess
They're probably saying that a human assisted by a chess program (during training, not during actual game) can on average do better than a human who has practised chess without any artificial analysis/help. They could also perhaps mean that a chess program that takes human input during processing can do better (on average) than one that doesn't. Essentially they imply human + AI is superior to either of them in isolation.
I am not sure how you are defining the "compiler" here; the compiler back end that emits code is where register allocation and peephole optimizations happen.
"The compiler doesn't reason about the data"
It certainly can; it is hard in a language like C due to the deficiencies of the type system, but in other languages the programmer can more explicitly state constraints on values. Compilers can also make useful inferences about constraints based on e.g. conditional statements, loop bounds, etc.
Where humans win is in designing more efficient algorithms. Compilers are great at reducing the constants but still cannot improve the asymptotics. In other words, human effort is better spent on the higher-level concerns, and yes, some languages are better than others for that task.
"Like with chess, computer + human is the winning combination."
Chess-playing programs are so much better than humans that it is not even interesting to compare them anymore.
Are you trying to make the “sufficiently smart compiler” argument here?
The reality is that the fastest compilers for the fastest languages still require babysitting and hand written ASM for peak performance.
> human effort is better spent on the higher-level concerns
For most cases, yes I (and likely most people) agree. Although there are times when we do care and it is important to know that our tools are not infallible.
> Actually compilers are much better at register allocation than humans and have been for many years
Having inspected compiler output for many years, I very much disagree. They've gotten a little better, but I still see plenty of unnecessary data shuffling and other things no sane human Asm programmer would ever do. A human has a far better understanding of what he/she wants the program to do, and this also includes things like which variables should go in registers, and when. A compiler can analyse dataflow and somewhat understand, but especially when external functions are involved, the knowledge is still incomplete.
> Compilers are a lot better at deciding which of two equivalent instruction sequences will be faster on a given CPU architecture.
This ironically makes them worse overall, since (when told to) they always attempt to optimise for speed, even at the expense of size. They don't know that making one part of the code smaller, although somewhat slower, could allow another more critical part to fit completely into cache and improve performance overall.
Speaking as a long-time Asm programmer myself, rarely do we act like HLL compilers and literally transcribe statement-for-statement what the HLL (e.g. C or C++) source code does. We start with a high-level understanding of the algorithm; maybe pseudocode, maybe flowcharts, maybe something derived from HLL source. Then we think at the register and instruction level and decide how to implement it. It's a somewhat creative process, unlike the literal pattern-matching compilers do. Example: https://news.ycombinator.com/item?id=15423674
"I still see plenty of unnecessary data shuffling and other things no sane human Asm programmer would ever do."
OK, but in the general case, at scale, in a complex program, compilers are better able to make register allocation decisions than humans. Some compilers may not be as good as others, and I would love to see some examples, but it has been a long time since this debate was settled.
"A compiler can analyse dataflow and somewhat understand, but especially when external functions are involved, the knowledge is still incomplete."
https://en.wikipedia.org/wiki/Interprocedural_analysis
Sure, dynamic linking throws a wrench into things, but equally true for humans who may not understand how the function they are calling really works.
"making one part of the code smaller, although somewhat slower, could allow another more critical part to fit completely into cache and improve performance overall."
I am sure it happens. It is far more common for that sort of optimization to be irrelevant due to the size and complexity of the system. Compilers are also capable of optimizing for code size, it is similar to optimizing for speed but not important in most cases.
> OK, but in the general case, at scale, in a complex program, compilers are better able to make register allocation decisions than humans. Some compilers may not be as good as others, and I would love to see some examples, but it has been a long time since this debate was settled.
You keep repeating that assumption, but offer zero proof or even any real argument as to why it might be true.
Back in the late 80s/early 90s, I did a lot of assembly language programming---6809, 68000, 8088 and MIPS, so it's not unfamiliar to me. I haven't really done any assembly language programming since ... eh ... 1995, so okay, I could be a bit rusty.
Anyway, a few weeks ago, I picked up one of the projects I worked in the early 90s, a mildly math heavy program that dealt with a pair of functions that I thought could easily be vectorized and oh! The Pentiums of today have vector instructions, and in reading over the programming manual, yes, it looked like it would be straightforward to write some vector code, in assembly (something I haven't done---should be fun).
My code easily beat clang (using C code) on its highest optimization setting. Easily beat, but it got pretty close. GCC? It (again, using the C version of the code) was smoking my code at -O2 by an embarrassing amount. And it wasn't even using the vectorized instructions! And to make sure, I ran the code on several different systems, so I don't think it was this version of the Pentium having a bad MMX implementation over that Pentium.
I haven't taken the time to figure out what GCC did (perhaps it knows better how to schedule MMX code than I do) as I was doing this for fun, not profit. All the data I was working with was in registers in my code.
> Also register allocation is an NP problem (graph coloring).
No, this strongly depends on the form of the program. SSA form permits polynomial solution of register allocation. (SSA is the most widely used form in compilers; LLVM, GCC and even MSVC use it, and many more.)
My understanding is that the number of colors is just the maximum number of live values at any point, which is easy to determine, so the theoretically NP-complete part of graph-coloring is solved before you even start trying to pick registers. After that, it gets tricky to even define optimality, because you have to trade off the likely runtime costs of different sets of moves (which in my mind includes which variables to spill to the stack when). If there's a formal framework that decisively handles that stuff, someone please let me know.
OTOH, GP only actually said "NP problem", of which P is a subset, so they're not wrong. :)
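The "number of colors is the maximum number of live values" point can be sketched for the simplest case. This assumes straight-line code where each value has a single half-open live interval [start, end). That is a big simplification of real SSA programs with control flow, and it ignores spilling and move costs entirely. The function names and the sample intervals are invented for illustration.

```python
import heapq


def max_live(intervals):
    """Peak number of simultaneously live values for half-open [start, end) ranges."""
    events = []
    for start, end in intervals.values():
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort ends (-1) before starts (+1) at the same point, matching half-open ranges.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def assign_registers(intervals):
    """Greedy assignment in order of start point; a new register is used only when all are busy."""
    free = []        # min-heap of register numbers released by expired intervals
    active = []      # min-heap of (end, register) for intervals still live
    next_reg = 0
    assignment = {}
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers whose values died at or before this start point.
        while active and active[0][0] <= start:
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_reg
            next_reg += 1
        assignment[name] = reg
        heapq.heappush(active, (end, reg))
    return assignment


# Hypothetical live ranges for five values.
ranges = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (4, 7), "e": (6, 8)}
regs = assign_registers(ranges)
print(max_live(ranges))        # 3
print(len(set(regs.values()))) # 3
```

In this restricted setting the greedy pass never uses more registers than the peak liveness, so the coloring itself is cheap. The hard part described above, choosing what to spill and which moves to pay for when the peak exceeds the register count, is exactly what the sketch leaves out.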
A bunch of years ago, there was a "Menuet" product that was a cross-platform C++ UI library for Mac/Windows/Whatever.
Did they expand the underpinnings into a complete OS, or is this an unrelated product?
Does it feel any faster than a modern OS not written in assembly?
I remember it being a practically instant bootup from USB on a Pentium 4 (needed a little FDD hack to make it work but it screamed.)
I would be quite surprised if anyone could write that much assembly that's faster than what a modern optimising compiler outputs.
Makes me happy that people are trying though... :)
Try it: every application opens instantaneously, with no load times.
My tired brain initially thought it saw "GPU" in the acronym soup "GPL GUI PC OS" and started to think "Weird, what must an OS running on a GPU be like?... Oh. Nevermind."
Maybe interesting to post here are the comments of Steve Yegge (GeoWorks developer) on large scale assembly projects:
https://youtu.be/tz-Bb-D6teE?t=2161
His argument was that while assembly allowed for more optimized local routines, they've lost the perspective on how the system should behave at large (the handling of window drawing routines, etc.)
Geoworks.
Haven't heard that mentioned in a long while.
I miss those guys, they did some nice stuff. Shame it didn't catch on.
Folks might forget that before Unix, most OSs were written in assembly; the obvious important one not written in asm was Multics (which used PL/I). I worked on the Tops-10 and -20 OSs and they were pure assembly. People seem to think that it's not possible to write maintainable, reliable code in assembly; that's just not true. The DEC source was well-documented and well-tested. Sure, it had bugs fixed each release, as every code base has bugs. Still, I was never flummoxed as to what a routine did: after all, it was executing one machine instruction after another, and once you knew the operands, you knew the result. Debugging was at the machine level, instruction by instruction.
Building higher-level code was accomplished just like it's done today: from smaller modules / subroutines. PUSHJ - Push PC and Jump, or 'Call'.
These days, writing software is almost magic, meaning there is a whole 'chain of trust' that must be invoked -- all based on belief that it's all correct -- before proceeding. With assembly, magic stops at the hardware / software boundary; that is the actual machine instructions.
Thanks, that is a fantastic comment. I particularly appreciate the "These days, writing software is almost magic, meaning there is a whole 'chain of trust' that must be invoked".
Even as late as the 80s there was large business software, such as telecom switches (ITT's Metaconta or Alcatel's E10), that was written only in assembly. That software had to be reliable.
Burroughs B5000 was written with zero lines of Assembly in 1961 with ESPOL, followed by NEWP, both Algol 60 variants with intrinsics.
Xerox PARC used Mesa with microcoded CPUs, after using BCPL initially.
IBM used PL/8 for their RISC OS and compiler research.
OS/400 was written in PL/MI and PL/MP.
There are plenty of other examples; UNIX was not the only one to be written in a high-level language, regardless of what C devs advocate.
So what's the point of GPL if it's written in assembler?
Apparently the 64-bit version is under another non-commercial-only license.
Assembly language can be reasonably structured and have comprehensive comments to name subroutines, for example. There is some value to that vs what you would get by de-assembling a binary.
Depending on the GPL, this would also ensure this OS doesn't run on restrictive platforms like iOS.
That’s not the GPL, that’s just the usage of assembly. As a side note, I remember years ago (around iOS 5 or earlier IIRC), there was a version of Bochs compiled for iOS and available on Cydia. It came with an IMG of an installed Windows 95 box and was very slow, but it was “running”[0] x86 on iOS.
[0] Quotes around “running” because Bochs isn’t so much of an emulator as it is an interpreter. It’s still an emulator, but it runs by interpreting each instruction instead of binary translation like “modern” emulators.
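As a rough illustration of what "interpreting each instruction" means, here is a toy fetch-decode-execute loop in Python. The instruction set (`load`, `add`, `jnz`, `halt`) is invented for this sketch and has nothing to do with x86 or with how Bochs is actually implemented. It only shows that every guest instruction costs one trip through the dispatch code, which is the overhead binary translation tries to avoid.

```python
def run(program, max_steps=10_000):
    """Interpret a toy program given as a list of (opcode, argument) tuples.

    Returns (accumulator, number_of_instructions_executed).
    """
    acc = 0      # single accumulator register of the toy machine
    pc = 0       # program counter: index of the next instruction
    steps = 0
    while steps < max_steps:
        op, arg = program[pc]   # fetch
        pc += 1
        steps += 1
        if op == "load":        # decode and execute
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "jnz":       # jump to index arg if the accumulator is nonzero
            if acc != 0:
                pc = arg
        elif op == "halt":
            return acc, steps
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit reached")


# Count down from 3 to 0.
COUNTDOWN = [
    ("load", 3),
    ("add", -1),
    ("jnz", 1),
    ("halt", None),
]
print(run(COUNTDOWN))  # (0, 8)
```

Four guest instructions turn into eight passes through the loop here because the body runs three times. A translating emulator would instead convert the loop body to host code once and reuse it.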
Assembly source code is vastly better for modifications compared to an assembled binary? It's not like it's written in a hex editor?
To add to this: the reason is that you still have function names (the assembler or linker might remove all but `_start'[0]). If the assembly used any macros, you have those. And the biggest IMO: comments are still there.
[0]: Or whatever it’s called on $(OS)
Here's an example. OpenSSL uses assembly routines to speed up some cryptographic operations. For example, here are some routines for MD5 on x86_64:
https://github.com/openssl/openssl/blob/master/crypto/md5/as...
Don't let the perl extension fool you. It's mostly ASM, the perl is there to help with the build.
It was pretty easy to find someone that had copied it and violated their license by removing the copyright text, and further claiming to release it into the public domain. Perhaps not on purpose, but...
https://github.com/GaloisInc/hacrypto/blob/master/src/C/libs...
No copyright infringement. They share a common ancestor: http://www.zorinaq.com/papers/md5-amd64.html
Good catch, but perhaps now a better example of why software licensing still matters for ASM :)
Just because you can view the source code, doesn't mean you're allowed to modify it and then use the modified version for yourself or publish the modified version.
GPL ensures that you're allowed these things and that anyone, who does modify it, has to then again allow these things to anyone who uses their version.
Yes, a good chunk of GPL – that the source code has to be published along with the software, is pointless here, so a simpler license could have been chosen, but I guess, they went with GPL anyways, as it's a proven license which has already had a few courts look over it.
The common misconception you seem to be displaying is that the advantage of "open source" software is that it comes with the source code.
This is exactly why Richard Stallman objects to the term "open source" and prefers "free software": because the point of the thing is that you are legally free to modify and distribute it. Access to source code is just a prerequisite to be able to practically exercise those freedoms. (I'm making no commentary on whether he's right about this. Personally I tend to say "open source" more often, because that term has basically won out and I'm more likely to be understood.)
By the way, years ago when Microsoft was apparently going to war against open source/free software, there was a bit from some Microsoft executive who said something like, "Open source is pointless. If our customers want to see our source code, they can just ask. But nobody does. Nobody cares about source code." He was exploiting people's misconceptions about what "open source" means. Indeed, nobody cares much about source code if they're not allowed to do anything with it.
(Plus, disassembling a binary is a massive pain compared to just using the commented and labelled assembly file.)
Assembler and machine code aren't the same thing. Your computer does not run assembly, it runs machine code. Assembly languages are still "compiled" into machine code, except that this process is called "assembling" and is much simpler than compiling other languages due to the structure of assembly and machine language being so close.
Whilst that similarity makes it easier to disassemble (i.e. decompile) back to working assembly, the structure of the code can be lost, especially if code is organised in a modular way (using macros, etc...). As modularity improves maintainability, there is a technical benefit to having code written in assembly protected under the GPL.
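A toy round trip shows what survives assembling and what does not. This sketch handles only four one-byte x86 opcodes with no operands (`nop`, `ret`, `int3`, `hlt`). It treats `;` as the comment character and a trailing `:` as a label, and the label name in the sample is invented. Real assemblers also deal with operands, addresses, macros, and relocations, none of which are modelled here.

```python
# One-byte x86 opcodes without operands.
OPCODES = {"nop": 0x90, "ret": 0xC3, "int3": 0xCC, "hlt": 0xF4}
MNEMONICS = {byte: name for name, byte in OPCODES.items()}


def assemble(source):
    """Translate each mnemonic to its byte, discarding comments and labels."""
    out = bytearray()
    for line in source.splitlines():
        code = line.split(";", 1)[0].strip()   # drop the comment, if any
        if not code or code.endswith(":"):     # blank line or label: emits nothing
            continue
        out.append(OPCODES[code])
    return bytes(out)


def disassemble(binary):
    """Map each byte back to a mnemonic; names and comments cannot be recovered."""
    return "\n".join(MNEMONICS[byte] for byte in binary)


SOURCE = """\
; stop in the debugger, then return to the caller
break_then_return:
    nop         ; padding
    int3        ; breakpoint
    ret
"""

binary = assemble(SOURCE)
print(binary.hex())         # 90ccc3
print(disassemble(binary))  # nop / int3 / ret, one per line
```

The instructions come back exactly, but the label `break_then_return` and every comment are gone, which is the information the source form preserves and the binary does not.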
I think our goal should be to develop even better programming languages that can express high-level concepts succinctly, and that provide even larger sets of compile-time guarantees (e.g. static typing).
I don't see the benefit in writing anything that doesn't need to be super-optimized in assembly language. Compilers are so good at optimizing right now that assembly is only justified for very small pieces of code that, for example, need to use special new CPU instructions (e.g. in video decoding), or for particular segments of code that a profiler shows the compiler can't optimize well enough.
Assembly language is hard to write, read, understand, and reason about. It shares the same memory model as C, with all its flaws. There is nothing impressive about this. It's like someone saying they built a house with their bare hands without using any machines or technology -- machines and modern technology could have helped them build a better-quality house faster.
Actually Assembly usually does what I mean, instead of the UB rug-pulling tricks of C optimizers.
Even outside of UB and just in terms of implicit actions, I've also been bitten by destructors/finalizers. They can especially cause mysterious problems if they invoke any synchronization magic (join a thread, exchange state through a pipe/FIFO, acquire locks, etc.).
Your hardware might still pull the rug from under you with caches, prefetching, NUMA, out-of-order execution, speculative execution and other stuff.
True, but those things are usually easier to track down than having them and C's UB on top.
These will only make the program run slower; they won't make it produce wrong results, create heisenbugs, or even prevent it from running.
> assembly is only justified for very small pieces of code
It's also justified if you're doing a side project mostly for fun and you like writing assembly.
If you were a manager or executive and you found out your employee had implemented a new project in assembly language, it really might be reasonable for you to be dismayed at that choice. But that's not the situation here. At all.
Hand-written assembly is still the king when it comes to maximising CPU performance. You can see that most clearly on older machines, but the potential is still there on newer machines too. I'd suggest that there's a chance of a small resurgence in assembly after Moore's Law has truly bitten the dust.
Also, despite being very verbose, it's actually very easy to reason about. That's because you've removed all the layers of abstraction found with higher-level languages that are involved in translating source code to machine code. In assembly, your computer has to run the algorithms in the way you designed them, as it's necessary to spell out in exact detail how the algorithms should run. There are still layers of abstraction available through subroutines and macros, as well as comments for describing what the code does. The issue with assembly isn't with clarity per se, it's with portability and verbosity. Every part of the program has to be written out in painstaking detail, and the work involved in porting to a different architecture is much more than with languages like C.
The ability to trade off productivity for more control over the program is valuable in certain situations. The only problem I see with programming directly in assembly is that it's not compatible with different architectures. Perhaps someone should make a proper high-level assembler similar to LLVM IR but with a stable instruction set that runs on several architectures.