The project's pre-pre-alpha will be out very soon (in a week or less) and will already give a nice taste of its design and speed. The primary focus thus far has been on the major challenges (toolchain, process creation and initialization, signals, the glue layer between the posix system call layer and libc, etc.). This means that while you will be able to test functions such as fork(), execve(), mmap(), or fopen() with a utf-8 path, some of the easier-to-implement system calls will "surprisingly" still be missing.
For one reason or another, fork() has become something of a fetish in many discussions of posix on windows. The interface surely has its place, and figuring out how to implement it efficiently took a huge amount of effort, yet the vast majority of applications do not truly need it. The fact of the matter is that even on linux, where fork(2) is natively supported, the sequence fork+execve is more costly than clone+execve (where clone's flags include CLONE_VM but not CLONE_THREAD). For additional reference, see for instance the implementation of posix_spawn in musl libc (http://git.musl-libc.org/cgit/musl/tree/src/process/posix_sp...).
If high performance of the posix layer were not possible, the project would not exist. Among the factors that make high performance possible are 1) direct use of kernel interfaces (aka the Native API, with most of the runtime layer written as a user-space driver), 2) utf-8 as the primary supported multibyte encoding, treated as a foundational concept rather than an afterthought, and 3) a tls implementation that matches the speed of the native tls facility.
The entire fork/exec model is bonkers. The most egregious offense is that it leads to things like Linux's OOM killer. If you ever find yourself writing something called an "OOM killer" you need to stop and seriously reconsider your choices up until that point. Even on the older Unix systems when the fork/exec model was first introduced, it was bloody stupid, because memory managers were much less sophisticated at the time and the performance cost of fork/exec was massive.
No, the OOM killer is completely orthogonal to this and is a consequence of not doing correct commit accounting. With strict commit accounting turned on (vm.overcommit_memory=2), fork will correctly fail when there is not sufficient physical backing. But you're correct that fork+exec is a bad model.
The OOM killer exists because of memory overcommitment, which exists because of fork/exec. The justification for overcommitment is that a big-ass process might fork just to do an exec immediately afterwards. If that is the case, then it would be lame to fail the fork because there's not enough room for a second copy of the process. But if the child doesn't actually exec and starts modifying things, then suddenly there isn't as much memory as it thinks it has. This justified overcommitment in the first place, and it snowballed from there.
No, fork is only one path that can lead to overcommit. Allocation of new memory as COW references to a zero page, and COW writable MAP_PRIVATE mappings of files (such as the writable LOAD segments of any executable or library file) also lead to overcommit unless you do proper commit accounting. Any system that does not need to do detailed commit accounting to avoid overcommit is basically wasting the fact that it has virtual memory/MMU.
I never meant to suggest that fork is the only path that leads to overcommitment, but that fork/exec generally insists on overcommitment for reasonable use.
Thanks for sharing this, I wasn't aware of the relationship between the two.
Good luck. Cygwin has explored this space thoroughly. If there were a better option that preserved compatibility, the project would use it. Keep in mind that Cygwin already uses the native API extensively. Read the code.
As for fork: believe me, I'd be happy if, as a side effect of this work, more applications used posix_spawn. But Cygwin can also support an efficient posix_spawn implementation: https://github.com/dcolascione/cygspawn
A lot of the complexity actually comes from mapping POSIX filesystem semantics to Windows ones. stat(2) in Cygwin is ungodly expensive for this reason. Your layer won't be able to avoid this work without providing fewer features.
I have a sequence of tweets here summarizing basic measurements I did on musl's posix_spawn versus fork+exec:
https://twitter.com/RichFelker/status/602313644026761216
There are still plenty of applications that use fork semantically, to keep the same process image in the child, which benefit somewhat from a fast fork. But most places where fork affects performance now are things that are already pessimized by using fork+exec instead of posix_spawn: the shell, make, cgi, etc. (GCC still uses vfork, but GNU make recently switched from vfork to fork because of vfork-related bugs.) Regardless of how fast or slow fork is on midipix (but I expect it to be fairly fast, much faster than cygwin), they would benefit a lot more from just switching to using posix_spawn.
To clarify, the numbers in my tweets are measured on Linux, comparing musl libc's posix_spawn (with CLONE_VM) to plain fork+exec (which should be independent of libc). I just posted the test program on our mailing list: http://www.openwall.com/lists/musl/2015/06/04/1
A quick update regarding toolchain work and release time: we have now taught binutils and gcc to do the right thing for the target with respect to weak symbols and GOT entries. That has been quite a ride, but happily, dynamic linking now works exactly as desired (that is, without depending on import/export annotations in the libc headers). Focus has now shifted back to the runtime layer, and we are working at full speed towards the pre-pre-alpha release.
Which aspects of midipix, if any, will be under the GPL? If you plan to release any of the runtime components under the GPL, will you also offer a commercial license?
The overall approach is to license cross-platform tools under the MIT license, and the Windows-specific runtime components under GPLv2 and GPLv3 that could be supplemented with a commercial license.
> For one or the other reason, fork() has become in many discussion of posix on windows something of a fetish.
how about this one?
fopen + mmap + unlink
AFAIK windows APIs forbid deleting mmapped files.
Not if they are opened with FILE_SHARE_DELETE.
right, but it's still not the same as unlink it seems: http://blogs.msdn.com/b/oldnewthing/archive/2004/06/07/15004...
With FILE_SHARE_DELETE there shouldn't be any particular problem, but I certainly want to test this with the posix flags you had in mind. Do you happen to have a minimal example that you deem problematic on Windows?
has been a while since I tried it. it was basically
open rw, set length, mmap from 2 processes to have shared memory, unlink & close.
expected behavior is to have a chunk of shared memory without the tempfile in the directory tree
If you delete the backing file of (or perhaps even close its handle -- I don't remember) a memory-mapped file, you get VERY bizarre behavior in Windows. I don't remember what it was, but I remember I did it and the behavior I got was quite nonsensical.