Also relevant, regarding native Linux performance:
https://mobile.twitter.com/RichFelker/status/602313979894038...
"Rich Felker, May 24, 2015: Some interesting preliminary timing of @musllibc 's posix_spawn vs fork+exec shows it ~25x faster for large parent processes. (~360us vs 9ms). #glibc has a vfork-based posix_spawn but it's only usable for trivial cases; others use fork. @musllibc posix_spawn always uses CLONE_VM. This also means @musllibc posix_spawn will fill the fork gap on NOMMU systems cleanly/safely (unlike vfork) once we get NOMMU working."
Also evilotto's post here:
https://news.ycombinator.com/item?id=19622477
"a 100mb process generally takes >2ms to fork, while a 1mb or less process takes 70us"
glibc switched its posix_spawn to a clone(CLONE_VM | CLONE_VFORK)-based implementation in 2016 (glibc 2.24, if I remember the version right), so it should be much more competitive now.
See also https://news.ycombinator.com/item?id=18073906 .
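For anyone who wants to try the posix_spawn path discussed above instead of fork+exec, here is a minimal sketch in C. The helper name spawn_and_wait is made up for illustration; posix_spawnp and waitpid are the standard POSIX calls. It passes NULL for the file actions and attributes, so it only covers the simplest case and says nothing about which code path a particular libc takes internally.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from any library): run argv[0], looked up via
 * PATH, wait for it, and return its exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp reports failure through its return value (an errno-style
     * number). NULL file actions and attributes mean "use the defaults". */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    /* Reap the child so it does not linger as a zombie. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* Treat death by signal as a failure in this sketch. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argv such as {"true", NULL} should return 0 on a typical system. To compare timings against fork+exec along the lines of the numbers quoted above, you would want to call it from a parent with a large resident heap, since that is where the page-table copying cost of fork shows up.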