Or you could use Fil-C++ and get memory safety without any changes. Unlike this proposal, Fil-C++ can run real C and C++ programs safely today (including interesting stuff like CPython, OpenSSH, and SQLite).
I don’t buy that adding an extension that is safe if you use it will move the needle. But making the language safe wholesale is practical. We should do that instead.
Hard pass on the garbage collector. We don't need that, and the minimal GC support that was in the standard has been removed from C++23.
> Hard pass on the garbage collector.
Why?
> We don't need that
You do if you want comprehensive use-after-free protection.
> and the minimal GC support that was in the standard has been removed from C++23.
Not related to what I'm doing. The support you cite is for users of the language to write garbage collectors "on top" of the language. Fil-C++'s garbage collector is hidden in the implementation's guts, "below" the language. Fil-C++ is compliant with C++ whether C++ says that GC is allowed or not.
They solve the use-after-free issue by keeping pointed-to objects alive, not by helping you think better about object lifetimes in your code. That means some objects will live longer than you initially thought they would, and you can even introduce circular references. On top of that, they also introduce random, unpredictable slowdowns in your application to run their algorithms.
I'm not yet sold on Rust, but exploring alternatives for achieving memory safety without needing to bring in a GC is commendable.
> They solve the use-after-free issue by keeping pointed objects alive
That's not at all what Fil-C's garbage collector does.
If you free() an object in Fil-C, then the capability is marked free and the next GC cycle will:
- Delete the object.
- Repoint all capabilities that referred to that object to point to the free singleton capability instead.
This ensures that:
- Freeing an object really does free it, regardless of whether the GC sees it as reachable.
- Using an object after freeing it is guaranteed to trap with a Fil-C panic, and that panic is guaranteed to report that the object has been freed so you know why you're trapping.
Also, as a bonus, if you don't ever free your objects, then the Fil-C GC will delete them for you once they become unreachable. So, you can write Java-style code in C++, if you want.
> That means some objects will live for longer than you initially thought they would, and potentially even introduce circular references.
No idea what you mean there. Maybe you're thinking of reference counting, which isn't a form of garbage collection. (Reference counting is different precisely because it cannot handle cycles.)
> unpredictable slowdowns in your application to run their algorithms.
Fil-C's garbage collector is 100% concurrent. It'll never pause your shit.
Fil-C sounds very similar to Google's MiraclePtr.
However, Safe C++ (Circle) and Rust do much more than that. They are not limited to pointers on the heap, and the borrowing rules work for all references including the stack. They also work for references that need to be logically short even when the data is not freed, e.g. internal references to data protected by a mutex don't outlive unlocking of the mutex. And all of that is at zero runtime cost, and by guaranteeing the code correctly doesn't create dangling references in the first place, not merely by softening the blow of run-time failures of buggy code.
Fil-C is nothing like MiraclePtr. Fil-C gives you comprehensive memory safety.
Yes, it handles references to the stack. Misuse traps or leads to other safe outcomes.
Fil-C makes it so races have memory safe outcomes (like Java).
Circle and Rust are strictly less safe than Fil-C, since both have unsafe escape hatches. Fil-C doesn't even have an unsafe escape hatch.
Oh, I remember this project now. I see it still advertises 3x-10x overhead. To me this takes it out of being a contender in the systems programming space.
This can't be dismissed as a mere quality-of-implementation detail. C and C++ are used because they don't have such overheads, so it takes away the primary reason to use these languages. When non-negligible overhead is not a problem, there are plenty of nicer languages to choose from for writing new code.
This leaves Fil-C in a role of a sandbox for legacy code, when there's some unsafe C code that won't be rewritten, but still needs to be contained at any cost. But here you need to compete with WASM and RLBox which have lower-overhead implementations.
Fil-C was 200x slower when I started and the latest overheads are lower than 2x in a lot of cases. It’s getting faster every month, though I don’t always update the docs to say the latest numbers (because I’m too busy actually making it faster).
I think the reason why folks end up using C is often because they have a gigantic amount of C code and for those folks, Fil-C could easily be good enough as is.
But dismissing it as a contender because the perf isn’t there today even as it’s getting steadily faster (on a super aggressive trajectory) is a bit unfair, I think.
The success of this project is going to be very non-linear with speed, so it really hangs on where your speed improvements will plateau.
If you get below 2x, you can compete with WASM and ASAN. If you get it down to 1.1x-1.2x, you can compete with RLBox. If you get down to 1.05 you can call it software-emulated CHERI and kill the whole hardware line before it even comes out.
If you get it down to 1.01x, Rust will copy you, and then beat you by skipping checks on borrowed references ;)
> If you get below 2x, you can compete with WASM and ASAN.
I'm at 1.5x for a lot of workloads already. I will be solidly below 2x for sure, once I implement all of the optimizations that are in my near-term plan.
Wasm and asan don't really give you memory safety. Wasm is a sandbox, but the code running within the sandbox can have memory safety bugs and those bugs can be exploited. Hackers are good at data-only attacks. Say you run a database in wasm - then a data-only attack will give the bad guy access to parts of the database they shouldn't have access to. Fil-C stops all that because Fil-C makes C memory safe rather than just sandboxing it.
Asan also isn't really memory safe; it just catches enough memory safety bugs to be a useful tool for finding them. But it can be sidestepped. Fil-C can't be sidestepped.
So, even if Fil-C was slower than wasm or asan, it would still be useful.
> If you get it down to 1.1x-1.2x, you can compete with RLBox.
RLBox is like wasm. It's a sandbox, not true memory safety. So, it's not a direct competitor.
That said, I think I'll probably land at about 1.2x overhead eventually.
> If you get down to 1.05 you can call it software-emulated CHERI and kill the whole hardware line before it even comes out.
I can already kill CHERI because Fil-C running on my X86-64 box is faster than anything running on any CHERI HW.
No, seriously.
The biggest problem with CHERI is that you need high volume production to achieve good perf in silicon, so a SW solution like Fil-C that is theoretically slower than a HW solution is going to be faster than that HW solution in practice, provided it runs on high volume silicon (Fil-C does).
I think I'm already there, honestly. If you wanted to run CHERI today, you'd be doing it in QEMU or some dorky and slow IoT thing. Either way, slower than what Fil-C gives you right now.
> If you get it down to 1.01x, Rust will copy you, and then beat you by skipping checks on borrowed references ;)
Rust is a different animal. Rust is highly restrictive due to its ownership model, in a way that Fil-C isn't. Plus, there's billions of lines of C code that ain't going to be rewritten in Rust, possibly ever. So, I don't have to be as fast as Rust.
I don't think there is going to be much copying going on between Rust and Fil-C, because the way that the two languages achieve memory safety is so different. Rust is using static techniques to prevent bad programs from compiling, while Fil-C allows pretty much any C code to compile and uses static techniques to emit only the minimal set of checks.
1.05x-1.7x is where C# places vs C. Except you also have an actual type system, a rich set of tools to diagnose performance and memory issues, and the ability to mix and match memory management styles. It is rudimentary when compared to borrow checking and deterministic drop in Rust, but in 2024 almost every language with low-level capabilities is an upgrade over C if it can be used in a particular environment.
Garbage collectors are directly in conflict with the requirements of many high-performance software architectures. Some important types of optimization become ineffective. Also, GC overhead remains unacceptably high for many applications; performance-sensitive applications worry about context-switching overhead, and a GC is orders of magnitude worse than that.
C++ is usually used when people care about performance, and a GC interferes with that.
Fil-C uses a concurrent garbage collector that never pauses the program. There is no pause in Fil-C that looks anything like the cost of a context switch. It’s a design that is suitable for real time systems.
The GC is similar to what I used here, just much more optimized: http://www.filpizlo.com/papers/pizlo-eurosys2010-fijivm.pdf
The GC must interrupt the software because otherwise it would have no resources with which to execute. If I am running a standard thread-per-core architecture under full load with tightly scheduled CPU cache locality for maximum throughput, where do you hide the GC and how do you keep it from polluting the CPU cache or creating unnecessary cache coherency traffic? People have made similar claims about Java GCs for years but performance has never been particularly close, which is generally in agreement with what you would expect from theory. A GC will always lack context that the software has.
Malloc has its own overheads, and they are often worse than those created by GC.
Something to consider is that Fil-C permits you to do all of the things you would normally do, as a C programmer, to reduce or eliminate allocation - like having structs embedded in structs and avoidance of allocation in core algorithms.
This makes it quite different from Java or JS where the language forces you to allocate to basically do anything. I think folks conflate “GC overhead” with the overhead of languages that happen to use GC.
> I think folks conflate “GC overhead” with the overhead of languages that happen to use GC.
That is fair to a point. Some GC languages (like Java) are significantly more inefficient than they need to be regardless of the GC.
Nonetheless, for performance C++ code you don’t see much malloc anywhere, directly or indirectly, so the efficiency doesn’t matter that much. That’s pretty idiomatic. I think this is the real point. As a C++ programmer, if you are not serious about eliminating dynamic allocation then you aren’t serious about performance. Since C++ is used for performance, you don’t see much dynamic allocation that a GC could theoretically help with. Most objects in the hot path have explicitly and carefully managed lifetimes for performance.
I use GC languages for quite a few things, but it is always for things where performance doesn’t matter. When performance matters, I’ve always been able to beat a GC for performance, and I’ve done my fair share of performance engineering in GC languages.
> Nonetheless, for performance C++ code you don’t see much malloc anywhere, directly or indirectly, so the efficiency doesn’t matter that much. That’s pretty idiomatic.
If you compile that code with Fil-C++, then you won't see much (or any) GC overhead. The GC kicks in for three things:
- malloc (this is the main client of the GC)
- stack allocations that aren't escape analyzed (these are rare, typically small, and generate minuscule GC load - it's common for code I've tested to have zero of these)
- Fil-C runtime internal allocations (these are super rare - for example if you call sigaction, pthread_create, setjmp, or dlopen then there's some GC object allocated behind the scenes - but you aren't going to be calling those a lot, and if you are then you've got bigger problems than GC).
I understand your perspective about GC languages. Most GC languages also come with other baggage, which makes it very difficult to avoid the GC. Most C++ code that avoids malloc ends up doing a bunch of `new`s when converted to Java. But C++ code that avoids malloc will avoid the GC if compiled with Fil-C++.
You've lost me here a little, sorry. If you have little to no dynamic allocation, meaning all of your memory is automatic stack memory, then memory management wouldn't be much of an issue to begin with. But the most common pattern to me seems to be memory that is allocated upfront, then treated as if it were automatic in a hot loop (not reallocated or moved, etc.), and then deallocated after the hot part is over. How does GC interfere with these use-cases? I'd imagine it would only kick in afterwards, when you'd want to deallocate anyway, but do it automatically without you messing up.
At this point you could also use C# for a vastly better user experience, built on top of a compiler that is actually GC-aware and does not have a performance penalty, aside from being just weaker than GCC (although that is improving with each release).
C# isn’t C though. The point of Fil-C is you can compile real C programs (like CPython or SQLite and many others) with it.
Fil-C’s compiler is GC aware.
PTC and Aicas have a good customer base that cares about performance, while selling real-time Java implementations, implementations that happen to be used in military scenarios where lack of performance costs lives on the wrong side.
Because if you can afford GC you're not using C/C++. We need memory safe systems stuff. Higher-level memory safety has been solved for decades.
I don’t buy that at all.
If I could use C++ with GC, I would - but I can’t because other than Fil-C++ no GC works soundly with the full C++ language, and those that work at all tend to be unreasonably slow and flaky due to conservatism-in-the-heap (Boehm, I’m looking at you). Reason: GC and memory safety are tightly coupled. GC makes memory safety easy, but GC also requires memory safety to be sound and performant.
So there isn’t anything else out there quite like Fil-C++. Only Fil-C++ gives you accurate high perf GC and the full C++ language.
Finally, “affording” GC isn’t really a thing. GC performs well when compared head to head against malloc. It’s a time-memory trade off (GC tends to use more memory but also tends to be faster). If you want to convince yourself, just try a simple test where you allocate objects in a loop. Even JS beats C++ at that benchmark, because GCs support higher allocation rates (just make sure to make both the C++ and the JS version complex enough to not trigger either compiler’s escape analysis, since then you’re not measuring the allocator).
"affording" GC is absolutely a thing. You're measuring the wrong thing. It's primarily about latency, not throughput, and GC can only go head-to-head on throughput.
Secondly you have places where you don't have dynamic memory at all which you're also conveniently ignoring.
Fil-C has a concurrent GC. It doesn’t stop your program, ever.
If your C code doesn’t dynamically allocate then it won’t create any GC load in Fil-C.
> Fil-C has a concurrent GC. It doesn’t stop your program, ever.
what about performance/throughput compared to when you allocate stuff on stack?
Unreal C++, C++/CLI, and V8 C++ do need one.
It should never have been there in the first place, because it ignored their requirements, and thus it was never adopted by them or anyone else.
From the github README:
> On the other hand, Fil-C is quite slow. It's ~10x slower than legacy C right now (ranging from 3x slower in the best case, xz decompress, to 20x in the worst case, CPython).
That performance loss is severe and makes the approach totally uninteresting for most serious use cases. Most applications written in C or C++ don't get to waste that many cycles.
Those are old perf numbers. It’s sub-2x most of the time now, and I’m working on optimizations to make it even faster.
Note that at the start of this year it was 200x slower. I land speed ups all the time but don’t always update the readme every time I land an optimization. Perf is the main focus of my work on Fil-C right now.
Might I suggest having the CI benchmark it and then update the readme?
If I had the time to set that up then yeah.
Right now I’m spending all my time actually implementing optimizations and measuring them locally. I did spend the time to give myself a good benchmark suite with a nice harness (I call it PizBench, it includes workloads from xzutils, simdutf, Python, OpenSSL, and others).
Wow, Fil-C++ looks very interesting! I wonder what % of programs make its pointer tracking fail due to stuffing things in the higher bits, doing integer conversions and so on. It reminds me of CHERI.
You can put stuff in the high and low bits of pointers in Fil-C so long as you clear them before access; otherwise the access itself traps.
Python does shit like that and it works fine in Fil-C, though I needed to make some minor adjustments, like a ~50KB patch to CPython.
sorry, what's Fil-C++?
His pet project, a garbage-collected variant of C++: https://2024.splashcon.org/details/splash-2024-rebase/3/Fil-...
I think it’s this: https://2024.splashcon.org/details/splash-2024-rebase/3/Fil-...
https://github.com/pizlonator/llvm-project-deluge/blob/delug...
Yes. More info:
https://www.youtube.com/watch?v=xR8y7swDkPU
https://www.youtube.com/watch?v=JRoX9_lXJFg