It would really be good if someone could provide an updated overview of all of the "GCs for Rust" created thus far -- for a while I tried to keep up with them, but there are just too many! When we wrote the Alloy paper, we took Manish's survey as a starting point, and covered as many GCs of different kinds as we could squeeze in [1]. But even then there was no way we could squeeze _everything_ in, and I've seen at least 2 or 3 new ones since (safe-gc is older than the Alloy paper, so I'm not including it in that count)!
[1] https://soft-dev.org/pubs/html/hughes_tratt__garbage_collect...
If anything, all those attempts prove that for many scenarios it is better to have automated resource management + (affine, linear, dependent, effects) types than the pure affine-types approach taken by Rust.
Hence all the ongoing efforts in Chapel, Swift, D, Linear Haskell, Ox, Idris2, Scala Capture Checking, Koka, and probably many others.
This is also noticeable among Rustaceans; otherwise, ergonomics for all the .clone() and related machinery wouldn't be part of the 2026 roadmap.
If you want automated resource management, automate it. I'm really not convinced that Koka-style effects-for-everything are a win relative to just passing objects that own resources through parameters so they have actual values and explicit lifetimes.
I certainly do, which is why, outside of some hobby coding to try out new language features as a means to understand where Rust stands, I don't have any special use case where I would pick Rust instead of one of those languages.
For me, its ideal use cases are places where any form of automated resource management is either not technically possible, or where it is a waste of time trying to change mindsets.
> I don't have any special use case where I would pick Rust instead of one of those languages.
Rust isn't a language for a special case, Rust is universal. Those other languages are for special cases, you're simply a special case developer.
Picking a different language for each special case is the primary problem Rust solves, it's the problem of being a jack of all trades and a master of none.
>> If anything, all those attempts prove that for many scenarios it is better to have automated resource management + (affine, linear, dependent, effects) types than the pure affine-types approach taken by Rust.
Can you write the proof down for us please? I'm curious, how do you prove that "for many scenarios" A+B is better than B+A?
The OP is about doing A in safe Rust, which isn't even an option for many of those other languages which are, at the very least, bootstrapped from unsafe code.
The proof is the amount of ongoing research by plenty of people who don't consider Rust, as it stands today, the ultimate answer in programming languages, including people like Niko Matsakis.
Or are you going to reply that his opinion has no value, and that he should abandon the language ergonomics proposals from the improvements roadmap?
> Or are you going to reply that his opinion has no value, and that he should abandon the language ergonomics proposals from the improvements roadmap?
I don't deny the existence of special cases or the value of looking for better ergonomics even at the expense of universality. However, at the moment Rust is the best option for practitioners who don't have enough time to juggle multiple languages.
> it is better having automated resource management
Rust's ownership system is automated resource management. What you're asking for is dynamic lifetime determination, which Rust provides via types that opt out of the hierarchical single-ownership paradigm.
Manual memory management requires the programmer to insert calls to free at specific points in the program. That's manual static lifetime determination. A traditional garbage collector uses runtime analysis to determine when it's safe to call free. That's automatic dynamic lifetime determination. What Rust does is automatic static lifetime determination. Designing your data structures such that they're acyclic is not what anyone means when they say "manual memory management".
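The distinction can be sketched in a few lines of Rust (the `Loud` type is invented for illustration): the compiler frees the scoped value at a point it determined at compile time, while `Rc`, one of the opt-out types mentioned above, defers the decision to runtime reference counts.

```rust
use std::rc::Rc;

// A type that announces when its memory is reclaimed.
struct Loud(&'static str);
impl Drop for Loud {
    fn drop(&mut self) {
        println!("freed {}", self.0);
    }
}

fn main() {
    {
        let _a = Loud("scoped"); // automatic *static* lifetime:
    } // the compiler inserted the free here, at compile time

    let x = Rc::new(Loud("shared")); // automatic *dynamic* lifetime:
    let y = Rc::clone(&x); // the refcount decides at runtime
    drop(x); // not freed yet: y still holds it
    assert_eq!(Rc::strong_count(&y), 1);
    drop(y); // freed here, a point determined at runtime
}
```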
This is a very neat proof of concept that the problem is solvable entirely in safe Rust. Obviously the ergonomics suffer, but it's an interesting data point, and one can hope that future Rust developments will mitigate some of the papercuts.
Given your nick: one of the points of the 2026 roadmap is exactly trying to make Rust more Swift-like, or like any other language with RC as the automatic resource-management mechanism for that matter, removing the .clone() pain.
Or the experiments Niko Matsakis is doing with Dada.
Frankly, it always feels a bit wrong to me to get around Rust's strictness for pointers by just replacing them with IDs/indexes/offsets into some collection. You can still have many of the same memory-safety issues as with pointers, except that now they become logic bugs that are undetectable by the compiler.
Either use unsafe and think about using raw pointers carefully, respecting the soundness rules, or truly redesign it using idiomatic Rust constructs. But don't hide complexity under the rug by using indexes instead of pointers, it's mostly the same thing.
I really enjoyed the write-up though, I learned a lot from it, not to discount that.
Logic bugs are not memory safety issues, they're logic bugs. They cannot result in undefined behavior for the program as a whole, at least in the absence of unsafe code.
As far as I understand, memory safety goes well beyond undefined behavior. It’s a loose term including common pointer manipulation, allocation/deallocation and data-race issues.
Other than UB, using indexes instead of pointers can be exactly as error-prone and leads to the same kind of unexpected runtime crashes, memory corruption and security issues. It’s a false sense of security.
If you inspect the OP implementation in detail, you’ll notice that it is still plenty open to use-after-free bugs and lots of places where it can panic in an equivalent way to a segmentation fault.
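As a sketch of that failure mode (the `user:` strings here are invented for illustration, not taken from the OP's code), safe Rust happily lets a stale index read whatever object now occupies the reused slot:

```rust
fn main() {
    let mut objects: Vec<String> = Vec::new();
    objects.push("user:alice".to_string());
    let alice_id = 0; // index used as a handle

    // "Free" alice by swap-removing her slot, then allocate a new object:
    objects.swap_remove(alice_id);
    objects.push("user:mallory".to_string());

    // The stale index silently reads the wrong object: a use-after-free
    // in spirit, but neither the compiler nor the runtime objects.
    assert_eq!(objects[alice_id], "user:mallory");
}
```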
> it can panic in an equivalent way to a segmentation fault.
So, like bounds-checked array access? That's also a logic error. Common pointer manipulation (in the presence of discriminated union types), allocation/deallocation errors and data races (when involving array indexes, pointers or tagged data, not just simple value types) can all lead to UB; bounds-checked array access on its own cannot.
Bugs in the use of indexes can corrupt memory only inside the corresponding array; they cannot corrupt arbitrary memory locations the way pointers can.
Such a corrupted array should never cause a program crash, unless there is a second bug where unexpected values in the corrupted array can cause crashes, or unless it is intended that any detected bug must abort the program, in which case it does not matter whether pointers or indexes were used.
Thus using indexes should always be safer than using pointers.
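That containment argument can be demonstrated directly (the values below are arbitrary): a wrong-but-in-range index damages only the array it belongs to, and an out-of-range one is stopped by the bounds check rather than reaching other memory.

```rust
fn main() {
    let mut accounts = vec![100u32, 250, 0];
    let other_state = 0xdead_beefu32; // lives outside the array

    // A buggy index that is still in range corrupts only this array:
    let bad_index = 2;
    accounts[bad_index] = 999;
    assert_eq!(accounts, vec![100, 250, 999]);

    // An out-of-range index is caught by the bounds check instead of
    // reading or writing arbitrary memory as a raw pointer could:
    assert_eq!(accounts.get(99), None);

    // State outside the array is untouchable through the index:
    assert_eq!(other_state, 0xdead_beef);
}
```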
In modern CPUs, indexed addressing is as fast as addressing through a register that holds a pointer, which removes the main reason why pointers were preferred in the seventies of the last century, by the time of the C language design, when in most minicomputers and microcomputers using pointers was faster than using indexes.
Here OP is implementing a full allocator and something close to a new abstract layer of virtual memory backed by vectors and addressed by offsets.
For some use-cases, you will probably keep all dynamically allocated objects in this memory.
At that point you have circumvented most of Rust’s protections and ergonomics. The user will effectively be working with raw pointers into this memory. The implementation itself also doesn’t have the proper checks because it is manipulating indexes instead of pointers.
The post itself notes around the end how it is fairly easy to end up with a dangling pointer, or one that points to a different object allocated to the same position after the old one was freed. It also has quite a few places where it can panic. And implementing its Trace trait wrong, or not using Roots as intended, can have complex consequences. And all that while feeling like this is the safest GC because it has no unsafe.
I don’t disagree with anything you said, and I also quite like the work from OP. But I hope you can see that this is not fully aligned with Rust’s philosophy.
If we compare 1-to-1 indices to unsafe code, indices always win (assuming they are viable wrt. perf etc.). This is very simple: all else being equal, a mistake in unsafe code can be UB, while a mistake in indices is at most a logic bug, and a logic bug is always preferred to UB.
Of course, it does mean we should compare them 1-to-1, which means we should treat the indices code like we would treat unsafe code. The most important conclusion is to encapsulate it in a data structure without business logic.
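One common way to do that encapsulation, sketched here from scratch rather than taken from any particular crate, is a generational arena: each handle carries a generation counter, so a stale handle to a freed-and-reused slot is rejected at lookup time instead of silently aliasing the new occupant.

```rust
// Handle into the arena: an index plus the generation it was issued at.
#[derive(Clone, Copy)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>, // None marks a free slot
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        // Reuse a free slot if one exists (a simple linear scan; a real
        // implementation would keep a freelist).
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.value.is_none() {
                slot.generation += 1; // invalidate old handles to this slot
                slot.value = Some(value);
                return Handle { index: i, generation: slot.generation };
            }
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation {
            return None; // stale handle
        }
        slot.value.take()
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation != h.generation {
            return None; // stale handle
        }
        slot.value.as_ref()
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses slot 0 with a bumped generation
    assert!(arena.get(a).is_none()); // stale handle detected, not misread
    assert_eq!(arena.get(b), Some(&"second"));
}
```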
Yeah but logic bugs cannot be found by tools. Whereas for UB at least we have address sanitizer and UB sanitizer and valgrind and many similar tools. You can recreate what these tools do in your API, and that’s extra work: a use-after-free bug may be easily detected by these tools, but when you are managing indices yourself, you may have to add assertions yourself to check for things that are logically deleted but not actually deleted.
> But don't hide complexity under the rug by using indexes instead of pointers, it's mostly the same thing.
I think the simple fact that pointers are not guaranteed aligned/valid, even if they are in range of a particular slice/collection etc., actually makes it very different.
It's quite unfortunate that traced references have to be wrapped in Gc<>, as this means that your types are bound to a GC allocator (right? Maybe I'm wrong!). It also means that trying out a GC is a pain, as you're stuck first doing the rewrite of your types. The necessity of Trace is another burden, but probably an unavoidable one.
In this example, you wrap Gc types in Option; I think that having the Gc type be nullable instead would be an interesting experiment. Having to introduce a lot of places that branch both puts more into the instruction cache and adds more places that the branch predictor has to track. Besides, you can always add in missing checks if you know that you have a sparse subgraph somewhere. Totally a potential micro-optimization, but it's a bit of fun :-).
I also like how this shows that freelists are a smidge less efficient in safe Rust compared to unsafe Rust. In this solution we have to encode the tag for the discriminated union; this is unnecessary! The unoccupied slots form their own separate list, and so you know that you will always have unoccupied slots as long as you trace along that path. I assume that will add up to the maximum alignment in bytes, as arrays need to be well-aligned? So, probably 8 extra bytes per element (you're storing stuff that probably references other stuff).
I think it should be possible to get rid of the Option tag without introducing any unsafe code by changing index in Gc from u32 to std::num::NonZero<u32>.
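That niche optimization is easy to check: `Option<NonZero<u32>>` reuses the forbidden all-zeroes bit pattern to encode `None`, so no separate tag byte is needed. The `GcPlain`/`GcNiche` names below are invented for the comparison:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// A Gc-style handle with a plain u32 index: Option needs a separate tag.
#[allow(dead_code)]
struct GcPlain { index: u32 }

// The same handle with NonZero<u32>: zero is a forbidden bit pattern,
// so Option<GcNiche> can use it to encode None (the "niche" optimization).
struct GcNiche { index: NonZeroU32 }

fn main() {
    assert_eq!(size_of::<Option<GcPlain>>(), 8); // tag + padding
    assert_eq!(size_of::<Option<GcNiche>>(), 4); // None is the zero pattern
    // The price: slot indices must start at 1, or be stored offset by one.
    let first = NonZeroU32::new(1).unwrap();
    assert_eq!(GcNiche { index: first }.index.get(), 1);
}
```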
I found this while looking for a solution for more easily removing some unsafe code from a library that does a lot of C FFI. I didn't end up going with it, though. For now I'm taking the approach of recording the valid pointers I return to the caller in a mapping, and then validating that pointers passed to my library functions are in that mapping. I also use that mapping to hold additional information that doesn't fit in the ABI of the structs the pointers refer to, which lets me do validation safely. So, e.g., I can store the range of some other valid member pointers as a normal safe Rust reference and then index into it with member pointers on the struct, completely avoiding unsafe code despite having this FFI boundary (obviously the FFI boundary itself is still unsafe, but I can take this ugly C struct with a bunch of raw pointers and handle it safely).
I really like rust because it does NOT have garbage collection. Can someone smarter than me help me understand the benefits of having GC in rust specifically? Does it enable things that are more difficult in "non GC" rust?
I've never really had the urge to use GC in Rust, but if I were to speculate, I'd say easier cyclic references would be one benefit. And depending on the specific GC implementation, you can probably get around many of Rust's ownership rules because Gc<T> pointers are usually `Copy`, so you can pass things around everywhere and not think about references/ownership as much.
Okay I could see that - like implementing linked lists would be easier yeah?
I just feel like not having GC is sort of a deliberate, core design choice for the language. Having strict ownership rules and being forced to think about references feels like a feature not a bug you know? Adding GC feels analogous to the "++" in C++ to me.
Not that I have anything against the efforts people are putting into it though - I'm genuinely curious about what it lets me do better/faster within rust.
I agree completely; I think choosing Rust and then adding a GC is a weird design choice. If I was in a situation where I really, truly needed GC for my memory management, I wouldn't use Rust.
GC is a similar but different set of strict ownership rules (and its own versions of being forced to think about reference invariants). There's an inherently interesting Venn Diagram overlap between Rust borrow mechanics and GC, they aren't entirely separate worlds. They are more like related worlds with slightly different trade-offs. (Similarly there's C# and .NET actively exploring GC-safe relatives of Rust's borrow mechanics in Memory<T>/Span<T> space right now, to great effect.)
In terms of practical, yeah a doubly linked list (or trees with bidirectional pointers to both parent and children, etc) is especially easier to implement in a GC environment than with Rust borrow checking alone. You can do it without a GC, but a GC can be a helpful intermediary.
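For the parent-and-children case specifically, the usual non-GC workaround in Rust is `Rc` for owning downward edges plus `Weak` for the back edges, as in this small sketch (the `Node` shape is made up for illustration):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A tree node with bidirectional pointers: strong edges downward,
// weak edges upward so the parent/child pair forms no ownership cycle.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let parent = Rc::new(Node {
        value: 1,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let child = Rc::new(Node {
        value: 2,
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(vec![]),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // Upward traversal works by upgrading the weak reference.
    assert_eq!(child.parent.borrow().upgrade().unwrap().value, 1);
}
```

A GC makes this kind of structure simpler because every edge can be a plain traced reference; here the programmer must decide up front which direction owns.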
Why do you like managing your own memory exactly? Modern GCs are quite good, so the domain in which you can beat their performance is narrow and will continue getting narrower as time goes on.
Many modern GC languages and runtimes have benefited greatly from improved escape analysis, which identifies memory allocations that have short, identifiable lifetimes, and removes those from the GC-managed heap entirely.
Of course, many GCs have indeed gotten better in their own right, but not using the GC at all remains the most potent optimization.
The software performance gap between modern GC code and non-GC code is still pretty large almost everywhere. GCs make entire classes of optimization effectively impossible. At the limit, GCs are an abstraction that is always going to make worse runtime choices than purpose-built code.
GCs are useful in the same way that Python is useful even though it is slow. Simplicity of use has intrinsic value but that is a tradeoff against other objectives.
> At the limit, GCs are an abstraction that is always going to make worse runtime choices than purpose-built code.
You can say the same about any abstraction, such as having a generic memory allocator, using a standard library, actually about any code reuse. For example, many programs use printf, but don’t need all its features, so hand-rolling a custom version may lead to smaller, faster code (things get complicated here on systems with shared libraries)
The question always is whether lost performance (execution time, memory usage) is worth faster development.
I agree that GC code can be and usually is slower. But as a counterpoint: allocation can be very slow. Bump allocation is fast. If you know your lifetimes and can free all of your bump-allocated memory in one go, that's pretty much the best you can do. If not, then you can still bump-allocate, but copy away the live stuff while discarding the rest. You keep bump allocation, you may even improve locality by eliminating the fragmentation in your bump allocation arena. On the other hand, you've written a garbage collector and thus inherit the disadvantages of GC.
My point is that you can start out without a GC and do a series of sane and effective optimizations that end you up with a GC. Just as you can start with a GC and optimize by moving more and more of your allocations to non-GC memory and end up without a GC. Which endpoint is faster depends on the workload.
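As a rough sketch of the first step in that progression (a `Vec`-backed arena; the names are invented here, not from any crate): allocation is just a push, and the whole arena is freed in one go when dropped.

```rust
// Minimal arena: allocations are O(1) appends, handles are indices,
// and all memory is released at once when the arena is dropped.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1 // index acts as the handle
    }
    fn get(&self, idx: usize) -> &T {
        &self.items[idx]
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc("hello");
    let b = arena.alloc("world");
    assert_eq!(*arena.get(a), "hello");
    assert_eq!(*arena.get(b), "world");
    // Freeing everything is a single drop; no per-object bookkeeping.
    drop(arena);
}
```

The next step the parent describes, copying live objects out before discarding the rest, is exactly where this stops being "just an arena" and becomes a copying collector.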
I guess I like the idea of having to think a little more deeply about how memory is used while I'm programming. I know that might sound a bit odd, but before I started learning and using rust, I remember being much less aware of certain things while programming, and then discovering much later that I did something very sub-optimally. So rust is kind of a forcing function in a way - it makes me write better code.
I have also worked on projects (web apps) where GC pauses became annoying to end-users. I wouldn't say this is the fault of GC per se, but writing code without a GC seems to prevent some performance problems for me, and can also make performance problems stand out sooner in the dev cycle.
To give one trivial example, if you want to write an interpreter for a garbage-collected language in Rust, you need some kind of garbage-collection facility. You could in theory do it C-style by just using raw mostly-untyped pointers everywhere, but this defeats most of the benefit of Rust over C, so you might instead prefer to encapsulate the GC into its own library, so that you can use regular Rust idioms in all the parts of the code that aren't specifically about GC.
More generally, there are various situations where garbage collection is effectively a requirement: namely, when you've got a mutable graph where anything can point to anything else, where some nodes are root nodes and others are not, and so it's possible for cycles of unreachable nodes to occur and you don't want this to result in memory leaks.
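A minimal example of such a cycle, using plain `Rc` (reference counting, not tracing; the `Node` type is invented for illustration), shows why: once two unreachable nodes point at each other, each keeps the other's count above zero forever.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A graph node whose outgoing edge can be rewired at runtime.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // cycle: a -> b -> a

    assert_eq!(Rc::strong_count(&a), 2); // our handle + b's edge
    drop(a);
    drop(b);
    // Both nodes are now unreachable, but each still holds the other's
    // count at 1, so neither is freed: a leak a tracing GC would reclaim.
}
```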
You could in theory imagine a language like Rust where garbage collection was integrated more deeply on a syntactic level, such that you might sometimes use it for convenience and not just where it's a requirement (and Rust in fact worked like this for a while pre-1.0), but the way things currently work, integrating a garbage-collection library basically always makes your code more unwieldy and you only do it if you have to.
Oh oh, who would have thought. A memory-safe Rust at last.
With no unsafe allowed, even type safe. Unless you forget about their type bugs: https://github.com/Speykious/cve-rs.
So maybe they'll eliminate the type and concurrency unsafeties too, in the next decades or so.
The existence of a soundness bug in the typechecker doesn’t refute the value of soundness as a language design contract.
If anything it’s the opposite: issues demonstrated by cve-rs are _language bugs_ and are _fixable_ in principle. “Safe Rust should be memory-safe” is a well-defined, falsifiable contract that the compiler can be measured against. Meanwhile memory unsafety is a feature of the semantics of C++ and so it would be absurd to file a bug against gcc complaining that it compiled your faulty code.
The language design contract is unsafe by default. In memory, types and concurrency. What are you talking about? There are unsafe blocks all over the stdlib. And concurrency safety would need to get rid of their blocking IO, which they haven't even acknowledged.
Sure, but then they shout all over how safe they are. They got rid of the safeties pretty late, when they ripped out their GC, but kept their false promises all over.
No, that's common knowledge. I fixed concurrency safety by forbidding blocking IO. Others also. Maybe there are other ways, but never heard of other ways.
Huh? It doesn't follow that forbidding blocking IO is either necessary or sufficient for concurrency safety, at least under any definition of "safety" I can imagine. What do you mean? You mean async-not-blocking-event-loop stuff? That's not the only way to do more than one IO at a time.
> They got rid of the safeties pretty late, when they ripped out their GC, but kept their false promises all over.
This seems like a non-sequitur to me? The presence/absence of a GC is not dispositive with respect to determining "safety", especially when the GC itself involves unsafe code.
Ah. I believe pretty much every safe language on the planet constantly has bugs in the implementation that can be exploited to cause unsafety. Sometimes they even get CVEs, e.g. in JavaScript VMs.
I don't do Javascript, but any self-respecting language which calls itself safe is actually safe. I worked for decades in actually memory and type safe languages, and never ever heard of a memory or type safety bug.
Just not the cheaters: Rust, Java (until recently), and of course Javascript with its unsafe implementations.
Memory safety bug in a proper lisp? Unheard of, unless you break the GC or do wrong FFI calls.
You've made it clear from this thread that you have no idea what you're talking about. Please do not waste our time by commenting on this topic further.
I think so, assuming I'm thinking of the same thing you are, but I think that's somewhat besides the point. What I'm trying to say is twofold:
- The presence of a GC doesn't guarantee memory safety since there are sources of memory unsafety that GCs don't/can't cover (i.e., escape hatches and/or FFI), not to mention the possibility of bugs in the GC implementation itself.
- The absence of a GC doesn't preclude memory safety since you can "just" refuse to compile anything which you can't prove to be memory-safe modulo escape hatches and/or axioms and/or FFI (and bugs, unfortunately). Formal verification toolchains for C (Frama-C, seL4's setup, etc.) and Ada/SPARK's stricter modes are good examples of this.
In the case of Rust:
- `unsafe` blocks (or at least their precursors) were added in 2011 [0].
- Rust's reference-counting GC was removed in 2014 [1].
That's why I think "ripped out their GC" is a bit of a non-sequitur for "got rid of the safeties". Rust wasn't entirely safe before the GC was removed because `unsafe` already existed. And even after the GC was removed, the entire point of Rust's infamous strictness is to reject programs for which safety can't be proved (modulo the same sources that existed before the GC was removed), so the removal of GC does not necessarily imply losing memory safety either.
> The language design contract is unsafe by default
False. The language design is safe by default, something that you can confirm super easily by just doing the Rust tutorials and comparing with C or C++.
Read the repo well. cve-rs implements the following bugs in safe Rust:

- Use after free
- Buffer overflow
- Segmentation fault

That does NOT REFUTE IT.
> There are unsafe blocks all over the stdlib
Unsafe blocks are not the same as unsafe code. They are marked areas that are required in order to escape the automated checks, and there you are at the level of a C/C++ programmer (where, in those languages, ALL THE CODE IS MARKED UNSAFE).
If you complain about that, it is the same as complaining about ALL THE CODE written in C/C++.
---
One important thing to understand about Rust: Rust is a systems language and SHOULD be able to implement everything, including buffer overflows, use after free, segmentation faults and such. You should be able to implement a terrible OS, malware, faulty drivers, etc., minimally because that is required to test safe programs!
I don’t know what your point is. Unsafe blocks in the stdlib isn’t a gotcha. It’s the whole point of unsafe: you provide a validated and ideally proveably correct implementation once, with a safe wrapper. It’s how everything is implemented under the hood.
It’s like double entry accounting when you only have one pen and one writing hand. The system is broken if you ever write down only half of a transaction in one ledger: you always need to record both the flow in and flow out on both ledgers. Writing either one of them is an unsafe operation. But you can only write one thing at a time. So write them in pairs in an unsafe {} block, and reuse that block safely.
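A tiny example of the pattern (not from the article; `first_byte` is an invented name): the unsafe operation is confined to one block whose precondition is established, once, by the safe wrapper, so callers can't misuse it.

```rust
/// Safe wrapper around an unchecked read: the invariant the unsafe
/// block relies on is established here and nowhere else.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: the emptiness check above guarantees index 0 is in
    // bounds, so skipping the bounds check cannot read out of the slice.
    Some(unsafe { *bytes.get_unchecked(0) })
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None); // no way to reach the unsafe read
}
```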
So you are fine with calling a language "safe" when it has unsafe blocks, which the compiler skips checking? You have to check those manually, and then you are back in C++ land. That's hilarious. You can call it somewhat safe, or mostly safe, but never safe.
Alternatively, you can have a fully safe language, and then to get certain things done you add fundamentally unsafe FFI[1]. Or you use IPC to a process written in an unsafe language. Again, you're "back in C++ land".
It seems like your complaint is that Rust is referred to as a safe language. Which is fine; it's more correct to use the phrase "in safe Rust" rather than assuming that "in Rust" fully implies "safe". That is true, but that's a crack in a sidewalk compared to the chasm of difference between Rust and C++. Why obsess over that crack?
Should we all refer to "Python without FFI or any extensions written in C or another unsafe language" instead of "Python", to avoid asserting that Python-as-it-is-used is a safe language?
[1] Assuming it's FFI to an unsafe language, and that's the main purpose of FFI.
The perfectly safe platonic ideal you are implying cannot exist. Rust is a safe language because it bounds the unsafety to minimal, clearly demarcated areas where it can be reviewed and proven (outside the bounds of the type system) to hold.
Real machines and reality aren't built out of safe primitives. Safe constructs have to be built out of unsafe components. That's just how computers work.
Sure, that's why safe languages forbid users to use unsafe things.
"C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user."
Ellis and Stroustrup, The Annotated C++ Reference Manual.
If the repo is what I think it is, the underlying bug is over a decade old [0]. From what I understand the devs have known how to solve it for pretty much this entire time, but their desired solution is blocked on some rather large refactorings elsewhere in the compiler.
> are there flaws in Rust that aren’t edge cases that would make it not memory safe?
cve-rs is probably the most well-known one, but it's definitely not the only one [1]. That being said, I think there's a not-unreasonable argument to be made that cve-rs counts as an edge case since (as far as I know) there's no publicly-known examples of code that has "organically" run into this.
On a more practical note, I think if you're worried about memory safety in Rust code, type-system holes are vanishingly unlikely to be the cause compared to more "normal" things like mistakes in and around `unsafe` blocks.
> The OP is about doing A in safe Rust, which isn't even an option for many of those other languages which are, at the very least, bootstrapped from unsafe code.
When every problem looks like a nail...
> at the moment Rust is the best option for practitioners who don't have enough time to juggle multiple languages.
Practitioners who don't have enough time to juggle multiple languages are in the wrong business.
> Rust's ownership system is automated resource management.
Nope, because it has plenty of manual hand holding to keep the compiler happy.
> What Rust does is automatic static lifetime determination.
Rust requires the programmer to manually design data structures and code algorithms in a way that doesn't trigger borrow-checker compile errors.
There is nothing automatic about it.
Write Rust code, compile error if done incorrectly, manually fix the data structure or algorithm root cause, loop.
> Given your nick
Weirdly, I picked this screen name about a decade before Swift the language (and several years before Swift the singer) debuted.
Too bad HN nicknames cannot be put up for sale :)
Ohhh.... you're a logistics nerd. Cool! :P
Frankly, it always feels a bit wrong to me to get around Rust's strictness for pointers by just replacing them with IDs/indexes/offsets into some collection. You can still have many of the same memory-safety issues as with pointers, except that now they become logic bugs that are undetectable by the compiler.
Either use unsafe and think about using raw pointers carefully, respecting the soundness rules, or truly redesign it using idiomatic Rust constructs. But don't hide complexity under the rug by using indexes instead of pointers, it's mostly the same thing.
I really enjoyed the write-up though, I learned a lot from it, not to discount that.
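To illustrate what I mean, here's a toy arena (my own sketch, not the OP's code) where a freed slot gets reused and a stale index silently reads the new occupant. No UB, but it's the same bug shape as a use-after-free, and the compiler can't see it:

```rust
// A toy slab with a free list. A stale index ("dangling handle")
// silently observes whatever value was allocated into the reused slot.
struct Arena {
    slots: Vec<Option<&'static str>>,
    free: Vec<usize>,
}

impl Arena {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }
    fn alloc(&mut self, v: &'static str) -> usize {
        if let Some(i) = self.free.pop() {
            self.slots[i] = Some(v); // reuse a freed slot
            i
        } else {
            self.slots.push(Some(v));
            self.slots.len() - 1
        }
    }
    fn dealloc(&mut self, i: usize) {
        self.slots[i] = None;
        self.free.push(i);
    }
    fn get(&self, i: usize) -> Option<&'static str> {
        self.slots[i]
    }
}

fn stale_read() -> Option<&'static str> {
    let mut a = Arena::new();
    let h = a.alloc("old");
    a.dealloc(h);
    let _h2 = a.alloc("new"); // reuses the slot behind handle `h`
    a.get(h) // stale handle now sees "new": not UB, but wrong
}

fn main() {
    assert_eq!(stale_read(), Some("new"));
}
```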
Logic bugs are not memory safety issues, they're logic bugs. They cannot result in undefined behavior for the program as a whole, at least in the absence of unsafe code.
As far as I understand, memory safety goes well beyond undefined behavior. It’s a loose term including common pointer manipulation, allocation/deallocation and data-race issues.
Other than UB, using indexes instead of pointers can be exactly as error-prone and leads to the same kind of unexpected runtime crashes, memory corruption and security issues. It’s a false sense of security.
If you inspect the OP implementation in detail, you’ll notice that it is still plenty open to use-after-free bugs and lots of places where it can panic in an equivalent way to a segmentation fault.
It is quite unsafe while not being “unsafe”.
> it can panic in an equivalent way to a segmentation fault.
So, like bounds-checked array access? That's also a logic error. Common pointer manipulation (in the presence of discriminated union types), allocation/deallocation errors, and data races (when involving array indexes, pointers, or tagged data, not just simple value types) can all lead to UB; bounds-checked array access on its own cannot.
Bugs in the use of indexes can corrupt memory only inside the corresponding array, they cannot corrupt arbitrary memory locations, like pointers.
Such a corrupted array should never cause a program crash, unless there is a second bug where unexpected values in the corrupted array cause crashes, or unless the program is intended to abort on any detected bug, in which case it does not matter whether pointers or indexes were used.
Thus using indexes should always be safer than using pointers.
In modern CPUs, indexed addressing is as fast as addressing through a register that holds a pointer. This removes the main reason pointers were preferred in the seventies, at the time of the C language's design, when on most minicomputers and microcomputers using pointers was faster than using indexes.
Here OP is implementing a full allocator and something close to a new abstract layer of virtual memory backed by vectors and addressed by offsets.
For some use-cases, you will probably keep all dynamically allocated objects in this memory.
At that point you have circumvented most of Rust’s protections and ergonomics. The user will effectively be working with raw pointers into this memory. The implementation itself also doesn’t have the proper checks because it is manipulating indexes instead of pointers.
The post itself notes around the end how it is fairly easy to end up with a dangling pointer or one that points to a different object allocated to the same position after the old one was freed. It also has quite a few places where it can panic. And implementing its Trace trait wrong or not using Roots as intended can have complex consequences. And all that while feeling like this is the most safe GC because it has no unsafe.
I don’t disagree with anything you said, and I also quite like the work from OP. But I hope you can see that this is not fully aligned with Rust’s philosophy.
If we compare 1-to-1 indices to unsafe code, indices always win (assuming they are viable wrt. perf etc.). This is very simple: all else being equal, a mistake in unsafe code (can be) UB while a mistake in indices is at most a logic bug, and logic bug is always preferred to UB.
Of course, it does mean we should compare them 1-to-1, which means we should treat the indices code like we would treat unsafe code. The most important conclusion is to encapsulate it in a data structure without business logic.
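A common shape for that encapsulation (a sketch of my own; crates like `slotmap` implement the idea properly) is a generational index: each handle carries a generation counter, so a stale handle is detected instead of silently aliasing a reused slot — the index-world analogue of an assertion in unsafe code.

```rust
// Generational handles: `remove` bumps the slot's generation, so any
// old handle to that slot stops validating. (Simplified: no free list.)
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    gen: u32,
}

struct GenArena<T> {
    slots: Vec<(u32, Option<T>)>, // (generation, value)
}

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { slots: Vec::new() }
    }
    fn insert(&mut self, v: T) -> Handle {
        self.slots.push((0, Some(v)));
        Handle { index: self.slots.len() - 1, gen: 0 }
    }
    fn remove(&mut self, h: Handle) {
        if let Some(slot) = self.slots.get_mut(h.index) {
            if slot.0 == h.gen {
                slot.0 += 1; // old handles to this slot go stale
                slot.1 = None;
            }
        }
    }
    fn get(&self, h: Handle) -> Option<&T> {
        match self.slots.get(h.index) {
            Some((g, Some(v))) if *g == h.gen => Some(v),
            _ => None, // stale or removed handle: detected, not misread
        }
    }
}

fn main() {
    let mut a = GenArena::new();
    let h = a.insert("x");
    assert_eq!(a.get(h), Some(&"x"));
    a.remove(h);
    assert_eq!(a.get(h), None); // stale handle rejected
}
```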
Yeah but logic bugs cannot be found by tools. Whereas for UB at least we have address sanitizer and UB sanitizer and valgrind and many similar tools. You can recreate what these tools do in your API, and that’s extra work: a use-after-free bug may be easily detected by these tools, but when you are managing indices yourself, you may have to add assertions yourself to check for things that are logically deleted but not actually deleted.
> But don't hide complexity under the rug by using indexes instead of pointers, it's mostly the same thing.
I think the simple fact that pointers are not guaranteed aligned/valid, even when they are in range of a particular slice/collection, actually makes it very different.
It's quite unfortunate that traced references have to be wrapped in Gc<>, as this means that your types are bound to a GC allocator (right? Maybe I'm wrong!). It also means that trying out a GC is a pain, since you're stuck first rewriting your types. The necessity of Trace is another burden, but probably an unavoidable one.
In this example, you wrap Gc types in Option, I think that having the Gc type be nullable instead would be an interesting experiment. Having to introduce a lot of places that branch both puts more into the instruction cache, and adds more places that the branch predictor has to track. Besides, you can always add in missing checks if you know that you have a sparse subgraph somewhere. Total potential micro optimization, but it's a bit of fun :-).
I also like how this shows that freelists are a smidge less efficient in safe Rust compared to unsafe Rust. In this solution, we have to encode the tag for the discriminated union, which is unnecessary: the unoccupied slots form their own separate list, so you know you will always hit unoccupied slots as long as you trace along that path. I assume the tag will round up to the maximum alignment in bytes, since array elements need to be well-aligned? So, probably 8 extra bytes per element (you're storing stuff that probably references other stuff).
I think it should be possible to get rid of the Option tag without introducing any unsafe code by changing index in Gc from u32 to std::num::NonZero<u32>.
An index of 0 is valid if the collection has any content. This doesn’t solve an out-of-bounds issue with the index (i.e. an index that’s too high)
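Right — the usual trick is to store the index off by one, so slot 0 maps to NonZero value 1. That gives you the niche optimization (Option<Gc> is the same size as a bare u32) without losing any slots, though as noted it does nothing for out-of-bounds indexes. A minimal sketch (my own, not the OP's types):

```rust
// NonZeroU32-backed handle: the compiler uses the forbidden zero value
// as the `None` niche, so the Option tag costs no extra space.
use std::num::NonZeroU32;

#[derive(Clone, Copy)]
struct Gc {
    index: NonZeroU32,
}

impl Gc {
    fn new(slot: u32) -> Gc {
        // store slot + 1 so slot 0 remains representable
        Gc { index: NonZeroU32::new(slot + 1).unwrap() }
    }
    fn slot(self) -> u32 {
        self.index.get() - 1
    }
}

fn main() {
    // Option<Gc> is 4 bytes, same as u32: no tag byte needed.
    assert_eq!(std::mem::size_of::<Option<Gc>>(), std::mem::size_of::<u32>());
    assert_eq!(Gc::new(0).slot(), 0);
}
```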
I found this while looking for a solution for more easily removing some unsafe code from a library that does a lot of C FFI. I didn't end up going with it, though. For now I'm taking the approach of keeping a mapping of valid pointers that I return to the caller, and then validating that pointers passed to my library functions are in that valid mapping. The mapping also holds some additional information that doesn't fit in the ABI of the structs the pointers are for, which I use to do validation safely. So e.g. I can store the range of some other valid member pointers as a normal safe Rust reference and then index into it with member pointers on the struct, completely avoiding unsafe code despite having this FFI boundary (obviously the FFI boundary itself is still unsafe, but I can take this ugly C struct with a bunch of raw pointers and handle it safely).
I really like rust because it does NOT have garbage collection. Can someone smarter than me help me understand the benefits of having GC in rust specifically? Does it enable things that are more difficult in "non GC" rust?
I've never really had the urge to use GC in Rust, but if I were to speculate, I'd say easier cyclic references would be one benefit. And depending on the specific GC implementation, you can probably get around many of Rust's ownership rules because Gc<T> pointers are usually `Copy`, so you can pass things around everywhere and not think about references/ownership as much.
Okay I could see that - like implementing linked lists would be easier yeah?
I just feel like not having GC is sort of a deliberate, core design choice for the language. Having strict ownership rules and being forced to think about references feels like a feature not a bug you know? Adding GC feels analogous to the "++" in C++ to me.
Not that I have anything against the efforts people are putting into it though - I'm genuinely curious about what it lets me do better/faster within rust.
I agree completely; I think choosing Rust and then adding a GC is a weird design choice. If I was in a situation where I really, truly needed GC for my memory management, I wouldn't use Rust.
GC is a similar but different set of strict ownership rules (and its own versions of being forced to think about reference invariants). There's an inherently interesting Venn Diagram overlap between Rust borrow mechanics and GC, they aren't entirely separate worlds. They are more like related worlds with slightly different trade-offs. (Similarly there's C# and .NET actively exploring GC-safe relatives of Rust's borrow mechanics in Memory<T>/Span<T> space right now, to great effect.)
In terms of practical, yeah a doubly linked list (or trees with bidirectional pointers to both parent and children, etc) is especially easier to implement in a GC environment than with Rust borrow checking alone. You can do it without a GC, but a GC can be a helpful intermediary.
Why do you like managing your own memory exactly? Modern GCs are quite good, so the domain in which you can beat their performance is narrow and will continue getting narrower as time goes on.
> so the domain in which you can beat their performance
Not in my experience.
Many modern GC languages and runtimes have benefited greatly from improved escape analysis, which identifies memory allocations that have short, identifiable lifetimes, and removes those from the GC-managed heap entirely.
Of course, many GCs have indeed gotten better in their own right, but not using the GC at all remains the most potent optimization.
Rust doesn’t feel like managing your own memory.
The software performance gap between modern GC code and non-GC code is still pretty large almost everywhere. GCs make entire classes of optimization effectively impossible. At the limit, GCs are an abstraction that is always going to make worse runtime choices than purpose-built code.
GCs are useful in the same way that Python is useful even though it is slow. Simplicity of use has intrinsic value but that is a tradeoff against other objectives.
> At the limit, GCs are an abstraction that is always going to make worse runtime choices than purpose-built code.
You can say the same about any abstraction, such as having a generic memory allocator, using a standard library, actually about any code reuse. For example, many programs use printf, but don’t need all its features, so hand-rolling a custom version may lead to smaller, faster code (things get complicated here on systems with shared libraries)
The question always is whether lost performance (execution time, memory usage) is worth faster development.
You're making a new argument that's entirely different than the one they're responding to.
I agree that GC code can be and usually is slower. But as a counterpoint: allocation can be very slow. Bump allocation is fast. If you know your lifetimes and can free all of your bump-allocated memory in one go, that's pretty much the best you can do. If not, then you can still bump-allocate, but copy away the live stuff while discarding the rest. You keep bump allocation, you may even improve locality by eliminating the fragmentation in your bump allocation arena. On the other hand, you've written a garbage collector and thus inherit the disadvantages of GC.
My point is that you can start out without a GC and do a series of sane and effective optimizations that end you up with a GC. Just as you can start with a GC and optimize by moving more and more of your allocations to non-GC memory and end up without a GC. Which endpoint is faster depends on the workload.
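The first half of that progression fits in a few lines. Here's a minimal sketch of the bump-allocation idea over a Vec (my own illustration, with indexes standing in for pointers): allocation is a push, and freeing everything is a single clear. The copying-GC step is what you'd bolt on once you need to keep some of it alive.

```rust
// A minimal bump arena: allocation bumps the length, and the whole
// arena is freed in one operation. No per-object free, no tracing.
struct Bump<T> {
    data: Vec<T>,
}

impl<T> Bump<T> {
    fn new() -> Self {
        Bump { data: Vec::new() }
    }
    fn alloc(&mut self, v: T) -> usize {
        self.data.push(v); // amortized O(1): just bump the length
        self.data.len() - 1
    }
    fn reset(&mut self) {
        self.data.clear(); // free everything in one go
    }
}

fn main() {
    let mut b = Bump::new();
    let a = b.alloc(1);
    let c = b.alloc(2);
    assert_eq!((b.data[a], b.data[c]), (1, 2));
    b.reset();
    assert!(b.data.is_empty());
}
```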
I guess I like the idea of having to think a little more deeply about how memory is used while I'm programming. I know that might sound a bit odd, but before I started learning and using rust, I remember being much less aware of certain things while programming, and then discovering much later that I did something very sub-optimally. So rust is kind of a forcing function in a way - it makes me write better code.
I have also worked on projects (web apps) where GC pauses became annoying to end-users. I wouldn't say this is the fault of GC per se, but writing code without a GC seems to prevent some performance problems for me, and can also make performance problems stand out sooner in the dev cycle.
To give one trivial example, if you want to write an interpreter for a garbage-collected language in Rust, you need some kind of garbage-collection facility. You could in theory do it C-style by just using raw mostly-untyped pointers everywhere, but this defeats most of the benefit of Rust over C, so you might instead prefer to encapsulate the GC into its own library, so that you can use regular Rust idioms in all the parts of the code that aren't specifically about GC.
More generally, there are various situations where garbage collection is effectively a requirement: namely, when you've got a mutable graph where anything can point to anything else, where some nodes are root nodes and others are not, and so it's possible for cycles of unreachable nodes to occur and you don't want this to result in memory leaks.
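The cycle problem is easy to demonstrate with plain Rc (a self-contained sketch): two nodes pointing at each other keep each other's strong count above zero, so neither is ever freed once the outer handles drop. That's a leak, not UB, and it's exactly what a tracing GC exists to reclaim (with Rc alone you must break cycles by hand, e.g. with Weak).

```rust
// An Rc cycle: a -> b -> a. Both strong counts are 2 while the outer
// handles exist, so when they drop, each count only falls to 1 and
// the pair leaks.
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn make_cycle() -> (usize, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(None) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.next.borrow_mut() = Some(Rc::clone(&a)); // closes the cycle
    (Rc::strong_count(&a), Rc::strong_count(&b))
    // a and b drop here, but the cycle keeps both allocations alive
}

fn main() {
    assert_eq!(make_cycle(), (2, 2));
}
```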
You could in theory imagine a language like Rust where garbage collection was integrated more deeply on a syntactic level, such that you might sometimes use it for convenience and not just where it's a requirement (and Rust in fact worked like this for a while pre-1.0), but the way things currently work, integrating a garbage-collection library basically always makes your code more unwieldy and you only do it if you have to.
Certain algorithms are a lot easier. It's not needed though
Lock free data structures are one example of a scenario that is easier to implement with GCs.
Oh oh, who would have thought. A memory-safe rust at last. With no unsafe allowed, even type safe. Unless you forget about their type bugs: https://github.com/Speykious/cve-rs.
So maybe eliminate type and concurrency unsafeties also then in the next decades or so.
The existence of a soundness bug in the typechecker doesn’t refute the value of soundness as a language design contract.
If anything it’s the opposite: issues demonstrated by cve-rs are _language bugs_ and are _fixable_ in principle. “Safe Rust should be memory-safe” is a well-defined, falsifiable contract that the compiler can be measured against. Meanwhile memory unsafety is a feature of the semantics of C++ and so it would be absurd to file a bug against gcc complaining that it compiled your faulty code.
The language design contract is unsafe by default. In memory, types and concurrency. What are you talking about? There are unsafe blocks all over the stdlib. And concurrency safety would need to get rid of their blocking IO, which they haven't even acknowledged.
> There are unsafe blocks all over the stdlib
Physics is unsafe. Something, somewhere needs to provide the safe core.
> And concurrency safety would need to get rid of their blocking IO, which they haven't even acknowledged.
Is your position that blocking IO can't be compatible with concurrency safety? That's a strange claim. Can you explain?
Sure, but then they shout all over how safe they are. They got rid of the safeties pretty late, when they ripped out their GC, but kept their false promises all over.
No, that's common knowledge. I fixed concurrency safety by forbidding blocking IO. Others also. Maybe there are other ways, but never heard of other ways.
Huh? It doesn't follow that forbidding blocking IO is either necessary or sufficient for concurrency safety, at least under any definition of "safety" I can imagine. What do you mean? You mean async-not-blocking-event-loop stuff? That's not the only way to do more than one IO at a time.
Non-blocking IO is only one part to provide concurrency safety. Process locks are even worse. All locks are forbidden to avoid dead-locks.
> They got rid of the safeties pretty late, when they ripped out their GC, but kept their false promises all over.
This seems like a non-sequitur to me? The presence/absence of a GC is not dispositive with respect to determining "safety", especially when the GC itself involves unsafe code.
Have you ever seen a GC system with memory unsafeties? I cannot remember any
Ah. I believe pretty much every safe language on the planet constantly has bugs in the implementation that can be exploited to cause unsafety. Sometimes they even get CVEs, e.g. in JavaScript VMs.
I don't do Javascript, but any self-respecting language which calls itself safe is actually safe. I worked for decades in actually memory and type safe languages, and never ever heard of a memory or type safety bug.
Just not the cheaters: Rust, Java (until recently), and of course Javascript with its unsafe implementations.
Memory safety bug in a proper lisp? Unheard of, unless you break the GC or do wrong FFI calls.
You've made it clear from this thread that you have no idea what you're talking about. Please do not waste our time by commenting on this topic further.
Ha, I did maintain two safe languages. How many did you?
I think so, assuming I'm thinking of the same thing you are, but I think that's somewhat besides the point. What I'm trying to say is twofold:
- The presence of a GC doesn't guarantee memory safety since there are sources of memory unsafety that GCs don't/can't cover (i.e., escape hatches and/or FFI), not to mention the possibility of bugs in the GC implementation itself.
- The absence of a GC doesn't preclude memory safety since you can "just" refuse to compile anything which you can't prove to be memory-safe modulo escape hatches and/or axioms and/or FFI (and bugs, unfortunately). Formal verification toolchains for C (Frama-C, seL4's setup, etc.) and Ada/SPARK's stricter modes are good examples of this.
In the case of Rust:
- `unsafe` blocks (or at least their precursors) were added in 2011 [0].
- Rust's reference-counting GC was removed in 2014 [1].
That's why I think "ripped out their GC" is a bit of a non-sequitur for "got rid of the safeties". Rust wasn't entirely safe before the GC was removed because `unsafe` already existed. And even after the GC was removed, the entire point of Rust's infamous strictness is to reject programs for which safety can't be proved (modulo the same sources that existed before the GC was removed), so the removal of GC does not necessarily imply losing memory safety either.
[0]: https://github.com/rust-lang/rust/pull/1036
[1]: https://github.com/rust-lang/rust/pull/17666
> The language design contract is unsafe by default
False. The language design is safe by default, something you can confirm super easily by just doing the Rust tutorials and comparing the same with C or C++.
Read the repo well:
NOT REFUTE IT.
> There are unsafe blocks all over the stdlib
Unsafe blocks are not the same thing as unsafe code. They are marked areas that are required in order to escape the automated checks, and there you are at the level of a C/C++ programmer (in those languages, ALL THE CODE is effectively marked unsafe).
If you complain about that, it's the same as complaining about ALL THE CODE written in C/C++.
---
One thing important to understand about Rust: Rust is a systems language and SHOULD be able to implement everything, including buffer overflows, use-after-frees, segmentation faults, and such. You should be able to implement a terrible OS, malware, faulty drivers, etc., minimally because that is required to test safe programs!
(example: Deterministic Simulation Testing https://turso.tech/blog/introducing-limbo-a-complete-rewrite...).
But what Rust gives you is that it does not assume you want to do that in most programs.
It is called OCaml, for those that want it.
I see you’ve been downvoted, but honestly this is news to me.
I see that repo is two years old - are there flaws in Rust that aren’t edge cases that would make it not memory safe?
Sure, besides those bugs, read the specs. Unsafe blocks. Loud and clear. Worse than in Java, where the unsafeties were just in a hidden sun class.
I was familiar with the concept of unsafe blocks. What I didn’t know was there were issues outside that.
I don’t know what your point is. Unsafe blocks in the stdlib isn’t a gotcha. It’s the whole point of unsafe: you provide a validated and ideally proveably correct implementation once, with a safe wrapper. It’s how everything is implemented under the hood.
It’s like double entry accounting when you only have one pen and one writing hand. The system is broken if you ever write down only half of a transaction in one ledger: you always need to record both the flow in and flow out on both ledgers. Writing either one of them is an unsafe operation. But you can only write one thing at a time. So write them in pairs in an unsafe {} block, and reuse that block safely.
So you are fine with calling a language "safe" when it has unsafe blocks, which the compiler skips checking? You have to do that checking manually, and then you are back in C++ land. That's hilarious. You can call it somewhat safe, or mostly safe, but never safe.
Alternatively, you can have a fully safe language, and then to get certain things done you add fundamentally unsafe FFI[1]. Or you use IPC to a process written in an unsafe language. Again, you're "back in C++ land".
It seems like your complaint is that Rust is referred to as a safe language. Which is fine; it's more correct to use the phrase "in safe Rust" rather than assuming that "in Rust" fully implies "safe". That is true, but that's a crack in a sidewalk compared to the chasm of difference between Rust and C++. Why obsess over that crack?
Should we all refer to "Python without FFI or any extensions written in C or another unsafe language" instead of "Python", to avoid asserting that Python-as-it-is-used is a safe language?
[1] Assuming it's FFI to an unsafe language, and that's the main purpose of FFI.
The perfectly safe platonic ideal you are implying cannot exist. Rust is a safe language because it bounds the unsafely to the minimal, clearly demarcated area where it can be reviewed and proven (outside the bounds of the type system) to hold.
That is not "back in C++ land."
The perfectly safe ideal does exist and is called safe. Calling unsafe safe is not even sophism, it is mere lying.
Real machines and reality aren't built out of safe primitives. Safe constructs have to be built out of unsafe components. That's just how computers work.
Sure, that's why safe languages forbid users to use unsafe things.
"C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user." Ellis and Stroustrup, The Annotated C++ Reference Manual.
> I see that repo is two years old
If the repo is what I think it is, the underlying bug is over a decade old [0]. From what I understand the devs have known how to solve it for pretty much this entire time, but their desired solution is blocked on some rather large refactorings elsewhere in the compiler.
> are there flaws in Rust that aren’t edge cases that would make it not memory safe?
cve-rs is probably the most well-known one, but it's definitely not the only one [1]. That being said, I think there's a not-unreasonable argument to be made that cve-rs counts as an edge case since (as far as I know) there's no publicly-known examples of code that has "organically" run into this.
On a more practical note, I think if you're worried about memory safety in Rust code type system holes are vanishingly unlikely to be the cause compared to more "normal" things like mistakes in-around `unsafe` blocks.
[0]: https://github.com/rust-lang/rust/issues/25860
[1]: https://github.com/rust-lang/rust/issues?q=state%3Aopen%20la...
They're being downvoted for the snarkiness (https://news.ycombinator.com/newsguidelines.html)