A good thing about Rust is that it forces you to write code in a disciplined manner. I remember fighting those rules, until it became natural to me and I could tell why the compiler rejected my code without reading the compiler error (for simple cases, of course, not syntax errors). And that habit of carefully writing good code without slowing down carries over when I'm writing code in other languages.
This is a nice article! I think it introduces some concepts that Rust beginners might question ("what is `Copy` and what implements that?", "what's interior mutability? Doesn't that break the aliasing XOR mutability idea?"), but it gets the practical usage info across.
> Every heap allocation has exactly one owner...
This goes one step further, in that every value, including values on the stack, has a single owner. That's why you can't pass the same non-`Copy` stack value by value multiple times, even though it never touches the heap. (One caveat: an array like `let a = [0; 10];` of `Copy` elements is itself `Copy`, so that particular example would compile; use an array of `String`s, or any non-`Copy` type, to see the move error.)
> Zero overhead — no extra indirection
I think this is slightly misleading for borrowing, since a reference (`&`) is fundamentally a pointer that the compiler ensures upholds some special properties. So borrows come with a pointer deref when used, similarly to a `Box` or `Rc`, etc. I think the intention is to point out the heap allocations when using smart pointers which are definitely more expensive than `&`, but that doesn't really come across.
Only other call-out is that I'm not sure what the various colours indicate throughout. They look really pretty, but I'm not sure when something becomes teal vs yellow vs orange vs purple.
Rust is becoming less special in this area. Languages such as Dlang, Vlang, and Julia have added optional ownership and borrowing. Because these features are optional, many see this as greater programmer freedom: you decide what to use for each project, in languages that are easier to use or read.
> Rust is becoming less special in this area. Languages such as Dlang, Vlang, and Julia have added optional ownership and borrowing
Isn't the crux that Rust does those things without a garbage collector, that's the novel part? Someone correct me if I'm wrong (likely), but I think all those languages have garbage collectors, which Rust doesn't.
Yes, it's a critical distinction that's important in many systems domains, but getting some form of ownership policy and mechanism - even if implemented with a GC - is, I think, a step forward in terms of building reliable code.
The thing about it being optional in some languages is that it's an experiment, but it's a feature that really pays off the more code in the ecosystem complies with ownership tracking. For Rust, that's the vast majority of it (with opt-outs explicitly findable). For languages offering it optionally, it's harder to assemble the full benefit.
> without a garbage collector, that's the novel part?
That's not quite how it works in various languages. You appear to be thinking of the garbage collector as something inseparable from the language.
Both Dlang and Vlang have optional garbage collectors that can be turned off. In the case of Vlang, none of its libraries depend on the garbage collector. Vlang offers optional (flexible) memory management, somewhat similar to Nim (though Nim presently doesn't have optional ownership).
In the case of Julia and Vlang, their optional ownership is new and experimental. Dlang's optional ownership has been around for some years now, showing that it could be done.
Dlang and Vlang allow you to choose the type of memory management (along with some other languages) that you would like to use. Vlang does it by command line flags. You can turn off garbage collection and turn on ownership.
> Both Dlang and Vlang have optional garbage collectors, that can be turned off.
Until you need a library that was written with the assumption of using a garbage collector.
My previous post explained how this is not the case for Vlang. None of Vlang's libraries depend on using a garbage collector.
The moment you turn on garbage collection in Dlang, you've introduced stop the world pauses to all your Dlang threads.
If you were to implement a GC for Rust, the compiler could guarantee that references haven't escaped the current thread, which means there would be no stop-the-world pause anymore; only the current thread gets paused, which is acceptable.
Unless your memory allocator runs a form of garbage collection, which most of the advanced ones do! Worst memory performance issue I've ever seen was in a C++ program where the deallocation of a large object graph from one spot completely trashed the performance of the application across many threads...
It's not a question of support, it's a question of defaults. Rust code exercises a single-ownership discipline by default, and you must opt-in to dynamic multi-ownership (via refcounting or otherwise). In languages that have pervasive dynamic multi-ownership by default (via GC, etc.), enforced single-ownership is instead opt-in, which means that you cannot expect code in the wild to use it. In Rust, you can expect code in the wild to exhibit a single-ownership discipline; that's just the prevailing style for Rust code, as encouraged by the design of the language itself.
In fact, we can see this "defaults matter" problem in Rust as well. Note that Rust by default assumes that code is running in a context where a dynamic allocator is available, but allows one to opt out of this ("no_std" mode). Code written for embedded devices or baremetal contexts uniformly opts into this mode, but because it's not the default, you can't just pull any old library off the shelf and expect it to work for you, so the ecosystem is much smaller and less mature. Defaults matter.
True, defaults matter. In many cases, however, using a language that defaults to a GC for memory safety is often preferable or easier.
The argument is often about when ownership and borrowing is truly necessary. Rust has its uses, but arguably not all the time and with everything, because of its defaults.
I'm going to thoughtfully disagree here. The more I use Rust, the more I feel like we got it wrong all the way back in the mists of ALGOL, and that copy-by-default (which, in (non-immutable) managed languages, translates to multi-ownership-by-default) was a mistake in the same league as null. Being able to convincingly express resource consumption in a first-class way is just too broadly useful, IMO. Certainly we can imagine a "high-level Rust" that doesn't make you care so much about pointers and the stack/heap distinction, but I still think such a language would want to make single-ownership the default. To that end, rather than any of the aforementioned languages, I'll suggest Austral as looking fairly interesting, although it's still far from high-level: https://austral-lang.org/
I’m writing some C# code at the moment, and the fact that `ObjectDisposedException` exists should give you a clue that “consume” semantics are desirable in all languages.
It opens up entirely new avenues for statically error-free programming, letting you model things like “if the caller has an instance of this type, I can guarantee that this other larger proposition is true”. Namely without also having to handle the case where the user smuggled another instance from another call site.
This is really, really useful.
This is the way forward, convenient automatic resource management, with improved type systems for low level coding.
In various forms, affine types, linear types, dependent types, effects, formal proofs.
The list is already rather long, D, Chapel, Swift, Linear Haskell, Ox, OCaml, Koka, Ada/SPARK, Mojo,...
I've not really used Rust, and I can certainly appreciate the goal of the language. However this code bothers me, and I can't really articulate why.
I understand it's all boxes inside boxes inside boxes, but as an outsider it looks confusing to mix data type semantics with memory management semantics.
Memory _is_ data at the level Rust works at. It just has a few abstractions in the stdlib that hide this, and the code you highlighted is taking steps in the "let's add abstraction" direction.
yeah, it's too much generic nesting...
there ought to be a crate that handles the "Vec<Rc<RefCell<..." part (maybe https://crates.io/crates/orx-concurrent-vec ? haven't tried it yet)
It bothers me too, maybe for different reasons, as an experienced rustacean. Code like this is a world of pain (in any language, might I add, even though Rust is the only one that shows you why with the syntax).
Internally shared ownership is a huge anti-pattern and rarely achieves what you’re actually trying to do. Graphs like these are much better represented by splitting the topology from the node data, and using indices or keys to form edges instead of (smart) pointers.
You can use pointers, but then the correct choice is actually raw pointers and `unsafe`, with a safe whole-graph-level API for traversal and access. This is what Rust’s own collection types do internally.
Do you have a simple example of how this is done correctly?
I always heard that, but coming from Python, internal references would be the way to go, so it's not natural to me.
Depending on the shape of the graph (arbitrarily connected, DAG, etc.), it just boils down to whatever convenient way you have of storing lists.
Node data in one list. Indices of node parents in another list of the same length. If you need to find children quickly, you can have another list of lists containing children indices.
The key is to consider the node’s position in the list as its ID, and refer to nodes by that index instead of the pointer.
In most cases, this model is both easier to work with and more performant. For example, you can usually get by with 32-bit indices instead of 64-bit pointers, so things are much more likely to be in cache.
All these Weak and Rc wrappers are needed for allocation/deallocation of memory. Having data in a list with indices means you don't have that functionality.
Thanks. So you basically make pointers manually with your own separate stack. Wouldn't it make sense to have the ability to just allocate a new stack dedicated to a graph and let the ownership system deal with it for you? Seems like a missing abstraction.
For what it's worth, you can pull the same tricks in any language.
I wouldn't like to work with it, but I wouldn't be too surprised to encounter a List<Optional<WeakReference<Holder<Node>>>>: a list containing optionally-present weak references to a holder object (which you might need if you intend to update such an object from a lambda, since lambdas can only capture effectively final variables from their parent scope).
I think there's value to the Rust implementation. In many other languages, these wrappers are either non-optional or hidden away with syntax, but they're still present. The Rust approach puts the developer in charge of how data should be exchanged (copied/borrowed/on stack/in heap/etc.). You can write extremely efficient programs that stack very few of those types, with many restrictions on how they can be passed around to methods, or you can make your life easier at the cost of some performance and clutter.
For clarity, you could do something like this:
That would allow for some much-needed brevity, but it would also fill your code with custom types that others will need to learn about.
I see beginners trip over themselves when they get obsessed with allocation (never mind they're coming from like, Python) but in a language like Rust (or C++) you are writing programs with the intent to control how memory is managed. So it makes a lot of sense to tie memory to the types as a part of their semantics.
It's not without problems, but the idea is less confusing in practice than it seems.
I would also mention Cow<T> alongside Rc/Arc - it lets you borrow by default and only clone when you actually mutate. It sits in the middle of the spectrum and saves a lot of allocations in code paths where most callers just read.
This is all fine and dandy for some usage scenarios but breaks in others, e.g. scene graphs and GUIs.
A scene graph needs two mutable references, and has nothing to do with ownership. The same issue exists with GUIs. The pattern that Rust forces is to always re-request a reference, which incurs a performance penalty from retrieving the same reference again and again and again.
I am not familiar with scene graphs, but what is the problem with borrowing or refcounting? This article showed how you can have multiple mutable references in Rust, even multiple mutable references running in parallel threads.
Ref counting is for ownership; it doesn't convey intent. It kind of accidentally works, but it's the wrong abstraction, especially in code bases where ownership is known.
If you're building data structures that have specific requirements and you know you'll implement it correctly, you can use raw pointers like `*mut T`. That's why they're in the language, for when you need to do something that the borrow checker can't verify and don't want the overhead of runtime borrow checks.
Branch predictor can predict these pretty damn well
Solid intro. One pattern worth adding: when you're reaching for Rc<RefCell<T>> for graph-like or self-referential data, an index-based arena (Vec<Node> with usize "pointers") is often a third path that keeps you in plain-ownership land — at the cost of pointer identity and some bookkeeping. petgraph, slotmap, and most ECS libraries formalize this. Doesn't replace Arc for genuinely cross-thread shared lifetimes, but it eliminates a lot of cases where people first reach for it.
The Vec approach can also have a nice benefit in terms of memory usage if you construct a lot of small structs (say, a linked list-style structure with a value and a forward reference). Rather than using 64 bits for a pointer, you can store an 8/16/32 bit number, depending on how many items you need. If the stars align, it can also help with keeping more items in the CPU cache lines.
I'm often curious how well the arena approach works for highly dynamic graphs. You end up having to reimplement the ref counting or a custom GC, which carries more risk.
Even the memory locality of an arena doesn't provide a benefit there, because memory order does not mirror execution order.