Re: The Case for Rust (in the base system)

From: David Chisnall <theraven_at_FreeBSD.org>
Date: Mon, 22 Jan 2024 10:13:30 UTC
On 21 Jan 2024, at 16:04, Alan Somers <asomers@FreeBSD.org> wrote:
> 
> Perhaps it will.  But Like David Chisnall, I'm afraid that if FreeBSD never
> modernizes, then it itself will go out of fashion by the 2040s.

Apparently I’m participating in this thread already.  I’m getting over a nasty cold and my head is full of cotton wool, so apologies in advance if this is more rambling than normal:

I hope it’s no surprise to anyone that I am in favour of languages that give stronger guarantees to programmers and let you think more about the problems.  I can’t imagine going back to writing anything non-trivial in a language without RAII or a rich set of generic collections.

To give a bit of personal background: In my previous role, I was one of the coauthors of the internal strategy document that argued for safe languages at Microsoft.  Our rough recommendation was:

 - No new C code.  There are *always* better options.
 - C++ code should follow the Core Guidelines and use static analysis.  New C++ code is acceptable in projects that are already C/C++ and need to incrementally improve.
 - Rust in new projects that need a systems programming language.
 - Managed languages anywhere where a systems language is not needed (i.e. most places).

Between modern C++ with static analysers and Rust, there was a small safety delta.  The recommendation was primarily based on a human-factors decision: it’s far easier to prevent people from committing code that doesn’t compile than it is to prevent them from committing code that raises static analysis warnings.  If a project isn’t doing pre-merge static analysis, it’s basically impossible.  Between using modern C++ (even just smart pointers and ranges) and C, there is an enormous safety delta.  

The unstable Rust ecosystem was less of an issue for Microsoft because they had a large compiler team and were happy to maintain security back-ports of any critical crates.  The same software supply chain things applied for Rust as everything else: no random pulling from Cargo, dependencies need to be cloned internally and run through a load of compliance things.  That’s probably the only sensible way of interacting with the Rust ecosystem.

For userspace, I’d love to see FreeBSD more actively support the cap-std project in Rust, which makes it incredibly easy to write Rust programs that play nicely with Capsicum.

It’s unclear to me that now is the right time to support Rust in the base system, because there’s still a lot of churn.  Facebook has effectively forked Rust because their (huge) Rust codebase doesn’t build with newer compilers.  If you’re Microsoft or Facebook, maintaining an old Rust compiler for a few years and back-porting things to work with that language snapshot is a cost that may be worth paying.  I don’t think the FreeBSD project has the resources to do so.  A limited set of dependencies may work.


There are a few caveats about Rust:

First, it’s quite hard to find competent Rust developers.  Here are the OpenHub stats on new F/OSS code being written in Rust, C, and C++:

https://openhub.net/languages/compare?language_name%5B%5D=c&language_name%5B%5D=cpp&language_name%5B%5D=rust&language_name%5B%5D=-1&language_name%5B%5D=-1&measure=loc_changed

C++ has been slowly trending up, and C down, for the last decade.  Rust is trending up a lot, but it’s starting from zero and there’s still a lot more C or C++ code being written than Rust.  It’s now easier to hire systems programmers to write C++ than C, and easier to hire either than to hire good Rust programmers.  This tradeoff may be very different for an open source project because there are a lot of *very* enthusiastic Rust developers and attracting a dozen or two of them to contribute would be a huge win.  People tend to be less enthusiastic about C or C++.

Most of the new kernels written in the last 20 years have been C++; most of the new kernels written in the last four years have been Rust.  Make of that what you will.

Neither Rust nor C++ guarantees safety.  C++ can always escape to bare pointers (it’s a code smell, but it’s sometimes unavoidable).  Rust has unsafe and requires it for any data structure that isn’t a tree (either directly or via some existing code such as the Rc / Arc types).  One of our concerns was the degree to which the different uses of unsafe in various Rust crates compose.  There was a paper a couple of years ago that found a lot of vulnerabilities from this composition.  I don’t personally have a great deal of faith that unique ownership at an object level with a load of heuristics about when it’s safe to alias is the right long-term model.  Verona went a very different way and I hope Rust may be able to retrofit our ideas at some point.

One project that I worked with, for example, was bitten by the fact that unsafe in Rust means ‘I promise to follow all of the Rust rules, you just can’t mechanically check them’.  It read a value from an MMIO register into a variable typed as an enumeration.  Outside of the unsafe block, it then checked that the value was in range.  Rust enumerations are type safe and so the compiler helpfully elided this check.  Moving the check into the unsafe block fixed it, but ran counter to the generic ‘put as little in unsafe blocks as humanly possible’ advice that is given for Rust programmers.
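
The general pattern that avoids this class of bug is to read the register as a raw integer, validate it, and only then give it the enumeration type.  A rough sketch of that pattern follows, written in C++ for illustration (the project described above was Rust, and this is not its code).  The DeviceState enumeration, its values, decode_state and read_register are all hypothetical names, and the register is simulated with an ordinary variable:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical device states; the values are illustrative only.
enum class DeviceState : std::uint32_t { Idle = 0, Busy = 1, Fault = 2 };

// Validate the raw value *before* it acquires the enumeration type, so the
// range check operates on an integer and cannot be treated as redundant.
constexpr std::optional<DeviceState> decode_state(std::uint32_t raw)
{
    switch (raw) {
    case 0: return DeviceState::Idle;
    case 1: return DeviceState::Busy;
    case 2: return DeviceState::Fault;
    default: return std::nullopt; // The hardware returned something unexpected.
    }
}

// Stand-in for an MMIO read: real code would point at a device register.
inline std::uint32_t read_register(const volatile std::uint32_t *reg)
{
    return *reg;
}

inline void example_usage()
{
    std::uint32_t fake_register = 1; // Simulated hardware register.
    assert(decode_state(read_register(&fake_register)) == DeviceState::Busy);

    fake_register = 0xdead; // Out-of-range value is rejected, not trusted.
    assert(!decode_state(read_register(&fake_register)).has_value());
}
```

The point of the design is that no value of the enumeration type ever exists until the check has passed, so there is nothing for the compiler to assume about it.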

When I looked at a couple of hobbyist kernels written in Rust, they had trivial security vulnerabilities due to not sanitising system call arguments.  This was depressing because both Rust and C++ make it trivial to wrap userspace pointers in a smart pointer type that does the checks automatically.  
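
A minimal sketch of what such a wrapper might look like in C++ follows.  It is an illustration under stated assumptions, not code from any of the kernels mentioned: UserSpace and UserPtr are hypothetical names, user memory is simulated with a byte array, and the memcpy stands in for whatever checked copy primitive a real kernel provides (copyin(9) on FreeBSD):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Simulated user address space: a hypothetical stand-in for real user memory.
struct UserSpace {
    const unsigned char *base;
    std::size_t size;
};

// Wraps an untrusted user address.  The only way to reach the data is
// copy_in(), which bounds-checks before touching memory.
template <typename T>
class UserPtr {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only plain data may cross the user/kernel boundary");
    std::uintptr_t addr_; // Untrusted offset supplied by the caller.

public:
    explicit UserPtr(std::uintptr_t addr) : addr_(addr) {}

    std::optional<T> copy_in(const UserSpace &us) const
    {
        // Phrased so that addr_ + sizeof(T) is never computed and cannot overflow.
        if (sizeof(T) > us.size || addr_ > us.size - sizeof(T)) {
            return std::nullopt;
        }
        T value;
        std::memcpy(&value, us.base + addr_, sizeof(T));
        return value;
    }
};

inline void example_usage()
{
    const unsigned char memory[8] = {1, 0, 0, 0, 2, 0, 0, 0};
    UserSpace us{memory, sizeof(memory)};

    assert(*UserPtr<std::uint8_t>(4).copy_in(us) == 2);         // In range.
    assert(!UserPtr<std::uint32_t>(6).copy_in(us).has_value()); // Straddles the end.
    assert(!UserPtr<std::uint32_t>(UINTPTR_MAX).copy_in(us).has_value());
}
```

Because the raw address is private, a system call handler that takes a UserPtr<T> argument cannot dereference it by accident; forgetting the check becomes a compile error rather than a vulnerability.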

In snmalloc, for example, we use C++ templates to express the lifecycle of memory throughout its allocation flow.  This would also be possible in Rust, but isn’t free in either language: you have to use the tools provided, but the outcome is that we can statically check a lot of properties at compile time.
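
To make the idea concrete, here is a small sketch of encoding a lifecycle in template parameters.  It does not reproduce snmalloc’s actual types or API; Chunk, Lifecycle and the three state tags are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical lifecycle states: tag types only, never instantiated.
struct Raw {};       // Fresh from the OS, not yet initialised.
struct Free {};      // Initialised and available for allocation.
struct Allocated {}; // Handed out to the caller.

struct Lifecycle;

// A pointer whose type records where the memory is in its lifecycle.
template <typename State>
class Chunk {
    void *ptr_;
    explicit Chunk(void *p) : ptr_(p) {}
    friend struct Lifecycle; // Only the transitions below may construct one.

public:
    void *unsafe_address() const { return ptr_; }
};

// The only legal state transitions.  Anything else fails to compile.
struct Lifecycle {
    static Chunk<Raw> from_os(void *p) { return Chunk<Raw>(p); }
    static Chunk<Free> initialise(Chunk<Raw> c) { return Chunk<Free>(c.ptr_); }
    static Chunk<Allocated> allocate(Chunk<Free> c) { return Chunk<Allocated>(c.ptr_); }
    static Chunk<Free> release(Chunk<Allocated> c) { return Chunk<Free>(c.ptr_); }
};

// Checked at compile time: free memory can be allocated, raw memory cannot.
static_assert(std::is_invocable_v<decltype(&Lifecycle::allocate), Chunk<Free>>,
              "a free chunk can be allocated");
static_assert(!std::is_invocable_v<decltype(&Lifecycle::allocate), Chunk<Raw>>,
              "an uninitialised chunk must not be allocated");

inline void example_usage()
{
    int storage = 0;
    Chunk<Raw> raw = Lifecycle::from_os(&storage);
    Chunk<Free> available = Lifecycle::initialise(raw);
    Chunk<Allocated> live = Lifecycle::allocate(available);
    assert(live.unsafe_address() == &storage);
}
```

This sketch only checks that transitions happen in a legal order.  It does not stop a chunk from being used twice, because C++ has no destructive move; that is one of the places where the tools have to be used with discipline.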

With one of my other hats, I am the maintainer of an RTOS that is written in C++ and runs on a platform where the hardware enforces spatial and temporal memory safety.  To date, I don’t believe we’ve had any bugs that would have been prevented by Rust.  All of the memory-safety bugs (we have had some, and we catch them fairly easily because they lead to traps and so are easy to add tests for) have been in code that’s doing intrinsically unsafe things (memory allocators, for example).  We use C++20, with moderately heavy use of concepts.  We have a ring buffer implementation that uses a mixture of static_asserts and templates to verify the wrapping behaviour at compile time and that’s just one example of a place where we do a lot of compile-time checks that are impossible in C.
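
As an illustration of the kind of compile-time checking described here, the following is a simplified sketch and not the RTOS’s ring buffer.  It uses only C++17 (no concepts), and RingBuffer, wrap_index and is_power_of_two are hypothetical names.  The static_asserts verify the wrapping arithmetic, including the case where the counters themselves overflow:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Map a free-running counter onto a slot index.
constexpr std::size_t wrap_index(std::size_t counter, std::size_t capacity)
{
    return counter & (capacity - 1);
}

// Compile-time checks of the wrapping rule.
static_assert(wrap_index(8, 8) == 0, "index wraps at capacity");
static_assert(wrap_index(9, 8) == 1, "index continues after wrapping");
static_assert(wrap_index(SIZE_MAX, 8) == 7, "last counter value maps to last slot");
static_assert(std::size_t{2} - SIZE_MAX == 3, "occupancy survives counter overflow");

template <typename T, std::size_t Capacity>
class RingBuffer {
    static_assert(is_power_of_two(Capacity),
                  "capacity must be a power of two so that masking wraps correctly");
    std::array<T, Capacity> slots_{};
    std::size_t head_ = 0; // Total number of pops so far.
    std::size_t tail_ = 0; // Total number of pushes so far.

public:
    bool push(const T &value)
    {
        if (tail_ - head_ == Capacity) {
            return false; // Full.
        }
        slots_[wrap_index(tail_++, Capacity)] = value;
        return true;
    }

    std::optional<T> pop()
    {
        if (tail_ == head_) {
            return std::nullopt; // Empty.
        }
        return slots_[wrap_index(head_++, Capacity)];
    }

    std::size_t size() const { return tail_ - head_; }
};

inline void example_usage()
{
    RingBuffer<int, 4> rb;
    for (int i = 0; i < 4; ++i) {
        assert(rb.push(i));
    }
    assert(!rb.push(99));    // Full: the fifth push is refused.
    assert(*rb.pop() == 0);  // Oldest element comes out first.
    assert(rb.push(4));      // Reuses slot 0 after wrapping.
    assert(rb.size() == 4);
}
```

Instantiating RingBuffer<int, 6> is rejected by the compiler, which is the property that cannot be expressed in C without resorting to macros.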

I’d also like to clear up a few misunderstandings about C++:

 - The Itanium C++ ABI has been stable for 20+ years.  Linking C++ shared libraries compiled with clang against those compiled with GCC (or vice versa), or against those compiled with different versions of the same compiler, has been standard practice for a long time.  Both libstdc++ and libc++ use inner namespaces for the standard-library types and so allow something like symbol versioning but exposed at the language level.  You can see ABI breaks if one library uses a newer version of a type and the other an older one, but that’s why we only bump those forward on major releases: C++ DSOs compiled for FreeBSD 13 may not link with binaries compiled for FreeBSD 14.

 - Command-line argument parsing and JSON are not part of the C++ standard library, but there are de-facto standards.  Nlohmann JSON[1] and CLI11[2] are widely used (it’s been a long time since I’ve seen a project that used anything else) and have very easy-to-use interfaces.  I believe (I am a member of the C++ standards committee, but I only recently joined and have not participated in discussions around this) that a big part of the reason these aren’t in the core specification is that there are de-facto standards and there’s little urgency in adding them to the core.
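
The inner-namespace mechanism mentioned in the first point can be sketched as follows.  The library name mylib, the Widget type and the version names v1 and v2 are hypothetical; the standard libraries use the same language feature with their own internal names (for example std::__1 in libc++):

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace mylib {
    // Old layout, kept so that already-compiled clients still find their symbols.
    namespace v1 {
        struct Widget { int id; };
        inline std::string describe(const Widget &) { return "v1"; }
    }

    // Current layout.  `inline` makes mylib::Widget name mylib::v2::Widget,
    // while the mangled symbol names still contain "v2".
    inline namespace v2 {
        struct Widget { int id; long generation; };
        inline std::string describe(const Widget &) { return "v2"; }
    }
}

// Source code written against mylib::Widget silently picks up the current version.
static_assert(std::is_same_v<mylib::Widget, mylib::v2::Widget>,
              "unqualified-version name resolves to the inline namespace");
static_assert(!std::is_same_v<mylib::Widget, mylib::v1::Widget>,
              "old and new layouts are distinct types");

inline void example_usage()
{
    mylib::Widget current{1, 2};
    assert(mylib::describe(current) == "v2");

    mylib::v1::Widget old{1};
    assert(mylib::v1::describe(old) == "v1");
}
```

Because the version is part of the type’s name, code built against the old layout and code built against the new one refer to different symbols, so a mismatch shows up as a link error instead of silent memory corruption.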




Finally, one of the key things that we found was that a lot of projects used C/C++ out of inertia.  They don’t have peak memory or sub-millisecond-latency constraints and could easily be written in a managed language, often even in an interpreted one.  We have Lua in the base system.  I’d love to see a richer set of things exposed to Lua.  I played a bit with a kqueue wrapper using Sol2[3] that lets you write Lua coroutines and have them implicitly yield on blocking operations.  

I’d love to see a generic process manager in the base system that subsumes devd and inetd written in Lua, with C++ wrappers around pdfork (ideally pdvfork, but it doesn’t exist yet) and friends, exposed via sol2.  The code in C++ is dealing directly with low-level system interfaces and would not be safer in Rust, but all of the parsing and control-plane logic can live in a safe GC’d language.  You can run a lot of Lua code in the time it takes one fork call to execute.

If we exposed type info from dynamic sysctls generically (I think there’s a project working on this?) then things like systat could be written in Lua.  I was experimenting with Dear ImGui for this, since it had back ends that rendered in X11, in Wayland, in a terminal, or remotely over a websocket.  Unfortunately, the latter two were never merged and are probably unmaintained (the author is also the person behind llama.cpp and so probably isn’t going to work on it for a while).  Being able to run management tools in a terminal and click on a URL to open them in the web browser would be amazing, but doesn’t require a new systems programming language.

I’d love to see a default that anything intended to run with elevated privilege is written in Lua.

David

[1] https://github.com/nlohmann/json
[2] https://github.com/CLIUtils/CLI11
[3] https://sol2.readthedocs.io/