Re: The Case for Rust (in the base system)

From: Robert R. Russell <robert_at_rrbrussell.com>
Date: Mon, 22 Jan 2024 22:54:52 UTC
On Mon, 22 Jan 2024 10:13:30 +0000
David Chisnall <theraven@FreeBSD.org> wrote:

> On 21 Jan 2024, at 16:04, Alan Somers <asomers@FreeBSD.org> wrote:
> > 
> > Perhaps it will.  But like David Chisnall, I'm afraid that if
> > FreeBSD never modernizes, then it itself will go out of fashion by
> > the 2040s.  
> 
> Apparently I’m participating in this thread already.  I’m getting
> over a nasty cold and my head is full of cotton wool, so apologies in
> advance if this is more rambling than normal:
> 
> I hope it’s no surprise to anyone that I am in favour of languages
> that give stronger guarantees to programmers and let you think more
> about the problems.  I can’t imagine going back to writing anything
> non-trivial in a language without RAII or a rich set of generic
> collections.
> 
> To give a bit of personal background: In my previous role, I was one
> of the coauthors of the internal strategy document that argued for
> safe languages at Microsoft.  Our rough recommendation was:
> 
>  - No new C code.  There are *always* better options.
>  - C++ code should follow the Core Guidelines and use static
> analysis.  New C++ code is acceptable in projects that are already
> C/C++ and need to incrementally improve.
>  - Rust in new projects that need a systems programming language.
>  - Managed languages anywhere where a systems language is not needed
> (i.e. most places).
> 
> Between modern C++ with static analysers and Rust, there was a small
> safety delta.  The recommendation was primarily based on a
> human-factors decision: it’s far easier to prevent people from
> committing code that doesn’t compile than it is to prevent them from
> committing code that raises static analysis warnings.  If a project
> isn’t doing pre-merge static analysis, it’s basically impossible.
> Between using modern C++ (even just smart pointers and ranges) and C,
> there is an enormous safety delta.  
> 
> The unstable Rust ecosystem was less of an issue for Microsoft
> because they had a large compiler team and were happy to maintain
> security back-ports of any critical crates.  The same software supply
> chain things applied for Rust as everything else: no random pulling
> from Cargo, dependencies need to be cloned internally and run through
> a load of compliance things.  That’s probably the only sensible way
> of interacting with the Rust ecosystem.
> 
> For userspace, I’d love to see FreeBSD more actively support the
> cap-std project in Rust, which makes it incredibly easy to write Rust
> programs that play nicely with Capsicum.
> 
> It’s unclear to me that now is the right time to support Rust in the
> base system, because there’s still a lot of churn.  Facebook has
> effectively forked Rust because their (huge) Rust codebase doesn’t
> build with newer compilers.  If you’re Microsoft or Facebook,
> maintaining an old Rust compiler for a few years and back-porting
> things to work with that language snapshot is a cost that may be
> worth paying.  I don’t think the FreeBSD project has the resources to
> do so.  A limited set of dependencies may work.
> 
> 
> There are a few caveats about Rust:
> 
> First, it’s quite hard to find competent Rust developers.  Here are
> the OpenHub stats on new F/OSS code being written in Rust, C, and C++:
> 
> https://openhub.net/languages/compare?language_name%5B%5D=c&language_name%5B%5D=cpp&language_name%5B%5D=rust&language_name%5B%5D=-1&language_name%5B%5D=-1&measure=loc_changed
> 
> C++ has been slowly trending up, and C down, for the last decade.
> Rust is trending up a lot, but it’s starting from zero and there’s
> still a lot more C or C++ code being written than Rust.  It’s now
> easier to hire systems programmers to write C++ than C, and easier to
> hire either than to hire good Rust programmers.  This tradeoff may be
> very different for an open source project because there are a lot of
> *very* enthusiastic Rust developers and attracting a dozen or two of
> them to contribute would be a huge win.  People tend to be less
> enthusiastic about C or C++.
> 
> Most of the new kernels written in the last 20 years have been C++;
> most of the new kernels written in the last four years have been
> Rust.  Make of that what you will.
> 
> Neither Rust nor C++ guarantees safety.  C++ can always escape to bare
> pointers (it’s code smell, but it’s sometimes unavoidable).  Rust has
> unsafe and requires it for any data structure that isn’t a tree
> (either directly or via some existing code such as the Rc / Arc
> types).  One of our concerns was the degree to which the different
> uses of unsafe in various Rust crates compose.  There was a paper a
> couple of years ago that found a lot of vulnerabilities from this
> composition.  I don’t personally have a great deal of faith that
> unique ownership at an object level with a load of heuristics about
> when it’s safe to alias is the right long-term model.  Verona went a
> very different way and I hope Rust may be able to retrofit our ideas
> at some point.  
> 
> One project that I worked with, for example, was bitten by the fact
> that unsafe in Rust means ‘I promise to follow all of the Rust rules,
> you just can’t mechanically check them’.  It read a value from an
> MMIO register into a variable typed as an enumeration.  Outside of
> the unsafe block, it then checked that the value was in range.  Rust
> enumerations are type safe and so the compiler helpfully elided this
> check.  Moving the check into the unsafe block fixed it, but ran
> counter to the generic ‘put as little in unsafe blocks as humanly
> possible’ advice that is given for Rust programmers.
> 
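
A minimal sketch of one way to avoid the pitfall described above: read
the register as a plain integer and validate it before it ever becomes
an enum.  Typing an out-of-range value as a Rust enum is undefined
behaviour, which is why the compiler may drop a later range check.  The
`LinkState` enum, its values, and `read_link_state` are hypothetical
names used for illustration; this is not the code from the project
mentioned.

```rust
use std::convert::TryFrom;

/// Hypothetical device state encoded in a 32-bit MMIO register.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkState {
    Down = 0,
    Training = 1,
    Up = 2,
}

impl TryFrom<u32> for LinkState {
    /// The rejected raw value is returned so the caller can log it.
    type Error = u32;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        // The range check happens on the integer, before any enum exists.
        match raw {
            0 => Ok(LinkState::Down),
            1 => Ok(LinkState::Training),
            2 => Ok(LinkState::Up),
            other => Err(other),
        }
    }
}

/// Reads the (hypothetical) register and validates its contents.
///
/// # Safety
/// `reg` must be valid for a volatile 32-bit read and properly aligned.
pub unsafe fn read_link_state(reg: *const u32) -> Result<LinkState, u32> {
    // Only the raw read is unsafe; it yields a u32, never a LinkState.
    let raw = unsafe { std::ptr::read_volatile(reg) };
    LinkState::try_from(raw)
}
```

With this shape the unsafe block stays small and the check cannot be
elided, because every bit pattern is a valid u32.
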
> When I looked at a couple of hobbyist kernels written in Rust, they
> had trivial security vulnerabilities due to not sanitising system
> call arguments.  This was depressing because both Rust and C++ make
> it trivial to wrap userspace pointers in a smart pointer type that
> does the checks automatically.  
> 
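
Such a wrapper could look roughly like the sketch below.  Everything in
it is assumed for illustration: `UserSlice`, `UserPtrError`, and the
`USER_SPACE_END` boundary are hypothetical and not taken from any real
kernel.  The point is only that the range check lives in the
constructor, so a system call handler cannot obtain the type without
passing it.

```rust
/// Hypothetical first address that is NOT part of user space.
pub const USER_SPACE_END: u64 = 0x0000_8000_0000_0000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UserPtrError {
    Null,
    OutOfRange,
}

/// A user-supplied (address, length) pair that has been range-checked.
/// The fields are private, so the only way to build one is `new`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UserSlice {
    addr: u64,
    len: u64,
}

impl UserSlice {
    /// Validates a raw system call argument pair.
    pub fn new(addr: u64, len: u64) -> Result<Self, UserPtrError> {
        if addr == 0 {
            return Err(UserPtrError::Null);
        }
        // checked_add rejects ranges that wrap around the address space.
        let end = addr.checked_add(len).ok_or(UserPtrError::OutOfRange)?;
        if end > USER_SPACE_END {
            return Err(UserPtrError::OutOfRange);
        }
        Ok(Self { addr, len })
    }

    pub fn addr(&self) -> u64 {
        self.addr
    }

    pub fn len(&self) -> u64 {
        self.len
    }
}
```

A real kernel would hang its copy-in and copy-out routines off this
type, so unchecked addresses never reach them.
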
> In snmalloc, for example, we use C++ templates to express the
> lifecycle of memory throughout its allocation flow.  This would also
> be possible in Rust, but isn’t free in either language: you have to
> use the tools provided, but the outcome is that we can statically
> check a lot of properties at compile time.
> 
> With one of my other hats, I am the maintainer of an RTOS that is
> written in C++ and runs on a platform where the hardware enforces
> spatial and temporal memory safety.  To date, I don’t believe we’ve
> had any bugs that would have been prevented by Rust.  All of the
> memory-safety bugs (we have had some, and we catch them fairly easily
> because they lead to traps and so are easy to add tests for) have
> been in code that’s doing intrinsically unsafe things (memory
> allocators, for example).  We use C++20, with moderately heavy use of
> concepts.  We have a ring buffer implementation that uses a mixture
> of static_asserts and templates to verify the wrapping behaviour at
> compile time and that’s just one example of a place where we do a lot
> of compile-time checks that are impossible in C.
> 
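
The RTOS code referred to above is C++ and is not shown in this thread.
As a rough analogue only, a similar compile-time check can be sketched
in Rust with const generics.  `Ring` and its power-of-two capacity rule
are assumptions made for this example, not a description of the actual
implementation.

```rust
/// Fixed-capacity ring buffer; N must be a power of two so that index
/// wrapping can be done with a mask.
pub struct Ring<T, const N: usize> {
    buf: [T; N],
    head: usize, // count of items popped so far (wraps)
    tail: usize, // count of items pushed so far (wraps)
}

impl<T: Copy + Default, const N: usize> Ring<T, N> {
    /// Evaluated at compile time for each N that is actually used.
    const CAPACITY_IS_POWER_OF_TWO: () =
        assert!(N.is_power_of_two(), "capacity must be a power of two");

    pub fn new() -> Self {
        // Referencing the constant forces the check for this N.
        let () = Self::CAPACITY_IS_POWER_OF_TWO;
        Self { buf: [T::default(); N], head: 0, tail: 0 }
    }

    /// Returns false if the buffer is full.
    pub fn push(&mut self, value: T) -> bool {
        if self.tail.wrapping_sub(self.head) == N {
            return false;
        }
        self.buf[self.tail & (N - 1)] = value;
        self.tail = self.tail.wrapping_add(1);
        true
    }

    /// Returns None if the buffer is empty.
    pub fn pop(&mut self) -> Option<T> {
        if self.head == self.tail {
            return None;
        }
        let value = self.buf[self.head & (N - 1)];
        self.head = self.head.wrapping_add(1);
        Some(value)
    }
}
```

Calling `Ring::<u32, 3>::new()` would be rejected at build time rather
than misbehaving at run time.
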
> I’d also like to clear up a few misunderstandings about C++:
> 
>  - The Itanium C++ ABI has been stable for 20+ years.  Linking C++
> shared libraries compiled with clang against those compiled with GCC
> (or vice versa), or with different versions of the same compiler, has
> been standard practice for a long time.  Both libstdc++ and libc++
> use inner namespaces for the standard-library types and so allow
> something like symbol versioning but exposed at the language level.
> You can see ABI breaks if one library uses a newer version of a type
> and the other an older one, but that’s why we only bump those forward
> on major releases: C++ DSOs compiled for FreeBSD 13 may not link with
> binaries compiled for FreeBSD 14.
> 
>  - Command-line argument parsing and JSON are not part of the C++
> standard library, but there are de-facto standards.  Nlohmann JSON[1]
> and CLI11[2] are widely used (it’s been a long time since I’ve seen a
> project that used anything else) and have very easy-to-use
> interfaces.  I believe (I am a member of the C++ standards committee,
> but I only recently joined and have not participated in discussions
> around this) that a big part of the reason it isn’t in the core
> specification is that there is a de-facto standard and there’s little
> urgency in adding it to the core.
> 
> 
> Finally, one of the key things that we found was that a lot of
> projects used C/C++ out of inertia.  They don’t have peak memory or
> sub-millisecond-latency constraints and could easily be written in a
> managed language, often even in an interpreted one.  We have Lua in
> the base system.  I’d love to see a richer set of things exposed to
> Lua.  I played a bit with a kqueue wrapper using Sol2[3] that lets
> you write Lua coroutines and have them implicitly yield on blocking
> operations.  
> 
> I’d love to see a generic process manager in the base system that
> subsumes devd and inetd written in Lua, with C++ wrappers around
> pdfork (ideally pdvfork, but it doesn’t exist yet) and friends,
> exposed via sol2.  The code in C++ is dealing directly with low-level
> system interfaces and would not be safer in Rust, but all of the
> parsing and control-plane logic can live in a safe GC’d language.
> You can run a lot of Lua code in the time it takes one fork call to
> execute.
> 
> If we exposed type info from dynamic sysctls generically (I think
> there’s a project working on this?) then things like sysstat could be
> written in Lua.  I was experimenting with Dear ImGui for this, since
> it had back ends that rendered in X11, Wayland, in a terminal, or
> remotely over a websocket.  Unfortunately, the latter two were never
> merged and are probably unmaintained (the author is also the person
> behind llama.cpp and so probably isn’t going to work on it for a
> while).  Being able to run management tools in a terminal and click
> on a URL to open them in the web browser would be amazing, but
> doesn’t require a new systems programming language.
> 
> I’d love to see a default that anything intended to run with elevated
> privilege is written in Lua.
> 
> David
> 
> [1] https://github.com/nlohmann/json
> [2] https://github.com/CLIUtils/CLI11
> [3] https://sol2.readthedocs.io/

If you had to estimate, what is the cost of enforcing better C++ code?

I am not familiar with Lua, and most of my experience with Lua-like
languages has included dynamic code injection as an attack vector. Is
it feasible to protect Lua from that problem in the use case you
propose?