Performance of Rust Language [pdf]

19 days ago by jandrewrogers

I would summarize it thusly: Rust is roughly as performant as C. This matches my experience and Rust is more ergonomic than C in many regards. The caveat is that modern C++ is notably more performant than C and by implication Rust. This also matches my experience for both C and Rust.

I think most of this is attributable to the ergonomics of compile-time expressiveness. C++ can effortlessly do things that require mountains of ugly boilerplate and macros in C or Rust. In principle they can express the same things but no one wants to write or deal with that ugly boilerplate so the equivalency is never realized in real code bases.

Zig is interesting because it slots in as a C-like language with a competent and expressive compile-time story. I don’t use Zig but I recognize its game.

19 days ago by afdbcreid

Is C++ more performant than C? I find this hard to believe. C++ does not have any construct that cannot be replicated, or is not common, in C. The only candidate is using virtualization and void* pointers instead of monomorphized generics which some C code does for the lack of better options, but that's not a problem in Rust as well.

If anything, Rust has the potential to be more performant than C due to its aliasing rules (C has `restrict` but it's rarely used, standard C++ does not have even that). The current perf stats show it does make Rust code faster but just a little bit, although we don't utilize the full optimization potential currently (LLVM does not do many possible optimizations here, and `noalias` is weaker than Rust's aliasing rules). It can also affect autovectorization, and if it does the effect could be dramatic.
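
A minimal sketch of the aliasing point, with a hypothetical function name (`add_twice`). Because `a` is `&mut` and `b` is `&`, the two references cannot overlap, so the compiler is permitted to load `*b` once and reuse it. Whether it actually does so depends on the optimizer.

```rust
/// Adds `*b` to `*a` twice.
///
/// `a: &mut i32` guarantees exclusive access, and `b: &i32` guarantees the
/// pointee is not mutated through another path during the call, so the two
/// references cannot alias. An optimizer may therefore read `*b` once and
/// reuse the value; in C the same guarantee would need `restrict`.
pub fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}
```

Calling `add_twice(&mut x, &x)` is rejected by the borrow checker, which is what makes the no-alias assumption sound.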

19 days ago by jandrewrogers

Modern C++ metaprogramming materially impacts performance in practice. I’ve done performance engineering for decades in both C and modern C++ and I would assert that the difference isn’t arguable.

The poor applicability of auto-vectorization is another area where C++ is strong. You can transparently codegen e.g. AVX512 from intrinsics directly in C++ in contexts that would be opaque to auto-vectorization and difficult to generalize in C. This allows you to get some degree of “auto-vectorization” where the compiler can’t see it because it works at the wrong level of abstraction.

With sufficiently heroic efforts you can write C that matches the performance of C++. I’m not arguing that. Virtually no one writes C to that standard, including myself when I was writing high-performance C because the effort was too high, so it is a bit of a strawman.

It is the difference between theory and practice. All code bases have a finite budget. C++ can do a lot more optimization in the same budget as C.

19 days ago by globalnode

So you're saying the metaprogramming facilities of C++ let the compiler optimise high-level, human-readable code more effectively than it can in C. That's a fair point and one I'd never even thought of before; I always thought C was faster because of things like v-tables and all that stuff.

18 days ago by greenavocado

[flagged]

19 days ago by amelius

> I find this hard to believe. C++ does not have any construct that cannot be replicated, or is not common, in C.

But this is not a valid argument, as all languages are Turing complete, and most modern languages can do low level stuff at optimum speeds. As an extreme example, in Java, you could just allocate a large chunk of memory and run an allocator inside of it and sidestep the GC entirely.

With a programming language the question is thus not what can you do with it and how fast can it run with infinite effort, but what are the ergonomics, and what performance will you get in practice.

19 days ago by loeg

In C++ you get templated generic algorithms that in practice no one really writes in C because macros suck too much. So in C you'd typically have a runtime-generic routine that doesn't inline. A classic example here is qsort() vs std::sort().
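
The same contrast can be sketched in Rust, which is the language under discussion. The function names `sort_static` and `sort_dynamic` are hypothetical. The first is compiled once per comparator type, so the comparison can be inlined. The second takes a type-erased comparator and calls it through a pointer, much as `qsort()` does, unless the optimizer manages to devirtualize it.

```rust
use std::cmp::Ordering;

/// Monomorphized: a separate copy is compiled for each closure type `F`,
/// so the comparator body is visible to the inliner (the `std::sort` style).
pub fn sort_static<T, F>(items: &mut [T], cmp: F)
where
    F: Fn(&T, &T) -> Ordering,
{
    items.sort_by(|a, b| cmp(a, b));
}

/// Type-erased comparator: every comparison goes through a `dyn Fn`
/// pointer (the `qsort` style).
pub fn sort_dynamic<T>(items: &mut [T], cmp: &dyn Fn(&T, &T) -> Ordering) {
    items.sort_by(|a, b| cmp(a, b));
}
```

Both produce the same ordering; the difference is only in what the compiler can see at the call site.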

19 days ago by flohofwoe

> So in C typically you'd have a runtime generic routine that doesn't inline.

With LTO you get many of the same advantages as C++ template code, there's nothing magic about C++ template optimizations, it's all about whether the compiler can see all function bodies in a call hierarchy.

19 days ago by afdbcreid

I explicitly acknowledged that:

> The only candidate is using virtualization and void* pointers instead of monomorphized generics which some C code does for the lack of better options, but that's not a problem in Rust as well.

But in fact, if speed is a concern to you, even in C you will use "templated" sorting (via macros or code generation).

19 days ago by nicoburns

Rust also has these advantages of course

18 days ago by bluGill

It is not uncommon to realize your C program is valid C++ and get a performance improvement just by building with a C++ compiler, with no other change. The difference is small, but C++ has a stronger type system which allows the compiler a few more optimizations. Of course it is possible that the resulting program no longer does what you want, but actually needing weaker types is rare.

Restrict could make things go different but I've never heard someone say otherwise.

Note that we are talking about tiny differences here. They can be measured if you are careful, but they are almost guaranteed not to be something anyone would notice without measuring.

18 days ago by root-parent

>> The caveat is that modern C++ is notably more performant than C and by implication Rust.

Please provide proof for this outrageous statement.

18 days ago by mynegation

I was also dumbfounded by this claim. The only thing I could think of were C++ monomorphic templates that will avoid the penalty of some indirection and DIY dynamic typing.

18 days ago by pjmlp

And compile time programming, meaning that you can prepare some algorithms and data structures at compile time, at the expense of executable size.
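
Since the thread is about Rust, here is a rough sketch of the same idea using `const fn`. The names `build_popcount_table`, `POPCOUNT` and `popcount_bytes` are made up for illustration. The table is computed by the compiler and stored in the binary, which is the executable-size trade-off mentioned above.

```rust
/// Builds a 256-entry table of per-byte bit counts at compile time.
const fn build_popcount_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    // A `while` loop is used because `for` loops are not available in
    // `const fn` on stable Rust at the time of writing.
    while i < 256 {
        table[i] = (i as u32).count_ones() as u8;
        i += 1;
    }
    table
}

/// Evaluated during compilation and embedded in the binary (256 bytes).
pub static POPCOUNT: [u8; 256] = build_popcount_table();

/// Counts set bits in a byte slice using the precomputed table.
pub fn popcount_bytes(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| POPCOUNT[b as usize] as u32).sum()
}
```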

Compile time reflection will make this even easier.

18 days ago by musicale

Is it outrageous because "performant" is kind of a vague term. Does it mean... Fast? GPU-friendly? Scalable? Energy-efficient? Reliable? User-friendly? Maintainable? For what kind of applications?

Modern Fortran has a lot to offer for scientific and numeric computation - easier to learn than C++, and easier to optimize in many cases. Scales from small systems to supercomputers, and there is even CUDA Fortran.

18 days ago by Asraelite

> GPU-friendly? Scalable? Energy-efficient? Reliable? User-friendly? Maintainable?

Nobody uses "performant" to refer to any of those. It usually means either high throughput, or some aggregate of high throughput + low latency + low memory usage.

18 days ago by api

I think they may be talking about math algorithm heavy code, where C++'s looser almost-just-a-substitution generics system (really "templates" not even quite generics) can be used to create abstractions that compile everything down to inlined maximally performant code.

This type of code tends to be hard to maintain though.

AFAIK you can get there in Rust but it's a little more cumbersome. You have to implement a lot of operators, and for that type of code you might actually benefit from #[inline(always)] which is discouraged in normal Rust.
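
A small sketch of what that looks like, assuming a hypothetical `Vec3` type and `lerp` helper. Each operator has to be implemented by hand, and `#[inline(always)]` is used here only to illustrate the attribute; whether it helps should be measured.

```rust
use std::ops::{Add, Mul};

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Add for Vec3 {
    type Output = Vec3;

    #[inline(always)]
    fn add(self, o: Vec3) -> Vec3 {
        Vec3 { x: self.x + o.x, y: self.y + o.y, z: self.z + o.z }
    }
}

impl Mul<f32> for Vec3 {
    type Output = Vec3;

    #[inline(always)]
    fn mul(self, s: f32) -> Vec3 {
        Vec3 { x: self.x * s, y: self.y * s, z: self.z * s }
    }
}

/// Linear interpolation written with the overloaded operators.
pub fn lerp(a: Vec3, b: Vec3, t: f32) -> Vec3 {
    a * (1.0 - t) + b * t
}
```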

18 days ago by pjmlp

> This type of code tends to be hard to maintain though

Depends on which C++ version one needs to support; in C++20 and later, it is relatively maintainable with concepts, constexpr, and reflection.

18 days ago by cogman10

> modern C++ is notably more performant than C and by implication Rust

I don't think this holds. Rust has the same facilities which C++ has. Rust's metaprogramming capabilities are now on par with C++ (they weren't always). Rust has a similar generics implementation which allows it to do what C++ does in terms of method dispatch and generation. And now Rust has pretty much the same compile time constant generation capabilities that C++ has.

I don't think there's a part of C++ which isn't in Rust at this point. The only thing potentially missing is the experience and investment using those features.

18 days ago by spacechild1

> Rust's metaprogramming capabilities are now on par with C++ (they weren't always).

Is that really true, though? I haven't really written any Rust code, so I have no idea, but I don't think Rust has static reflection. Also, aren't const generics much more limited? I've also heard there is no template specialization and no "if constexpr". Or what about dynamic allocations in constexpr functions?

18 days ago by cogman10

> I don't think Rust has static reflection.

It does, and in fact had it before C++, through procedural macros. You can do everything you can do with C++ static reflection.

Now, it could be better. Proc macros require you to pull in secondary packages for parsing the token stream. But all the sorts of operations you can do via static reflection you can do via proc macros. That's how popular Rust serialization packages like serde work. It's also how some of the more popular database libs, like sqlx, work.

> Also, aren't const generics much more limited? I've also heard there is no template specialization and no "if constexpr".

Both have been added and expanded. AFAIK they are now roughly on par with what C++ const expressions can do. What they can't do, proc macros can do.

> Or what about dynamic allocations in constexpr functions?

IDK if that's possible in rust. Const expr capabilities of rust have been rapidly expanding though in the last year.

18 days ago by zamalek

Last I checked (which was a while ago to be fair), LLVM machine code quality still lagged behind GCC - so things should be slightly more interesting with the GCC back end.

There were also some bugs (hence disabled optimization passes) and missed opportunities from the lack of aliasing Rust precipitates - again, not sure where those sit - and GCC will have to play catch up here (unless there are other languages that exercise this part of the backend).

18 days ago by pjmlp

Regarding reflection: the Rust community did a very good job of driving away the person who cared to do that work, to the point that he went back to the C and C++ ISO committees.

Several features in C23 were done thanks to his work.

Also, compile-time execution is much richer in C++ than in Rust, regarding both the language features and the standard library that can be used at compile time.

Naturally none of the languages is standing still, and they will both improve on that regard.

18 days ago by cogman10

I agree. Rust could definitely be more ergonomic, and IIRC the main reason it wasn't made that way years ago was that the users of proc macros vetoed the new 2.0 API. IIRC it was over silly things, like it making some of their other crates pointless.

18 days ago by stephc_int13

C++ being more performant than C is not something that I've seen in any benchmarks or personal experience.

In practice, some of the specialization that C++ constructs made possible is also achieved by modern C compilers.

19 days ago by gobdovan

Rust is in an awkward position of being already complicated enough that adding proofs for skipping bounds checks probably will not happen for a long time, even though this kind of low-level operation is where a lot of optimisation is lost.

Compounding on this, Rust is also unstable underneath, since there is no public, stable contract for carrying high-level semantics from HIR into MIR. Because these high-level invariants are lost during compilation, the compiler cannot easily use them to prove and eliminate low-level safety checks. But even if the frontend was perfect, Rust relies on LLVM's language-neutral SCEV, which operates purely on low-level math and cannot reason about high-level language semantics.

Ultimately, a lot of things would need to change for Rust to pay no performance for safety features.

19 days ago by aw1621107

> Compounding on this, Rust is also unstable underneath, since there is no public, stable contract for carrying high-level semantics from HIR into MIR. Because these high-level invariants are lost during compilation

Not sure if I'm just out of the loop, but I'm having a hard time following this line of reasoning. Why is a public and/or stable contract needed to carry high-level semantics from HIR to MIR? Neither seems necessary to me; from what I understand HIR and MIR are rustc-internal so public contracts shouldn't matter, and the lack of stability means the Rust devs aren't precluded by backwards compatibility from modifying the IRs to add the ability to carry such invariants.

19 days ago by gobdovan

Whoops! Although there is no public contract between HIR and MIR, the public part was not relevant here. What I wanted to highlight is that if they'd want to add proper proof machinery to eliminate low-level safety checks, they'd have to do it at: surface language, which is already complex enough; then HIR->MIR boundary with clean provenance (which I think current MIR would collapse too aggressively) and which may require a much clearer contract; then, even if they do the full front- and mid- ends properly, if you leave it up to LLVM, it ends up in SCEV, which is language neutral and would not be a good fit to support the proof obligations that would be specific to Rust.

I dug up a proposal from 2021 around bounds check hoisting in MIR, and from the discussion, details are pretty thorny [0]. It's narrower than general proofs but the frictions are very similar. The easiest example that shows HIR -> MIR difficulties is the part around `for i in 0..32 { a[i] = 1; }`. At the source level the range fact is super obvious, but after the for-loop/iterator lowering the MIR optimiser has to recover that `i` comes from exactly that range before it can turn 32 checks into the one hot-path check. Then it also would have to check for panic strategy to maintain the correct behaviour after optimisation.

[0] https://github.com/rust-lang/rust/issues/92327

19 days ago by nicoburns

Of course you can write the above as:

    a[0..32].iter_mut().for_each(|el| *el = 1)

and have per-iteration bounds checks elided in Rust today.
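
For comparison, the two forms discussed above can be put side by side as a minimal sketch. The function names are hypothetical. Both panic if the slice has fewer than 32 elements.

```rust
/// Indexed loop: each `a[i]` carries a bounds check unless the optimizer
/// can prove `i < a.len()` on its own.
pub fn fill_indexed(a: &mut [u32]) {
    for i in 0..32 {
        a[i] = 1;
    }
}

/// One bounds check when taking the sub-slice, then plain iteration with
/// no per-element index to check.
pub fn fill_iter(a: &mut [u32]) {
    a[0..32].iter_mut().for_each(|el| *el = 1);
}
```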

19 days ago by aw1621107

OK, I think that makes more sense. Thanks for taking the time to explain!

19 days ago by afdbcreid

The overhead of bounds checking varies a lot. In the common case it's negligible (a few percent), but in some cases, depending on what you build, it can go up to as much as 20%. And if it prevents autovectorization it can cost even more.

There are techniques to minimize the perf loss safely, though, and of course you can use unsafe code. If you do it smartly, in the vast majority of cases bounds checks do not matter (in fact, even in C++ there is a push for a hardened standard library that does bounds checks, and e.g. Google uses that).

Rust will never include full proofs, but it might include ranged integers, which can minimize bounds checks even more.

19 days ago by CrazyboyQCD

[dead]

18 days ago by IshKebab

The benchmarks in this talk show that the bounds checks are mostly insignificant, and actually it's the integer overflow checks that are far more costly.

Actually nm, I forgot those are disabled in release mode. Good decision I guess.

18 days ago by IcyWindows

Do they even count towards safety if they aren't in release mode?

18 days ago by IshKebab

You can enable them in release mode optionally. But I would say not. Really we need ISAs to provide a no-overhead way to check integer overflow.
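
For context, a short sketch of the explicit alternatives that exist regardless of build mode. The wrapper names are hypothetical; the methods are from the standard library. Plain `+` panics on overflow when overflow checks are on (the default for debug builds) and wraps when they are off (the default for release builds). Setting `overflow-checks = true` in a Cargo profile turns the checks on in release builds.

```rust
/// Returns `None` on overflow, in every build mode.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Wraps around on overflow, in every build mode.
pub fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Returns the wrapped result plus a flag saying whether overflow occurred.
pub fn add_overflowing(a: i32, b: i32) -> (i32, bool) {
    a.overflowing_add(b)
}
```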

18 days ago by peterfirefly

You can sometimes just add asserts for the index variable(s) and have the LLVM optimizer go "hmm, I should try to prove that those are true" and then get the range checks optimized away.
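
A minimal sketch of that trick, with a hypothetical function name. There is no guarantee that the checks are removed; it depends on what the optimizer can prove from the assertion.

```rust
/// Sums the first `n` elements of `data`.
pub fn sum_first_n(data: &[u64], n: usize) -> u64 {
    // One up-front check. After it, the optimizer knows that
    // `i < n <= data.len()` inside the loop and can often drop the
    // per-iteration bounds checks on `data[i]`.
    assert!(n <= data.len());
    let mut total = 0;
    for i in 0..n {
        total += data[i];
    }
    total
}
```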

19 days ago by encodedrose

If I followed, Rust's memory safety guarantee means sacrificing roughly 3% performance, with some worst-case paths at about 15% (compared to C++ performance)?

19 days ago by marcosdumay

That's in line with the typical cost of bounds checking in C too.

But no, "memory safety" includes most of the things discussed on the slides, and those numbers are for bounds checking only.

19 days ago by encodedrose

Ah, I was using GH's webui instead of downloading to view the PDF and it stopped loading at slide#47...rereading it now paints a much better picture. Thanks!

19 days ago by Animats

There's a discussion of "delayed bounds checking", but not "hoisted bounds checking", where bounds checking is done early. Consider

    let mut tab: [usize;100] = [0;100];
    ...
    for i in 0..101 {
        tab[i] = i;
    }

This must panic at i=100. Panic becomes inevitable at entry to the loop. Is the compiler entitled to generate a check that will panic at loop entry? The slides suggest that Rust does not hoist such checks, and, so, with nested loops, it has trouble getting checks out of the loop, which prevents vectorization.

19 days ago by afdbcreid

Currently LLVM cannot do that because the panic message includes the erroneous index. You can do it manually though if you add `_ = tab[100]`.

Even if the panic message did not include the index, LLVM would be unable to do that if the previous iterations had side effects (for example if `tab` is not a local variable).
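
A sketch of the manual version, generalized to a runtime length `n`. The function name and the `n` parameter are assumptions for illustration. Note that it changes observable behaviour: with the early access, an out-of-range `n` panics before any element is written.

```rust
/// Writes `i` into `tab[i]` for every `i` in `0..n`.
pub fn fill_hoisted(tab: &mut [usize], n: usize) {
    if n == 0 {
        return;
    }
    // Manual hoist: touch the last index first, so an out-of-range `n`
    // panics here, before the loop runs. The optimizer can then often
    // prove that every `tab[i]` in the loop is in bounds.
    let _ = tab[n - 1];
    for i in 0..n {
        tab[i] = i;
    }
}
```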

18 days ago by guerby

On https://godbolt.org/ select Ada and compiler option "-O2"

    function Square(num : Integer) return Integer is
        tab : array (0..100) of integer;
    begin
        for i in 0..101 loop 
            tab(i):=i; 
        end loop;
        return tab(100);
    end Square;

The assembly code generated is :

    sub     rsp, 8    #,
    mov     esi, 11   #,
    mov     edi, OFFSET FLAT:.LC0     #,
    call    "__gnat_rcheck_CE_Index_Check"  #

The loop is not run and the exception handler is called directly.

Link : https://godbolt.org/z/qT4TsKPxz

18 days ago by Animats

Right, that's the extreme case, where the problem is detected at compile time. Unfortunately, it's not a user-visible error message at compile time.

Need to try an example where the size isn't known until run time.

19 days ago by simonask

Panics in Rust do not currently time-travel like that (including panics from failed bounds checks), and that's a good thing. The reason is that panicking does not imply terminating the process - they can be caught and handled, just like exceptions in C++. In fact, they use the same stack unwinding mechanism by default.

What the compiler is allowed to do is to shorten the loop by one and unconditionally panic after the loop, but this falls under the purview of the LLVM optimizer.
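
To make the point concrete, here is a small sketch, assuming the default `panic = "unwind"` setting. The function names are hypothetical. The panic is caught with `std::panic::catch_unwind`, and the writes made before it remain visible, which is why the panic cannot simply be moved to loop entry when the buffer is observable afterwards.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Writes `i` into `tab[i]` for `i` in `0..=tab.len()`. The last index is
/// out of bounds, so this always panics after filling every valid slot.
pub fn fill_past_end(tab: &mut [usize]) {
    for i in 0..tab.len() + 1 {
        tab[i] = i;
    }
}

/// Catches the panic and reports whether it happened and how many
/// writes landed before it.
pub fn writes_before_panic() -> (bool, usize) {
    let mut tab = [usize::MAX; 100];
    let result = panic::catch_unwind(AssertUnwindSafe(|| fill_past_end(&mut tab)));
    let written = tab.iter().filter(|&&v| v != usize::MAX).count();
    (result.is_err(), written)
}
```

With `panic = "abort"` the process would terminate instead, and the partial writes to a local buffer could no longer be observed.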

19 days ago by afdbcreid

It's true that panics (unlike UB) cannot automatically time-travel, but your justification is weak. Recovering from panics can only prevent this optimization if the loop has side effects, and LLVM knows when panic=abort is set.

18 days ago by Animats

The post-panic situation is a problem in Rust. After a panic, you're in a somewhat abnormal state. Rust panics are not supposed to be a catchable exception system. If something other than program termination is in the near future, that's a problem.

That does create a problem for early panics, panicking when panic becomes inevitable but has not happened yet. This deserves more thought.

18 days ago by simonask

I mean, sure, dead code elimination applies to all optimized code. The important thing to understand is that panicking in Rust does not get magic treatment by the Rust compiler. It’s just a function that is declared in the type system to never return.

19 days ago by jojomodding

Once it shortens the loop, the compiler can also observe that `tab` is a local variable and therefore move the writes up "to the initializer." It can then see that the variable is unused and delete it, and also delete the loop.

19 days ago by edevrk

[flagged]

19 days ago by nicoburns

A little slower but safe is a pretty good default I think. Most of the time you're not in a hot loop and even a 5% slowdown would be negligible.

And in the cases where you are in a hot loop you just have to put in a little extra effort to optimise it and gain the performance back, either by writing the code in a way that allows the compiler to prove correctness (e.g. using an iterator or assert), or by using the `unsafe` keyword to "pinky-promise" to the compiler that your usage is correct.

IME that extra effort in performance-critical places almost always ends up being a lot less than the effort needed to avoid correctness/safety issues in mundane boilerplate/glue/plumbing code in C++.

Especially as Rust's package management system means that often you don't even have to do that optimisation work yourself: you can just pull in a crate that's done it for you (and Rust's safety guarantees make that a much less scary thing to do than it is in C++)

19 days ago by kibwen

C++'s experience has caused Rust to rightfully learn the lesson that you don't allow optimizations to change the semantics of the program like that. Rust's goal is to be fast enough that any performance difference between C or C++ is too negligible to bother considering, and it's achieved that. It's not going to sacrifice reliability on the altar just to make up a measly 3% gap. There are plenty of ways that Rust's stricter semantics allow it to produce faster code than C++ (no move constructors or implicit copy constructors, thorough reference aliasing information, automatic generic struct layout optimizations, safe non-atomic refcounting, safe concurrent stack references, less defensive copying, etc.), it does not need to "convince C++ language people" of anything.
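
One documented instance of the layout optimizations mentioned above is the null-pointer niche: `Option<Box<T>>` and `Option<&T>` are guaranteed to be the same size as the bare pointer. A minimal check, with hypothetical function names:

```rust
use std::mem::size_of;

/// `None` is stored as the null pointer, so no extra tag is needed.
pub fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u64>>>() == size_of::<Box<u64>>()
}

/// The same holds for shared references.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<&u64>()
}
```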

19 days ago by zozbot234

If you can detect a case where a panic "time traveling" would meaningfully improve performance, you could simply let the compiler issue an optional warning that would allow for auto-fixing the code to have those different semantics.

18 days ago by santiagolertora

[dead]

19 days ago by suis_siva

I worked professionally with C, C++, Zig and Rust (in that order). My experience is that writing performant code is by far the easiest in C++, and by far the most difficult in C. Most of this, in practical experience, is due to ergonomics, in my opinion.

Templates in C++ benefit from being part of the core language, -- stick a `template` above your `class`, and you're in metaprogramming land. Stick a template specialization, and you've done a niche optimization. You didn't need a separate crate or a whole macro DSL. Variadic templates are also really really nice for monomorphizing N-ary generic functions. The duck typing of templates makes

This is precisely where I struggle with Rust the most -- monomorphization is limited within generics, so you end up going to the `proc_macro` hell, which involves a separate crate, a separate Cargo.toml, etc.

Zig seems like it would fit the bill -- and doing micro-optimizations within zig is surprisingly easy. The language's comptime facilities allow for really good niche optimizations -- however, the language also has some strange decisions. The allocator interface is notoriously a vtable, so a lot of the DOD optimizations that andrewrk has spoken numerously of (and to be clear -- I did learn a lot about DOD from his talks back when I was a wee engineer), raise one of my eyebrows.

C seems like it should be fast, but implementing any data structure, any generic algorithm in C is impossible. Either you're copy-pasting, or you're making macro DSLs. None of which is great.

---

To further talk about the C++ situation -- the monomorphic allocator interface was always awesome. Compared to Zig's vtables and Rust's nothing (up until a couple days ago), having a way to pass custom allocators with types was awesome. The new std::pmr::* interfaces and containers are also really exciting -- monomorphization, as beautiful as it is, does cost a lot -- refactoring it is not easy, compilation times are a mess. Sometimes the right tool is a vtable interface, and, C++ gives you those facilities.

And this is C++'s no1 problem when it comes to performance too -- it's a leviathan -- it'll give you the tools to write REALLY fast code, but it will also give you inheritance -- forget about your caches then.

When I was working at Tesla, there were some pretty gnarly vtable jumps in firmware (of all places), and I suspect part of that could've been alleviated if people knew more about CRTP.

So, here's where I land -- C++ really will give you the tools it can to let you write the fastest code possible. But it will also give you the tools to make your code really slow. Committee language means everyone in the committee needs to be happy.

Rust, on the other hand, is really designed to promote safe-but-very-fast practices -- had the firmware that I discussed used Rust, my guess is that we would've gravitated towards generics and monomorphization, rather than the heavy dynamic inheritance. C++, when it comes to performance, as it does to all other things, is a barreled shotgun. Rust's design almost always promotes the best available pattern and that's why I rarely reach out for C++.
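
A rough sketch of the two styles in Rust, with made-up names (`Sensor`, `Fixed`, `total_static`, `total_dynamic`). The generic version is monomorphized per concrete type, which is roughly what CRTP buys you in C++. The `dyn` version goes through a vtable, and Rust makes that choice explicit at the type level.

```rust
pub trait Sensor {
    fn read(&self) -> u32;
}

pub struct Fixed(pub u32);

impl Sensor for Fixed {
    fn read(&self) -> u32 {
        self.0
    }
}

/// Static dispatch: one compiled copy per sensor type `S`, with `read`
/// resolved at compile time and available for inlining.
pub fn total_static<S: Sensor>(sensors: &[S]) -> u32 {
    sensors.iter().map(|s| s.read()).sum()
}

/// Dynamic dispatch: a single compiled copy, with each `read` call made
/// through the trait object's vtable.
pub fn total_dynamic(sensors: &[Box<dyn Sensor>]) -> u32 {
    sensors.iter().map(|s| s.read()).sum()
}
```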

17 days ago by jccx70

[dead]

19 days ago by jarym

I've been doing more and more Rust. Even with sccache the compile times are not great, so for any moderately sized codebase that requires frequent rebuilds I don't know how everyone else is doing it.

19 days ago by wongarsu

I'd assume mostly by avoiding the need for frequent rebuilds. Incremental builds are pretty fast (at least fast enough for my needs on a moderate codebase), full rebuilds can be brutal

There are also some optimization tricks related to how you split your code among crates, since a unit of compilation is mostly one crate. Putting your FFI code in a separate crate (-sys crates are the norm) and splitting some of your code in libraries that can be compiled in parallel are the common examples

19 days ago by unsolved73

The linking of the project can take more time than the actual compilation.

Use the lld linker instead of the default one, see https://kerkour.com/rust-production-checklist#use-the-lld-li...

17 days ago by culebron21

I split into subcrates and also reduce the number of proc macros (e.g. got rid of Serde).

18 days ago by peterfirefly

On Windows? Use a dev drive.

19 days ago by Panzerschrek

For a couple of years I have been writing an advanced software rasterizer (like in old PC games) in Rust. With a little bit of unsafe code it was doable and the resulting performance was great. I only used unsafe in places mentioned in the article above, like in tight loops where the compiler's optimizer struggles to remove bounds checks, and in a couple of places where CPU intrinsics were used.

19 days ago by DeathArrow

I was looking at Zig. It's performant, and it's easier to reason about Zig code than Rust code, but its API is unstable and there are a lot of breaking changes. Coding agents have a difficult time writing proper Zig because of the breaking changes and the small amount of new Zig code in the wild.
