18

I have been told that Rust is both safer and faster than C++. If that is true, how can that even be possible? I mean, a safer language means that more code is written inside the compiler, right? More code means more instructions for the processor to process.

In this case, I'm thinking about arrays. In C and C++, you can read outside an array's bounds without any errors or warnings; you only get garbage values back. But in Rust, the compiler won't compile the program.

No, I'm not a Rust user. I'm a C user.

Sebastian Redl
euraad
  • 39
    Your assumptions that more security means more code and that more code means slower execution are both wrong. And your assertion about boundary checks in Rust not compiling the program is also wrong. – Euphoric Aug 10 '23 at 13:11
  • 26
    "I mean, a safer language means that more code is written inside the compiler, right?" What do you mean by this? A safer language might need a more complex compiler to type-check it and enforce its safety, but that would make the compilation time slower, not the run time. And for what it's worth, C++ isn't exactly known for having dashing compile times, either. – Alexander Aug 10 '23 at 13:23
  • 4
    @Alexander: to be fair, safety is usually achieved by compile time *and* run time measures. – Doc Brown Aug 10 '23 at 13:44
  • I suggest that you focus the question on how Rust can implement bounds checks while having (reportedly) near-C performance on standard benchmarks. – JimmyJames Aug 10 '23 at 13:51
  • This may help: https://nnethercote.github.io/perf-book/bounds-checks.html – JimmyJames Aug 10 '23 at 14:35
  • 55
    Safety obtained at compile time is free at run time. – candied_orange Aug 10 '23 at 14:41
  • 1
    @DocBrown Agree, but his quote says "more code is written inside the compiler", which I understood to mean he's only thinking about static time checks, not code written _by_ the compiler, to perform runtime checks – Alexander Aug 10 '23 at 15:01
  • 19
    @candied_orange: A good example of this was Microsoft Research's Singularity OS where both the OS and the applications were written in Sing#, a type-safe, memory-safe, pointer-safe language. As a result, Singularity could remove a lot of the runtime checks typically used in other OSs: all code ran in Ring 0 of the CPU, all code ran in a single address space, etc. MS called these processes "SIPs" for "Software-Isolated Processes". SIPs could only communicate via message passing, but the protocol definition was provided in machine-readable format as part of the installation package, so the OS … – Jörg W Mittag Aug 10 '23 at 15:03
  • 8
    … could generate code which uses shared memory. Since this code was generated by the OS, the OS could guarantee that all memory accesses were safe, thus allowing for message-passing semantics and safety with shared-memory performance. – Jörg W Mittag Aug 10 '23 at 15:04
  • 4
    ‘You only get garbage values back’ if you're lucky! Your entire program becomes garbage. Such are the perils of undefined behaviour. – LeopardShark Aug 11 '23 at 19:00
  • 1
    @LeopardShark: Luck would have nothing to do with it if one uses an implementation which, *as anticipated by the authors of the Standard*, extends the semantics of the language by specifying how it will process cases for which the Standard imposes no requirements. – supercat Aug 11 '23 at 22:28
  • @Alexander *something* has to check array accesses for boundary errors, and that something is extra code which gets executed every time you try to access an array element. That's just one example of run-time safety checks. – RonJohn Aug 12 '23 at 01:28
  • @RonJohn I know about those, but that isn't what OP was referring to (or he misspoke). I was responding to the quote: "I mean, a safer language means that more code is written ***inside*** the compiler, right?" He didn't say "more code is written ***by*** the compiler." I thought he was referring to the compile-time cost of more complex type systems, language rules and static analysis. He may have just misspoken, and indeed, some safety features require runtime checks to enforce, but not all (something something "zero cost abstraction", "Rust", etc. ;) ). – Alexander Aug 12 '23 at 05:28
  • @supercat This is absolutely true, but the list of such implementations [seems to exclude Clang](https://godbolt.org/z/Edzjvoz99) (although it does give a warning). – LeopardShark Aug 12 '23 at 10:49
  • 1
    @Alexander there are enough ESL people asking questions on SE that I usually overlook such slight inconsistencies. (Not in *answers*, though; that would get a comment asking for clarification.) – RonJohn Aug 12 '23 at 17:26
  • @LeopardShark: The Standard allows compilers which are intended for tasks involving exclusively trustworthy inputs to behave in completely arbitrary fashion if a two-dimensional array is accessed with an inner subscript and compilers like clang and gcc are designed to identify inputs which would cause such accesses, conditions that could only be false if such inputs were received, and bypass such conditional checks. When using compilers that perform such transforms, it's impossible to predict anything about program behavior, but people wanting to sell compilers design them to... – supercat Aug 12 '23 at 18:52
  • ...produce efficient code without violating the principle "if the environment would always process a read of a certain address without side effects, an action that performs a read from that address will never have side effects beyond yielding a possibly meaningless value". Any compiler that doesn't go out of its way to disregard that principle would naturally uphold it, and violations of that principle will seldom yield performance wins outside cases where nothing an implementation might do when given invalid data would be deemed unacceptable. In other cases, the effect of violating... – supercat Aug 12 '23 at 18:58
  • ...that principle is to make it impossible to generate the most efficient machine code *that can be guaranteed to satisfy application requirements*. – supercat Aug 12 '23 at 19:01

5 Answers

57

You seem to have quite a few misconceptions, which I'll address alongside your question. If anything is still unclear, please feel free to comment on this answer so I can address it.

Safer

Yes, Rust is safer.

Safety is one of Rust's raisons d'être, after all. Specifically, outside of unsafe blocks, Rust is memory safe and type safe, that is:

  • Memory safe: it is not possible to access memory that has not been allocated, or has already been deallocated.
  • Type safe: it is not possible to interpret a region of memory as if it held a value of a given type while it actually holds a value of another type.

Furthermore, a number of operations that would be Undefined Behavior in C or C++ have well defined semantics in Rust instead, such as signed integer overflows.
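As a minimal sketch of this point, here is how Rust surfaces signed overflow as an explicit, well-defined choice rather than undefined behavior (this is standard-library API, nothing assumed beyond that):

```rust
fn main() {
    let x: i32 = i32::MAX;

    // Explicitly requested wraparound: defined two's-complement behavior.
    assert_eq!(x.wrapping_add(1), i32::MIN);

    // Overflow surfaced as a value instead of UB.
    assert_eq!(x.checked_add(1), None);
    assert_eq!(1_i32.checked_add(2), Some(3));

    // A plain `x + 1` would panic in debug builds and wrap in release
    // builds; either way the behavior is defined, unlike signed overflow
    // in C or C++.
}
```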

Memory Safety -- which underpins Type Safety -- is generally further split into two categories:

  • Spatial Safety: the inability to access out of bounds.
  • Temporal Safety: the inability to access before allocation/after deallocation.

Rust achieves Temporal Safety by compile-time checks. This does mean a bit more code in the compiler1, but there's no run-time footprint -- whether memory or instructions -- so it's "free" at run-time.

Rust achieves Spatial Safety with a combination of library-provided abstractions, and run-time checks. For example, iteration in Rust is achieved by for x in collection. For an array this boils down to pointer increment -- just like in C or C++, so no run-time overhead -- except that since the user doesn't manipulate the pointers directly, it's safe.
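A small sketch of that iteration model, plus the fallible indexing the standard library offers (`get` returns an `Option` instead of reading out of bounds):

```rust
fn main() {
    let arr = [10, 20, 30, 40];

    // The iterator owns the cursor, so it can never step out of bounds:
    // no per-element bounds check, no pointer manipulation by the user.
    let mut sum = 0;
    for x in &arr {
        sum += x;
    }
    assert_eq!(sum, 100);

    // Fallible indexing: out-of-range access yields None, not garbage.
    assert_eq!(arr.get(2), Some(&30));
    assert_eq!(arr.get(99), None);
}
```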

Apart from that, there are various run-time checks: bounds-checks for index access2, null-checks for Option, etc. Those may or may not be optimized out, and thus may lead to a slight run-time overhead. The "trick" of Rust, however, is that if performance really matters (as profiled) and the optimizer is not managing to optimize well enough, it's always possible to try and massage the code (front-loading a bounds-check, for example) or in the worst case to delve down to unsafe Rust, and thus achieve the required performance target with only localized "Here Be Dragons" code.

1 Rust compile times are on-par with C++, hence quite slower than C, but the safety checks (borrow checks) are an insignificant part of that. Extensive use of generics, traits, and type-inference make the language much more difficult to compile, and the compilation model (full crate at a time) makes parallelization trickier. Still, work is in progress to parallelize the rustc front-end, which should bring compile times down substantially... in a few years.

2 Rust does NOT check indexes at compile-time in general but checks them at run-time instead. A branch itself is typically not a problem at CPU level, if well-predicted. The real impact of bounds-checks is that unless optimized out, they prevent auto-vectorization and a number of other optimizations. This is why front-loading a bounds-check (before the loop) can be so effective performance-wise: it may unlock all those optimizations.
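The front-loading technique described above can be sketched like this (the function name is hypothetical, used only for illustration):

```rust
// Sums the first `n` elements of `data`.
fn sum_first_n(data: &[u32], n: usize) -> u32 {
    // Front-loaded bounds check: the slicing panics here, once, if `n`
    // is out of range. Inside the loop, `s[i]` is provably in bounds,
    // so the optimizer can drop the per-iteration checks and is free
    // to auto-vectorize.
    let s = &data[..n];
    let mut total = 0;
    for i in 0..s.len() {
        total += s[i];
    }
    total
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    assert_eq!(sum_first_n(&v, 3), 6);
    assert_eq!(sum_first_n(&v, 5), 15);
}
```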

Faster

Idiomatic Rust is likely faster.

First of all, in a "pedal to the metal" situation, all 3 languages can achieve the same performance. Heck, all 3 languages allow embedding assembly, if it comes to that.

So, what we are really asking, is whether idiomatic (not pessimized, not maximally optimized) code may be faster in one language or another, and in those conditions Rust has a certain number of advantages.

The first and foremost advantage of Rust is one of culture. The Rust community is very concerned about performance. This may seem trivial, but it has deep implications. Most notably, Rust code is always compiled from source, and there is no stable ABI (yet). This means that even the standard library provided data-structures can be remodeled extensively as long as their API is left untouched, and leads to Rust having the best-performing HashMap3 of all languages in its standard library:

  1. It started with Robin-Hood Hashing with Backward Shifting Deletion, which was already faster than std::unordered_map.
  2. Then moved to a completely different hash-map, based on Swiss Table once Abseil was released.

By comparison, the C++ standard library implementations of std::unordered_map cannot be changed4, nor can their std::regex implementations...

The second advantage of Rust is trust5: the Rust developer can put their trust in the compiler, and count on it to prevent users from accidentally fiddling with private data or violating soundness:

  • In C, it's common to forward-declare structs in header, but never expose their definition so users can't (accidentally) fiddle with them. Unfortunately, the use of such "opaque pointers" then is generally followed by heap-allocating the struct... which is detrimental to performance. In C++ and Rust, that's never a problem, so C++ and Rust programs allocate less.
  • In C and C++, it's common to program defensively. Copies are made when the lifetime of the input is unclear, for example, to avoid use-after-free. In Rust, that's never a problem, so Rust developers are more likely to be brash and avoid copies... knowing the compiler has their back.
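A minimal sketch of that last point: a function that only needs to read its input can simply borrow it, and the borrow checker (not a defensive copy) guarantees the data stays alive for the duration of the call. The function here is hypothetical, for illustration only:

```rust
// Borrow instead of copying: the function only reads the string, so it
// takes &str. The compiler guarantees `name` outlives the call, so no
// defensive clone is needed.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let name = String::from("Rust");
    let g = greeting(&name); // no copy of the underlying buffer
    assert_eq!(g, "Hello, Rust!");

    // `name` is still usable afterwards: we only lent it out.
    assert_eq!(name.len(), 4);
}
```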

And finally, the Rust language has a few tricks up its sleeve that may lead to better code out of the box:

  • Rust has fine aliasing control from the get go:
    • This means no Strict Aliasing rule is necessary, greatly avoiding char* (and co) performance pitfalls.
    • This means noalias (the LLVM equivalent of __restrict) being used profusely and automatically, as &mut T is __restrict T*.
  • Rust has niche optimizations. That is, it knows some types do not use all their values, and will "pack" enum discriminants in the unused values whenever possible. As a result Option<bool> is 1 byte (like bool) and Option<NonNull<T>> (an option non-null pointer to T) is the same size as *mut T (a possibly null pointer to T).
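The niche optimizations above can be verified directly with `std::mem::size_of`:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // `bool` only uses 2 of its 256 bit patterns, so the `None`
    // discriminant fits in an unused pattern: no extra byte needed.
    assert_eq!(size_of::<Option<bool>>(), size_of::<bool>());

    // `NonNull<T>` never holds the null bit pattern, so `None` is
    // encoded as null: the Option is pointer-sized, like *mut T.
    assert_eq!(size_of::<Option<NonNull<u64>>>(), size_of::<*mut u64>());
}
```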

(It's not all roses, though: the defined behavior of signed integer overflow prevents a number of integer-based loop optimizations... though the impact of that is likely low.)

3 Bryan Cantrill, who used to be a C kernel hacker at Sun Microsystems, was surprised the first time he naively translated a little C program he had around to Rust to get a feel of the language. This being his first Rust foray, he expected his lack of proficiency in the language would result in slow code... but his Rust program ran faster than his C one, on top of being shorter! He double-checked the output, and after concluding both were correct, did the only thing that made sense: he profiled them. It turns out that in his C program he had hand-rolled a quick hash-map implementation, since dependencies are so annoying in C, while in Rust he had just used the standard one... and his quick C implementation was quite naive. Nowadays, Bryan Cantrill is a Rust kernel hacker ;)

4 And that's on top of inheriting a crippling "memory stability" guarantee from std::map, in order to be more of a drop-in replacement.

5 You can't spell Trust without Rust.

8bittree
Matthieu M.
48

Your assumptions are wrong on two counts:

  • safer code does not necessarily mean more instructions, when the safety is obtained with safer language design and rules;
  • one language is not faster than another: it's language implementations that are.

Regarding the claim of faster and safer: there is an extensive study comparing the performance of languages which concludes, based on benchmarks, that Rust can be faster and more energy efficient than C++. At the same time, you can't have it all, and it appears that for this specific benchmark suite, C++ is more memory efficient.

But general statements about language performance and safety are always misleading:

  • Benchmarks are only valid for one kind of problem and with a given algorithm. If you look at the details of the study, you'll find out that for 2 of the 10 benchmarks (e.g. binary trees), C++ is faster and more energy efficient.
  • Benchmarks measure a particular implementation and not a general language. For C++ there can be much slower code depending on compiler options. And you can have compilers with higher or lower performance for the same language.
  • Real-life software also often requires the use of libraries or frameworks, and safety is then that of the weakest link in the chain.

My advice: choose the language that is the most suitable for the kind of problems you are trying to solve.

Disclosure: I'm not a Rustler

Christophe
  • 15
    "Disclosure: I'm not a Rustler" – Proof: https://rustaceans.org/ :-D – Jörg W Mittag Aug 10 '23 at 14:57
  • 6
    Let me add, two benchmarks for two programs written in two different languages are obviously not comparing the same thing, even when translated by a human in a 1:1 manner. Of course, it depends a lot on how similar the two languages are. – Doc Brown Aug 10 '23 at 15:12
  • 3
    @DocBrown: That's especially true in many cases where e.g. language #2 would be slightly slower than #1 when performing a task in the same manner, but language #2 also allows the task to be done twice as quickly using an approach which language #1 doesn't support. – supercat Aug 11 '23 at 22:32
  • @supercat: not really - my point is that oversimplifying comparisons (which your comment still does) don't make much sense. It means when you solve the same task with, lets say, three different implementations A1, A2, A3 in one language, and B1, B2, B3 in another one, it may be unclear which Ax to benchmark against which By for a fair comparison, especially when both languages and environments are sufficiently different. – Doc Brown Aug 12 '23 at 03:43
  • @Christophe "one language is not faster than another: it's language implementations that are." - this is true. It is also true that language features introduce constraints that impose an upper bound on speed. Python would be difficult to make fast, since it offers so much runtime dynamism, etc. – Bennet Aug 12 '23 at 16:18
  • @Bennet I still think it depends on the problem to solve. If your code spends most of the time searching and using maps, the python code will not be slower, at least not if you use cython and some of the newer JIT optimizers on the market. In the end, if the performance is that important, problem-specific benchmarking may be your friend. – Christophe Aug 12 '23 at 17:39
  • @supercat conversely, one can make safe programs using modern C++, e.g. vectors, smart pointers, and avoid most of the risks that you can have using low-level allocation mechanisms. Likewise, I'm pretty sure that you can write code doomed to deadlock in rust, whatever the safety of the language. As said, I'm not so keen on general statements and believe more in fact-based decision making – Christophe Aug 12 '23 at 17:42
  • @Christophe: I was trying to agree with your statement that fair comparisons are hard. If language #1 only supports an inherently slow way of doing something, but at tries to support it as efficiently as possible, while language #2 supports better ways of doing things, and the maintainers of #2 expect that anyone who cares about performance would use faster ways, using the same approach in both would unfairly hobble language #2. – supercat Aug 12 '23 at 20:01
  • @Bennet is it true? Or is it only the case that two programs are (statistically or asymptotically) comparable? What does it mean to say that (for example) CPython is faster or slower than LuaJIT or rustc? I can't come up with a reasonable definition that doesn't admit some absurdity or else is incredibly hard to measure. – D. Ben Knoble Aug 12 '23 at 23:31
23

To look at this a different way, Fortran is:

  • Safer than C, because it doesn't have C-style unrestricted pointers.
    • "Unrestricted" here meaning "you can do arbitrary operations on them" rather than anything directly related to C's restrict keyword, although the two concepts are related.
  • At least for some applications, faster than C: the lack of unrestricted pointers means it's easier for the compiler to make optimisations; unrestricted pointers mean the compiler has to be conservative about the assumptions it can make as to when the values of things might change in a non-local manner.

Does this mean Fortran is "better" than C? No, there are things you can do in C you can't do in Fortran, and some things are faster in C than in Fortran. The two languages have made different trade-offs so end up with different characteristics; the same is true for C++ vs Rust, or any other pair of languages.
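For what it's worth, Rust gets a guarantee in the same spirit from its reference rules: two `&mut` references can never alias, so the compiler may cache values across writes much like a Fortran compiler can with its no-alias arguments. A small sketch (the function is hypothetical, for illustration):

```rust
// Because `a` and `b` are both &mut, the compiler knows they cannot
// alias: it may keep `*b` in a register across the writes to `*a`.
fn accumulate(a: &mut i32, b: &mut i32) {
    *a += *b;
    *a += *b; // *b is provably unchanged by the previous line
}

fn main() {
    let mut x = 1;
    let mut y = 10;
    accumulate(&mut x, &mut y);
    assert_eq!(x, 21);

    // `accumulate(&mut x, &mut x)` would be rejected at compile time:
    // the borrow checker forbids two simultaneous &mut to the same data.
}
```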

Philip Kendall
11

I have been told that Rust is both safer and faster than C++. If that is true, how can that even be possible?

One way Rust does this is that it refuses to compile many programs for which C++ would assume that the programmer knows what they are doing. C++ is notorious for undefined behavior, where if you e.g. have a pointer to an object that has already been destroyed, anything can happen.

Rust would refuse to compile such a program, and any other program for which it cannot determine that no dangling pointers exist. This can include programs that are completely functional but whose safety properties cannot be readily determined.

In practice I find this puts a different kind of load on the programmer. Instead of having to convince yourself of correctness, you now have to convince the compiler. Less possibility for mistakes, but it requires a more rigorous approach to programming.
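As a sketch of what "convincing the compiler" looks like: a returned reference carries a lifetime tying it to the input, so the signature itself is the proof the compiler demands that the result cannot dangle. The function below is illustrative only:

```rust
// The returned &str borrows from `s`: the lifetime in the (elided)
// signature guarantees the result cannot outlive its source.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    let w = first_word(&text);
    assert_eq!(w, "hello");

    // A variant returning a reference to a local would not compile:
    //   fn dangling() -> &'static str {
    //       let local = String::from("x");
    //       &local // error: `local` does not live long enough
    //   }
}
```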

As for speed, the theory is that when the compiler knows more, it can optimize better. So far I'd say this does not always happen in reality with Rust, but that depends a lot on the specific code involved.

jpa
3

Optimization in modern C and C++ is based around the notion that in any situation where a useful optimizing transform might have any observable effect on a program's execution, nothing an implementation does would be deemed unacceptable.

Suppose the following function is called by a function which ignores its return value:

char arr[65537];

unsigned test(unsigned x)
{
    unsigned i = 1;
    while ((i & 0xFFFF) != x)  /* cannot terminate when x is 65536 or larger */
        i *= 3;
    if (x < 65536)
        arr[x] = 1;
    return i;
}

There are three plausible ways an optimized version of the function might behave if code which calls the function ignores its return value and passes a value of x that happens to be 65536 or larger:

  1. The generated code might loop forever without returning, which is what would happen if the program is processed as a sequence of sequential steps.

  2. The generated code will skip the loop since nothing will care about the value of i, but skip the assignment to arr[x] because x is not less than 65536.

  3. The generated code will unconditionally store 1 to arr[x], even if x is 65536 or bigger, on the presumption that the function will never be invoked in any circumstance where storing 1 to arr[x] would be unacceptable.

If having a function hang when given certain inputs would be tolerable (e.g. because the environment would have a way of killing the process after a timeout) but unconstrained writes to memory would not be, requiring that programmers add extra code to prevent outcome #3 above would nullify any advantage that approach #2 could have offered. If, however, a language can e.g. treat the execution of a side-effect-free loop as unsequenced relative to following operations which have no dependencies upon values computed therein, yielding behavior #2, such treatment would be observably inconsistent with processing the program as a sequential series of steps, but may nonetheless be acceptable in many cases where behavior #3 would not.

While I don't know to what extent Rust does this, a language which allows programmers to safely write code whose behavior may deviate from sequential program execution and yet still satisfy application requirements can offer compilers many more useful opportunities for optimization than one which makes it necessary for programmers to write code in ways that allow no such flexibility.

bug
supercat
  • Minor nit: In this example, isn't `i` guaranteed to wrap around on overflow, since you aren't truncating the value of `i` after multiplication? There's no undefined behavior, although the width of `i` is implementation-defined. Therefore, none of the optimizations are allowed. (They would be if `i` were signed.) – Davislor Aug 12 '23 at 22:22
  • 1
    @Davislor Yes, the multiplication will wrap around without issue, however the loop would never exit if `x > 0xffff`. One unintuitive rule of C and C++ is that an implementation may assume any code path eventually has some "observable" effect (I/O, system call, etc.). Therefore an infinite loop with no observable effects is itself undefined behavior, which a compiler may compile to an infinite loop, a no-op, or even nasal demons. – Vaelus Aug 13 '23 at 14:56
  • @Vaelus: Clang treats the loop as UB, but it's not clear that the Standard would allow that (except for the One Program Rule, which means it doesn't forbid anything). The Standard says that it uses precisely three ways of describing UB, and its terminology with regard to loops doesn't match any of those. In nearly all fields of human endeavor except the development of certain C compilers, the phrase "may assume" allows actions, but not causal inferences, and does not allow unbounded consequences if the assumption proves false. In a physics problem, "You may assume gravity is uniform..."... – supercat Aug 14 '23 at 15:10
  • ...wouldn't mean that the entire problem is meaningless because gravity would never be perfectly uniform, but is instead shorthand for "one may assume that imprecision in the answer which occurs because of gravitational deviations will be viewed as tolerable". There may not have been consensus as to exactly what deviations from pure sequential behavior should be presumed tolerable, but that doesn't imply any consensus intention to invite unbounded deviations in cases where any deviations might occur. – supercat Aug 14 '23 at 15:15
  • @Vaelus: More useful would be a general permission to say that sequencing rules are agnostic to the existence of side effects not anticipated by the Standard, and side effects which would not be considered observable may be deferred until the end of the universe. If a compiler transforms a piece of code X which does not rely upon some side effect from some other piece of code Y into a piece of code X' which does rely on Y, however, then it would need to recognize as observable *any side effects upon which X' relies*. – supercat Aug 14 '23 at 15:24
  • I'm referring to the forward progress guarantee in [section 6.9.2.2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4849.pdf#subsubsection.6.9.2.2) of the C++20 (draft) standard. The section lists a set of assumptions an implementation may make about any thread of execution, and doesn't impose any requirements if the assumptions don't hold. To me, violating programs seem to fit the definition of undefined behavior. Regardless, I agree that it would be useful for the standard to be more specific or for implementations to provide some guarantees beyond what the standard requires here. – Vaelus Aug 14 '23 at 21:57
  • @Vaelus: Neither the C nor C++ Standard does a good job of recognizing situations where an optimizing transform might affect program behavior in a manner which, although observable, would be *compatible with many useful programs' application requirements*. Many programs have as a primary application requirement "Do not allow creators of malicious data to execute arbitrary code of their choosing". If having a program hang when given a maliciously contrived data file would be annoying but nonetheless acceptable, while having it execute malicious code within that file would not be, ... – supercat Aug 14 '23 at 22:04
  • @Vaelus: ...allowing an implementation to treat side-effect-free loops as no-ops regardless of whether they terminate (but guaranteeing that they would either behave as no-ops or block program execution) would allow more useful optimizations than would saying that the only way for a program to uphold the "no malicious code execution" requirement would be to prevent side-effect-free endless loops at all costs. Unfortunately, there's no way the authors of the Standard can say that compiler writers have grossly misunderstood their intention. – supercat Aug 14 '23 at 22:07