As far as I know, the big argument for C#, Java and other high-level languages being memory-managed by a runtime environment is that the programmer does not have to take care of garbage collection or destruction of objects (unless she wants to). But is that really something a compiler could not do? It should be fairly algorithmic to know at compile time when an object is no longer needed, so the cleanup code could be generated and inserted in the right places in the compiled file. That way you could create software in these languages that is compiled to machine code and runs natively. (Ok, let's ignore that these languages are currently cross-platform; let's pretend we don't need that.)
-
You might be interested in the reference counting systems used in Swift, CPython, Rust and C++ – Alexander Feb 05 '21 at 14:06
-
Memory management is a runtime concept; programs don't know at compile-time when an object is no longer needed. Thought experiment: does the compiler know what data someone is going to put into a program? – Robert Harvey Feb 05 '21 at 14:06
-
What about when object cleanup depends on variable conditions that only exist at run time? How would the compiler know how to handle those? – FrustratedWithFormsDesigner Feb 05 '21 at 14:16
-
@Robert Harvey Ok, but C++ programmers are supposed to know it before run time; that's why they put the memory-managing part in the source code. So this decision needs to be made before runtime, and the compiler also does its job before run time. Maybe what I feel the need for here is a smart compiler, something with serious AI capabilities, and the future might provide that? I mean, any decision a human must make could be made by a smart enough algorithm too. – stevie Feb 05 '21 at 14:36
-
C++ programmers use RAII, which is essentially a fancy term for creating the resources your class needs in the constructor, and disposing of those resources in the destructor. I suppose you could write a compiler that does all that housekeeping for you, though I would argue that you've essentially re-invented garbage collection if you do that. – Robert Harvey Feb 05 '21 at 14:42
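(For readers who want to see that scheme in code: below is a minimal sketch in Rust rather than C++, since Rust's `Drop` trait plays the destructor's role in the same acquire-in-constructor, release-in-destructor pattern and the compiler inserts the cleanup call at the end of the owning scope. The `Logger` type and the file name are invented for the illustration.)

```rust
use std::fs::File;
use std::io::Write;

// Resource acquired in the constructor-like `new`, released automatically
// when the value goes out of scope: the compiler inserts the cleanup call.
struct Logger {
    file: File,
}

impl Logger {
    fn new(path: &str) -> std::io::Result<Logger> {
        Ok(Logger { file: File::create(path)? })
    }

    fn log(&mut self, msg: &str) -> std::io::Result<()> {
        writeln!(self.file, "{}", msg)
    }
}

impl Drop for Logger {
    fn drop(&mut self) {
        // Runs automatically at the end of the owning scope; the File
        // then closes itself as well.
        println!("logger closed");
    }
}

fn main() -> std::io::Result<()> {
    let mut logger = Logger::new("app.log")?;
    logger.log("hello")?;
    Ok(())
} // `logger` goes out of scope here; the drop call was inserted by the compiler
```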
-
"It should be fairly algorithmic to know at compile time when an object is no longer needed". Sure, if you can solve the halting problem. – Polygnome Feb 05 '21 at 14:44
-
"it should be fairly algorithmic to know at compile time when an object is no longer needed" - Nope. https://softwareengineering.stackexchange.com/questions/370579/why-arent-java-objects-deleted-immediately-after-they-are-no-longer-referenced – whatsisname Feb 05 '21 at 16:05
-
@Polygnome Are you saying that programmers can solve the halting problem instead? – stevie Feb 05 '21 at 17:52
-
@stevie: If you subscribe to the Strong Church-Turing Thesis, then no. A human cannot compute anything that a computer can't compute. – Jörg W Mittag Feb 05 '21 at 19:51
-
Well, we have yet to solve generalized AI, so. – Robert Harvey Feb 05 '21 at 19:53
-
@stevie Humans can *guess*, and do over- and under-approximations in an intuitive way a compiler can't. And we can solve the problem of memory allocation for specific instances of programs. We can recognize that we are writing a program where reasoning becomes hard and refactor to avoid pitfalls. We do not need to solve it in the general case like a compiler would need to. If we humans write a program we can't cope with, we rewrite it. A compiler would need to solve the problem for every possible program. – Polygnome Feb 06 '21 at 11:51
4 Answers
For some cleanup code it is easy to know where to insert it, and for some it is not. The memory that is easy to clean up is usually allocated on the stack. For heap memory it is much harder to know statically when it is safe to clean it up. Think of something like a shared list of items that various functions add to and delete from.
Your idea is the premise behind the Rust programming language, and they could only achieve it by requiring some awkward restrictions and annotations about reference sharing. Even then, there are certain situations where they have to fall back to reference counting.
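As a rough illustration of what those restrictions and the fallback look like (the values and names below are made up for the example): with a single owner the compiler can pick the free point statically, while `Rc` hands that decision back to run-time reference counts.

```rust
use std::rc::Rc;

fn main() {
    // Single ownership: the compiler knows statically where the buffer dies.
    let v = vec![1, 2, 3];
    let w = v;               // ownership moves to `w`
    // println!("{:?}", v);  // rejected by the borrow checker: `v` was moved
    println!("{:?}", w);     // `w` is freed at the end of `main`, known statically

    // Genuinely shared heap data with no single static owner: fall back to
    // reference counting, which picks the free point at run time.
    let shared = Rc::new(vec![4, 5, 6]);
    let a = Rc::clone(&shared);
    let b = Rc::clone(&shared);
    println!("{} owners of {:?}", Rc::strong_count(&shared), shared);
    drop(a);
    drop(b); // the vector itself is freed only when the last handle goes away
}
```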
Put another way, garbage collection may be somewhat slower, but it is much easier to implement in a programming language, and it places fewer restrictions and demands on the programmer.

-
"Garbage collection may be somewhat slower" – sometimes it can be *faster* – Caleth Feb 05 '21 at 14:19
-
"Even then, there are certain situations where they have to fall back to reference counting." Isn't reference counting prone to falling victim to islands (A references B, B references A, but no one references either of them)? – Polygnome Feb 05 '21 at 14:45
-
@Polygnome Yes, and that's why many languages or libraries ([including Rust](https://doc.rust-lang.org/book/ch15-06-reference-cycles.html)) that have reference counting include the distinction between strong and weak references. Strong references will keep the referenced object alive, weak references will not. If A has a strong reference to B and B has a weak reference to A, both will be cleaned up if no one else has a strong reference to either of them. – 8bittree Feb 05 '21 at 15:47
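A minimal Rust sketch of that strong/weak pattern, in the spirit of the linked chapter (the `Parent`/`Child` types are invented for the illustration):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Parent {
    child: RefCell<Option<Rc<Child>>>, // strong: keeps the child alive
}

struct Child {
    parent: RefCell<Weak<Parent>>,     // weak: does not keep the parent alive
}

fn main() {
    let parent = Rc::new(Parent { child: RefCell::new(None) });
    let child = Rc::new(Child { parent: RefCell::new(Rc::downgrade(&parent)) });
    *parent.child.borrow_mut() = Some(Rc::clone(&child));

    // parent -> child is strong, child -> parent is weak, so once both local
    // handles go out of scope the whole structure is freed.
    println!("parent strong count = {}", Rc::strong_count(&parent)); // 1
    println!("child strong count  = {}", Rc::strong_count(&child));  // 2
}
```

Had `Child` held an `Rc<Parent>` instead of a `Weak<Parent>`, the two strong counts could never reach zero and both allocations would leak.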
-
@Caleth, agreed. The primary point with garbage collection is not that it is slower, but that it occurs at a time that is undetermined at compile-time. The main trade-offs are potentially increased program latency during the sweeps, and increased peak memory consumption due to the accumulation of unswept garbage. – Steve Feb 05 '21 at 20:45
-
@8bittree But what if both strongly reference each other? Then you are back to the island problem. With reference counting, the programmer still has to be very aware of such pitfalls. I don't know enough of Rust to see why I would ever want to rely on reference counting; it seems superbly dangerous. – Polygnome Feb 06 '21 at 11:54
It should be fairly algorithmic to know at compile time when an object is no longer needed.
It is not. As Jörg notes in his answer, it is a provably undecidable problem in languages like C# and Java that allow for things like unsafe code, introspection, code generation, virtual dispatch, and user casting.
For example, consider code where you pass a reference as an argument to an interface's function. The implementation of that function might not even be in the code you're compiling. How could you possibly know what the function does with the reference?
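A small Rust sketch of that situation (the `Sink` trait and `Buffer` type are hypothetical): once the reference crosses a dynamic-dispatch boundary, the code being compiled cannot see whether the callee keeps it alive.

```rust
use std::rc::Rc;

// The caller compiles against this trait alone; the implementation may live
// in another crate that is not visible during this compilation.
trait Sink {
    fn accept(&mut self, item: Rc<String>);
}

fn produce(sink: &mut dyn Sink) {
    let item = Rc::new(String::from("payload"));
    sink.accept(Rc::clone(&item));
    // Is the allocation dead once `item` goes out of scope? That depends on
    // whether the unknown implementation kept its clone.
}

// One possible implementation: it *does* retain the reference.
struct Buffer {
    held: Vec<Rc<String>>,
}

impl Sink for Buffer {
    fn accept(&mut self, item: Rc<String>) {
        self.held.push(item);
    }
}

fn main() {
    let mut buffer = Buffer { held: Vec::new() };
    produce(&mut buffer);
    println!("still alive: {}", buffer.held.len()); // 1
}
```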

But is that really something a compiler could not do?
Yes.
As with practically all questions of the form "couldn't the compiler know this", the answer is that no, it is not possible, because it is equivalent to Solving the Halting Problem.
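A contrived sketch (in Rust, with made-up `put`/`del` commands) of the weaker but related point: the correct free points can depend on data that only exists at run time, so no fixed compile-time rule can choose them.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

fn main() {
    let mut cache: HashMap<String, Vec<u8>> = HashMap::new();
    let stdin = io::stdin();

    // Which entries are "no longer needed" depends entirely on the input the
    // program receives, which the compiler cannot know when it emits the code.
    for line in stdin.lock().lines() {
        let line = line.unwrap();
        if let Some(key) = line.strip_prefix("put ") {
            cache.insert(key.to_string(), vec![0u8; 1024]);
        } else if let Some(key) = line.strip_prefix("del ") {
            cache.remove(key); // this buffer's free point is chosen at run time
        }
    }
}
```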

-
In the modern vernacular: any answer that states "because it is equivalent to Solving the Halting Problem" is just "begging the question." – Robert Harvey Feb 05 '21 at 19:51
Many compiled languages, including C++, do offer nice things like reference-counted objects and interpreter-style memory management. But they necessarily require your discipline in order to use them properly. As long as you do that, they work great. (But if you screw it up, they can't help you.)
Interpreters, on the other hand, control their entire implementation themselves. They never "do the wrong thing," and they don't allow the language-user to do so either.
