45

While programming in C#, I stumbled upon a strange language design decision that I just can't understand.

So, C# (and the CLR) has two aggregate data types: struct (value-type, stored on the stack, no inheritance) and class (reference-type, stored on the heap, has inheritance).

This setup sounds nice at first, but then you stumble upon a method taking a parameter of an aggregate type, and to figure out if it is actually of a value type or of a reference type, you have to find its type's declaration. It can get really confusing at times.

The generally accepted solution to the problem seems to be declaring all structs as "immutable" (setting their fields to readonly) to prevent possible mistakes, limiting structs' usefulness.

C++, for example, employs a much more usable model: it allows you to create an object instance either on the stack or on the heap, and to pass it by value, by reference, or by pointer. I keep hearing that C# was inspired by C++, and I just can't understand why it didn't take on this technique. Combining class and struct into one construct with two different allocation options (heap and stack), passed around as values or (explicitly) as references via the ref and out keywords, seems like a nice thing.

The question is, why did class and struct become separate concepts in C# and the CLR instead of one aggregate type with two allocation options?

Mints97
  • 789
  • 6
  • 12
  • Related to managed vs unmanaged data. – Basile Starynkevitch Feb 26 '15 at 20:43
  • 35
    "The generally accepted solution to the problem seems to be declaring all structs as "immutable"...limiting structs' usefulness" Many people would argue that making *anything* immutable generally makes it more useful whenever it's not the cause of a performance bottleneck. Also, `struct`s aren't always stored on the stack; consider an object with a `struct` field. That aside, as Mason Wheeler mentioned the slicing problem is probably the biggest reason. – Doval Feb 26 '15 at 20:59
  • 7
    It is not true that C# was inspired by C++; rather C# was inspired by all the (well meant and good sounding at the time) mistakes in the design of both C++ and Java. – Pieter Geerkens Feb 26 '15 at 23:01
  • 1
    It is also regarded as best practice to make, to the greatest extent possible, all the objects comprising the Model (as opposed to the View-Model) immutable in order to simplify analysis and parallelization of the code. – Pieter Geerkens Feb 26 '15 at 23:03
  • 21
    Note: stack and heap are implementation details. There is nothing that says that struct instances have to be allocated on the stack and class instances have to be allocated on the heap. And it is in fact not even true. For example, it is very well possible that a compiler might determine using Escape Analysis that a variable cannot escape the local scope and thus allocate it on the stack even if it is a class instance. It doesn't even say that there has to be a stack or a heap at all. You could allocate environment frames as a linked list on the heap and not even have a stack at all. – Jörg W Mittag Feb 27 '15 at 04:39
  • 6
    Quoting Eric Lippert about [_the mistaken belief that the type system has anything whatsoever to do with the storage allocation strategy_](http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx). – Zev Spitz Nov 03 '15 at 14:46
  • 3
    "but then you stumble upon a method taking a parameter of an aggregate type, and to figure out if it is actually of a value type or of a reference type, you have to find its type's declaration" Umm, why does that matter, exactly? The method asks you to pass a value. You pass a value. At which point do you have to care about whether it's a reference type or a value type? – Luaan Feb 10 '16 at 21:38

4 Answers

60

The reason C# (and Java and essentially every other OO language developed after C++) did not copy C++'s model in this aspect is because the way C++ does it is a horrendous mess.

You correctly identified the relevant points above: struct: value type, no inheritance. class: reference type, has inheritance. Inheritance and value types (or more specifically, polymorphism and pass-by-value) don't mix; if you pass an object of type Derived to a method argument of type Base, and then call a virtual method on it, the only way to get proper behavior is to ensure that what got passed was a reference.

Between that and all the other messes that you run into in C++ by having inheritable objects as value types (copy constructors and object slicing come to mind!) the best solution is to Just Say No.

Good language design isn't just implementing features, it's also knowing what features not to implement, and one of the best ways to do this is by learning from the mistakes of those who came before you.

Mason Wheeler
  • 82,151
  • 24
  • 234
  • 309
  • 2
    See also [this answer](http://programmers.stackexchange.com/questions/118295/did-the-developers-of-java-consciously-abandon-raii/118357#118357) on why Java doesn't have custom value-objects. – BlueRaja - Danny Pflughoeft Feb 26 '15 at 22:47
  • 2
    really nice summary. clear concise and to the point. – Anonymous Type Feb 27 '15 at 01:29
  • 31
    This is just yet another pointless subjective rant on C++. Can't downvote, but would if I could. – Bartek Banachewicz Feb 27 '15 at 12:42
  • 2
    @Mgetz: That's not object slicing; that's just non-polymorphic functions at work. Slicing is something highly unfortunate that happens when assigning a value-typed object to a derived class, that can lead to bizarre data corruption issues. – Mason Wheeler Feb 27 '15 at 12:47
  • 5
    @BartekBanachewicz: Pray tell, which of the facts that I stated are subjective? That passing objects as values breaks polymorphism? That objects as value types requires hassles like copy constructors and causes messes like object slicing? That all the hidden gotchas that it introduces is the reason why C# and other OO languages chose not to follow C++'s object model? – Mason Wheeler Feb 27 '15 at 14:34
  • 19
    @MasonWheeler: "*is a horrendous mess*" sounds subjective enough. This was already discussed in a long comment thread to [another answer of yours](http://programmers.stackexchange.com/questions/271139/is-the-finally-portion-of-a-try-catch-finally-construct-even-necessa/271254#271254); the thread got nuked (unfortunately, because it contained useful comments although in flame war sauce). I don't think it's worth repeating the whole thing here, but "C# got it right and C++ got it wrong" (which seems to be the message you're trying to convey) is indeed a subjective statement. – Andy Prowl Feb 27 '15 at 14:41
  • 1
    @AndyProwl: Perhaps it's a subjective conclusion to draw, but it's one well-supported by objective facts, which I have stated clearly. If you have facts which present support a different position, feel free to present them. – Mason Wheeler Feb 27 '15 at 14:44
  • 17
    @MasonWheeler: I did in the thread that got nuked, and so did several other people - which is why I think it's unfortunate that it got deleted. I don't think it is a good idea to replicate that thread here, but the short version is: in C++ the *user* of the type, not its *designer*, gets to decide with what semantics a type should be used (reference semantics or value semantics). This has advantages and disadvantages: you rage against the cons without considering (or knowing?) the pros. That's why the analysis is subjective. – Andy Prowl Feb 27 '15 at 14:49
  • 4
    _"That objects as value types requires hassles like copy constructors"_ Lol, back in 1992 maybe. Look up the _rule of zero_. – Lightness Races in Orbit Feb 27 '15 at 14:51
  • 1
    I can't tell what you're trying to say in your second paragraph. You refer to the OP's original observations that _in C#_ structs have no inheritance and classes are reference types, but then you follow it up by seemingly using it as an example of poor design in C++. Don't get me wrong: C++ _is_ horrible. But I can't see any valid arguments made for that in this answer which purports to do so. – Lightness Races in Orbit Feb 27 '15 at 14:52
  • 3
    @LightnessRacesinOrbit: Actually that's exactly what I'm doing. I explain that in C#, structs have no inheritance and classes are reference types, and then demonstrate how in C++, which the OP was asking about, not following this pattern causes messes by violating Liskov Substitution and leading to issues like object slicing. – Mason Wheeler Feb 27 '15 at 14:55
  • 7
    *sigh* I believe the LSP violation discussion has already taken place. And I kinda believe that the majority of the people agreed that the LSP mention is pretty weird and unrelated, but can't check because *a mod nuked the comment thread*. – Bartek Banachewicz Feb 27 '15 at 14:56
  • @MasonWheeler: Okay - you might be able to make a clearer distinction in your answer between your observations about C++ and your observations about C#, and why you think each observation is a pro over the other. – Lightness Races in Orbit Feb 27 '15 at 14:57
  • 9
    If you move your last paragraph to the top and delete the current first paragraph I think you have the perfect argument. But the current first paragraph is just subjective. – Martin York Feb 27 '15 at 16:57
  • 1
    @LokiAstari: I think it could be made less argumentative, but without loss of meaning, if "is a horrendous mess" were replaced with with "has led to many difficulties and complications which the creators of .NET did not wish to repeat". I don't think any even the most ardent C++ supporter would claim that the C++ approach doesn't lead to many difficulties, nor would .NET advocates claim that the C++ approach has no advantages whatsoever. The only point of contention is the relative magnitude of the problems and advantages. – supercat May 08 '15 at 18:10
  • 3
    @AndyProwl: _"C# got it right and C++ got it wrong"_ is an unfair summarization. The posted answer is more accurately summarized as "The makers of C# chose to avoid something in C++ that they saw as problematic (and Mason is agreeing with that decision)" – Flater Oct 02 '18 at 07:12
21

By analogy, C# is basically like a set of mechanic's tools where somebody has read that you should generally avoid pliers and adjustable wrenches, so it doesn't include adjustable wrenches at all, and the pliers are locked in a special drawer marked "unsafe", and can only be used with approval from a supervisor, after signing a disclaimer absolving your employer of any responsibility for your health.

C++, by comparison, not only includes adjustable wrenches and pliers, but also some rather odd-ball special-purpose tools whose purposes aren't immediately apparent, and which, if you don't know the right way to hold them, might easily cut off your thumb (but which, once you understand how to use them, can do things that are essentially impossible with the basic tools in the C# toolbox). In addition, it has a lathe, milling machine, surface grinder, metal-cutting band-saw, etc., to let you design and create entirely new tools any time you feel the need (but yes, those machinist's tools can and will cause serious injuries if you don't know what you're doing with them--or even if you just get careless).

That reflects the basic difference in philosophy: C++ attempts to give you all the tools you might need for essentially any design you might want. It makes almost no attempt at controlling how you use those tools, so it's also easy to use them to produce designs that only work well in rare situations, as well as designs that are probably just a lousy idea and nobody knows of a situation in which they're likely to work at all well. In particular, a great deal of this is done by decoupling design decisions--even those that in practice really are nearly always coupled. As a result, there's a huge difference between just writing C++, and writing C++ well. To write C++ well, you need to know a lot of idioms and rules of thumb (including rules of thumb about how seriously to reconsider before breaking other rules of thumb). As a result, C++ is oriented much more toward ease of use (by experts) than ease of learning. There are also (all too many) circumstances in which it's not really terribly easy to use either.

C# does a lot more to try to force (or at least extremely strongly suggest) what the language designers considered good design practices. Quite a few things that are decoupled in C++ (but usually go together in practice) are directly coupled in C#. It does allow for "unsafe" code to push the boundaries a little, but honestly, not a whole lot.

The result is that on one hand there are quite a few designs that can be expressed fairly directly in C++ that are substantially clumsier to express in C#. On the other hand, it's a whole lot easier to learn C#, and the chances of producing a really horrible design that won't work for your situation (or probably any other) are drastically reduced. In many (probably even most) cases, you can get a solid, workable design by simply "going with the flow", so to speak. Or, as one of my friends (at least I like to think of him as a friend--not sure if he really agrees) likes to put it, C# makes it easy to fall into the pit of success.

So, looking more specifically at how class and struct got to be the way they are in the two languages: with objects created in an inheritance hierarchy, where you might use an object of a derived class in the guise of its base class/interface, you're pretty much stuck with the fact that you normally need to do so via some sort of pointer or reference. At a concrete level, what happens is that the object of the derived class contains a region of memory that can be treated as an instance of the base class/interface, and the derived object is manipulated via the address of that region of memory.

In C++, it's up to the programmer to do that correctly--when he's using inheritance, it's up to him to ensure that (for example) a function that works with polymorphic classes in a hierarchy does so via a pointer or reference to the base class.

In C#, what is fundamentally the same separation between the types is much more explicit, and enforced by the language itself. The programmer doesn't need to take any steps to pass an instance of a class by reference, because that'll happen by default.

Jerry Coffin
  • 44,385
  • 5
  • 89
  • 162
  • 3
    As a C++ fan, I think this is an excellent summary of the differences between C# and the Swiss Army Chainsaw. – David Thornley Oct 01 '18 at 23:18
  • 1
    @DavidThornley: I did at least attempt to write what I thought would be a somewhat balanced comparison. Not to point fingers, but some of what I saw when I wrote this struck me as...somewhat inaccurate (to put it nicely). – Jerry Coffin Oct 01 '18 at 23:30
7

This is from "C#: Why Do We Need Another Language?" by Eric Gunnerson:

Simplicity was an important design goal for C#.

It's possible to go overboard on simplicity and language purity but purity for purity's sake is of little use to the professional programmer. We therefore tried to balance our desire to have a simple and concise language with solving the real-world problems that programmers face.

[...]

Value types, operator overloading and user-defined conversions all add complexity to the language, but allow an important user scenario to be tremendously simplified.

Reference semantics for objects is a way to avoid a lot of trouble (not only object slicing, of course), but real-world problems can sometimes require objects with value semantics (e.g. take a look at Sounds like I should never use reference semantics, right? for a different point of view).

What better approach, therefore, than to segregate those dirty, ugly and bad objects-with-value-semantics under the tag of struct?

manlio
  • 4,166
  • 3
  • 23
  • 35
  • 1
    I don't know, maybe not using those dirty, ugly, and bad objects-with-reference-semantics? – Bartek Banachewicz Feb 27 '15 at 15:13
  • Maybe... I'm a lost cause. – manlio Feb 27 '15 at 15:26
  • 2
    IMHO, one of the biggest defects in the design of Java is the lack of any means to declare whether a variable is being used to encapsulate *identity* or *ownership*, and one of the biggest defects in C# is the lack of a means to distinguish operations on a variable from operations on an object to which a variable holds a reference. Even if the Runtime didn't care about such distinctions, being able to specify in a language whether a variable of type `int[]` should be shareable or changeable (arrays can be either, but generally not both) would help make wrong code look wrong. – supercat Feb 27 '15 at 23:42
4

Rather than thinking of value types deriving from Object, it would be more helpful to think of storage-location types existing in an entirely separate universe from class instance types, but for every value type to have a corresponding heap-object type. A storage location of structure type simply holds a concatenation of the type's public and private fields, and the heap type is auto-generated according to a pattern like:

// Defined structure
struct Point : IEquatable<Point>
{
  public int X, Y;
  public Point(int x, int y) { X = x; Y = y; }
  public bool Equals(Point other) { return X == other.X && Y == other.Y; }
  public override bool Equals(Object other)
  { return other is Point && Equals((Point)other); }
  public override String ToString() { return String.Format("[{0},{1}]", X, Y); }
  public override Int32 GetHashCode() { return unchecked(X + Y * 65531); }
}
// Auto-generated class
class boxed_Point : IEquatable<Point>
{
  public Point value; // Fake name; C++/CLI, though not C#, allows full access
  public boxed_Point(Point v) { value = v; }
  // Members chain to the corresponding member of the original
  public bool Equals(Point other) { return value.Equals(other); }
  public override bool Equals(Object other) { return value.Equals(other); }
  public override String ToString() { return value.ToString(); }
  public override Int32 GetHashCode() { return value.GetHashCode(); }
}

and for a statement like:

    Console.WriteLine("The value is {0}", somePoint);

to be translated as:

    boxed_Point box1 = new boxed_Point(somePoint);
    Console.WriteLine("The value is {0}", box1);

In practice, because storage-location types and heap-instance types would exist in separate universes, it wouldn't be necessary to call the heap-instance types things like boxed_Int32, since the system would know which contexts require a heap-object instance and which ones require a storage location.

Some people think that any value types which don't behave like objects should be considered "evil". I take the opposite view: since storage locations of value types are neither objects nor references to objects, the expectation that they should behave like objects should be considered unhelpful. In cases where a struct can usefully behave like an object, there's nothing wrong with having one do so, but each struct is at its heart nothing more than an aggregation of public and private fields stuck together with duct tape.

supercat
  • 8,335
  • 22
  • 28