What's the difference between a subclass and a subtype?

Question

The highest rated answer to this question about the Liskov Substitution Principle takes pains to distinguish between the terms subtype and subclass. It also makes the point that some languages conflate the two, whereas others do not.

For the object-oriented languages that I am most familiar with (Python, C++), "type" and "class" are synonymous concepts. In terms of C++, what would it mean to have a distinction between subtype and subclass? Say, for example, that Foo is a subclass, but not a subtype, of FooBase. If foo is an instance of Foo, would this line:

FooBase* fbPoint = &foo;

no longer be valid?

Actually, in Python "type" and "class" are *distinct* concepts. In fact, Python being dynamically typed, "type" isn't a concept *at all* in Python. Unfortunately, the Python developers don't understand that, and *still* conflate the two. — Jörg W Mittag, Dec 13 '17 at 20:35
"Type" and "class" are distinct in C++ as well. "Array of ints" is a type; what class is it? "pointer to a variable of type int" is a type; what class is it? These things aren't any class, but they are surely types. — Eric Lippert, Dec 13 '17 at 22:08
I was wondering this very thing after reading that question and that answer. — Uyghur Lives Matter, Dec 13 '17 at 22:53
@JorgWMittag If there is no concept of a "type" in python then someone should tell whoever writes the documentation: https://docs.python.org/3/library/stdtypes.html — Matt, Dec 14 '17 at 17:35
@Matt to be fair, types made it in in 3.5 which is pretty recently, especially by what-I-am-allowed-to-use-in-production standards. — Jared Smith, Dec 14 '17 at 20:15

Robert Harvey · Answer 1 · 2017-12-13T20:14:56.457

62

Subtyping is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability, meaning that program elements, typically subroutines or functions, written to operate on elements of the supertype can also operate on elements of the subtype.

If S is a subtype of T, the subtyping relation is often written S <: T, to mean that any term of type S can be safely used in a context where a term of type T is expected. The precise semantics of subtyping crucially depends on the particulars of what "safely used in a context where" means in a given programming language.

Subclassing should not be confused with subtyping. In general, subtyping establishes an is-a relationship, whereas subclassing only reuses implementation and establishes a syntactic relationship, not necessarily a semantic relationship (inheritance does not ensure behavioral subtyping).

To distinguish these concepts, subtyping is also known as interface inheritance, whereas subclassing is known as implementation inheritance or code inheritance.
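A minimal Python sketch of the gap between the two, using the familiar rectangle/square illustration rather than anything from the quoted text. The class names, the `widen_to` helper, and the contract "width and height vary independently" are all assumptions made for the example.

```python
class Rectangle:
    """Assumed contract for this sketch: width and height vary independently."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """A subclass: it reuses Rectangle's fields and area() implementation."""

    def __init__(self, side):
        super().__init__(side, side)

    def set_width(self, width):
        # Keeps the sides equal, which silently breaks the assumed contract.
        self.width = width
        self.height = width


def widen_to(rect, new_width):
    """Client code written against Rectangle's contract.

    Returns True if the height was left untouched, as the contract promises.
    """
    old_height = rect.height
    rect.set_width(new_width)
    return rect.height == old_height
```

Here `Square` is a subclass of `Rectangle` (it inherits code and passes `isinstance` checks), yet `widen_to` behaves differently when handed one, so under the assumed contract it is not a behavioral subtype.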

References
Subtyping
Inheritance

edited Dec 13 '17 at 20:14

answered Dec 13 '17 at 19:52

Robert Harvey

Very well said. It might be worth mentioning in the context of the question that C++ programmers often employ pure virtual base classes to communicate subtyping relationships to the type system. Generic programming approaches are often preferred of course. – Aluan Haddad Dec 13 '17 at 20:59
"The precise semantics of subtyping crucially depends on the particulars of what "safely used in a context where" means in a given programming language." … and the LSP defines a somewhat reasonable idea of what "safely" means and tells us what constraints those particulars have to satisfy in order to enable this particular form of "safety". – Jörg W Mittag Dec 13 '17 at 21:24
One more sample on the pile: if I understood correctly, in C++, `public` inheritance introduces a subtype while `private` inheritance introduces a subclass. – Quentin Dec 14 '17 at 12:27
@Quentin public inheritance creates both a subtype and a subclass, whereas private inheritance creates only a subclass, not a subtype. You can have subtyping without subclassing with structures like Java interfaces – eques Dec 14 '17 at 17:25

score 34 · Answer 2 · answered Dec 13 '17 at 21:30

A type, in the context that we are talking about here, is essentially a set of behavioral guarantees. A contract, if you will. Or, borrowing terminology from Smalltalk, a protocol.

A class is a bundle of methods. It is a set of behavior implementations.

Subtyping is a means of refining the protocol. Subclassing is a means of differential code re-use, i.e. re-using code by only describing the difference in behavior.

If you have used Java or C♯, then you may have come across the advice that all types should be interface types. In fact, if you read William Cook's On Understanding Data Abstraction, Revisited, then you may know that in order to do OO in those languages, you must only use interfaces as types. (Also, fun fact: Java cribbed interfaces directly from Objective-C's protocols, which in turn are taken directly from Smalltalk.)

Now, if we follow that coding advice to its logical conclusion and imagine a version of Java, where only interfaces are types, and classes and primitives aren't, then one interface inheriting from another will create a subtyping relationship, whereas one class inheriting from another will be merely for differential code-reuse via super.
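As a rough illustration of that imagined split, here is a sketch in Python rather than Java, using `typing.Protocol` as a stand-in for "interfaces are the only types". The protocol and class names are hypothetical, and `runtime_checkable` only checks that the methods exist, not that they behave as promised.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    """An interface: a contract with no implementation."""

    def greet(self, name: str) -> str: ...


@runtime_checkable
class PoliteGreeter(Greeter, Protocol):
    """Interface inheriting from interface: a refined contract (subtyping)."""

    def farewell(self, name: str) -> str: ...


class PlainGreeter:
    """A class: a bundle of method implementations."""

    def greet(self, name):
        return f"Hello, {name}"


class ExcitedGreeter(PlainGreeter):
    """Class inheriting from class: differential code reuse via super()."""

    def greet(self, name):
        # Only the difference from the parent's behavior is described.
        return super().greet(name) + "!"

    def farewell(self, name):
        return f"Bye, {name}!"
```

Neither class names a protocol in its definition; whether an object counts as a `Greeter` or a `PoliteGreeter` is decided by the methods it offers, while the `PlainGreeter`/`ExcitedGreeter` relationship exists purely to share code.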

As far as I know, there are no mainstream statically typed languages which distinguish strictly between inheriting code (implementation inheritance / subclassing) and inheriting contracts (subtyping). In Java and C♯, interface inheritance is pure subtyping (or at least was, until the introduction of default methods in Java 8 and likely C♯ 8 as well), but class inheritance is also subtyping as well as implementation inheritance. I remember reading about an experimental statically typed object-oriented LISP dialect, which strictly distinguished between mixins (which contain behavior), structs (which contain state), interfaces (which describe behavior), and classes (which compose zero or more structs with one or more mixins and conform to one or more interfaces). Only classes can be instantiated, and only interfaces can be used as types.

In a dynamically typed OO language such as Python, Ruby, ECMAScript, or Smalltalk, we generally think of the type(s) of an object as the set of protocols to which it conforms. Note the plural: an object can have multiple types, and I'm not just talking about the fact that each object of type String is also an object of type Object. (BTW: note how I used class names to talk about types? How stupid of me!) An object can implement multiple protocols. For example, in Ruby, Arrays can be appended to, they can be indexed, they can be iterated over, and they can be compared. That's four different protocols that they implement!

Now, Ruby doesn't have types. But the Ruby community has types! They only exist in the heads of programmers, though. And in documentation. For example, any object that responds to a method called each by yielding its elements one by one is considered to be an enumerable object. And there is a mixin called Enumerable which depends on this protocol. So, if your object has the correct type (which exists only in the programmer's head), then it is allowed to mix in (inherit from) the Enumerable mixin, and it will get all sorts of cool methods for free, like map, reduce, filter and so on.

Likewise, if an object responds to <=>, then it is considered to implement the comparable protocol, and it can mix in the Comparable mixin and get stuff like <, <=, >, >=, ==, between?, and clamp for free. However, it can also implement all of those methods itself, and not inherit from Comparable at all, and it would still be considered comparable.
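The same idea can be sketched in Python. This is not Ruby's Comparable; `ComparableMixin`, `compare_to`, `Version`, and `Weight` are hypothetical names, with `compare_to` playing the role of `<=>`.

```python
class ComparableMixin:
    """Hypothetical mixin: derives the operators from a compare_to() method.

    It depends on the protocol "has compare_to(other) returning -1, 0, or 1".
    """

    def __lt__(self, other):
        return self.compare_to(other) < 0

    def __le__(self, other):
        return self.compare_to(other) <= 0

    def __gt__(self, other):
        return self.compare_to(other) > 0

    def __ge__(self, other):
        return self.compare_to(other) >= 0

    def __eq__(self, other):
        return self.compare_to(other) == 0


class Version(ComparableMixin):
    """Implements the protocol and inherits the operators from the mixin."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def compare_to(self, other):
        a = (self.major, self.minor)
        b = (other.major, other.minor)
        return (a > b) - (a < b)  # -1, 0, or 1


class Weight:
    """Implements every operator itself and inherits nothing from the mixin."""

    def __init__(self, grams):
        self.grams = grams

    def __lt__(self, other):
        return self.grams < other.grams

    def __le__(self, other):
        return self.grams <= other.grams

    def __gt__(self, other):
        return self.grams > other.grams

    def __ge__(self, other):
        return self.grams >= other.grams

    def __eq__(self, other):
        return self.grams == other.grams
```

Both `Version` and `Weight` objects can be sorted and compared, so both are "comparable" in the sense described above, even though only one of them has any inheritance relationship with the mixin.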

A good example is the StringIO library, which essentially fakes I/O streams with strings. It implements all the same methods as the IO class, but there is no inheritance relationship between the two. Nevertheless, a StringIO can be used everywhere an IO can be used. This is very useful in unit tests, where you can replace a file or stdin with a StringIO without having to make any further changes to your program. Since StringIO conforms to the same protocol as IO, they are both of the same type, even though they are different classes and share no relationship (other than the trivial one that they both extend Object at some point).
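A rough Python analogue, with one caveat: Python's own `io.StringIO` does share base classes with text file objects, so it does not reproduce the "no inheritance relationship" situation described for Ruby. The sketch below therefore uses a hand-written class, `MemoryWriter` (a hypothetical name), that is unrelated to the `io` hierarchy and relies only on the documented fact that `print` needs its `file` argument to have a `write(string)` method.

```python
import io


class MemoryWriter:
    """Hypothetical stand-in for a stream; not part of the io class hierarchy."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        # The only method print() needs when flush is not requested.
        self.chunks.append(text)
        return len(text)


def report(out):
    """Relies only on the protocol 'out has a write(str) method'."""
    print("ok", file=out)
```

In a unit test, `report` can be handed a `MemoryWriter`, an `io.StringIO`, or a real file without any change to `report` itself, because all three conform to the same protocol.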

It might be helpful if languages allowed programs to simultaneously declare a class type and an interface for which that class is an implementation, and also allow implementations to specify "constructors" (which would chain to constructors of classes specified by the interface). For types of objects for which references would be shared publicly, the preferred pattern would be that the class type only be used when creating derived classes; most references should be of the interface type. Being able to specify interface constructors would be helpful in situations where... — supercat, Dec 14 '17 at 16:56
...e.g. code needs a collection which will allow a certain set of values to be read by index, but doesn't really care what type it is. While there are sound reasons for recognizing classes and interfaces as distinct kinds of type, there are many situations where they should be able to work more closely together than languages presently allow. — supercat, Dec 14 '17 at 16:59
Do you have a reference or some keywords I could search for some more information about the experimental LISP dialect you mentioned that formally differentiates mixins, structs, interfaces, and classes? — tel, Feb 01 '18 at 22:59
@tel: No, sorry. It was probably about 15-20 years ago, and at that time my interests were all over the place. I couldn't possibly begin to tell you what I was looking for when I stumbled across this. — Jörg W Mittag, Feb 02 '18 at 00:45
Awww. That was the most interesting detail in all of these answers. The fact that a formal separation of those concepts is actually possible within the implementation of a language really helped to crystalize the class/type distinction for me. I guess I'll go looking for that LISP myself, in any case. Do you happen to remember if you read about it in a journal article/book, or if you just heard about it in conversation? — tel, Feb 02 '18 at 00:56
@tel: It might have been some iteration of Mikel Evins's [Bard](http://bardcode.net/) or its Categories object system that I was thinking of. Note that both Bard and Categories have changed significantly multiple times, and most information about them was contained in "stream-of-consciousness"-style blog posts by its author (on blog hosts which no longer exist, to complicate matters even further). However, Bard isn't a good fit to my memory: it's not (statically) typed, and it doesn't have mixins. It *does* distinguish between protocols, classes (implementation), and schemas (representation). — Jörg W Mittag, Feb 03 '18 at 21:38
Very informative answer. "As far as I know, there are no mainstream statically typed languages which distinguish strictly between inheriting code (implementation inheritance / subclassing) and inheriting contracts (subtyping)" C++ has both pure interface inheritance (thanks to abstract methods) and pure implementation inheritance (thanks to private inheritance). — Géry Ogam, Dec 16 '20 at 18:39
OCaml seems to distinguish between subclassing and subtyping, but I guess it isn't really a "mainstream statically typed language". — wlnirvana, Mar 24 '22 at 14:34

score 2 · Answer 3 · answered Dec 14 '17 at 05:00

It is perhaps first useful to distinguish between a type and a class and then dive into the difference between subtyping and subclassing.

For the rest of this answer I'm going to assume that the types in discussion are static types (since subtyping usually comes up in a static context).

I'm going to develop a toy pseudocode to help illustrate the difference between a type and a class because most languages conflate them at least in part (for good reason that I'll briefly touch on).

Let's start with a type. A type is a label for an expression in your code. This label's value, and whether it is consistent (for some type system-specific definition of consistent) with all the other labels' values, can be determined by an external program (a typechecker) without running your program. That's what makes these labels special and deserving of their own name.

In our toy language we might allow for the creation of labels like so.

declare type Int
declare type String

Then we might label various values as being of this type.

0 is of type Int
1 is of type Int
-1 is of type Int
...

"" is of type String
"a" is of type String
"b" is of type String
...

With these statements our typechecker can now reject statements such as

0 is of type String

if one of the requirements of our type system is that every expression has a unique type.
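To make the toy rule concrete, here is a small Python sketch of such a label table. It is only one possible reading of the toy language; `TypeTable` and its method names are made up for the illustration, and expressions are represented simply as their source text.

```python
class TypeTable:
    """Toy typechecker state: declared type names and expression labels."""

    def __init__(self):
        self.declared = set()
        self.labels = {}  # expression text -> type name

    def declare_type(self, name):
        self.declared.add(name)

    def label(self, expr, type_name):
        """Record 'expr is of type type_name', enforcing unique types."""
        if type_name not in self.declared:
            raise TypeError(f"unknown type {type_name}")
        existing = self.labels.get(expr)
        if existing is not None and existing != type_name:
            # The rule from the text: every expression has a unique type.
            raise TypeError(f"{expr} is already of type {existing}")
        self.labels[expr] = type_name


table = TypeTable()
table.declare_type("Int")
table.declare_type("String")
table.label("0", "Int")
table.label('""', "String")
```

Calling `table.label("0", "String")` at this point raises `TypeError`. The table never inspects what `0` actually is; it only compares labels.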

Let's leave aside for now how clunky this is and how you're going to have problems assigning an infinite number of expressions types. We can return to it later.

A class on the other hand is a collection of methods and fields that are grouped together (potentially with access modifiers such as private or public).

class StringClass:
  defMethod concatenate(otherString): ...
  defField size: ...

An instance of this class gets the ability to either create or use preexisting definitions of these methods and fields.

We could choose to associate a class with a type such that every instance of a class is automatically labeled with that type.

associate StringClass with String

But not every type needs to have an associated class.

# Hmm... Doesn't look like there's a class for Int

It's also conceivable that in our toy language not every class has a type, especially if not all our expressions have types. It's a bit trickier (but not impossible) to imagine what type system consistency rules would look like if some expressions had types and some didn't.

Moreover in our toy language these associations do not have to be unique. We could associate two classes with the same type.

associate MyCustomStringClass with String

Now keep in mind there's no requirement for our typechecker to track the value of an expression (and in most cases it won't, or it is impossible for it to do so). All it knows are the labels you've told it. As a reminder, previously the typechecker was only able to reject the statement 0 is of type String because of our artificially created type rule that expressions must have unique types and we had already labeled the expression 0 something else. It didn't have any special knowledge of the value of 0.

So what about subtyping? Well subtyping is a name for a common rule in typechecking that relaxes the other rules you might have. Namely if A is subtype of B then everywhere your typechecker demands a label of B, it will also accept an A.

For example we might do the following for our numbers instead of what we had previously.

declare type NaturalNum
declare type Int
NaturalNum is subtype of Int

0 is of type NaturalNum
1 is of type NaturalNum
-1 is of type Int
...
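The relaxed rule can be sketched in Python as follows. As before, this is an illustrative reading of the toy language, with `SubtypeTable` and its methods invented for the example; each type is assumed to have at most one declared direct supertype.

```python
class SubtypeTable:
    """Toy typechecker state with a declared subtype relation."""

    def __init__(self):
        self.supertype = {}  # type name -> direct supertype, or None
        self.labels = {}     # expression text -> type name

    def declare_type(self, name, subtype_of=None):
        self.supertype[name] = subtype_of

    def label(self, expr, type_name):
        self.labels[expr] = type_name

    def is_subtype(self, a, b):
        """True if a is b, or a reaches b by following declared supertypes."""
        current = a
        while current is not None:
            if current == b:
                return True
            current = self.supertype.get(current)
        return False

    def accepts(self, expr, expected):
        """Wherever `expected` is demanded, a subtype's label is accepted too."""
        return self.is_subtype(self.labels[expr], expected)


table = SubtypeTable()
table.declare_type("Int")
table.declare_type("NaturalNum", subtype_of="Int")
table.label("0", "NaturalNum")
table.label("-1", "Int")
```

With these declarations, `table.accepts("0", "Int")` holds because `NaturalNum` is a declared subtype of `Int`, while `table.accepts("-1", "NaturalNum")` does not: the relation only relaxes the check in one direction.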

Subclassing is a shorthand for declaring a new class that allows you to reuse previously declared methods and fields.

class ExtendedStringClass is subclass of StringClass:
  # We get concatenate and size for free!
  defMethod addQuestionMark(): ...

We don't have to associate instances of ExtendedStringClass with String like we did with StringClass since, after all it's a whole new class, we just didn't have to write as much. This would allow us to give ExtendedStringClass a type that is incompatible with String from the typechecker's point of view.

Likewise we could have decided to make a whole new class NewClass and done

associate NewClass with String

Now every instance of StringClass can be substituted with an instance of NewClass from the typechecker's point of view.

So in theory subtyping and subclassing are completely different things. But no language I know of that has types and classes actually does things this way. Let's start paring down our language and explain the rationale behind some of our decisions.

First off, even though in theory completely different classes could be given the same type or a class could be given the same type as values that are not instances of any class, this severely hampers the usefulness of the typechecker. The typechecker is effectively robbed of the ability to check whether the method or field you're calling within an expression actually exists on that value, which is probably a check you'd like if you're going to the trouble of playing along with a typechecker. After all, who knows what the value actually underneath that String label is; it might be something that doesn't have, e.g., a concatenate method at all!

Okay so let's stipulate that every class automatically generates a new type of the same name as that class and associates instances with that type. That lets us get rid of associate as well as the different names between StringClass and String.

For the same reason, we probably want to automatically establish a subtype relationship between the types of two classes where one is a subclass of another. After all the subclass is guaranteed to have all the methods and fields the parent class does, but the opposite is not true. Therefore while the subclass can pass anytime you need a type of the parent class, the type of the parent class should be rejected if you need the type of the subclass.
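The asymmetry is easy to see in a Python rendering of the toy classes. Python has no built-in static checker, so the mismatch shows up here as a runtime `AttributeError`; a static typechecker would reject the same call before the program runs. The method bodies and the helper functions `join_twice` and `ask` are assumptions added for the sketch.

```python
class StringClass:
    def __init__(self, text):
        self.text = text

    def concatenate(self, other):
        return self.text + other.text


class ExtendedStringClass(StringClass):
    # concatenate is inherited; only the new method is written out.
    def add_question_mark(self):
        return self.text + "?"


def join_twice(s):
    """Needs only what the parent class offers."""
    return s.concatenate(s)


def ask(s):
    """Needs the method that only the subclass has."""
    return s.add_question_mark()
```

`join_twice` works with either class, so the subclass can stand in wherever the parent is expected. `ask` works only with `ExtendedStringClass`; handing it a plain `StringClass` fails because the parent has no `add_question_mark` method.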

If you combine this with the stipulation that all user defined values must be instances of a class, then you can have is subclass of pull double duty and get rid of is subtype of.

And this gets us to the characteristics that most of the popular statically typed OO languages share. There is a set of "primitive" types (e.g. int, float, etc.) which are not associated with any class and are not user-defined. Then you have all the user-defined classes, which automatically have types of the same name and identify subclassing with subtyping.

The final note I'll make is around the clunkiness of declaring types separately from values. Most languages conflate the creation of the two, so that a type declaration is also a declaration for generating entirely new values that are automatically labeled with that type. For example, a class declaration typically creates both the type and a way of instantiating values of that type. This gets rid of some of the clunkiness and, in the presence of constructors, also lets you label infinitely many values with a type in one stroke.
