56

Background

I am designing a language, as a side project. I have a working assembler, static analyser, and virtual machine for it. Since I can already compile and run non-trivial programs using the infrastructure I've built I thought about giving a presentation at my university.

During my talk I mentioned that the VM provides a type system, was asked "What is your type system for?". After answering I got laughed at by the person asking the question.

Thus, even though I am almost certainly going to lose reputation for asking this question, I turn to Programmers.

My understanding

As I understand them, type systems are used to provide additional layer of information about entities in a program, so that the runtime, or the compiler, or any other piece of machinery, knows what to do with the strings of bits it operates on. They also help maintain contracts - the compiler (or code analyser, or runtime, or any other program) can verify that at any given point the program operates on values programmers expect it to operate on.

Types can also be used to provide information to those human programmers. For example, I find this declaration:

function sqrt(double n) -> double;

more useful than this one

sqrt(n)

The former gives plenty of information: that the sqrt identifier is a function, takes a single double as input, and produces another double as output. The latter tells you that it is probably a function taking a single parameter.

My answer

So, after being asked "What is your type system for?" I answered as follows:

The type system is dynamic (types are assigned to values, not to variables holding them), but strong without surprising coercion rules (you can't add string to integer as they represent incompatible types, but you can add integer to floating point number).

The type system is used by the VM to ensure that operands for instructions are valid; and can be used by programmers to ensure that parameters passed to their functions are valid (i.e. of correct type).
The type system supports subtyping and multiple inheritance (both features are available to programmers), and types are considered when dynamic dispatch of methods on objects is used - VM uses types to check by what function is a given message implemented for given type.

The follow-up question was "And how is type assigned to a value?". So I explained that all values are boxed, and have a pointer pointing to a type definition structure which provides information about name of the type, what messages it responds to, and what types it inherits from.

After that, I got laughed at, and my answer was dismissed with the comment "That is not a real typesystem.".

So - if what I described does not qualify as a "real typesystem", what would? Was that person right that what I provide cannot be considered a typesystem?

Newtopian
  • 7,201
  • 3
  • 35
  • 52
Mael
  • 2,305
  • 1
  • 13
  • 26
  • 21
    When people talk about type systems they're usually talking about static typing. Dynamic typing isn't very interesting to the kind of people that care about type systems because it guarantees almost nothing. E.g. What kind of value can variable x hold? Anything. – Doval Oct 14 '16 at 18:19
  • @Doval I agree that dynamic typing is not *that* interesting and the guarantees it gives are a bit weaker. I myself am a proponent of static typing. At the lowest level in the VM I have to be able to move values around without knowing their exact types and be able to store any value in VM registers, so I opted for dynamic typing with runtime checks for whatever I am not able verify statically. – Mael Oct 14 '16 at 18:42
  • 7
    I'd be curious to hear what they had to say to defend/explain their reaction. – Newtopian Oct 14 '16 at 18:48
  • 20
    @Doval Dynamic typing can guarantee you don't enter a nonsensical state by doing something like adding 5 to your cat. Sure, it won't prevent you from *trying*, but it can at least stop it from actually happening and give you a chance to figure out what went wrong and take corrective actions, things that a truly typeless language can't. – 8bittree Oct 14 '16 at 20:46
  • 1
    @8bittree I don't disagree, but using a typeless language is very rare, so dynamic typing is usually the baseline for comparison. – Doval Oct 14 '16 at 21:03
  • 10
    The person took issue with your answer to "And how is type assigned to a value?". They wanted to hear about typing rules, not box-and-pointer diagrams. Laughing was absolutely rude, though. – gardenhead Oct 14 '16 at 21:56
  • You may be interested in http://stackoverflow.com/questions/9154388/does-untyped-also-mean-dynamically-typed-in-the-academic-cs-world. – ruakh Oct 14 '16 at 22:07
  • 12
    The laughing person is most likely a zealot for some particular language (family) with a strong type system (Haskell seems popular), and would ridicule anything less strong (and thus a toy) than that, or more strong (and thus impractical), or just different. Engaging in discussion with zealots is dangerous and futile. Laughing like that is just so rude that it indicates this kind of deeper issues. You're lucky they didn't start preaching... – hyde Oct 15 '16 at 07:28
  • 4
    "Thus, even though I am almost certainly going to lose reputation for asking this question, I turn to Programmers." I'm not sure whether you meant rep here on Programmers or reputation among your colleagues, but either way, trying to fill a gap in your knowledge is what these sites *exist for*. You have a well thought out question requesting someone help you identify the information you're missing. That's nothing to be ashamed of. – jpmc26 Oct 15 '16 at 17:35
  • I also suspect that the person posing the questions isn't nearly as knowledgeable as they want you to think. The questions the person asked ("What is your type system for?" and "And how is type assign to a value?") seem ill posed and unclear. At best, I would argue this person has poor communication skills. An expert in the field of type systems would already *know* the purpose of a type system, and I can't imagine the question of "assignment" being answered with anything *but* low level details. Also, if they had showed any humility, they would not have laughed or derided your implementation. – jpmc26 Oct 15 '16 at 18:08
  • They laughed because statically-typed language implementations also have boxed values in their VMs. The boxes indicate that types are missing or have been erased; your boxes showed a *lack* of a type system or type judgements. (This is not to say that they were not rude!) – Corbin Dec 02 '21 at 15:13

3 Answers3

34

That all seems like a fine description of what type systems provide. And your implementation sounds like a reasonable enough one for what it's doing.

For some languages, you won't need the runtime information since your language doesn't do runtime dispatch (or you do single dispatch via vtables or another mechanism, so don't need the type information). For some languages, just having a symbol/placeholder is sufficient since you only care about type equality, not its name or inheritance.

Depending on your environment, the person may have wanted more formalism in your type system. They want to know what you can prove with it, not what programmers can do with it. This is pretty common in academia unfortunately. Though academics do such things because it's pretty easy to have flaws in your type system that allow things to escape correctness. It's possible they spotted one of these.

If you had further questions, Types and Programming Languages is the canonical book on the subject and can help you to learn some of the rigor needed by academics, as well as some of the terminology to help describe things.

Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • 3
    "Depending on your environment, the person may have wanted more formalism in your type system." That's probably it. I didn't concentrate on what I can *prove* with the type system, but rather thought about it as a tool. Thanks for the book recommendation! – Mael Oct 14 '16 at 18:54
  • 1
    @Mael Some type systems are used as logics (see *logical frameworks*). so basically the type gives the formulas and the programs are the proofs of those formulas (e.g. the function type `a -> b` can be seen as *a implies b*, i.e. if you give me a value of type `a` I can obtain a value of type `b`). However for this to be consistent the language must be total, and thus non-Turing complete. So *all* real-life type systems actually define an inconsistent logic. – Bakuriu Oct 15 '16 at 07:29
23

I like @Telastyn's answer especially because of the reference to academic interest in formalism.

Allow me to add to the discussion.

What is a type system?

A type system is a mechanism for defining, detecting, and preventing illegal program states. It works by defining and applying constraints. The constraint definitions are types, and, the constraint applications are usages of types, e.g. in variable declaration.

Type definitions typically support composition operators (e.g. various forms of conjunction, as in structures, subclassing, and, disjunction, as in enums, unions).

The constraints, usages of types, sometimes also allow composition operators (e.g. at least this, exactly this, either this or that, this provided that something else holds).

If the type system is available in the language and applied at compile time toward the objective of being able to issue compile-time errors, it is a static type system; these prevent many illegal programs from compiling let alone running, hence it is preventing illegal program states.

(A static type system stops a program from running whether or not it is known (or undecidable) that the program will ever reach that unsound code it is complaining about. A static type system detects certain kinds of nonsense (violations of the declared constraints) and judges the program in error before it ever runs.)

If a type system is applied at runtime, it is a dynamic type system that prevents illegal program states: but by stopping the program in mid run, instead of preventing it from running in the first place.

A fairly common type system offering is to provide both static and dynamic features.

Erik Eidt
  • 33,282
  • 5
  • 57
  • 91
  • I don't think so-called hybrid type systems are very common at all. What languages do you have in mind? – gardenhead Oct 14 '16 at 22:03
  • 3
    @gardenhead, the ability to downcast is not a static type system feature, therefore it is usually checked at runtime dynamically. – Erik Eidt Oct 14 '16 at 22:26
  • 1
    @gardenhead: most statically typed languages allow you to defer typing to runtime, be it simply with C's `void *` pointers (very weak), C#'s dynamic objects, or Haskell's existentially quantified GADTs (which give you rather stronger guarantees than the statically typed values in most other languages). – leftaroundabout Oct 14 '16 at 22:33
  • True, I forgot about "casting". But casting is just a crutch for a weak type system. – gardenhead Oct 14 '16 at 23:02
  • @gardenhead As well as static languages providing dynamic options, many dynamic languages provide some static typing. For instance, Dart, Python, and Hack, all have modes or tools for performing static analysis based on the concept of "gradual typing". – IMSoP Oct 15 '16 at 17:46
  • @gardenhead: No type system can express all possible relevant combinations of constraints. I don't think it's fair to regard casts as a "crutch", given that they make it possible to have a type system be stronger in cases that don't use casts than would otherwise be possible without loss of functionality. – supercat Oct 15 '16 at 18:52
  • @supercat I'm not sure what you mean. What do you define as a constraint, and a relevant combination thereof? Personally I can't think of an example where casting is necessary. – gardenhead Oct 15 '16 at 22:18
  • @gardenhead: A document contains a number of objects that support different features, and one wants functions to perform some action on all the objects that support it (e.g. do a "contrast adjust" on every embedded picture within a word-processing document). While it may be possible to have every object implement every function (with many implementations being no-ops), or keep separate lists for every kind of object, it's generally cleaner to have a list of general-document-object references and cast references that happen to have other abilities to types that support those abilities. – supercat Oct 17 '16 at 14:48
  • @supercat No, casting is not needed there. We must have very different definitions of "clean". The best options is to use a sum-type that ranges over all the different object types. – gardenhead Oct 17 '16 at 16:04
  • @gardenhead: Using sum types may be somewhat workable if it's possible for new interfaces to supply default implementations for methods that aren't supported pre-existing classes, but using sum types for everything would forfeit the advantages of having not only a type "Reference to thing that might support X", but also one for "Reference to thing that is expected to usefully support X". There needs to be a way for code which has a former-type reference to a latter-type instance to give it to code that expects a latter-type reference. One could avoid the need for casts by... – supercat Oct 17 '16 at 17:47
  • ...having everything that would use the latter type use the former instead, but that would effectively make the type system weaker, not stronger. Much of the purpose of a type system is to ensure that if a function requires objects with some trait, a compiler can squawk if an attempt is made to pass something else. A test which correctly rejects most problematic usages can be helpful even if it would occasionally squawk at what should be useful constructs, provided there's a way of telling the compiler "You may not be able to tell this Y is really a X, but I know that it is". – supercat Oct 17 '16 at 17:47
  • @supercat I have no idea what you just said, but it sounds like your design is needlessly complicated. Sum types solve this exact issue. Also not sure why you're talking about references here. – gardenhead Oct 17 '16 at 18:05
  • @gardenhead: What I call a reference would in some systems be called a pointer. How do some types distinguish between "Thing that might support X [but isn't particularly expected to]" and "Thing that is expected to port X", or do you not consider such a distinction helpful? – supercat Oct 17 '16 at 18:15
  • Such a distinction is not helpful. A type either supports an operation or it doesn't. There are no "probably" types. – gardenhead Oct 17 '16 at 18:19
  • @ErikEidt thanks for wonderful explanation !! I know its different ask - but would you help me (or point me to) understand effect system ? Recently I came across article which says that _side effects in a program can be controlled by having type and effect system_. I am finding it difficult to understand "how". – rahulaga-msft Mar 11 '19 at 13:12
17

Oh man, I am excited to try to answer this question as best I can. I hope I can get my thoughts properly in order.

As @Doval mentioned and the questioner pointed out (albeit rudely), you do not really have a type system. You have a system of dynamic checks using tags, which is in general much weaker, and also much less interesting.

The question of "what is a type system" can be quite philosophical, and we could fill a book with different viewpoints on the matter. However, as this is a site for programmers, I'll try to keep my answer as practical as possible (and really, types are extremely practical in programming, despite what some may think).

Overview

Let's start with a seat-of-the-pants understand of what a type system is good for, before diving into the more formal underpinnings. A type system imposes structure on our programs. They tell us how we may plug various functions and expression together. Without structure, programs are untenable and wildly complex, ready to cause harm at the slightest mistake of the programmer.

Writing programs with a type system is like driving a care in mint condition - the brakes work, the doors close safely, the engine is oiled, etc. Writing programs without a type system is like riding a motor cycle without a helmet and with wheels made out of spaghetti. You have absolutely no control over your.

To ground the discussion, let's say we have a language with literal expression num[n] and str[s] that represent the numeral n and the string s, respectively, and primitive functions plus and concat, with the intended meaning. Clearly, you don't want to be able to write something like plus "hello" "world" or concat 2 4. But how can we prevent this? A priori, there is no method to distinguish the numeral 2 from the string literal "world". What we would like to say is that these expressions should be used in different contexts; they have different types.

Languages and Types

Let's step back a bit: what is a programming language? In general, we can divide a programming language into two layers: the syntax and the semantics. These are also called the statics and the dynamics, respectively. It turns out that the type system is necessary to mediate the interaction among these two parts.

Syntax

A program is a tree. Don't be fooled by the lines of text you write on a computer; these are just the human-readable representations of a program. The program itself is an Abstract Syntax Tree. For example, in C we might write:

int square(int x) { 
    return x * x;
 }

That is the concrete syntax for the program (fragment). The tree representation is:

     function square
     /     |       \
   int   int x    return
                     |
                   times
                  /    \
                 x      x

A programming language provides a grammar defining the valid trees of that language (either concrete or abstract syntax may be used). This is usually done using something like BNF notation. I would assume you've done this for the language you've created.

Semantics

OK, we know what a program is, but it's just a static tree structure. Presumably, we want our program to actually compute something. We need semantics.

Semantics of programming languages is a rich field of study. Broadly speaking, there are two approaches: denotational semantics and operational semantics. Denotational semantics describes a program by mapping it into some underlying mathematical structure (e.g. the natural numbers, continuous functions, etc). that provides meaning to our program. Operational semantics, on the contrary, defines a program by detailing how it executes. In my opinion, operational semantics are more intuitive to programmers (including myself), so let's stick with that.

I won't go through how to define a formal operational semantics (the details are a bit involved), but basically, we want rules like the following:

  1. num[n] is a value
  2. str[s] is a value
  3. If num[n1] and num[n2] evaluate to the integers n_1$ and $n_2$, thenplus(num[n1], num[n2])` evaluates to the integer $n_1 + n_2$.
  4. If str[s1] and str[s2] evaluates to the strings s1 and s2, then concat(str[s1], str[s2]) evaluates to the string s1s2.

Etc. The rules are in practice a lot more formal, but you get the gist. However, we soon run into a problem. What happens when we write the following:

concat(num[5], str[hello])

Hm. This is quite a conundrum. We have not defined a rule anywhere for how to concatenate a number with a string. We could attempt to create such a rule, but we intuitively know that this operation is meaningless. We don't want this program to be valid. And thus we are led inexorably to types.

Types

A program is a tree as defined by a language's grammar. Programs are given meaning by execution rules. But some programs are unable to be executed; that is, some programs are meaningless. These programs are ill-typed. Thus, typing characterizes meaningful programs in a language. If a program is well-typed, we can execute it.

Let's give some examples. Again, as with the evaluation rules, I will present typing rules informally, but they can be made rigorous. Here are some rules:

  1. A token of the form num[n] has type nat.
  2. A token of the form str[s] has type str.
  3. If expression e1 has type nat and expression e2 has type nat, then the expression plus(e1, e2) has type nat.
  4. If expression e1 has type str and expression e2 has type str, then expression concat(e1, e2) has type str.

Thus, according to these rules, there is plus(num[5], num[2]) is has type nat, but we cannot assign a type to plus(num[5], str["hello"]). We say a program (or expression) is well-typed if we can assign it any type, and it is ill-typed otherwise. A type system is sound if all well-typed programs can be executed. Haskell is sound; C is not.

Conclusion

There are other views on types. Types in some sense correspondg to intuitionistic logic, and they can also be viewed as objects in category theory. Understanding these connections is fascinating, but it is not essential if one merely wants to write or even design a programming language. However, understanding types as a tool for controlling program formations is essential to programming language design, and development. I have only scratched the surface of what types can express. I hope you think they are worthwhile enough to incorporate into your language.

gardenhead
  • 4,719
  • 2
  • 19
  • 24
  • 4
    +1. The greatest trick the dynamic typing enthusiasts ever pulled was convincing the world you could have "types" without a type system. :-) – ruakh Oct 14 '16 at 22:02
  • 1
    Since [you can't automatically verify anything interesting](https://en.wikipedia.org/wiki/Rice's_theorem) for arbitrary programs, every type system must provide a cast operator (or the moral equivalent), or else it sacrifices Turing completeness. This [includes Haskell](https://hackage.haskell.org/package/base-4.9.0.0/docs/System-IO-Unsafe.html), of course. – Kevin Oct 15 '16 at 04:26
  • 1
    @Kevin I am well aware of Rice's theorem, but it's not as relevant as you might thing. To begin with, a large majority of programs do not require unbounded recursion. If we work in a language that only has primitive recursion, such as Godel's System T, then we can verify interesting properties using a type system, including halting. Most programs in the real world are rather simple - I can't think of the last time I actually had a need for casting. Turing completeness is overrated. – gardenhead Oct 15 '16 at 05:04
  • 2
    Second, Rice's theorem only says that we can't verify a property for *all* programs; we may still be able to verify a property for the lion's share of programs in the wild. So it is true that in a general-purpose programming language, there will always be semantically valid programs that are ill-typed. But this happens very rarely in practice. – gardenhead Oct 15 '16 at 05:06
  • 1
    Personally, I can't think of the last time I wrote a program for serious purposes where said program was complex enough that I actually needed the compiler to do the static type checking for me. These days, it's all just glue code that shells out to some library or binary. Who needs to type-check that? – Kevin Oct 15 '16 at 05:08
  • 9
    “Dynamic typing isn’t really typing” has always seemed to me like classical musicians saying “Pop music isn’t really music”, or Evangelicals saying “Catholics aren’t really Christians”. Yes, static type systems are powerful and fascinating and important, and dynamic typing is something different. But (as other answers describe) there’s a range of useful things beyond static type systems that are traditionally called typing, and that do all share important commonalities. Why the need to insist Our Kind of Typing as the One True Typing? – Peter LeFanu Lumsdaine Oct 15 '16 at 09:23
  • 1
    @PeterLeFanuLumsdaine I agree with you that it's futile to try to get everyone to agree on these terms. Arguing over definitions is pointless and not very interesting. I'm fine with the term "dynamic typing", as long as people actually understand the differences between dynamic and static typing, and why those differences are so important. – gardenhead Oct 15 '16 at 15:17
  • @Kevin I agree with you that software engineering is basically just glue code these days. But as a fellow gluer, I actually think that static typing is more important than ever. The only way to put libraries together is with types. Types are the interfaces between modules. – gardenhead Oct 15 '16 at 15:18
  • 1
    I'm struggling to follow how this answer demonstrates that the OP "doesn't have a type system", or why the rules you discuss can't be implemented in a dynamic context. I suspect it's because you've tried to keep the explanations simple and top level, but this explanation didn't really get me beyond "a type system is a system of types". – IMSoP Oct 15 '16 at 18:01
  • 1
    @IMSoP I'm sorry I couldn't explain it better then. There's only so much room in an answer. I suggest reading a book on type systems to get the full picture. Suffice it to say, the rules I have laid out cannot be implemented with "dynamic types", because without typing rules there is no way to verify that a program is well-typed. You won't find out until it's too late. – gardenhead Oct 15 '16 at 18:11
  • 5
    @IMSoP: for something shorter than a book, Chris Smith’s essay [*What to know before debating type systems*](http://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html) is great, setting out why dynamic typing really is quite different thing from static typing. – Peter LeFanu Lumsdaine Oct 15 '16 at 21:36
  • @PeterLeFanuLumsdaine Thanks, that's an interesting read. I'm going to have to digest it (and the other things I've read today) for a while. I feel like enlightenment is in sight, but not quite in reach... – IMSoP Oct 15 '16 at 22:32
  • 1
    @PeterLeFanuLumsdaine: Consider a language like Java, which has both a static type system and dynamic type tagging (which it calls "runtime-types"). So, question: do people describe Java as "dynamically typed" or as "statically typed"? Answer: as "statically typed", because everyone implicitly agrees that the presence or absence of static typing is massively more significant. As a practical matter, I have nothing against dynamic typing -- I'm a big fan of Perl -- but I think that referring to it as a "type system" is really stretching the terminology. – ruakh Oct 18 '16 at 18:53
  • @ruakh I think the miscommunication comes from people who are not familiar with "type system" as a term of art in computer science theory (or simply happy not using it that way). Perl certainly has *a system of types*, and in plain English it would be perfectly reasonable to call that "a type system". Unfortunately, computer science is riddled with ambiguous language like this, because it attempts to borrow terms from other fields which aren't a perfect fit. – IMSoP Oct 18 '16 at 20:29
  • @IMSoP: You may be right; but I think that in large measure this use of "type system" actually comes from people who *have* heard the term, but simply weren't aware of its full significance. People rarely talk about "value systems", "variable systems", "expression systems", "class systems", etc., even though languages *do* have systems of values, variables, expressions, and classes; I don't think they'd talk about "type systems" as much if it weren't a term of art that sounded familiar. But I could be wrong. :-) – ruakh Oct 18 '16 at 21:08