219

I'm starting to learn Haskell. I'm very new to it, and I am just reading through a couple of the online books to get my head around its basic constructs.

One of the 'memes' that people familiar with it have often talked about, is the whole "if it compiles, it will work*" thing - which I think is related to the strength of the type system.

I'm trying to understand why exactly Haskell is better than other statically typed languages in this regard.

Put another way, I assume in Java, you could do something heinous like bury ArrayList<String>() to contain something that really should be ArrayList<Animal>(). The heinous thing here is that your string contains elephant, giraffe, etc., and if someone puts in Mercedes - your compiler won't help you.

If I did do ArrayList<Animal>() then, at some later point in time, if I decide my program isn't really about animals, it's about vehicles, then I can change, say, a function that produces ArrayList<Animal> to produce ArrayList<Vehicle> and my IDE should tell me everywhere there is a compilation break.

My assumption is that this is what people mean by a strong type system, but it's not obvious to me why Haskell's is better. Put another way, you can write good or bad Java, and I assume you can do the same in Haskell (i.e. stuff things into strings/ints that should really be first-class data types).

I suspect I'm missing something important/basic.
I would be very happy to be shown the error of my ways!

Trevor Hickey
phatmanace
  • 36
    I'll let people more knowledgeable than I write real answers, but the gist of it is this: statically typed languages like C# have a type system that attempts to help you write *defensible* code; type systems like Haskell's attempt to help you write *correct* (i.e. provable) code. The basic principle at work is moving things that can be checked into the compile stage; Haskell checks *more* things at compile time. – Robert Harvey Apr 16 '15 at 18:49
  • 8
    I do not know much about Haskell, but I can speak about Java. While it _appears_ strongly-typed, it still allows you to do "heinous" things as you said. For almost every guarantee Java makes regarding its type system, there is a way around it. –  Apr 17 '15 at 02:41
  • 13
    I don't know why all the answers mention `Maybe` only towards the end. If I had to choose just one thing that more popular languages should borrow from Haskell, this would be it. It's a very simple idea (so not very interesting from a theoretical point of view), but this alone would make our jobs so much easier. – Paul Apr 17 '15 at 09:07
  • 1
    There will be great answers here, but in an attempt to assist, study type signatures. They allow humans and programs to *reason* about programs in a way that will illustrate how Java is in the mushy middle re: types. – Michael Easter Apr 17 '15 at 14:04
  • 7
    For fairness I must point out that "the whole if it compiles, it will work thing" is a *slogan* not a literal statement of fact. Yes, we Haskell programmers know that passing the type checker gives a good chance of correctness, for some limited notions of correctness, but it's certainly not a literally and universally "true" statement! – Tom Ellis Aug 07 '15 at 06:16
  • "Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics." [xkcd.com/1270](https://xkcd.com/1270) – Ajedi32 Aug 17 '15 at 13:48
  • 1
    Nobody seems to have mentioned this very prominently, but... in Java, you can just cast *everything* to `Object` and be done with it. In Haskell, you really, really *cannot* do that. Haskell doesn't really have casts *at all*, which forces you to write out the real types. (`Data.Dynamic` clouds the issue slightly, but that's an optional feature you have to opt into on a per-type basis...) – MathematicalOrchid Jan 20 '16 at 14:34
  • Well, in Java you get static guarantees on the types of functions, but if you have any mutability it makes your pure functions context-sensitive on global state and thus causes "action at a distance", so you can still get errors from context. In Java there is no way to guarantee that any methods are pure functions, I believe. My guess is that, if you could quantify it, Java can statically catch 10-30% of your errors as type errors (looking at JavaScript-to-TypeScript conversion data it's about ~15%), while Haskell catches 30-80%. – aoeu256 Mar 25 '21 at 14:13

6 Answers

249

Here's an unordered list of type system features available in Haskell and either unavailable or less nice in Java (to my knowledge, which is admittedly weak w.r.t. Java):

  • Safety. Haskell's types have pretty good "type safety" properties. This is pretty specific, but it essentially means that values at some type cannot wantonly transform into another type. This is sometimes at odds with mutability (see OCaml's value restriction)
  • Algebraic Data Types. Types in Haskell have essentially the same structure as high school mathematics. This is outrageously simple and consistent, yet, as it turns out, as powerful as you could possibly want. It's simply a great foundation for a type system.
    • Datatype-generic programming. This is not the same as generic types (see generalization). Instead, due to the simplicity of the type structure as noted before, it's relatively easy to write code which operates generically over that structure. Later I talk about how something like Equality might be auto-derived for a user-defined type by a Haskell compiler. Essentially the way that it does this is to walk over the common, simple structure underlying any user-defined type and match it up between values—a very natural form of structural equality.
  • Mutually recursive types. This is just an essential component of writing non-trivial types.
    • Nested types. This allows you to define recursive types over variables which recurse at different types. For instance, one type of balanced trees is data Bt a = Here a | There (Bt (a, a)). Think carefully about the valid values of Bt a and notice how that type works. It's tricky!
  • Generalization. This is almost too silly to not have in a type system (ahem, looking at you, Go). It's important to have notions of type variables and the ability to talk about code which is independent of the choice of that variable. Hindley–Milner is a type system derived from System F; Haskell's type system is an elaboration of HM typing, and System F is essentially the heart of generalization. What I mean to say is that Haskell has a very good generalization story.
  • Abstract types. Haskell's story here is not great but also not non-existent. It's possible to write types which have a public interface but a private implementation. This allows us both to admit changes to the implementation code at a later time and, importantly since it's the basis of all operation in Haskell, to write "magic" types which have well-defined interfaces such as IO. Java probably actually has a nicer abstract type story, to be honest, though I don't think that was genuinely true until interfaces became more popular.
  • Parametricity. Haskell values do not have any universal operations. Java violates this with things like reference equality and hashing and even more flagrantly with coercions. What this means is that you get free theorems about types which allow you to know the meaning of an operation or value to a remarkable degree entirely from its type---certain types are such that there can only be a very small number of inhabitants.
  • Higher-kinded types show up all the time when encoding trickier things. Functor/Applicative/Monad, Foldable/Traversable, the entire mtl effect typing system, generalized functor fixpoints. The list goes on and on. There are a lot of things which are best expressed at higher kinds, and relatively few type systems even allow the user to talk about these things.
  • Type classes. If you think of type systems as logics—which is useful—then you are often required to prove things. In many cases this is essentially line noise: there may be only one right answer and it's a waste of time and effort for the programmer to state it. Typeclasses are a way for Haskell to generate the proofs for you. In more concrete terms, this lets you solve simple "type equation systems" like "At which type are we intending to (+) things together? Oh, Integer, ok! Let's inline the right code now!". In more complex systems you might be establishing more interesting constraints.
    • Constraint calculus. Constraints in Haskell—which are the mechanism for reaching into the typeclass prolog system—are structurally typed. This gives a very simple form of subtyping relationship which lets you assemble complex constraints from simpler ones. The entire mtl library is based on this idea.
    • Deriving. In order to drive the canonicity of the typeclass system it's necessary to write a lot of often trivial code to describe the constraints user-defined types must instantiate. Due to the very regular structure of Haskell types, it is often possible to ask the compiler to do this boilerplate for you (a short sketch follows this list).
    • Type class prolog. The Haskell type class solver—the system which is generating those "proofs" I referred to earlier—is essentially a crippled form of Prolog with nicer semantic properties. This means you can encode really hairy things in type prolog and expect them to be handled all at compile time. A good example might be solving for a proof that two heterogeneous lists are equivalent if you forget about the order—they're equivalent heterogeneous "sets".
    • Multi-parameter type classes and functional dependencies. These are just massively useful refinements to base typeclass prolog. If you know Prolog, you can imagine how much the expressive power increases when you can write predicates of more than one variable.
  • Pretty good inference. Languages based on Hindley Milner type systems have pretty good inference. HM itself has complete inference which means that you never need to write a type variable. Haskell 98, the simplest form of Haskell, already throws that out in some very rare circumstances. Generally, modern Haskell has been an experiment in slowly reducing the space of complete inference while adding more power to HM and seeing when users complain. People very rarely complain—Haskell's inference is pretty good.
  • Very, very, very weak subtyping only. I mentioned earlier that the constraint system from typeclass prolog has a notion of structural subtyping. That is the only form of subtyping in Haskell. Subtyping is terrible for reasoning and inference. It makes each of those problems significantly harder (a system of inequalities instead of a system of equalities). It's also really easy to misunderstand (Is subclassing the same as subtyping? Of course not! But people very frequently confuse that and many languages aid in that confusion! How did we end up here? I suppose nobody ever examines the LSP.)
    • Note recently (early 2017) Steven Dolan published his thesis on MLsub, a variant of ML and Hindley-Milner type inference which has a very nice subtyping story (see also). This doesn't obviate what I've written above—most subtyping systems are broken and have bad inference—but does suggest that we just today may have discovered some promising ways to have complete inference and subtyping play together nicely. Now, to be totally clear, Java's notions of subtyping are in no way able to take advantage of Dolan's algorithms and systems. It requires a rethinking of what subtyping means.
  • Higher rank types. I talked about generalization earlier, but more than just mere generalization it's useful to be able to talk about types which have generalized variables within them. For instance, a mapping between higher order structures which is oblivious (see parametricity) to what those structures "contain" has a type like (forall a. f a -> g a). In straight HM you can write a function at this type, but with higher-rank types you demand such a function as an argument like so: mapFree :: (forall a . f a -> g a) -> Free f -> Free g. Notice that the a variable is bound only within the argument. This means that the definer of the function mapFree gets to decide what a is instantiated at when they use it, not the user of mapFree.
  • Existential types. While higher-rank types allow us to talk about universal quantification, existential types let us talk about existential quantification: the idea that there merely exists some unknown type satisfying some equations. This ends up being useful and to go on for longer about it would take a long while.
  • Type families. Sometimes the typeclass mechanisms are inconvenient since we don't always think in Prolog. Type families let us write straight functional relationships between types.
    • Closed type families. Type families are by default open, which is annoying because it means that while you can extend them at any time, you cannot "invert" them with any hope of success. This is because you cannot prove injectivity, but with closed type families you can.
  • Kind indexed types and type promotion. I'm getting really exotic at this point, but these have practical use from time to time. If you'd like to write a type of handles which are either open or closed then you can do that very nicely. Notice in the following snippet that State is a very simple algebraic type which had its values promoted into the type-level as well. Then, subsequently, we can talk about type constructors like Handle as taking arguments at specific kinds like State. It's confusing to understand all the details, but also so very right.

    data State = Open | Closed
    
    data Handle :: State -> * -> * where
      OpenHandle :: {- something -} -> Handle Open a
      ClosedHandle :: {- something -} -> Handle Closed a
    
  • Runtime type representations that work. Java is notorious for having type erasure and having that feature rain on some people's parades. Type erasure is the right way to go, however, as if you have a function getRepr :: a -> TypeRepr then you at the very least violate parametricity. What's worse is that if that's a user-generated function which is used to trigger unsafe coercions at runtime... then you've got a massive safety concern. Haskell's Typeable system allows the creation of a safe cast :: (Typeable a, Typeable b) => a -> Maybe b. This system relies on Typeable being implemented in the compiler (and not userland) and also could not be given such nice semantics without Haskell's typeclass mechanism and the laws it is guaranteed to follow.
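To make a couple of the bullets above concrete (algebraic data types, deriving, and ordinary type-class constraints), here is a minimal sketch; the Shape type and area function are invented purely for illustration:

-- A sum-of-products ("high school algebra") type: a Shape is either a
-- Circle with one field or a Rect with two.
data Shape
  = Circle Double
  | Rect Double Double
  deriving (Eq, Show)   -- structural equality and printing written by the compiler

-- An ordinary function over the type; the arithmetic is resolved at the
-- Double instances by the type-class machinery described above.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

main :: IO ()
main = do
  print (area (Rect 3 4))        -- 12.0
  print (Circle 1 == Circle 1)   -- True, using the derived Eq instance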

More than just these, however, the value of Haskell's type system also relates to how the types describe the language. Here are a few features of Haskell which drive value through the type system.

  • Purity. Haskell allows no side effects for a very, very, very wide definition of "side effect". This forces you to put more information into types since types govern inputs and outputs and without side effects everything must be accounted for in the inputs and outputs.
    • IO. Subsequently, Haskell needed a way to talk about side effects—since any real program must include some—so a combination of typeclasses, higher kinded types, and abstract types gave rise to the notion of using a particular, super-special type called IO a to represent side-effecting computations which result in values of type a. This is the foundation of a very nice effect system embedded inside of a pure language.
  • Lack of null. Everyone knows that null is the billion dollar mistake of modern programming languages. Algebraic types, in particular the ability to just append a "does not exist" state onto types you have by transforming a type A into the type Maybe A, completely mitigate the problem of null.
  • Polymorphic recursion. This lets you define recursive functions whose type variables stay generalized even though each recursive call uses them at different types. This is difficult to talk about, but especially useful for talking about nested types. Look back to the Bt a type from before and try to write a function to compute its size: size :: Bt a -> Int. It'll look a bit like size (Here a) = 1 and size (There bt) = 2 * size bt (sketched just below). Operationally that isn't too complex, but notice that the recursive call to size in the last equation occurs at a different type, yet the overall definition has a nice generalized type size :: Bt a -> Int. Note that this is a feature which breaks total inference, but if you provide a type signature then Haskell will allow it.
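Since the Bt example keeps coming up, here is a minimal runnable sketch of it together with the polymorphically recursive size function (the signature is required, as noted):

-- The nested ("non-regular") balanced-tree type from earlier.
data Bt a = Here a | There (Bt (a, a))

-- Polymorphic recursion: the recursive call is at type Bt (a, a), not Bt a,
-- so the explicit type signature is needed for this to type-check.
size :: Bt a -> Int
size (Here _)   = 1
size (There bt) = 2 * size bt

-- A balanced tree holding four values: 1, 2, 3 and 4.
example :: Bt Int
example = There (There (Here ((1, 2), (3, 4))))

main :: IO ()
main = print (size example)   -- prints 4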

I could keep going, but this list ought to get you started-and-then-some.

J. Abrahamson
  • 5
    Your last statements about polymorphic recursion are false. Inferring polymorphic recursive function types is undecidable, so you must provide a type signature somewhere, and at that point the type is *not* fully inferred. Though you could extend GHC's inference algorithm with a bounded fixed-point search algorithm to automatically infer *some* polymorphic recursive definitions (I've done something like this for my bachelor's. It's surprising how many (useful) examples you can type with just very very few iterations of the fixed point search). – Bakuriu Apr 17 '15 at 06:16
  • 3
    Oh, you're absolutely correct. My mistake---I'll fix that. – J. Abrahamson Apr 17 '15 at 12:41
  • 7
    Null was not a billion-dollar "mistake". There are cases where it is not possible to *statically* verify that a pointer will not be dereferenced before anything *meaningful* could possibly *exist*; having an attempted dereference trap in such a case is often better than requiring that the pointer initially identify a meaningless object. I think the biggest null-related mistake was having implementations which, given `char *p = NULL;`, will trap on `*p=1234`, but will not trap on `char *q = p+5678;` nor `*q = 1234;` – supercat Apr 17 '15 at 20:19
  • 40
    That's just straight quoted from Tony Hoare: http://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions. While I'm sure there are times when `null` is necessary in pointer arithmetic, I instead interpret that to say that pointer arithmetic is a bad place to host the semantics of your language not that null isn't still a mistake. – J. Abrahamson Apr 17 '15 at 22:27
  • 21
    @supercat, you can indeed write a language without null. Whether to allow it or not is a choice. – Paul Draper Apr 18 '15 at 02:14
  • 6
    @supercat - That problem also exists in Haskell, but in a different form. Haskell is normally lazy and immutable, and therefore allows you to write `p = undefined` so long as `p` is not evaluated. More usefully, you can put `undefined` in some sort of mutable reference, again so long as you don't evaluate it. The more serious challenge is with lazy computations that may not terminate, which of course is undecidable. The main difference is that these are all unambiguously *programming faults*, and are never used to express ordinary logic. – Christian Conkle Aug 07 '15 at 14:58
  • @PaulDraper: If a language has persistable pointers, there are going to exist times when a pointer exists but there is nothing useful it can identify. While one could design a language that would enforce the "Null Object Pattern" [create a useless dummy object and make what would be null pointers point to that instead] or auto-implement the lazy-creation pattern [implied by VB6 "Dim Foo as New Bar" syntax], both behaviors are prone to mask problems which would have been caught by a system that traps null dereferences. – supercat Aug 07 '15 at 15:33
  • @ChristianConkle: I'm not familiar with Haskell, but I often find myself wishing that Java and .NET made a distinction between references that are used to *encapsulate* objects and those which *identify* them. References that encapsulate immutable objects should interpret `null` as a default object; those that encapsulate mutable objects should interpret `null` as a need to lazily-create an object. References that *identify* mutable objects should interpret `null` as not identifying any object; it's generally not necessary to identify immutable objects. – supercat Aug 07 '15 at 15:38
  • @ChristianConkle: By the sound of it, Haskell favors the "encapsulation" semantic, as compared with the "identification" semantic favored by Java and .NET; does Haskell have a linguistic distinction between references that encapsulate versus identify things? – supercat Aug 07 '15 at 15:40
  • 6
    @supercat Haskell lacks reference semantics entirely (this is the notion of *referential transparency*, which entails that everything is preserved by replacing references with their referents). Thus, I think your question is ill-posed. – J. Abrahamson Aug 07 '15 at 15:47
  • @J.Abrahamson: IMHO, the concept of `null` is essential in cases where references identify things. I wouldn't say such a concept would generally be necessary in a language where references can *only* encapsulate things, though I suppose such a thing might be required if it's possible to have objects hold mutual references. Out of curiosity, is that possible in Haskell? [The ability to create such structures would not be inconsistent with referential transparency, though I'm not sure how one would determine that sub-structures were equivalent.] – supercat Aug 07 '15 at 15:59
  • 3
    @supercat I'm not confident that I understand your terminology. I'd suggest that many mutual reference situations are handled in Haskell through laziness and "tying the knot", but it is true that some mutual references are more clearly stated by explicit mutation. The ST monad can be used to accomplish this. – J. Abrahamson Aug 07 '15 at 16:01
  • @J.Abrahamson: If one had a CD-ROM where a nested directory /foo/bar contained hard-link /foo/bar/next mapped to /foo, such a structure would be equivalent to having a directory /foo/bar/next which contained everything /foo contained, except with /foo/bar/next/bar/next containing either a link to /foo, a link to /foo/bar/next, or else another directory which was recursively like /foo, but there would be no way to tell whether /foo and /foo/bar/next were deeply equivalent without being able to *identify* directories. Could Haskell create or handle something analogous? – supercat Aug 07 '15 at 16:58
  • 2
    @supercat "If a language has persistable pointers, there are going to exist times when a pointer exists but there is nothing useful it can identify". Perhaps in the language implementation, that is true, but that is different than the language. The idea of a high-level language is to hide the unpleasantness of low-level languages, whether than is garbage collection, untyped raw data in memory, differences in architecture instruction sets, or invalid memory addresses. – Paul Draper Aug 07 '15 at 17:11
  • @supercat, no there is not. [Is there a sense of object equality in Haskell?](http://stackoverflow.com/questions/27762441/is-there-a-sense-of-object-equality-in-haskell) But the idea is that as soon as you try to use the distinction in any real way, there becomes a different and better solution. I don't have the link, but Paul Phillips (biggest contributor to scalac), lamented that Scala had universal reference equality. It prevents a lot of helpful assumptions, both for programmers and for compiler writers trying to optimize. – Paul Draper Aug 07 '15 at 17:18
  • @PaulDraper: If I were designing a language/framework, I would require that every item define a universal "equality" relation and a universal "equivalence" relation; objects must report themselves equivalent to themselves, but objects could also report themselves equivalent to other objects (every object would include a protected method to determine whether a given reference identified it, and the *default* equivalence relation would chain to that). Such semantics would maintain referential transparency but still allow self-referential object graphs to be analyzed in finite time. – supercat Aug 07 '15 at 17:41
  • @PaulDraper: BTW, I'd also include a protected `makeEquivalentTo(object target)` method whose base implementation would confirm that the `target` was of the *exact* same type as `this`, and which would instruct the GC that some or all of the references to one of the objects may be replaced with references to the other at any time the GC finds convenient, and such replacement would likely improve performance. If two separately-constructed thousand-character strings are compared once and found to be equal, it should be possible to recognize them as such forevermore without rescanning them. – supercat Aug 07 '15 at 17:52
  • 1
    @supercat As Paul notes there is no sense of object equality ("reference equality") in Haskell excepting where you explicitly create your own via tags (and then are personally responsible for ensuring equality behaves properly). In fact, there are types in Haskell which have no (internal) notion of equality at all! – J. Abrahamson Aug 07 '15 at 18:12
  • 2
    @supercat Additionally, there's no way to have referential transparency *and* allow for analysis of self-referential graphs. If you can identify whether two references point to the same place then you are exactly violating referential transparency. – J. Abrahamson Aug 07 '15 at 18:13
  • This isn't to say that such algorithms are impossible to write or use in Haskell... it's just that the base semantics doesn't offer them. Instead, you need to use monadically augmented semantics via something like ST. – J. Abrahamson Aug 07 '15 at 18:16
  • I wrote a series of blog posts on this actually a little while back: http://jspha.com/posts/mutable_algorithms_in_immutable_languges_part_1/ – J. Abrahamson Aug 07 '15 at 18:16
  • @J.Abrahamson: I suppose one might say that rules which allow an object to report itself as equivalent or non-equivalent to another object from which it would otherwise be indistinguishable, but require objects to be equivalent to themselves, aren't totally referentially-transparent, but the only thing I can think of which would be possible under a totally-transparent model that wouldn't be possible under mine would be the construction of a graph which contains a finite number of nodes and cannot be distinguished in finite time from one with an infinite number of nodes. – supercat Aug 07 '15 at 18:30
  • @J.Abrahamson: The fact that `foo->next->next->next` reports itself equivalent to `foo` wouldn't imply that there were only three physical nodes in the chain; the number of physical nodes in the chain could be any multiple of three. If `foo->next` reports itself non-equivalent to `foo` (which would, together with the earlier statement, imply that `foo->next->next->next->next` could not be equivalent to `foo`), that would imply that the number of physical nodes in the chain *must* be a multiple of three. – supercat Aug 07 '15 at 18:41
  • 3
    @supercat If you throw away referential transparency it's not that you lose power (in fact you gain it, certainly) but instead that you lose the ability to reason equationally. This is important for human faculties but also for compiler transforms and optimizations. – J. Abrahamson Aug 07 '15 at 20:07
  • @J.Abrahamson: Can you point me to an example of reasoning or optimization which would work under a fully-transparent model but not under a semi-transparent one? – supercat Aug 07 '15 at 20:15
  • 2
    @supercat Sure, if `f` is a function which potentially examines its arguments to see that they're they have reference equality then the operation `f(mk(), mk()) ===> let x = mk() in f(x, x)` is an invalid transformation. – J. Abrahamson Aug 07 '15 at 21:50
  • @J.Abrahamson: If repeated calls to `mk()` are specified as arbitrarily returning one or more indistinguishable objects, such specification would make the transform valid under the semi-transparent model; I can't think how the transform would be valid under any model if `mk()` returned objects which could be distinguished from each other. The only connection between a semi-transparent equivalence relation and reference equality is the one implied by the nature of an equivalence relation--every object must be considered equivalent to itself. – supercat Aug 07 '15 at 22:19
  • 2
    @supercat sorry, I should have been explicit, in this example mk is a pure function. It returns identical objects by virtue of computing them identically, but they're different objects by virtue of them not necessarily being at the same memory address. Equational semantics must respect the first (indeed that is more or less definitional) but *must ignore* the second. If f can detect sharing then it violates referential transparency and violates equational reasoning. – J. Abrahamson Aug 07 '15 at 22:23
  • @J.Abrahamson: Objects with private fields and the ability to restrict instance generation to their own constructor could use them implement an equivalence relation where no instance would compare equal to any other (i.e. reference equality) even in a fully-referentially-transparent system; would Haskell not allow objects to do that? – supercat Aug 07 '15 at 22:31
  • 2
    @supercat Yes, Haskell disallows this. Every call to a constructor (not that they exist) would produce a structurally equivalent object. They may not have any equality at all, but if one is defined it can be no more fine than one which respects referential transparency. – J. Abrahamson Aug 07 '15 at 22:49
  • **Comments are not for extended discussions. Please continue this conversation in our [Chat Room](http://chat.stackexchange.com/rooms/21/the-whiteboard). Thank you.** – maple_shaft Aug 08 '15 at 16:27
  • What is "the same types as high school mathematics"? I wasn't aware of types in my teens and as I learn more I realise I made many type errors at that time. (Also I think my teachers did too.) – isomorphismes Sep 26 '15 at 08:27
  • "High School Mathematics" is actually sort of a term of art referring to the idea of mathematics involving +, *, -, /, solving for x, simple derivatives, etc. With the exceptions of - and / all those operations work great on ADTs. – J. Abrahamson Sep 26 '15 at 12:58
  • What's nicer about Java's abstract type story? Haskell's module system (wimpy though it is) and class system seem to do a pretty good job. Obviously, languages like SML, Agda, and Racket that have real module systems offer certain advantages, but I wasn't aware of Java being one of those. – dfeuer Apr 13 '18 at 04:57
  • I think this post should not have omitted Dependent Types, as this is likely to be the first thing C++ programmers miss when they start to program in Haskell. – BitTickler Dec 05 '19 at 04:57
  • @BitTickler I'm not familiar with many C++ techniques, but my understanding is that C++ does not have dependent types. It has a system for performing compile-time computation and richly parameterizing types based on that, but cannot introduce true dependency. I'm not an expert on dependent types either, but I tend to understand there as being a few decent tests like: can you model Pi types, can you model Sigma types, and can you model `f = \x: Bool -> if x then "true" else 0` such that `f True : String` and `f False : Int`? That said, Haskell does have limited dependent types and Java doesn't. – J. Abrahamson Dec 06 '19 at 12:36
  • @dfeuer I wrote that to indicate that there's more richness to how interfaces, abstract methods, inheritance, public/protected/private and the like combine to discuss information and functionality hiding. You can get away with all of that in Haskell as well if you're careful, but you're working with simpler tools. In common usage, you get one level of hidden structure through hiding constructors and the (admittedly, very nice) composition of type classes for defining abstract types. I happen to *like* Haskell's solution here better, but Java has a rich set of tools here. – J. Abrahamson Dec 06 '19 at 12:43
82
  • Full type inference. You can actually use complex types ubiquitously without feeling like, "Holy crap, all I ever do is write type signatures."
  • Types are fully algebraic, which makes it very easy to express some complex ideas.
  • Haskell has type classes, which are sort of like interfaces, except you don't have to put all the implementations for one type in the same place. You can create implementations of your own type classes for existing third-party types, without needing access to their source (a small sketch follows this list).
  • Higher-order and recursive functions have a tendency to put more functionality into the purview of the type checker. Take filter, for example. In an imperative language, you might write a for loop to implement the same functionality, but you won't have the same static type guarantees, because a for loop has no concept of a return type.
  • Lack of subtypes greatly simplifies parametric polymorphism.
  • Higher-kinded types (types of types) are relatively easy to specify and use in Haskell, which lets you create abstractions around types that are completely unfathomable in Java.
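As a small, hypothetical illustration of the type-class point above (the Pretty class is invented for this sketch): you can define a class yourself and then give instances for types whose source you do not control:

-- A class we define ourselves...
class Pretty a where
  pretty :: a -> String

-- ...with instances for existing types we did not write and cannot edit.
instance Pretty Bool where
  pretty True  = "yes"
  pretty False = "no"

instance Pretty a => Pretty [a] where
  pretty = unwords . map pretty

main :: IO ()
main = putStrLn (pretty [True, False, True])   -- prints "yes no yes"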
Karl Bielefeldt
  • 7
    Nice answer - could you give me a simple example of a higher-kinded type? I think that would help me understand why it's impossible to do in Java. – phatmanace Apr 16 '15 at 20:16
  • 3
    There are some good examples [here](http://stackoverflow.com/questions/21170493/when-are-higher-kinded-types-useful). – Karl Bielefeldt Apr 16 '15 at 20:27
  • 3
    Pattern matching is also really important; it means you can use the structure of a value to make decisions super easily. – Benjamin Gruenbaum Apr 16 '15 at 21:09
  • 2
    @BenjaminGruenbaum I don't think I'd call that a type system feature. – Doval Apr 16 '15 at 21:14
  • @Doval it certainly is a reason it's popular though, working with data types with pattern matching on type is one of the key things that make using a type system like that nice. It's part of what makes it revered. – Benjamin Gruenbaum Apr 16 '15 at 21:15
  • 1
    @BenjaminGruenbaum it's a nice feature, but it's not related to static types. There are untyped languages with pattern matching, as well as typed languages without pattern matching. – Phil Apr 16 '15 at 21:41
  • 1
    I'd also add that Java's `void` type (instead of `unit`) and lack of tuples gets in the way of function composition. (You can roll your own tuple class if you want, but a method argument list isn't compatible with it.) – Doval Apr 16 '15 at 22:26
  • 1
    I believe you can add another big item to the list: Haskell is a functional language. This means that you can derive some properties from the type of a function that tell you something about what that function can do and what the function cannot do. In an imperative language a function can have arbitrary side-effects independently of its type. This is a big difference if you want your type system to help prove correctness rather than just avoid obvious bugs. – Bakuriu Apr 17 '15 at 06:12
  • 3
    While ADTs and HKTs are definitely part of the answer, I doubt anyone asking this question is going to know why they are useful. I suggest that both sections need to be expanded to explain this. – jk. Apr 17 '15 at 06:38
  • Nitpick: kinds themselves could be described as "types of types", but higher-kinded types can't. – kini Apr 17 '15 at 10:36
69
a :: Integer
b :: Maybe Integer
c :: IO Integer
d :: Either String Integer

In Haskell: an integer, an integer that might be absent, an integer whose value came from the outside world, and an integer that might be a string instead, are all distinct types - and the compiler will enforce this. You cannot compile a Haskell program which fails to respect these distinctions.

(You can, however, omit the type declarations. In most cases, the compiler can determine the most general type for your variables which will result in a successful compilation. Isn't that neat?)
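As a minimal sketch of what that enforcement looks like in practice (it reuses a and b from above; total is invented for illustration): the compiler simply will not let you treat a Maybe Integer as an Integer, so the "missing" case must be handled somewhere:

a :: Integer
a = 5

b :: Maybe Integer
b = Nothing

-- This line would be rejected at compile time: Integer and Maybe Integer don't mix.
-- bad = a + b

-- Instead, the absent case has to be handled explicitly.
total :: Integer
total = case b of
  Just n  -> a + n
  Nothing -> a

main :: IO ()
main = print total   -- prints 5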

WolfeFan
  • 13
    +1 while this answer is not complete I think it is much better pitched at the level of the question – jk. Apr 17 '15 at 08:19
  • 1
    +1 Though it would help explaining that other languages also have `Maybe` (e.g. Java's `Optional` and Scala's `Option`), but in those languages it's a half-baked solution, since you can always assign `null` to a variable of that type, and have your program explode at run-time. This cannot happen with Haskell [1], because *there is no null value*, so you simply cannot cheat. ([1]: actually, you can generate a similar error to a NullPointerException using partial functions such as `fromJust` when you have a `Nothing`, but those functions are probably frowned upon). – Andres F. Apr 17 '15 at 13:25
  • 3
    "an integer whose value came from the outside world" - wouldn't `IO Integer` be closer to 'subprogram which, when executed, gives integer'? As a) in `main = c >> c` the value returned by first `c` may be different then by second `c` while `a` will have the same value regardless of its position (as long as we are in single scope) b) there are types which denotes values from outside world to enforce its sanatisation (i.e. not to put them directly but first check if input from user is correct/not malicious). – Maciej Piechotka Apr 17 '15 at 16:17
  • 5
    Maciej, that would be more accurate. I was striving for simplicity. – WolfeFan Apr 17 '15 at 16:20
33

Lots of people have listed good things about Haskell. But in answer to your specific question "why does the type system make programs more correct?", I suspect the answer is "parametric polymorphism".

Consider the following Haskell function:

foobar :: x -> y -> y

There is literally only one possible way to implement this function. Just by the type signature, I can tell precisely what this function does, because there is only one possible thing it can do. [OK, not quite, but almost!]

Stop and think about that for a moment. That's actually a really big deal! It means if I write a function with this signature, it's actually impossible for the function to do anything other than what I intended. (The type signature itself can still be wrong, of course. No programming language will ever prevent all bugs.)
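For completeness, here is that single implementation, sketched out (setting aside the infinite-loop and exception loopholes discussed further down):

foobar :: x -> y -> y
foobar _ y = y   -- all a total function of this type can do is hand back its second argument

main :: IO ()
main = print (foobar "ignored" (42 :: Int))   -- prints 42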

Consider this function:

fubar :: Int -> (x -> y) -> y

This function is impossible. You literally cannot implement this function. I can tell that just from the type signature.

As you can see, a Haskell type signature tells you a hell of a lot!


Compare to C#. (Sorry, my Java is a little rusty.)

public static TY foobar<TX, TY>(TX in1, TY in2)

There are a couple of things this method could do:

  • Return in2 as the result.
  • Loop forever, and never return anything.
  • Throw an exception, and never return anything.

Actually, Haskell has these three options too. But C# also gives you the additional options:

  • Return null. (Haskell does not have null.)
  • Modify in2 before returning it. (Haskell does not have in-place modification.)
  • Use reflection. (Haskell does not have reflection.)
  • Perform multiple I/O actions before returning a result. (Haskell won't let you perform I/O unless you declare that you perform I/O here.)

Reflection is a particularly big hammer; using reflection, I can construct a new TY object out of thin air, and return that! I can inspect both of the objects, and do different actions depending on what I find. I can make arbitrary modifications to both of the objects passed in.

I/O is a similarly big hammer. The code could be displaying messages to the user, or opening database connections, or reformatting your harddisk, or anything, really.


The Haskell foobar function, by contrast, can only take some data and return that data, unchanged. It cannot "look at" the data, because its type is unknown at compile-time. It cannot create new data, because... well, how do you construct data of any possible type? You'd need reflection for that. It can't perform any I/O, because the type signature doesn't declare that I/O is being performed. So it can't interact with the filesystem or the network, or even running threads in the same program! (I.e., it is 100% guaranteed thread-safe.)

As you can see, by not letting you do a whole bunch of stuff, Haskell is allowing you to make very strong guarantees about what your code actually does. So tight, in fact, that (for really polymorphic code) there's usually only one possible way that the pieces can fit together.

(To be clear: It's still possible to write Haskell functions where the type signature doesn't tell you much. Int -> Int could be just about anything. But even then, we know that the same input will always produce the same output with 100% certainty. Java doesn't even guarantee that!)

  • 4
    +1 Great answer! This is very powerful and often underappreciated by newcomers to Haskell. By the way, a simpler "impossible" function would be `fubar :: a -> b`, wouldn't it? (Yes, I'm aware of `unsafeCoerce`. I assume we aren't talking about anything with "unsafe" in its name, and neither should newcomers worry about it! :D) – Andres F. Apr 17 '15 at 13:38
  • There are lots of simpler type signatures that you can't write, yes. For example, `foobar :: x` is pretty unimplementable... – MathematicalOrchid Apr 17 '15 at 14:52
  • Actually, you _can't_ make pure code thread-unsafe, but you can still make it multi-threaded. Your options are "before you evaluate this, evaluate this", "when you evaluate this, you might want to evaluate this in a separate thread as well", and "when you evaluate this, you might want to evaluate this as well, possibly in a separate thread". The default is "do as you wish", which essentially means "evaluate as late as possible". – John Dvorak Apr 19 '15 at 14:52
  • More prosaically you can call instance methods on in1 or in2 that have side effects. Or you can modify global state (which, granted, is modeled as an IO action in Haskell, but might not be what most people think of as IO). – Doug McClean Aug 08 '15 at 00:21
  • It looks to me like `x -> y -> y` should be a projection onto the right term? Why isn't this doable with an integer? – isomorphismes Oct 03 '15 at 17:45
  • 2
    @isomorphismes The type `x -> y -> y` is perfectly implementable. The type `(x -> y) -> y` is not. The type `x -> y -> y` takes two inputs, and returns the second one. The type `(x -> y) -> y` takes *a function* that operates on `x`, and somehow has to make a `y` out of that... – MathematicalOrchid Jan 20 '16 at 14:05
18

A related SO question.

I assume you can do the same in Haskell (i.e. stuff things into strings/ints that should really be first-class data types)

No, you really can't - at least not in the same way that Java can. In Java, this sort of thing happens:

String x = (String)someNonString;

and Java will happily try and cast your non-String as a String. Haskell does not allow this sort of thing, eliminating a whole class of runtime errors.

null is part of the type system (as Nothing) so needs to be explicitly asked for and handled, eliminating a whole other class of runtime errors.

There's a bunch of other subtle benefits too - especially around reuse and type-classes - that I don't have the expertise to know well enough to communicate.

Mostly though, it is because Haskell's type system allows a lot of expressiveness. You can do a whole lot of stuff with only a few rules. Consider the ever-present Haskell tree:

data Tree a = Leaf a | Branch (Tree a) (Tree a) 

You've defined an entire generic binary tree (and two data constructors) in a fairly readable one line of code. All just using a few rules (having sum types and product types). That is 3-4 code files and classes in Java.
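Using it is similarly compact; here is a minimal sketch (the leaves function is invented for illustration) that pattern matches on the two constructors:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

-- Count the leaves by pattern matching on the two constructors.
leaves :: Tree a -> Int
leaves (Leaf _)     = 1
leaves (Branch l r) = leaves l + leaves r

main :: IO ()
main = print (leaves (Branch (Leaf 'a') (Branch (Leaf 'b') (Leaf 'c'))))   -- prints 3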

Especially among those prone to revere type systems, this sort of conciseness/elegance is highly valued.

Telastyn
  • I understood only NullPointerExceptions from your answer. Could you include more examples? – Jesvin Jose Apr 17 '15 at 07:07
  • 2
    Not necessarily true, [JLS §5.5.1](https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.5.1): _If T is a class type, then either |S| <: |T|, or |T| <: |S|. Otherwise, a compile-time error occurs._ So the compiler will not allow you to cast inconvertible types - there are obviously ways around it. – Boris the Spider Apr 17 '15 at 07:38
  • In my opinion the simplest way to put the advantages of type classes is that they're like `interface`s that can be added after-the-fact, and they don't "forget" the type that's implementing them. That is, you can ensure that two arguments to a function have the same type, unlike with `interface`s where two `List`s may have different implementations. You could technically do something very similar in Java by adding a type parameter to every interface, but 99% of existing interfaces don't do it and you'll confuse the hell out of your peers. – Doval Apr 17 '15 at 11:49
  • 2
    @BoristheSpider True, but casting exceptions almost always involve downcasting from a superclass to a subclass or from an interface to a class, and it's not unusual for the superclass to be `Object`. – Doval Apr 17 '15 at 13:26
  • 2
    I think the point in the question about strings is not to do with casting and runtime type errors, but the fact that if you don't want to *use* types, Java won't make you - as in, actually *storing* your data in serialized form, abusing strings as an ad-hoc `any` type. Haskell won't stop you doing this either, since ...well, it has strings. Haskell can give you tools, it can't forcibly *stop* you from doing stupid things if you insist on Greenspunning enough of an interpreter to reinvent `null` in a nested context. No language can. – Alex Celeste Apr 18 '15 at 01:24
  • Apropos casting: An example of safe "casting" in Haskell: [ConstraintKinds explained in a super simple example](http://stackoverflow.com/a/31328543/309483) – Janus Troelsen Jan 30 '17 at 21:00
1

One of the 'memes' that people familiar with it have often talked about, is the whole "if it compiles, it will work*" thing - which I think is related to the strength of the type system.

This is mostly true of small programs. Haskell prevents you from making mistakes that are easy to make in other languages (e.g. comparing an Int32 and a Word32 and something explodes), but it does not prevent all mistakes.

Haskell does actually make refactoring a lot easier. If your program was previously correct and it typechecks, there's a fair chance it will still be correct after minor emendations.

I'm trying to understand why exactly Haskell is better than other statically typed languages in this regard.

Types in Haskell are fairly lightweight, in that it's easy to declare new types. This is in contrast to a language like Rust, where everything is just a bit more cumbersome.

My assumption is that this is what people mean by a strong type system, but it's not obvious to me why Haskell's is better.

Haskell has many features beyond simple sum and product types; it has universally quantified types (e.g. id :: a -> a) as well. You can also create record types containing functions, which is quite different from a language such as Java or Rust.
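As a minimal sketch of the "record containing functions" idea (the Logger type and its fields are invented for illustration):

-- A record whose fields are functions; plain values can carry behaviour around.
data Logger = Logger
  { logInfo  :: String -> IO ()
  , logError :: String -> IO ()
  }

consoleLogger :: Logger
consoleLogger = Logger
  { logInfo  = \msg -> putStrLn ("[info] "  ++ msg)
  , logError = \msg -> putStrLn ("[error] " ++ msg)
  }

main :: IO ()
main = do
  logInfo consoleLogger "starting up"
  logError consoleLogger "something went wrong"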

GHC can also derive some instances based on types alone, and since the advent of generics, you can write functions that are generic over the structure of your types. This is quite convenient, and it is more fluent than the equivalent in Java.

Another difference is that Haskell tends to have relatively good type errors (as of writing, at least). Haskell's type inference is sophisticated, and it is quite rare that you will need to provide type annotations in order to get something to compile. This is in contrast with Rust, where type inference can sometimes require annotations even when the compiler could in principle deduce the type.

Finally, Haskell has typeclasses, among them the famous monad. Monads happen to be a particularly nice way to handle errors; they basically give you nearly all the convenience of null without the horrid debugging and without giving up any of your type safety. So the ability to write functions on these types actually matters quite a bit when it comes to encouraging us to use them!
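For instance, here is a minimal sketch (the two lookup tables are invented) of chaining computations that may fail using the Maybe monad; any Nothing short-circuits the whole block, and the type makes it impossible to forget that the result may be absent:

import qualified Data.Map as Map

userIds :: Map.Map String Int
userIds = Map.fromList [("alice", 1), ("bob", 2)]

emails :: Map.Map Int String
emails = Map.fromList [(1, "alice@example.com")]

-- Chained lookups; if either one returns Nothing, the whole result is Nothing.
emailFor :: String -> Maybe String
emailFor name = do
  uid <- Map.lookup name userIds
  Map.lookup uid emails

main :: IO ()
main = do
  print (emailFor "alice")   -- Just "alice@example.com"
  print (emailFor "bob")     -- Nothing: no email on file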

Put another way, you can write good or bad Java, I assume you can do the same in Haskell

That is perhaps true, but it is missing a crucial point: the point where you start shooting yourself in the foot in Haskell is further along than the point where you start shooting yourself in the foot in Java.