
Large codebases are more difficult to maintain when they are written in dynamic languages. At least, that's what Yevgeniy Brikman, the lead developer bringing the Play Framework to LinkedIn, says in a video presentation recorded at JaxConf 2013 (minute 44).

Why does he say this? What are the reasons?

samthebrand
Jus12
  • Have you cross checked with other sources? – mouviciel Dec 17 '13 at 10:16
  • I'm the author of the Play Framework talk mentioned in the question. I was going to write a reply, but [Eric Lippert's answer below](http://programmers.stackexchange.com/a/221658/5939) says it better than I could have, so I upvoted it instead and recommend everyone reads it. – Yevgeniy Brikman Feb 09 '14 at 21:00
  • There is a lot of bias here, as static languages have so much boilerplate that they invariably end up large. I have first-hand experience of this: http://stackoverflow.com/questions/5232654/java-to-clojure-rewrite – yazz.com Nov 26 '14 at 09:30
  • @Zubair Static does *not* mean boilerplate code. Have you checked out Scala? You experimented with Clojure; that's why your view is biased. – Jus12 Nov 26 '14 at 11:43
  • @Jus12 - Yes, that is a good point. It probably makes sense to make the question more specific, as Scala is so different to Java for example, with Java being so verbose – yazz.com Nov 27 '14 at 08:08
  • Does this comparison account for the fact that a large codebase tends to become larger when written in a statically typed language? In other words, comparing a 1 million line codebase in Java vs Ruby is a biased comparison, since the Ruby codebase probably does a lot more. The correct comparison would perhaps be a 1 million line Ruby codebase vs a 5 million line Java codebase. Is the Java codebase still more maintainable? I suppose not. – Kartick Vaddadi Apr 23 '16 at 11:12
  • Why is this closed as duplicate? It's the better QA page (with tremendously better answers) than the linked "duplicate". I think the closing as duplicate should be turned around, redirecting people from the old but "meh" QA page to this better one, even though common duplicate-flagging rules dictate otherwise. – BlueWizard Dec 16 '18 at 23:01

6 Answers

> dynamic languages make for harder to maintain large codebases

Caveat: I have not watched the presentation.

I have been on the design committees for JavaScript (a very dynamic language), C# (a mostly static language) and Visual Basic (which is both static and dynamic), so I have a number of thoughts on this subject; too many to easily fit into an answer here.

Let me begin by saying that it is hard to maintain a large codebase, period. Big code is hard to write no matter what tools you have at your disposal. Your question does not imply that maintaining a large codebase in a statically-typed language is "easy"; rather the question presupposes merely that it is an even harder problem to maintain a large codebase in a dynamic language than in a static language. That said, there are reasons why the effort expended in maintaining a large codebase in a dynamic language is somewhat larger than the effort expended for statically typed languages. I'll explore a few of those in this post.

But we are getting ahead of ourselves. We should clearly define what we mean by a "dynamic" language; by "dynamic" language I mean the opposite of a "static" language.

A "statically-typed" language is a language designed to facilitate automatic correctness checking by a tool that has access to only the source code, not the running state of the program. The facts that are deduced by the tool are called "types". The language designers produce a set of rules about what makes a program "type safe", and the tool seeks to prove that the program follows those rules; if it does not then it produces a type error.

A "dynamically-typed" language by contrast is one not designed to facilitate this kind of checking. The meaning of the data stored in any particular location can only be easily determined by inspection while the program is running.
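To make the distinction concrete, here is a minimal sketch in JavaScript (the variable and values are invented for illustration):

```javascript
// Sketch of the definitions above: no tool reading only this source can
// pin a single "type" fact on x, because the meaning of the storage
// location changes as the program runs.
let x = 42;             // at this moment x holds a number
x = "forty-two";        // now a string; nothing objected
x = { value: 42 };      // now an object

// The only easy way to learn what x currently holds is to inspect it
// while the program is running:
const kind = typeof x;  // "object"
```

A static checker, by contrast, would be designed to reject the second assignment before the program ever ran.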

(We could also make a distinction between dynamically scoped and lexically scoped languages, but let's not go there for the purposes of this discussion. A dynamically typed language need not be dynamically scoped and a statically typed language need not be lexically scoped, but there is often a correlation between the two.)

So now that we have our terms straight let's talk about large codebases. Large codebases tend to have some common characteristics:

  • They are too large for any one person to understand every detail.
  • They are often worked on by large teams whose personnel changes over time.
  • They are often worked on for a long time, with multiple versions.

All these characteristics present impediments to understanding the code, and therefore present impediments to correctly changing the code. In short: time is money; making correct changes to a large codebase is expensive due to the nature of these impediments to understanding.

Since budgets are finite and we want to do as much as we can with the resources we have, the maintainers of large codebases seek to lower the cost of making correct changes by mitigating these impediments. Some of the ways that large teams mitigate these impediments are:

  • Modularization: Code is factored into "modules" of some sort where each module has a clear responsibility. The action of the code can be documented and understood without a user having to understand its implementation details.
  • Encapsulation: Modules make a distinction between their "public" surface area and their "private" implementation details so that the latter can be improved without affecting the correctness of the program as a whole.
  • Re-use: When a problem is solved correctly once, it is solved for all time; the solution can be re-used in the creation of new solutions. Techniques such as making a library of utility functions, or making functionality in a base class that can be extended by a derived class, or architectures that encourage composition, are all techniques for code re-use. Again, the point is to lower costs.
  • Annotation: Code is annotated to describe the valid values that might go into a variable, for instance.
  • Automatic detection of errors: A team working on a large program is wise to build a device which determines early when a programming error has been made and tells you about it so that it can be fixed quickly, before the error is compounded with more errors. Techniques such as writing a test suite, or running a static analyzer fall into this category.

A statically typed language is an example of the last of these; in the compiler itself you get a device which looks for type errors and informs you of them before you check the broken code change into the repository. A manifestly typed language requires that storage locations be annotated with facts about what can go into them.

So for that reason alone, dynamically typed languages make it harder to maintain a large codebase, because the work that is done by the compiler "for free" is now work that you must do in the form of writing test suites. If you want to annotate the meaning of your variables, you must come up with a system for doing so, and if a new team member accidentally violates it, that must be caught in code review, not by the compiler.
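As a small sketch of that trade (the `totalWithTax` function is made up for illustration), the hand-written test below does work a type checker would do for free, and the string-argument call shows the kind of silent error it must guard against:

```javascript
// Hypothetical function: nothing stops a caller from passing the price
// as a string, and the result is silently wrong rather than an error.
function totalWithTax(price, rate) {
  return price + price * rate;
}

const good = totalWithTax(100, 0.5);  // 150, as intended
const bad = totalWithTax("100", 0.5); // "10050": "+" concatenates, no error raised

// The check the team must write by hand, since no compiler provides it:
function testTotalWithTax() {
  const result = totalWithTax(100, 0.5);
  if (result !== 150) throw new Error("expected 150, got " + result);
  return true;
}
const passed = testTotalWithTax();
```

In a statically typed language the string-argument call would simply not compile, and the test would not need to exist.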

Now here is the key point I have been building up to: there is a strong correlation between a language being dynamically typed and a language also lacking all the other facilities that lower the cost of maintaining a large codebase, and that is the key reason why it is more difficult to maintain a large codebase in a dynamic language. And similarly there is a correlation between a language being statically typed and having facilities that make programming in the large easier.

Let's take JavaScript for example. (I worked on the original versions of JScript at Microsoft from 1996 through 2001.) The by-design purpose of JavaScript was to make the monkey dance when you moused over it. Scripts were often a single line. We considered ten line scripts to be pretty normal, hundred line scripts to be huge, and thousand line scripts were unheard of. The language was absolutely not designed for programming in the large, and our implementation decisions, performance targets, and so on, were based on that assumption.

Since JavaScript was specifically designed for programs where one person could see the whole thing on a single page, JavaScript is not only dynamically typed, but it also lacks a great many other facilities that are commonly used when programming in the large:

  • There is no modularization system; there are no classes, interfaces, or even namespaces. These elements are in other languages to help organize large codebases.
  • The inheritance system -- prototype inheritance -- is both weak and poorly understood. It is by no means obvious how to correctly build prototypes for deep hierarchies (a captain is a kind of pirate, a pirate is a kind of person, a person is a kind of thing...) in out-of-the-box JavaScript.
  • There is no encapsulation whatsoever; every property of every object is yielded up to the for-in construct, and is modifiable at will by any part of the program.
  • There is no way to annotate any restriction on storage; any variable may hold any value.
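The last two points can be sketched in a few lines of the era's JavaScript (the object and property names are invented; note that ECMAScript 5 later added non-enumerable properties):

```javascript
// No encapsulation: every own, enumerable property is visible to for-in
// and modifiable by any part of the program that can reach the object.
var account = { owner: "alice", balance: 100 };

var seen = [];
for (var key in account) {
  seen.push(key); // "owner" and "balance" alike; nothing is private
}

// No storage restrictions: any variable or property may hold any value.
account.balance = "one hundred"; // silently accepted
```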

But it's not just the lack of facilities that make programming in the large easier. There are also features that make it harder.

  • JavaScript's error management system is designed with the assumption that the script is running on a web page, that failure is likely, that the cost of failure is low, and that the user who sees the failure is the person least able to fix it: the browser user, not the code's author. Therefore as many errors as possible fail silently and the program keeps trying to muddle on through. This is a reasonable characteristic given the goals of the language, but it surely makes programming in the large harder because it increases the difficulty of writing test cases. If nothing ever fails it is harder to write tests that detect failure!

  • Code can modify itself based on user input via facilities such as eval or adding new script blocks to the browser DOM dynamically. Any static analysis tool might not even know what code makes up the program!

  • And so on.
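Two of those points can be sketched briefly (the names here are invented). First, a misspelled property name does not fail; it silently yields undefined and the program muddles on. Second, code assembled from a string at runtime is invisible to a tool that only reads the source:

```javascript
// Silent failure: the typo below raises no error; the read just
// produces undefined, and a plausible default hides the bug.
const config = { timeout: 500 };
const t = config.timeuot;  // typo: no exception, t is simply undefined
const effective = t || 30; // the program muddles on with the fallback

// Runtime code generation: the body of `add` exists only as data until
// the program runs, so a static analyzer cannot see what it does.
const body = "return a + b;";
const add = new Function("a", "b", body);
const sum = add(2, 3); // 5
```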

Clearly it is possible to overcome these impediments and build a large program in JavaScript; many multiple-million-line JavaScript programs now exist. But the large teams who build those programs use tools and have discipline to overcome the impediments that JavaScript throws in your way:

  • They write test cases for every identifier ever used in the program. In a world where misspellings are silently ignored, this is necessary. This is a cost.
  • They write code in type-checked languages and compile that to JavaScript, such as TypeScript.
  • They use frameworks that encourage programming in a style more amenable to analysis, more amenable to modularization, and less likely to produce common errors.
  • They have good discipline about naming conventions, about division of responsibilities, about what the public surface of a given object is, and so on. Again, this is a cost; those tasks would be performed by a compiler in a typical statically-typed language.

In conclusion, it is not merely the dynamic nature of typing that increases the cost of maintaining a large codebase. That alone does increase costs, but that is far from the whole story. I could design you a language that was dynamically typed but also had namespaces, modules, inheritance, libraries, private members, and so on -- in fact, C# 4 is such a language -- and such a language would be both dynamic and highly suited for programming in the large.

Rather it is also everything else that is frequently missing from a dynamic language that increases costs in a large codebase. Dynamic languages which also include facilities for good testing, for modularization, reuse, encapsulation, and so on, can indeed decrease costs when programming in the large, but many frequently-used dynamic languages do not have these facilities built in. Someone has to build them, and that adds cost.

samthebrand
Eric Lippert
  • Eric, can you provide details on where the strong correlations you mention come from? – Thiago Silva Dec 18 '13 at 04:42
  • @ThiagoSilva: Languages are purpose-built. If you are building a language for programming in the large, you are highly likely to add all the features that make programming in the large cheaper, which then entails a lot of "ceremony" and restrictions on what you can write. If you are building a language for, say, making the monkey dance when you mouse over it, then you want a one-line program to be one line. Dynamic typing is natural for such a scenario because it gives a lot of flexibility to the developer. – Eric Lippert Dec 18 '13 at 15:53
  • I agree with your points about where cost increases for dynamic langs. I can see that there *might* be a correlation between some dynamic langs and the lack of corresponding reliable tools available to tame correctness. But those things by themselves do not indicate in any way that dynamic-lang-based systems are more expensive / hard-to-maintain than "static" counterparts (what about the points where "static" systems' cost/complexity increases?). "Strong correlation" is a strong statement so I was looking for data. The only related study I know of is this: http://goo.gl/FnszYZ – Thiago Silva Dec 18 '13 at 19:15
  • It would be interesting to compare this with dynamic languages other than JavaScript, where all of the sticking points you mentioned are far less critical and there's much better tool support. For example, I primarily develop in Python, where I have static analysis tools, a robust module system, a culture of unit testing, etc., and the language has been adding features like annotation support to improve this. There's an interesting discussion in how different languages have adopted features to encourage that kind of practice, irrespective of the static/dynamic divide. – Chris Adams Jan 27 '14 at 18:27
  • One minor correction. You stated that "every property of every object is yielded up to the for-in construct." This is not true in ECMAScript 5, which adds the concept of property enumerability. For instance, [`Object.defineProperty`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty#Description) will allow you to create a non-enumerable property which is not yielded to for-in. – Joshua Clanton Jan 27 '14 at 19:27
  • @JoshuaClanton And anyway, you can easily get encapsulation using closures: https://gist.github.com/briangordon/8656177 – Rag Jan 27 '14 at 20:05
  • @JoshuaClanton: Thanks for that clarification; I have not been keeping up with the changes in the specification since I stopped being on the committee in 2001. – Eric Lippert Jan 27 '14 at 20:38
  • Hey @EricLippert, great answer, but I was wondering if you could clarify what you mean by "weak" in "...prototype inheritance -- is both weak..."? I think it is a fairly widely held view since the introduction of self even that prototype based inheritance is more expressive than class based. In particular [Delegation is inheritance](http://dl.acm.org/citation.cfm?id=38820) suggests maybe there is more to it than only an ontology (the "deep hierarchies" that don't come out-of-the-box)? – kieran Jan 27 '14 at 22:38
  • "a language that was dynamically typed but also had namespaces, modules, inheritance, libraries, private members, and so on . . . such a language would be both dynamic and highly suited for programming in the large". Python & Ruby? – B Robster Jan 27 '14 at 22:45
  • @kieran: "weak" is a bad choice of words on my part, since it implies a criticism without stating it, or even describing what would be "strong". Prototype inheritance is I think *conceptually* quite reasonable, but I think the way it has been implemented in JS is confusing. – Eric Lippert Jan 27 '14 at 23:10
  • @kieran: As an example of what I mean, consider `function Reptile(){ } Reptile.prototype = new Object(); var lizard = new Reptile(); var b = lizard.constructor == Reptile;` Even sophisticated JS developers would be forgiven for assuming that `b` is `true`, but thanks to the crazy way JS does prototypes, that's `false`. – Eric Lippert Jan 27 '14 at 23:12
  • While I generally agree with your opinions, you're way off here. Modern JavaScript is _very very_ different from what Microsoft did in 1996-2001. You have module systems (AMD, CommonJS), you have encapsulation by convention like in Python (or closures, but I don't find that necessary), there are ways to annotate storage on variables (by using getters/setters for example) and the inheritance system is a lot better understood than it was 13 years ago. It's trivial to build strong robust applications in JavaScript today. Your example should read "_let's take JavaScript from 2001 for example_". – Benjamin Gruenbaum Jan 28 '14 at 08:22
  • Aside from JavaScript, I agree with the general sentiment of the answer (that is, "typing is not the issue but poor infrastructure is"). Only now are languages like JavaScript and Python getting things like proper DI containers and modern testing frameworks, and even more importantly, only now is the community adopting these concepts. Evolved design practices in the community are just as important as the language itself (if not more), and the adoption rates of best practices are certainly something that has been really improving in Python, Ruby and JavaScript these last few years. – Benjamin Gruenbaum Jan 28 '14 at 08:25
  • EEK. Even `new Object()` is bad javascript. And the rationale is... be sure to set `prototype.constructor` when you're providing the default one, lest one won't be set. – John Dvorak Jan 28 '14 at 08:29
  • Incredibly well explained. What was described explains well how to compensate for added complexity in dynamic languages. – Will Buck Jan 28 '14 at 11:10
  • Also, I just want to add that hearing an uninformed answer from someone usually as knowledgeable as you is very surprising and somewhat disappointing. There are plenty of problems with JS but you simply listed ones that do not exist today. – Benjamin Gruenbaum Jan 28 '14 at 13:39
  • @BenjaminGruenbaum: Your criticisms are warranted; however, I would suggest to you that a close look at many large modern JS codebases finds many examples of the sorts of problems I cite; just because disciplines exist does not imply that everyone knows about them and uses them. And you might also be surprised at the number of times a JS loop contains a `braek;` or `cotninue;` statement -- perfectly legal! Doesn't do what was intended. Still got checked in. – Eric Lippert Jan 28 '14 at 14:57
  • @EricLippert Please name some such codebases. JavaScript codebases I've worked with would have caught that error at three different steps (before check-in). First in the IDE, second with static analysis commit hooks (jshint for example) and third in the CI server. JavaScript has not changed much, but the culture certainly has. People write poor code in any language when they do not understand the underlying issues and actual challenges (it's also a repeating motif in your blog, I'm a fan). How many C# code bases abuse `GC.Collect()`? Or casting? It is mainly education that promotes good code. – Benjamin Gruenbaum Jan 28 '14 at 15:18
  • Benjamin, if you are talking about a mature development team that uses an IDE, static analysis tools and a CI server, you can definitely work around the limitations of a dynamically typed language. But in a world where there are teams who haven't moved to DVCS and CI server tools, having the language enforce the rules while you are writing code is tremendously useful. And jslint is useful, but it can NEVER be as powerful as a static analysis tool targeting a statically typed language, simply because there is not enough type information to analyze. – SolutionYogi Jan 28 '14 at 16:45
  • And you wanted an example; check this commit: https://github.com/emberjs/ember.js/commit/0324551b3b09ce8bd6eb738f07a9622d730fd0bc The developer forgot to follow a certain rule and had to go back and fix it. In a statically typed language, you can enforce a certain set of rules so that a developer never 'forgets'. Mind you, I am not trying to knock this developer. Developers are human and sometimes we will 'forget' to do certain things. But if you have a huge set of rules and a decent-sized team, it's simply better to have a compiler/static analyzer enforce those rules. – SolutionYogi Jan 28 '14 at 16:49
  • @SolutionYogi so your point is that a _really bad_ dev team benefits from prohibitive defaults? Can't disagree with that; also, I don't care much about bad teams. My point was that these issues are an _education problem_ and not a language problem. JSHint is a _very basic_ static analysis tool (there are much better ones available, like Closure Compiler; jshint was given as an example for `braek` or `conitnue`). In your example the developer forgot something in a configuration object (like a `Dictionary` in C#) - I fail to see how a static type check would have helped (a test would). – Benjamin Gruenbaum Jan 28 '14 at 17:06
  • @BenjaminGruenbaum It depends on your definition of 'prohibitive defaults'. At the end of the day, the purpose of any type system is to let you write a set of rules/constraints to solve your business problem. You are suggesting that unit testing and tools like static analysis and CI should be used to enforce these rules. I am suggesting that one should use every available tool at their disposal (including a static type system). I personally find that a static type system provides me with more tools to design/enforce rules. Yes, I do give up some flexibility, but the trade-off is worth it for me and my team. – SolutionYogi Jan 28 '14 at 17:12
  • I cannot give you a direct C# equivalent for the code I linked, but if I expected all 'Meta' classes to have a property called 'proto', I would design an interface called 'IMetaProvider' and let the compiler ensure that every implementation provides a definition for 'proto'. – SolutionYogi Jan 28 '14 at 17:13
  • Yes, but the code above in JavaScript is the equivalent of forgetting a field in `IMetaProvider`, something the typecheck would not have found for you. My main issue with these type systems is that they do not _really_ enforce behavior, and enforcing structure is _usually_ not that big of a deal. The thing is, while unit tests for behavior and CI are something you'd do _anyway_, having to slow down to specify types can be frustrating, and type inference is not nearly good enough (yet!). Anyway, I think we've derailed from the issue of JavaScript enough :) – Benjamin Gruenbaum Jan 28 '14 at 17:47
  • @EricLippert: With Javascript Harmony (pretty much ECMAScript vNext), we get block scoping for variables with `let` (instead of `var`'s function scope), modules (with encapsulation and the whole shebang), iterators, generators, arrow function syntax, destructuring assignment, and other goodies. With that, and with "use strict" semantics (which affect eval, undeclared variables, and other issues with the original JavaScript), and with full suites of unit testing tools, how do you feel about modern server-side JavaScript? Do you feel it's now easier to maintain large projects with? – configurator Jan 28 '14 at 22:03
  • @configurator: Since that was the intention of the design committee in adding almost all of those features, I would certainly hope so! – Eric Lippert Jan 28 '14 at 22:55
  • @BenjaminGruenbaum I completely agree that the type system can only capture structure and not the behavior (for now). We disagree on how much value one can get by enforced structure. Also, I feel that static type system tools will be getting smarter and help us enforce behavior as well. E.g. look at Code Contracts and Pex. Please look at this: http://www.pexforfun.com/ Pex would not be possible without a static type system. – SolutionYogi Jan 28 '14 at 23:38
  • Still surprised that this discussion still rages on. Nothing wrong with typing. It's just that programmers suck at it. "Proper" dynamic languages are *not* untyped, but just smarter doing it (that is, the runtime environment has more information at its disposal). Optimising code is much easier with dynamic languages as compiler writers will confirm. And as to the readability … well that is an argument that confuses me since my experience is the opposite, having worked in both C-syntax languages and Smalltalk. – robject Feb 11 '14 at 06:07
  • @BenjaminGruenbaum PHP devs love writing breokn code https://github.com/search?l=php&q=braek%3B&ref=cmdform&type=Code – Ark-kun Apr 07 '14 at 17:49
  • Clearly a certain number of the posters here are still thinking of "static typing" as equivalent to "weak manifest typing" like in C or Java... There are much better languages now, like Haskell, Scala, OCaml and so on (and their more recent descendants), that provide the programmer with a very strong type system, one that allows enforcing not only structure but some behavior too, while minimizing the burden on the programmer of specifying types by inferring most if not all of them (though specifying top-level type signatures is good documentation as well as good practice). – Jedai Jun 28 '14 at 16:40
  • C# is evolving in this direction but is not quite there yet. (there are other advantages to using C# though...) – Jedai Jun 28 '14 at 16:41
  • I think this post left out a major benefit of static typing: the ability to refactor. Try renaming a variable in a million-line JS application, or finding all references of a variable. Without static analysis your IDE cannot possibly do an effective job with such an operation. – MgSam Jun 29 '14 at 13:39
  • I am always surprised how often typos are brought up in these discussions about static typing. Typos are not a practical problem: a typo means your code has **never** properly run, and no one writes code and then doesn't run its code path at least once. Renaming and refactoring, on the other hand, are for me by far the biggest argument for static typing. – Dirk Boer Nov 16 '15 at 08:39
  • This is an insightful post. One point you mention is reuse, and statically typed languages often have rigid type specifications. If you have a Java function that takes in a List, iterates over it, and prints out each entry, that function may also work for a Set, but unfortunately you've specified that it takes a List, so the compiler won't let you pass a Set. Dynamically typed languages don't have type specifications (generally), so you can't over-specify the type and rule out opportunities for reuse. – Kartick Vaddadi Apr 23 '16 at 11:11
  • @VaddadiKartick: On the other hand, suppose that tomorrow someone wants to modify the method in a way that relies on it being not just iterable, but actually a List? In Java, if it's declared as taking a List, then you know you can safely assume that its argument is actually a List. In a dynamically typed language, you can't easily tell if such a change is safe. (In other words: *interface contracts* are beneficial for programming in the large, and static types are frequently a valuable component of such contracts.) – ruakh May 10 '16 at 05:17
  • I agree with you, that dynamically typed languages don't handle this case in an ideal manner, and that interface contracts are beneficial. But in statically typed languages like Java, C++ or Objective-C, contracts are often over-specified. In the above example, the function that prints a List doesn't NEED it to be a List, just that it happened to be declared that way. Ideally, you'd have type inference — the compiler should enforce that the object you're passing as an argument has the methods you're invoking and no others. Then you can reuse the above function to print a Set. – Kartick Vaddadi May 10 '16 at 06:11
  • @Ark-kun so do JS devs https://github.com/search?l=javascript&q=braek%3B&ref=cmdform&type=Code – Jus12 May 31 '16 at 11:07
  • @DirkBoer By "no one" I assume you mean "I don't". Just see the listed links where such code has been checked in. :) – Jus12 Jun 02 '16 at 03:04
  • As a maintainer I would also add that if in a dynamic language I see a variable named `sp` as a parameter to a function, say `function myfunction(sp)`, then I have **no idea** what this variable is. In a static language, on the other hand, I would see `function myfunction(sp: SocketProvider)` and could then click SocketProvider and see its declaration! Life is much easier this way for a code maintainer! – Jas Jun 16 '16 at 08:38

Because they deliberately abandon some of the tools that programming languages offer to assert things you know about the code.

The best-known and most obvious example of this is strict/strong/mandatory/explicit typing (note that the terminology is very much disputed, but most people agree that some languages are stricter than others). When used well, it acts as a permanent assertion about the kind of values you're expecting to occur in a particular place, which can make reasoning about the possible behaviour of a line, routine or module easier, simply because there are fewer possible cases. If you're only ever going to treat someone's name as a string, many coders are therefore willing to type a declaration, to not make exceptions to this rule, and to accept the occasional compilation error when they have made a slip of the finger (forgot quotes) or of the brain (forgot that this rating is not supposed to allow fractions).

Others think that this restricts their creative expressivity, slows down development and introduces work that the compiler should do (e.g. via type inference) or that isn't necessary at all (they'll just remember to stick to strings). One problem with this is that people are quite bad at predicting what kind of errors they will make: almost everybody overestimates their own ability, often grossly. More insidiously, the problem becomes gradually worse the larger your code base - most people can, in fact, remember that the customer name is a string, but add 78 other entities to the mix, all with IDs, some with names and some with serial 'numbers', some of which really are numeric (require computation to be done on them) but others of which require letters to be stored, and after a while it can become pretty hard to remember whether the field you're reading is actually guaranteed to evaluate to an int or not.
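A tiny invented sketch of that drift, with one entity whose identifier really is numeric and one whose 'number' is a string:

```javascript
// Nothing in the source records which "number" is which; the reader
// must simply remember, and arithmetic on the wrong one goes quietly
// astray instead of failing.
const customer = { id: 1042, name: "Ada" };
const device = { serial: "007-A", name: "edge router" };

const nextCustomerId = customer.id + 1;  // 1043, as intended
const nextSerial = device.serial + 1;    // "007-A1": silent concatenation
```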

Therefore, many decisions that suit a quick prototype project well work much less well in a huge production project - often without anyone noticing the tipping point. This is why there is no one-size-fits-all language, paradigm or framework (and why arguments about which language is better are silly).

Kilian Foth
  • Both restrictive languages and non-restrictive languages have their high and low points, I suppose. However, in my experience, having more degrees of freedom is like programming in assembly in that you can do everything, but it is difficult to do any complex programs. Nice answer. – Neil Dec 17 '13 at 11:24
  • "This is why... arguing which language is better are silly" -- amen! – Brian S Dec 17 '13 at 15:15
  • The word you're looking for is *static* typing. Statically typed languages can be strict or lax, they can be strong or weak, they can be mandatory or optional, and they can be explicit or implicit, but what they all have in common is that *types are facts that can be deduced from the text of the program without actually running it*. "Dynamic" languages are so called because facts about the program can sometimes not be known until the program is actually running. – Eric Lippert Dec 17 '13 at 15:49
  • 3
    I'd like to add that there is a hell of a difference between the types in Java (which allow easy and dangerous cast-to-Object violations of the type system) and Haskell (which requires on average one explicit type signature per five or six functions, the rest is inferred, but will spank you if you try anything funny). – Kile Kasmir Asmussen Dec 17 '13 at 16:10
  • 7
    "Others think that this restricts their creative expressivity, slows down development ..." -- I would add "introduces artificial complexity into the design" to the list. Two strong examples of different kinds of complexity that a "static typed" language can force you to cope with is (a) monads as present in Haskell; and (b) Peter Norvig showing that 16 of 23 patterns of the design patterns of that popular book are "invisible" or simpler in "dynamic typed" languages whereas in other languages they are mostly bloat working around static checking limitations: http://norvig.com/design-patterns/ – Thiago Silva Dec 17 '13 at 16:24
  • @ThiagoSilva Granted, monads in haskell are a tough concept to grok; but for a softer language like Ocaml, the comment still holds. And you do not have to make your own monads in Haskell, you can use them without much knowledge of their inner workings (same with signatures/structures/functors in Ocaml). Also, every single design pattern is meaningless in functional languages in general. – Kile Kasmir Asmussen Dec 20 '13 at 17:53
  • 3
    @ThiagoSilva: Monads are not an example of complexity *per se*. Many people find them hard to learn, but as abstractions go they are quite simple--the difficulty is that they are also quite abstract. In fact, monads often *simplify* a design by making it **more explicit**: they just highlight things which are magical and unacknowledged in other languages. And Norvig's design-pattern article is not relevant to statically typed functional languages at all; it's not a comment about static typing in general but rather about Java-style type systems (which we can all agree are a mess). – Tikhon Jelvis Jan 27 '14 at 21:51
  • @TikhonJelvis (and KarlDamgaardAsmussen), I regret putting monads as an example, as I'm not experienced enough to evaluate it in depth. Now, while norvig's has little to say to the "better languages", it shows an extreme practice of what I think is still present in statically checked languages of today: a tension that forces one to add artificialities to the design and/or face maintenance nightmares. ADT, for instance, has a strong foundation and present us with some open questions (and while they're open, we have to work around them). So does statically checking exceptions, etc. – Thiago Silva Jan 28 '14 at 15:49
  • 1
    Only someone that really doesn't know what he's talking about would use monads as an example of complexity forced on static typing languages... The fact that they were only there in Haskell but have since been introduced (under miscellaneous names) in other languages should clearly show you that monads are a powerful tool that helps to tame complexity in programs. Monads are perfectly appropriate to be used in dynamic languages by the way, though those won't be able to enforce their use for certain tasks thus losing a part of their power. – Jedai Jun 28 '14 at 16:48
  • I would not agree that a check which prevents your banking application from crashing because of an attempt to assign a string to an int is an "artificial complexity" – DarkWanderer Jul 01 '14 at 07:17
  • "Monad" isn't even a "type" that you create and then use in different ways. A type can have *monadic structure*. In haskell you can "prove" this to the compiler by implementing a type class instance, but the structure is still there even if you won't write out the word. Saying that a language "doesn't have monads" just because it doesn't have special syntax/compiler support is like saying C# doesn't have numbers because there's no `INumber` interface that all numeric types inherit from. C# has numbers because there are types that behave like numbers, not because they are named after integers. – sara Jun 17 '16 at 07:40
30

Why don't you ask the author of that presentation? It's his claim, after all; he should back it up.

There are plenty of very large, very complex, very successful projects developed in dynamic languages. And there are plenty of spectacular failures of projects written in statically typed languages (e.g. the FBI Virtual Case File).

It is probably true that projects written in dynamic languages tend to be smaller than projects written in statically typed languages, but that is a red herring: most projects written in statically typed languages tend to be written in languages like Java or C, which are not very expressive, whereas most projects written in dynamic languages tend to be written in very expressive languages like Scheme, Common Lisp, Clojure, Smalltalk, Ruby, and Python.

So, the reason why those projects are smaller is not that you can't write large projects in dynamic languages; it's that you don't need to write large projects in expressive languages … it simply takes far fewer lines of code, and much less complexity, to do the same thing in a more expressive language.

Projects written in Haskell, for example, also tend to be pretty small. Not because you can't write large systems in Haskell, but simply because you don't have to.

But let's at least take a look at what a static type system has to offer for writing large systems: a type system prevents you from writing certain programs. That's its job. You write a program, present it to the type checker, and the type checker says: "No, you can't write that, sorry." And in particular, type systems are designed in such a way that the type checker prevents you from writing "bad" programs. Programs that have errors. So, in that sense, yes, a static type system helps in developing large systems.

However, there is a problem: we have the Halting Problem, Rice's Theorem and many other incompleteness theorems which basically tell us one thing: it is impossible to write a type checker which can always determine whether a program is type-safe or not. There will always be an infinite number of programs for which the type checker can't decide whether they are type-safe or not. And there is only one sane thing for the type checker to do: reject those programs as not type-safe. An infinite number of those programs will, in fact, not be type-safe. However, an infinite number of those programs will also be type-safe! And some of those will even be useful! So, the type checker has just prevented us from writing a useful, type-safe program, just because it cannot prove its type-safety.

IOW: the purpose of a type system is to limit expressiveness.

But, what if one of those rejected programs actually solves our problem in an elegant, easy-to-maintain manner? Then we cannot write that program.
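A tiny illustration of such a rejected-but-safe program, sketched in Python (a mypy-style checker is assumed here, and `parse_id` is an invented example): it runs correctly, yet a typical checker, which can only infer the return type as "int or str", would reject both of the final uses because it cannot prove which case applies at each call site.

```python
def parse_id(raw: str):
    # Returns an int for purely numeric IDs and the raw string otherwise.
    # Every call site below handles its result correctly, so the program
    # is safe at runtime -- yet a checker sees the return type as
    # "int or str" and rejects the arithmetic and the string call alike.
    try:
        return int(raw)
    except ValueError:
        return raw

numeric = parse_id("42")
alpha = parse_id("A-7")

print(numeric + 1)            # safe: "42" always parses to an int
print(alpha.startswith("A"))  # safe: "A-7" never parses to an int
```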

I'd say it's basically a give-and-take: statically typed languages restrict you from writing bad programs at the expense of occasionally also preventing you from writing good programs. Dynamic languages don't prevent you from writing good programs at the expense of also not preventing you from writing bad programs.

The more important aspect for maintainability of large systems is expressiveness, simply because you don't need to create as large and complex a system in the first place.

Jörg W Mittag
  • 101,921
  • 24
  • 218
  • 318
  • 3
    you should also include Scala, which has awesome type inference. – Jus12 Dec 17 '13 at 12:38
  • 3
    An interesting way of looking at it. Personally, I find that for large projects, type-checkers can be a life saver. They implicitly provide an extremely basic kind of unit-testing (if we can call it that - it's simply testing whether the structures agree or not). This is the kind of testing I have no time to write manually, but needs to happen when the project grows beyond what you can easily hold in your head. I suspect this happens fairly rapidly for most systems, regardless of expressiveness; how is this problem typically solved in the dynamic world? – Daniel B Dec 17 '13 at 12:47
  • 5
    Just to add to the above comment, I mean: the nature of many (medium / large) problems is such that you will need a couple of hundred entities to model it, expressive language or not. An expressive one might cut down the code by a factor of 10x or more, but it will still not be manageable without additional tooling; I'm wondering what this tooling is. – Daniel B Dec 17 '13 at 12:49
  • 1
    @DanielB: I think that depends on the power of your type system. You have languages like Java that make a joke out of their types by having things like `compareTo` return an int. That's a function with three possible return values, and we use a data type that can represent 2^32? Insanity. Consider a language like Haskell, where the type system is *in and of itself* an expressive system, and you absolutely can write functions with a pretty high level of semantic checking from the type system. – Phoshi Dec 17 '13 at 13:29
  • @Phoshi agreed, but even that joke of a type system catches many menial issues which happen day to day (especially when working with large, boring applications). I'm genuinely curious what exactly takes the place of a type system for dynamic languages, and I'm guessing Jörg has some pointers or counter-arguments. – Daniel B Dec 17 '13 at 13:36
  • 5
    @DanielB: Depends on the language. Remember, dynamic != weak. In python, for example, "1" != 1, and if you try to use them interchangeably you'll get type errors at runtime. Mostly the closest you get is duck typing, though--if you have the wrong type and call a method, runtime exception. It's not anywhere near as robust as a proper static type system, but it's *not* untyped. – Phoshi Dec 17 '13 at 14:02
  • @Phoshi I understand; to elaborate on my question, let me rephrase it like this: if we assume that I want to neither exhaustively regression test a system by hand after every change, nor do I want to write exhaustive unit tests asserting the exact spelling of every field / property, is there any way to get a level of assurance similar to what a static compiler typically gives? Given that modern IDEs are getting pretty good at guessing what's going on in dynamic languages, shouldn't there be some automated assistance (even if it's a "looser" check than for statically typed languages)? – Daniel B Dec 19 '13 at 05:59
  • 1
    @DanielB: What you want isn't a function of static typing per-se, but rather the static analyzer in the compiler. For compiled strongly typed languages this is required, but for dynamic strongly typed languages you could theoretically skip it. Nobody would, though. It's *harder* to static analyse a dynamic language, which is why they're mostly just subject to dynamic analysis, however there's no reason why this must be so. Check out [pylint](https://bitbucket.org/logilab/pylint/), for example, which professes to catch many typing errors (Though obviously not all, as it's a static analyser) – Phoshi Dec 19 '13 at 09:23
  • Thanks, @Phoshi, that's exactly what I was after. In retrospect, it makes sense that the *lints would take on this type of responsibility. – Daniel B Dec 19 '13 at 09:35
  • You present type systems as simplistic things that provide a yes/no answer -- does this program pass? A type system represents what the compiler knows about your program. In modern IDE environments, it also represents what the IDE knows about your program. Programming in the large is greatly aided by providing your tools with comprehensive models. To me your response is excessively philosophical. – Ross Judson Jan 27 '14 at 18:39
  • 1
    A type system can actually make a language *more* expressive. Consider Haskell typeclasses--they enable really useful things like overloaded literals that are difficult if not impossible to replicate in other languages. – Tikhon Jelvis Jan 27 '14 at 21:55
  • @TikhonJelvis: Ioke has overloaded literals, and is one of the most dynamic languages ever created. A type system rejects illegal programs. If the type system didn't exist, those programs would be legal. Ergo, without the type system you can write programs that you can't write with the type system, IOW you can express more programs without the type system. Now, there are things you can express in the type system itself, but then you just shift the problem around: in Haskell (with extensions) or Scala, you cannot even guarantee that compilation will eventually terminate. – Jörg W Mittag Jan 28 '14 at 16:02
  • 1
    Note: I love Scala, I love Haskell, I love powerful expressive flexible static type systems. I just don't buy the idea that you cannot program without them. – Jörg W Mittag Jan 28 '14 at 16:03
  • 1
    @JörgWMittag: It doesn't seem to have overloaded literals the way Haskell does: it just has a few nested types of numbers. For example, you can't add your own numeric types, especially if they don't fit into the existing hierarchy. I still don't see any way to do this without either having explicit annotations or using typeclasses and inference. Besides numbers, consider things like `Read` which also relies on typeclases. My real point is that type systems do not **just** reject illegal programs. With type inference, they can make programs that would be underspecified without types work. – Tikhon Jelvis Jan 28 '14 at 22:24
  • 3
    I'm the author of the Play Framework talk mentioned in the question. I was going to write a reply, but [Eric Lippert's answer](http://programmers.stackexchange.com/a/221658/5939) says it better than I could have, so I upvoted it instead and recommend everyone reads it. Also, remember that a "large codebase" can be "large" across multiple dimensions, including lines of code, how many people work on it simultaneously, and how long it has been around. All of these factors increase "code rot"; static typing is not a magic bullet, but rather one *tool* to decrease code rot. – Yevgeniy Brikman Feb 09 '14 at 21:06
  • @JörgWMittag >"Ergo, without the type system you can write programs that you can't write with the type system" http://en.wikipedia.org/wiki/Turing_completeness – Ark-kun Apr 07 '14 at 17:57
  • @JörgWMittag >"Theorems which basically tell us one thing: it is impossible to write a type checker which can always determine whether a program is type-safe or not" I don't see how those theorems prove that every type checker may fail. – Ark-kun Apr 07 '14 at 18:01
  • @JörgWMittag >"So, the reason why those projects are smaller is not because ... it's because ...". No proof given. – Ark-kun Apr 07 '14 at 18:03
  • @JörgWMittag BTW, can you spot the problem in this code file? http://svn.code.sf.net/p/tikiwiki/code/trunk/lib/ical/iCal/ValueDataType.php – Ark-kun Apr 07 '14 at 18:06
  • I disagree that the purpose of the type system is to limit expressiveness. Strictly and explicitly typed languages/systems are much more verbose than dynamic languages. Haskell is a good example of a language that is strongly typed and inferred. It is still statically typed, it's just inferred. Ruby is known to allow you to do whatever the **** you want, but that's like writing poems in gibberish which limits comprehension... and writing code that a computer can read is easy, but writing code that can be understood by a human is hard. – Aleksandr Panzin Jul 01 '14 at 10:16
  • @Ark-kun Out of curiosity, can you elaborate on the problem with the code you linked to? – Jus12 Sep 25 '14 at 04:15
  • @Jus12 It's a bit embarrassing. I guess, I've ultimately proven my point. Even I can no longer find it. As far as I remember, it was some typo in variable name (like "valeu") or constant (like "fasle"). In static languages this would be caught by the compiler. I think I searched GitHub/Internet for some particular typo and chose that particular code file because TikiWiki is a popular piece of software. https://github.com/search?l=php&q=fasle&ref=searchresults&type=Code&utf8=%E2%9C%93 If you ever find it, let me know =) – Ark-kun Sep 26 '14 at 14:29
12

Explicit static types are a universally understood and guaranteed-correct form of documentation that is not available in dynamic languages. If this is not compensated for, your dynamic code will simply be harder to read and understand.
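In dynamic languages with optional annotations, part of this documentation value can be recovered; a minimal sketch in Python (the function and its name are invented for illustration):

```python
from datetime import date

def days_until(deadline: date) -> int:
    # The signature alone documents the contract: a date goes in, an
    # int comes out. Unlike a comment, a checker can verify it stays true.
    return (deadline - date.today()).days

# A reader (or an IDE) sees from the type alone that passing the *string*
# "2024-01-01" instead of a date object would be a mistake.
print(days_until(date.today()))
```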

MikeFHay
  • 506
  • 2
  • 8
  • What kind of compensation do you mean? My experience, having worked in C syntax languages and Smalltalk, is exactly the opposite. – robject Feb 11 '14 at 05:59
  • 1
    You can add type checking for dynamic languages. See, for instance, how this is done for python: https://pypi.python.org/pypi/optypecheck – Carlo Pires Jun 30 '14 at 14:44
  • Python has inline unit tests called doc tests that are automatically tested. Doc tests go further in providing documentation than types do as they give you an example of code usage. – aoeu256 May 26 '18 at 14:54
  • Modern ides for Python(Pydev,Pycharm) can use type inference to tell you the type of things without you having to type it explicitly. There are also ways of logging previous calls to a function/methods although its not mainstream. If you set up a breakpoint in a function you can access the locals() in the Python REPL(Pydev & Pycharm connect the REPL to the current context), and developing your application while its still running not only allows you to access all the types, but the values. – aoeu256 May 26 '18 at 15:39
6

Consider a large codebase including database bindings and a rich testsuite and let me highlight a few advantages of static languages over dynamic languages. (Some examples may be idiosyncratic and not apply to any static or dynamic language.)

The general idea—as others have pointed out—is that the type system is a “dimension” of your program which exposes some information to automated tools processing your program (compiler, code-analysis tools, etc.). With a dynamic language, this information is basically stripped away and therefore unavailable. With a static language, this information can be used to help write correct programs.

When you fix a bug, you start with a program that looks good to your compiler but has faulty logic. Your edit fixes the logic locally (e.g. within a class) but may break it at other places (e.g. in classes collaborating with that one). Since a program written in a static language exposes much more information to the compiler¹ than a program written in a dynamic language, the compiler will help you locate those other places far better than a tool for a dynamic language can: a local modification breaks the type correctness of the program elsewhere, forcing you to restore type correctness globally before you even have a chance to run the program again.

A static language enforces the type-correctness of a program, and you can assume that each type error you encounter when working on the program would correspond to a runtime failure in a hypothetical translation of the program into a dynamic language; thus the former has fewer bugs than the latter. As a consequence, it requires fewer coverage tests, fewer unit tests and fewer bugfixes; in a word, it is easier to maintain.

Of course, there is a tradeoff: while it is possible to expose a lot of information in the type system, and thus gain the chance to write reliable programs, it might be difficult to combine this with a flexible API.

Here are a few examples of information that one can encode in the type system:

Const correctness: the compiler can guarantee that a value is passed “read-only” to a procedure.²

Database schema: the compiler can guarantee that the code binding the program to a database corresponds to the database definition. This is very useful when that definition changes. (Maintenance!)

System resources: the compiler can guarantee that code using a system resource only does so when the resource is in the correct state. For instance, it is possible to encode the open or closed state of a file in the type system.
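The file example can be sketched as a “typestate” encoding, here in Python with invented class names (not from any real library): each resource state gets its own type, and operations invalid in a state simply do not exist on it.

```python
class OpenFile:
    """A handle the type system knows to be open."""
    def __init__(self, path: str, contents: str):
        self.path, self.contents = path, contents

    def read(self) -> str:
        return self.contents

    def close(self) -> "ClosedFile":
        return ClosedFile(self.path)

class ClosedFile:
    """A closed handle: there is deliberately no read() method here,
    so "read after close" is a type error rather than a runtime surprise."""
    def __init__(self, path: str):
        self.path = path

    def open(self, contents: str) -> OpenFile:
        return OpenFile(self.path, contents)

f = ClosedFile("log.txt").open("hello")
print(f.read())
closed = f.close()
# closed.read()  # rejected statically: ClosedFile has no read() method
```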

¹ It is not useful to distinguish between a compiler and an interpreter here, if such a difference exists.

luser droog
  • 419
  • 4
  • 12
user40989
  • 2,860
  • 18
  • 35
  • 1
    “all type errors you encounter when working on the program would correspond to a runtime failure”: this isn't true (which is a big argument of dynamic typing proponents). However, if there would be no runtime failure, this is for a nonobvious reason that needs to be documented. Rather than document in the form of a comment, you might as well document it in a way the compiler understands *and checks*. I would say that all type errors you encounter when working on the program would correspond to a runtime failure or maintenance nightmare. – Gilles 'SO- stop being evil' Dec 17 '13 at 16:07
4

Because static typing enables better tooling, which improves the productivity of a programmer when he tries to understand, refactor or extend a large existing code base.

For instance, in a large program, we'll likely have several methods with the same name. We might have one add method that adds an element to a set, another that adds two integers, and another that deposits money into a bank account. In small programs, such name collisions are unlikely to occur. In large programs worked on by several people, they occur naturally.

In a statically typed language, such methods can be distinguished by the types they operate on. In particular, a development environment can discover, for each method invocation expression, which method is being invoked, enabling it to show a tooltip with that method's documentation, to find all call sites of a method, or to support refactorings (such as inlining a method, renaming it, or modifying its parameter list).
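A sketch of the name-collision point, in Python with annotations (the classes and the `deposit` helper are invented for illustration):

```python
class IntSet:
    def __init__(self) -> None:
        self.items: set = set()

    def add(self, item: int) -> None:
        self.items.add(item)

class Account:
    def __init__(self) -> None:
        self.balance = 0

    def add(self, amount: int) -> int:  # same name, different meaning
        self.balance += amount
        return self.balance

def deposit(acct: Account, amount: int) -> int:
    # Because `acct` is declared to be an Account, tooling can resolve
    # this `add` to Account.add -- and show its documentation, find its
    # call sites, or rename it -- without confusing it with IntSet.add.
    return acct.add(amount)

print(deposit(Account(), 50))
```

Without the annotation on `acct`, a tool can only guess which of the two `add` methods is meant.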

meriton
  • 4,022
  • 17
  • 18
  • 2
    Making everything global? That's not a necessity of dynamic languages, I'd call that a badly-written codebase... – Izkata Dec 17 '13 at 16:42
  • 2
    Perhaps I should have mentioned I am talking about object oriented programming languages with dynamic dispatch. Such methods are not global, but figuring out which implementation is going to be called requires knowledge about the type of the receiver. – meriton Dec 17 '13 at 16:47
  • 2
  • Showing tooltips on mouse-over was a standard feature of dynamic language IDEs, long before programmers in static languages had IDEs or even mice. Automated refactoring tools were invented in dynamic languages, heck, IDEs were invented there. Refactoring tools for dynamic languages still can do things that e.g. Eclipse, IDEA or Visual Studio can't, such as refactoring code that has already been deployed or refactoring code that hasn't been written yet. – Jörg W Mittag Dec 17 '13 at 17:27
  • 2
  • I didn't claim that tooltips, IDEs or mice were invented in statically typed languages. I only claim that in an object oriented language, a function's name is in general insufficient to identify the function, and hence tooling cannot know *which* function is being called, and display the *right* tooltip, or inline the *right* function, and so on - at least not without asking the user. – meriton Dec 17 '13 at 18:43
  • Modern IDEs for dynamic languages can use type inference to generate this information when the program is used like a static language program. Optional types can also help tip the IDE. In theory in a dynamic language you can log the arguments and return value of previous function call, and use this information for type inference. If you keep the program running stopped at a breakpoint, it can tell you not only the types but the values of all objects. Pydev & Pycharm lets the Python REPL access the local scope. – aoeu256 May 26 '18 at 16:00