
I'm trying to wrap my head around Scala, and one thing that keeps throwing me is the ordering of a variable/value declaration when specifying the type.

val a = 0

makes perfect sense. This looks pretty much like any other language.

val a: Int = 0

parses really weirdly in my head; it just seems nonsensical. Why is the type immediately to the left of the assignment operator? When I split this up in my head, I see "... Int = 0", which obviously doesn't make any sense.

Is there a logical reason behind this that I can refer to? Obviously, as I look at more Scala code, I will adjust to it, but I'm also curious why Martin Odersky chose to arrange it this way. It can't be just to stand out from other languages, where (as far as I know) the type identifier, if there is one, precedes the declaration.

Kilian Foth
Carcigenicate
  • I don't know much Scala, but `val a: int = 0` is valid Standard ML/OCaml/F#. So rather than standing out, it fits right in with other functional languages which have probably influenced Scala (e.g. pattern matching). – Doval Dec 04 '14 at 01:39
  • Indeed, and in almost every academic paper on types, the type annotation is after the symbol/identifier. Since Scala is largely academic in origin that likely contributes. – Telastyn Dec 04 '14 at 02:38
  • This is the syntax that is used by a huge number of programming languages, e.g. the entire Pascal family, both those designed by Niklaus Wirth himself (Pascal, Modula, Modula-2, Oberon, Oberon-2) and others (Modula-3, Turbo Pascal, Active Oberon, Delphi, Object Pascal). It is also the syntax used by almost all functional languages (ML, SML, Caml, OCaml, F#, Haskell, Miranda, Frege). Many imperative languages outside the Pascal family use it as well, e.g. Go, Visual Basic, VB.NET, TypeScript. Plus, it's also the notation used in math. Note that Odersky studied under Prof. Wirth. – Jörg W Mittag Dec 04 '14 at 02:56
  • The only other functional language I know is Haskell, and despite what Jörg says, it doesn't really use this arrangement. Haskell's explicit signatures resemble what Sebastian describes as ambiguous (`let a = 0 :: Int`), only it uses a double colon instead, and refers to the whole expression. – Carcigenicate Dec 04 '14 at 11:48
  • @Carcigenicate Isn't it convention in Haskell to write the type annotation in the line preceding the declaration? – Doval Dec 04 '14 at 12:39
  • @Doval Yes, but that doesn't result in a form similar to this. In that case the signature precedes the definition. The snippet I included in the comment above was to show the similarity between it and the "ambiguous" case in Sebastian's answer, and to contrast it with Jörg's comment. – Carcigenicate Dec 04 '14 at 12:55
  • You can see a kind of reasoning by Brian Beckman in his tutorial "Don't fear the Monad": https://www.youtube.com/watch?v=ZhuHCtR3xq8 at about 8:30 min – nawfal Oct 25 '15 at 01:38
  • @gnat How can this be a duplicate of that if I asked this more than a year before that one? – Carcigenicate Apr 24 '16 at 21:37
  • @Carcigenicate a duplicate's age doesn't matter, [as explained e.g. here](http://meta.stackexchange.com/a/147651/165773) – gnat Apr 24 '16 at 21:40

2 Answers


> stand out from other languages

No. As Jörg already commented, this form is used in many languages. Counted by the number of languages that use it, it is probably the most common form of variable declaration. Pascal and its relatives used it decades ago, and newer languages such as TypeScript, Go, Rust, and Scala use it today.

> type identifier, if there is one, precedes the declaration

The

type identifier [ = value ]

form of declaration in C was in some respects a big mistake. Its serious problem is that it makes the grammar of the language context-sensitive. Type names and object names look the same syntactically, so a declaration in this form cannot be recognized without knowing that the first identifier names a type. The compiler therefore can't build the syntax tree without consulting the table of already defined types. This causes problems for templates: the interpretation may depend on a template parameter, so the compiler can't yet know whether it is looking at a type.

In C++ this means you have to use the `typename` keyword in the ambiguous cases. Java and C# dodge this by not having typedefs, so you can't have related types, but that seriously limits the usefulness of their generics. And it still complicates the compiler anyway.

On the other hand with declarations in the form

keyword identifier [ : type ] [ = value ]

the identifier after `:` (and after some keywords like `new`) always names a type, and an identifier in any other position never does. The grammar is context-free, and everything is much simpler.
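As a rough illustration of that regularity in Scala, the sketch below uses the same `name: Type` shape for values, variables, method parameters, result types, and function literals. The object and member names (`ColonForm`, `count`, `label`, `scale`, `increment`) are made up for this example.

```scala
object ColonForm {
  // Immutable value: keyword, identifier, ": type", "= value".
  val count: Int = 0

  // Mutable variable: same shape, different keyword.
  var label: String = "start"

  // Method parameters and the result type follow the same "name: Type" rule.
  def scale(factor: Double, amount: Int): Double = factor * amount

  // Function literals annotate their parameters the same way.
  val increment: Int => Int = (n: Int) => n + 1
}
```

In every position, whatever follows the colon is read as a type, so the parser never has to look up whether an identifier names a type before it can recognize a declaration.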

It is also more regular when the type is optional: you just omit it. In the C form, you have to replace it with a special keyword, such as `auto` in C++ or `var` in C#.
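A small Scala sketch of the optional-type case, with made-up names (`OptionalType`, `inferred`, `annotated`, `widened`, `twice`): dropping the `: Type` part leaves an otherwise identical declaration, and the compiler infers the type.

```scala
object OptionalType {
  // No annotation: the compiler infers Int from the literal.
  val inferred = 0

  // The same declaration with ": Int" written out.
  val annotated: Int = 0

  // An annotation can also request a wider type than the inferred one.
  val widened: Long = 0

  // A method's result type can be omitted in the same way.
  def twice(n: Int) = n * 2
}
```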

Jan Hudec
  • This is a bit of a tangent, but another way in which C's declaration syntax is an unfortunate historical mistake is that C objects are *not* variables in the mathematical sense. A variable stands for an unknown *value* and once bound doesn't change. A C object is a reference to a block of memory that just happens to be implicitly dereferenced for you. Both the terminology and the use of the `=` symbol [are highly confusing to beginners](http://blog.codinghorror.com/separating-programming-sheep-from-non-programming-goats/) because it breaks the mental model they've built up in math classes. – Doval Dec 04 '14 at 15:36
  • @Doval: Well, in procedural programming "variable" always means a box that can contain a value and did so long before C. Only functional and logical programming commonly comes with variables in the mathematical sense. The terminology mismatch is a bit unfortunate, but it's the first thing you have to understand when learning programming independent of language. And the use of `:=` does not really make much difference. `<-` looks like a better symbol, but the only language I know that uses it is R (which might get it from S+, but I don't know that). – Jan Hudec Dec 04 '14 at 16:14
  • @JanHudec: Smalltalk used `←` initially (and `↑` for return). However, these characters only existed in the character sets and on the keyboards of Xerox's own workstations, they didn't exist anywhere else. When transferring Smalltalk source code to an ASCII-based system, those codepoints are interpreted as `_` (and I forgot the other one). I believe Squeak *still* accepts `_` for assignment, but the spec was changed to use `:=` for assignment and `^` for return. – Jörg W Mittag Sep 02 '15 at 12:37

You're thinking about it the wrong way. The type isn't immediately to the left of the assignment; it's immediately to the right of the declarator. This syntax has the advantage of being unambiguous, whereas, for example, `val a = 0 : Int` is ambiguous: does the type specifier refer to the literal, the declaration, or the entire statement? And if the initializer is more complicated than just a literal, it gets really confusing.
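To make the distinction concrete, here is a minimal Scala sketch with illustrative names (`AnnotationVsAscription`, `a` through `d`). A type written after the declared name annotates the declaration, while a type written after an expression is an ascription on that expression only; the parentheses are there to make the extent of each ascription explicit.

```scala
object AnnotationVsAscription {
  // Type annotation: ": Long" belongs to the declared name a.
  val a: Long = 0

  // Type ascription: ": Short" belongs to the expression 0, not to b.
  val b = (0: Short)

  // Both at once: the expression is a Short, the declared value is a Long.
  val c: Long = (0: Short)

  // In a larger initializer, the ascription covers only the parenthesized part.
  val d = (1: Long) + 2
}
```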

Sebastian Redl
  • Note that `val a: Long = 0: Short` is legal. It is a type *annotation* for `a` and a type *ascription* for `0`. It doesn't make much sense here, but it *is* legal. – Jörg W Mittag Sep 02 '15 at 12:33