Why are scientific programming languages so weird?

Question

It seems to me that programming languages meant for use in science and engineering are consistently weird compared to general-purpose languages. Some examples off the top of my head:

In Matlab, each function has to be placed in a separate file
In R, <- is the assignment operator, as opposed to = in almost every other language
Matlab, R, Julia and others are all 1-indexed
Matlab uses % for comments, and not the standard # or //

Of course, these languages all have several design features that actually make them easier to use for scientific applications, such as more natural matrix notation. Still, they all inexplicably make all these bizarre choices which don't make anything easier and could easily have been avoided if the language designers had just chosen to do what 99% of other languages do. Is the reason vendor lock-in? A lack of contact with the wider software development community? Something else?

I read this thread and didn't find the explanations satisfactory. Just because R was designed as a scientific language doesn't mean it had to completely ignore conventions and use <- instead of =.

Short answer: because they were made for scientists, not for programmers. — Bart van Ingen Schenau, Feb 21 '14 at 11:03
Short answer: Because every language you think is normal was influenced by a common ancestor, C. — Ross Patterson, Feb 21 '14 at 11:17
I think you'll struggle to find *any* conventions across languages. It depends on their heritage. — Robbie Dee, Feb 21 '14 at 11:19
Nothing of that is weird. It's just _different_. Because there is no particular reason to choose one syntax over the other except what the specific author of the specific language is used to. — Jan Hudec, Feb 21 '14 at 13:05
Your 99% is wrong. If you only know C and its derivatives you might think so, but well over 50% of non-C languages use something different for assignment, indexing and/or comments. — david.pfx, Feb 21 '14 at 13:59
Not even maths uses `=` for assignment because maths has no assignment. — phresnel, Feb 21 '14 at 14:56
There is no such a "convention" that "=" must be an assignment operator. By convention, it's an equality predicate and nothing else. — SK-logic, Feb 23 '14 at 16:03
Pascal and its derivatives do not use `=` as an assignment operator. Nor do they use `#` or `//` for comments. — Gort the Robot, Feb 23 '14 at 22:42
@phresnel "Not even maths uses = for assignment" - correct. "maths has no assignment" - correct for secondary school maths only. — Gangnus, Feb 23 '14 at 22:58
@StevenBurnap Pascal is a derivative of Simula, not vice versa. — Gangnus, Feb 23 '14 at 22:59
@gangnus I was referring to Ada, Modula-2, Delphi, ObjectPascal, etc. — Gort the Robot, Feb 24 '14 at 01:14
@StevenBurnap I see. I only wanted to recall that all of them, including Pascal, come from Simula. Simula was the first OOP language. And had features that were hardly realized in all its children. Some of these old languages were fantastic. — Gangnus, Feb 24 '14 at 09:00
Your standards are not that standard. Algol, COBOL, and BASIC for example all use 1-based indexes. F# and OCAML use <- as assignment operators, and Pascal uses := as assignment operator. And most assembler languages use ; for comments, afaik. — Pete, Feb 24 '14 at 11:21
@Gangnus: Hmm. Would you have some examples? Maybe I am messing up terminology? — phresnel, Feb 24 '14 at 19:49
@phresnel Math, among other divisions, has logic. And logic, among other divisions, has formal systems theory. These ones HAVE assignment operators. At my university and books that I had read, it looked as '→'. And surely, it couldn't be '=', as the meaning of the last was set long ago. — Gangnus, Feb 25 '14 at 08:39
@Gangnus: But does '→' designate the mutation of the target operand? — phresnel, Feb 25 '14 at 11:02
@Gangnus: My math seems not strong enough. Can you give me something to study that contains mutables? — phresnel, Feb 25 '14 at 11:18
@Gangnus: Actually I know these. However, I fail to see how 'S → ...' is equivalent to a mutation-assignment. I see them more as a set of rules, such that when you apply `S -> SS` upon `Foobar`, we get `FoobarFoobar`, yet the rule itself is still the same. They transform the input, but themselves remain unchanged. — phresnel, Feb 25 '14 at 11:39
let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/13246/discussion-between-gangnus-and-phresnel) — Gangnus, Feb 25 '14 at 12:32
@phresnel sorry, look better this one: http://en.wikipedia.org/wiki/Hoare_logic#Assignment_axiom_schema — Gangnus, Feb 25 '14 at 12:36
@Gangnus: I finally realise there is a thin line between pure maths and pure information science, if any. I accept your initial suggestion regarding secondary school math; thx for my _This Week's Enlightenment_ :) Really interesting to see a mathematical definition of assignment (I am not sure if we should remove comments; they contain some very valuable informations and misconceptions) — phresnel, Feb 26 '14 at 10:30

Gangnus · Answer 1 · 2016-01-08T11:10:15.823

21

There are different conventions: conventions in mathematics, logic, and the applied sciences, and conventions in IT. The first are far older.
Scientific languages are made to make life more convenient for THEIR users. The user is seen as a scientist who implements an algorithm from time to time, or checks some theory, without needing to learn anything really new. So languages for scientists MUST follow non-IT standards, because they are not meant for IT people. They follow OTHER standards, and that is good given the target audience: a good software UI (and a language is a software UI) must be built around the needs of the user, not of the coder.
Our IT standards are industry standards. IT is an industry. Science is not an industry, and scientists are proud of that. They would be reluctant to take anything from our practice into theirs. They don't like standards at all, and nobody likes foreign standards. So if somebody made a scientific language that followed IT standards, it would hardly sell well, because the target audience would dislike it, even if it were objectively more convenient.

And even if we judge only by IT standards... Sorry, which standards do you mean? Have you tried to write a program in APL or SNOBOL? These two languages are, IMHO, the MOST powerful in their respective fields (calculation and strings). But the syntax is VERY strange (and effective). Reading a line of APL code can take days. On the other hand, such a line is a serious piece of software. You'd return to Matlab with tears of relief.

As for "=", many people have trouble getting used to the fact that it means assignment, not equality. BTW, in Pascal it IS equality, and assignment is ":=".

And do you really think that == for equality is more natural? On the contrary, mixing up = and == is one of the MOST common errors in C programming; it still happens often, even in modern IDEs with their automatic checks.
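To make the = versus == point concrete: C accepts `if (x = 5)` as valid code, while some later languages removed the trap at the grammar level. A minimal sketch in Python, chosen here only for illustration (the helper name `compiles` is hypothetical), shows a bare `=` inside a condition being rejected before the program ever runs:

```python
def compiles(source):
    """Return True if Python accepts the source text, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Assignment written where a comparison was intended.
assignment_in_condition = "if x = 5:\n    pass\n"
# The comparison that was actually meant.
comparison_in_condition = "if x == 5:\n    pass\n"

assignment_accepted = compiles(assignment_in_condition)
comparison_accepted = compiles(comparison_in_condition)
```

This only shows one design response to the problem; it says nothing about which symbol is more natural.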

As for indexing from 1, it is the only natural one. As a child you learned poems and songs in which you counted one, two, three... and not 0, 1, 2... In school math we learned that counting starts from 1 and that 0 doesn't belong to the natural/counting numbers. Non-natural indices appear only once functions are defined. After all, 0 was invented many thousands of years after our ancestors first raised a finger.

Starting from 0 was simpler to implement and entered IT practice right after C appeared. But Fortran, one of the first high-level languages, uses 1-based indexing. The same goes for other languages of the pre-industrial epoch.

And yes, I have read Dijkstra's article on the naturalness of 0-based counting, and I totally disagree with his argumentation. It is natural for musicians only. And even the 0 enthusiasts who create C and Java compilers count lines of code STARTING FROM 1!
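For readers who have not seen it, the core of Dijkstra's note (EWD831) is about half-open ranges rather than about counting habits. A rough sketch of that argument in Python, as an illustration only (the variable names are made up and nothing here comes from the note itself):

```python
n = 5  # number of elements in a sequence

# Half-open convention: lower bound included, upper bound excluded.
zero_based = range(0, n)       # 0 <= i < n
one_based = range(1, n + 1)    # 1 <= i < n + 1

# The length of a half-open range is simply upper bound minus lower bound.
length_zero_based = zero_based.stop - zero_based.start
length_one_based = one_based.stop - one_based.start

# Adjacent half-open ranges join without overlap or gap:
# the upper bound of one is the lower bound of the next.
left, right = range(0, 2), range(2, n)
rejoined = list(left) + list(right)
```

With this convention, starting at 0 makes the upper bound equal to the number of elements, while starting at 1 makes it n + 1. Whether that outweighs the habit of counting from one is exactly what is disputed here.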

edited Jan 08 '16 at 11:10

answered Feb 21 '14 at 10:44


1

":=" for assignment and 1-based indexing are used in Smalltalk too. – Rory Hunter Feb 21 '14 at 10:52
@RoryHunter yes, thank you. I wonder how "inconvenient" SNOBOL or APL would be for the asker. The problem is that for most programmers the convenience of the user is irrelevant, even when the user is another programmer. And they do not understand why somebody would make software that follows the user's standards instead of IT standards. – Gangnus Feb 21 '14 at 11:03
Fortran (one of the oldest programming languages, predating C) also uses 1-based indexing – Bart van Ingen Schenau Feb 21 '14 at 11:04
1

I don't buy that 0 based indexing is because of ease of implementation (FORTRAN pretty much disproves this). https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html gives some reasons one might prefer 0-based indexing, but note the choice is fairly arbitrary. – jk. Feb 21 '14 at 11:26
@jk. I started from Fortran sometimes. So, for me it was strange, too, that what worked in most weak computers, suddenly was hard for more powerful ones. But I am not a system programmer and can't have any argumentation of my own in this question. – Gangnus Feb 21 '14 at 11:45
@jk. I suspect that when C was invented, it was deliberately made inconvenient, to separate the IT pros from the wider public. – Gangnus Feb 21 '14 at 11:47
2

FORTRAN had 1-based indexing. PASCAL allowed arbitrary-based indexing: you could declare an array whose index ranged over, for example, -42 to +57. (See http://en.wikipedia.org/wiki/Eight_queens_puzzle#Exercise_in_algorithm_design for an example where this is useful.) – John R. Strohm Feb 21 '14 at 12:43
@Gangnus The general public weren't programming back when C was invented. Home computing didn't really take off until about 5 years later. – Robbie Dee Feb 21 '14 at 12:45
@RobbieDee Excuse me, and what is your conclusion? – Gangnus Feb 21 '14 at 12:49
@JohnR.Strohm Yes, of course. I have too shortened the sentence. Sorry. – Gangnus Feb 21 '14 at 12:55
2

@Gangnus I think it is a mistake to compare modern languages to C and deem it to be intentionally hard to read. It was designed to be a high level alternative to lower level languages. – Robbie Dee Feb 21 '14 at 13:19
@RobbieDee 1. I don't think it was the main reason. But I think it was a thought at the back of their minds. 2. Which *languages* are lower level than C, sorry? Just don't mention Assembler - it is not a real language, for it maps lines to commands 1:1. – Gangnus Feb 21 '14 at 13:26
1

Machine code and assembler. It is no coincidence that you can inline assembler should you need to. C of course borrows much from earlier languages such as B so it isn't *just* a replacement for first and second generation languages. – Robbie Dee Feb 21 '14 at 13:40
@RobbieDee For me Assembler, autocode and machine code are not languages, sorry. – Gangnus Feb 21 '14 at 14:05
@RobbieDee And C was done later, than APL, SNOBOL, PL/I, simula, Smalltalk. And it is absolutely primitive in comparison with them. – Gangnus Feb 21 '14 at 19:15
PL/I would seem feature rich in comparison to C given that it was designed by an IBM committee. Also bear in mind that C was a means to an end - the development of the Unix system. It earned the moniker of a general purpose language because it was hugely popular, not because it was designed to be so. – Robbie Dee Feb 21 '14 at 23:42
@RobbieDee It was popular because of its lower level/ easy realization, and in spite of it, the level of universality that was ENOUGH for these times. Then there were enough languages that were more powerful, but they were hard or almost impossible to realize wholely. – Gangnus Feb 22 '14 at 09:38
FORTH is lower level than C. It also manages to make do with no assignment operator at all. (Though it wasn't a language the C designers were reacting to.) – Gort the Robot Feb 24 '14 at 18:08
@StevenBurnap Wow! stupid me! And I thought it was a dialect of FORTRAN, due to the similarity of the name. I'll look at it. Thank you for the info... Lower level than C - possible. But was it more powerful? Why it lost to C? – Gangnus Feb 24 '14 at 18:33
1

FORTH is a stack based language. Think HP calculator. It was very compact and fast, but it was hard to write code that wasn't impenetrable. In FORTH, you rarely use variables but rather push things onto the stack and use operators that act on the stack. – Gort the Robot Feb 24 '14 at 18:57
@StevenBurnap Very interesting. I had worked on similar machine-level language. It is a powerful and complex way programming. – Gangnus Feb 25 '14 at 08:47
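The stack-based style described in the FORTH comments above can be sketched with a toy evaluator. This is not FORTH itself, only an illustration of "push values, let operators act on the stack"; the function name `run` and the tiny word set are assumptions made for the example:

```python
def run(program):
    """Evaluate a whitespace-separated, FORTH-like program; return the final stack."""
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for word in program.split():
        if word in binary_ops:
            # Operators take their arguments from the stack, not from variables.
            b = stack.pop()
            a = stack.pop()
            stack.append(binary_ops[word](a, b))
        elif word == "dup":
            # Duplicate the top of the stack.
            stack.append(stack[-1])
        else:
            # Anything else is treated as an integer literal and pushed.
            stack.append(int(word))
    return stack


product = run("2 3 + 4 *")   # (2 + 3) * 4, with no assignment anywhere
square = run("7 dup *")      # 7 * 7, reusing the value via the stack
```

Note that no name is ever bound to a value, which is the sense in which such a language gets by without an assignment operator.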
2

I humbly disagree about 1-indices being natural for math languages because 0 is normal index to start indexing by in most mathematics. Yes, children count from 1. But when you start talking about math advanced enough to handle sequences, you see s0, s1, s2, ... all the time. If you need a programming language to help you with your math, then bets are that you are at least at this level of mathematics. – Thomas Eding Mar 08 '14 at 21:26
@ThomasEding "If you need a programming language to help you with your math, then bets are that you are at least at this level of mathematics" If you think you are at a level to participate in any discussion, do the work of reading the rules of discussion. In wiki, for example. You are positioning yourself outside of any discussion. – Gangnus Mar 08 '14 at 21:37

Kilian Foth · Answer 2 · 2014-02-23T15:17:47.260

15

Indexing from 1 is not weird; it is completely normal and expected, except for programmers, because they've been conditioned to expect 0-based counting by C (which in turn was shaped by the properties of processor architecture).
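The link to processor architecture can be sketched as address arithmetic. The following is a simplified model, assuming a flat memory with byte addresses; the function names and the example numbers are made up for illustration:

```python
def address_zero_based(base, index, item_size):
    """Address of element `index` when the first element has index 0."""
    return base + index * item_size


def address_one_based(base, index, item_size):
    """Address of element `index` when the first element has index 1."""
    return base + (index - 1) * item_size


base = 1000      # hypothetical start address of the array
item_size = 8    # hypothetical size of one element in bytes

first_zero_based = address_zero_based(base, 0, item_size)
first_one_based = address_one_based(base, 1, item_size)
```

With 0-based indices the index is the offset from the start of the array; with 1-based indices a subtraction is needed somewhere, although a compiler can usually fold it into the base address.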

Comments are denoted in many, many different ways in different languages; there is no standard way, and every language chooses a symbol or digraph that isn't already taken.

Assignment is likewise a strange and incomprehensible concept, except for programmers; most people couldn't care less whether it's = or := or <-, they struggle to understand the meaning (and for them, it is in fact better not to use =, because this emphasizes that assignment is not equality - the most common hurdle for non-programmers to understand code).

In short, programming languages intended for people other than professional programmers look different because the people who use them most want it that way.

edited Feb 23 '14 at 15:17

answered Feb 21 '14 at 10:40


5

I disagree that indexing from 1 is not weird. 0-indexing is at least as common as 1-indexing in mathematics, and it had obviously been the norm in programming for years before the advent of Matlab or S/R. – haroba Feb 21 '14 at 10:47
9

@Aqwis Oh, yes, I already see the baby counting zero, one, two... The most natural way, really. – Gangnus Feb 21 '14 at 10:48
5

Babies don't write code. There are good reasons to use zero-indexing (see: Dijkstra), and when zero-indexing is also common in mathematics I cannot see many reasons to use 1-indexing. – haroba Feb 21 '14 at 10:51
1

@Aqwis Answer for your own words. What is weird and not. A thing that is set from the babyhood and by maths (natural numbers do not include zero), can not be weird from any side. And what contradicts with it, IS weird. And that you have accustomed to something else, is irrelevant. These languages simply are not made for you or me. – Gangnus Feb 21 '14 at 10:59
Folks, FORTRAN indices are 1-based. – VH-NZZ Feb 21 '14 at 11:10
the Dijkstra paper mentioned https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html – jk. Feb 21 '14 at 11:23
I don't get your first paragraph. C arrays are 0-based!? – phresnel Feb 21 '14 at 15:00
1

@phresnel To paraphrase from the answer: Indexing from 1 is normal. Except for programmers, because they've been conditioned to expect it [indexing from 0] from C – Robbie Dee Feb 21 '14 at 17:00
@okiharaherbst These old languages were different because the programmers of the 50s and 60s and those from the 70s on were/are different. Back then programming was an art, and now it is an industry. – Gangnus Feb 24 '14 at 09:18
@Gangnus: FYI, there are still $bn worth of FORTRAN code (in defense, aerospace, weather among others) running in the US, maintained and developed daily in very real, industrial codes. As for different, please elaborate. It's always an imperative procedural language. Isn't that still very contemporary? (sorry I'm teasing you! but I think I got a point) – VH-NZZ Feb 24 '14 at 10:04
@okiharaherbst I am glad to speak on such an interesting theme, and I thought I was teasing you... There are millions of paper books printed in the world. But alas, they are far less modern than e-books. Times in IT change so quickly that all four generations of IT languages live side by side: Assembler, Fortran, C, Java+Spring... Another thought: FORTRAN 1957 and Fortran 2008 with recursive allocatable components are damned different! – Gangnus Feb 24 '14 at 10:17
1

@RobbieDee Even programmers count their code lines starting from 1. – Gangnus Jan 08 '16 at 11:12
@Gangnus Not me - I put all the useful comments at line zero. – Robbie Dee Jan 08 '16 at 12:31

score 5 · Answer 3 · answered Feb 21 '14 at 10:43

There are three problems:

1. You are unaware of certain traditions, and the good reasons for certain choices.
2. You put too much emphasis on syntax, too little on semantics.
3. Engineers and scientists have no experience in language design, leading to questionable syntax.

Now to your specific points:

I don't know Matlab, so I can't comment on the requirements of file organization. Note that Java wants you to use one file per public class.
In R, = can be used as an assignment operator as well. Note that it needs multiple assignment operators <- and <<- to deal with its concept of scoping (<<- assigns to a symbol in an outside scope instead of creating a new symbol inside a function). The arrows can be used in the other direction too, potentially making cleaner code: complex_calculation() -> x.
1-based indexing is the standard in mathematics, which is what Matlab's and R's users are more comfortable with than C. Julia follows Matlab in order to have a better learning curve.
% for comments is also used in TeX/LaTeX. The # is only a convention from Unix scripting languages, and their descendants.
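The scoping distinction behind R's <- and <<- can be illustrated by a loose analogy in Python, where plain assignment inside a function binds a new local name and `nonlocal` rebinds a name in the enclosing scope. This is only an analogy under that assumption: the lookup rules of <<- in R differ in detail, and all names below are made up for the example.

```python
def make_counter():
    count = 0

    def assign_local():
        # Plain assignment binds a NEW local name, similar in spirit to `<-`
        # inside an R function; the outer `count` is untouched.
        count = 100
        return count

    def assign_outer():
        # `nonlocal` rebinds the name in the enclosing scope, loosely
        # comparable to R's `<<-`.
        nonlocal count
        count += 1
        return count

    def read():
        return count

    return assign_local, assign_outer, read


assign_local, assign_outer, read = make_counter()
assign_local()
after_local = read()    # outer value unchanged by the local assignment
assign_outer()
after_outer = read()    # outer value updated through `nonlocal`
```

Python marks the difference with a declaration, whereas R marks it with a second operator; either way the language needs some way to say which scope is meant.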

You also ignore that “real” programming languages have many weird parts. Why doesn't Scheme use =? Instead:

(define foo 5)

Why does C use * for dereferencing, when obviously a caret ^x is more common in other traditions?

"I don't know Matlab, so I can't comment on the requirements of file organization. Note that Java wants you to use one file per public class." I think it's perfectly reasonable for the language to expect you to divide your project into several files. However, a class is usually a relatively large amount of code. Functions don't have to be. By forcing a separate file for every function, Matlab discourages you from creating small functions and instead promotes large, monolithic functions. — haroba, Feb 21 '14 at 10:54
I agree almost with everything, except p.3. Scientists do not make their languages, they ORDER them. They are clients, users, but not their creators. If somebody is, he/she is already an IT geek. And syntax of any language is questionable, no one is ideal for all tasks. — Gangnus, Feb 21 '14 at 10:55
Matlab compiles functions/files on a just-in-time basis as required. It has no real concept of a program, just a bunch of functions. If I am running a function which makes a call to foo(), then it will search its path for a file called foo.m, compile it, and run it. There's no need to tell Matlab in advance what set of files I intend to use. — Simon B, Jan 08 '16 at 12:18

Robbie Dee · Answer 4 · 2014-02-21T11:16:39.243

1

I guess it depends on your exposure to other languages. Off the top of my head:

C/C++ have separate source files (.c/.cpp & .h)
Arrow-like digraphs appear in C# too: => is used for lambda expressions, and -> for pointer member access in unsafe code
Old versions of VB used 1 as a default index (although this could be changed with Option Base)

edited Feb 21 '14 at 11:16

answered Feb 21 '14 at 10:41


1

In C and C++, you can define as many functions as you want in one file. – haroba Feb 21 '14 at 10:42
I'm just making the point that it isn't unusual for modules to be split across multiple files. If you wished to you *could* put all your functions in separate files using .NET languages with the **partial class** construct. – Robbie Dee Feb 21 '14 at 10:45
1

Of course it is not unusual for modules to be split across several files, and it is in many cases desirable. But in Matlab you have to put *every single function* in its own file, which means that if you have a thousand functions you need a *thousand files*. – haroba Feb 21 '14 at 10:46
3

Comments in HTML look like `<!-- comment -->`. The percent sign is used for URL-encoding: `http://example.com/()` becomes `http://example.com/%28%29`. – amon Feb 21 '14 at 11:02
Sorry, my mistake. – Robbie Dee Feb 21 '14 at 11:16
