53

Some (link 1, link 2) programming languages allow spaces in their identifiers (e.g. variables, procedures), but most don't; instead, programmers usually use camel case, snake case, or other conventions to separate words in names.

To support spaces or even other Unicode characters some programming languages allow encapsulating the name with a certain character to delimit its start and end.

Is it a bad idea to allow spaces, or is it just commonly not allowed for historical reasons (because there were more limitations then than now, or because it was simply judged not worth implementing)?

The question is more about the main pros and cons of implementing it in newly created programming languages.

Related pages: link 1, link 2.

Glorfindel
user7393973
  • Hopefully this question isn't off-topic or too broad. If you believe so please help me improve it. – user7393973 Nov 25 '19 at 12:14
  • 12
    If feature X is not widely used, then the most obvious answer is that the advantages don't outweigh the disadvantages. In this case, the disadvantage is complication in parsing the language. The advantage is the ability to make nicer variable names, which can be worked around by various casing schemes, as you noticed. – Euphoric Nov 25 '19 at 12:22
  • 5
    See also [Why can't variable names have spaces in them?](https://stackoverflow.com/questions/20769465/why-cant-variable-names-have-spaces-in-them). - Oh, oddly enough, see also [Why can't you rename the Recycle Bin?](https://devblogs.microsoft.com/oldnewthing/20080317-00/?p=23103) ("features start out nonexistent and somebody has to make them happen"). – Theraot Nov 25 '19 at 12:22
  • @Euphoric Isn't it just about as complicated as for example strings but with a different delimiter character? – user7393973 Nov 25 '19 at 12:28
  • @Theraot That can be indeed the reason why. That it simply wasn't decided to be implemented as a feature back then. But what about in today's time, if implementation isn't a problem, is there any significant negative downside to doing so? – user7393973 Nov 25 '19 at 12:37
  • 3
    @user7393973 Is there any significant positive upside to doing so? I mean there are other things to implement, it is opportunity cost. Yes, I know you can be a little extra expressive as programmer, however, think as language designer... Does this enable developer to implement some feature easier? Does this make programming safer? What? – Theraot Nov 25 '19 at 13:07
  • @Theraot The positive upside is making things easier to read and express as with other [syntactic sugar](https://en.wikipedia.org/wiki/Syntactic_sugar) implementations. – user7393973 Nov 25 '19 at 13:13
  • @Theraot I agree that it can be seen not worth it as its not needed for a language's functionality. Since why I made the question more about the downsides, ignoring how complicated and worth it or not it is to implement it. – user7393973 Nov 25 '19 at 13:21
  • btw PHP allows unicode characters in identifiers (at least if the source file is UTF-8 encoded), but not spaces. identifiers can even be chess pieces! - https://3v4l.org/gpem6 – hanshenrik Nov 25 '19 at 22:19
  • `Camel` is spelled with one 'M.' – David Conrad Nov 25 '19 at 22:59
  • 7
    I'm still annoyed by spaces being allowed in file names. – Carey Gregory Nov 26 '19 at 01:15
  • 8
    Whitespace is the **strongest** separator in a language, our brain can scan whitespaces much easier than they can scan for dots, commas, or brackets. On the other hand, like words, variable names represents a single atomic concept in the code that we write, they're the smallest, most tightly bound concept that we build other elements out of. – Lie Ryan Nov 26 '19 at 03:28
  • If you notice the coding convention in most languages usually take a lot of care about specifying where to put whitespaces. Generally, in most coding conventions, you might also notice the tendency that most important delimiters would also be accompanied with whitespaces, or the convention would also specify to omit whitespace when the convention designer thinks that there's a stronger bindings between some groups of elements than to their surroundings. – Lie Ryan Nov 26 '19 at 04:29
  • Allowing whitespace in variable names would also imply that, when the code is syntax coloured, these names would be disconnected. Unless you used background colouring, in which case, you lost the ability for background colours to convey another stronger meanings that should actually grab your attention (e.g. errors). – Lie Ryan Nov 26 '19 at 04:34
  • @hanshenrik A little strange in my opinion that it supports unicode characters but not spaces. – user7393973 Nov 26 '19 at 08:06
  • @DavidConrad Yeah, I did notice the mistake at some point but didn't feel like bumping the question with a 1-character edit; someone did it for me though. – user7393973 Nov 26 '19 at 08:07
  • @CareyGregory Why are spaces in file names annoying to you though? – user7393973 Nov 26 '19 at 08:08
  • @LieRyan I think the colouring part would just be kind of like it is with strings with spaces. – user7393973 Nov 26 '19 at 08:10
  • @hanshenrik what they mean with multiple words in PHP variable names is this: https://3v4l.org/lJopb - similar to multiple words in Javascript variable names: https://liveweave.com/ljVr0I – Theraot Nov 26 '19 at 12:00
  • 1
    @user7393973 Because they require special handling on the command line, in scripts, URLs, etc. – Carey Gregory Nov 26 '19 at 14:50
  • One well-known investment bank has a proprietary language with spaces in variable names. It seems to work reasonably well. – rlms Nov 26 '19 at 14:53
  • @lieryan that's not true in all languages. Japanese doesn't have any separation between characters/words typically. – Andy Nov 26 '19 at 23:41
  • There are lots of languages with spaces in identifier names: https://github.com/featurist/pogoscript/tree/master/examples and https://jamesboer.github.io/Jinx/examples.htm for example. They don't need quotes or other delimiters and are very readable. – Jerry Jeremiah Feb 16 '23 at 21:07

7 Answers

104

Consider the following.

 var [Example Number] = 5;
 [Example Number] = [Example Number] + 5;
 print([Example Number]);
 
 int[] [Examples Array] = new int[25];
 [Examples Array][[Example Number]] = [Example Number];

Compare it with the more traditional example:

 var ExampleNumber = 5;
 ExampleNumber = ExampleNumber + 5;
 print(ExampleNumber);
 
 int[] ExamplesArray = new int[25];
 ExamplesArray[ExampleNumber] = ExampleNumber;

I'm pretty sure you noticed that the strain for your brain to read the second example was much lower.

If you allow whitespace in an identifier, you'll need some other language element to mark where the name starts and stops. Those delimiters force the brain to do some extra parsing and, depending on which one you pick, create a whole new set of ambiguity issues for the human reader.

If you leave out the delimiters and try to infer from context alone which identifier is meant, you open a different can of worms:

 var Example = 5;
 var Number = 10;
 var Example Number = Example + Number;

 int[] Examples Array = new int[25];
 Examples Array[Example Number] = Example Number;

 Example Number = Example Number + Example + Number;
 print text(Example Number);

Perfectly doable.

A total pain for your brain's pattern matching.

Those examples are painful to read not only because of the words I picked, but also because your brain takes some extra time to work out where each identifier begins and ends.
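To make that extra work concrete, here is a rough Python sketch. It is an illustration only: the `segmentations` helper and the set of known names are hypothetical and not taken from any real compiler. It lists every way a run of words could be split into identifiers that are already declared.

```python
def segmentations(words, known_names):
    """Return every way to split a run of words into known identifiers."""
    if not words:
        return [[]]
    results = []
    for end in range(1, len(words) + 1):
        # Try the first `end` words as one space-separated identifier.
        candidate = " ".join(words[:end])
        if candidate in known_names:
            for rest in segmentations(words[end:], known_names):
                results.append([candidate] + rest)
    return results


# Hypothetical symbol table matching the example above.
known = {"Example", "Number", "Example Number"}

for parse in segmentations("Example Number".split(), known):
    print(parse)
```

With these three names declared, the two words can be read either as one identifier or as two adjacent ones. A parser can usually rule one reading out from the surrounding grammar; a human skimming the line has to do the same work each time.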

Consider the more regular format, once again:

 var Example = 5;
 var Number = 10;
 var ExampleNumber = Example + Number;

 int[] ExamplesArray = new int[25];
 ExamplesArray[ExampleNumber] = ExampleNumber;

 ExampleNumber = ExampleNumber + Example + Number;
 printText(ExampleNumber);

Do you notice something?

The names of the variables are still terrible, but the strain to read it went way down. That happens because your brain now has a natural anchor to identify the beginning and the ending of every word, enabling you to abstract away that part of your thinking. You don't need to worry about that context anymore - you see a break in the text, you know it is a new identifier coming.

When reading code, your brain doesn't so much read the words as match them against what you have in your mind right now. You don't really stop to read "ExampleWord". You see the overall shape of the thing, ExxxxxxWxxd, match it with whatever you have stashed in your mental heap, and then go ahead reading. That's why it is easy to miss mistakes like "ExampleWord = ExapmleWord" - your brain isn't really reading it. You're just matching up similar stuff.

Once more, consider the following:

 Example Word += Example  Word + 1;

Now imagine yourself trying to debug that code. Imagine how many times you'll miss that extra space on "Example Word". A misplaced letter is already hard as fork to detect at first glance; an extra space is an order of magnitude worse.

In the end, it is hard to say that allowing whitespace would make the text more readable. Even if the language I'm working with had this feature, I find it difficult to believe that the added hassle of extra terminators and the extra overhead on my brain would make it worth using.

Personally, I consider it bad design - not because of the hassle on the compiler, interpreter, or whatever, but because my brain trips on those spaces thinking that it is a new identifier that is about to begin, when it is not.

In a sense, our brain suffers from the same problems as our processors when it comes to branch prediction.

So please, be kind to our trains of thought. Don't put whitespaces on your identifiers.


I completely forgot to mention that a language I use every single day accepts spaces in identifiers - SQL!

That doesn't mean it is a good idea to use them, however. Most people I know agree it's a Bad Idea to shove spaces around on your identifiers - to the point it's sometimes a forgotten feature of the language.
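For anyone who has never seen the feature, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are made up for illustration, and the quoting shown is SQLite's: double quotes are the standard SQL form, and SQLite also accepts square brackets for compatibility with other databases. Other engines differ in which delimiters they accept.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A column whose name contains a space must be delimited every time it is used.
conn.execute('CREATE TABLE examples ("Example Number" INTEGER)')
conn.execute('INSERT INTO examples ("Example Number") VALUES (5)')
conn.execute('UPDATE examples SET "Example Number" = "Example Number" + 5')

# SQLite also accepts the bracket form for the same column.
value = conn.execute("SELECT [Example Number] FROM examples").fetchone()[0]
print(value)  # prints 10
```

It works, but every reference to the column carries its delimiters along with it, which is the readability cost described above.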

T. Sar
  • Changing the delimiters to backquotes/backticks/grave accents (`) like MySQL's aliases does make it easier to read than square brackets, which are also used for the array syntax, but I do understand what you mean. The double-space mistake is about as likely as a double underscore in snake case; the difference is visual, since it's usually harder to count invisible characters. I think the actual brain problem comes from spaces being used both inside and outside identifiers, because I'm used to seeing spaces only as separators and inside strings, which are less common than in those examples. – user7393973 Nov 25 '19 at 15:22
  • While misuse and abuse of it can indeed make the code worse to read, I still think it can have good benefits in some circumstances, so it would be nice if it was a more common option, maybe. – user7393973 Nov 25 '19 at 15:27
  • The colors used for names might matter. See also [my answer](https://softwareengineering.stackexchange.com/a/401609/40065). My brain definitely depends on syntax coloring. I dream of code in colors. – Basile Starynkevitch Nov 25 '19 at 16:05
  • @BasileStarynkevitch When dealing with identifiers, almost all of them will have the same color. Number, [Example Number], and Example will all be colored baby blue, for example. That's what happens on SQL and other whitespace-supporting languages. – T. Sar Nov 25 '19 at 16:37
  • 16
    "If you allow whitespaces on an identifier, you'll need to put some other language element to mark the start and the stop of a word." …or you just need a parser that's flexible enough to figure it out from the surrounding context. [Inform 7](http://inform7.com) is one example of such a language. – Ilmari Karonen Nov 25 '19 at 20:38
  • 1
    @IlmariKaronen I put this in an example a few lines below the point you're pointing out! =) – T. Sar Nov 25 '19 at 21:00
  • @IlmariKaronen: I just saw your comment now but I had exactly the same thought; see my answer for some examples. – Eric Lippert Nov 25 '19 at 23:02
  • You just need to design your grammar in a way that you never have two identifiers near each other (always separated by an operator, keyword or other syntactical element like the `:` for separating the type like in Pascal, Scala, Kotlin). Then your third example becomes a lot less problematic. – Paŭlo Ebermann Nov 26 '19 at 00:27
  • @PaŭloEbermann That's good in theory, but overly "decorated" languages tend to be even harder to read. I won't go as far as saying that mostly heavily decorated languages are _bad_ - some are very expressive and very powerful - but everything in the end boils down to balance - to enable whitespace as part of my identifier, what else do I need to give up? What adaptations should I make to my grammatical tree? Will those adaptations make it easier or harder to understand? Those are hard questions. – T. Sar Nov 26 '19 at 01:02
  • 17
    I remember a university lecturer's comment years ago, where in response to a student's question, he said a computer could perfectly interpret the context of a single word re-used for everything (variable name, function, array, arguments etc), it was the *human brain* that couldn't, and so programming constraints were largely there for humans, not machines. Blew my mind. – SE Does Not Like Dissent Nov 26 '19 at 03:09
  • The traditional way is to use `underscores_for_spaces` in traditional programming languages. What you have written is ***`NOT_TRADITONAL`*** at all. And all those `capital_letters` also look odd. :) – tchrist Nov 26 '19 at 04:29
  • 5
    @T. Sar - Reinstate Monica: Same color? Only if everybody that reads the code used an editor that supports coloration, and everybody has the same color settings. Coloration gives me headaches from the eye strain, and that's before I get to people that think dark blue text should be readable on a black background :-( – jamesqf Nov 26 '19 at 04:34
  • 3
    Setting aside languages that *have* allowed spaces in identifiers, outside of programming and pure math people still write equations in which a single quantity is represented by multiple words separated by spaces. It isn't hard for the brain to process if you're otherwise careful about style. I didn't find the third example particularly more difficult to read than the fourth example once I prepared myself to read names with blanks. But so few languages offer the feature, I think we've just gotten used to the idea that names are blank-free, and it's become a habit that dies hard. – David K Nov 26 '19 at 04:48
  • 12
    "*I'm pretty sure you noticed that the strain for your brain to read the second example was much lower.*" it was lower, yes, but partly because I've been exposed to that *a lot*, while this is my first brush with the first style. I don't think it's valid to declare this a clear "win". Especially since I *have* seen very hard to read camel cased names. Sometimes they grow long by necessity and spaces honestly improve them. Due to lack of spaces, I've sometimes broken convention and used underscores to specifically remove the strain of reading `longAndComplexYetDescriptiveIdentifierIWrote` – VLAZ Nov 26 '19 at 07:11
  • @IlmariKaronen While it might be possible to implement it without start and end markers, I think it is better with them, because they make the start and end easier to see. – user7393973 Nov 26 '19 at 08:15
  • @VLAZ I agree that the examples are in a way to show the negative side of allowing spaces and there isn't an example to demonstrate the positive part of it. – user7393973 Nov 26 '19 at 08:21
  • 1
    @VLAZ exactly the point I was going to make. Having spent 15 years reading camelCase code, it comes naturally for me, whereas looking at code with spaces in is not as natural, so took some time to process. Someone new to coding will struggle with camelCase - I wonder whether they would struggle more with camelCase or spaced variable names? I think the processing time would certainly be a lot closer. – Gavin Coates Nov 26 '19 at 14:57
  • As for the last point in the answer about double spaces - i'm afraid I spotted the double space almost instantly. Likewise I always spot extra spaces in word documents and printed text relatively easily - just as easily as I do with typos, so i'm not sure I buy that argument. – Gavin Coates Nov 26 '19 at 14:57
  • @GavinCoates Those arguments aren't necessarily valid to everyone. You might have spotted it instantly, but I know that I stumble upon those things quite easily. – T. Sar Nov 26 '19 at 15:19
  • 1
    @T.Sar-ReinstateMonica: I once did a survey of all natural languages to determine which one was the best, and it turns out it is English. English is the only language in which *the words come in the same order that I think them*. All those other languages must cause a lot of mental strain on their speakers, who think "the white dog" and then have to say "le chien blanc", inverting the correct word order. :) – Eric Lippert Nov 26 '19 at 15:37
  • 1
    Silly analogies aside: familiarity matters, but only to a point. C# for instance was designed to be immediately familiar to C, C++ and Java programmers, but we deliberately chose to strongly discourage `SHOUTING_SNAKE_NAMING_CONVENTION` even though that is familiar to programmers in those languages because, well, because it is horrid. It needlessly focuses attention on what is likely the part of the program that needs little attention. – Eric Lippert Nov 26 '19 at 15:41
  • That is actually pretty interesting! My native language (portuguese) has a lot of meaning tucked inside the word positioning ("Grande Cachorro" and "Cachorro Grande" are two totally different things). The directness of English is quite helpful while programming. – T. Sar Nov 26 '19 at 15:43
  • 1
    @T.Sar-ReinstateMonica: There is a famous story about JRR Tolkien, who was a professional linguist in his day job. He said once that he wrote his first fantasy story when he was a small child and remembers only one thing about it: that "the great green dragon" is correct English, and "the green great dragon" is not, and that even as a professional linguist who studied the history of English, he was not sure why that is. In fact the "correct" order is far more complex than just color comes after size: https://dictionary.cambridge.org/grammar/british-grammar/adjectives-order – Eric Lippert Nov 26 '19 at 15:48
  • @EricLippert That was a good reading. Maybe that's why C# and similar languages seem to work better in the mental model [Type thing] than the languages that use it on the other way around [thing as type]. It is closer to how the brain is thinking, it seems. – T. Sar Nov 26 '19 at 16:11
  • 2
    @tchrist not in all languages. Python uses snake case with `names_like_this` but is actually in the minority. The most common is camel case with `namesLikeThis` These are not the only two styles. – Baldrickk Nov 26 '19 at 16:27
  • Telling the difference between `Example Word` and `Example  Word` is tough enough, but "Example Word" and "Example  Word" in a proportional font are tougher to distinguish, and in a Web page, the extra space vanishes altogether unless special formatting is used to preserve it. Even then, if the line wraps at the break, good luck. I'm old enough to think of `Example␢␢Word` as being the only unambiguous way to write such a thing. – Monty Harder Nov 26 '19 at 18:54
  • Also note that this is only half the problem. The other half is getting the computer to recognize tokens too... `for While from 1 to 10 then while For > 0 then For-- endWhile print While endFor` – Draco18s no longer trusts SE Nov 26 '19 at 19:48
  • This answer is too definite for my taste -- certainly it seems like a bad idea, but I think this is a case for empirical evidence, not first-principles argument. – Eli Rose Nov 26 '19 at 19:49
63

Is it a bad design for a programming language to allow spaces in identifiers?

Short answer:

Maybe.

Slightly longer answer:

Design is the process of identifying and weighting conflicting solutions to complex problems, and making good compromises that meet the needs of stakeholders. There is no "bad design" or "good design" except in the context of the goals of those stakeholders, and you have not said what those goals are, so the question is too vague to answer.

Even longer answer:

As I've alluded to above, it depends on the goals of the constituency that the language designer is addressing. Let's consider two languages that I am familiar with: the human-readable form of MSIL, the low-level "intermediate language" that C# compiles to, and C#.

C# is intended to be a language that makes line-of-business developers highly productive in environments that Microsoft considers to be strategically important. In C#, an identifier is a sequence of one or more UTF-16 characters where all the characters are classified as alphanumeric or _, and the first character is not a number.

This lexical grammar was carefully chosen to have characteristics that match the needs of those strategically important LOB developers:

  • It is unambiguously lexable as an identifier; 1e10 for example must not be a legal identifier because it is lexically ambiguous with a double.
  • It supports idioms commonly used in C, C++ and Java, like naming a private field _foo. C# was designed to appeal to developers who already knew a common LOB language.
  • It supports identifiers written in almost any human language. You want to write var φωτογραφία = @"C:\Photos"; in C#, you go right ahead. This makes the language more accessible to developers who are not native English speakers.
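The three properties above can be approximated in a few lines of Python. This is a deliberately simplified sketch, not the actual C# lexical grammar, which is defined in terms of Unicode character classes and permits more than this; the `is_identifier` helper is hypothetical.

```python
import re
import unicodedata

# Simplified rule: a letter or underscore first, then letters, digits or underscores.
# For str patterns in Python 3, \w is Unicode-aware, so non-English letters match.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def is_identifier(text):
    # Compose accented letters first, so a base letter followed by a
    # combining mark is treated as a single letter.
    normalized = unicodedata.normalize("NFC", text)
    return IDENTIFIER.fullmatch(normalized) is not None


for name in ["_foo", "φωτογραφία", "1e10", "Example Number"]:
    print(name, is_identifier(name))
```

Under this rule `_foo` and `φωτογραφία` are accepted, `1e10` is rejected because it starts with a digit, and `Example Number` is rejected because the space ends the match.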

However, C# does not support spaces in identifiers.

  • It would complicate the lexical grammar and introduce ambiguities that then must be resolved.
  • In the vast majority of interop situations, it is not necessary. No one names their public members to have spaces in them.

It was a good idea to disallow characters other than letters and numbers in C# identifiers.

In MSIL by contrast, you can name a function almost anything, including putting spaces or other "weird" characters in method names. And in fact the C# compiler takes advantage of this! It will generate "unspeakable names" for compiler-generated methods that must not be called directly by user code.

Why is this a good idea for MSIL and not C#? Because the MSIL use cases are completely different:

  • MSIL is not designed as a primary development language; it is an intermediate language so the primary use case for it is for compiler developers trying to understand the output of their compiler.
  • MSIL is designed to be able to interoperate with any legacy Microsoft development environment including pre-.NET Visual Basic and other OLE Automation clients, which allowed spaces in identifiers.
  • As noted above, being able to generate an "unspeakable" name for a function is a feature, not a bug.
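The idea of an "unspeakable" name has a loose analogue in Python, sketched below. This only illustrates the concept; it is not how the C# compiler or MSIL works, and the class and the name string are invented for the example. A runtime can store a member under a name that the source-level grammar would never let you write.

```python
class Generated:
    """Stand-in for a type that receives compiler-generated members."""


def _helper():
    return 42


# The name contains spaces and angle brackets, so no ordinary source-level
# expression of the form `Generated.something` can refer to it.
setattr(Generated, "<helper> for line 3", staticmethod(_helper))

# Tooling can still reach it by looking the string up explicitly.
hidden = getattr(Generated, "<helper> for line 3")
print(hidden())  # prints 42
```

User code cannot collide with the name by accident, which is the property that makes unspeakable names useful for generated code.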

So is it a good idea to allow spaces in identifiers? It depends on the use cases of the language. If you have a solid use case for allowing it, by all means allow it. If you don't, don't.

Further reading: If you want an example of a fascinating language that makes excellent use of complex identifiers, see Inform7, a DSL for text-based adventure games:

The Open Plain is a room. 
"A wide-open grassy expanse, from which you could really go any way at all."

This declares a new object of type room called The Open Plain, and that object can then be referred to as such throughout the program. Inform7 has a very rich and complex parser, as you might imagine.

Here's a more complex example:

Before going a direction (called way) when a room (called next location) is not visited:
  let further place be the room the way from the location;
  if further place is a room, continue the action;
  change the way exit of the location to the next location;
  let reverse be the opposite of the way;
  change the reverse exit of the next location to the location.

Note that way and next location and further place and reverse are identifiers in this language. Notice also that next location and the next location are aliased. (Exercise: what is this code doing to the data structure that maintains the map of rooms in the game?)

Inform7 has a constituency that wants full-on natural-seeming English language as the source code. It would seem strange to write this Inform7 as

  change the way exit of the location to the_next_location;

It's immersion-breaking to do so. Contrast this with T. Sar's (excellent) answer which makes the contrasting point -- that it is immersion-breaking for developers in LOB languages to try to mentally parse out where the identifiers are. Again, it comes down to context and goals.

Eric Lippert
  • 6
    *squeals like a fangirl* You just absolutely made my day. – T. Sar Nov 25 '19 at 23:01
  • 3
    @T.Sar-ReinstateMonica: Glad to hear it. Our innate ability to understand languages is what we take advantage of when building software; I wish that there was a more principled approach to language design that included taking account of research into the sorts of pattern recognition tasks you mention in your answer. – Eric Lippert Nov 25 '19 at 23:10
  • Inform 7 is a fascinating example. I wouldn't usually consider MSIL or x86 assembly a "programming language", but thinking about this it seems hard to give a definition of programming language that wouldn't include MSIL. – Voo Nov 26 '19 at 12:35
  • @Voo: Those are definitely programming languages. They are not, say, *flipping toggle switches on the front panel to input the boot sequence of the computer in binary*. They are human-readable languages for programming computers, so they are programming languages. – Eric Lippert Nov 26 '19 at 15:32
  • I like the term "unspeakable" to refer to compiler-generated "illegal" names. It invokes sigils and runes and other black magic stuff that I'd prefer to leave to the compiler – Mike Caron Nov 26 '19 at 21:37
  • 1
    @MikeCaron: I don't remember who came up with that term but it was probably Anders. – Eric Lippert Nov 26 '19 at 22:11
  • Re *"It was a good idea to disallow characters other than letters and numbers in C# identifiers"*: In [Forth](http://en.wikipedia.org/wiki/Forth_%28programming_language%29), *any* character can be used *except* space (as that is the only separator). [It is still alive](http://amforth.sourceforge.net/) (not entirely [ancient](https://en.wikipedia.org/wiki/Jupiter_Ace)). – Peter Mortensen Nov 27 '19 at 00:28
  • @PeterMortensen: Indeed, and I note that FORTH has more immediate similarities to MSIL than it does to C#. – Eric Lippert Nov 27 '19 at 03:23
17

One relatively well-known example is of some Fortran code in which a single typo completely changed the meaning of the code.

It was intended to repeat a section of code 100 times (with I as the loop counter):

DO 10 I = 1,100

However, the comma was mistyped as a dot:

DO 10 I = 1.100

Because Fortran allows spaces in identifiers (and because it automatically creates variables if they haven't been declared), the second line is perfectly valid: it implicitly creates a spurious real variable called DO10I, and assigns it the number 1.1.  So the program compiled fine with no errors; it just failed to run the loop.
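A toy Python sketch of why the two lines diverge follows. It assumes only that blanks are discarded before the statement is recognised; the `classify` function and its regular expressions are hypothetical and far simpler than a real Fortran compiler.

```python
import re


def classify(statement):
    # Assumption for this sketch: blanks carry no meaning, so drop them first.
    compact = statement.replace(" ", "").upper()

    # A DO loop needs a label, a variable, and two bounds separated by a comma.
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", compact)
    if loop:
        label, var, start, stop = loop.groups()
        return ("loop", label, var, start, stop)

    # Otherwise anything of the form NAME=VALUE is a plain assignment.
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", compact)
    if assign:
        return ("assignment", assign.group(1), assign.group(2))

    return ("unknown", compact)


print(classify("DO 10 I = 1,100"))
print(classify("DO 10 I = 1.100"))
```

Once the blanks are gone, the only thing distinguishing a loop header from an assignment to `DO10I` is the comma, so a one-character typo silently changes the kind of statement.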

The code in question controlled a rocket; as you can imagine, that kind of mistake could have been catastrophic!  Luckily, in this case, the error was caught in testing, and no spacecraft were harmed.

I think this shows rather well one of the dangers in allowing spaces in identifiers…

gidds
  • 4
    I would rather blame the language for automatically creating mistyped variables. Still a nice story. That , and . (both decimal separators depending on your locale) completely changed the meaning makes it even more 'funny'. Do you happen to have a source for this? – Ángel Nov 26 '19 at 02:11
  • @Angel The link I gave has a description, and also a link to the [RISKS Digest](http://catless.ncl.ac.uk/Risks/9.54.html#subj1) with an account from someone who worked at NASA at the time.  I first heard about it in the (fascinating and hilarious) book ‘Expert C Programming: Deep C Secrets’ by Peter van der Linden; pp.31–32 describe the incident, also citing the RISKS Digest.  It happened at NASA in Summer 1963, in a program calculating orbital trajectories for Mercury flights. – gidds Nov 26 '19 at 02:24
  • @Ángel that's just in this example. You can turn off implicit variable generation with `implicit none`, but I've dealt with a couple of similar errors in Fortran where this didn't help. – leftaroundabout Nov 26 '19 at 16:50
8

Is it a bad design for a programming language to allow spaces in identifiers?

You forgot important implementation details:

what is source code for you?

I like the FSF definition of it: the preferred form on which developers work. It is a social definition, not a technical one.

In some languages and their 1980s implementations (think of the original Smalltalk and 1980 Smalltalk machines), the source code was not a sequence of characters. It was an abstract syntax tree, manipulated by the user with the mouse and keyboard through some GUI.

In some sense, Common Lisp accepts spaces in its symbols.

You could decide (that is a lot of work) to co-design both your programming language (documented in some report giving both syntax and semantics), its implementation (as some software), and its editor or IDE (as some software).

Read old discussions on tunes.org. Read the old work at INRIA on

@TechReport{Jacobs:1992:Centaur,
 author =       {Jacobs, Ian and Rideau-Gallot, Laurence},
 title =        {a {\textsc{Centaur}} Tutorial},
 institution =  {\textsc{Inria} Sophia-Antipolis},
 year =         1992,
 number =       {RT-140},
 month =        {july},
 url =          {ftp://www.inria.fr/pub/rapports/RT-140.ps}
}

and

@techreport{donzeaugouge:inria-mentor,
 TITLE =        {{Programming environments based on structured
                 editors : the \textsc{Mentor} experience}},
 AUTHOR =       {Donzeau-Gouge, Véronique and Huet, Gérard and Lang,
                 Bernard and Kahn, Gilles},
 URL =          {https://hal.inria.fr/inria-00076535},
 TYPE =         {Research Report},
 NUMBER =       {RR-0026},
 INSTITUTION =  {{INRIA}},
 YEAR =         1980,
 PDF =
              {https://hal.inria.fr/inria-00076535/file/RR-0026.pdf},
 HAL_ID =       {inria-00076535},
 HAL_VERSION =  {v1},
}

See also my Bismon draft report and http://refpersys.org/

My RefPerSys dream is to co-design such a declarative programming language with a nice IDE for it. I do know it could take a decade. Feel free to think that we are crazy, in some sense we are!

From a usability point of view, syntax coloring and autocompletion are more important than spaces in identifiers (look into both GtkSourceView and CodeMirror for inspiration). Visually an underscore _ looks close to a space character. And if you code your own IDE, you might accept ctrl space as input for "spaces inside names". My opinion is that ℕ and ∀ should be "keywords"; the question then becomes how you type them. I am dreaming of typing (inspired by LaTeX) \ f o r a l l ESC to get a ∀ (and I heard of some emacs submode for that).

NB: I hate Python (and Makefile-s) because white spaces (or tabs) are significant there.

Basile Starynkevitch
  • 2
    Ctrl-space is sometimes used for a Unicode "Non-breaking Space" (nbsp), which is _the_ logical character for a space that doesn't break the identifier in two tokens. – MSalters Nov 26 '19 at 13:52
  • 1
    Unicode special symbols don't need to be a big deal for normal text-based languages. Haskell has the `-XUnicodeSyntax` extension, with it I write all the time stuff like `both :: (∀ x . x → x) → (a,b) → (a,b)`, or `type ℤ = Integer`. Much the same for Agda, except it uses Unicode much more agressively. — Such symbols can, as you say, be entered with Emacs LaTeX-mode, or – IMO much more convenient – [Vim digraphs](https://vimhelp.org/digraph.txt.html#Digraphs). – leftaroundabout Nov 26 '19 at 20:56
7

It is not inherently bad design to allow spaces in symbol names. This can be shown with a simple counter-example.

Kotlin allows spaces in names. It also has official coding conventions which state when it is ok to use this feature:

Names for test methods

In tests (and only in tests), it's acceptable to use method names with spaces enclosed in backticks.

Example:

class MyTestCase {
     @Test fun `ensure everything works`() { /*...*/ }
}

"Good" and "bad" are of course subjective, but using spaces in test method names makes the test code much nicer to read, and the test results nicer to read too, without the test author needing to repeat themselves by having an ugly method name and a separate human-readable test description.

The important point here is that these methods will not normally be called explicitly from code written by humans, so the only place where the name appears is at the method definition. I think this is an important distinction when considering where spaces might be a good idea in symbol names: only when the symbol is written just once by the programmer.
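On the implementation side, a delimited name costs the lexer about as much as a string literal does. Here is a minimal sketch in Python, under my own assumptions: the names `TOKEN` and `tokenize` and the tiny token grammar are invented for illustration, and this is not how Kotlin's lexer is actually written.

```python
import re

# Hypothetical token grammar: backtick-quoted names, integers,
# ordinary names, and single punctuation characters.
TOKEN = re.compile(r"""
    `(?P<quoted>[^`\n]+)`        # name delimited by backticks, may hold spaces
  | (?P<number>\d+)              # integer literal
  | (?P<plain>[A-Za-z_]\w*)      # ordinary identifier
  | (?P<punct>[^\s\w`])          # any other single non-space character
""", re.VERBOSE)

def tokenize(source):
    """Return a list of (kind, text) pairs; whitespace between tokens is skipped."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            # Also reached for a backtick that is never closed on its line.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

for token in tokenize("fun `ensure everything works`() {}"):
    print(token)
```

The quoted name comes out as a single token, spaces included, so the rest of the parser never needs to know that the name was written with delimiters.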

hyde
  • 3,744
  • 4
  • 25
  • 35
3

Rule of thumb:

Errors are proportional to the time it takes to read code out loud.

Anything that increases the number of open bracket, close bracket, open curly brace, close curly brace, open parenthesis, close parenthesis... will increase the number of errors in the code.

This is one reason why * is star or splat, and not asterisk. # is shhh, ! is bang. I suspect mathematicians have short verbal expressions for their symbols too.

It's why tech fields fill with acronyms and abbreviations: We think in words. We have a finite attention span, and can hold only so many symbols in our head. So we group and lump things together.

ReallyReallyLongIdentifier can do the same thing. There the tradeoff is between remembering what it's for, and getting tangled up in our thought processes. But ReallyReallyLongIdentifier is still better than QzslkjfZslk19.

The further away from its creation it's used, the more it needs to be memorable. Thus, i, j, k are used for loop constructs: like mayflies they live for the life of a loop, and that loop starts and ends on the same screen.

This extends to coding too:

A=FunctionAlpha(21,$C,$Q)

B=FunctionBeta($A,$D,$R)

is cleaner than

B=FunctionBeta(FunctionAlpha(21,$C,$Q),$D,$R)

I think this is one reason why spreadsheets have such abysmal error rates and bad coding: except by adding temporary cells/rows/columns, there is no way to avoid messy nested statements.

  • One thing I really dislike about adding Unicode characters to identifiers is that it can make it impossible to verbalize code. Many Unicode characters look almost indistinguishable, and someone trying to read code aloud would be hard-pressed to reliably describe them in a manner that was concise yet unambiguous (or even correct!). – supercat Nov 26 '19 at 19:37
  • 2
    Though you make a good point, reducing the amount of "punctuation ceremony" in the code also has undesirable effects. For example, consider languages in the CAML family. In those languages *so many things* are legal expressions, and there is so *little* punctuation, that it is easy to get into a situation where you've got something syntactically invalid, but it is hard for either humans or machines to read the code and guess where the real problem is, because so much of the code looks right. – Eric Lippert Nov 26 '19 at 22:15
  • Contrast that with curly-brace languages; if we see `class A { void M() { class B{}` then the compiler can make a good guess that there is a close curly missing before `class B` and help you out. You are right to call out that code is "clumpy"; by using punctuation to clearly delimit those clumps we make error detection easier for both humans and compilers. – Eric Lippert Nov 26 '19 at 22:19
0

It took me a LONG time to truly grok that there will never be a single best language. For a programming team, the most important aspects are that the language is well known, is supported by many tools, has a minimal syntax, and surprises you as rarely as possible.

For a single coder a powerful language that allows quick test/run cycles is great.

For an admin a language tailored to the operating system's shell language is critical.

For some working languages shared among disciplines, DSLs can be nice.

Is there a place for a language with spaces? Probably. It violates the no-surprises rule but fits in very well with the DSL goals.

One thing I don't think anyone mentioned though--with a custom IDE you could actually have a hard space and a soft space. They would look similar (perhaps have different shades in the IDE).

For that matter, you can do it right now with any language: just put a toggle in your IDE to have underscores displayed as spaces. Anyone who writes Eclipse plugins could probably do this in an hour.

It's also possible to programmatically convert camel case to "words with spaces". Your IDE could do that for you, but it would be slightly weirder.
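Both display tricks are only a few lines each. A rough sketch in Python of the text transformation alone (hooking it into an editor is a separate job; the function names are made up for illustration):

```python
import re

def underscores_as_spaces(identifier):
    """Display form for snake_case: show each underscore as a space."""
    return identifier.replace("_", " ")

def camel_case_as_words(identifier):
    """Display form for camelCase or PascalCase names."""
    # Insert a space at each lowercase-or-digit to uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    # Split an acronym run from the word after it: "HTTPServer" -> "HTTP Server".
    return re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)

print(underscores_as_spaces("really_long_name"))
print(camel_case_as_words("ReallyReallyLongIdentifier"))
```

The underscore version is trivially reversible as long as real spaces cannot occur in names. The camel case version is where the weirdness shows up: once acronyms are involved, the displayed words no longer tell you unambiguously how the name is spelled in the file.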

Bill K
  • 2,699
  • 18
  • 18