14

In Python and JavaScript, semi-colons are optional.

In PHP, quotes around array-keys are optional ($_GET[key] vs $_GET['key']), although if you omit them it will first look for a constant by that name. It also allows 2 different styles for blocks (colon, or brace delimited).

I'm creating a programming language now, and I'm trying to decide how strict I should make it. There are a lot of cases where extra characters aren't really necessary and can be unambiguously interpreted due to priorities, but I'm wondering if I should still enforce them or not to encourage consistency.

What do you think?


Okay, my language isn't so much a programming language as a fancy templating language. Kind of a cross between Haml and Django templates. Meant to be used with my C# web framework, and supposed to be very extensible.

mpen
  • What are good programming habits to you? – LennyProgrammers Feb 19 '11 at 08:46
  • Who's your target audience? "Programmers" are different too, you know. – etranger Feb 19 '11 at 09:41
  • 22
    This is a topic for a holy war. –  Feb 19 '11 at 11:35
  • 1
    Pythonistas discourage the use of semicolons. Frankly, I am not sure if they are needed at all - only to allow multiple statements per line, which can be totally avoided without suffering. So ... I am in favor of stricter languages. However, sometimes you can enforce things outside of a language with code analysis tools, such as StyleCop. – Job Feb 19 '11 at 17:21
  • @Lenny222: I guess I should have said "consistency" instead. Will edit. – mpen Feb 19 '11 at 18:16
  • 1
    The quotes for PHP array keys are not optional. They were in PHP2, but later versions auto-defined constants then. They are *disallowed* in basic string interpolation `"..$_GET[raw].."` however. – mario Feb 19 '11 at 19:42
  • Just a little note: I believe the alternate style for blocks (with colon) is not an example of looseness, but is actually needed for readability when using control structures in HTML views. – Matteo Riva Feb 20 '11 at 00:21
  • Your first sentence is an apples-to-oranges comparison. In Python, the normal statement terminator is a newline, and semicolons separate multiple statements on a line. In JavaScript, the semicolon is the normal statement terminator, which is optional in some cases. – dan04 Feb 20 '11 at 13:32
  • @dan04: I wasn't comparing JS to Python, I was just pointing out that they aren't necessary at the end of every statement, as an example of "looseness" in programming languages. – mpen Feb 20 '11 at 17:04
  • @kemp: I wouldn't say it's needed. You could always use the regular `{}` syntax, and just put a comment beside the closing `}` to let other developers know *what* it's closing. – mpen Feb 20 '11 at 17:06
  • @mario: I mean when used in strings, like `"$_GET[raw]"` no? I seem to remember that being the case. – mpen Feb 20 '11 at 17:10
  • 1
    @Ralph: the rules are a bit more complicated. It's correct to write `"xx$_GET[raw]xx"` -- If you start to use curly braces, then the key must be enclosed `"xx{$_GET['raw']}xx"` in quotes. If curly braces are used then the ordinary PHP parser checks it and the stringent syntax applies. It's just for `"$_GET[x]"` that the key is treated as raw string, and that's also a strict rule, PHP would parse error on `"$_GET['x']"`. – mario Feb 20 '11 at 17:14
  • @mario: and in `"$_GET[x]"` will `x` always be treated as a string key, or will it look for a constant named `x` first? – mpen Feb 20 '11 at 18:01
  • @Ralph: In this context it will always interpret it as final string, and neither lookup nor define a constant. (But that's just backwards-compatible syntax, not that relevant for your actual topic.) – mario Feb 20 '11 at 18:30
  • @mario: No... none of this was relevant. Btw, from the docs "This is wrong, but it works. The reason is that this code has an undefined constant (bar) rather than a string ('bar' - notice the quotes). PHP may in future define constants which, unfortunately for such code, have the same name. It works because PHP automatically converts a bare string (an unquoted string which does not correspond to any known symbol) into a string which contains the bare string. For instance, if there is no defined constant named bar, then PHP will substitute in the string 'bar' and use that." – mpen Feb 20 '11 at 19:39
  • @mario: So, like I said in the first place... if the constant doesn't exist, it'll treat it like a string. – mpen Feb 20 '11 at 19:40
  • @Ralph: That part of the docs refers to normal PHP. As I said there is a [different handling](http://stackoverflow.com/questions/2405482/is-it-okay-to-use-arraykey-in-php/5059548#5059548) of arrays in PHP and of arrays in strings. See the example below. (And if the constant doesn't exist, it will be auto-defined, and then do a redundant lookup - for PHP code.) – mario Feb 20 '11 at 19:44
  • 2
    @mario: The very fact that we're even having this conversation means there's some ambiguity and confusion in the way array keys are handled. It seems inside strings, it's unambiguous but inconsistent (you can't use quotes when already in a string, unless you use curly braces, then you have to, but outside you should). And outside strings, in "normal" PHP...well, it does weird crap like that. – mpen Feb 20 '11 at 19:49
  • @Ralph: So the ambiguity in this specific syntax idiosyncrasy isn't the actual problem. It's that one variant nowadays can have unexpected side-effects. The problem for PHP thus is that this loose construct was valid once, and later restriction of that syntax led to problems. Maybe that's already a relevant consideration for your intended templating syntax. – mario Feb 20 '11 at 20:15
  • In regards to optional semicolons, check out this jsfiddle using a language where semicolons are optional: https://jsfiddle.net/67ptmd3u/ – Matthew Aug 31 '17 at 14:40
  • @Matthew The best part is that deleting the semi-colons doesn't solve the problem either. You're hooped either way. The fact that they're optional means we all suffer :-) – mpen Aug 31 '17 at 21:02

11 Answers

26

What I look for in a programming language (as opposed to a scripting language) is consistency and strong typing.

In current programming languages it is possible to omit the semicolon in certain places without creating ambiguity (after the last expression in a {} block, for instance). If a programming language allows you to omit characters in these cases, a programmer now has an extra problem: on top of the general language syntax, she also has to know in which cases parts of the syntax may be omitted.

This extra knowledge is no problem for the programmer writing the code, but it becomes a burden to anyone who has to interpret existing code at a later point in time (including the original author after a while).

Your PHP example opens the door to subtle bugs if a constant named key is later defined in the same context. The compiler has no way of knowing that this is not what was meant, so the problem only becomes apparent at runtime instead of at compile time.

Desolate Planet
rsp
  • 1
    Agreed, you should limit the possibilities for developers: more possibilities => more need to think (should I proceed this way or that way?) => less time to do actual work – Kemoda Dec 21 '11 at 10:39
  • I fail to see what the lack of implicit type casting has to do with the syntax of a language. – dan_waterworth Jul 20 '12 at 16:57
  • 5
    Also, when you read `$_GET[key]` you know nothing. You end up grepping the whole project just to know whether `key` is a constant or not. Such thing saves 0.5 seconds of writing and takes 20 seconds to read. – Zippo Jul 20 '12 at 21:37
  • If your language gives you options without distinction, the coding-style - whether codified or not - tends to standardize on one of them... – Deduplicator Aug 31 '17 at 16:00
19

Different types of languages have different uses, so the answer to this question really depends on what you're going to be using it for.

For example, Perl is a very loose language, and I find it very useful for writing quick fixup or number crunching scripts. For solid robust projects I use C#.

You need to get the balance right for the target usage. The stricter the language, the longer you need to spend writing the code, but you gain robustness, reusability, and maintainability.

BG100
11

Every place where there's some ambiguity, the compiler needs to have some way to guess what the programmer really meant. Every time this happens, there's the chance that the programmer really meant something different, but didn't have the ambiguity-resolution rule down.

Writing logically correct code is hard enough already. Adding syntactical ambiguities may seem "friendly" on the surface, but it's an open invitation to introduce new, unexpected, hard-to-debug bugs into the codebase. Bottom line, be as strict as possible.

In your example, you mentioned that semicolons are optional in Python and JavaScript. For JavaScript, at least, this is not entirely true. Semicolons are just as required in JS as they are in any other C-family language. But the JavaScript parser is required by the language specification to insert missing semicolons under certain circumstances. This is widely regarded as a very bad thing because it tends to get your intentions wrong and screw up your code.

Mason Wheeler
6

The answer to how loose you should make your language is the same as the answer to the question, asked in a Texas accent, "How lucky do you feel, punk?".

Henrik
  • I don't get it. – mpen Feb 19 '11 at 22:18
  • 4
    My bad attempt at a joke was that dynamic typing might bite you as the systems grow bigger and bigger, especially when adding inexperienced developers to the mix. In my experience systems of any value tend to grow bigger and bigger and have an increasing number of developers developing them. Having "Find all usages of symbol" or "Rename all" or "Safe Delete" or "Find errors in solution" then, is absolutely invaluable. Dynamic typing in the limited sense that VB is late-bound and does extensive type-coercion has caused _many_ bugs at my current gig. – Henrik Feb 19 '11 at 23:02
  • Ergo, if you feel lucky about your project, for example lucky to have good and experienced devs, or lucky in terms of writing correct code; you may use dynamic typing. – Henrik Feb 19 '11 at 23:05
  • 2
    Ah... but this question was never really about dynamic typing :) – mpen Feb 20 '11 at 17:08
  • 1
    Ah, very true Ralph. I just tend to think of dynamic languages as more loose as they usually are more loose. You are right though. – Henrik Feb 20 '11 at 17:10
  • A question: would you call type-inferred lambda calculus loose or strict if it always chooses the most general types? – Henrik Feb 20 '11 at 17:12
  • If it chose the most general types? Wouldn't that always be "object" (assuming they all derive from that)? That would essentially be loose I think, but why would you want that? – mpen May 29 '12 at 15:42
  • It's not necessary to have object be a root of everything. You can have many fine types without having a hierarchy in every object. You could e.g. choose a type-class that has the largest number of methods attached to it based on type-inference. – Henrik May 29 '12 at 19:33
  • Like Traits, you mean? I think it should be obvious to the developer what the compiler will pick, otherwise the code becomes hard to read. If there isn't an obvious choice, force the developer to be explicit. – mpen May 29 '12 at 20:12
  • +1 for the Dirty Harry reference. Some people push their luck when it comes to good programming manners and style. Sadly the ones who have to maintain the code are usually the unlucky ones. – Tulains Córdova Oct 30 '12 at 23:54
4

Nobody would have to work so hard for coding consistency if the languages didn't have so much variation. We don't like it when users make requests that unnecessarily increase complexity, so why should we ask that of our development languages?

JeffO
  • +1: I totally agree. I do not see why principles like KISS and YAGNI should not apply to language design. – Giorgio Aug 31 '17 at 21:54
2

My personal preference is for the ability to have just enough strictness to catch my typos, but with as little extra boilerplate as possible. I talk about this issue at http://www.perlmonks.org/?node_id=755790.

That said, you're designing your language for yourself. You should make it be whatever you want it to be.

btilly
  • +1: *...The ability to have just enough strictness to catch my typos, but with as little extra boilerplate as possible.* - Yes. Are you familiar with Anders Hejlsberg's plan for C#? He's making a conscious decision to emphasize "essence over ceremony". http://channel9.msdn.com/Blogs/matthijs/C-40-and-beyond-by-Anders-Hejlsberg – Jim G. Feb 19 '11 at 15:59
  • @jim-g: Thanks for the thought. I am not familiar with much of anything about C#. I have not worked in the Microsoft world for many, many years. – btilly Feb 19 '11 at 16:26
1

I would suggest that a good programming language should have strict rules, which implementations would be expected to enforce consistently, but the rules should be written in such a fashion as to be helpful. I would further suggest that one should consider designing a language to avoid cases where the "Hamming distance" between two substantially different programs is only one. Obviously one can't achieve such a thing with numeric or string literals (if a programmer who meant 123 instead types 1223 or 13, the compiler can't very well know what the programmer meant). On the other hand, if a language were to use := for assignment and == for equality comparison, and not use a single = for any legal purpose, then it would greatly reduce the possibilities both for accidental assignments which were supposed to be comparisons, and accidental do-nothing comparisons which were supposed to be assignments.
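
As a rough illustration of the one-character distance described above, here is a minimal Python sketch (Python is only a stand-in here, and the function names are made up for the example). Python accepts a bare comparison as a statement, so dropping or adding a single `=` silently changes what the program does:

```python
# Hypothetical example: two functions that differ by a single character.

def reset_with_assignment(counter):
    counter = 0   # assignment: rebinds counter to 0
    return counter


def reset_with_typo(counter):
    counter == 0  # comparison: evaluated, result discarded, counter unchanged
    return counter


print(reset_with_assignment(5))  # 0
print(reset_with_typo(5))        # 5
```

Both versions are syntactically valid, which is exactly the kind of situation a stricter choice of operators would rule out.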

I would suggest that while there are places where it is useful for compilers to infer things, such inference is often most valuable in the simplest cases, and less valuable in the more complicated cases. For example, allowing the replacement of:

  Dictionary<complicatedType1, complicatedType2> item =
    new Dictionary<complicatedType1, complicatedType2>();

with

  var item = new Dictionary<complicatedType1, complicatedType2>();

does not require any complicated type inference, but makes the code vastly more readable (among other things, using the more verbose syntax only in scenarios where it's needed, e.g. because the type of the storage location doesn't precisely match the type of the expression creating it, will help call extra attention to places that may require it).

One major difficulty of attempting more sophisticated type inference is that ambiguous situations may arise; I would suggest that a good language should allow a programmer to include information the compiler could use to either resolve such ambiguities (e.g. by regarding some typecasts as preferable to others), determine that they don't matter (e.g. because even though two possible overloads may execute different code, the programmer has indicated that they should behave identically in those cases where either could be used), or flag those (and only those) which cannot be handled in either of the above ways.

supercat
1

To me, readability is most important.

To someone experienced with the language, a code fragment's meaning should be clear without having to analyze the context deeply.

The language should be able to flag mistakes as often as possible. If every random sequence of characters makes a syntactically correct program, that's not helpful. And if variables are automatically created the first time they are used, then misspelling client as cleint will not give you a compile error.
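
A small Python sketch of the misspelling problem, assuming a made-up `total_owed` function: because assignment creates a variable on first use, the typo below is accepted without complaint and the function simply returns the wrong result.

```python
# Hypothetical example: a misspelled name silently becomes a new variable.

def total_owed(invoices):
    client_total = 0.0
    for amount in invoices:
        # Typo: "cleint_total" is created as a fresh local variable,
        # so client_total is never updated.
        cleint_total = client_total + amount
    return client_total


print(total_owed([10.0, 20.0]))  # 0.0, not 30.0
```

In a language that requires variables to be declared before use, the same typo would be rejected before the program ever ran.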

Besides the syntax, the language should have a clearly-defined semantics, and maybe that's even harder than deciding on a decent syntax...

Good examples:

  • In Java, "1" is a string, 1 is an int, 1.0 is a double, and 1L is a long. One look and you know what it is.

  • In Java, = is the assignment. It assigns the value for primitive types and the reference for reference types. It never copies complex data or compares.

  • In Java, calling a method needs parentheses and this way is clearly distinguished from variables - so, if there are no parentheses, you don't need to search for a method definition, it's just reading data.

Bad examples:

  • In Java, a symbol like client can be nearly anything: a package path element, a class or interface name, an inner class name, a field name, a method name, a local variable, and even more. It's up to the user to introduce or obey naming conventions or not.

  • In Java, the dot . is over-used. It can be a separator within the package name, a separator between package and class, a separator between outer and inner class, a connector between an instance expression and the method to be invoked on the instance, and many more.

  • In many languages, the curly braces of the if blocks are optional, leading to nasty mistakes if someone adds one more statement to the (not really existing) block.

  • Infix operators: sometimes I have to stop at a numerical expression and think hard about what it means, step by step. We are all used to writing math expressions in infix notation like a * b / c * d + e. Most of the time we remember the precedence of multiplication and division over addition and subtraction (but did you realize that we're not dividing by c*d, but dividing only by c and then multiplying by d?). But there are so many additional infix operators with their own precedence rules, and in some languages overloading, that it's hard to keep track. Maybe enforcing the use of parentheses would have been a better approach...
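
The a * b / c * d + e case can be checked with a short Python sketch (the values are arbitrary, chosen so the arithmetic is exact). Python, like most C-family languages, evaluates * and / left to right at the same precedence level:

```python
# Arbitrary sample values, picked so every intermediate result is exact.
a, b, c, d, e = 8.0, 3.0, 4.0, 2.0, 1.0

implicit = a * b / c * d + e          # what the language actually computes
explicit = (((a * b) / c) * d) + e    # the same grouping, spelled out
misread = (a * b) / (c * d) + e       # the "divide by c*d" reading

print(implicit)  # 13.0
print(explicit)  # 13.0
print(misread)   # 4.0
```

The fully parenthesized form costs a few characters but leaves nothing for the reader to reconstruct.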

Ralf Kleberhoff
  • You've mostly talked about ambiguity, but there can be multiple ways of doing the same thing without creating ambiguity. Maybe we can have two multiplication operators, `*` and `×`. Both `5*3` and `5×3` mean the same thing, and an experienced programmer knows exactly what they mean without having to look around at surrounding context. The problem, however, is that there are now two ways of doing the same thing and someone might swap between them throughout the program. I believe this is what I was more concerned about when I asked the question. – mpen Aug 31 '17 at 20:59
1

I generally tend to fall on the side of "What would make it easier for me as a programmer". Of course that can mean more than one thing. In JavaScript there is almost no type checking, which works great until you hit a weird bug. On the other hand, in Haskell there is a lot of type checking, which puts more of the work up front but clobbers some classes of bugs.

To be honest I would check out a bunch of languages to see what they do and try to find a niche that none of them hit!

I don't think there is one obvious right way to do it, or at least if there is, it's not something people have found a consensus on yet. So by creating languages with different type systems we are learning.

Good luck.

Zachary K
1

I like my languages to do what I mean. Generally that leans pretty hard towards loose. I also would like to be able to tag "strict" on an element or block to be able to debug/analyze that limited area.

Paul Nathan
-2

You might consider an analogy with natural language. In email, are you a Grammar Nazi? Or are you okay with some grammatical errors, such as split infinitives, missing conjunctions, or misplaced modifiers? The answer boils down to personal preference.

emallove