4

I design and implement languages, ranging from object notations to markup languages. In many cases I have considered adding restrictions for the sake of sanity, such as disallowing control characters in identifiers. There are two consequences to consider before doing this:

  • It takes extra computation to enforce
  • It narrows the user's freedom

I'm interested in how developers think about decisions like this. As you may know, Microsoft's C# is very permissive by contrast. If you really want to suffix your integer literal with 'l' instead of 'L' to mark it as a long, and thereby risk other developers confusing '1' and 'l', no problem. If you want to name your variables in a non-Latin script so that they clash with C#'s Latin keywords, no problem. Or if you want to spread a string over multiple lines and break the surrounding indentation, no problem.
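
Here is a minimal C# sketch of the three permissive cases above; the identifiers and the string content are my own, invented purely for illustration:

long total = 1l;                 // lowercase 'l' suffix: legal, but easy to confuse with the digit '1'
int переменная = 42;             // non-Latin identifier: legal, and visually at odds with the Latin keywords
string query = @"SELECT *
FROM Orders";                    // verbatim string spanning lines, breaking the indentation of the block

All three are accepted by the compiler; whether a language should accept them is exactly what I'm asking about.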

Restrictions are a cheap way to enforce consistency, which makes them tempting to implement. But disallowing non-Latin characters (the second example) feels like a disservice to Unicode, because the language would no longer take full advantage of its capacity.

toplel32
  • 237
  • 1
  • 7
  • Rule of thumb: Make the right things easy to achieve and the dangerous ones hard to achieve. If it is the other way round, people will happily do the dangerous things, most of them without really knowing what they are doing. If you need a proof, look at PHP. – JensG Jun 02 '14 at 09:54
  • I'd not use a language that tries to babysit my naming practices, it's guaranteed to do a poor job at it. – jwenting Jun 05 '14 at 04:47

5 Answers

10

No matter how restrictive a language is, some programmers will still find ways to screw things up. Imagine the following piece of code:

var price = this.ApplyVAT(basePrice);
if (this.acceptsRebate)
{
    return new RebateCalculator(RebateType.Default).Transform(price);
}

return price;

Let's screw it up:

// Rebats calculator became variable cal.
var cal = new RebateCalculator(RebateType.Default);

// stylecop complain if i dont put comment here and leave line empty. don't know why??
//
// don't know price now so later maybe!!!
Price pp = null;

// Func is transforms price to price!
Func<Price, Price> tar = cal.Transform;
var g = this.ApplyVAT(basePrice);

// changed by MAG 02/08/2013
// if acceptsRebate is true, the condition should match.....
if (this.acceptsRebate)
{
    // assign g to g2
    var g2 = g;

    // don't remember what tar mean but its very important!!! don't remove!!!
    pp = tar(g);
}
else
{
    // it's the same!
    pp = g;
}

// i can return now because i know price!!!
return pp;

The code still compiles, does exactly the same thing, and doesn't violate any StyleCop or Code Analysis rules (and StyleCop is damn strict, forcing you to write code in a very precise way). But it's now a piece of crap, completely impossible to work with. How would you possibly design a language which tells you that those comments are useless and annoying, that my naming is among the poorest possible, and that the intermediary variables add more harm than good?

Using Unicode in names is a good example. In the code above, I haven't even used Unicode in the names of variables. [A-Za-z0-9_] is enough to give meaningless names. On the other hand, Unicode may be very helpful in improving code. This piece of code:

const numeric Pi = 3.1415926535897932384626433832795;
numeric firstAlpha = deltaY / deltaX + Pi;
numeric secondAlpha = this.Compute(firstAlpha);
Assert.Equals(math.Infinity, secondAlpha);

is readable enough, but allowing Unicode in names empowers the programmer to write this instead (the downside being that you won't be able to type the variable names on a non-Greek keyboard):

const numeric π = 3.1415926535897932384626433832795;
numeric α₁ = Δy / Δx + π;
numeric α₂ = this.Compute(α₁);
Assert.Equals(math.∞, α₂);

Some developers will consider that the second variant pushes naming too far, but it may be an excellent readability boost for some projects (such as a project written by a team of scientists).

Speaking of naming conventions, I haven't seen a single rule which actually prevents giving bad names to variables. One might try to prevent an example such as the code above with a rule similar to:

The names of private variables should contain at least four characters. The names of public variables should contain at least six characters.

But this won't prevent renaming pp into ppppppp just to get rid of the warning. An additional rule:

The names can't contain a character consecutively repeated more than two times. For example, feed is a valid name, while feeed is not.

may also be circumvented by renaming the variable to ppabcde.
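
To make the circumvention concrete, here is a small sketch of those two rules as plain C# predicates; the helper names and the regular expression are mine, not taken from any real analyzer:

using System;
using System.Text.RegularExpressions;

// Hypothetical naming checks written as simple predicates.
static bool LongEnough(string name) => name.Length >= 4;                          // "at least four characters"
static bool NoTripledCharacter(string name) => !Regex.IsMatch(name, @"(.)\1\1");  // no character repeated more than twice in a row

foreach (var name in new[] { "pp", "ppppppp", "ppabcde" })
    Console.WriteLine($"{name}: length ok = {LongEnough(name)}, repetition ok = {NoTripledCharacter(name)}");

// "pp" fails the first rule, "ppppppp" fails the second,
// and "ppabcde" satisfies both while remaining just as meaningless.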

The actual goal

What's more important is that a well-designed programming language should help you avoid errors. For example:

if (this.isEnabled)
    this.PlayVideo();
    this.PlaySubtitles();

is a sign that something went wrong. When I see that in C# code, I blame:

  • The programmer, because it's so simple to always insert curly brackets (see the braced version after this list) instead of risking thousands of dollars on a stupid bug that is hard to detect during code reviews and hard to find later,

  • The programmer, again, because he should have used proper tools (StyleCop in this case),

  • The language, because such syntax could have been forbidden,

  • The IDE, because it didn't react to the fact that the programmer screwed up the indentation.
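
For comparison, here is the braces-always version of the same snippet (assuming the indentation reflected the actual intent); the subtitles call can no longer silently escape the condition:

if (this.isEnabled)
{
    this.PlayVideo();
    this.PlaySubtitles();
}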

Conclusion

  • Be restrictive when it helps avoid bugs and when the syntactic flexibility is not particularly useful.

  • Don't be restrictive just to help beginners: they'll find ways to make their code unreadable anyway.

  • Add syntactic sugar when it improves readability most of the time and is unlikely to be widely misused.

Example of Python

If you know Python, it is a good example of useful strictness. People who have just started learning it are usually put off by how strict it is:

What? I can't indent the code like I want? This is annoying!

or:

There are no switch statements in Python? Are you serious?! Are you telling me that I have to use if-elif-else every time I need a switch?

A few years later, the same person will confirm that:

  • Having significant indentation is an excellent idea, because it prevents subtle errors (like the one with the if above) and reduces the effort of writing code (instead of managing both semicolons and indentation, you deal only with indentation).

  • They don't need a switch, because they replace conditionals with polymorphism anyway, and when they actually need a switch-like mechanism, a dictionary is a perfect choice.

Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
  • 1
    This isn't really what I meant. I was pointing at the finer details of a language, like valid codepoints. – toplel32 Jun 01 '14 at 14:23
  • @toplel32: I'm not sure I understand. This includes style, right? I also emphasized Unicode-related stuff in one of my latest edits, BTW. – Arseni Mourzenko Jun 01 '14 at 14:31
  • Yes you're getting closer with the last addition. – toplel32 Jun 01 '14 at 14:37
  • @toplel32: good. Let me know if there are ways to improve my answer. (Maybe you could give examples of the fine details of a language you're talking about?) – Arseni Mourzenko Jun 01 '14 at 14:39
  • I need more focus on the subject of identifiers. For example you mention the common word characters "A-Za-z0-9_" and personally I wish to exile any other characters completely. But a more exotic programmer could be a little upset to learn that his language has not been qualified to name stuff in a familiar way. Therefore Unicode Standard Annex 31, which is way too open imo. – toplel32 Jun 01 '14 at 14:45
  • "They don't need a switch, because they replace conditional by polymorphism anyway": good point, this is the object-oriented approach. `switch` is the procedural counterpart. – Giorgio Jun 01 '14 at 14:46
  • @toplel32: well, speaking about identifiers, as I've shown, Unicode may improve readability in hands of a skillful developer, while a non-skillful one can name variables `a`, `b`, etc. If forced to have a minimum length, this may transform into `aaaaa`, `bbbbb`. – Arseni Mourzenko Jun 01 '14 at 14:48
  • @MainMa: What would you think of a rule that required a program to include either directly or by reference a list of all characters that may be used in identifiers? I would think that would help to catch at least some problems involving normalized forms of combining diacritics (from what I understand, there's no way a compiler written today can know whether a presently-undefined code point will in future be defined as a combination of combining marks; if code written using non-combined characters is modified with an editor using the combined form, who knows what havoc may ensue?) – supercat Jun 03 '14 at 21:38
  • @supercat: would there be a problem if both the official IDE and the compiler use the same unicode normalization form? – Arseni Mourzenko Jun 03 '14 at 21:44
  • @MainMa: What happens if code is given to someone who's using an older compiler version? Actually, what I might like to see would be a means by which programs could specify a "transliteration" table that would map each character into a sequence of [0-9A-Z_], optionally enclosed by «», with the rule that identifiers could not be used in any scope containing a different identifier with the same transliterated form. If hovering over an identifier would show the transliterated form, that would either make it easier to distinguish "РЕСТОРАН" from "PECTOPAH"... – supercat Jun 03 '14 at 22:16
  • ...or ensure that code which used one where the other was needed would fail compilation [if the transliterated form of the Cyrillic characters matched their look-alikes]. – supercat Jun 03 '14 at 22:18
  • @supercat: *"What happens if code is given to someone who's using an older compiler version?"*: I don't see how is this related to the current subject. By the way, what happens if I give C# code which uses `async`, `dynamic` and anonymous types to a person who only has support for .NET Framework 1.1? – Arseni Mourzenko Jun 03 '14 at 22:33
  • @MainMa: Code which uses features unsupported in older languages naturally wouldn't work. On the other hand, it's pretty easy for someone to know which features are only available in the latest-and-greatest C#, and which ones will be usable in older versions. Can any person of normal mental capacity be expected to know which Unicode characters are going to be available to which potential recipients of his code? – supercat Jun 03 '14 at 22:45
1

Don't go out of your way to prevent bad code. You'll never cover any sizable subset of all the ways to write bad code, and every additional rule costs something (time, money, mental capacity, opportunity, complexity, etc.). One can write FORTRAN in any language; don't try to fight it.

Instead, ask positive questions: How much do I, and the user, gain by restricting the language like this? Does it make the parser simpler and faster? Does it really simplify the language? Does it make it easier to write good code (as opposed to harder to write bad code)? Does it have synergies with other features that add entirely new benefits? Weigh those benefits against the aforementioned costs.

0

The primary goals in a language should be:

  1. Allow a programmer to actually express what he wants. For example, rather than having a small number of integer types, allow a programmer to specify which characteristics are important, including representable range, wrapping behavior, overflow trapping, endianness, etc. Don't force programmers to specify properties which aren't of interest, but allow specification of whatever combination of properties is important. For floating-point types, allow code some control over precision/speed trade-offs. Allow types to propagate into expressions, so that e.g. long1 = int1 + int2; will perform the arithmetic as long. (*)

  2. Require "yes that's what I mean" markers in places where there's more than one thing a programmer might plausibly have intended, but not in cases where there the straightforward meaning is the only one that makes sense. An initialization like double someDouble = 0.1f; might match programmer intention, but is more likely a mistake; it would be appropriate to require something like double someDouble = FloatToDouble(0.1f;. On the other hand, float someFloat = 0.1; would only have one plausible meaning, and should thus store the fraction 1/10 into someFloat as accurately as possible.

  3. Allow programmers to specify places when implicit conversions should be applied more strictly than usual, and places where they should be applied more loosely. For example, if a language supports auto-boxing, allow a means by which a method parameter can say that it should not accept an implicitly-auto-boxed object.

  4. When deciding what kinds of implicit constructs should or should not be legal, think about the effects of changes to parts of the code. For example, if code is required to say someField = (float)Math.Sqrt(something) rather than simply someField = Math.Sqrt(something) when someField is a float, how will that affect the effort required to change someField into a double if float precision is found to be inadequate?

(*) The normal argument I've read against allowing types to propagate into expressions is that it leads to ambiguous situations involving operator overloads. To that I would suggest allowing methods to either specify which overloads to favor in cases of ambiguity, or require that the invoking code make clear what it wants. Further, given an integer expression like (a*b+c)/d, I would suggest that requiring an indication of how the dividend should be computed would make programmer intentions clearer than would applying a default behavior.
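
To illustrate point 2, here is a small C# sketch of the proposed marker. FloatToDouble is a hypothetical helper standing in for a language-level rule; today's C# performs the float-to-double widening implicitly, so the helper only shows the shape of the idea:

using System;

// Hypothetical helper standing in for a language-level "yes, that's what I mean" marker.
static double FloatToDouble(float value) => value;

// The unambiguous case: store 1/10 as accurately as a float allows.
float someFloat = 0.1f;

// The suspicious case: the rounding error of the float literal would be
// silently widened into a double.
// double suspicious = 0.1f;

// The explicit marker proposed above makes the intent visible at the call site:
double someDouble = FloatToDouble(someFloat);
Console.WriteLine(someDouble);   // prints the widened float value, visibly different from 0.1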

supercat
  • 8,335
  • 22
  • 28
0

It depends entirely on the philosophy of the language.

C allows you to deliberately shoot yourself in the foot, in order to obtain the best possible performance. C++ allows you to blow your leg off. Java is a bondage and discipline language, preferring ceremony and safety over performance. Haskell's type system emphasizes mathematical proof of consistency.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
-1

I don't think you can make a language "idiot proof". At most, you can make it "mistake proof" for those who have good intentions.

So I wouldn't invest computer time, developer frustration (at compile time), or user frustration (at runtime) in checking every little detail. Of course errors will be made, and of course bugs will be shipped, but is it worth restricting everything and everyone in the hope of preventing, with 100% certainty, fools from being fools or humans from making mistakes?

  • 1
    C/C++ allows for a bunch of foolish constructs without complaining; most compilers eat them without even a warning. Unfortunately, in most cases the effect is an undesired one. The classic surely is `if(c = 0) { DoSomething(); }`. I prefer an environment/language/whatever that does not necessarily forbid these things, but makes me aware of them, because in most cases they are not intended, they are just simple foolish bugs that happen. Finding those things early makes you more productive. – JensG Jun 02 '14 at 09:50
  • @JensG: You may be right when you are talking about beginning programmers. From my side, I develop C, C++, C#, Objective C, and others in that range, and I can't remember when I last made the error 'if(c = 0) { }'. Everyone has to learn, and the toughest things to learn are very often the most performant once you know them. – Painted Black Jun 02 '14 at 15:36
  • 1
    If the OP is the only user there would be no need for this question at all. So I think we can assume he is not. – JensG Jun 02 '14 at 18:33