Is Hungarian notation a workaround for languages with insufficiently-expressive static typing?

Question

In Eric Lippert's article What's Up With Hungarian Notation?, he states that the purpose of Hungarian Notation (the good kind) is to

extend the concept of "type" to encompass semantic information in addition to storage representation information.

A simple example would be prefixing a variable that represents an X-coordinate with "x" and a variable that represents a Y-coordinate with "y", regardless of whether those variables are integers or floats or whatever, so that when you accidentally write xFoo + yBar, the code clearly looks wrong.

But I've also been reading about Haskell's type system, and it seems that in Haskell, one can accomplish the same thing (i.e. "extend the concept of type to encompass semantic information") using actual types that the compiler will check for you. So in the example above, xFoo + yBar in Haskell would actually fail to compile if you designed your program correctly, since they would be declared as incompatible types. In other words, it seems like Haskell's type system effectively supports compile-time checking equivalent to Hungarian Notation.

So, is Hungarian Notation just a band-aid for programming languages whose type systems cannot encode semantic information? Or does Hungarian Notation offer something beyond what a static type system such as Haskell's can offer?

(Of course, I'm using Haskell as an example. I'm sure there are other languages with similarly expressive (rich? strong?) type systems, though I haven't come across any.)

To be clear, I'm not talking about annotating variable names with the data type, but rather with information about the meaning of the variable in the context of the program. For example, a variable may be an integer or float or double or long or whatever, but maybe the variable's meaning is that it's a relative x-coordinate measured in inches. This is the kind of information I'm talking about encoding via Hungarian Notation (and via Haskell types).

Pascal - although if you try to add an XCoord and a YCoord type you defined in Pascal you will just get a compiler warning IIRC — mcottle, Oct 11 '11 at 01:35
http://blog.moertel.com/articles/2006/10/18/a-type-based-solution-to-the-strings-problem is an article about doing something very similar to "apps hungarian" in the type system in Haskell. — Logan Capaldo, Oct 11 '11 at 11:07
That's a really nice article link (The moertel.com one) showing exactly the kind of thing I was thinking about: using the type system to turn string-interpolation security vulnerabilities and such into compile-time errors. Thanks for the link. — Ryan C. Thompson, Oct 11 '11 at 21:47
I think that, to a large extent, OO caught up with Hungarian notation for semantics, because today you would probably write: Foo.Position.X + Bar.Position.Y. — Pieter B, Jun 14 '16 at 13:22

score 27 · Accepted Answer · answered Oct 11 '11 at 02:44

I would say "Yes".

As you say, the purpose of Hungarian Notation is to encode information in the name that cannot be encoded in the type. However, there are basically two cases:

1. That information is important.
2. That information is not important.

Let's start with case 2 first: if that information is not important, then Hungarian Notation is simply superfluous noise.

The more interesting case is number 1, but I would argue that if the information is important, it should be checked, i.e. it should be part of the type, not the name.

Which brings us back to the Eric Lippert quote:

extend the concept of "type" to encompass semantic information in addition to storage representation information.

Actually, that's not "extending the concept of type", that is the concept of type! The whole purpose of types (as a design tool) is to encode semantic information! Storage representation is an implementation detail that doesn't usually belong in the type at all. (And specifically in an OO language cannot belong in the type, since representation independence is one of the major prerequisites for OO.)

answered Oct 11 '11 at 02:44

Jörg W Mittag

C, where Hungarian notation was AFAIK most used, is not an OO language though. – Péter Török Oct 11 '11 at 07:01
@PéterTörök: OO is a design pattern, not a feature of a language, though modern languages are designed to make it easy while C is not. – Jan Hudec Oct 11 '11 at 08:22
@Péter Török: "I might think, though I'm not quite sure if I believe this or not, but Erlang might be the only object oriented language because the 3 tenets of object oriented programming are that it's based on message passing, that you have isolation between objects and have polymorphism." -- Joe Armstrong – Frank Shearar Oct 11 '11 at 08:46
@PéterTörök: I wrote quite a lot of object oriented code in plain C. I do know what I am talking about. – Jan Hudec Oct 11 '11 at 09:13
@Frank, I hesitated to add Erlang as an example - apparently it wasn't the best choice. Thanks for the quote. – Péter Török Oct 11 '11 at 09:31
@Jan, hats off to you then. – Péter Török Oct 11 '11 at 09:31
While it may be true that important information should be embedded in a variable's type rather than its name, there are many important things which should be said, but which type systems cannot express. For example, if `S1` is the only reference anywhere in the universe to a `char[]`, whose holder can and will change it whenever desired, but must never expose it to outside code, and `S2` is a reference to a `char[]` which nobody should ever change, but which may be shared with objects that promise not to change it, should `S1` and `S2` be regarded semantically as the same "kind of thing"? – supercat May 06 '14 at 21:00
Back when variables were called pkitstr and people knew what that meant, C++ had sufficient typing, but what was lacking was the IntelliSense help to quickly identify variable types whilst editing the code. IntelliSense improved, and Hungarian notation has largely disappeared from my code. – Michael Shaw Sep 25 '15 at 10:52
@supercat - You are describing uniqueness types. – Jack Jun 14 '16 at 19:00
@supercat That sounds similar to Rust's ownership model. If `S1` has been borrowed (i.e. someone else has a reference to it) somewhere, and the owner tries to modify it before that borrow ends, the compiler will refuse to compile it. For `S2`, Rust will only permit multiple concurrent borrows if *all* of those borrows are immutable. If you want a mutable borrow, then there must be no other existing borrows, mutable or immutable, and you cannot borrow again until your mutable borrow ends. C++'s `std::unique_ptr` might be able to do some of this, too, but I'm not very familiar with it. – 8bittree Jul 11 '17 at 21:54
@8bittree: By "mutable borrow" do you mean read-write? Conceptually, there should be multiple kinds of read-only views--a "live" view which will not limit other access to an object, a "snapshot" view that shows the state of the object when it was created but doesn't forbid modifications to the original object, a "maybe-live" view that might show changes but doesn't promise timely updates, and an "exclusive" view which forbids outside changes. It sounds like you're saying Rust only supports the last of those? – supercat Jul 11 '17 at 22:21
@supercat Yes, a mutable borrow is a read-write reference. Your "snapshot" view sounds like it could be satisfied by cloning. I'm not sure about "maybe-live". `Rc` handles the "live" view via reference counting. Combine with `RefCell` for writing, with the runtime, rather than the compiler, providing safety from data races. `Arc` and `Mutex` are thread-safe equivalents. Raw, C-style pointers are also available, but dereferencing them can only be done in an `unsafe` block, which basically just means the language cannot guarantee the safety of doing so. – 8bittree Jul 11 '17 at 23:13

score 9 · Answer 2 · edited Jun 14 '16 at 12:11

The whole purpose of types (as a design tool) is to encode semantic information!

I liked this answer and wanted to follow up on it.

I don't know anything about Haskell, but you can accomplish something like the example of xFoo + yBar in any language that supports some form of type safety such as C, C++ or Java. In C++ you could define XDir and YDir classes with overloaded '+' operators that only take objects of their own type. In C or Java, you would need to do your addition using an add() function/method instead of the '+' operator.
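A rough sketch of what that could look like in C++ (C++17 assumed). The member names, the `can_add` helper, and `example_coordinates` are made up for illustration; only the names XDir and YDir come from the paragraph above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical wrapper for an X-direction value.
class XDir {
public:
    explicit constexpr XDir(double v) : value_(v) {}
    constexpr double value() const { return value_; }
    // Addition is defined only for XDir + XDir.
    friend constexpr XDir operator+(XDir a, XDir b) { return XDir(a.value_ + b.value_); }
private:
    double value_;
};

// Hypothetical wrapper for a Y-direction value; same shape, distinct type.
class YDir {
public:
    explicit constexpr YDir(double v) : value_(v) {}
    constexpr double value() const { return value_; }
    friend constexpr YDir operator+(YDir a, YDir b) { return YDir(a.value_ + b.value_); }
private:
    double value_;
};

// Illustrative helper: detects at compile time whether `A + B` is a valid expression.
template <typename A, typename B, typename = void>
struct can_add : std::false_type {};

template <typename A, typename B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

static_assert(can_add<XDir, XDir>::value, "same-axis addition compiles");
static_assert(!can_add<XDir, YDir>::value, "mixing axes is rejected by the compiler");

// Same-axis sums behave like ordinary arithmetic.
inline void example_coordinates() {
    const XDir xFoo(1.5), xBar(2.0);
    assert((xFoo + xBar).value() == 3.5);
    // const YDir yBar(4.0);
    // auto bad = xFoo + yBar;  // would not compile: no matching operator+
}
```

The constructors are `explicit` so that a plain `double` cannot silently become an `XDir` or a `YDir`; without that, the distinction could be bypassed by accident.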

I have always seen Hungarian Notation used for type information, not semantics (except insofar as semantics might be represented by type). It was a convenient way to remember the type of a variable back in the days before "smart" programming editors that display the type for you in one way or another right in the editor.

edited Jun 14 '16 at 12:11

Pierre.Vriens

answered Oct 11 '11 at 05:11

BHS

Being object-oriented is neither necessary nor sufficient for a language to allow `xFoo + yBar` for user-defined types, nor is the OO aspect of C++ necessary for that example to work. – Luc Danton Oct 11 '11 at 05:25
You're right it's not OO it's type safety. I edited my answer. – BHS Oct 11 '11 at 17:16
Hmm. It's a good point that you could make `xFoo + yBar` a compile error (or at least a runtime error) in pretty much any language. However, would math with XDir and YDir classes in, say, Java or C++ be slower than math with raw numbers? My understanding is that in Haskell, the types are checked at compile time, and then at runtime, it would just be raw math with no type-checking, and hence no slower than adding regular numbers. – Ryan C. Thompson Oct 11 '11 at 21:54
In C++, the type checking would be done at compile time as well, and the conversion and such will be optimized away in most cases. Java doesn't do it as well, because it doesn't allow operator overloading and such -- so you can't treat an `XCoordinate` as a regular int, for example. – cHao Oct 12 '11 at 06:36

JacquesB · Answer 3 · 2015-09-25T17:03:50.993

Hungarian notation was invented for BCPL, a language which didn't have types at all. Or rather, it had exactly one data type, the word. A word could be a pointer or it could be a character or boolean or a plain integer number depending on how you used it. Obviously this made it very easy to make horrible mistakes like dereferencing a character. So Hungarian notation was invented so the programmer could at least perform manual type checking by looking at the code.

C, a descendant of BCPL, has distinct types for integers, pointers, chars etc. This made the basic Hungarian notation superfluous to some extent (you didn't need to encode in the variable name if it was an int or a pointer), but semantics beyond this level still couldn't be expressed as types. This led to the distinction between what has been called "Systems" and "Apps" Hungarian. You didn't need to express that a variable was an int, but you could use code-letters to indicate if the int was, say, an x or y coordinate or an index.

More modern languages allow definitions of custom types, which means you can encode the semantic constraints in the types, rather than in the variable names. For example a typical OO language will have specific types for coordinate-pairs and areas, so you avoid adding an x coordinate to a y coordinate.

For example, in Joel's famous article praising Apps Hungarian, he uses the example of the prefix us for an unsafe string, and s for a safe (HTML-encoded) string, in order to prevent HTML injection. The developer can prevent HTML-injection mistakes by carefully inspecting the code and ensuring that the variable prefixes match up. His example is in VBScript, a now obsolete language which didn't initially allow custom classes. In a modern language the problem can be fixed with a custom type, and indeed this is what ASP.NET does with the HtmlString class. This way the compiler will automatically find the error, which is much safer than relying on human eyeballing. So clearly a language with custom types eliminates the need for "Apps Hungarian" in this case.
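To make the us/s idea concrete, here is a minimal sketch in C++ of moving that distinction into the type system. This is not ASP.NET's HtmlString; `RawText`, `SafeHtml`, `render_paragraph`, and `example_html` are hypothetical names, and the escaping covers only the five usual HTML special characters.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type for untrusted input (the "us" prefix in the article's terms).
class RawText {
public:
    explicit RawText(std::string s) : text_(std::move(s)) {}
    const std::string& str() const { return text_; }
private:
    std::string text_;
};

// Hypothetical type for already-encoded output (the "s" prefix).
class SafeHtml {
public:
    // The only public way to obtain a SafeHtml is to encode a RawText.
    static SafeHtml encode(const RawText& raw) {
        std::string out;
        for (char c : raw.str()) {
            switch (c) {
                case '&':  out += "&amp;";  break;
                case '<':  out += "&lt;";   break;
                case '>':  out += "&gt;";   break;
                case '"':  out += "&quot;"; break;
                case '\'': out += "&#39;";  break;
                default:   out += c;
            }
        }
        return SafeHtml(std::move(out));
    }
    const std::string& str() const { return html_; }
private:
    explicit SafeHtml(std::string s) : html_(std::move(s)) {}
    std::string html_;
};

// Neither untrusted text nor a bare string converts implicitly to SafeHtml.
static_assert(!std::is_convertible<RawText, SafeHtml>::value, "must go through encode()");
static_assert(!std::is_convertible<std::string, SafeHtml>::value, "must go through encode()");

// Accepts only SafeHtml, so passing a RawText here is a compile error.
inline std::string render_paragraph(const SafeHtml& body) {
    return "<p>" + body.str() + "</p>";
}

inline void example_html() {
    const RawText input("<b>hi</b>");
    assert(render_paragraph(SafeHtml::encode(input)) == "<p>&lt;b&gt;hi&lt;/b&gt;</p>");
    // render_paragraph(input);  // would not compile: RawText is not SafeHtml
}
```

The private constructor is what does the work here: forgetting to encode is no longer something a reviewer has to spot by matching prefixes, because there is no way to write it.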

score 5 · Answer 4 · answered Oct 11 '11 at 01:46

I realize that the phrase "Hungarian Notation" has come to mean something different than the original, but I'll answer "no" to the question. Naming variables with either semantic or computational type does not do the same thing as SML or Haskell style typing. It's not even a bandaid. Taking C as an example, you could name a variable gpszTitle, but that variable might not have global scope, and it might not even be a pointer to a null-terminated string.

I think the more modern Hungarian notations have even bigger divergence from a strong type deduction system, because they mix "semantic" information (like "g" for global or "f" for flag) with the computational type ("p" pointer, "i" integer, etc etc.) That just ends up as an unholy mess where variable names have only a vague resemblance to their computational type (which changes over time) and all look so similar that you can't use "next match" to find a variable in a particular function - they're all the same.

mcottle · Answer 5 · 2011-10-12T05:31:31.463

Remember, there was a time when IDEs didn't have popup hints telling you what the type of a variable is. There was a time when IDEs didn't understand the code they were editing so you couldn't jump from usage to declaration easily. There was also a time when you couldn't refactor a variable name without manually going through the whole of the codebase, making the change by hand and hoping you didn't miss one. You couldn't use search & replace because searching for Customer also gets you CustomerName...

Back in those dark days, it was helpful to know what type a variable was where it was being used. If properly maintained (a BIG if because of the lack of refactoring tools) Hungarian notation gave you that.

The cost these days of the horrible names it produces is too high but that's a relatively recent thing. A lot of code still exists that predates the IDE developments I've described.

edited Oct 12 '11 at 05:31

answered Oct 11 '11 at 01:56

mcottle

If I'm not mistaken, this is another answer that's addressing a different type of Hungarian notation than the one the OP is asking about. – Tyler Oct 11 '11 at 06:25
This answer describes what's been called "Systems Hungarian", where the prefix denotes the language-level "type". The question asks about "Apps Hungarian", where the word "type" has not been misunderstood and means the *semantic* type. Systems Hungarian is nearly universally condemned these days (and rightly so; it's a bastardization of the real purpose of Hungarian notation). Apps Hungarian, however, can be a good thing. – cHao Oct 11 '11 at 08:32
Editors capable of searching for sCustomer without picking up sCustomerName (vi and emacs are 2 examples) have existed since the 70's. – Larry Coleman Oct 11 '11 at 17:08
@Larry, maybe, but you couldn't get them to run on the systems I was programming in the '80s – mcottle Oct 12 '11 at 00:12
@cHAo, No it doesn't - My point was trying to explain why people put the extra information into variable names generally. I studiously avoided mentioning any version of Hungarian notation. Maybe the example I gave in the "why search & replace doesn't work on source code" section looks to you like "Systems Hungarian" but it wasn't meant to. I've deleted the leading "s" to avoid the confusion. – mcottle Oct 12 '11 at 05:30
@mcottle: It wasn't the "s". Your post specifically says "there was a time when IDEs didn't have popup hints telling you what the type of a variable is." Considering an IDE *doesn't know* the semantic type, and hardly ever really has, and thus *still* can't pop up hints regarding the semantic type, one can only assume you were talking about the language-level type. – cHao Oct 12 '11 at 06:13
It does in a language where you can define types like an xCoord and yCoord that are treated as different types even though they're both Integers. I've been spoiled though :) – mcottle Oct 12 '11 at 08:22
Apparently. :) It'd be possible, but a bit annoying, in C++ -- you'd have to create a new class. In Java, it gets a lot more complicated, since you can't make some random object act like an int (no operator overloading, plus the object/primitive divide). You'd have to change how you use them. And in something like C, it's right out. – cHao Oct 12 '11 at 09:19
In Haskell you can quite easily get name clashes btw. – MasterMastic May 14 '14 at 21:18

score 2 · Answer 6 · answered Oct 11 '11 at 08:35

Yes, though many languages which have otherwise strong enough type systems still have a problem - expressibility of new types that are based on/similar to existing types.

That is, in many languages where we could use the type system more, we don't, because the overhead of making a new type that is basically the same as an existing type, apart from its name and a couple of conversion functions, is too great.

Essentially we need some sort of strongly typed typedefs to thoroughly kill Hungarian notation in these languages (F#-style units of measure could also do it).
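As a sketch of how cheap such a "strong typedef" can be made, here is one way to approximate it in C++ with a tag parameter. `StrongTypedef`, the tag structs, `to_centimeters`, and `example_units` are all hypothetical names chosen for this illustration, not a standard facility.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: wraps a T, and uses an otherwise unused Tag type
// so that wrappers over the same T are still distinct types.
template <typename T, typename Tag>
class StrongTypedef {
public:
    explicit constexpr StrongTypedef(T v) : value_(v) {}
    constexpr T get() const { return value_; }
    // Arithmetic and comparison are only defined between identical wrappers.
    friend constexpr StrongTypedef operator+(StrongTypedef a, StrongTypedef b) {
        return StrongTypedef(a.value_ + b.value_);
    }
    friend constexpr bool operator==(StrongTypedef a, StrongTypedef b) {
        return a.value_ == b.value_;
    }
private:
    T value_;
};

// With the helper in place, each new semantic type costs one line.
using Inches      = StrongTypedef<double, struct InchesTag>;
using Centimeters = StrongTypedef<double, struct CentimetersTag>;

// Same representation, but different and non-interchangeable types.
static_assert(!std::is_same<Inches, Centimeters>::value, "distinct types");
static_assert(!std::is_convertible<Inches, Centimeters>::value, "no implicit conversion");

// The explicit conversion function is the only bridge between the two.
constexpr Centimeters to_centimeters(Inches in) {
    return Centimeters(in.get() * 2.54);
}

inline void example_units() {
    const Inches a(1.0), b(2.0);
    assert((a + b) == Inches(3.0));
    // auto bad = a + Centimeters(5.0);  // would not compile: mismatched wrappers
}
```

This still needs the helper template to be written once, which is arguably the overhead being complained about; a language-level strong typedef or units-of-measure feature would remove even that.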

score 0 · Answer 7 · answered Oct 11 '11 at 01:38

Correct!

Outside of totally untyped languages such as Assembler, Hungarian notation is superfluous and annoying. Doubly so when you consider that most IDEs check type safety as you, er, type.

The extra "i", "d" and "?" prefixes just make the code less readable, and can be truly misleading, as when a "cow-orker" changes the type of iaSumsItems from Integer to Long but doesn't bother refactoring the field name.

answered Oct 11 '11 at 01:38

James Anderson

Your response suggests that you don't understand the difference between the original, smart "Apps" Hungarian and the dumb bastardization called "Systems" Hungarian. Read http://www.joelonsoftware.com/articles/Wrong.html – Ryan Culpepper Oct 11 '11 at 02:58
