
There used to be very good reasons for keeping instruction / register names short. Those reasons no longer apply, but short cryptic names are still very common in low-level programming.

Why is this? Is it just because old habits are hard to break, or are there better reasons?

For example:

  • Atmel ATMEGA32U2 (2010?): TIFR1 (instead of TimerCounter1InterruptFlag), ICR1H (instead of InputCapture1High), DDRB (instead of DataDirectionPortB), etc.
  • .NET CLR instruction set (2002): bge.s (instead of branch-if-greater-or-equal.short), etc.

Aren't the longer, non-cryptic names easier to work with?


When answering and voting, please consider the following. Many of the possible explanations suggested here apply equally to high-level programming, and yet the consensus, by and large, is to use non-cryptic names consisting of a word or two (commonly understood acronyms excluded).

Also, if your main argument is about physical space on a paper diagram, please consider that this absolutely does not apply to assembly language or CIL. I would also appreciate it if you showed me a diagram where terse names fit but readable ones make the diagram worse. From personal experience at a fabless semiconductor company, readable names fit just fine and result in more readable diagrams.

What is the core thing that is different about low-level programming as opposed to high-level languages that makes the terse cryptic names desirable in low-level but not high-level programming?

gnat
Roman Starkov
  • Answer: To make it feel like you are programming in a low level language. – Thomas Eding Aug 28 '12 at 22:07
  • Cryptic is relative. `JSR` is three times longer than the opcode it represents (`$20` on a 6502) and considerably easier to understand at a glance. – Blrfl Jan 14 '13 at 12:30
  • I'm a bit disappointed, because the correct answer is there, but it definitely isn't the accepted one. With circuit diagrams and such, interrupts are usually named after the lines they are associated with, and on a circuit diagram you don't want verbose names; it is neither good practice nor practical. Secondly, just because you don't like the answers doesn't mean they are not correct. – Jeff Langemeier Jan 14 '13 at 14:33
  • @JeffLangemeier Agreed, it's not ideal; there are several reasons behind something like this but I must accept a single answer. There is some truth to that answer, but I have worked at a large fabless semiconductor company who had no problem fitting descriptive names in both the digital and analog silicon diagrams. And heck, was that guessability handy to a firmware developer having to code for that silicon! Seriously, there's tons of space there; some might _think_ there isn't, but I know first hand that's false. – Roman Starkov Jan 14 '13 at 15:07
  • Programs in low-level languages tend to have many more statements than programs in higher-level ones. Short identifiers mean fewer letters, which makes the code easier for a programmer to take in at a glance. Compare `mov eax, ebx` in [x86 assembly](http://en.wikipedia.org/wiki/X86_assembly_language) to `moveDataToFrom ExtendedAccumulatorRegister, ExtendedBaseIndexRegister` – gnat Jan 14 '13 at 16:13
  • @gnat: Try `set Accumulator32 to BaseIndex32`? Simply expanding the traditional abbreviations is not the only way to make something more readable. – Timwi Jan 14 '13 at 17:42
  • @Timwi all these naming tricks are useless: no matter what I'd prefer `mov eax, ebx`, because of **[Zipf's Law](http://programmers.stackexchange.com/a/183578/31260 "the one true answer")** – gnat Jan 14 '13 at 17:56
  • "if your main argument is about physical space on a paper diagram": no, it is about the fact that good naming takes other things into consideration than just the clarity of the name (I gave some in my answer; diagrams, including those drawn on a blackboard, are just one of these other things) and that clarity is a relative thing (familiarity tends to help clarity whatever the choice is, for instance). – AProgrammer Jan 14 '13 at 18:04
  • There are intra-domain initialisms (initialisms made up by one camp, e.g. hardware engineers) and cross-domain initialisms (initialisms passed, or enforced, from one camp onto another, e.g. from hardware engineers to software engineers, or from the marketing department to the sales department). The underlying factors have been well covered by all of the answers. Still, it is a general phenomenon, not just in low-level programming – rwong Jan 15 '13 at 04:40
  • see also: [Why do most of us use 'i' as a loop counter variable?](http://programmers.stackexchange.com/questions/86904/why-do-most-of-us-use-i-as-a-loop-counter-variable) and [Using single characters for variable names in loops/exceptions](http://programmers.stackexchange.com/questions/71710/using-single-characters-for-variable-names-in-loops-exceptions) – gnat May 29 '15 at 16:42
  • Habit & Job Security. Most of the guys writing embedded code have been doing so for decades. So, the reasons they initially used these are no longer valid, but they've been ingrained in their thought process from decades of habit. And obviously, if no one else can read their code, no one else can work on it, so they've made themselves indispensable. – RubberDuck Dec 27 '15 at 15:28

11 Answers


The reason the software uses those names is that the datasheets use those names. Since code at that level is very difficult to understand without the datasheet anyway, using variable names that you can't search for in the datasheet is extremely unhelpful.

That brings up the question of why datasheets use short names. That's probably because you often need to present the names in tables like this where you don't have room for 25-character identifiers:

[Image: TIFR1 register table from the datasheet]

Also, things like schematics, pin diagrams, and PCB silkscreens often are very cramped for space.
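
A minimal sketch of the compromise suggested in one of the comments: keep the datasheet names as the canonical keys, so they stay searchable in the datasheet, and layer descriptive aliases on top. It is written in Python for brevity (firmware would more typically do this with C `#define`s), and the alias names and bit positions are illustrative assumptions, not values copied from a real datasheet.

```python
# Hypothetical excerpt of a register map transcribed from a datasheet.
# Bit positions are illustrative assumptions; always check the real datasheet.
TIFR1_BITS = {"ICF1": 5, "OCF1C": 3, "OCF1B": 2, "OCF1A": 1, "TOV1": 0}

# Optional descriptive aliases layered on top of the datasheet names.
ALIASES = {
    "TimerCounter1InputCaptureFlag": "ICF1",
    "TimerCounter1OverflowFlag": "TOV1",
}


def bit_mask(name: str) -> int:
    """Return the bit mask for a datasheet bit name or one of its aliases."""
    canonical = ALIASES.get(name, name)
    return 1 << TIFR1_BITS[canonical]


print(hex(bit_mask("TOV1")), hex(bit_mask("TimerCounter1InputCaptureFlag")))
```

Either spelling resolves to the same bit, but only the datasheet name will turn up when you search the datasheet.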

Karl Bielefeldt
  • Agreed, unsearchable names would suck... but longer non-cryptic names are surely still equally searchable? As for screen space... my C# code often runs out of screen space, should I start using cryptic names? :) Surely not. Then why is it that low-level stuff is excluded from good naming practices? – Roman Starkov Aug 29 '12 at 01:01
  • Also, this answer doesn’t really address the pure-software side, e.g. CLR, JVM, x86, etc. :) – Timwi Aug 29 '12 at 03:08
  • @romkyns: It's a bit more obvious why they used these short names when you actually read through these datasheets. The datasheet for a microcontroller I have on hand is about 500 pages *even when using the short names throughout*. The tables would span several pages/screens if we used longer names, making them very inconvenient to use as a reference. – In silico Aug 29 '12 at 05:19
  • @romkyns: Longer names are equally searchable, but they're "non-native". If you listen to embedded engineers, they actually say "tiffer zero", not "timer zero's interrupt flag". I doubt that web developers expand HTTP, HTML, or JSON in their method names, either. – TMN Aug 29 '12 at 14:08
  • Go ahead and search for `TimerCounter1InterruptFlag` in the datasheet and tell me what you find. No, it's nowhere near equally searchable. That's a good point about abbreviations in other domains though, @TMN. Try writing software for the military sometime. – Karl Bielefeldt Aug 29 '12 at 14:43
  • @KarlBielefeldt er, what? :) **Obviously** I will not find it in the current datasheet because they went for the short name instead. That does not support the claim that short names are more searchable in the slightest... – Roman Starkov Aug 31 '12 at 17:50
  • Basically, in every community with a common language, the concepts that are used often will have shorter names. Consider unix curses(3) - termcap introduced the short cryptic tags for various escape sequences, then terminfo came along and added its own set of two or three letter tags, *and* long names for everything. Nobody appears to use the latter, and I keep looking up the difference between tput {ec,ed,el}. – Henk Langeveld Jan 13 '13 at 15:09
  • It's not just datasheets that are space-constrained, it's the schematics. All those logical components have leads that need to be connected to other components. "TimerCounter1InterruptFlag.clear" doesn't fit on top of a tiny wire representation nearly as well as "TCIF.C" – AShelly Jan 14 '13 at 14:40
  • @AShelly yep, net labels are very much better shorter on a dense schematic, and having the register name exactly match the net is helpful (not least when doing things with parts like the Zynq, where the HDL tools can automatically generate the register map and defines from the hardware description). Naming conventions exist within the hardware community too, and they have different constraints from software, but these names tend to leak up the stack. The fact that any low-level developer worth the name can usually read a schematic and use a 'scope for debugging probably encourages this. – Dan Mills Nov 26 '18 at 21:33
  • Being X or Y on the schematic or whatever is not an excuse. That's what comments in code are for. Declare the variable in a header or something (in case of a global variable) or have a file with the correlation between the two. That way the code _is_ understandable at some level _and_ you have the correlation between the two things. – andrebrait Aug 12 '20 at 00:30

Zipf's Law

You yourself can observe by looking at this very text that word length and frequency of usage are, in general, inversely related. Words that are used very frequently, like "it", "a", "but", "you", and "and", are very short, while words that are used less often, like "observe", "comprehension", and "verbosity", are longer. This observed relationship between frequency and length is called Zipf's Law.
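
A toy illustration of the relationship described above, assuming a simple regex tokenizer: it ranks words by frequency and compares the average length of the most and least frequent ones. On a tiny hand-made sample the effect is exaggerated, so treat this as a sketch of the measurement, not as evidence for the law.

```python
import re
from collections import Counter


def mean_length(words):
    """Average number of characters per word."""
    return sum(len(w) for w in words) / len(words)


def frequent_vs_rare_length(text, top_n=5):
    """Compare the mean length of the top_n most and least frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = [w for w, _ in Counter(words).most_common()]
    return mean_length(ranked[:top_n]), mean_length(ranked[-top_n:])


sample = "a a a a the the the it it verbosity comprehension"
print(frequent_vs_rare_length(sample, top_n=2))  # (2.0, 11.0)
```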

The number of instructions in the instruction set for a given microprocessor usually numbers in the dozens or hundreds. For example, the Atmel AVR instruction set appears to contain about a hundred distinct instructions (I didn't count), but many of those are variations on a common theme and have very similar mnemonics. For example, the multiplication instructions include MUL, MULS, MULSU, FMUL, FMULS, and FMULSU. You don't have to look at the list of instructions for very long before you get the general idea that instructions that start with "BR" are branches, instructions that start with "LD" are loads, etc. The same applies to variables: even complex processors provide only a limited number of places to store values: condition registers, general purpose registers, etc.
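
To make the "variations on a common theme" point concrete, here is a small sketch that buckets a hand-picked subset of AVR mnemonics by prefix. The subset and the prefix list are my own choices for illustration, not a complete or official classification of the instruction set.

```python
from collections import defaultdict

# A hand-picked subset of AVR mnemonics, not the full instruction set.
AVR_MNEMONICS = [
    "BREQ", "BRNE", "BRCS", "BRCC",
    "LD", "LDI", "LDS", "LDD",
    "MUL", "MULS", "MULSU", "FMUL", "FMULS", "FMULSU",
    "ADD", "ADC",
]


def group_by_prefix(mnemonics, prefixes):
    """Bucket each mnemonic under the first prefix it starts with, else '*'."""
    groups = defaultdict(list)
    for mnemonic in mnemonics:
        key = next((p for p in prefixes if mnemonic.startswith(p)), "*")
        groups[key].append(mnemonic)
    return dict(groups)


families = group_by_prefix(AVR_MNEMONICS, ["BR", "LD", "MUL", "FMUL"])
for prefix, members in families.items():
    print(prefix, members)
```

Once you know a handful of prefixes, most of the list sorts itself into a few families.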

Because there are so few instructions, and because long names take longer to read, it makes sense to give them short names. By contrast, higher level languages allow programmers to create a huge number of functions, methods, classes, variables, and so on. Each of these will be used far less frequently than most assembly instructions, and longer, more descriptive names are increasingly important to give readers (and writers) enough information to understand what they are and what they do.

Additionally, instruction sets for different processors often use similar names for similar operations. Most instruction sets include operations for ADD, MUL, SUB, LD, ST, BR, NOP, and if they don't use those exact names they usually use names that are very close. Once you've learned the mnemonics for one instruction set, it doesn't take long to adapt to the instruction sets for other devices. So names that might seem "cryptic" to you are about as familiar as words like "and", "or", and "not" to programmers who are skilled in the art of low level programming. I think that most people who work at the assembly level would tell you that learning to read the code is not one of the greater challenges in low level programming.

Caleb
  • Thanks Caleb! To me, this excellent answer salvaged a question that somehow managed to collect four [value judgements](http://en.wikipedia.org/wiki/Value_judgement "'...will likely solicit debate, arguments, polling, or extended discussion'") in one title: "cryptic", "short", "still", "so common" – gnat Jan 14 '13 at 22:08
  • Thank you, @gnat, for both your comment and your generous bonus. – Caleb Jan 19 '13 at 04:10

In general

Good naming is not just about having descriptive names; it also has to take other aspects into account, and that leads to recommendations like:

  • the more global the scope, the more descriptive the name should be
  • the more often it is used, the shorter the name should be
  • the same name should be used in all contexts for the same thing
  • different things should have different names even if the context is different
  • variations should be easily detected
  • ...

Note that these recommendations conflict with one another.

Instruction mnemonics

As an assembly language programmer, seeing short-branch-if-greater-or-equal instead of bge.s gives me the same impression as, when I am an Algol programmer doing computational geometry, seeing SUBTRACT THE-HORIZONTAL-COORDINATE-OF-THE-FIRST-POINT FROM THE-HORIZONTAL-COORDINATE-OF-THE-SECOND-POINT GIVING THE-DIFFERENCES-OF-THE-COORDINATE-OF-THE-TWO-POINTS instead of dx := p2.x - p1.x. I just can't agree that the verbose forms are more readable in the contexts I care about.

Register names

You pick the official name from the documentation. The documentation picks the name from the design. The design uses a lot of graphical formats where long names aren't adequate, and the design team will live with those names for months, if not years. For both reasons, they won't use "Interrupt flag of the first timer counter"; they will abbreviate it in their schematics as well as when they speak. They know it, and they use systematic abbreviations like TIFR1 so that there is less chance of confusion. One point here is that TIFR1 isn't a random abbreviation; it is the result of a naming scheme.
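
As a rough sketch of what "the result of a naming scheme" means in practice, the following decodes a few AVR-style timer register names into stem, unit number, and optional suffix letter. The stem glossary covers only a handful of registers, and reading H/L as high/low byte and any other letter as a variant is a simplifying assumption; the datasheet remains the authority.

```python
import re

# Glossary for a few AVR timer register stems (a small subset, for illustration).
STEMS = {
    "TIFR": "Timer/Counter Interrupt Flag Register",
    "TIMSK": "Timer/Counter Interrupt Mask Register",
    "TCCR": "Timer/Counter Control Register",
    "ICR": "Input Capture Register",
    "OCR": "Output Compare Register",
}
# Simplifying assumption: H/L select a byte; other letters name a variant.
BYTE_SUFFIXES = {"H": "high byte", "L": "low byte"}


def decode(name: str) -> str:
    """Expand a register name of the form STEM + unit number + optional letter."""
    match = re.fullmatch(r"([A-Z]+)(\d+)([A-Z]?)", name)
    if match is None or match.group(1) not in STEMS:
        raise ValueError(f"not covered by this sketch: {name}")
    stem, unit, suffix = match.groups()
    text = f"{STEMS[stem]} {unit}"
    if suffix:
        text += f" ({BYTE_SUFFIXES.get(suffix, 'variant ' + suffix)})"
    return text


for reg in ["TIFR1", "ICR1H", "TCCR1A"]:
    print(reg, "->", decode(reg))
```

Because the abbreviations are systematic, a few stems and suffix rules cover a whole family of registers.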

AProgrammer
  • Is `TIFR1` really a better naming scheme than `InterruptFlag1` though, or `IptFlag1` if you really have to be short? – Timwi Aug 29 '12 at 13:40
  • @Timwi, `InterruptFlag` and `IptFlag` are better than `IF` in the same way that `EnumerableInterface` and `ItfcEnumerable` are better than `IEnumerable`. – AProgrammer Aug 29 '12 at 14:01
  • @AProgrammer: I consider your answer and your comment the best and I would mark it as accepted if I could. Those who believe that only physical limits dictate short names are wrong. This discussion will be interesting for you: http://37signals.com/svn/posts/3250-clarity-over-brevity-in-variable-and-method-names – alpav Sep 12 '12 at 04:45
  • @alpav Do you realise that your link argues the opposite of what this answer says? If anything, it fully supports `InterruptFlag1` for reasons of better clarity. – Roman Starkov Jan 12 '13 at 08:27

Apart from the "old habits" reasons, legacy code that was written 30 years ago and is still in use is very common. Despite what some less experienced people think, refactoring these systems so they look pretty comes at a very high cost for a small gain and is not commercially viable.

Embedded systems that are close to the hardware and access registers tend to use the same or similar labels to those used in the hardware data sheets, for very good reasons. If the register is called XYZZY1 in the hardware data sheets, it makes sense that the variable representing it is likely XYZZY1, or, if the programmer was having a good day, RegXYZZY1.

As for bge.s, it's similar to assembler: to the few people who need to know it, longer names are less readable. If you cannot get your head around bge.s and think branch-if-greater-or-equal.short will make a difference, you are merely playing with the CLR and do not know it.

The other reason you will see short variable names is the widespread use of abbreviations within the domain the software is targeting.

In summary: short abbreviated variable names that reflect an external influence, such as industry norms and hardware data sheets, are expected. Short abbreviated variable names that are internal to the software are normally less desirable.

Timwi
mattnz
  • If I understood the argument you use to defend "bge.s", `TIFR1` is more readable to those who need to know it than `TimerCounter1InterruptFlag`, correct? – Roman Starkov Aug 29 '12 at 00:58
  • @romkyns : Absolutely, in this case less is more... Unlike CNTR, which could mean "Counter", "Control", "Can Not Trace Route", etc., TIFR1 has a precisely defined meaning. – mattnz Aug 29 '12 at 21:49
  • *"If you cannot get you head around bge.s and think branch-if-greater-or-equal.short will make a difference - you are merely playing with the CLR and do not know it."* I don't know about that. I understand x86 assembly pretty well, but *every time* I write a loop, I have to look up what [all the `j?` instructions](http://www.unixwiz.net/techtips/x86-jumps.html) mean. Having a more obviously-named instruction would definitely help me. But maybe I'm the exception rather than the rule. I have trouble remembering trivial details. – Cody Gray - on strike Aug 03 '14 at 12:26

There are so many different ideas here. I can't accept any of the existing answers as the answer: firstly, there are likely many factors contributing to this, and secondly, I can't possibly know which one is the most significant one.

So here's a summary of answers posted by others here. I'm posting this as CW and my intention is to eventually mark it accepted. Please edit if I missed something out. I tried to rephrase each idea to express it concisely yet clearly.

So why are cryptic short identifiers so common in low-level programming?

  • Because many of them are common enough in the respective domain to warrant a very short name. This worsens the learning curve, but is a worthwhile tradeoff given the frequency of use.
  • Because there is usually a small set of possibilities that is fixed (the programmer can't add to the set).
  • Because readability is a matter of habit and practice. branch-if-greater-than-or-equal.short is initially more readable than bge.s, but with some practice the situation becomes reversed.
  • Because they often have to be typed out in full, by hand: low-level languages often don't come with powerful IDEs that have good autocompletion, or autocompletion is not reliable.
  • Because it's sometimes desirable to pack a lot of information into the identifier, and a readable name would be unacceptably long even by high-level standards.
  • Because that's what low-level environments have looked like historically. Breaking habit requires conscious effort, runs the risk of annoying those who liked the old ways, and must be justified as worthwhile. Sticking with the established way is the "default".
  • Because many of them originate elsewhere, such as schematics and datasheets. Those, in turn, are affected by space constraints.
  • Because the people in charge of naming things have never even considered readability, or don't realize they are creating a problem, or are lazy.
  • Because in some cases the names have become part of a protocol for interchange of data, such as the use of assembly language as an intermediate representation by some compilers.
  • Because this style is instantly recognizable as low-level and thus looks cool to geeks.

I personally feel that some of these do not actually contribute to the reasons why a newly developed system would choose this naming style, but I felt it would be wrong to filter some ideas out in this type of answer.

Rad80
Roman Starkov

I'm going to toss my hat into this mess.

High level coding conventions and standards are not the same as low level coding standards and practices. Unfortunately, most of those are holdovers from legacy code and old thought processes.

Some, however, do serve a purpose. Sure, BranchGreaterThan would be much more readable than BGT, but there's a convention now: it's an instruction, and as such it has gained some traction as a standard over the last 30 years of use. Why did they start with it? Probably some arbitrary character-width limit for instructions, variables and such. Why do they keep it? Because it's a standard. This is the same as using int as a type name: it would be more legible to use Integer in all cases, but is that necessary for anyone who's been programming for more than a few weeks? No. Why? Because it's standard practice.

Second, as I said in my comment, many of the interrupts are named INTG1 and other cryptic names, and these serve a purpose as well. In circuit diagrams it is NOT good convention to name your lines and such verbosely; it clutters the diagram and hurts legibility. All verbosity is handled in the documentation. And since all of the wiring/circuit diagrams use these short names for interrupt lines, the interrupts themselves get the same names, so as to keep things consistent for the embedded designer from the circuit diagram all the way up to the code that programs it.

A designer has some control over this, but as in any field or new language, there are conventions that carry over from hardware to hardware, and as such the names should stay similar across assembly languages. I can look at a snippet of assembly and get the gist of the code without ever having used that instruction set, because they stick to a convention: LDA, or something related to it, is probably loading a register; MV is probably moving something from somewhere to somewhere else. It isn't about what you think is nice or what is a high-level practice. Assembly is a language unto itself, and as such it has its own standards and conventions that you, as the designer, should follow; these are often not nearly as arbitrary as they seem.

I'll leave you with this: asking the embedded community to use verbose high-level practices is like asking chemists to always write out chemical compounds in full. Chemists write them in short form for themselves, and anyone else in the field will understand it, but it may take a newcomer a little time to adjust.

Jeff Langemeier
  • I do feel that _"we'll use cryptic names because that's what makes low-level programming feel as such"_ and _"we'll use cryptic names because that's the convention for low-level programming"_ are pretty much the same, so +1 from me and I'll think about accepting this as a less inflammatory variant of [the one I accepted initially](http://programmers.stackexchange.com/a/182469/3278). – Roman Starkov Jan 14 '13 at 15:24
  • +1 for the chemists reference as it generates a good analogy for the differing realms of programming. –  Jan 14 '13 at 15:40
  • +1 I also never understood why people use short, cryptic names like "water" if there is the much more readable "DiHydrogenOxyde" – Ingo Jan 15 '13 at 10:24

One reason they use short, cryptic identifiers is that the identifiers are not cryptic to the developers. You have to realize that they work with them every day and those names are really domain names. So they know by heart exactly what TIFR1 means.

If a new developer comes to the team, he'll have to read the datasheets (as explained by @KarlBielefeldt), so he'll get comfortable with those names.

I believe your question used a bad example, because in that kind of source code you usually do see a lot of unnecessarily cryptic identifiers for non-domain stuff.

I'd say they mostly do that because of bad habits from the days when tools did not autocomplete everything you type.

Alex

Summary

Initialism is a pervasive phenomenon in many technical and non-technical circles. As such it is not limited to low-level programming. For the general discussion, see the Wikipedia article on Acronym. My answer is specific to low-level programming.

Causes of cryptic names:

  1. Low-level instructions are strongly-typed
  2. Need to pack a lot of type information into the name of a low-level instruction
  3. Historically, single-character codes are favored for packing the type information.

Solutions and their drawbacks:

  1. There are modern low-level naming schemes that are more consistent than historical ones.
    • LLVM
  2. However, the need to pack a lot of type information still exists.
    • Thus, cryptic abbreviations can still be found everywhere.
  3. Improved line-to-line readability will help a novice low-level programmer pick up the language faster, but will not help with comprehending large pieces of low-level code.

Full answer

(A) Longer names are possible. For example, the names of C++ SSE2 intrinsics average 12 characters compared to the 7 characters in the assembly mnemonic. http://msdn.microsoft.com/en-us/library/c8c5hx3b(v=vs.80).aspx

(B) The question then becomes: how long and how non-cryptic do the names of low-level instructions need to be?

(C) Now we analyze the composition of such naming schemes. The following are two naming schemes for the same low-level instruction:

  • Naming scheme #1: CVTSI2SD
  • Naming scheme #2: __m128d _mm_cvtsi32_sd (__m128d a, int b);

(C.1) Low-level instructions are always strongly typed. There cannot be ambiguity, type inference, automatic type conversion, or overloading (reuse of instruction name to mean similar but non-equivalent operations).

(C.2) Each low-level instruction must encode a lot of type information into its name. Examples of such information:

  • Architecture Family
  • Operation
  • Arguments (Inputs) and Outputs
  • Types (Signed Integer, Unsigned Integer, Float)
  • Precision (Bit Width)
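
A small sketch of the trade-off in (C.2) and (C.3): it spells out an SSE2 intrinsic name field by field. The fragment glossary and the output style (close to names like ConvInt32ToFloat64 suggested in the comments) are assumptions for illustration; the spelled-out name carries the same information as CVTSI2SD but is more than three times as long.

```python
import re

# Assumed glossary for a few SSE2 intrinsic name fragments.
OPS = {"cvt": "Convert", "add": "Add", "mul": "Multiply"}
TYPES = {"si32": "Int32", "sd": "ScalarFloat64", "pd": "PackedFloat64"}


def spell_out(intrinsic: str) -> str:
    """Expand names like '_mm_cvtsi32_sd' or '_mm_add_pd' into full words."""
    match = re.fullmatch(r"_mm_([a-z]+?)(si32)?_(sd|pd)", intrinsic)
    if match is None or match.group(1) not in OPS:
        raise ValueError(f"not covered by this sketch: {intrinsic}")
    op, source, result = match.groups()
    if source:  # conversions name both the source and the result type
        return f"{OPS[op]}{TYPES[source]}To{TYPES[result]}"
    return f"{OPS[op]}{TYPES[result]}"


verbose = spell_out("_mm_cvtsi32_sd")
print(verbose, len(verbose), "vs", "CVTSI2SD", len("CVTSI2SD"))
```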

(C.3) If each piece of information is spelled out, the program will be more verbose.

(C.4) The type-encoding schemes used by various vendors have long historical roots. As an example, in the x86 instruction set:

  • B means byte (8-bit)
  • W means word (16-bit)
  • D means dword "double-word" (32-bit)
  • Q means qword "quad-word" (64-bit)
  • DQ means dqword "double-quad-word" (128-bit)

These historical terms have no modern meaning whatsoever, but they still stick around. A more consistent scheme would have put the bit-width value (8, 16, 32, 64, 128) into the name.
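
The more consistent scheme mentioned above can be sketched as a small rewrite that replaces the historical size letters with the bit width. The output names (PADD32 and so on) are a hypothetical scheme, not real x86 syntax; PADDB/PADDW/PADDD/PADDQ and PSLLDQ are real SSE2 mnemonics used only as input.

```python
# Historical x86 size letters (from the list above) mapped to bit widths.
SIZE_BITS = {"B": 8, "W": 16, "D": 32, "Q": 64, "DQ": 128}


def with_explicit_width(mnemonic: str, stem: str) -> str:
    """Hypothetical renaming: 'PADDD' with stem 'PADD' becomes 'PADD32'."""
    if not mnemonic.startswith(stem):
        raise ValueError(f"{mnemonic} does not start with {stem}")
    size_letters = mnemonic[len(stem):]
    return f"{stem}{SIZE_BITS[size_letters]}"


for m in ["PADDB", "PADDW", "PADDD", "PADDQ"]:
    print(m, "->", with_explicit_width(m, "PADD"))
print("PSLLDQ ->", with_explicit_width("PSLLDQ", "PSLL"))
```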

By contrast, LLVM is a step in the right direction toward consistency in low-level instructions: http://llvm.org/docs/LangRef.html#functions

(D) Regardless of instruction naming scheme, low-level programs are already verbose and hard to understand because they focus on the minute details of execution. Changing the instruction naming scheme will improve readability on a line-to-line level, but will not remove the difficulty of comprehending the operations of a large piece of code.

Timwi
rwong
  • Improved line-to-line readability will necessarily have some impact on comprehending the whole thing, but naming alone can't make it trivial, of course. – Roman Starkov Jan 14 '13 at 13:13
  • Also, kind of off-topic, but `CVTSI2SD` doesn't carry any more information than `ConvertDword2Double` or `ConvInt32ToFloat64`, but the latter, while longer, are instantly recognizable, whereas the former must be deciphered... – Roman Starkov Jan 14 '13 at 13:21

Humans read and write assembly only occasionally, and most of the time it's just a communication protocol. I.e., it is most often used as an intermediate serialised text-based representation between compiler and assembler. The more verbose this representation is, the more unnecessary overhead is in this protocol.

In the case of opcodes and register names, long names actually harm readability. Short mnemonics are better for a communication protocol (between compiler and assembler), and assembly language is a communication protocol most of the time. Short mnemonics are also better for programmers, since compiler code (such as instruction selection patterns) is easier to read.
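
To put a rough number on the size side of this argument, here is a sketch that rewrites a made-up three-line listing with made-up long mnemonics and compares byte counts. It only measures text size; whether assembler run time scales the same way is disputed in the comments and is not shown here.

```python
# Made-up verbose spellings; not the syntax of any real assembler.
LONG_NAMES = {
    "mov": "copy-register",
    "add": "add-integer",
    "jge": "jump-if-greater-or-equal",
}


def expand(listing: str) -> str:
    """Replace the leading mnemonic on each line with its long name."""
    lines = []
    for line in listing.splitlines():
        op, _, operands = line.partition(" ")
        name = LONG_NAMES.get(op, op)
        lines.append(f"{name} {operands}" if operands else name)
    return "\n".join(lines) + "\n"


short = "mov eax, ebx\nadd eax, 4\njge loop\n"
verbose = expand(short)
ratio = len(verbose.encode()) / len(short.encode())
print(f"{len(short)} bytes -> {len(verbose)} bytes ({ratio:.2f}x)")
```

On this toy listing the text roughly doubles; real listings would differ depending on how long the chosen names are relative to the operands.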

SK-logic
  • If you need to save space, just gzip it!... If you need no overhead, use a binary format instead! If you're using text, you're aiming for readability - then why not go all the way and make it properly readable? – Roman Starkov Jan 13 '13 at 11:46
  • @romkyns, compressing a text-based communication protocol between two local processes? That's something new. Binary protocols are much less robust. It's the Unix way: text-based protocols are there for *occasional* readability. They're readable just enough. – SK-logic Jan 13 '13 at 12:00
  • Right. Your premise is that I read and write the names of these registers or CIL instructions little enough for the overhead to matter. But think about it; they are used as often as any odd method or variable name in any other programming language _while you're programming_. Is that so rare that the extra few bytes matter? – Roman Starkov Jan 13 '13 at 12:13
  • @romkyns, when you're programming, you don't want long names too. I write compilers for LLVM, CIL and JVM. I like the density and expressiveness of the code. I will most certainly hate the code with the naming scheme you're suggesting. And, it's not "few bytes", it's multiplying the intermediate assembly file size n-fold. – SK-logic Jan 13 '13 at 12:36
  • I respect your right to have a different taste in how long names should be, but do you really name methods and locals in your compilers cryptic things like `TIFR`, or do they tend to contain full words? – Roman Starkov Jan 13 '13 at 18:05
  • @romkyns, do you really see no difference between function names and opcode mnemonics? – SK-logic Jan 13 '13 at 22:05
  • I see no difference that is relevant to the readable vs. short trade-off. I see them as different, of course, just like variables are different to functions which are different to types. I just don't see why opcodes and register names benefit from being *extremely short* to the point of having to consult the documentation for every newly encountered one before you have _any clue_ as to what it does. Your only argument so far is efficient storage, if I'm not mistaken. Do you *really* mean it?... Or do you have other reasons? – Roman Starkov Jan 14 '13 at 13:05
  • @romkyns, in this case long names actually *harm* readability. And you failed to get my arguments, sorry. Short mnemonics are: better for a *communication protocol* (not a "storage"!), and assembly language is a communication protocol most of the time. Short mnemonics are better for programmers, since compiler code is easier to read. Take a look at instruction selection tables in, say, LLVM, you'll get what I mean. – SK-logic Jan 14 '13 at 13:22
  • I shouldn't have started this discussion; I asked why they are common, you explained a possible reason; whether I agree with it or not is irrelevant. I'm not entirely sure what you mean by "assembly language is a communication protocol"; if you could expand on that in your answer, that might help. – Roman Starkov Jan 14 '13 at 13:27
  • If it’s an *intermediate* representation, then overhead is irrelevant nowadays, unless you have been in cryostasis since 1980. The final representation — the one that is actually distributed to all users — is in a compact binary format. As for readability, people who find such a short and cryptic representation to be *more readable* than normal English words seem to me to be elitists who (subconsciously, perhaps) want to exclude the “noobs” from their noble profession. You must memorize these abbrevs or you can’t be one of us! – Timwi Jan 14 '13 at 13:31
  • @Timwi, it's not irrelevant. Not in 80s, not now, not in 20 years from this moment. I.e., `gcc` is producing an assembly stream. `gas` is parsing it. It's fairly fast, but it will be nearly 8 times slower if the source stream is 8 times more verbose. As for readability, longer instructions will lead to more cluttered source, won't fit any more into the now short and nice instruction selection patterns. You can still figure out what's going on from the left hand side of the pattern, and it really makes it uncomfortable if you bloat the right hand side. I know, I tried doing it. – SK-logic Jan 14 '13 at 13:36
  • @romkyns, assembly is usually produced by a compiler backend and consumed by the assembler. Therefore, it's a communication protocol. It's the main use. You're not supposed to read or write it routinely. In the manually written code, assembly should only be seen in the instruction selection patterns, and there it's also much better when it's dense, otherwise the (much more important) left hand side of a pattern will become less visible and less readable. – SK-logic Jan 14 '13 at 13:39
  • @SK-logic: ① “it will be nearly 8 times slower if the source stream is 8 times more verbose” — poppycock. Have you measured it? ② “assembly is usually produced by a compiler backend and consumed by the assembler” — also poppycock. That was the case in the 1980s and is only maintained in antiquated rubbish like gcc. Any proper modern compiler simply spits out a binary and there is none of that nonsense. – Timwi Jan 14 '13 at 17:37
  • @Timwi, if by "modern compiler" you mean something like LLVM, it still does exactly the same thing, with the exception of a couple of platforms; producing binary objects directly is still considered an *experimental* feature. So I wonder, which "modern" compilers are you referring to? – SK-logic Jan 14 '13 at 18:11
  • @Timwi, and yes, I profiled assemblers; parsing takes over half of the CPU time. So, my bad, not 8 but 4 times. Still not acceptable. Still not a single reason for more verbose mnemonics. – SK-logic Jan 14 '13 at 18:13
1

Mostly it's idiomatic. As @TMN says elsewhere, just as you don't write `import JavaScriptObjectNotation` or `import HypertextTransferProtocolLibrary` in Python, you don't write `Timer1LowerHalf = 0xFFFF` in C. It looks equally ridiculous in context. Everyone who needs to know already knows.

Resistance to change might arise, in part, from the fact that some C compiler vendors for embedded systems deviate from the language standard and syntax in order to implement features more useful to embedded programming. This means that you can't always use the autocomplete feature of your favourite IDE or text editor when writing low level code, because these customisations defeat their ability to analyse code. Hence the utility of short register names, macros and constants.

For example, HiTech's C compiler included a special syntax for variables that needed to have a user-specified position in memory. You might declare:

`volatile char MAGIC_REGISTER @ 0x7FFFABCD;`

Now the only IDE in existence that will parse this is HiTech's own IDE (HiTide). In any other editor, you'll have to type it out manually, from memory, every time. This gets old very quickly.
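Standard C can do the same job without vendor syntax, and any editor or IDE can parse the result. The following is a minimal sketch of the usual pointer-cast idiom, not HiTech's own documented alternative. `ON_TARGET`, `simulated_magic_register` and the two helper functions are hypothetical names; on a desktop build the "register" is just an ordinary variable, because address 0x7FFFABCD is not mapped there.

```c
#include <stdint.h>

/* Standard-C counterpart of HiTech's
 *     volatile char MAGIC_REGISTER @ 0x7FFFABCD;
 * ON_TARGET is a hypothetical build flag: define it only when compiling for
 * hardware that really has a register at that address. */
#ifdef ON_TARGET
#define MAGIC_REGISTER (*(volatile uint8_t *)0x7FFFABCDu)
#else
/* Host build: back the "register" with an ordinary variable so the sketch
 * compiles and runs on a desktop machine. */
static volatile uint8_t simulated_magic_register;
#define MAGIC_REGISTER (*(volatile uint8_t *)&simulated_magic_register)
#endif

/* Every access goes through a volatile lvalue, so the compiler must emit
 * each read and write instead of caching the value in a CPU register. */
static inline void magic_register_write(uint8_t value) { MAGIC_REGISTER = value; }
static inline uint8_t magic_register_read(void) { return MAGIC_REGISTER; }
```

On the host build, `magic_register_write(0x5A)` followed by `magic_register_read()` gives back 0x5A; on real hardware the read-back depends on what the register actually does.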

Then there's also the fact that when you're using development tools to inspect registers, you'll often have a table displayed with several columns (register name, value in hex, value in binary, last value in hex, etc). Long names mean you have to expand the name column to 13 characters to see the difference between two registers, and play "spot the difference" across dozens of lines of repeated words.

These might sound like silly little quibbles, but isn't every coding convention designed to reduce eye strain, decrease superfluous typing or address any one of a million other little complaints?

detly
  • 1,595
  • 12
  • 13
  • 2
    All of your arguments make sense. I fully understand all of those points. However, don't you think the **exact same thing** applies to high-level code? You also need to see a table of locals in a C# function. Context is subjective and `File.ReadAllBytes` might also look ridiculously long to someone used to `fread`. So... why treat high-level and low-level code *differently*? – Roman Starkov Jan 14 '13 at 13:16
  • @romkyns - I take your point, but I don't think we actually treat high level code **very** differently. Abbreviations are fine in many high-level contexts; we just don't notice because we're used to the abbreviation or whatever scheme goes with it. When I actually write functions or create variables in low level code, I use nice descriptive names. But when I refer to a register, I'm glad that I can glance at a jumble of letters and numbers and quickly think "T = timer, IF = interrupt flag, 1 = first register". It's almost like organic chemistry in that respect :P – detly Jan 14 '13 at 20:38
  • @romkyns - Also, in a purely practical sense, I think the difference between a table of registers in some microprocessor's IDE and application development in C# is this: a table of uP registers might look like: `Timer1InterruptFlag`, `Timer2InterruptFlag`, ..., `Timer9InterruptFlag`, `IOPortAToggleMask`, `IOPortBToggleMask`, etc. x100. In a higher level language you'd use variables that differ much more... or you'd use more structure. `Timer1InterruptFlag` is 75% irrelevant noise compared to `T1IF`. I don't think you'd create a huge list of variables in C# that barely differ like that. – detly Jan 14 '13 at 20:44
  • 1
    @romkyns - What you might not be aware of is the fact that there **has** been a shift towards what you describe. Microchip's recent compilers come with libraries that are far more verbose and descriptive than just registers eg. `UARTEnable(UART1, BITS_8, PARITY_N, STOP_1, BAUD_115200)`. But they're incredibly clunky still, and involve a lot of indirection and inefficiency. I try to use them where possible, but most of the time, I wrap the register manipulation up in my own functions and call it from the higher-level logic. – detly Jan 14 '13 at 20:57
  • @detly: The CCS compiler had such methods, and some other processors do too. I generally dislike them. The register spec is sufficient to write code that uses the registers, and it's sufficient to let someone reading that code see what those registers do. If the act of writing a value of N to a hardware prescaler sets the period to N+1 (quite common), the proper meaning of `set_prescalar(TMR4,13);` is IMHO a lot less clear than would be `TMR4->PSREG=12;`. Even if one looks at the compiler manual to find out what the first code does, one will likely *still* have to... – supercat May 29 '15 at 18:53
  • ...look at the MCU manual to find out how the prescaler interacts with other parts of the timer; given `TMR4->PSREG=12;` one manual suffices for everything. – supercat May 29 '15 at 18:54
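The styles debated in these comments can be put side by side in a short sketch. Everything in it is hypothetical: the `TimerRegs` layout, the "period = PSREG + 1" rule (borrowed from supercat's example), the `TMR(n)` macro and the `timer_set_prescale_period` wrapper are assumptions for illustration, and a plain array stands in for memory-mapped hardware so the code runs on a desktop. It shows structure replacing near-identical flat names, raw register access, and a thin wrapper of the kind detly describes.

```c
#include <stdint.h>

/* Hypothetical register layout for one timer peripheral. On real hardware
 * this struct would be overlaid on the peripheral's memory-mapped address. */
typedef struct {
    volatile uint16_t COUNT;  /* current counter value */
    volatile uint8_t  PSREG;  /* prescaler register: period = PSREG + 1 */
    volatile uint8_t  IF;     /* interrupt flag, 1 = overflow occurred */
} TimerRegs;

/* Host stand-in for timers 0..9; on a target this would be a fixed address. */
static TimerRegs timer_block[10];

/* Short, datasheet-style names, but built from structure instead of
 * dozens of unrelated flat identifiers like T1IF ... T9IF. */
#define TMR(n) (&timer_block[(n)])
#define TMR4   TMR(4)

/* Raw register style: the reader must know from the MCU manual that
 * writing 12 gives a prescale period of 13. */
static inline void raw_style_setup(void) {
    TMR4->PSREG = 12;
}

/* Wrapper style: the N+1 rule is hidden inside the function. Returns 0 on
 * success, -1 if the requested period cannot be encoded in 8 bits. */
static inline int timer_set_prescale_period(TimerRegs *t, unsigned period) {
    if (period < 1u || period > 256u) {
        return -1;
    }
    t->PSREG = (uint8_t)(period - 1u);
    return 0;
}

/* Decode the period the hardware would actually use. */
static inline unsigned timer_prescale_period(const TimerRegs *t) {
    return (unsigned)t->PSREG + 1u;
}
```

Both `raw_style_setup()` and `timer_set_prescale_period(TMR4, 13)` leave `PSREG` at 12. The trade-off is the one argued above: the wrapper removes the off-by-one from the call site, but a reader now needs the wrapper's documentation in addition to the MCU manual.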
1

I'm surprised that no one has mentioned laziness, and that other sciences have not been discussed. My daily work as a programmer shows me that naming conventions for any kind of variable in a program are influenced by three different aspects:

  1. The scientific background of the programmer.
  2. The programming skills of the programmer.
  3. The environment of the programmer.

I think it is pointless to argue about low-level versus high-level programming. In the end, it can always be pinned down to the three aspects above.


An explanation of the first aspect: many "programmers" are not programmers in the first place. They are mathematicians, physicists, biologists, or even psychologists or economists, but many of them are not computer scientists. Most of them have their own domain-specific keywords and abbreviations, which you can see in their naming "conventions". They are often trapped in their domain and use those familiar abbreviations without thinking about readability or coding guidelines.

An explanation of the second aspect: as most of these programmers are not computer scientists, their programming skills are limited. That's why they often care less about coding conventions than about the domain-specific conventions described in the first aspect. Also, if you do not have the skills of a programmer, you do not understand why coding conventions exist. I think most of them don't see the urgent need to write understandable code. It's like fire and forget.

An explanation of the third aspect: you are unlikely to break with the conventions of your environment, which can be old code you have to support, the coding standards of your company (run by economists who don't care about coding) or the domain you belong to. If someone started using cryptic names and you have to support them or their code, you are unlikely to change the cryptic names. If there are no coding standards at your company, I bet almost every programmer will write their own standard. And finally, if you are surrounded by domain users, you will not start writing in a different language from the one they use.

wagnerpeer
  • 192
  • 1
  • 6
  • _no one has mentioned laziness_ - maybe that's because this is not relevant here. And _that other sciences are not discussed_ oh that's easy: this site is not for **discussion**. It's for [questions and answers](http://programmers.stackexchange.com/about-new "as explained here") – gnat Jan 15 '13 at 08:17
  • Laziness is a legit reason. Almost all programmers are lazy people (otherwise we would do everything manually ooo!). – Thomas Eding Jul 03 '14 at 21:32