
I recently encountered a class which provides pretty much every single character as a constant, everything from COMMA to BRACKET_OPEN. Wondering whether this was necessary, I read an "article" which suggests that it may be helpful to pull single-character literals into constants. So, I'm skeptical.

The main appeal of using constants is they minimize maintenance when a change is needed. But when are we going to start using a different symbol than ',' to represent a comma?

The only reason I see for using constants instead of literals is to make the code more readable. But is `city + CharacterClass.COMMA + state` (for example) really more readable than `city + ',' + state`?
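To make the comparison concrete, here is a minimal Java sketch. It assumes `CharacterClass` is a plain holder of `static final char` fields; the question only shows the names `CharacterClass` and `COMMA`, so the class body and the `CityStateDemo` helper are hypothetical. Both methods build exactly the same string, so the only difference is how they read.

```java
// Hypothetical reconstruction of the kind of class described in the question.
final class CharacterClass {
    static final char COMMA = ',';

    private CharacterClass() {} // constants holder, never instantiated
}

class CityStateDemo {
    // Version using the named constant.
    static String withConstant(String city, String state) {
        return city + CharacterClass.COMMA + state;
    }

    // Version using the inline literal.
    static String withLiteral(String city, String state) {
        return city + ',' + state;
    }

    public static void main(String[] args) {
        System.out.println(withConstant("Austin", "TX")); // Austin,TX
        System.out.println(withLiteral("Austin", "TX"));  // Austin,TX
    }
}
```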

For me the cons outweigh the pros, mainly that you introduce another class and another import. And I believe in less code where possible. So, I'm wondering what the general consensus is here.

Bilesh Ganguly
Austin Day
  • Very related: http://programmers.stackexchange.com/questions/221034/usage-of-magic-strings-numbers/221042#221042 – Blrfl Jul 06 '16 at 17:48
  • Hmm... it might be useful for different locales, maybe? For example, some languages use guillemets (angle quotes, `«` and `»`) as quotation marks instead of English's standard `"` (or nicer-looking `“` and `”`). Apart from that, it just sounds like a set of magic characters. Assuming two instances of `CharacterClass` called `englishChars` and `frenchChars`, it's possible that `englishChars.LEFT_QUOTE` might be `“`, while `frenchChars.LEFT_QUOTE` might be `«`. – Justin Time - Reinstate Monica Jul 06 '16 at 19:28
  • There are lots of different variants on commas: https://en.wikipedia.org/wiki/Comma#Comma_variants - perhaps this is not such a dumb idea, especially if your source code can be encoded as utf-8. – Aaron Hall Jul 06 '16 at 19:57
  • It could make sense to do this if the type of the constants matters. Suppose your application produces ASCII files and you want it to do that also after you upgrade to a compiler that does unicode by default for text, this could be the way to go. Something similar could be the case for integer values, using 8, 16 or 32 bits. – Martin Maat Jul 06 '16 at 21:16
  • I can't imagine that this would be the reason why, but you might want to define a constant of a literal character to manage the type, or to be able to hand it to a function as a pointer (why you would want that I have no idea). In particular, C and C++ type character constants differently: in C they have type `int` and in C++ they have type `char`. You might want to define `unsigned char COMMA = ',';`, which implicitly converts the comma constant to the type you want. – Steve Cox Jul 06 '16 at 21:57
  • In your case, it's like calling a variable "number". Your constant should've been called DELIMITER. Or it should be CITY_STATE = "{0}, {1}" – the_lotus Jul 07 '16 at 11:23
  • That article you linked is very terrible. Constants should _never_ be thrown into a bucket like that. Put them on the classes where they have context: in essence, the class with the constant provides the context in which the constant is used. For example, Java's `File.separator`. The class tells you the type of separator. Having a class named `Consts` or `Constants` provides no context and makes constants harder to use correctly. –  Jul 07 '16 at 15:26
  • I still have nightmares about some code where all String literals were replaced by constants. He actually wrote stuff like `StringConstants.LEFT_PARENTHESIS + lastName + StringConstants.COMMA + StringConstants.SPACE + firstName + StringConstants.RIGHT_PARENTHESIS` rather than `"(" + lastName + ", " + firstName + ")"`. I have never been as glad about the inline constant field refactoring as I was back then :-) – meriton Jul 07 '16 at 19:07
  • For those who argue readability and explicit intent with these single-letter constants: the idiomatic way in every language would be its supported string formatting facility. For example, in Java you would write `String.format("(%s,%s)",lastName,firstName)`, and a static import makes it all the more succinct. –  Jul 08 '16 at 15:03
  • Extracting an inlined literal to a constant provides an ABSTRACTION. You remove the explicit coupling to the inlined literal in favor of an abstract field whose value can change without you caring. Now, this CAN be useful. But if you abstract away `','` to `COMMA`, then you haven't really abstracted away anything, because a comma is a comma is a comma. It's not an abstraction, it's just itself. You CAN gain value from this if what the code needs isn't a comma per se, but rather some delimiter for creating a CSV (which can use semicolons etc. too), but even then it's shady. Abstractions have a price. – sara Jul 09 '16 at 07:58
  • It may be easier to avoid overlooking a mix-up of similar characters. `.` and `,` are visually very similar, as are `:` and `;`. If it is very important to get this right, then spelling out the character may make it easier for future readers to be certain that the right characters are used. – Thorbjørn Ravn Andersen Jul 09 '16 at 17:27
  • Would you probably have a special separator/joiner for `city + CharacterClass.COMMA + state` that's designed for the domain scope, say CITY_STATE_JOINER or similar? `CharacterClass.COMMA=','` is nothing else than, let's say, `.red { color: red; }` in CSS -- bound to representation directly, and not to semantics. – Lyubomyr Shaydariv Jul 11 '16 at 12:21
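Two of the comments above suggest replacing the punctuation constants with a format string. A minimal Java sketch of both ideas follows. The constant name `CITY_STATE` and its `{0}, {1}` pattern come from the_lotus's comment; using `java.text.MessageFormat` to interpret those placeholders is an assumption, since the comment does not name a formatting API. The `String.format` variant follows the other comment.

```java
import java.text.MessageFormat;

class CityStateFormatting {
    // A constant named for what it means, not for the characters it contains.
    static final String CITY_STATE = "{0}, {1}";

    // java.text.MessageFormat understands {0}-style placeholders.
    static String withMessageFormat(String city, String state) {
        return MessageFormat.format(CITY_STATE, city, state);
    }

    // The String.format alternative, producing the same layout.
    static String withStringFormat(String city, String state) {
        return String.format("%s, %s", city, state);
    }

    public static void main(String[] args) {
        System.out.println(withMessageFormat("Austin", "TX")); // Austin, TX
        System.out.println(withStringFormat("Austin", "TX"));  // Austin, TX
    }
}
```

Either way, the separator lives inside a pattern that describes the whole output, so there is no free-floating `COMMA` to misuse elsewhere.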

15 Answers


Tautology:

It is clear from the very first sentence of the question that it is not about appropriate uses such as eliminating magic numbers; it is about terrible, mindless, foolish consistency at best, which is what this answer addresses.

Common sense tells you that `const char UPPER_CASE_A = 'A';` or `const char A = 'A'` does not add anything but maintenance and complexity to your system. `const char STATUS_CODE.ARRIVED = 'A'` is a different case.

Constants are supposed to represent things that are immutable at runtime, but may need to be modified in the future at compile time. When would `const char A` correctly equal anything other than `'A'`?
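The difference between `A` and `STATUS_CODE.ARRIVED` is that the second name says what the character means in the domain. In Java that is often expressed as an enum. A minimal sketch, assuming a hypothetical `StatusCode` type; the second value `DEPARTED` and its code `'D'` are invented for illustration.

```java
// Hypothetical status codes; only ARRIVED = 'A' comes from the answer.
enum StatusCode {
    ARRIVED('A'),
    DEPARTED('D');

    private final char code;

    StatusCode(char code) {
        this.code = code;
    }

    // The single character stored or transmitted for this status.
    char code() {
        return code;
    }

    // Looks up a status from its character; rejects unknown codes.
    static StatusCode fromCode(char c) {
        for (StatusCode s : values()) {
            if (s.code == c) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown status code: " + c);
    }

    public static void main(String[] args) {
        System.out.println(fromCode('A')); // ARRIVED
    }
}
```

Here `'A'` could legitimately become something else one day, because its meaning is "arrived", not "the letter A".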

If you see `public static final char COLON = ':'` in Java code, find whoever wrote that and break their keyboards. If the representation for `COLON` ever changes from `:`, you will have a maintenance nightmare.

Obfuscation:

What happens when someone changes it to `COLON = '-'` because where they are using it needs a `-` instead, and the change applies everywhere? Are you going to write unit tests that basically say `assertThat(':' == COLON)` for every single constant reference to make sure they do not get changed, only to have someone "fix" the test when they change them?

If someone actually argues that `public static final String EMPTY_STRING = "";` is useful and beneficial, you have just qualified their knowledge and can safely ignore them on everything else.

Having every printable character available in a named version just demonstrates that whoever did it is not qualified to be writing code unsupervised.

Cohesion:

It also artificially lowers cohesion, because it moves things away from the things that use them and are related to them.

In computer programming, cohesion refers to the degree to which the elements of a module belong together. Thus, cohesion measures the strength of relationship between pieces of functionality within a given module. For example, in highly cohesive systems functionality is strongly related.

Coupling:

It also couples lots of unrelated classes together because they all end up referencing files that are not really related to what they do.

Tight coupling is when a group of classes are highly dependent on one another. This scenario arises when a class assumes too many responsibilities, or when one concern is spread over many classes rather than having its own class.

If you used a better name like `DELIMITER = ','`, you would still have the same problem, because the name is generic and carries no semantics. Reassigning the value does no more to help with an impact analysis than searching and replacing the literal `','`. What if some code uses it and needs the `,`, and some other code uses it but now needs `;`? You still have to look at every use manually and change them.

In the Wild:

I recently refactored a 1,000,000+ LOC application that was 18 years old. It had things like `public static final String COMMA = SPACE + "," + SPACE;`. That is in no way better than just inlining `" , "` where it is needed.

If you want to argue readability, you need to learn to configure your IDE to display whitespace characters so you can see them, or whatever; readability is just an extremely lazy reason to introduce entropy into a system.

It also had `,` defined multiple times, under multiple misspellings of the word COMMA, in multiple packages and classes, with references to all the variations intermixed in the code. It was nothing short of a nightmare to try to fix something without breaking something completely unrelated.

The same went for the alphabet: there were multiple constants such as UPPER_CASE_A, A, UPPER_A and A_UPPER that most of the time were equal to `A` but in some cases were not. This held for almost every character, but not all of them.

From the edit histories, it did not appear that a single one of these was ever changed over the 18 years, for what should now be an obvious reason: changing one would break too many things that were untraceable. So instead you got new variable names pointing to the same value, which could never be changed for the same reason.

In no sane reality can you argue that this practice does anything but start out at maximum entropy.

I refactored all this mess out and inlined all the tautologies, and the new college hires became much more productive because they no longer had to hunt through multiple levels of indirection to find out what these constant references actually pointed to; the names were not a reliable guide to what they contained.

  • I think I once encountered in C "typedef long* UINTPTR". – gnasher729 Jul 06 '16 at 18:32
  • Maybe you should add a counterexample: `const char DELIMITER = ':'` would actually be useful. – Bergi Jul 06 '16 at 19:03
  • @gnasher729 And that's how everything is `typedef`'d in Windows' header files. There are horrendous things such as `LPCTSTR` meaning "Long Pointer to Constant Tchar STRing", aka `const TCHAR*`. – ElementW Jul 06 '16 at 20:39
  • It is explicit in the first sentence that this is not about the counterexample; *I have recently encountered a class which provides pretty much every single-character as a constant, everything from COMMA to BRACKET_OPEN.* And not about when it is obviously appropriate. –  Jul 06 '16 at 21:13
  • I would make several arguments that `EMPTY_STRING` is beneficial. (1) I can much more easily find all uses of `EMPTY_STRING` in a file than I can find all uses of `""`. (2) when I see `EMPTY_STRING` I know for darn sure that the developer intended that string to be empty, and that it is not a mis-edit or a placeholder for a string to be supplied later. Now, you claim that by me making this argument that you may qualify my knowledge, and safely ignore me forever. So, how do you qualify my knowledge? And are you planning on ignoring my advice forever? I have no issue either way. – Eric Lippert Jul 06 '16 at 22:32
  • @ElementW That is more excusable as most of the typedefs used to be different types, or are different types on some platforms. For example `WPARAM` used to be 16 bits while `LPARAM` was 32 bits, but then they were both 32, and then both 64. Also the standard C fixed-width types weren't around at that time. – user253751 Jul 06 '16 at 22:42
  • @JarrodRoberson But the KEY_VALUE_PAIR_DELIMITER for JSON will ***always*** be `':'`, so it's not more useful than `COLON = ':'`. But if you're writing delimiter-separated-values files you might want to change the delimiters in the future. – user253751 Jul 06 '16 at 22:45
  • @immibis: We can stop thinking about these things as useful in the context of managing change. They're constants. They don't change. Think of them as useful in the context of *humans searching and comprehending the semantics of code*. Knowing that something is a key-value-pair-delimiter is *much* more useful than knowing it is a colon; that is a fact about the *semantic domain* of the program's concern, not its *syntax*. – Eric Lippert Jul 06 '16 at 23:14
  • @EricLippert: I'm kinda seeing the point of others here who point out that the only guarantee that a `const` provides is that it won't change at runtime (after compilation), though I do agree with you that the semantic meaning of the `const` is far more important than its use as a change management tool. That said, I can certainly imagine a `const EARLIEST_OS_SUPPORTED` which is not only semantically consistent, but will also change over time as the program evolves and old cruft is removed. – Robert Harvey Jul 06 '16 at 23:23
  • @EricLippert Because the fact that it's being written between the key and the value doesn't tell the reader that it's a key/value delimiter? – user253751 Jul 06 '16 at 23:33
  • Another benefit of the empty string constant arises in languages that don't intern string literals by default. Ruby strings are mutable and can't be interned, which means they cost 40 bytes per use (ballpark; reference to the string + RTTI + internal reference to an array + RTTI + length) instead of just 8. Would you prefer writing `String.EMPTY` everywhere, or `String.intern("")` (neither of which exists in Ruby 2.2, and the former of which is much easier to implement)? Ruby 2.3 allows making strings immutable across the project (then literals intern), but you might not want that. – John Dvorak Jul 07 '16 at 07:54
  • @immibis That presupposes that the reader can identify the key and the value. – Taemyr Jul 07 '16 at 11:44
  • @EricLippert Could you explain how `EMPTY_STRING` is more easily found than `""` (your point (1))? To me this (single) argument screamed "nonsense", but I know of some very good posts of yours, so I'm curious whether there's something I'm missing? – Daniel Jour Jul 07 '16 at 18:20
  • @DanielJour: In C#, a verbatim [string literal](https://msdn.microsoft.com/en-us/library/aa691090(v=vs.71).aspx) is designated by an @ sign in front of it, as in `@"This is a ""Literal String."""` Notice the repeating quotation marks? That's the only way you can escape a double-quote in a verbatim literal string and is also the only escape sequence available in such strings. `String.Empty` will distinguish an empty string in C# code from escaped double quotes in verbatim string literals. – Robert Harvey Jul 07 '16 at 18:34
  • @DanielJour: As Robert Harvey notes, "" can be difficult to grep for in many languages. But more generally, we wish to stop thinking that grep is a suitable tool for searching code. *Code is the reification of the semantics of business logic*, so we should be doing searches at the *semantic* level, not the *lexical* level. An IDE can surface a "find all references of this symbol" feature that guarantees that your search results are only those things which you actually seek. – Eric Lippert Jul 07 '16 at 19:19
  • @DanielJour: So this then is a third argument for `EMPTY_STRING`; that a well-designed IDE will surface tools that allow me to treat this entity symbolically, rather than syntactically. Generalize this to a fourth argument: that the library of code analysis tools that sits below the IDE may allow for advanced *programmatic* analysis of code correctness *at the symbolic level*. A developer who wishes to take advantage of tools more advanced than those written literally 40 years ago need only make small changes to their habits in order to reap the rewards of advanced tooling. – Eric Lippert Jul 07 '16 at 19:23
  • @EricLippert - that isn't a third argument for `EMPTY_STRING` specifically. It is a valid first argument for structural refactoring, but not for the very specific case of `EMPTY_STRING`. Now if it was `DEFAULT_STRING = ""` or some other domain specific semantic case but for `EMPTY_STRING` there is no argument. –  Jul 07 '16 at 20:37
  • "That is in no way better than just inlining `" , "` where it is needed." The phrase "Syntax must not look like grit on Tim's screen." comes to mind. Code that's heavy with punctuation constants and constants including leading or trailing space characters can be very tedious to read. – Random832 Jul 07 '16 at 21:25
  • @EricLippert If your argument is we should use an IDE which searches syntax tokens instead of strings with grep, there should be an option in the IDE to search for "Empty Strings" and the IDE will use the parsed syntax-tree to find all instances of an empty string, without me having to define a constant EMPTY_STRING and search for that... – Falco Jul 08 '16 at 11:44
  • @EricLippert Year 1976 is the year Cray-1, a supercomputer ten times slower than my cellphone, was built. Do you have a specific programming tool in mind, or just primitive text editors in general? – John Dvorak Jul 08 '16 at 14:01
  • @JanDvorak: I was thinking specifically of grep. But more generally yes, the state of developer tools outside of the .NET and JavaVM worlds has shown remarkably little progress in the productivity tools side of things. The fact that I am again using vi at least once a day is not a good thing. – Eric Lippert Jul 08 '16 at 14:55
  • @EricLippert - your problem is you are using `vi` instead of `emacs` ;-) but seriously, JetBrains has every mainstream language (and most non-mainstream ones like Erlang) covered pretty well right now. Java, .Net, C, C++, Python, Ruby, JavaScript and HTML/CSS are pretty well represented; I mean, what do you want? –  Jul 08 '16 at 14:58
  • @eric nice, thanks. I didn't realize grep was _that_ old. But speaking of biggest improvements, isn't one of them an automated build script machinery? Open a webpage on one screen, your ide on the other, then watch the webpage change at the very moment you save the CSS file. Unity gets close, too. – John Dvorak Jul 08 '16 at 15:10
  • @JanDvorak: Grep was released in 1973. There certainly have been some improvements; given the choice between phabricator and `make`, I am very glad that today I use phabricator every day and `make` never. But there are still plenty of development shops that use `make`. – Eric Lippert Jul 08 '16 at 15:26
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/42240/discussion-on-answer-by-jarrod-roberson-are-single-character-constants-better-th). – yannis Jul 09 '16 at 10:19
  • `find whomever wrote that and break their keyboards` He had an [IBM Model M](https://en.wikipedia.org/wiki/Model_M_keyboard). Now I need a new hammer. – dotancohen Jul 10 '16 at 06:54
  • Great answer. Agreed with request for a counter example. One important, often missed, use of named constants is to assign a semantic label to a commonly used value for clarity, e.g. (really contrived example) `ASSIGNMENT_OPERATOR = '='; SPREADSHEET_FORMULA_START = '=';` rather than `EQUAL_SIGN = '=';` indicates what you're using that value for in context (or say `POLL_INTERVAL=30; EXPIRATION_TIME=30;` rather than `THIRTY=30;` or just using `30` everywhere and not knowing what different `30`s mean). – Jason C Jul 10 '16 at 19:06
  • PI = 3.14 - yes. THREE_POINT_ONE_FOUR = 3 - no. If one cannot understand the difference they are beyond help and should not be allowed to touch a computer. – cyborg Oct 10 '18 at 10:24

The main appeal of using constants is they minimize maintenance when a change is needed.

ABSOLUTELY NOT. This is not at all the reason to use constants because constants do not change by definition. If a constant ever changes then it was not a constant, was it?

The appeal of using constants has nothing whatsoever to do with change management and everything to do with making programs amenable to being written, understood and maintained by people. If I want to know everywhere in my program where a colon is used as a URL separator, then I can know that very easily if I have the discipline to define a constant `URLSeparator`, and cannot know that easily at all if I have to grep for `:` and get every single place in the code where `:` is used to indicate a base class, or a `?:` operator, or whatever.

I thoroughly disagree with the other answers which state that this is a pointless waste of time. Named constants add meaning to a program, and those semantics can be used by both humans and machines to understand a program more deeply and maintain it more effectively.

The trick here is not to eschew constants, but rather to name them with their semantic properties rather than their syntactical properties. What is the constant being used for? Don't call it Comma unless the business domain of your program is typography, English language parsing, or the like. Call it ListSeparator or some such thing, to make the semantics of the thing clear.
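A minimal Java sketch of naming by role rather than by spelling. It assumes `LIST_SEPARATOR` stands in for the answer's `ListSeparator`, and that the "URL separator" is the colon after the scheme (the answer does not say which colon); `formatList` and `schemeOf` are hypothetical helpers. A "find references" search on `URL_SCHEME_SEPARATOR` would turn up only URL code, whereas a text search for `:` in Java also hits every ternary and enhanced `for` loop.

```java
import java.util.List;

// Constants named for their role rather than for the character they hold.
final class Separators {
    // Separates items in a human-readable list.
    static final String LIST_SEPARATOR = ", ";

    // Assumed meaning: the colon between a URL scheme and the rest of the URL.
    static final char URL_SCHEME_SEPARATOR = ':';

    private Separators() {}

    // Hypothetical helper: joins list items with the list separator.
    static String formatList(List<String> items) {
        return String.join(LIST_SEPARATOR, items);
    }

    // Hypothetical helper: returns the scheme part of a URL, or "" if none.
    static String schemeOf(String url) {
        int i = url.indexOf(URL_SCHEME_SEPARATOR);
        return i < 0 ? "" : url.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(formatList(List.of("red", "green", "blue"))); // red, green, blue
        System.out.println(schemeOf("https://example.com"));             // https
    }
}
```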

Eric Lippert
  • While I agree with the spirit of what you're saying here, your second/third sentences aren't really correct. A constant can change between versions of a file. In fact, most programs I write have a constant named something like `MY_VER`, which contains the current version number of the program, which can then be used throughout the remainder of the program rather than a magic string like "5.03.427.0038". The added benefit is as you say that it's provided semantic information. – Monty Harder Jul 06 '16 at 19:50
  • To be fair, the point of a constant is that it doesn't change during runtime after being initialised, not that it doesn't change between compilations. From a compiler's perspective, the point is that the compiler can make assumptions that the program is unable to modify it; whether the programmer is allowed to modify it when they recompile doesn't change its constant-ness. There can also be cases where software takes a read-only value from hardware, maybe by dereferencing a `const volatile T*` pointer to a predetermined address; while the program can't change it, the hardware can. – Justin Time - Reinstate Monica Jul 06 '16 at 20:01
  • @MontyHarder: Good point. My opinion is informed by the fact that I typically use languages that distinguish between constants -- which must be forever unchanging -- and *variables which may be assigned once* -- which can change from version to version, run to run, or whatever. A constant and a variable are different things; one stays the same and one varies over time. – Eric Lippert Jul 06 '16 at 20:04
  • @JustinTime That's really only the C++ `const` keyword you're talking about; most other languages use "constant" to just refer to the construction in the OP. Even then, most of the time that keyword shows up in code, the compiler can't assume the value is immutable, e.g. `int func(const int *i){/*...*/}`, `void myclass::fun() const {/*...*/}` – Steve Cox Jul 06 '16 at 22:25
  • @SteveCox: I agree; the way C/C++ characterize "const" is weird and of limited use. The property I want of constants is that their values do not change, not that I am restricted from changing them in some functions but not in others. – Eric Lippert Jul 06 '16 at 22:27
  • @JustinTime In addition to what others point out about C++'s `const`, I'd like to mention that originally Bjarne was going to call it `readonly` but chose not to purely because he was trying to avoid adding new keywords if it was possible to avoid doing so. In addition, something marked `const` in C++ is never truly constant thanks to `const_cast`. – Pharap Jul 07 '16 at 08:50
  • "This is not at all the reason to use constants because constants do not change by definition. If a constant ever changes then it was not a constant, was it?" Changing constants at compile time (not runtime obviously) is perfectly normal. That's why you made them a clearly labeled "thing" in the first place. Of course, the constants of the OP are junk, but think of something like `const VERSION='3.1.2'` or `const KEYSIZE=1024` or whatever. – AnoE Jul 07 '16 at 09:14
  • @Pharap: All objects declared `const` are constant in C++. If you choose to lie to the compiler by using `const_cast` on a `const` object, you will invoke undefined behavior. – You Jul 07 '16 at 10:30
  • I think the discussion about how constant constants should be is a bit irrelevant. Even if you allow for changing constants the benefit you get in change management arises directly from the fact that you are capturing meaning rather than syntactic property. – Taemyr Jul 07 '16 at 11:53
  • Are there programming languages that define constants as "forever unchanging"? I've seen plenty of mathematical constants (Planck's constant, pi, gas constant, etc.) defined like that, but only ever heard it discussed in programming in terms of variables that other parts of a program should not change (the C/C++ construction...) – John-M Jul 07 '16 at 16:10
  • @John-M: C# distinguishes between `const` -- which *must* be unchanging because a const which changes requires recompiling *everything* that depends on your library, not just your library -- and `readonly` which is a variable in a constructor and a value everywhere else. – Eric Lippert Jul 07 '16 at 16:26
  • +1 You wonderfully capture the gist of the answer I now won't have to write. It is all about the _semantics_. If your named literal provides meaning it adds value. – Mr.Mindor Jul 07 '16 at 20:19
  • @SteveCox True, my point is mainly that when it comes to programming, there's a lot more leeway in the term "constant" than in most other places. They can be anything from truly constant, to 'constant' between major updates, to 'constant' for a single session, to merely read-only, with different languages offering different amounts of leeway; the first two will be initialised at compile time and baked into the finished program (if possible), the third will be initialised during runtime, and the fourth will be controlled by outside sources. – Justin Time - Reinstate Monica Jul 08 '16 at 00:50
  • @Pharap Good point, I forgot about `readonly`. – Justin Time - Reinstate Monica Jul 08 '16 at 00:50
  • -1: I think this answer ignores the scenario described by the question, in which there is no good reason to have constants for each single character. Also, I disagree that *"the appeal of using constants has nothing whatsoever to do with change management"* -- it may not be the ideal main motivation (readability) but it is certainly an appealing side-benefit. It is definitely easier to change a constant's value than to change every place in the code where it is used. – Jordan Rieger Jul 08 '16 at 21:00
  • This answer is dangerously divorced from real-world cost benefit analysis. There is indeed some value to using constants - that's not in dispute. But clearly, that value depends on the details of the case - some constants are critical to understanding, others are barely relevant. The one-letter (and empty-string) constants in *this* question are some of the least valuable. Yet the costs are non-trivial. Layers of indirection (even trivial) make it harder to reason about what code does - just try showing such code to a new programmer, and watch the time it takes to cut through the cruft. – Eamon Nerbonne Jul 11 '16 at 18:54
  • @EamonNerbonne: The function of an abstraction is to decrease the cognitive burden upon the reader; if what you're saying is that abstractions which increase cognitive burden are bad abstractions, then I certainly agree with you! It is of course a truism of design that all problems can be solved by adding more abstraction except one: "I have too much abstraction". – Eric Lippert Jul 11 '16 at 19:00
  • @EricLippert *every* abstraction is a burden to the reader. Almost always that's worth it, because the alternative is even more burdensome, but I seriously doubt that in real code of a decent size with a long maintenance history - even if that is good code - that most of these constants are worth it. They just aren't abstracting away meaningful complexity; yet their existence imposes friction. (If the code is short lived or small I don't think it really matters either way). – Eamon Nerbonne Jul 11 '16 at 19:03
  • ...let me get this straight. You have never, once in your life, defined a constant in a program, and then at a later stage of development changed the definition of that constant for any reason whatsoever? Because if you have... WTF were you thinking?! Constants are **constant**! They are not meant to be changed! It's in the name! If you had to change it later it wasn't a constant now was it? It's programming 101! –  Jul 12 '16 at 13:31
  • @NajibIdrissi: Well I was going to say no, but your question made me think about it pretty hard, and yes, I did once do that. I changed the value of a named IID in a COM library back in... 1997 I think it was. Those are logically constant. The reason was in order to work around a complicated bug that was introduced by a coworker *incorrectly* changing a different constant value that had already been published; my fix was to resolve the unintentional inconsistency in a manner that broke the fewest customers. I do not recall the exact details though; it was 19 years ago after all. – Eric Lippert Jul 12 '16 at 14:45
  • Multiple decades without making a single mistake when defining a constant. I say bravo. –  Jul 12 '16 at 17:06

No, that is dumb.

What is not necessarily dumb is pulling things like that into named labels for localization reasons. For example, the thousands delimiter is a comma in America (1,000,000), but not a comma in other locales. Pulling that into a named label (with an appropriate, non-comma name) allows the programmer to ignore/abstract those details.
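In Java the locale-specific thousands separator does not have to be a hand-maintained label at all; the JDK already exposes it through `java.text.DecimalFormatSymbols` and applies it in `java.text.NumberFormat`. A minimal sketch (the class and helper names are hypothetical):

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class GroupingSeparatorDemo {
    // Asks the JDK's locale data for the grouping (thousands) separator
    // instead of hard-coding ',' behind a constant.
    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    // Formats an integer using the given locale's conventions.
    static String format(long n, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(n);
    }

    public static void main(String[] args) {
        System.out.println(format(1_000_000L, Locale.US));      // 1,000,000
        System.out.println(format(1_000_000L, Locale.GERMANY)); // 1.000.000
    }
}
```

As the comments below point out, real localization involves more than swapping one character, which is another reason to lean on library locale data rather than a home-grown constant.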

But making a constant because "magic strings are bad" is just cargo culting.

Telastyn
  • Localization is usually more complicated than just string constants. For example, some languages want list delimiter between all list items, while others exclude the delimiter before the last item. So, usually one needs not localized constants, but localized _rules_. – Vlad Jul 07 '16 at 08:51
  • Actually the thousands delimiter is not necessarily a thousands delimiter in other locales (China/Japan). It's not even set after a constant number of digits (India). Oh, and there may be different delimiters depending on if it's a 1000 delimiter or the 1000000 delimiter (Mexico). But that's less of a problem than not using ASCII digits 0-9 in some locales (Farsi). http://ux.stackexchange.com/questions/23667/is-adding-commas-to-numbers-a-cultural-thing – Peter Jul 07 '16 at 09:43
  • @Vlad Localization is much more complex than that, however, the thousands separator is a well-known example that people recognize. –  Jul 07 '16 at 15:27
  • It depends on the localization strategy ... do you change all the constants in your program to translate it? Or should you rather read the values from a file (or other data store), making them effectively runtime variables? – Paŭlo Ebermann Jul 09 '16 at 20:03
  • That wouldn't be useful at all as a constant, then. The program would need to be recompiled for each locale, which is awful practice. They should be variables loaded in from definition files and looked up as needed. Not that I disagree with the point (I voted the answer up), but I'd take a harder position on the matter. –  Jul 11 '16 at 19:45

There are a few characters that can be ambiguous or are used for several different purposes. For example, we use '-' as a hyphen, a minus sign, or even a dash. You could make separate names as:

static const wchar_t HYPHEN = L'-';
static const wchar_t MINUS = L'-';
static const wchar_t EM_DASH = L'-';

Later, you could choose to modify your code to disambiguate by redefining them as:

static const wchar_t HYPHEN = L'-';
static const wchar_t MINUS = L'\u2212';
static const wchar_t EM_DASH = L'\u2014';

That might be a reason why you'd consider defining constants for certain single characters. However, the number of characters that are ambiguous in this manner is small. At most, it seems you'd do it only for those. I'd also argue that you could wait until you actually have a need to distinguish the ambiguous characters before you factor the code in this manner.

As typographical conventions can vary by language and region, you're probably better off loading such ambiguous punctuation from a translation table.
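A minimal Java sketch of such a translation table. It uses quotation marks rather than dashes because they vary between languages more visibly. The class name `Typography`, the two sample entries and the ASCII fallback are assumptions for illustration; in practice the table would more likely live in per-locale resource files than in code.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical translation table mapping a language code to its opening and
// closing quotation marks. A real application would load this from resource
// files rather than hard-coding it.
final class Typography {
    private static final Map<String, String[]> QUOTES = Map.of(
        "en", new String[] {"\u201C", "\u201D"},  // English curly double quotes
        "fr", new String[] {"\u00AB", "\u00BB"}); // French guillemets

    private static final String[] FALLBACK = {"\"", "\""}; // plain ASCII double quotes

    // Wraps text in the quotation marks for the locale's language,
    // falling back to ASCII quotes for languages missing from the table.
    static String quote(String text, Locale locale) {
        String[] marks = QUOTES.getOrDefault(locale.getLanguage(), FALLBACK);
        return marks[0] + text + marks[1];
    }

    public static void main(String[] args) {
        System.out.println(quote("Bonjour", Locale.FRENCH));
        System.out.println(quote("Hello", Locale.ENGLISH));
    }
}
```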

Adrian McCarthy
  • For me this is the only valid reason one might create character constants – F.P Jul 07 '16 at 08:03
  • Using `-` as an em dash is quite misleading ... it is much too short for that in most fonts. (It is even shorter than an en dash.) – Paŭlo Ebermann Jul 09 '16 at 20:04
  • OK, not the best example. I started out with `string`s rather than `wchar_t`s and used the standard manuscript convention of `"--"` for the dash. But the original example was using single characters, so I switched to stay true to the question. There are folks who type `-` for dashes, especially when working in a fixed pitch font. – Adrian McCarthy Jul 11 '16 at 21:15
  • @PaŭloEbermann No, traditionally an em dash is the width of a typeface's 'm' character and an en dash is the width of an 'n' character. – Dizzley Jul 12 '16 at 13:48
  • @Dizzley yes, and hyphen-width < n-width < m-width. – Paŭlo Ebermann Jul 12 '16 at 15:01
  • @PaŭloEbermann My problem, I didn't understand what you wrote on first reading. We are in agreement. – Dizzley Jul 13 '16 at 08:46
22

A constant must add meaning.

Defining COMMA to be a comma doesn't add meaning, because we know that a comma is a comma. Instead we destroy meaning, because now COMMA might actually not be a comma anymore.

If you use a comma for a purpose and want to use a named constant, name it after its purpose. Example:

  • city + CharacterClass.COMMA + state = bad
  • city + CITY_STATE_DELIMITER + state = good

Use functions for formatting

I personally prefer FormatCityState(city, state) and don't care about how the body of that function looks as long as it's short and passes the test cases.
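A minimal Java sketch of that approach. The names `formatCityState` and `CITY_STATE_DELIMITER` are hypothetical, and the `", "` separator is an assumption; the point is that the delimiter is private to the one method that needs it, so callers never depend on its value.

```java
// Hypothetical formatting function: the delimiter is named after its purpose
// and kept private, so only this class knows what it is.
final class Addresses {
    private static final String CITY_STATE_DELIMITER = ", ";

    static String formatCityState(String city, String state) {
        return city + CITY_STATE_DELIMITER + state;
    }

    public static void main(String[] args) {
        System.out.println(formatCityState("Springfield", "IL"));
    }
}
```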

Peter
  • Ah, but a comma is not always the same comma. I could define COMMA = '\u055D' or '\u060C' etc. (see Unicode) or even turn it into a variable later and read it from a config file. That way, it will still have the same _meaning_, but just a different value. How about that. – Mr Lister Jul 09 '16 at 15:48
  • @MrLister: YAGNI. If you have that need: great! You have a fine solution. But if you don't - don't clutter your code because possibly maybe you one day might. Also, in my experience if you try to introduce abstractions with no function in your codebase, people aren't great at being consistent. So, even if you did define COMMA with the intent of using some other codepoint, in a program of sufficient size and age such that the choice matters at all, you're likely to find that the constant wasn't used everywhere it should have been (and conversely, may have been used inappropriately too). – Eamon Nerbonne Jul 11 '16 at 20:50
17

The idea that a constant COMMA is better than ',' or "," is rather easy to debunk. Sure, there are cases where it makes sense; for example, making final String QUOTE = "\""; saves heavily on readability without all the backslashes. But apart from characters that need escaping in the language, like \ ' and ", I haven't found such constants to be very useful.

Using final String COMMA = "," is not only bad form, it's dangerous! When someone wants to change the separator from "," to ";" they might go change the constants file to COMMA = ";" because it's faster for them to do so and it just works. Except, you know, all the other things that used COMMA now also are semicolons, including things sent to external consumers. So it passes all your tests (because all the marshalling and unmarshalling code was also using COMMA) but external tests will fail.

What is useful is to give them useful names. And yes, sometimes multiple constants will have the same contents but different names. For example final String LIST_SEPARATOR = ",".

So your question is "are single-char constants better than literals?" and the answer is unequivocally no, they aren't. But even better than both of those is a narrowly scoped variable name that explicitly says what its purpose is. Sure, you'll spend a few extra bytes on those extra references (assuming they don't get compiled out on you, which they probably will), but in long-term maintenance, which is where most of the cost of an application is, they are worth the time to make.
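A small Java sketch of that idea: two constants that hold the same value today but stand for different decisions. The constant names and helper methods are hypothetical. If someone later changes `LIST_SEPARATOR` to `";"`, the export format that external consumers parse is unaffected.

```java
import java.util.List;

// Hypothetical constants that share a value but not a meaning.
final class Separators {
    // How lists are shown to people; free to change for display reasons.
    static final String LIST_SEPARATOR = ",";
    // Part of a format that external consumers parse; changing it breaks them.
    static final String EXPORT_FIELD_SEPARATOR = ",";

    static String displayList(List<String> items) {
        return String.join(LIST_SEPARATOR, items);
    }

    static String exportRow(List<String> fields) {
        return String.join(EXPORT_FIELD_SEPARATOR, fields);
    }

    public static void main(String[] args) {
        System.out.println(displayList(List.of("red", "green")));
        System.out.println(exportRow(List.of("Austin", "TX")));
    }
}
```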

corsiKa
  • How about conditionally defining DISP_APOSTROPHE as either an ASCII 0x27 or a Unicode single right quote character (which is a more typographically-appropriate rendition of an apostrophe), depending upon the target platform? – supercat Jul 06 '16 at 21:21
  • Actually, the `QUOTE` example proves it is a bad idea as well, since you are assigning it to what is generally/popularly known as the `DOUBLE QUOTE`, and `QUOTE` implies `SINGLE_QUOTE`, which is more correctly referred to as `APOSTROPHE`. –  Jul 06 '16 at 21:38
  • @JarrodRoberson I don't feel quote implies single quote, personally - but that's another good reason to remove ambiguity where you can! – corsiKa Jul 06 '16 at 21:59
  • I don't like the `QUOTE` example for an additional reason - it makes reading strings constructed with it even harder `"Hello, my name is " + QUOTE + "My Name" + QUOTE` this is a trivial example and yet it still looks bad. Oh, sure, instead of concatenation you can use replace tokens, too `"Hello, my name is %sMy Name%s".format(QUOTE, QUOTE)` may just be worse. But, hey, let's try indexed tokens `"Hello, my name is {0}My Name{0}".format(QUOTE)` ugh, not that much better. Any non-trivial string generated with quotes in it would be even worse. – VLAZ Jul 08 '16 at 18:18
  • @Vld What's your alternative then? I like what you put better than `"Hello, my name is \"My Name\""` personally - that's a matter of taste, I guess. – corsiKa Jul 08 '16 at 18:21
  • @corsiKa - I'll live with the escaped actual quotes. If I miss escaping one, the IDE I use would immediately complain. Code most likely won't compile, either. It's fairly easy to spot. How easy is it to make a mistake when doing `"My name is" + QUOTE + "My Name" + QUOTE`? I actually made that same mistake _three times_ writing the above comment. Can you spot it? If it takes you a bit, it's the missing space after *is*. Do you format the string? In which case, a string with multiple tokens to replace is going to be even worse to work out. How am I to use it so that it's more readable? – VLAZ Jul 08 '16 at 18:29
  • @supercat using the "right quote character" as an apostrophe sounds similarly wrong (but in the other direction) as using the apostrophe as a single-quote character (which is done in many programming languages due to its availability in ASCII). Maybe your font did mix them up? – Paŭlo Ebermann Jul 09 '16 at 20:09
  • @PaŭloEbermann I'm afraid I don't follow your question. Did I mix what up where? – corsiKa Jul 09 '16 at 21:48
  • @PaŭloEbermann: In most fonts, the ASCII character 0x27 is neither an apostrophe nor a right single quote. It's a prime, and ASCII 0x22 is a double-prime. Both are used (correctly) in measurements like 4'3", and prime is also used when describing e.g. variables x and x', y and y', etc. I was unaware of any separate proper apostrophe other than the right quote. – supercat Jul 11 '16 at 14:33
  • @corsiKa my comment was a reply to supercat's first comment, not to your answer. Sorry for any confusion. – Paŭlo Ebermann Jul 11 '16 at 15:53
  • @supercat so I guess the Unicode consortium got the naming of the characters wrong? U+2019 seems to be meant for both apostrophe and right quotation sign ... ugh. Okay, not your fault here, Unicode's one. (I think they should have used two different characters for the two meanings.) – Paŭlo Ebermann Jul 11 '16 at 16:07
  • @PaŭloEbermann: Character 0x27 was called the apostrophe long before fonts with a proper right-quote symbol became commonplace; since right single quotes in places where programming languages require character 0x27, the name "apostrophe" for code 0x27 may be inaccurate, but it's apt to cause less confusion than calling it anything else. – supercat Jul 11 '16 at 16:18
  • @supercat Yes, renaming the ASCII character would likely cause confusion, but when introducing the "right quotation sign", they could have also introduced a new character "typographic apostrophe" or similar (even if both map to the same glyph in most fonts). – Paŭlo Ebermann Jul 12 '16 at 07:55
3

I've done some work writing lexers and parsers and used integer constants to represent terminals. Single-character terminals happened to have the ASCII code as their numeric value for simplicity's sake, but the code could have been something else entirely. So, I'd have a T_COMMA that was assigned the ASCII-code for ',' as its constant value. However, there were also constants for nonterminals which were assigned integers above the ASCII set. From looking at parser generators such as yacc or bison, or parsers written using these tools, I got the impression that's basically how everybody did it.

So, while, like everybody else, I think it's pointless to define constants for the express purpose of using the constants instead of the literals throughout your code, I do think there are edge cases (parsers, say) where you might encounter code riddled with constants such as you describe. Note that in the parser case, the constants aren't just there to represent character literals; they represent entities that might just happen to be character literals.
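A rough Java sketch of the token-code scheme described above, loosely modelled on yacc-style parsers: single-character terminals reuse their character code, and other tokens get codes above the single-byte range. The token names, the value 257 and the tiny lexer are assumptions for illustration, not taken from any particular generator's output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token codes: single-character terminals use their own
// character code; multi-character tokens are numbered above 255.
final class Tokens {
    static final int T_COMMA = ',';
    static final int T_LPAREN = '(';
    static final int T_RPAREN = ')';
    static final int T_IDENT = 257;

    // Tiny lexer for input such as "f(a, b)".
    static List<Integer> lex(String input) {
        List<Integer> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == ',' || c == '(' || c == ')') {
                tokens.add((int) c); // the character code doubles as the token code
                i++;
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(T_IDENT);
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("f(a, b)"));
    }
}
```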

I can think of a few more isolated cases where it might make sense to use constants instead of the corresponding literals. For example, you might define NEWLINE to be the literal '\n' on a Unix box, but '\r\n' on Windows or '\r' on classic Mac OS. The same goes for parsing files which represent tabular data; you might define FIELDSEPARATOR and RECORDSEPARATOR constants. In these cases, you're actually defining a constant to represent a character that serves a certain function. Still, if you were a novice programmer, maybe you'd name your field separator constant COMMA, not realizing you should have called it FIELDSEPARATOR, and by the time you realized, the code would be in production and you'd be on the next project, so the wrongly named constant would stay in the code for someone to later find and shake his head at.
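A minimal Java sketch of role-named separators for a simple tab-separated format; the class, the constant values and the format itself are assumptions. (For the newline case specifically, Java already reports the platform's line separator via `System.lineSeparator()`.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parser for a simple tab-separated format. The constants are named
// for the role each character plays, not for the character itself.
final class TableParser {
    static final String FIELDSEPARATOR = "\t";
    static final String RECORDSEPARATOR = "\n";

    static List<String[]> parse(String data) {
        List<String[]> records = new ArrayList<>();
        for (String line : data.split(Pattern.quote(RECORDSEPARATOR))) {
            if (!line.isEmpty()) {
                // limit -1 keeps empty trailing fields instead of dropping them
                records.add(line.split(Pattern.quote(FIELDSEPARATOR), -1));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] record : parse("Austin\tTX\nBoston\tMA\n")) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```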

Finally, the practice you describe might make sense in a few cases where you write code to handle data encoded in a specific character encoding, say ISO-8859-1, but expect the encoding to change later on. Of course, in such a case it would make much more sense to use localization or encoding and decoding libraries to handle it. But if for some reason you couldn't use such a library, keeping the characters in constants that you'd only have to redefine in a single file, instead of hard-coded literals littered all over your source code, might be a way to go.

As to the article you linked to: I don't think it tries to make a case for replacing character literals with constants. I think it's trying to illustrate a method to use interfaces to pull constants into other parts of your code base. The example constants used to illustrate this are chosen very badly, but I don't think they matter in any way.

Pascal
  • *I think it's trying to illustrate a method to use interfaces to pull constants into other parts of your code base.* which is an even worse **anti-pattern**, causing tight coupling and low cohesion as well; there is no valid reason to do that either. –  Jul 06 '16 at 21:26
3

In addition to all the fine answers here, I'd like to add, as food for thought, that good programming is about providing appropriate abstractions that can be built upon by yourself and maybe others, without having to repeat the same code over and over.

Good abstractions make the code easy to use on the one hand, and easy to maintain on the other hand.

I totally agree the DELIMITER=':' in and of itself is a poor abstraction, and only just better than COLON=':' (since the latter is totally impoverished).

A good abstraction involving strings and separators would, first and foremost, include a way to pack one or more individual content items into the string and to unpack them from the packed string, before telling you what the delimiter is. Such an abstraction would be bundled as a concept, in most languages as a class, so that its use would be practically self-documenting: you can search for all places where this class is used and be confident of the programmer's intention regarding the format of the packed strings in each case.

Once such an abstraction is provided, it would be easy to use without ever having to consult what the value of the DELIMITER or COLON is, and, changing the implementation details would generally be limited to the implementation. So, in short, these constants should really be implementation details hidden within an appropriate abstraction.
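A minimal Java sketch of such an abstraction. The class name `PackedList`, the `:` delimiter and the choice to reject items containing the delimiter (rather than escape them) are all assumptions; callers only ever see `pack` and `unpack`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical packing abstraction: callers pack and unpack items and never
// need to know which delimiter is used.
final class PackedList {
    private static final String DELIMITER = ":"; // implementation detail

    static String pack(List<String> items) {
        for (String item : items) {
            // This sketch rejects ambiguous input instead of escaping it.
            if (item.contains(DELIMITER)) {
                throw new IllegalArgumentException("item contains the delimiter: " + item);
            }
        }
        return String.join(DELIMITER, items);
    }

    static List<String> unpack(String packed) {
        // limit -1 keeps empty trailing items, e.g. "a:" -> ["a", ""]
        return Arrays.asList(packed.split(Pattern.quote(DELIMITER), -1));
    }

    public static void main(String[] args) {
        String packed = pack(List.of("Austin", "TX"));
        System.out.println(packed + " -> " + unpack(packed));
    }
}
```

Swapping the delimiter, or adding escaping later, would then touch only this class.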

The main appeal of using constants is they minimize maintenance when a change is needed.

Good abstractions, which are typically compositions of several related capabilities, are better at minimizing maintenance. First, they clearly separate the provider from the consumers. Second, they hide the implementation details and instead provide directly useful functionality. Third, they document at a high level when and where they are being used.

user2943160
Erik Eidt
2

The one time I have seen such constants used effectively is to match an existing API or document. I've seen symbols such as COMMA used because a particular piece of software was directly connected to a parser which used COMMA as a tag in an abstract syntax tree. I've also seen it used to match a formal specification. In formal specifications, you'll sometimes see symbols like COMMA rather than ',' because the authors want to be as utterly clear as possible.

In both cases, the use of a named symbol like COMMA helps provide cohesiveness to an otherwise disjoint product. That value can often outweigh the cost of overly verbose notations.

Cort Ammon
2

Observe that you are trying to make a list.

So, refactor it as: String makeList(String[] items)

In other words, factor out the logic instead of the data.
Languages might be different in how they represent lists, but commas are always commas (that's a tautology). So if the language changes, changing the comma character won't help you -- but this will.
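A minimal Java sketch of `makeList`, assuming English-style output without a serial comma ("a, b and c"). That rule is an assumption for illustration; a version for another language would change only this method, not its callers.

```java
import java.util.Arrays;

// Hypothetical makeList: all list formatting lives here. This version assumes
// English-style output without a serial comma, e.g. "a, b and c".
final class Lists {
    static String makeList(String[] items) {
        if (items.length == 0) {
            return "";
        }
        if (items.length == 1) {
            return items[0];
        }
        String allButLast = String.join(", ", Arrays.copyOfRange(items, 0, items.length - 1));
        return allButLast + " and " + items[items.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(makeList(new String[] {"red", "green", "blue"}));
    }
}
```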

user541686
0

If this was a class written as a part of an application by your fellow developer, this is almost certainly a bad idea. As others already pointed out, it makes sense to define constants such as SEPARATOR = ',' where you can change the value and the constant still makes sense but much less so constants whose name describes just their value.

However, there are at least two cases where it does make sense to declare constants whose name describes exactly their contents and where you cannot change the value without appropriately changing the constant's name:

  • Mathematical or physical constants, e.g. PI = 3.14159. Here, the role of the constant is to act as a mnemonic since the symbolic name PI is much shorter and more readable than the value it represents.
  • Exhaustive lists of symbols in a parser or keys on a keyboard. It might even make sense to have a list of constants with most or all Unicode characters, and this is where your case may fall. Some characters such as A are obvious and clearly recognizable. But can you easily tell А and A apart? The first one is Cyrillic letter А while the latter is Latin letter A. They are different letters, represented by different Unicode code points, even though graphically they are almost identical. I'd rather have constants CYRILLIC_CAPITAL_A and LATIN_CAPITAL_A in my code than two almost-identical-looking characters. Of course, this is pointless if you know you will only be working with ASCII characters, which do not include Cyrillic. Likewise, I use the Latin alphabet day-to-day, so if I were writing a program which needed a Chinese character, I would probably prefer to use a constant rather than insert a character which I don't understand. For someone using Chinese characters day-to-day, a Chinese character may be obvious but a Latin one may be easier to represent as a named constant. So, as you see, it depends on the context. Still, a library might contain symbolic constants for all characters, since the authors can't know in advance how the library is going to be used and which characters might need constants to improve readability in a specific application.
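A short Java illustration of the look-alike problem. The constant names are hypothetical; the code points (U+0041 and U+0410) and the names returned by `Character.getName` come from Unicode.

```java
// Hypothetical named constants for two letters that look alike in most fonts.
final class Letters {
    static final char LATIN_CAPITAL_A = '\u0041';
    static final char CYRILLIC_CAPITAL_A = '\u0410';

    public static void main(String[] args) {
        System.out.println(LATIN_CAPITAL_A == CYRILLIC_CAPITAL_A);
        System.out.println(Character.getName(LATIN_CAPITAL_A));
        System.out.println(Character.getName(CYRILLIC_CAPITAL_A));
    }
}
```

Comparing the two constants gives `false`, and `Character.getName` reports them as `LATIN CAPITAL LETTER A` and `CYRILLIC CAPITAL LETTER A`, which is exactly the distinction the names make visible in code.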

However, such cases are usually handled by system classes or special-purpose libraries and their occurrence in code written by application developers should be very rare unless you are working on some very special project.

Michał Kosmulski
-1

Maybe.

Single-character literals are relatively hard to distinguish, so it can be rather easy to miss the fact that you're adding a period rather than a comma:

city + '.' + state

whereas that's a relatively hard mistake to make with

city + Const.PERIOD + state

Depending on your internationalization and globalization environment, the difference between an ASCII apostrophe and the Windows-1252 open and close single quotes (or the ASCII double quote and the Windows-1252 open and close double quotes) may be significant, and is notoriously difficult to see when looking at code.

Now, presumably, if mistakenly putting a period rather than a comma were a significant functional issue, you would have an automated test that would find the typo. If your software is generating CSV files, I would expect your test suite to discover pretty quickly that you had a period between the city and the state. If your software is supposed to run for clients with a variety of internationalization configurations, presumably your test suite will run in each environment and will pick up a Microsoft open quote where you meant to have an apostrophe.
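If look-alike quotes are the worry, a check like the following could run in such a test suite. It is a hypothetical Java helper, not part of any library; it looks for the four typographic quotes that Windows-1252 encodes at 0x91 to 0x94 (U+2018, U+2019, U+201C and U+201D).

```java
// Hypothetical guard for generated text: returns the index of the first
// typographic quote (Windows-1252 bytes 0x91-0x94), or -1 if there is none.
final class QuoteCheck {
    static int firstSmartQuote(String s) {
        for (int i = 0; i < s.length(); i++) {
            switch (s.charAt(i)) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    return i;
                default:
                    break;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSmartQuote("O'Brien"));
        System.out.println(firstSmartQuote("O\u2019Brien"));
    }
}
```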

I could imagine a project where it made more sense to opt for more verbose code that could head off these issues, particularly when you've got older code that doesn't have a comprehensive test suite, even though I probably wouldn't code this way in a greenfield development project. And adding a constant for every punctuation character, rather than just those that are potentially problematic in your particular application, is probably gross overkill.

Justin Cave
  • what happens when some moron changes `Const.PERIOD` to be equal to `~`? There is no justification for a tautology of named characters; it just adds maintenance and complexity that is unneeded in modern-day programming environments. Are you going to write a suite of unit tests that basically say `assert(Const.PERIOD == '.')`? –  Jul 06 '16 at 17:54
  • @JarrodRoberson - That would suck, sure. But you'd be in just as much trouble if someone added a Unicode constant that looks almost exactly like a comma rather than an actual comma. Like I said, this isn't the sort of thing that I'd do in a greenfield development project. But if you have a legacy code base with a spotty test suite where you've tripped over the comma/ period or apostrophe/ Microsoft abomination apostrophes issues a couple times, creating some constants and telling people to use them may be a reasonable way to make the code better without spending a year writing tests. – Justin Cave Jul 06 '16 at 17:58
  • your legacy example is a poor one. I just finished refactoring a 1,000,000+ LOC code base that is 18 years old. It had every printable character defined like this multiple times, even with different conflicting names. And many times things named `COMMA` were actually set `= SPACE + "," + SPACE`. Yes, some idiot had a `SPACE` constant. I refactored them ALL out and the code was orders of magnitude more readable, and college hires were much more able to track things down and fix them without having 6 levels of indirection to find out what something was actually set to. –  Jul 06 '16 at 18:02
-1

Are single-character constants better than literals?

There are a lot of conflations floating around here. Let me see if I can tease them apart.

Constants provide:

  • semantics
  • change, during development
  • indirection

Going down to a single-character name only impacts the semantics. A name should be useful as a comment and clear in context. It should express meaning, not the value. If it can do all that with a single character, fine. If it can't, please don't.

A literal and a constant can both change during development. This is what brings up the magic number issue. Strings can be magic numbers as well.

If semantic meaning exists, and since both are constant, then whether the constant has more value than a literal comes down to indirection.

Indirection can solve any problem, other than too much indirection.

Indirection can solve the magic number problem because it allows you to decide on a value for an idea in one place. Semantically, for that to be worthwhile the name must make what that idea is clear. The name should be about the idea, not the value.

Indirection can be overdone. Some prefer to search and replace literals to make their changes. That's fine so long as 42 is clearly the meaning of life and not mixed together with 42, the atomic number of molybdenum.
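A tiny Java illustration of naming the idea rather than the value. Both constant names and the helper method are hypothetical; the point is that a change or a search aimed at one meaning of 42 never touches the other.

```java
// Hypothetical constants: equal values, unrelated meanings.
final class Meanings {
    static final int MEANING_OF_LIFE = 42;
    static final int MOLYBDENUM_ATOMIC_NUMBER = 42;

    // Reads as "is this element molybdenum?", not "is this number 42?"
    static boolean isMolybdenum(int atomicNumber) {
        return atomicNumber == MOLYBDENUM_ATOMIC_NUMBER;
    }

    public static void main(String[] args) {
        System.out.println(isMolybdenum(42));
    }
}
```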

Whether you can make useful distinctions like that with a single letter depends largely on context. But I wouldn't make it a habit.

candied_orange
  • Semantics is the key. If an "A" has more semantics than simply being an "A", then it's worth binding the same semantics to the same "reference". It doesn't matter if it is a constant or not. I totally agree. – oopexpert Jul 13 '16 at 17:57
-1

As a philosophical contrapunctus to the majority opinion, I must state that there are some of us who appreciate the unsophisticated 19th century French peasant programmer and

remembered his monotonous, everlasting lucidity, his stupefyingly sensible views of everything, his colossal contentment with truisms merely because they were true. "Confound it all!" cried Turnbull to himself, "if he is in the asylum, there can't be anyone outside."

G.K. Chesterton, The Ball and The Cross

There is nothing wrong with appreciating the truth, and there is nothing wrong with stating the truth, especially when talking to a computer.

If you lie to the computer, it will get you

Perry Farrar - Germantown, Maryland (from More Programming Pearls)


But, for the most part, I agree with the people who say it's dumb. I'm too young to have learned to program in FORTRAN, but I've heard tell that you could redefine 'A' = 'Q' and come up with all sorts of wonderful cryptograms. You are not doing this.

Then there are the i18n issues brought up before (which are not about redefining the glyph "COMMA", but really about redefining the glyph of a DECIMAL_POINT). Constructing French guillemets ("carrot" quotes) or British single quotes to convey meaning to humans is one thing, and those really ought to be variables, not constants. The constant would be AMERICAN_COMMA := ',' and the variable would be comma := AMERICAN_COMMA.

And, if I were using a builder pattern to construct an SQL Query, I would much rather see

StringBuilder sb = new StringBuilder();
sb.append("insert into ")
 .append(table_name)
 .append(" values ")
 .append(" ( ")
 .append(val_1)
 .append(",")
 .append(val_2)
 .append(" ); ");

than anything else, but if you were going to add constants, it would be

static final String INSERT_VALUES_START = " ( ";
static final String INSERT_VALUES_END = " ) ";
static final String INSERT_VALUES_SEPARATOR = " , ";
static final String QUERY_TERMINATOR = ";";

StringBuilder sb = new StringBuilder();
sb.append("insert into ")
 .append(table_name)
 .append(" values ")
 .append(INSERT_VALUES_START)
 .append(val_1)
 .append(INSERT_VALUES_SEPARATOR)
 .append(val_2)
 .append(INSERT_VALUES_END)
 .append(QUERY_TERMINATOR);

However, if you've ever watched anyone else program (or type), you might notice some interesting quirks. Not all of us are stellar typists. Lots of us got into programming late or were raised with Soviet keyboards (where keys type on you), and we like to cut and paste individual letters instead of trying to find them on the keyboard, and/or rely on autocomplete.

Nothing is going to autocomplete a string for you, so if I can get a comma by typing 'con', alt-space, down, down, down, enter, and a quote by typing 'con', alt-space, down, down, enter, I might just do that.


Another thing to remember about string literals is the way they are compiled. In Delphi at least (which is the only language whose stack I've obsessed over), you'll wind up with your literals placed on the stack of each function. So, lots of literals = lots of function overhead; a "," in function_A is not the same bit of memory as a "," in function_B. To combat this, there's a "resource string", which can be built and linked in sideways, and this is how they do i18n stuff (killing two birds with one bush). In Python all your string literals are objects, and it might actually seem nice to use utils.constants.COMMA.join(["some","happy","array","strings"]), but it's not a stellar idea, for the reasons repeated over and over on this page.

Peter Turner
-4

But when are we going to start using a different symbol than ',' to represent a comma?

For localisation.

In English-speaking countries, the symbol separating the whole and fractional parts of a decimal is ".", which we call "decimal point". In many other countries, the symbol is "," and is typically called the equivalent of "comma" in the local language. Similarly, where English-speaking countries use "," to separate groups of three digits in large numbers (such as 1,000,000 for one million), many countries that use a comma as a decimal point use a dot (1.000.000) or a space instead.

So there is a case for making DECIMAL_POINT and COMMA constants if you are doing globalisation.
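In Java, for instance, the JDK can supply these separators per locale at run time, which also sidesteps the naming problem raised in the comments below. A minimal sketch; the wrapper class and its method names are hypothetical, while `DecimalFormatSymbols` and `NumberFormat` are standard `java.text` classes.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical wrapper: separators are looked up per locale at run time instead
// of being hard-coded as DECIMAL_POINT or COMMA constants.
final class LocaleSeparators {
    static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMANY}) {
            System.out.println(locale + ": " + format(1234567.5, locale));
        }
    }
}
```

With the JDK's locale data this prints `1,234,567.5` for `en_US` and `1.234.567,5` for `de_DE`.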

Paul G
  • But then COMMA and DECIMAL_POINT are not the correct names for the entities (which is probably why you've been downvoted). – Kyle Strand Jul 11 '16 at 14:28
  • You'd need to compile specific localized versions. Literal constants are not suited for that; that use case would call for definition files and lookups into them (which might involve constants, but lookup constants, not constant characters). –  Jul 11 '16 at 19:52