223

As an experienced software developer, I have learned to avoid magic strings.

My problem is that it is such a long time since I have used them, I've forgotten most of the reasons why. As a result, I'm having trouble explaining why they're a problem to my less experienced colleagues.

What objective reasons are there for avoiding them? What problems do they cause?

Kramii
  • 14,029
  • 5
  • 44
  • 64
  • 41
    What's a magic string? Same thing as *magic numbers* ? – Laiv Feb 05 '18 at 09:39
  • 19
    @Laiv: They're similar to magic numbers, yes.I like the definition at http://deviq.com/magic-strings/: "Magic strings are string values that are specified directly within application code that have an impact on the application’s behavior.". (The definition at https://en.wikipedia.org/wiki/Magic_string isn't what I have in mind at all) – Kramii Feb 05 '18 at 10:20
  • 4
    I see. I guess you can not explain why that's a real issue because you haven't experienced issues derivated from this sort of "hardcoded" values. Maybe because you never used them. Or maybe you forgot it :-). Like use to happen with global variables. – Laiv Feb 05 '18 at 10:26
  • 1
    see [How do I explain ${something} to ${someone}?](https://softwareengineering.meta.stackexchange.com/a/6630/31260) – gnat Feb 05 '18 at 10:27
  • 18
    this is funny ***I have learned to detest***... later *What arguments can I use **to persuade** my juniors*... The never end story :-). I wouldn't try to "convince" I would rather let'em learn on their own. Nothing last more than a lesson/idea reached by your own experience. What you are trying to do is **indoctrinate**. Don't do that unless you want a team of Lemmings. – Laiv Feb 05 '18 at 10:32
  • 3
    @gnat: I'm not really asking "how to explain" anything. I'm asking for objective reasons against a practice that is commonly understood to be a poor one. My problem is that I have avoided the practice for so long that I don't really remember the reasons why any more (I also recognise that every rule of thumb has its exceptions, but that's a different question). I'll re-word my question. – Kramii Feb 05 '18 at 11:09
  • 16
    @Laiv: I'd love to let people learn from their own experience, but unfortunately that isn't an option for me. I work for a publicly funded hospital where subtle bugs can compromise patient care, and where we can't afford avoidable maintenance costs. – Kramii Feb 05 '18 at 11:29
  • 1
    I think it's best to find a piece of code that illustrates the issues. Besides this, the arguments against magic numbers hold too. – IS4 Feb 05 '18 at 12:36
  • 6
    @DavidArno, that is **exactly** what he's doing by asking this question. – user56834 Feb 05 '18 at 12:40
  • 2
    @Bilkokuya, if he doesn't justify himself to his juniors in *some* way, there will be a failure of reproduction of culture and expertise in the firm (unless the code standards become written gospel). – Steve Feb 05 '18 at 13:44
  • 1
    [Related](https://softwareengineering.stackexchange.com/a/299375/73508) – Robbie Dee Feb 05 '18 at 15:17
  • 4
    There's an additional reason to not use "magic strings" that doesn't apply to magic numbers: localization! It's much easier to have libraries of string translations that you can easily choose between rather than having to edit your source code for every new language. (Admittedly, many frameworks these days, particularly for mobile devices, do support translations with magic strings using intermediate functions.) – JAB Feb 05 '18 at 16:35
  • 4
    Can we call them hard-coded strings – Wadih M. Feb 05 '18 at 17:39
  • @DavidArno It is probably the same as DON'T USE GOTO. If you find nowadays a situation where a GOTO command would be absolutely perfect and it should definitely be used. There will always be someone to tell you "DON'T USE GOTO". Maybe your own inner voice, or the code reviewer. :) – Kenyakorn Ketsombut Feb 06 '18 at 02:27
  • 4
    If you simply mean string literals, you should say that. – tchrist Feb 06 '18 at 04:14
  • 1
    @JAB in a sense, localization and "temporalization" are fundamentally the same thing. In the one case, you're providing for the strings to be in a language from a different place. In the other, you're providing for strings that will be used at a future time, possibly because you're interfacing with another program that's been modified to use different strings. – Monty Harder Feb 07 '18 at 23:24
  • 1
    I just use *Silly String* instead. –  Feb 08 '18 at 14:23
  • 1
    This is the "no onion in the varnish" syndrome: we have never put onions into the varnish for so many decades that nobody remembers why we do not do that (and should we start?) – Kaz Feb 09 '18 at 05:18
  • 1
    @MartinMaat Please don't answer questions in the comment field. When you start to write a comment, it even tells you to avoid it. – pipe Feb 09 '18 at 08:45
  • 1
    Note that for logically minded people, "magic" is a pejorative term for "unexplained" (as opposed to a synonym for "wondrous") – Flater Feb 09 '18 at 10:58
  • 3
    Do you just mean "string literals"? If so, nothing - they are generally easier to read and comprehend than lots of prematurely abstracted constants. I think your question is based on a false premise. This kind of thinking leads to lots of complexity and higher maintenance costs. – Ant P Feb 09 '18 at 19:19
  • @AntP: No, I do not mean just string literals. – Kramii Feb 12 '18 at 10:18

7 Answers7

260
  1. In a language that compiles, a magic string's value is not checked at compile time. If the string must match a particular pattern, you have to run the program to guarantee it fits that pattern. If you used something such as an enum, the value is at least valid at compile-time, even if it might be the wrong value.

  2. If a magic string is being written in multiple places you have to change all of them without any safety (such as compile-time error). This can be countered by only declaring it in one place and reusing the variable, though.

  3. Typos can become serious bugs. If you have a function:

    func(string foo) {
        if (foo == "bar") {
            // do something
        }
    }
    

    and someone accidentally types:

    func("barr");
    

    This is worse the rarer or more complex the string is, especially if you have programmers that are unfamiliar with the project's native language.

  4. Magic strings are rarely self-documenting. If you see one string, that tells you nothing of what else the string could / should be. You will probably have to look into the implementation to be sure you've picked the right string.

    That sort of implementation is leaky, needing either external documentation or access to the code to understand what should be written, especially since it has to be character-perfect (as in point 3).

  5. Short of "find string" functions in IDEs, there are a small number of tools that support the pattern.

  6. You may coincidentally use the same magic string in two places, when really they are different things, so if you did a Find & Replace, and changed both, one of them could break while the other worked.

Erdrik Ironrose
  • 4,806
  • 3
  • 13
  • 25
  • 41
    Regarding the first argument: TypeScript is a compiled language that can typecheck string literals. This also invalidates argument two to four. Therefore, not strings itself are the problem, but using a type that allows too many values. The same reasoning can be applied to using magic integers for enumerations. – Yogu Feb 05 '18 at 12:47
  • 13
    Since I have no experience with TypeScript I will defer to your judgement there. What I would say then, is that unchecked-strings (as is the case for all languages I've used) are the problem. – Erdrik Ironrose Feb 05 '18 at 13:14
  • 28
    @Yogu Typescript won’t rename all your strings for you if you change the static string literal type you’re expecting. You’ll get compile time errors to help you find all of them, but that’s only a partial improvement on 2. Not saying it’s anything less than absolutely amazing (because it is that, and I love the feature), but it definitely does not outright eliminate the advantage of enums. In our project, when to use enums and when not to remains a kind of open style question that we aren’t sure about; both approaches have annoyances and advantages. – KRyan Feb 05 '18 at 16:25
  • 32
    One big one I've seen not for strings as much as numbers, but could happen with strings, is when you have two magic values with the same value. Then one of them changes. Now you're going through code changing the old value to the new value, which is work on its own, but you're also doing EXTRA work to make sure you're not changing the wrong values. With constant variables, not only do you not have to go through it manually, but you don't worry that you've changed the wrong thing. – corsiKa Feb 05 '18 at 16:47
  • 39
    @Yogu I would further argue that if a string literal's value is being checked at compile time, then _it ceases to be a magic string_. At that point it's just a normal const/enum value that happens to be written in a funny way. Given that perspective, I would actually argue that your comment actually _supports_ Erdrik's points, rather than refuting them. – GrandOpener Feb 06 '18 at 06:52
  • 3
    You could use [definitely vs. definately](http://www.d-e-f-i-n-i-t-e-l-y.com/) instead of "bar" / "barr". – Peter Mortensen Feb 06 '18 at 13:13
  • 1
    #4 is a good point, especially since having a single constant variable storing the string gives an obvious and expected place for documentation to go. Deciding where to put documentation is a serious problem for large programs. Separate programs (eg, a wiki) get outdated way easier than in code. Having to repeat documentation everywhere the value is used is tiresome and gives places for things to get outdated. It's ideal to have just one place to document each class/method/enum value, etc. – Kat Feb 06 '18 at 19:18
  • @Kat What's the difference between a constant variable and a variable constant? :) – tchrist Feb 07 '18 at 02:31
  • @tchrist A constant variable is literally _able_ to vary, yet it remains constant. "variable constant" is oxymoronic. – Monty Harder Feb 07 '18 at 22:55
  • One might add that a global substitution that accidentally changes a string is almost certain not to lead to a compiler warning, while changing a variable probably will. (Not that you would ever do such a thing, of course, but who knows what your colleagues or successors might do.) – PJTraill Feb 08 '18 at 14:38
  • @corsiKa: good point about two parameters which happen to have the same value; in that case using appropriate names also makes it easier to understand the code, as you can see which parameter is intended. – PJTraill Feb 08 '18 at 14:40
  • @MontyHarder "variable constant" is _seemingly_ oxymoronic, but not in practice. This relies on a slightly tweaked definition of the word "variable". Initially, when programming was mainly comprised of functional arithmetic it was pretty much defined as "not a constant", but due to a shift towards OOP, it has evolved to a meaning that is closer to "data value". – Flater Feb 09 '18 at 11:05
  • Another big problem is the issue of transposing characters. Particularly with strings that contain a series of letters and numbers that make no sense from a language perspective. Can you spot the difference between "1l0O" and "l1O0"? Really depends on the font. Similar strings are hard to detect when you have them wrong. – Berin Loritsch Feb 09 '18 at 20:30
  • yet another issue - implies i18n issues - how you gonna localize those magic strings when they are spread throughout your code base – jk. Dec 06 '18 at 12:21
  • Why not use triple equal signs for comparison? like === instead of ==. – Don Dilanga Nov 02 '20 at 12:56
  • @DonDilanga In the code example I wrote in my answer I was writing in C#. It has no concept of ===. – Erdrik Ironrose Nov 02 '20 at 14:45
  • yeah right. in TypeScript it's not an issue. – Don Dilanga Nov 02 '20 at 14:50
  • @DonDilanga I have a small amount of experience with TypeScript. Comments from KRyan and GrandOpener on this answer suggest that although TypeScript generally makes it safer, it may still cause problems, even if at least to people like me, who dabble in TypeScript but are used to code like C# or C++. Also, I find the documentation and find/replace arguments still apply. – Erdrik Ironrose Nov 02 '20 at 15:03
96

The summit of what the other answers have grasped at, is not that "magic values" are bad, but that they ought to be:

  1. defined recognisably as constants;
  2. defined only once within their entire domain of use (if architecturally possible);
  3. defined together if they form a set of constants that are somehow related;
  4. defined at an appropriate level of generality in the application in which they are used; and
  5. defined in such a way as to limit their use in inappropriate contexts (e.g. amenable to type checking).

What typically distinguishes acceptable "constants" from "magic values" is some violation of one or more of these rules.

Used well, constants simply allow us to express certain axioms of our code.

Which brings me to a final point, that an excessive use of constants (and therefore an excessive number of assumptions or constraints expressed in terms of values), even if it otherwise complies with the criteria above (but especially if it deviates from them), may imply that the solution being devised is not sufficiently general or well-structured (and therefore we're not really talking about the pros and cons of constants anymore, but about the pros and cons of well-structured code).

High-level languages have constructs for patterns in lower-level languages which would have to employ constants. The same patterns can also be used in the higher-level language, but ought not to be.

But that may be an expert judgment based on an impression of all the circumstances and what a solution ought to look like, and exactly how that judgment will be justified will depend heavily on the context. Indeed it may not be justifiable in terms of any general principle, except to assert "I am old enough to have already seen this kind of work, with which I am familiar, done better"!

EDIT: having accepted one edit, rejected another, and having now performed my own edit, may I now consider the formatting and punctuation style of my list of rules to be settled once and for all haha!

Steve
  • 6,998
  • 1
  • 14
  • 24
  • 2
    I like this answer. After all "struct" (and every other reserved word) is a magic string to the C compiler. There are good and bad ways of coding for them. – Alfred Armstrong Feb 05 '18 at 16:08
  • 7
    As an example, if someone sees “X := 898755167*Z” in your code, they probably won’t know what it means, and even less likely to know that it’s wrong. But if they see “Speed_of_Light : constant Integer := 299792456” someone will look it up and suggest the correct value (and maybe even a better data type). – WGroleau Feb 05 '18 at 16:14
  • 35
    Some people miss the point completely and write COMMA = "," instead of SEPARATOR = ",". The former doesn't make anything clearer, while the latter states the intended usage and allows you to change the separator later in a single place. – marcus Feb 06 '18 at 13:54
  • 3
    @marcus, indeed! There is of course a case for using simple literal values in-place - for example, if a method divides a value by two, it may be clearer and simpler to simply write `value / 2`, rather than `value / VALUE_DIVISOR` with the latter defined as `2` elsewhere. If you intended to generalise a method handling CSVs, you'd probably want the separator passed in as a parameter, and not defined as a constant at all. But it's all a question of judgment in the context - @WGroleau's example of the `SPEED_OF_LIGHT` is something you'd want to explicitly name, but not every literal needs this. – Steve Feb 06 '18 at 17:27
  • 4
    The top answer is better than this answer if need convincing that magic strings are a "bad thing." This answer is better if you know and accept that they're a "bad thing" and need to find the best way to meet the needs that they serve in a maintainable way. – corsiKa Feb 07 '18 at 15:48
  • 1
    @Steve I agree, if your function is called `average(a, b) = a + b / 2` you really don't need to have a name for the two. The name of the function already sufficiently describes what it means. – CodeMonkey Feb 08 '18 at 14:05
36
  • They are hard to track.
  • Changing all may require changing multiple files in possibly multiple projects (hard to maintain).
  • Sometimes it's hard to tell what their purpose is just by looking at their value.
  • No reuse.
Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
igorc
  • 431
  • 3
  • 8
  • 4
    What does "no reuse" mean? – bye Feb 05 '18 at 15:06
  • 7
    Instead of creating one variable/constant etc. and reuse it across all your project/code you are creating a new string in each one which causes an unnecessary duplication. – igorc Feb 05 '18 at 15:14
  • So point 2 and 4 are the same? – Thomas Feb 06 '18 at 09:55
  • 4
    @ThomasMoors No he's talking about the way you have to build a new string every time you want to use an *already existing* magic string, point 2 is about changing the string itself – Pierre Arlaud Feb 06 '18 at 10:29
27

Real life example: I am working with a third party system wherein "entities" are stored with "fields". Basically an EAV system. As it is fairly easy to add another field, you get access to one by using the field's name as string:

Field nameField = myEntity.GetField("ProductName");

(note the magic string "ProductName")

This can lead to several problems:

  • I need to refer to external documentation to know that "ProductName" even exists and its exact spelling
  • Plus I need to refer to that doc to see what the datatype of that field is.
  • Typos in this magic string will not get caught until this line of code is executed.
  • When someone decides to rename this field on the server (difficult while preventing dataloss, but not impossible), then I cannot easily search through my code to see where I should adjust this name.

So my solution for this was to generate constants for these names, organised by entity type. So now I can use:

Field nameField = myEntity.GetField(Model.Product.ProductName);

It is still a string constant and compiles to the exact same binary, but has several advantages:

  • After I have typed "Model.", my IDE shows just the available entity types, so I can select "Product" easily.
  • Then my IDE supplies just the fieldnames that are available for this type of entity, also selectable.
  • Auto-generated documentation shows what the meaning of this field is plus the datatype that is used to store its values.
  • Starting from the constant, my IDE can find all places where that exact constant is used (as opposed to its value)
  • Typos will be caught by the compiler. This also applies when a fresh model (possibly after renaming or deleting a field) is used to regenerate the constants.

Next on my list: hide these constants behind generated strongly typed classes - then also the datatype is secured.

Hans Kesting
  • 421
  • 3
  • 6
  • +1 you bring up a lot of good points not limited to code structure: IDE support and tooling, which can be a lifesaver in large projects – kmdreko Feb 07 '18 at 19:36
  • If some parts of your entity type is static enough that actually defining a constant name for it is worthwhile, I think it would be more apt to just define a proper data model for it so you can just do `nameField = myEntity.ProductName;`. – Lie Ryan Feb 07 '18 at 20:27
  • @LieRyan - it was much easier to generate plain constants and upgrade existing projects to use them. That said, I *am* working on generating static types so I can do precisely that – Hans Kesting Feb 07 '18 at 20:43
13

Magic strings not always bad, so this might the reason you cannot come up with a blanket reason for avoiding them. (By "magic string" I assume you mean string literal as part of an expression, and not defined as a constant.)

In some particular cases, magic strings should be avoided:

  • The same string appears multiple times in code. This means you could have a spelling error one of the places. And it will be a hassle of the string changes. Turn the string into a constant, and you will avoid this issue.
  • The string may change independent of the code where it appears. Eg. if the string is text displayed to the end user it will likely change independent of any logic change. Separating such string into a separate module (or external configuration or database) will make it easier to change independently
  • The meaning of the string is not obvious from the context. In that case introducing a constant will make the code easier to understand.

But in some cases, "magic strings" are fine. Say you have a simple parser:

switch (token.Text) {
  case "+":
    return a + b;
  case "-":
    return a - b;
  //etc.
}

There is really no magic here, and none of the above described problems apply. There would be no benefit IMHO to define string Plus="+" etc. Keep it simple.

JacquesB
  • 57,310
  • 21
  • 127
  • 176
  • 7
    I think your definition of "magic string" is insufficient, it needs to have some concept of hiding/obscuring/making-mysterious. I wouldn't refer to the "+" and "-" in that counter-example as "magic", any more than I'd refer to the zero as magic in `if (dx != 0) { grad = dy/dx; }`. – Rupe Feb 05 '18 at 17:26
  • 2
    @Rupe: I agree, but the OP uses the definition "*string values that are specified directly within application code that have an impact on the application’s behavior.*" which does not require the string to be mysterious, so this is the definition I use in the answer. – JacquesB Feb 05 '18 at 17:38
  • 11
    With reference to your example, I have seen switch statements which replaced `"+"` and `"-"` with `TOKEN_PLUS` and `TOKEN_MINUS`. Every time I read it I felt like it was harder to read and debug because of it! Definitely a place where I agree that using simple strings is better. – Cort Ammon Feb 05 '18 at 19:49
  • 2
    I do agree that there are times when magic strings are appropriate: avoiding them is a rule of thumb, and all rules of thumb have exceptions. Hopefully, when we are clear about why they *can* be a bad thing, we will be able to make intelligent choices, rather than doing things because (1) we've never understood that there can be a better way, or (2) we've been told to do things differently by a senior developer or a coding standard. – Kramii Feb 05 '18 at 22:02
  • 3
    I don't know what's "magic" here. Those look like basic string literals to me. – tchrist Feb 07 '18 at 02:33
  • Yeah this doesn't quite fit the definition, as they are obvious and unlikely to change. – Ben Power Jan 29 '21 at 00:47
6

To add to existing answers:

Internationalisation (i18n)

If the text to display on screen is hard-coded and buried within layers of functions, you're going to have a very difficult time providing translations of that text into other languages.

Some development environments (e.g. Qt) handle translations by lookup from a base language text string to the translated language. Magic strings can generally survive this - until you decide you want to use the same text elsewhere and get a typo. Even then, it's very hard to find which magic strings need translating when you want to add support for another language.

Some development environments (e.g. MS Visual Studio) take another approach and require all translated strings to be held within a resources database and read back for the current locale by the unique ID of that string. In this case your application with magic strings simply cannot be translated into another language without major rework. Efficient development requires all text strings to be entered into the resources database and given a unique ID when the code is first written, and i18n thereafter is relatively easy. Trying to backfill this after the fact will typically require a very large effort (and yes, I've been there!) so it's much better to do things right in the first place.

Graham
  • 1,996
  • 1
  • 12
  • 11
4

This is not a priority for everyone, but if you ever want to be able to calculate coupling/cohesion metrics on your code in an automated fashion, magic strings make this nearly impossible. A string in one place will refer to a class, method or function in another place, and there is no easy, automatic way to determine that the string is coupled to the class/method/function just by parsing the code. Only the underlying framework (Angular, e.g.) can determine that there is a linkage--and it can only do it at run-time. To obtain the coupling information yourself, your parser would have to know everything about the framework you were using, above and beyond the base language you are coding in.

But again, this is not something a lot of developers care about.