96

I understand that we should use %s to concatenate a string rather than + in Python.

I could do any of:

hello = "hello"
world = "world"

print hello + " " + world
print "%s %s" % (hello, world)
print "{} {}".format(hello, world)
print ' '.join([hello, world])

But why should I use anything other than the +? It's quicker to write concatenation with a simple +. Then if you look at the formatting string, you specify the types e.g. %s and %d and such. I understand it could be better to be explicit about the type.

But then I read that using + for concatenation should be avoided even though it's easier to type. Is there a clear reason that strings should be concatenated in one of those other ways?

Alex
Niklas Rosencrantz
  • 33
    Who told you it's better? – yannis Dec 07 '15 at 10:38
  • 4
    `%s` isn't for concatenation, it's a conversion specification for string formatting derived from C's `printf(3)`. There are cases for using that and cases for using a concatenation operator; which you use should be based on judgment of the situation, not dogma. How easy it is to write the code is entirely irrelevant because you're only going to do that once. – Blrfl Dec 07 '15 at 13:33
  • I've refocused the question to *just* python (though I'm not a python person and there might still be glitches in the code). Please make sure that this is the question you are asking, make any appropriate updates and consider asking a *different* question if you are interested in C or Java. –  Dec 07 '15 at 20:46
  • 17
    And now we have the [superior f-strings](https://www.python.org/dev/peps/pep-0498/)! `print(f"{hello} {world}")`, has readability of concatenation since variables are seen where they occur in the string, and is faster than `str.format`. – Enrico Borba Aug 24 '17 at 23:44
  • @EnricoBorba Agreed. The ease of reading interpolated strings is well known. F-strings are similar to the syntax used in `shell`, `perl` and other languages: `echo "$hello $world"`. – jrw32982 Jan 06 '21 at 21:33

5 Answers

99
  1. Readability. The format string syntax is more readable, as it separates style from the data. Also, in Python, %s syntax will automatically coerce any non str types to str; while concatenation only works with str, and you can't concatenate str with int.

  2. Performance. In Python str is immutable, so the left and right strings have to be copied into the new string for every pair of concatenation. If you concatenate four strings of length 10, you will be copying (10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters, instead of just 40 characters. And things get quadratically worse as the number and size of the strings increase. Java optimizes this case some of the time by transforming the series of concatenations to use StringBuilder, but CPython doesn't.

  3. For some use cases, the logging library provides an API that uses a format string to create the log entry string lazily (logging.info("blah: %s", 4)). This improves performance when the logging library decides that the current log entry will be discarded by a log filter, since it then doesn't need to format the string.
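
The arithmetic in point 2 and the lazy formatting in point 3 can be checked with a small sketch. It assumes Python 3. The helper `chars_copied_by_naive_concat`, the `Expensive` class and the logger name `concat_demo` are made up for the illustration, and the copy count only models the naive behaviour described above, not what any particular interpreter really does.

```python
import logging


def chars_copied_by_naive_concat(lengths):
    """Count characters copied if every + builds a brand-new string.

    Hypothetical model of the naive case only; a real interpreter
    may optimize some of these copies away.
    """
    copied = 0
    running = lengths[0]
    for n in lengths[1:]:
        running += n       # size of the new intermediate string
        copied += running  # both operands are copied into it
    return copied


class Expensive:
    """Hypothetical object that records how often it gets formatted."""

    def __init__(self):
        self.formatted = 0

    def __str__(self):
        self.formatted += 1
        return "expensive value"


# Four strings of length 10: 20 + 30 + 40 characters copied.
naive = chars_copied_by_naive_concat([10, 10, 10, 10])

logger = logging.getLogger("concat_demo")
logger.setLevel(logging.WARNING)  # INFO records are discarded

value = Expensive()
logger.info("blah: %s", value)  # record is dropped before any formatting
eager = "blah: " + str(value)   # formatted right away, wanted or not
```

After this runs, `value.formatted` is 1: only the explicit `str()` call formatted the object, while the discarded `logger.info` call never touched it.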

amon
Lie Ryan
  • 38
    do you have any scientific or empiric source for #1? Because I think it's much **much** less readable (especially with more than 2 or three arguments) – Lovis Dec 07 '15 at 12:58
  • 6
    @L.Möller: I'm not quite sure what kind of source you expect from what is ultimately a subjective experience (ease of reading), but if you want my reasoning: 1) %s requires 2 extra characters per placeholder vs + requires a minimum of 4 (or 8 if you follow PEP8, 13 if you coerce), 2) %s is enclosed in a single string, so it's easier to parse visually; with +, you have more moving parts: close string, operator, variable, operator, open string, 3) with syntax coloring, %s has one color for each function (string and placeholder), while with + you get three colorings: string, operator, and variable coloring. – Lie Ryan Dec 07 '15 at 13:28
  • 4
    @L.Möller: 4) I have the option to put longer format strings in a variable or dictionary, away from where formatting needs to be done, 5) the format string can be user specified from a config file, command args, or database, the same can't be said with concatenations. But yeah, I also wouldn't use %s when I have more than 4-5 things to interpolate, instead I'd use the %(varname)s variant or "{foo}".format() in Python. I think the explicit names improves readability for longer format strings with lots of interpolated variables. – Lie Ryan Dec 07 '15 at 13:46
  • Thing is, ease of reading is _not_ a (pure) subjective thing. (E.g.: 80 chars per line restriction: our monitors got larger, our range of view didn't; or the fact that Gestalt principles also hold true for text). That's why I wondered whether you know a source. However, if I have `logging.info("blah: %s %s %s %s %s %s", "a", "b", "c", "d", "e", "f", "a")` it can easily be less readable than the alternative, since you can't even see at first glance if the number of params is correct. If you use line breaks, you might not see the connection to the method anymore. – Lovis Dec 07 '15 at 13:48
  • 2
    I don't know what's "true", that's why I ask if you have evidence :-). Really agree with your second comment – Lovis Dec 07 '15 at 13:48
  • 1
    @L.Möller: ultimately though, it's my personal experience that I find that generally format string is easier to read than concatenation. If you want to say your experience says otherwise, I'm not going to argue with that. It's your experience, you are free to interpret it as you wish :) – Lie Ryan Dec 07 '15 at 13:49
  • 7
    I find #2 to be suspect - do you have documented proof? I'm not supremely familiar with Java, but in C# *concatenation is faster than string interpolation*. I completely agree with #1 and really rely on that for deciding when to use which, but you have to remember interpolation requires an amount of string parsing and complexity where concatenation requires none of that. – Jimmy Hoffa Dec 07 '15 at 14:02
  • @JimmyHoffa: as I said, Java can optimize some cases of concatenation, so it can be as performant or better than %s in Java. If the format string is a constant, the compiler can in theory parse it at compile time to produce optimized code which should be just as fast as StringBuilder. I don't know whether Java does this. Now that I think about it more though, I agree with you that the performance claim is probably not true in Java. – Lie Ryan Dec 07 '15 at 14:31
  • @JimmyHoffa: But Python definitely doesn't optimize repeated concatenations; also, in Python % is a single bytecode and the format string is parsed by C, while + involves one bytecode each and quadratic growth behaviour. I think I'm going to remove Java from #2. – Lie Ryan Dec 07 '15 at 14:34
  • Are we trying to compare the performance of `sprintf("%s%s",foo,bar)` with `strcat(foo,bar)`? And logging has ***other*** issues at hand (`logging.info("foo: %s; bar: %s", foo, bar);` is less expensive than `logging.info("foo: " + foo + " bar: " + bar);` ***if*** the logging level is warn or higher and *more* costly if it is info or lower). That said, these are problems that are unclear in the question and leads to problems in the answer trying to guess at the OP's intent. –  Dec 07 '15 at 17:40
  • I've attempted to focus the question to just python in an attempt to make sure that it matches the question that you answered, though there may be some different points you would like to bring up. –  Dec 07 '15 at 20:44
  • @LieRyan: I very much doubt #2, do you have any actual measurements which proves that concatenating with % is faster than + ? – JacquesB Dec 09 '15 at 09:53
  • 1
    @JacquesB: when [formatting an IPv6-like string](https://ideone.com/1BYfQJ), %s is about 25% faster than + concatenation. If you increase n=15 (IPv15?), the gap increases even more, so that + concatenation takes double the time compared to %s. Even faster than either of these would be a `str.join()` based method, but `str.join()` is inflexible when you have more complex formats. In practice though, whenever you're dealing with strings you probably have to deal with a file or socket, and that will almost always overwhelm any performance gain you get from changing how you concatenate strings. – Lie Ryan Dec 09 '15 at 13:12
  • 1
    @JacquesB: PyPy is much smarter than CPython when optimizing code, and is able to do massive optimizations here. In PyPy, in the same benchmark, [+ concatenation is more than 40 times faster than using %s](https://ideone.com/ECCP4u), as you would have expected. I'd attribute this to PyPy being able to infer the type of the variables and produce JIT optimized code for the snippet that recognizes that it can avoid quadratic copying. – Lie Ryan Dec 09 '15 at 13:54
  • @LieRyan: Thanks for the timing-code, very cool. But I tried with only three strings as in the original question, and here + is faster than %s. In the most common case, which is concatenating two strings, + is significantly faster. I don't really agree that performance is an important argument either way, since you are only dealing with a small and fixed number of strings. – JacquesB Dec 09 '15 at 14:34
  • #2 is actually not true in CPython: It concatenates in place if possible, see https://stackoverflow.com/questions/4435169/good-way-to-append-to-a-string – balu Apr 02 '17 at 11:46
  • @LieRyan Since you acknowledge that the statement #1 is subjective and cannot be established as fact, perhaps it makes sense to make the other two items #1 and #2, and turn #1 into a statement that is qualified as being an opinion, subjective, or based on experience. – Thomas Carlisle Sep 29 '17 at 15:42
  • @ThomasCarlisle: reading is a subjective experience, as is hearing, thinking, etc. But there are objective facts about these activities, despite the activities themselves being a subjective experience. There are colour combinations that are objectively bad for readability and there are character sequences that are objectively difficult to untangle. I wasn't saying that #1 is a subjective opinion. – Lie Ryan Sep 29 '17 at 16:22
  • For [Java devs](https://stackoverflow.com/a/925444/5934037). I find readability to be the main reason to go with templating instead of concatenation. It also makes it easier to search for the strings in the code (when needed). However, I would not find format() to be a big deal for short strings, as is the case in the OP's example. – Laiv Sep 29 '17 at 18:31
53

Am I the only one who reads left to right?

To me, using %s is like listening to German speakers, where I have to wait until the end of a very long sentence to hear what the verb is.

Which of these is clearer at a quick glance?

"your %s is in the %s" % (object, location)

or

"your " + object + " is in the " + location  
  • 20
    Obviously this is subjective, since I find the first one more readable - and easier to write and edit. The second intermingles the text with code which obscures both and adds noise. For example it is easy to get the spaces wrong in the second. – JacquesB Jul 27 '16 at 06:28
  • 8
    @JacquesB I actually think your brain is so familiar with this format that you immediately jump to the brackets and are replacing the words instantly. Technically it isn't left-to-right reading, but that's perfectly fine. I find I do that too, so yes, 1 is easier to read because I know I have to deal with stupid spacing issues before and after the quotes in the second, and that is really slow to work with. – Nelson Sep 29 '16 at 02:37
  • 3
    After `n` decades, my mind works like that too ;-) But I still stand by my answer, the second is clearer and easier to read, therefore to maintain. And that becomes more apparent the more parameters that you have. In the end, if it's a one man show, go with what you are familiar and comfortable with; if it’s a team effort, enforce consistency and code reviews; people can get used to either. – Mawg says reinstate Monica Sep 29 '16 at 07:11
  • 4
    The first one is way more readable for me because it has less "cruft" in the middle of the sentence. It's easier for my eye to glance to the end than it is for my brain to parse out the extra quotes, spaces, and pluses. Of course, I now much prefer Python 3.6 format strings: `f"your {object} is in the {location}"`. – Dustin Wyatt Jan 26 '17 at 23:29
  • 9
    I also find it even harder to read and write when the variable needs to be surrounded with quotes itself. `"your '" + object + "' is in the '" + location + "'"`...I'm not even sure if I got that right just now... – Dustin Wyatt Jan 26 '17 at 23:32
  • That one hit home (+1). I often write code which generates code, so that your example needs to be wrapped in yet another set of quotes :-/ – Mawg says reinstate Monica Jan 27 '17 at 12:16
  • 3
    The best of both worlds would be string interpolation like in ES6's template literals - ex: `\`My name is ${name}\``. Apparently, Python will get string interpolation in version 3.6 (in development): https://www.python.org/dev/peps/pep-0498/ – jbyrd Apr 25 '17 at 23:22
  • 6
    I find `f"your {object} is in the {kitchen}"` (Python 3.6+) to be the most readable. Yay progress. – Graipher Aug 26 '18 at 07:48
  • 1
    @Graipher: Nice. It's just a pity that the switch to string interpolation is only signalled by a single character at the start. Needlessly opaque, IMO. – Jack Aidley Oct 18 '18 at 12:13
  • 2
    If you're not yet in Python 3.6 (where f-strings are the solution), you can still use the longer but somewhat more readable `'your {object} is in the {location}'.format(object='head', location='sand')` – Peter Nov 23 '18 at 10:31
  • In the post above the concatenation is clear, but expressions are rarely that simple, eg `"Line " + str(i) + " has the values " + "".join(["{0:02X}".format(buf[j]) for j in xrange(i*width, (i + 1)*width)]) + " to replace."`. In that case, in most real cases actually, you don't want your code to be that messy. The same goes for templates. You can have something very readable using named argument and format, `"We got {number:d} beams of {energy:d} eV with an angle of {angle:d} degrees".format(**data[i+self.offset])`. – RedGlyph Aug 17 '19 at 09:44
  • Personally, I don't write such convoluted code. I have many assignment statements, writing each part of the statement to an individual variable (e.g. `join` on one line, then `sort` on the next, etc.) before combining them. For two reasons: 1) I learned coding decades before one could chain methods (but that's just me), 2) because doing so makes debugging much, much simpler. But, you do have a valid point. Happy coding :-) – Mawg says reinstate Monica Aug 18 '19 at 08:29
12

An example clarifying the readability argument:

print 'id: ' + id + '; function: ' + function + '; method: ' + method + '; class: ' + class_ + ' -- total == ' + total

print 'id: %s; function: %s; method: %s; class: %s -- total == %s' % \
   (id, function, method, class_, total)

(Note that the second example is not only more readable but also easier to edit: you can change the template on one line and the list of variables on another.)

A separate issue is that the %s code also converts its argument to a string; otherwise you have to use a str() call, which is also less readable than a %s code.
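
A minimal sketch of that conversion point, assuming Python 3 and a made-up integer `total`:

```python
total = 42  # an int, not a str

# %s calls str() on the argument for us.
with_percent = "total == %s" % total

# + only accepts str on both sides, so the conversion must be spelled out.
with_plus = "total == " + str(total)

# Leaving out str() raises TypeError instead of converting silently.
try:
    "total == " + total
    failed = False
except TypeError:
    failed = True
```

Both `with_percent` and `with_plus` end up as the same string; the only difference is who writes the `str()` call.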

Rainy
9

Using + should not be avoided in general. In many cases it is the correct approach. Using %s or .join() is only preferable in particular cases, and it is usually quite obvious when they are the better solution.

In your example you are concatenating three strings together, and the example using + is clearly the simplest and most readable, and therefore the recommended one.

%s or .format() are useful if you want to interpolate strings or values in the middle of a larger string. Example:

print "Hello %s, welcome to the computer!" % name

In this case, using %s is more readable since you avoid chopping the first string into multiple segments, especially if you are interpolating multiple values.

.join() is appropriate if you have a variable size sequence of strings and/or you want to concatenate multiple strings with the same separator.
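
As a rough illustration of the variable-size case (Python 3; `describe` is a hypothetical helper, not part of any library):

```python
def describe(items):
    """Join any number of strings with the same separator."""
    return ", ".join(items)


two = describe(["hello", "world"])
many = describe(str(n) for n in range(5))  # join needs str items
none = describe([])                        # an empty sequence is fine too
```

The same call handles zero, two or many items, which is awkward to express with a fixed chain of + operators.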

JacquesB
3

Since the word order may change in different languages, the form with %s is imperative if you want to properly support the translation of strings in your software.
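
A small sketch of why this matters, assuming Python 3 and using the named `%(name)s` variant so the translator can reorder the placeholders. The `templates` table, the `render` helper and the `"xx"` entry are invented for the illustration; `"xx"` is only a stand-in for a real translation whose grammar puts the location first.

```python
templates = {
    "en": "your %(thing)s is in the %(location)s",
    # Stand-in for a translation whose grammar puts the location first.
    "xx": "in the %(location)s: your %(thing)s",
}


def render(lang, thing, location):
    """Fill the template for `lang`; the calling code never changes."""
    return templates[lang] % {"thing": thing, "location": location}


english = render("en", "key", "kitchen")
reordered = render("xx", "key", "kitchen")
```

With `"your " + thing + " is in the " + location`, the word order is baked into the code, so a translator could not move the location to the front without a code change.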

martjno