63

The plus sign, +, is used for both addition and string concatenation, but its companion, the minus sign, -, is generally not used for trimming strings or for anything other than subtraction. What reasons or limitations explain that?

Consider the following example in JavaScript:

var a = "abcdefg";
var b = "efg";

a - b; // NaN
// but
a + b; // "abcdefgefg"
Lesmana
Digvijay Yadav
  • 35
    which "yy" should be removed? – gashach Oct 28 '15 at 18:54
  • 12
    If I go with the behavior of the '+' sign, then the rightmost one makes sense. – Digvijay Yadav Oct 28 '15 at 18:56
  • The opposite of string addition (concatenation) would/could be splitting. So, it could be used as an overload for the split operation, although I have never come across it being overloaded in that fashion. But, in general string concatenation is more prevalent than removal in code scenarios. – Jon Raynor Oct 28 '15 at 19:03
  • 46
    It is bad enough that the binary `+` operator is overloaded with the two totally unrelated meanings “numeric addition” and “string concatenation”. Thankfully, some languages provide a separate concatenation operator such as `.` (Perl5, PHP), `~` (Perl6), `&` (VB), `++` (Haskell), … – amon Oct 28 '15 at 19:11
  • @amon: So what do Perl and PHP use for member access, then? – Mason Wheeler Oct 28 '15 at 19:23
  • 2
    @MasonWheeler - PHP uses an arrow `->`, can't speak to Perl – Sam Dufel Oct 28 '15 at 19:25
  • 6
    @MasonWheeler They use `->` (think dereferencing member access in C, since virtual method calls necessarily involve pointer-like indirection). There is no law of language design that requires method calls/member access to use a `.` operator, though it is an increasingly common convention. Did you know that Smalltalk has no method call operator? Simple juxtaposition `object method` is sufficient. – amon Oct 28 '15 at 19:28
  • @amon: In statically typed languages, that's *not a problem*. But go to a dynamically typed language, and for best effect throw in implicit type-conversion, and ... the horror. – Deduplicator Oct 28 '15 at 21:27
  • 20
    Python *does* overload minus, for [set subtraction](https://docs.python.org/3/library/stdtypes.html#set.difference) (and it can be overloaded in user-defined types as well). Python sets also overload most of the bitwise operators for intersection/union/etc. – Kevin Oct 28 '15 at 22:40
  • @amon VB *does* have a separate concatenating operator, but `+` also acts the same way. – RubberDuck Oct 28 '15 at 23:09
  • 1
    -1 because `b = "gy"` makes the answer obvious. – Superbest Oct 29 '15 at 01:47
  • @JonRaynor "The opposite of string addition (concatenation) would/could be splitting" No, because the opposite of splitting would rather be joining than concatenating. – glglgl Oct 29 '15 at 07:07
  • 4
    To avoid the unwarranted waves of abuse you're getting, string `a-b` can be perfectly well-defined: it means trim one trailing `b` from `a`, only if `b` is the rightmost string in `a` (otherwise `a` unchanged). – smci Oct 29 '15 at 09:07
  • 4
    It would support your case further if you noted that Python's `'xy' * 3 == 'xyxyxy'` syntax has been around for about a decade, without objections. – smci Oct 29 '15 at 09:12
  • @Kevin Applied to strings, + is not commutative. – paparazzo Oct 29 '15 at 09:52
  • @gashach: Being the counterpart of the `+` operator, the trailing would be removed. – phresnel Oct 29 '15 at 10:20
  • 1
    @amon - It may be perverse that `+` is used for two different operations, but I still prefer that option to the bizarre use of `<<` for that purpose as seen in C++. At least "add" and "concatenate" are somewhat similar. "left shift" is completely unrelated... – Darrel Hoffman Oct 29 '15 at 15:12
  • How about / as an operator to split strings? `"bbbacccaddd"/"a" == {"bbb","ccc","ddd"}` (note: I am not actually endorsing this) – Mr.Mindor Oct 29 '15 at 20:33
  • Using like terms seems to be the standard in JavaScript. For example, you can use indexOf on arrays, objects and strings! – Adam Fowler Oct 29 '15 at 22:58
  • @DarrelHoffman Once I realized that operator overloading in C++ is essentially a form of domain specification I became perfectly comfortable with the usage of what are normally bit-shift operators for stream I/O. (Of course, you could also think of it as "shifting" characters into/out of streams, but that's just a mnemonic.) – JAB Oct 30 '15 at 13:07
  • @JAB Ah so you're the guy who thought that Boost spirit made perfect sense? ;) – Voo Oct 30 '15 at 19:24
  • @Voo anything makes sense once you grok it. (Of course, grokking it is the hard part.) – JAB Oct 30 '15 at 20:50
  • 1
    In the XBase language family (DBIII, Clipper, FoxPro ...) the '-' is string concatenation with space trimming: "ABC " - "DEF" => "ABCDEF". It made sense in an environment with fixed length, space padded strings. – edc65 Oct 31 '15 at 02:04
  • @Deduplicator, or a language like Java, where `"foo" + anything` is valid, no matter what `anything` is. – Paul Draper Oct 31 '15 at 05:12
  • I was just reading *From Mathematics to Generic Programming* and Stepanov points out that `+` is used for *the* operation that's commutative and associative, and `*` is used for *the* operation that's associative but not commutative. So the monoid of string catenation really should use `*`! Perl 6 uses a `~` and Haskell uses a double-plus symbol. – JDługosz Nov 01 '15 at 00:57
  • @DarrelHoffman Fun fact: in the context of iostreams, `<<` is officially the "stream insertion operator", and is *not* the "left shift operator", despite the fact that the compiler treats them as exactly the same thing. – user253751 Nov 01 '15 at 03:52
  • Maybe this is an example of historical baggage, a lot of languages allow string concatenation with `+` even if no other operator overloading is supported. I can think of the Metrowerks Pascal compiler and a few BASIC variants off the top of my head that allow `+` for strings even though every other mathematical operator is limited to numeric data-types. – Michael Shopsin Nov 02 '15 at 20:49
  • Using a mathematical symbol as a string concatenation operator is confusing. It's almost like using + to describe reproduction of humans: 1+1=3 – user3123061 Nov 03 '15 at 09:23

6 Answers

116

In short, there aren’t any particularly useful subtraction-like operations on strings that people have wanted to write algorithms with.

The + operator generally denotes the operation of an additive monoid, that is, an associative operation with an identity element:

  • A + (B + C) = (A + B) + C
  • A + 0 = 0 + A = A

It makes sense to use this operator for things like integer addition, string concatenation, and set union because they all have the same algebraic structure:

1 + (2 + 3) == (1 + 2) + 3
1 + 0 == 0 + 1 == 1

"a" + ("b" + "c") == ("a" + "b") + "c"
"a" + "" == "" + "a" == "a"

And we can use it to write handy algorithms like a concat function that works on a sequence of any “concatenable” things, e.g.:

def concat(sequence):
    return sequence.reduce(+, 0)
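
A runnable version of the pseudocode above might look like the following Python sketch. The `identity` parameter is an assumption added here, not part of the original pseudocode: the identity element differs per type (`0` for integers, `""` for strings, `[]` for lists), so the caller has to supply it.

```python
from functools import reduce
import operator


def concat(sequence, identity):
    """Fold `sequence` with +, starting from the monoid's identity element."""
    # `identity` is an added parameter; the pseudocode hard-codes 0.
    return reduce(operator.add, sequence, identity)


print(concat([1, 2, 3], 0))         # 6
print(concat(["a", "b", "c"], ""))  # abc
print(concat([[1], [2, 3]], []))    # [1, 2, 3]
```

The same function body serves all three types because it relies only on associativity and an identity element, which is exactly the monoid structure.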

When subtraction - gets involved, you usually talk about the structure of a group, which adds an inverse −A for every element A, so that:

  • A + −A = −A + A = 0

And while this makes sense for things like integer and floating-point subtraction, or even set difference, it doesn’t make so much sense for strings and lists. What is the inverse of "foo"?

There is a structure called a cancellative monoid, which doesn’t have inverses, but does have the cancellation property, so that:

  • A − A = 0
  • A − 0 = A
  • (A + B) − B = A

This is the structure you describe, where "ab" - "b" == "a", but "ab" - "c" is not defined. It’s just that we don’t have many useful algorithms that use this structure. I guess if you think of concatenation as serialisation, then subtraction could be used for some kind of parsing.
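
As a rough illustration, right cancellation on strings could be sketched in Python like this. The name `cancel_right` is hypothetical, and returning `None` for the undefined case is just one possible choice (raising an exception would be another).

```python
from typing import Optional


def cancel_right(a: str, b: str) -> Optional[str]:
    """Return x such that x + b == a, or None if a does not end with b."""
    if not a.endswith(b):
        return None  # e.g. "ab" - "c" is not defined
    # Slice by length rather than a[:-len(b)], which breaks when b == "".
    return a[: len(a) - len(b)]


print(cancel_right("ab", "b"))   # a
print(cancel_right("ab", "c"))   # None
print(cancel_right("ab", "ab") == "")  # True: A - A is the identity
print(cancel_right("ab", ""))    # ab: A - 0 is A
```

Python 3.9 and later ship `str.removesuffix`, which differs in one respect: it returns the string unchanged when the suffix is absent instead of signalling that the operation is undefined.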

Jon Purdy
  • 2
    For sets (and multi-sets) subtraction makes sense, because unlike sequences, the order of the element doesn't matter. – CodesInChaos Oct 28 '15 at 23:16
  • @CodesInChaos: I added a mention of them, but I wasn’t really comfortable putting sets as an example of a group—I don’t believe they form one, as you can’t generally construct the inverse of a set. – Jon Purdy Oct 29 '15 at 01:38
  • 12
    Actually, the `+` operation is also commutative for numbers, i.e. `A+B == B+A`, which makes it a *bad* candidate for string concatenation. This, plus the confusing operator precedence, renders using `+` for string concatenation a historical mistake. However, it’s true that using `-` for some string operation would make things much worse… – Holger Oct 29 '15 at 09:12
  • 1
    @Holger: Hmm, I agree in part. Perhaps `*` or something would be better, as people familiar with linear algebra will already know that multiplication is not necessarily commutative. But there is also a strong appeal to natural language—“plus” has a strong association with “and”, even though for many monoids (and as far as operator precedence goes) it’s more like an “or” operation: addition is like bitwise union with carry, for example. – Jon Purdy Oct 29 '15 at 10:08
  • Maybe a custom operator would be better. But that depends on the language. E.g. when Java was designed, there were unused characters and defining a custom operator would have made much more sense than defining one single exception to the otherwise absent operator overloading. For C++, things are much different… – Holger Oct 29 '15 at 10:20
  • That pseudo code looks awfully Pythonic. I was thinking "I thought `reduce` was a function not a method! And how can you use `+` as a function?" –  Oct 29 '15 at 10:22
  • @spudowiar: Hah, you’ve got me there. Figured it would be a readable notation, without using any particular language. – Jon Purdy Oct 29 '15 at 10:39
  • As a note, there's *at least* one language where string concatenation has a dedicated operator: PHP with its `.` operator. And I remember some other language using `~` but I can't remember which one. – Darkhogg Oct 29 '15 at 12:44
  • 2
    @Darkhogg: Right! PHP borrowed `.` from Perl; it’s `~` in Perl6, possibly others. – Jon Purdy Oct 29 '15 at 13:03
  • @Darkhogg Haskell has `++` for concatenation. It works on any list and a string is just a list of characters. It also has ``\\``, which removes the first occurrence of every element in the right argument from the left argument. – John Dvorak Oct 29 '15 at 14:32
  • What about "blahblah.txt" - ".txt" to leave "blahblah", that would be useful – Martin Beckett Oct 29 '15 at 15:27
  • 1
    @MartinBeckett but you can see that the behaviour might be confusing with `.text.gz.text`... – Boris the Spider Oct 29 '15 at 15:46
37

Because concatenation of any two valid strings is always a valid operation, but the opposite is not true.

var a = "Hello";
var b = "World";

What should a - b be here? There's really no good way to answer that question, because the question itself isn't valid.

Mason Wheeler
  • 1
    I might be wrong on this, but I believe removing 5 mangoes from 5 apples should not affect the apples. Definitely a lot of research has to be done before implementing such logic, but once clarified, usage guidelines can be provided. – Digvijay Yadav Oct 28 '15 at 18:59
  • 31
    @DigvijayYadav, if you remove 5 mangoes from 5 apples does there have to then be a counter of -5 mangoes? Does it do nothing? Can you define this well enough that it can be broadly accepted and put into all compilers and interpreters of languages to use this operator in this form? That is the big challenge here. – JB King Oct 28 '15 at 19:17
  • 1
    If I can think of it this way: consider both strings as sets of characters; subtracting one from the other removes the second string only if it is present as a substring of the first (maybe with a restriction of subtracting from the right end of the first one), and everything else stays intact. – Digvijay Yadav Oct 28 '15 at 19:30
  • 29
    @DigvijayYadav: So you just described two possible ways to implement this, and there's a good argument to consider each one as valid, so we're already making a mess of the idea of specifying this operation. :P – Mason Wheeler Oct 28 '15 at 19:34
  • 1
    What should `a - b` be here? It should be simply `a`, since there was no trailing `b` to be trimmed from `a`. Simple. (Somewhat analogously, what should `5 + False` be? `5`) – smci Oct 29 '15 at 09:02
  • 14
    @smci Seems to me `5 + False` should obviously be an *error*, since a number is not a boolean and a boolean is not a number. – Mason Wheeler Oct 29 '15 at 10:32
  • 1
    If you remove 5 mangoes from 5 apples, you get ppl. If you remove 5 apples from 5 mangoes, you get mngo. – John Dvorak Oct 29 '15 at 13:01
  • @mason welcome to the Haskell school of thinking. – John Dvorak Oct 29 '15 at 13:04
  • 7
    @JanDvorak: There's nothing particularly "Haskelly" about that; that's basic strong typing. – Mason Wheeler Oct 29 '15 at 13:42
  • 5
    @DigvijayYadav So `(a+b)-b = a` (hopefully!), but `(a-b)+b` is sometimes `a`, sometimes `a+b` depending on if `b` is a substring of `a` or not? What madness is this? –  Oct 30 '15 at 14:45
  • @MasonWheeler Well, that's 5 in C++! :P – cubuspl42 Oct 30 '15 at 18:34
28

Because the - operator for string manipulation does not have enough "semantic cohesion." Operators should only be overloaded when it is absolutely clear what the overload does with its operands, and string subtraction doesn't meet that bar.

Consequently, method calls are preferred:

public string Remove(string source, string toRemove)
public string Replace(string source, string oldValue, string newValue)

In the C# language, we use + for string concatenation because the form

var result = string1 + string2 + string3;

instead of

var result = string.Concat(string1, string2, string3);

is convenient and arguably easier to read, even though a function call is probably more "correct," from a semantic standpoint.

The + operator can really only mean one thing in this context. The same isn't true for -, since the notion of subtracting strings is ambiguous. By contrast, the function call Replace(source, oldValue, newValue) with "" as the newValue parameter removes all doubt, and the function can be used to alter substrings, not just remove them.
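
To make the ambiguity concrete, here is a small Python sketch of four plausible readings of "string minus string". All four function names are hypothetical, chosen only for illustration; none of them is an established API.

```python
def remove_suffix(a: str, b: str) -> str:
    """Drop one trailing b, if a ends with it."""
    return a[: len(a) - len(b)] if a.endswith(b) else a


def remove_first(a: str, b: str) -> str:
    """Drop the first occurrence of b anywhere in a."""
    return a.replace(b, "", 1)


def remove_all(a: str, b: str) -> str:
    """Drop every non-overlapping occurrence of b in a."""
    return a.replace(b, "")


def remove_chars(a: str, b: str) -> str:
    """Drop every character of a that also appears in b."""
    return "".join(ch for ch in a if ch not in b)


a, b = "banana", "an"
print(remove_suffix(a, b))  # banana
print(remove_first(a, b))   # bana
print(remove_all(a, b))     # ba
print(remove_chars(a, b))   # b
```

The same pair of operands gives four different answers, which is why a named method communicates intent better than a `-` operator would.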

The problem, of course, is that the operator overload is dependent on the types being passed to the operator, and if you pass a string where a number should have been, you may get a result you didn't expect. In addition, for many concatenations (e.g. in a loop), a StringBuilder object is preferable, since each use of + creates a brand new string, and performance can suffer. So the + operator isn't even appropriate in all contexts.

There are operator overloads that have better semantic cohesiveness than the + operator does for string concatenation. Here's one that adds two complex numbers:

public static Complex operator +(Complex c1, Complex c2) 
{
    return new Complex(c1.real + c2.real, c1.imaginary + c2.imaginary);
}
Robert Harvey
  • 8
    +1 Given two strings, A and B, I can think of A-B as "remove a trailing B from the end of A," "remove an instance of B from somewhere in A," "remove all instances of B from somewhere in A," or even "remove all characters found in B from A." – Cort Ammon Oct 29 '15 at 01:02
8

The Groovy language does allow -:

println('ABC'-'B')

returns:

AC

And:

println( 'Hello' - 'World' )

returns:

Hello

And:

println('ABABABABAB' - 'B')

returns:

AABABABAB
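
For comparison, the behaviour shown in these outputs (removing only the first occurrence) can be imitated in Python. This is a sketch inferred from the three examples above, not a statement about how Groovy implements the operator, and `groovy_minus` is a hypothetical name.

```python
def groovy_minus(a: str, b: str) -> str:
    """Remove the first occurrence of b from a; leave a alone if b is absent."""
    return a.replace(b, "", 1)


print(groovy_minus("ABC", "B"))         # AC
print(groovy_minus("Hello", "World"))   # Hello
print(groovy_minus("ABABABABAB", "B"))  # AABABABAB

# Under this reading, subtraction does not undo a concatenation on the right...
a, b = "ABABABABA", "B"
print(groovy_minus(a + b, b) == a)      # False
# ...although in this example it does undo one on the left.
print(groovy_minus(a + b, a) == b)      # True
```
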
Wim Deblauwe
  • 12
    Interesting - so it chooses to remove the first occurrence? A good example for a completely counter-intuitive behavior. – Hulk Oct 29 '15 at 11:23
  • 9
    Hence, we have that `('ABABABABA' + 'B') - 'B'` is nowhere near the same as the starting value `'ABABABABA'`. – user Oct 29 '15 at 13:21
  • 3
    @MichaelKjörling OTOH, `(A + B) - A == B` for every A and B. Can I call that a left subtraction? – John Dvorak Oct 29 '15 at 14:39
  • 2
    Haskell has `++` for concatenation. It works on any list and a string is just a list of characters. It also has ``\\``, which removes the first occurrence of every element in the right argument from the left argument. – John Dvorak Oct 29 '15 at 14:42
  • 1
    Just found out you can even do minus with a regex in Groovy: http://mrhaki.blogspot.be/2009/11/groovy-goodness-remove-parts-of-string.html – Wim Deblauwe Oct 29 '15 at 14:43
  • 3
    I feel like these examples are exactly why there should be no minus operator for strings. It's inconsistent and not intuitive behavior. When I think of "-" I sure don't think, "remove the first instance of the matching string, if it occurs, otherwise just do nothing." – enderland Oct 29 '15 at 17:18
  • While an interesting example of a language that *does* implement `-` on strings, this answer fails to actually answer the question of why most languages *don't* do this. –  Oct 29 '15 at 18:54
  • @MichaelT I agree, but it was too much to put in a comment. – Wim Deblauwe Oct 29 '15 at 18:56
6

The plus sign probably contextually makes sense in more cases, but a counter-example (perhaps an exception that proves the rule) in Python is the set object, which provides for - but not +:

>>> set('abc') - set('bcd')
set(['a'])
>>> set('abc') + set('bcd')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'

It doesn't make sense to use the + sign because the intention could be ambiguous - does it mean set intersection or union? Instead, it uses | for union and & for intersection:

>>> set('abc') | set('bcd')
set(['a', 'c', 'b', 'd'])
>>> set('abc') & set('bcd')
set(['c', 'b'])
Aaron Hall
  • 2
    This is more likely because set subtraction is defined in math, but set addition is not. – user541686 Oct 29 '15 at 09:28
  • The use of "-" seems dodgy; what's really needed is a "but not" operator which would also be useful when performing bitwise arithmetic with integers. If 30 ~& 7 were 24, then using ~& with sets would fit nicely with & and | even though sets lack a ~ operator. – supercat Oct 29 '15 at 22:00
  • 1
    `set('abc') ^ set('bcd')` returns `set(['a', 'd'])`, if you're asking about the symmetric difference. – Aaron Hall Oct 29 '15 at 22:35
3

"-" is used in some compound words (for example, "on-site") for joining the different parts into the same word. Why don't we use "-" for joining different strings together in programming languages? I think it would make perfect sense! To hell with this + nonsense!

However, let's try looking at this from a bit more abstract angle.

How would you define string algebra? What operations would you have, and what laws would hold for them? What would their relations be?

Remember, there must be absolutely no ambiguity! Every possible case must be well defined, even if that means saying that an operation is not possible. The smaller your algebra is, the easier this is to do.

For example, what does it actually mean to add or subtract two strings?

If you add two strings (for example, let a = "aa" and b = "bb"), would you get aabb as the result of a + b?

How about b + a? Would that be bbaa? Why not aabb? What happens if you subtract aa from the result of your addition? Would your string have a concept of negative amount of aa in it?

Now go back to the beginning of this answer and substitute "space shuttle" for "string". To generalize, why is any operation defined or not defined for any type?

The point I'm trying to make is that there is nothing stopping you from creating an algebra for anything. It might just be hard to find meaningful, or even useful, operations for it.

For strings, concatenating is pretty much the only sensible one I've ever come across. Doesn't matter what symbol is used to represent the operation.

Peter Mortensen
Zavior