
Why do programming languages like Ruby use symbols? I understand that String manipulation is much slower than using a lookup table as well as the idea that Strings are reallocated in memory no matter if it is the same or different as one used previously, but can't interpreters compensate for this? It would seem like an interpreter still has to parse the word that you typed in order to match it to a symbol, so why not just do the same with a string object?

For instance, why doesn't the compiler take:

myHash["myKey"] = ...

and treat it as

myHash[:myKey] = ...

behind the scenes, anyway? Even if the key is dynamic, it's an interpreter, so shouldn't it know what the key is going to be before it looks up the value, and still treat the string key as a symbol? E.g.:

concatMe = "Key"
myHash["my" + concatMe] = ...

How come an interpreter can't still treat this as

myHash[:myKey]

If it knows what

"my" + concatMe

is before it finds the value by key?

AndrewKS
  • Your statement `that Strings are reallocated in memory no matter if it is the same or different as one used previously` is incorrect. Many languages support string pools, which is what you're describing. See http://en.wikipedia.org/wiki/String_interning. – Tesserex Apr 26 '11 at 16:27
  • Thanks for the info. I was searching on this topic earlier and found this: http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols. He mentions that the memory ceiling is hit because the same string is reallocated. Is he incorrect or does Ruby not utilize string pooling? And if it does, it would seem like even more of a reason not to use symbols. – AndrewKS Apr 26 '11 at 16:29
  • I'm no Ruby expert by any means, but I would guess that in this case, the distinction is explicit. Symbols basically are interned strings. In some languages, it's done for you; in Ruby, you have to specify that you want interning. So if you don't use symbol syntax, it's reallocated. No guarantees that what I just said is correct. – Tesserex Apr 26 '11 at 16:31
  • Do the details of symbols in Ruby specifically matter, or is this about immutable vs. mutable strings in general? (And how do atoms fit in? There is no such thing in Ruby afaik; I only know about atoms in Lisp, and these are broader than just symbols vs. strings.) –  Apr 26 '11 at 16:47
  • In general... unless the way things are done in Ruby has a different rationale. – AndrewKS Apr 26 '11 at 16:49
  • Internally, symbols are lookups in Ruby; they aren't actually "strings". A Ruby symbol is like an Erlang atom, or, in Java terms, more like an integer constant field (you know, public static final int MY_SYMBOL = 1). This is very efficient for switch/case and boolean operations (even more than using string pools in almost all cases), but once you start to coerce them into strings, you start to lose the benefits. – Mainguy Sep 03 '11 at 19:36
  • Note, using your initial question... what would the symbol for "my symbol" be? ':my symbol' would be a fail right? – Mainguy Sep 03 '11 at 19:40

1 Answer


TL;DR: Strings are mutable. Symbols are not. Strings and symbols serve different purposes.
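A minimal sketch of that difference in plain Ruby (the variable names are only for illustration):

```ruby
# Strings can be changed in place.
s = "my".dup          # dup so this also works when string literals are frozen
s << "Key"            # appends to the same object
puts s                # myKey

# Symbols are always frozen and offer no in-place mutators such as <<.
sym = :myKey
puts sym.frozen?            # true
puts sym.respond_to?(:<<)   # false
```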

an interpreter still has to parse the word that you typed in order to match it to a symbol

Whether :foo matches "foo" could be determined either by interning the string or by turning the symbol into a string. In any event, if the interpreter interned every string it saw, it would have to do a lot of extra work whenever those strings are mutated, which is a poor tradeoff. It would also be unable to garbage collect those strings, which would hurt performance badly. In fact, interning all strings to symbols would perform far worse than the current behavior.
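For what it's worth, here is a small sketch of how the conversion looks when you do it yourself, reusing the dynamic key from the question (renamed to snake_case; the hash `h` is made up for the example):

```ruby
concat_me = "Key"
key = "my" + concat_me       # only known at run time

# Interning is explicit: you ask for the symbol with to_sym.
puts key.to_sym == :myKey    # true

# Ruby itself never treats a symbol and a string as equal.
puts :myKey == "myKey"       # false

# So in a hash they are two different keys.
h = { myKey: 1 }
p h[key]                     # nil
p h[key.to_sym]              # 1
```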

Ruby does not pool strings by default. You can verify this pretty easily by creating a large number of identical strings and profiling the interpreter's memory usage. However, such implementation details are very low on the list of concerns you should weigh when deciding whether to use a string or a symbol.
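A quicker check than profiling is to compare object identity. This is only a sketch; the last part assumes Ruby 2.5 or later, where the unary minus on a String returns a deduplicated frozen copy:

```ruby
a = "my" + "Key"
b = "my" + "Key"

puts a == b                       # true:  same contents
puts a.equal?(b)                  # false: two separate objects

# Symbols with the same name are always the same object.
puts a.to_sym.equal?(b.to_sym)    # true

# Assumption: Ruby 2.5+. String#-@ is the explicit opt-in to interning.
interned_a = -a
interned_b = -b
puts interned_a.equal?(interned_b)  # true
```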

I understand that String manipulation is much slower than using a lookup table

What does "much slower" mean to you? Are sub-microsecond timings "much slower"? Because that's what we're talking about. Use strings and symbols where appropriate, not based on some imagined performance concern with no real-world impact except in pathological cases.
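If you want numbers for your own machine, a rough timing sketch like the one below will do. The helper `time_per_call` and the iteration count are made up for this example, and the results vary by Ruby version and hardware, so treat them as a ballpark only:

```ruby
iterations = 200_000
string_hash = { "myKey" => 1 }
symbol_hash = { myKey: 1 }

# Hypothetical helper: average wall-clock seconds per call of the block.
def time_per_call(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / n
end

string_time = time_per_call(iterations) { string_hash["myKey"] }
symbol_time = time_per_call(iterations) { symbol_hash[:myKey] }

printf("string key: %.1f ns per lookup\n", string_time * 1e9)
printf("symbol key: %.1f ns per lookup\n", symbol_time * 1e9)
```

Note that the loop overhead is included in both figures, so only the difference between the two is meaningful.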

as well as the idea that Strings are reallocated in memory no matter if it is the same or different as one used previously, but can't interpreters compensate for this?

Yes, and they are also garbage collected when no longer referenced. Symbols are never garbage collected (this holds for Ruby before 2.2; later versions can collect dynamically created symbols). It's a tradeoff.

In many languages (such as Erlang, which uses 'atoms'), strings are actually just lists of characters (or integers). In these languages, interning all strings into symbols internally would be even more cost-prohibitive.

Rein Henrichs
  • This answer seems dangerously Ruby-centric to me. There are languages where *all* Strings are immutable; that doesn't make them all Symbols. So I don't know if there is a really good language-agnostic definition of Symbols; it depends on each language. Generally they are used for identifiers/keywords/names. – andy Jan 22 '13 at 16:38