There are several cases where I want to switch over a String input. I decided to implement something like this:

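public Object doStuff(String param) {
    switch (param.hashCode()) {
    case 1234546:
        // handle the first expected input
        break;
    case -18754956:
        // handle the second expected input
        break;
    default:
        whatever();
    }
    return null; // placeholder so the method compiles
}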

It works as intended, but there is this feeling that something can go awry.

What are the possible pitfalls of this implementation? I am bound to Java 1.5 or 1.6 because corporate policy dictates it, so I cannot upgrade to 1.7 or later.

Is there a better way to implement a switch over Strings than a monstrous chain of if-then-else (for the purpose of this question, let's assume I cannot do if-then-else over every possible String value of the input)?


Related: "What is the benefit of switching on Strings in Java 7?", but I am asking about versions before 1.7.

  • Non-equal objects can have the same hash code. – Zavior Feb 23 '15 at 16:12
  • _"I cannot do if-then-else over every possible String value of the input"_ ... but you can do a switch on the hash of every possible value of the input? Please explain what you plan to do in each case (not just `break;`) and I'll give you a solution. – Tulains Córdova Feb 23 '15 at 17:13

2 Answers

When you compare two strings using their hash codes:

  • If two hash codes are different, the strings are different,

  • If two hash codes are identical, this doesn't mean anything. The original strings may be identical, or different.

In the same way, a different string length indicates that two strings are different, but if you have the same length, it doesn't mean that two strings are equal.

This is why if you need to perform an action based on the user input, you should use the original string, not its hash code. Having a switch based on the string itself is one way to go. Another alternative you should consider is a map.

The benefit of a map is that you can change it at run time, for instance by defining the different inputs and their corresponding actions in a configuration file. This can also be useful if you are switching over long strings that would look awkward hardcoded in a switch statement.
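A minimal sketch of the map approach, written so that it should compile on Java 5 or 6 (anonymous classes, no lambdas, no diamond operator). The names `StringDispatcher`, `Action`, and `register` are hypothetical and only illustrate the idea; loading the entries from a configuration file is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; a sketch, not a definitive implementation.
class StringDispatcher {

    /** Hypothetical callback type: the action to run for one input. */
    interface Action {
        Object run();
    }

    private final Map<String, Action> actions = new HashMap<String, Action>();
    private final Action fallback;

    StringDispatcher(Action fallback) {
        this.fallback = fallback;
    }

    /** Registers (or replaces) the action for one input; can be called at run time. */
    void register(String input, Action action) {
        actions.put(input, action);
    }

    /** The map compares keys with equals(), so a colliding hash code cannot select the wrong action. */
    Object doStuff(String param) {
        Action action = actions.get(param);
        return (action != null ? action : fallback).run();
    }

    public static void main(String[] args) {
        StringDispatcher dispatcher = new StringDispatcher(new Action() {
            public Object run() { return "default"; }
        });
        dispatcher.register("Aa", new Action() {
            public Object run() { return "handled Aa"; }
        });
        System.out.println(dispatcher.doStuff("Aa")); // prints "handled Aa"
        System.out.println(dispatcher.doStuff("BB")); // prints "default", although "BB" has the same hash code as "Aa"
    }
}
```

The default case becomes the fallback action, and adding a new input is one `register` call rather than a new branch.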

Arseni Mourzenko

Hash codes are not unique. There are numerous strings which hash to, say, 1234546. If you want to check for the string "foo" and the unrelated, meaningless string "bar" has the same hash, your code will falsely treat "bar" the same as "foo" rather than rejecting it/going to default.
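A small check that makes the collision concrete. "Aa" and "BB" are a well-known colliding pair under the `String.hashCode()` formula documented in the Javadoc; the class and method names here are hypothetical.

```java
// Hypothetical helper, only to demonstrate that distinct strings can share a hash code.
class HashCollisionDemo {

    /** True when the two strings are different but have the same hash code. */
    static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());       // 2112 (65 * 31 + 97)
        System.out.println("BB".hashCode());       // 2112 (66 * 31 + 66)
        System.out.println(collides("Aa", "BB"));  // true
    }
}
```

A `case 2112:` written with "Aa" in mind would therefore also run for "BB".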

Fixing this is generally possible (check whether the value really equals the intended string in each case), but it's quite a bit of extra code, especially if there is a default case (since you have to transfer control out of the "wrong" case and to the default case). This is how the switch-over-string feature proposal desugars it. It's not pretty.
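A rough sketch of what that fix looks like by hand on Java 5 or 6, assuming the hash codes produced by the documented `String.hashCode()` formula ("foo" hashes to 101574; "Aa" and "BB" both hash to 2112). The class name and the returned strings are placeholders. This is not the exact code the compiler generates, only the same idea: every case re-checks with `equals`, and a mismatch has to leave the switch so that the default handling runs.

```java
// Hypothetical sketch: hash switch with an equals() guard in every case.
class GuardedHashSwitch {

    static String doStuff(String param) {
        switch (param.hashCode()) {
        case 101574: // "foo".hashCode()
            if (param.equals("foo")) {
                return "foo branch";
            }
            break; // different string with the same hash: leave the switch
        case 2112: // "Aa".hashCode(), which is also "BB".hashCode()
            if (param.equals("Aa")) {
                return "Aa branch";
            }
            break;
        default:
            break;
        }
        // Shared default handling, reached by unknown hashes and by failed equals() checks alike.
        return "default branch";
    }

    public static void main(String[] args) {
        System.out.println(doStuff("foo")); // prints "foo branch"
        System.out.println(doStuff("BB"));  // prints "default branch"
    }
}
```

Putting the default handling after the switch avoids duplicating it in every guarded case, but the magic numbers and the extra `equals` calls remain.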

Hash codes are not meaningful. Someone has to work out the hash codes and hard-code them, which adds more meaningless magic numbers to the code. You can and should of course write the string in a comment next to the integer literal, but it's still a maintenance burden.

Technically, the hash algorithm might also change when the JVM is updated or when the deployment environment is changed, etc. — depending on the JVM, maybe even when the JVM is restarted! And then the code would be silently wrong. This probably won't actually happen (I think the implementation of Java 7 switch-over-string depends on the compiler being able to predict the hashCode value of string literals).

  • Consider also that hash codes aren't necessarily stable between versions of the runtime (or different runtimes). What works in Java for Hotspot JVM may work differently on another JVM. (Which is addressed in this answer) That said, it appears to be stable - see http://stackoverflow.com/a/785150/289086 –  Feb 23 '15 at 16:24
  • @MichaelT isn't there a contract on the hashCode function documentation to always be that `s[0]*31^(n-1)` ... formula? – Mindwin Remember Monica Feb 23 '15 at 16:28
  • @Ordous `"".hashCode()` is [not a constant expression](http://ideone.com/JA8Kjp), so you still have to hard-code the number. Moving the magic number behind an appropriately-named constant is better, of course, but only marginally so. –  Feb 23 '15 at 16:44
  • @Mindwin the javadoc is a documentation of the current functionality. That functionality has been consistent for a long time, but the documentation and algorithm can be changed without breaking the *code* contract of how hashCode works. I am sure if I dig, I can find differences in the documentation of how the code works between Java 1.2 and 8 that don't break the code contract. –  Feb 23 '15 at 17:00
  • @delnan Whoops, forgot you can't do that in Java. – Ordous Feb 23 '15 at 17:22
  • @Mindwin btw, the implementation of hashCode *has* changed in the past. Java 1.0 and 1.1 had the following algorithm: "In JDK (Java Development Kit) 1.0+ and 1.1+ the hashCode function for long Strings worked by sampling every nth character. This pretty well guaranteed you would have many Strings hashing to the same value, thus slowing down Hashtable lookup." This was changed to a slower version (examining every character) that improved performance by having fewer collisions. The key point being hashCode *did* change in the past and *may* change again in the future. –  Feb 23 '15 at 19:33
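
The formula quoted in the comments, `s[0]*31^(n-1) + ... + s[n-1]` in `int` arithmetic, can be reproduced by hand to see where a hard-coded case value comes from. This is only a sketch with a hypothetical class name; whether a given runtime must keep matching it is exactly what the comments above debate.

```java
// Hypothetical helper that evaluates the documented String.hashCode() formula by hand.
class DocumentedStringHash {

    /** Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's rule. */
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around, as in the documented formula
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("foo"));                     // 101574
        System.out.println(hash("foo") == "foo".hashCode()); // true on runtimes that follow the documented formula
    }
}
```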