How do you hash 4 doubles into a size_t?

Question

I have bounding boxes the key type.

Box {
  double mins[2];
  double maxs[2];
}

And I want to have Box as the key type in the D programming language, so I have to implement:

size_t toHash() const @safe pure nothrow {
   size_t hash;
   for(size_t k=0; k < 2; k++) {
       // do something here
   }
   return hash;
}

Should I have a linked list on the output of the associative array and search through the list of there's a collision? Should I find some reasonable bound of double that is in my application then come up with some formula by shifting?

Not sure what to do here.

[Sharing your research helps everyone](http://meta.programmers.stackexchange.com/questions/6559/why-is-research-important). Tell us what you've tried and why it didn’t meet your needs. This demonstrates that you’ve taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer. Also see [ask] — gnat, Feb 11 '16 at 15:35
There are some well-known algorithms for hashing data, including variable-length data. What have you tried? — , Feb 11 '16 at 16:03
Are you sure you want to hash doubles? Unless you're *really* careful, floating point precision issues will cause trouble at some point. — CodesInChaos, Feb 11 '16 at 17:06
What are you actually trying to do with this hash? Are you worried about hash collisions? or rectangle collisions? — , Feb 11 '16 at 17:56

score 3 · Answer 1 · edited May 23 '17 at 12:40

Hashing makes only sense in conjunction with an equality comparison. But if types contain double values like in your example, any sensible equality comparison should work with an "epsilon", something like

   bool Equal(Box b1,Box b2, double epsilon)
   {
      return fabs(b1.mins[0]-b2.mins[0])<epsilon &&
             fabs(b1.mins[1]-b2.mins[1])<epsilon &&
             fabs(b1.maxs[0]-b2.maxs[0])<epsilon &&
             fabs(b1.maxs[1]-b2.maxs[1])<epsilon;
   }

However, this makes hashing pointless, because now "equal" boxes according to this comparison are not guaranteed to produce the same hash code if they are just "almost" equal (for any sensible hash function, as long as you do not implement the hashing function trivially by returning always 0).

Hashing something like double coordinates starts only making sense when you can map them to a discrete range, which means you have some kind of mapping from double to int. And as soon as you have integer coordinates, it should be easy to create a hash value for them (for example, like shown here).

Floating point is hard. An equality relationship that is not transitive does not seem sensible enough to me to suggest that other choices are never sensible (yes, I know about NaN). Suggesting that a systematic use of an epsilon will help is misleading. (Especially that, as far as my experience is generalizable, epsilon should be used as a relative value). — AProgrammer, Feb 11 '16 at 17:49
@AProgrammer: which approach is right or wrong depends on the problem context which we don't know since the OP "forgot" to tell us. But trying to hash double values is often a sign of solving the wrong problem, which is what I tries to demonstrate with my answer. However, this question is not so bad that it cannot still serve as a bad example. — Doc Brown, Feb 11 '16 at 19:49

score 2 · Accepted Answer · edited May 23 '17 at 12:40

If you can generate a guaranteed unique hash, then do so. However this means your box coordinates must be relatively small (eg if a box can be between 0 and 1, in increments of 0.1 then you have 10 steps for each co-ordinate, making a possible 10^4 unique values, assuming no 2 boxes can overlap)

In most cases you will not be able to guarantee a unique hash, so generate what you can and understand your hashes may collide. If this case, it depends what you're trying to do - I think you'd want to store these clashing hashes in a list similar collection and when reading a box, iterate through all clashes to find the exact one you wanted. The point is that you can quickly skip nearly all the un-needed boxes to get to the one you wanted.

SO has a good explanation of what hash tables are for and how they're used.

In your case, you might want to consider a single corner to identify your boxes if they don't overlay each other, the size of the box is irrelevant in such a case. Whatever you do, try to make calculating the hash as fast as possible as it will be calculated often.

How do you hash 4 doubles into a size_t?

2 Answers2