In code dealing with hash tables, I often find the constant 0x9e3779b9 or sometimes 0x9e3779b1. For example
hash = n * 0x9e3779b1 >>> 24
Why is this particular value used?
In code dealing with hash tables, I often find the constant 0x9e3779b9 or sometimes 0x9e3779b1. For example
hash = n * 0x9e3779b1 >>> 24
Why is this particular value used?
0x9e3779b9
is the integral part of the Golden Ratio's fractional part 0.61803398875… (sqrt(5)-1)/2, multiplied by 2^32.
Hence, if φ = (sqrt(5)+1)/2 = 1.61803398875 is the Golden Ratio, the hash function calculates the fractional part of n * φ, which has nice scattering properties. To convince yourself, just create a scatter plot of (n, n*c-FLOOR(n*c))
in your favourite spreadsheet, replacing c
with φ, e, π, etc. Some interesting real-life issues when getting it wrong are described in https://lkml.org/lkml/2016/4/29/838.
This method is often referred to as "Golden Ratio Hashing", or "Fibonacci Hashing" and was popularised by Donald Knuth (The Art of Computer Programming: Volume 3: Sorting and Searching). In number theoretical terms, it mostly boils down to the Steinhaus Conjecture (https://en.wikipedia.org/wiki/Three-gap_theorem) and the recursive symmetry of the fractional parts of the multiples of the Golden Ratio φ.
Occasionally, you may also see 0x9e3779b1
, which is the prime closest to 0x9e3779b9
(and appears to be a bit of "cargo cult" as this is not a modular hash). Similarly, 0x9e3779b97f4a7c15
and 0x9e3779b97f4a7c55
are the 64 bit equivalents of these numbers.
The other answers explain the intent behind those magic numbers, which is probably what you wanted to know. However one could say that where "they come from" is from bad programming practices. Magic numbers are bad, and they should never be used. Constants such as those mentioned should be given proper descriptive variable names, and perhaps even comments should be added to where they are defined. Then, every appearance of the values in the code should be in the form of the named variable. Where this the case in the codes where you met those values, you would not have been preplexed by their intent in the first place.
example:
Bad example - uses magic numbers
hash = n * 0x9e3779b1
Better example - with comments and meaningful variable
# Golden Ratio constant used for better hash scattering
# See https://softwareengineering.stackexchange.com/a/402543
GOLDEN_RATIO = 0x9e3779b1
hash = n * GOLDEN_RATIO
In code dealing with hash tables, I often find the constant 0x9e3779b9 or sometimes 0x9e3779b1
The other answer correctly explained why this value is used. However, if you often find this constant, what you may not be realizing that you often find code vulnerable to hash flooding attacks.
There are two strategies against hash flooding attacks:
Use a secure hash function having a secret random seed. Your hash function doesn't have a secret random seed. Murmurhash3_32 has a secret random seed, but it has seed-independent multicollisions due to the small internal state. The best hash function having near cryptographical security and still nearly acceptable performance is probably SipHash. Unfortunately, it is slow, although not as slow as SHA512 etc.
Use a hash function that is quick to calculate (such as the hash function you found, or Murmurhash3_32), and make each hash bucket into the root of a balanced binary search tree. So, an ordinary separately chained hash table has each bucket as a linked list, which is slow if lots of values hash to the same bucket. By making it a balanced binary search tree such as AVL tree or red-black tree, you still have guaranteed worst-case performance.
My opinion is that (2) is better because SipHash is so slow. Also, in operating system kernel space there may not be enough entropy to create a secret random seed early in boot-up stage, so in kernel space you may not have the ability to create random numbers early in bootup.
Hash tables are widely misused. It is easy to bring down many systems to a practical halt merely by sending lots of values that hash to the same bucket.