-3

I'm searching an existing hash function or trying to make a hash function that has a lot of collisions.

Regularly a hash is used for it's ability to create unique hashes for hash tables or security purposes, but I desire the opposite. I desire a hash function that has allot of collisions. Preferably a simplistic hash function so finding a collision is easier/faster.

By the definition of a hash function. The hashes also have a predetermined length. I'd like a hash function that has this as a variable.

I'm new to this subject so I'm searching sources to create a hash function myself or candidate's that might fulfill the requirements.

If I have definition or terms wrong, please correct me.

  • 1
    Does this answer your question? [Which hashing algorithm is best for uniqueness and speed?](https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed) – gnat Jul 12 '20 at 12:53
  • No, that question asks for a hash function that produces a lot of unique hashes, I desire the opposite as my question says. A hash function that contains a lot of collisions... – NS - Not the train Jul 12 '20 at 12:55
  • What kind of data are you hashing? If it is strings you could take the first character as your hash code. – Martin Maat Jul 12 '20 at 13:14
  • Im hashing a string, but it's necessary to hash the whole string not just the first character – NS - Not the train Jul 12 '20 at 13:51
  • XOR every byte together into a single byte result and you're done. Make it one bit instead of a byte for extra collisions! – whatsisname Jul 12 '20 at 14:05
  • 5
    Why do you want collisions? The trivial hash function `h(x) = 0` would have a colliding hash for everything. More reasonably, you can take a good hash function and chop off unneeded bits until you get the desired collision rate. – amon Jul 12 '20 at 14:27

2 Answers2

1

The probability of collisions is ratio between how many items you have and what size the output hash is.

If you have 1024 items (10 bits) and hash outputs 8 bits, you would expect 4 items (2^(10-8)) to have same hash and thus 4 collisions.

And there are plenty of hash functions with low amount of output bits. Like a trivial Pearson Hash. And it being trivial and easy to understand, it should be easy to modify it to have variable bit output size.

Euphoric
  • 36,735
  • 6
  • 78
  • 110
  • To clarify, it will be (on average) 4 items for *each* of the 256 possible hash values, not just 4 collisions total. – 8bittree Jul 13 '20 at 16:33
1

Here is an example of a hash function that has the properties you desire:

H(message, outputLength) = 1 << outputLength

  • It is clearly a hash function, since it maps a larger input space to a smaller output space.
  • It has the maximum possible performance for every possible sequence of inputs.
  • It has the maximum possible amount of collisions for every possible sequence of inputs.
Jörg W Mittag
  • 101,921
  • 24
  • 218
  • 318