Non-Cryptographic Hash Functions for Routing

Non-Cryptographic Hash Functions

Non-cryptographic hash functions take a string as input and compute an integer output. The desirable property of a hash function is that the outputs are evenly distributed across the domain of possible outputs, especially for inputs that are similar. Unlike a cryptographic hash function, these functions are not designed to withstand an effort by an attacker to find a collision. Cryptographic hash functions have this property, but are much slower: SHA-1 is on the order of 0.09 bytes/cycle whereas the newest non-cryptographic hash functions are on the order of 3 bytes/cycle. So non-cryptographic hashes are roughly 33x faster, at the cost of not being able to withstand attacks. Non-cryptographic hashes are most often used for hash tables.

Since there are lots of options out there for non-cryptographic hash functions and this number keeps expanding, I thought I’d summarize my knowledge of what is out there.

Bob Jenkins’ Functions

Bob Jenkins has been working on hash functions for 15 years or so. In 1997 he published an article about hash functions in Dr. Dobbs Journal; the article is available now on the web with more content added since its original publication: A hash function for hash Table lookup. In this article Bob has an extensive catalog of existing hash functions, as well as presenting his own called “lookup2.” Bob subsequently published lookup3 in 2006, which for the purposes of this article I will consider the first “modern” hash function, in the sense that it is both fast (0.5 bytes/cycle, according to Bob) and free of any serious flaws.

More information about Bob’s functions can be found on Wikipedia: Jenkins hash function.

Second generation: MurmurHash

In 2008 Austin Appleby published a new hash function called MurmurHash. In its most recent version it is roughly 2x the speed of lookup3 (so roughly 1 byte/cycle), and it comes in both 32 and 64-bit versions. The 32-bit version uses only 32-bit math and gives you a 32-bit hash, the 64-bit version uses 64-bit math and gives a 64-bit hash. According to Austin’s analysis it has excellent properties, though Bob Jenkins says in his expanded Dr. Dobbs article “I can see [MurmurHash is] weaker than my lookup3, but I don’t by how much, I haven’t tested it.” MurmurHash quickly became popular thanks to its excellent speed and statistical properties.

Third generation: CityHash and SpookyHash

In 2011 two hash functions were released that both improve on MurmurHash due largely to greater instruction-level parallelism. Google released CityHash (written by Geoff Pike and Jyrki Alakuijala) and Bob Jenkins released a new hash of his own, SpookyHash (so named because it was released on Halloween). Both functions are on the order of 2x the speed of MurmurHash, but both functions use 64-bit math and have no 32-bit version, and CityHash depends on the CRC32 instruction that is present in SSE 4.2 (Intel Nehalem and later) for its speed. SpookyHash gives you 128-bit output, whereas CityHash has 64-bit, 128-bit, and 256-bit variants.

Which function is best/fastest?

To conclude, MurmurHash still looks like the best option if you need 32-bit or aligned-only reads. CityHash and SpookyHash look to be faster on x86-64.

Applications of non-cryptographic hash functions

Data Retrieval: These functions are often used in hash tables for efficient data retrieval. It’s like having a magic map that leads you straight to the hiding spot!
Error Detection: They can be used to detect errors in data storage and transmission. Imagine if a hide-and-seek player could instantly tell if someone’s moved from their hiding spot. That’s what these hash functions can do.
Database Indexing: In databases, non-cryptographic hash functions create indexes to speed up data access. It’s as if you had a super fast player who could locate all the hiding spots in the blink of an eye.
File or Data Comparison: They can also be used to compare files or data quickly. It’s like having a player who can instantly tell if two hiding spots are the same.

Benefits of using non-cryptographic hash functions

Speed: One of the top benefits of non-cryptographic hash functions is their speed. They process data faster than a cheetah chasing its dinner! This makes them perfect for applications that need quick results, like searching for information in a database.
Consistency: They also provide consistency. If you use the same hash function on the same data, you’ll always get the same result. It’s like your trusty old compass, always pointing you in the right direction.
Reduced Storage: Non-cryptographic hash functions can reduce storage needs. They turn data into a fixed size, no matter how large the original data was. It’s like having a magical suitcase that can fit everything you own, yet it remains the same size.
Easy to Implement: Lastly, they’re generally easy to implement. So, even if you’re not a wizard at coding, you can still explore non-cryptographic hash functions and reap their benefits.

Prasenjit Manna

Contact

Search

Tags