Hashing functions

You can use hashing functions to encode data, transforming an input of any size into a fixed-size hash code, also called a hash value. A hash algorithm is designed to minimize the chance that two different inputs produce the same hash value, an event termed a collision.

You can use hashing functions to speed up the retrieval of data records (simple one-way lookups), to validate data (by using checksums), and to perform cryptography. For lookups, the hash code is used as an index into a hash table that contains a pointer to the data record. For checksums, the hash code is computed for the data before storage or transmission and then recomputed afterward to verify data integrity; if the two hash codes do not match, the data has been corrupted. Cryptographic hashing functions, which are designed so that it is computationally infeasible to recover an input from its hash value or to find two inputs with the same hash value, are used for data security.
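The lookup and checksum uses described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names `NUM_BUCKETS`, `bucket_index`, `put`, `get`, `checksum`, and `is_intact` are hypothetical, and the choice of CRC-32 for bucketing and SHA-256 for checksums is an assumption made for the example.

```python
import hashlib
import zlib

NUM_BUCKETS = 8  # illustrative table size (assumption)

# The hash table: each bucket holds (key, record) pairs.
table = [[] for _ in range(NUM_BUCKETS)]

def bucket_index(key: str) -> int:
    """Map a key to a bucket by hashing it and reducing modulo the table size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

def put(key: str, record) -> None:
    """Store a record in the bucket that its key hashes to."""
    table[bucket_index(key)].append((key, record))

def get(key: str):
    """Look up a record by scanning only the bucket that the key hashes to."""
    for stored_key, record in table[bucket_index(key)]:
        if stored_key == key:
            return record
    return None

def checksum(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, expected: str) -> bool:
    """Recompute the checksum and compare it with the one computed earlier."""
    return checksum(data) == expected
```

In this sketch, a lookup examines one bucket rather than every record, and `is_intact` returns False as soon as a single byte of the data differs from what was originally hashed.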

Some common use cases for hashing functions include the following:
  • Detecting duplicate records. Because duplicate keys hash to the same “bucket” in the hash table, the task reduces to scanning buckets that contain more than one record. This method is much faster than sorting the file and comparing every record. You can also use hashing techniques to find similar records: when the hash function is chosen so that similar keys hash to contiguous buckets, the search for similar records can be limited to those buckets.
  • Locating points that are near each other. Applying a hashing function to spatial data effectively partitions the space that is being modeled into a grid. As in the previous example, the retrieval and comparison time is greatly reduced because only contiguous cells in the grid must be searched. This same technique works for other types of spatial data, such as shapes and images.
  • Verifying message integrity. The hash value of the message (the message digest) is computed both before and after transmission, and the two values are compared to determine whether the message was corrupted.
  • Verifying passwords. During authentication, the login credentials of a user are hashed, and this value is compared with the hashed password that is stored for that user.
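
The following Python sketch illustrates three of the use cases above: duplicate detection, locating nearby points, and password verification. All function names and constants are hypothetical and chosen only for illustration. The sketch also makes several assumptions that the text does not state: SHA-256 collisions are treated as negligible when grouping duplicates, the grid cell size is at least as large as the search radius, and passwords are hashed with a random salt by using PBKDF2, which is one common approach among several.

```python
import hashlib
import hmac
import math
import os
from collections import defaultdict

def find_duplicates(records):
    """Group record positions by hash; buckets with more than one entry are duplicates."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        buckets[digest].append(position)
    # Assumption: SHA-256 collisions between different records are negligible.
    return [positions for positions in buckets.values() if len(positions) > 1]

def cell_of(point, cell_size):
    """Hash a 2-D point to the grid cell that contains it."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def build_grid(points, cell_size):
    """Partition the points into grid cells keyed by cell coordinates."""
    grid = defaultdict(list)
    for point in points:
        grid[cell_of(point, cell_size)].append(point)
    return grid

def neighbors_within(grid, point, radius, cell_size):
    """Find points within radius by searching only the 3x3 block of adjacent cells.

    Assumption: cell_size >= radius, so no neighbor can lie outside that block.
    """
    cx, cy = cell_of(point, cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for other in grid.get((cx + dx, cy + dy), []):
                if other != point and math.dist(point, other) <= radius:
                    found.append(other)
    return found

ITERATIONS = 100_000  # illustrative work factor (assumption)

def hash_password(password, salt=None):
    """Return (salt, hash) for storage; the plain-text password is never stored."""
    if salt is None:
        salt = os.urandom(16)
    hashed = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, hashed

def verify_password(password, salt, stored_hash):
    """Hash the supplied password the same way and compare it with the stored hash."""
    _, candidate = hash_password(password, salt)
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(candidate, stored_hash)
```

In each case the expensive comparison is confined to a small candidate set: one bucket for duplicates, nine cells for nearby points, and a single stored hash for a password.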