If you are familiar with the principles of hashed files, you can skip this section.
Hashed files work by spreading data over a number of groups within a file. This speeds up data access: you can go straight to a specific group and then search sequentially through the data it contains for a particular row. The number of groups, the size of each group, and the algorithm used to work out the distribution are all decided by the nature of the data you are storing in the file.
The rows of data are hashed (that is, allocated to groups) on a key field. The hashing algorithm efficiently and repeatably converts a string to a number in the range 1 to n, where n is the file modulus. This gives the group where the row will be written. The key field can contain any type of data; for example, a name, a serial number, or a date. The type of data in the key determines the best hashing algorithm to use when writing data; the same algorithm is then used to locate the data when reading it back. The aim is to use an algorithm that spreads data evenly over the file.
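For illustration, here is a minimal Python sketch of such a key-to-group mapping. The simple additive algorithm, the example key, and the modulus of 97 are assumptions made for this sketch; a real hashed-file system chooses an algorithm tuned to the key type.

```python
def hash_key(key: str, modulus: int) -> int:
    """Map a string key to a group number in the range 1..modulus.

    A toy additive hash: sum the character codes of the key and
    reduce modulo the file modulus. The same key always hashes to
    the same group, so a read can repeat the calculation made on write.
    """
    total = sum(ord(ch) for ch in key)
    return (total % modulus) + 1

# Writing and reading use the same calculation, for example:
group = hash_key("SMITH-00427", modulus=97)   # hypothetical key and modulus
```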
Spreading the data evenly matters for performance: a group that is overpopulated has to extend into overflow groups, and accessing overflowed data is inefficient. It is therefore important to consider the size of your records (rows) when designing the file, so that they fit evenly into groups without overflowing.
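To make the overflow concern concrete, the sketch below (reusing hash_key from above) counts how many rows hash into each group and flags any group holding more rows than it can store. The group_capacity parameter and the idea of checking the spread up front are assumptions for illustration, not a feature of any particular hashed-file system.

```python
from collections import Counter

def overflowing_groups(keys, modulus, group_capacity):
    """Return {group: row_count} for every group that would overflow.

    Counts how many keys hash into each group and reports those
    holding more rows than fit in a single group.
    """
    counts = Counter(hash_key(key, modulus) for key in keys)
    return {group: n for group, n in counts.items() if n > group_capacity}

# e.g. overflowing_groups(all_keys, modulus=97, group_capacity=4)
# An empty result suggests the data spreads evenly at this modulus.
```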
There is a trade-off between the size of the groups and their number. For most operations a good design has many small groups (for example, four records per group), so the sequential search for the required data row is never very long. There might be circumstances, however, where a design is better served by a smaller number of large groups.
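As a rough sketch of that trade-off, the function below estimates a modulus from a row count and a target group occupancy. Rounding up to a prime reflects the common convention that a prime modulus helps a hash spread keys evenly; treat both that convention and the default of four rows per group as assumptions of this sketch.

```python
import math

def suggest_modulus(num_rows: int, rows_per_group: int = 4) -> int:
    """Estimate a modulus so each group averages rows_per_group rows."""
    m = math.ceil(num_rows / rows_per_group)
    # Step up to the next prime, a conventional choice for hashed files.
    while any(m % d == 0 for d in range(2, math.isqrt(m) + 1)):
        m += 1
    return m

# 10,000 rows at about 4 rows per group gives a modulus of 2503
# (the first prime at or above 2500).
print(suggest_modulus(10_000))   # -> 2503
```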