There are two basic types of hashed file that you might
use in these circumstances: static and dynamic.
- Static Files. These are the most performant
if well designed. If poorly designed, however, they are likely to
offer the worst performance. Static files allow you to decide the
way in which the file is hashed. You specify:
- Hashing algorithm. The way data rows are
allocated to different groups depending on the value of their key
field or fields.
- Modulus. The number of groups the file has.
- Separation. The size of each group, expressed as a
number of 512-byte blocks.
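To make these three settings concrete, here is a minimal sketch of how a key is allocated to a group. The modulo hash is a stand-in for the real numbered hashing algorithms (listed later in this topic), and the function names are illustrative:

```python
BLOCK_SIZE = 512  # bytes per block, as used by the separation setting

def group_for_key(key: str, modulus: int) -> int:
    """Return the group number (0-based) that a key hashes to."""
    # Stand-in hash: sum of character codes. The real static file types
    # weight different parts of the key, depending on the type chosen.
    h = sum(ord(c) for c in key)
    return h % modulus

def group_capacity_bytes(separation: int) -> int:
    """Each group holds `separation` 512-byte blocks."""
    return separation * BLOCK_SIZE

# With modulus 7, every key maps to one of groups 0..6.
print(group_for_key("CUST00042", 7))   # 5
print(group_capacity_bytes(4))         # 2048
```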
Generally speaking, you
should use a static file if you have good knowledge of the size and
shape of the data you will be storing in the hashed file. You can
restructure a static hashed file between job runs if you want to tune
it. Do this using the RESIZE command, which can be issued using the
Command feature of the Administrator client. The command for resizing
a static file is:
RESIZE filename [type] [modulus] [separation]
Where:
- filename is the name of the file you are resizing.
- type specifies the hashing algorithm to use (see Hash File Design).
- modulus specifies the number of groups, in the range 1 through 8,388,608.
- separation specifies the size of the groups in 512-byte blocks, in the range 1 through 8,388,608.
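For example, the following command (with a hypothetical file name) rebuilds a static file to use hashing algorithm type 2, 211 groups, and 4-block (2048-byte) groups:

```
RESIZE CustomerLookup 2 211 4
```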
- Dynamic Files. These are hashed files that
change dynamically as data is written to them over time. This might
sound ideal, but if you leave a dynamic file to grow organically it
must perform repeated group split operations as data is written
to it, which can be very time consuming and can impair performance
where the file grows quickly. Dynamic files do not perform as
well as a well-designed static file, but they do perform better than a
badly designed one. When creating a dynamic file you can specify the
following information (although all of these have default values):
- Minimum modulus. The minimum number of groups
the file has. The default is 1.
- Group size. The group can be specified as
1 (2048 bytes) or 2 (4096 bytes). The default is 1.
- Split load. This specifies how much (as a
percentage) a file can be loaded before it is split. The file load
is calculated as follows:
File Load = ((total data bytes) / (total file bytes)) * 100
The split load defaults to 80.
- Merge load. This specifies how small (as
a percentage) a file load can become before groups are merged. File load
is calculated as for Split load. The default is 50.
- Large record. Specifies the number of bytes
above which a record (row) is treated as large. A large record is always
placed in an overflow group.
- Hash algorithm. Choose between GENERAL for
most key field types and SEQ.NUM for keys that are a sequential number
series.
- Record size. Optionally use this to specify
an average record size in bytes. This can then be used to calculate
group size and large record size.
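The split and merge thresholds above can be sketched as follows. The function names are illustrative; the calculation is the File Load formula given for Split load:

```python
def file_load_percent(total_data_bytes: int, total_file_bytes: int) -> float:
    """File Load = ((total data bytes) / (total file bytes)) * 100"""
    return total_data_bytes / total_file_bytes * 100

def should_split(total_data_bytes: int, total_file_bytes: int,
                 split_load: float = 80) -> bool:
    """A group split is due once the load exceeds the split load (default 80)."""
    return file_load_percent(total_data_bytes, total_file_bytes) > split_load

def should_merge(total_data_bytes: int, total_file_bytes: int,
                 merge_load: float = 50) -> bool:
    """Groups are merged back once the load drops below the merge load (default 50)."""
    return file_load_percent(total_data_bytes, total_file_bytes) < merge_load

print(file_load_percent(1_700_000, 2_000_000))  # 85.0
print(should_split(1_700_000, 2_000_000))       # True: 85% exceeds the default 80
print(should_merge(900_000, 2_000_000))         # True: 45% is below the default 50
```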
You can manually
resize a dynamic file using the RESIZE command issued using the Command
feature of the Administrator client. The command for
resizing a dynamic file is:
RESIZE filename [parameter [value]]
where:
- filename is the name of the file you are resizing.
- parameter is one of the following, corresponding to the arguments described above for creating a dynamic file:
GENERAL | SEQ.NUM
MINIMUM.MODULUS n
SPLIT.LOAD n
MERGE.LOAD n
LARGE.RECORD n
RECORD.SIZE n
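For example, the following commands (with a hypothetical file name) raise the minimum modulus and the split load of a dynamic file:

```
RESIZE OrderLookup MINIMUM.MODULUS 101
RESIZE OrderLookup SPLIT.LOAD 90
```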
By default, InfoSphere® DataStage® creates
a dynamic file with the default settings described above.
You can, however, use the Create File options on the
Hashed File stage Inputs page to specify the
type of file and its settings.
These options offer a choice of several types of static (hash)
file, and a dynamic file type. The different types of static file
reflect the different hashing algorithms they use. Choose a type according
to the type of your key, as shown below:
Type  Suitable for keys that are formed like this
2     Numeric, significant in last 8 chars
3     Mostly numeric with delimiters, significant in last 8 chars
4     Alphabetic, significant in last 5 chars
5     Any ASCII, significant in last 4 chars
6     Numeric, significant in first 8 chars
7     Mostly numeric with delimiters, significant in first 8 chars
8     Alphabetic, significant in first 5 chars
9     Any ASCII, significant in first 4 chars
10    Numeric, significant in last 20 chars
11    Mostly numeric with delimiters, significant in last 20 chars
12    Alphabetic, significant in last 16 chars
13    Any ASCII, significant in last 16 chars
14    Numeric, whole key is significant
15    Mostly numeric with delimiters, whole key is significant
16    Alphabetic, whole key is significant
17    Any ASCII, whole key is significant
18    Any chars, whole key is significant
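The table can be read as a lookup from key characteristics to a file type. The sketch below encodes it that way; the tuple convention and function name are illustrative, not part of the product:

```python
# Lookup mirroring the static file type table: (character class,
# significant part of key, number of significant chars) -> type number.
# A chars value of None means the whole key is significant.
STATIC_FILE_TYPES = {
    ("numeric", "last", 8): 2,
    ("mostly-numeric", "last", 8): 3,
    ("alphabetic", "last", 5): 4,
    ("ascii", "last", 4): 5,
    ("numeric", "first", 8): 6,
    ("mostly-numeric", "first", 8): 7,
    ("alphabetic", "first", 5): 8,
    ("ascii", "first", 4): 9,
    ("numeric", "last", 20): 10,
    ("mostly-numeric", "last", 20): 11,
    ("alphabetic", "last", 16): 12,
    ("ascii", "last", 16): 13,
    ("numeric", "whole", None): 14,
    ("mostly-numeric", "whole", None): 15,
    ("alphabetic", "whole", None): 16,
    ("ascii", "whole", None): 17,
    ("any", "whole", None): 18,
}

def static_file_type(char_class: str, significant_part: str, chars=None) -> int:
    """Return the static file type number for the given key characteristics."""
    return STATIC_FILE_TYPES[(char_class, significant_part, chars)]

print(static_file_type("numeric", "last", 8))  # 2
print(static_file_type("any", "whole"))        # 18
```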