# Test data generator

1. **Create** a custom _seed file_
   * Input: Use the pre-created seed file `testfile_50_100k`, which Storage Protect compresses at 50% and which targets a 100KB average extent size.
   * Run `create_seed_file.pl` to generate other _seed files_ with different compression levels and target deduplication extent sizes.
   * For example, to generate a _seed file_ named `testfile_44_100k` that is slightly larger than 1MB, compresses by approximately 44%, and results in an average deduplication extent size of around 100KB in IBM Storage Protect, use the following command:
      ```
      perl create_seed_file.pl testfile_44_100k 1179648 44 102400 
      ```
     > Note: IBM Storage Protect deduplication uses variable-length extents, and performance is typically better for data that results in larger average extent sizes. Suggested values for the extent size parameter: <br>
     >    Typical average: 102400 (100KB)     <br>
     >    Low extreme:     51200  (50KB)      <br>
     >    High extreme:    307200 (300KB)     <br>
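
   As a rough sanity check on the example parameters, the seed file size and the target extent size can be related with simple arithmetic. This is only a sketch: the variable names are illustrative, and the extent count is an estimate, because the actual extent boundaries are chosen by IBM Storage Protect.

   ```shell
   # Values taken from the example command above (names are illustrative).
   SEED_SIZE=1179648     # seed file size in bytes
   EXTENT_SIZE=102400    # target average extent size in bytes (100KB)

   # 1179648 bytes is 1152KB, which is slightly larger than 1MB.
   size_kib=$((SEED_SIZE / 1024))

   # Estimated number of average-sized extents in one seed file (integer division).
   extents=$((SEED_SIZE / EXTENT_SIZE))

   echo "Seed file size: ${size_kib}KB"
   echo "Estimated extents per seed file: ${extents}"
   ```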

1. **Generate** _test data files_ in the `/benchdata` folder 
   * Input: Use the seed file that was generated by the `create_seed_file.pl` script.
   * Generate the _test data file_ in the `/benchdata` folder using the `create_data_files.pl` script.  
     * By default, 128 files are created, allowing tests with up to 128 sessions.
     * Use the `filecount` parameter to generate more or fewer test files.
   * For example,
     * generate benchdata with 128 files:
	    ```
        perl create_data_files.pl testfile_44_100k 
	    ```
     * generate benchdata with 200 files:
	    ```
        perl create_data_files.pl testfile_44_100k 200
	    ```

1. **Uniquify** the _test data files_ 
   * Input: Use the _test data files_ generated by `create_data_files.pl` in the `/benchdata` folder.
   * Run the `uniquify_data_files.pl` script to make the _test data files_ unique.
   * The `uniquify_data_files.pl` script ensures that the `SP Load Generator` tool does not get a 100% deduplication result in every iteration.
     * By default, 128 test files are updated.
     * Use the `objcount` parameter to change the number of test files.
   * For example,
     * uniquify benchdata with 128 files:
	   ```
       perl uniquify_data_files.pl
	   ```
     * uniquify benchdata with 200 files:
	   ```
       perl uniquify_data_files.pl 200 
	   ```
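
The three steps above can be chained in a small wrapper script. The following is a minimal sketch, not part of the tool itself: it assumes that the three Perl scripts are in the current directory and that the same file count should be passed to both `create_data_files.pl` and `uniquify_data_files.pl`. The names `SEED_NAME`, `FILE_COUNT`, `DRY_RUN`, and `run` are hypothetical helpers introduced only for this illustration. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical wrapper around the three documented scripts.
set -e

SEED_NAME="testfile_44_100k"   # seed file name from the example above
FILE_COUNT=200                 # assumption: same count for generate and uniquify
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: create a custom seed file (size, compression %, extent size).
run perl create_seed_file.pl "$SEED_NAME" 1179648 44 102400

# Step 2: generate the test data files from the seed file.
run perl create_data_files.pl "$SEED_NAME" "$FILE_COUNT"

# Step 3: uniquify the generated test data files.
run perl uniquify_data_files.pl "$FILE_COUNT"
```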
---