Resource estimation

You can estimate and predict the resource utilization of parallel job runs by creating models and making projections in the Resource Estimation window.

A model estimates the system resources for a job, including the amount of scratch space, disk space, and CPU time that is needed for each stage to run on each partition. A model also estimates the data set throughput in a job. You can generate these types of models:

  • Static models estimate disk space and scratch space only. These models are based on a data sample that is automatically generated from the record schema. Use static models at compilation time.
  • Dynamic models predict disk space, scratch space, and CPU time. These models are based on a sampling of actual input data. Use dynamic models at run time.
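To see why a static model can be produced from the record schema alone, consider the following minimal sketch. It is not the algorithm that the product uses. The schema, the column widths, and the function name estimate_disk_bytes are hypothetical, and the sketch assumes fixed-width columns with no compression or storage overhead.

```python
# Hypothetical schema: column name -> assumed fixed width in bytes.
# These names and widths are illustrative only.
SCHEMA = {"customer_id": 4, "name": 40, "balance": 8}


def estimate_disk_bytes(schema, record_count):
    """Return a rough size estimate: bytes per record times record count."""
    bytes_per_record = sum(schema.values())
    return bytes_per_record * record_count


estimated = estimate_disk_bytes(SCHEMA, 1_000_000)
print(estimated)  # 52000000
```

A dynamic model would instead measure record sizes and CPU time from a sample of the actual input data, which is why it can also predict CPU time.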

An input projection estimates the size of all of the data sources in a job. You can project the size in megabytes or in number of records. A default projection is created when you generate a model.
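The two units of a projection are related by the average record size. The following sketch shows one way to convert between them. The function names and the use of binary megabytes (1024 * 1024 bytes) are assumptions made for illustration and are not taken from the product.

```python
# Assumption: one megabyte is 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def records_to_megabytes(record_count, avg_record_bytes):
    """Convert a projection in records to an approximate size in megabytes."""
    return record_count * avg_record_bytes / BYTES_PER_MB


def megabytes_to_records(megabytes, avg_record_bytes):
    """Convert a projection in megabytes to a whole number of records."""
    return int(megabytes * BYTES_PER_MB // avg_record_bytes)


print(megabytes_to_records(1, 128))     # 8192
print(records_to_megabytes(8192, 128))  # 1.0
```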

The resource utilization results from a completed job run are treated as an actual model. A job can have only one actual model. In the Resource Estimation window, the actual model is the first model in the Models list. Similarly, the total size of the data sources in a completed job run is treated as an actual projection. You must select the actual projection in the Input Projections list to view the resource utilization statistics in the actual model. You can compare the actual model to your generated models to calibrate your modeling techniques.
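One simple way to compare a generated model with the actual model is to compute the relative error of each estimated metric. The following sketch assumes that you have copied the statistics out of the Resource Estimation window by hand. The metric names, the values, and the function relative_errors are hypothetical.

```python
def relative_errors(estimated, actual):
    """Return (estimated - actual) / actual for each metric in both models.

    Metrics whose actual value is zero are skipped to avoid division by zero.
    """
    errors = {}
    for metric, actual_value in actual.items():
        if metric in estimated and actual_value != 0:
            errors[metric] = (estimated[metric] - actual_value) / actual_value
    return errors


# Illustrative values only; these are not measurements from a real job.
estimated = {"disk_mb": 120.0, "scratch_mb": 30.0, "cpu_seconds": 45.0}
actual = {"disk_mb": 100.0, "scratch_mb": 40.0, "cpu_seconds": 50.0}

for metric, error in relative_errors(estimated, actual).items():
    print(f"{metric}: {error:+.0%}")
```

A positive value means that the generated model overestimated the resource. A negative value means that it underestimated the resource.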