What is instantiation?
Instantiation is the process of reading or specifying information about a data field, such as its storage type and values. To conserve system resources, instantiation is a user-directed process: you tell the software to read values by specifying options on the Types tab in a source node or by running data through a Type node.
- Data whose types are unknown are referred to as uninstantiated. Fields whose storage type and values are unknown are displayed in the Measurement column of the Types tab as <Default>.
- When you have some information about a field's storage, such as string or numeric, the data are called partially instantiated. Categorical and Continuous are partially instantiated measurement levels. For example, Categorical specifies that the field is symbolic, but you don't know whether it is nominal, ordinal, or flag.
- When all of the details about a type are known, including the values, a fully instantiated measurement level—nominal, ordinal, flag, or continuous—is displayed in this column. Note that the continuous type is used for both partially instantiated and fully instantiated data fields. Continuous data can be either integers or real numbers.
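The three states described above can be summarized in a small Python sketch. This is only an illustrative model, not the product's scripting API: the `FieldType` record, its `measurement` and `values` attributes, and the `instantiation_state` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Measurement levels that can appear once a field is fully instantiated.
FULL_LEVELS = {"nominal", "ordinal", "flag", "continuous"}


@dataclass
class FieldType:
    """Hypothetical record of what is known about one field (illustration only)."""
    measurement: str = "default"        # "default", "categorical", or one of FULL_LEVELS
    values: Optional[Sequence] = None   # known values or range; None if not yet read


def instantiation_state(field: FieldType) -> str:
    """Classify a field as uninstantiated, partially, or fully instantiated."""
    if field.measurement == "default":
        # Neither storage type nor values are known.
        return "uninstantiated"
    if field.measurement == "categorical":
        # Known to be symbolic, but not yet nominal, ordinal, or flag.
        return "partially instantiated"
    if field.measurement in FULL_LEVELS and field.values is not None:
        # All details, including the values, are known.
        return "fully instantiated"
    # For example, continuous with no values read yet.
    return "partially instantiated"
```

In this sketch, `FieldType("continuous")` is partially instantiated, while `FieldType("continuous", (0, 10))` is fully instantiated, mirroring the note that the continuous type is used for both cases.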
During the execution of a data stream with a Type node, uninstantiated types immediately become partially instantiated, based on the initial data values. Once all of the data have passed through the node, all data become fully instantiated unless values were set to <Pass>. If execution is interrupted, the data remain partially instantiated.
Once the fields on the Types tab have been instantiated, the values of a field are static at that point in the stream. This means that upstream changes do not affect the values of that field, even if you rerun the stream. To change or update the values based on new data or added manipulations, edit them on the Types tab itself or set the value for the field to <Read> or <Read +>.
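The execution behavior described above can be illustrated with a toy Python model of a single field passing through a Type node. This is a sketch under stated assumptions, not the product's implementation: the `ToyTypeNode` class, its `mode` names, and the `interrupt_after` parameter are all hypothetical, and treating "read+" as adding new values to the stored ones is an assumption of this sketch.

```python
from typing import Iterable, Optional, Set


class ToyTypeNode:
    """Toy model of one field in a Type node (illustration only)."""

    def __init__(self) -> None:
        self.state = "uninstantiated"
        self.values: Set[str] = set()
        # Hypothetical modes: "read" re-reads values, "read+" adds to the
        # stored values (an assumption), "static" keeps the stored values.
        self.mode = "read"

    def run(self, records: Iterable[str],
            interrupt_after: Optional[int] = None) -> None:
        """Pass records through the node, optionally interrupting early."""
        if self.state == "fully instantiated" and self.mode == "static":
            return  # values are static; upstream changes are ignored
        seen: Set[str] = set(self.values) if self.mode == "read+" else set()
        for i, value in enumerate(records):
            if interrupt_after is not None and i >= interrupt_after:
                # Interrupted run: keep what was seen, remain partial.
                self.values = seen
                self.state = "partially instantiated"
                return
            seen.add(value)
            self.state = "partially instantiated"
        # All data have passed through the node.
        self.values = seen
        self.state = "fully instantiated"
        self.mode = "static"
```

In this model, a completed run leaves the field fully instantiated and switches it to the static mode, so a later run with different upstream data changes nothing until the mode is set back to "read" or "read+".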
When to instantiate
Generally, if your dataset is not very large and you do not plan to add fields later in the stream, instantiating at the source node is the most convenient method. However, instantiating in a separate Type node is useful when:
- The dataset is large, and the stream filters a subset prior to the Type node.
- Data have been filtered in the stream.
- Data have been merged or appended in the stream.
- New data fields are derived during processing.