Node Caches
To optimize stream execution, you can set up a cache on any nonterminal node. When you set up a cache on a node, the cache is filled with the data that passes through the node the next time you run the stream. From then on, the data is read from the cache (which is stored on disk in a temporary directory) rather than from the data source.
Caching is most useful following a time-consuming operation such as a sort, merge, or aggregation. For example, suppose that you have a source node set to read sales data from a database and an Aggregate node that summarizes sales by location. In this case, set up the cache on the Aggregate node rather than on the source node, so that the cache stores the aggregated data rather than the entire data set.
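The behavior described above can be sketched in a few lines of Python. This is only a conceptual illustration, not the product's implementation: the class CachedStep, the functions read_sales and aggregate_sales_by_location, and the sample rows are all hypothetical names and data invented for the example. It shows a cache that is filled on the first run, stored on disk in a temporary directory, and read back on later runs so that the source is not queried again.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class CachedStep:
    """Hypothetical stand-in for a node with caching enabled."""

    def __init__(self, name, compute):
        self.name = name
        self.compute = compute      # the upstream work this step depends on
        self.compute_calls = 0      # counts how often the upstream work ran
        # The cache lives on disk in a temporary directory.
        cache_dir = tempfile.mkdtemp(prefix="node_cache_")
        self.cache_path = os.path.join(cache_dir, name + ".pkl")

    def run(self):
        # Later runs: read from the cache instead of the data source.
        if os.path.exists(self.cache_path):
            with open(self.cache_path, "rb") as fh:
                return pickle.load(fh)
        # First run: do the work, then fill the cache.
        self.compute_calls += 1
        result = self.compute()
        with open(self.cache_path, "wb") as fh:
            pickle.dump(result, fh)
        return result


def read_sales():
    # Stand-in for a source node reading sales rows from a database.
    return [("North", 120.0), ("South", 80.0), ("North", 30.0), ("South", 20.0)]


def aggregate_sales_by_location():
    # Stand-in for an Aggregate node that summarizes sales by location.
    totals = defaultdict(float)
    for location, amount in read_sales():
        totals[location] += amount
    return dict(totals)


# Cache the aggregated result, not the raw rows from the source.
aggregate_node = CachedStep("aggregate", aggregate_sales_by_location)
first_run = aggregate_node.run()    # fills the cache
second_run = aggregate_node.run()   # served from the cache
```

Placing the cache on the aggregation step means the stored file holds one row per location rather than every sales record, which mirrors the reason given above for caching at the Aggregate node.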
Nodes with caching enabled are displayed with a small document icon in the upper-right corner. Once the cache has been filled with data, the document icon turns green.
To Enable a Cache
- On the stream canvas, right-click the node and click Cache on the menu.
- On the caching submenu, click Enable.
To turn the cache off, right-click the node and click Disable on the caching submenu.
Caching Nodes in a Database
For streams run in a database, data can be cached midstream to a temporary table in the database rather than to the file system. When combined with SQL optimization, this may result in significant gains in performance. For example, the output from a stream that merges multiple tables to create a data mining view may be cached and reused as needed. Performance can be improved further when SQL is generated automatically for all downstream nodes, because the downstream operations then run in the database against the cached table.
To take advantage of database caching, both SQL optimization and database caching must be enabled. Note that Server optimization settings override those on the Client. See the topic Setting optimization options for streams for more information.
With database caching enabled, right-click any nonterminal node to cache data at that point. The cache is then created automatically in the database the next time the stream is run. If either database caching or SQL optimization is not enabled, the cache is written to the file system instead.
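To make the idea of a midstream database cache concrete, the following Python sketch uses the standard sqlite3 module with an in-memory database. It is an analogy under stated assumptions, not the SQL that the product generates: the tables customers and orders, the temporary table cached_merge, and the sample rows are hypothetical. The merged view is materialized once as a temporary table, and the downstream steps are then expressed as SQL against that table rather than against the original tables.

```python
import sqlite3

# In-memory database standing in for the database the stream runs in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South');
    INSERT INTO orders VALUES (1, 120.0), (1, 30.0), (2, 80.0);
""")

# Midstream cache: materialize the merged view once as a temporary table
# (hypothetical name) instead of writing it to the file system.
conn.execute("""
    CREATE TEMP TABLE cached_merge AS
    SELECT c.id AS customer_id, c.region, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""")

# Downstream steps are expressed as SQL against the cached table,
# so the join is not repeated for each of them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM cached_merge "
    "GROUP BY region ORDER BY region"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM cached_merge").fetchone()[0]
```

Both downstream queries reuse the temporary table, which is the source of the performance gain: the merge runs once, and everything after it stays inside the database.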