These configuration tips for a first-time setup apply to Informix Server Version 11.70.FC7 and later, and to Version 12.10.FC1.
Before determining values for individual configuration parameters, it is helpful to take a look at the architecture of the Informix Warehouse Accelerator (IWA) and the system environment in which it will run.
By design, IWA is capable of taking full advantage not only of the resources of a symmetric multiprocessing (SMP) machine, but also of a cluster system whose individual nodes share neither CPU nor memory resources. The differences between these environments contribute to the unique architecture of IWA.
At the highest level, IWA consists of two kinds of processes, called the coordinator node and the worker node. As the names suggest, a coordinator node is mainly responsible for coordinating the work performed by one or several worker nodes. As part of this, the coordinator node not only tells a worker node what it needs to do; it is also responsible for accepting queries and administrative commands, as well as the warehouse data for an initial data load, from an Informix database server. Once the work has been executed, the coordinator collects the results, performs any final processing that may be needed, and relays the consolidated result back to the Informix server. A worker node is mainly busy processing the data: compressing it into a special format during data load, or scanning, filtering, and joining it during query execution. Dealing foremost with the data, a worker node is also responsible for keeping all of its data, in compressed format, in main memory. This allows the radical approach of doing all query processing in-memory. (The data is also written to disk, but only for persistence, so that the raw data need not be re-loaded and re-compressed from the Informix server in the unlikely event of hardware downtime.)
If IWA is configured with multiple worker nodes (e.g. on a cluster system), the data of fact tables is split evenly and distributed among the worker nodes, so each worker node has its own distinct part of the fact table data to work with. Determining the number of worker nodes for an IWA installation is therefore a crucial step in the configuration phase. Changing this number later requires re-creating and re-loading the data marts so that the fact table data can be redistributed correctly.
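The effect of this even split can be illustrated with a small sketch. Note that this is purely illustrative: IWA's actual partitioning is internal and not user-visible; a simple round-robin assignment is assumed here only to show why each worker ends up with a roughly equal, distinct slice of the fact data.

```python
# Illustrative only -- IWA's real distribution algorithm is internal.
# This sketch shows the effect: fact rows are split evenly across
# worker nodes, and each worker owns a distinct partition.

def distribute_fact_rows(fact_rows, num_workers):
    """Assign each fact row to a worker node in round-robin fashion."""
    partitions = [[] for _ in range(num_workers)]
    for i, row in enumerate(fact_rows):
        partitions[i % num_workers].append(row)
    return partitions

fact_rows = list(range(10))              # stand-in for 10 fact-table rows
parts = distribute_fact_rows(fact_rows, 3)
sizes = [len(p) for p in parts]          # roughly even partition sizes
```

With 10 rows and 3 workers, the partitions hold 4, 3, and 3 rows, and together they cover every row exactly once. This also makes it clear why changing the worker count forces a reload: every partition boundary moves.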
This architecture enables IWA on a cluster system to run one coordinator node on one cluster node and a single worker node on each of the remaining cluster nodes. Because the cluster nodes do not share memory, each worker node must have its own copy of the data needed for its processing. Fact table data is distributed evenly among the worker nodes, whereas each worker node needs a complete duplicate of the dimension table data in order to join it with the fact table data. The idea is that the fact tables are the biggest tables with the most data, so distributing them among the worker nodes makes the most efficient use of memory; duplicating the dimension table data is acceptable because dimension tables are assumed to be much smaller. Each IWA node can also take full advantage of all CPU resources on its cluster node, as each IWA node (worker as well as coordinator) is internally multithreaded and executes its tasks in parallel. This works best under the general assumption that each cluster node configured with an IWA node is reserved exclusively for that IWA node, i.e. the IWA node does not share the resources of its cluster node with any other IWA node or with other applications.
On SMP machines, things look somewhat different: CPU resources and memory are shared. Duplicating dimension table data is therefore not needed; it would be an unnecessary waste of memory. Furthermore, a single IWA node can utilize all available CPU resources via the aforementioned built-in multithreading. Due to the architecture of IWA, the distinction between coordinator and worker node applies on SMP machines as well, but a single pair of one coordinator and one worker node is normally sufficient. Multiple worker nodes not only cause unnecessary duplication of dimension table data, but thereby also increase the additional memory consumed during query processing. In contrast to IWA on a cluster system, which uses its cluster nodes exclusively, IWA on an SMP machine may have to share the machine's resources with other applications, the Informix server itself being a prime candidate. In such a scenario, both IWA and the Informix server should be configured to leave enough memory and CPU resources for each other's use.
The principal configuration parameters governing the number and type of IWA nodes, as well as their memory and CPU resources, are:
The total number of IWA nodes.
The combined size of shared memory used by all worker nodes for mart data, i.e. all data marts must fit into this amount of memory.
The combined size of shared memory used by all coordinator nodes.
The share of CPU resources used by each IWA node for data mart loading tasks.
The share of CPU resources used by each IWA node for query acceleration tasks.
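These five settings correspond to parameters in IWA's configuration file, dwainst.conf. The excerpt below is a sketch only: the parameter names reflect the documented dwainst.conf parameters for this area, but please verify them against the manual for your version, and treat the values as placeholders rather than recommendations.

```
# dwainst.conf (excerpt) -- values are illustrative placeholders
NUM_NODES=2                             # total IWA nodes (coordinator + worker)
WORKER_SHM=64000                        # combined worker shared memory, in MB
COORDINATOR_SHM=16000                   # combined coordinator shared memory, in MB
CORES_FOR_LOAD_THREADS_PERCENTAGE=50    # CPU share per node for data mart loading
CORES_FOR_SCAN_THREADS_PERCENTAGE=100   # CPU share per node for query acceleration
```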
For the administrator used to Informix terminology, it is worth noting that "shared memory" (SHM) means something somewhat different for IWA than for the Informix server. The SHM of an IWA node is the memory that the multiple parallel threads of that single node use in shared mode; each IWA node therefore has its own portion of shared memory. The worker nodes use their shared memory to keep the mart data in its compressed format. This shared memory is allocated as needed, i.e. as data marts get loaded with data, and it makes up by far the major part of all SHM in IWA. But unlike the Informix server, the threads of IWA nodes also allocate and use additional (private) memory temporarily to perform their tasks. This happens during the data load phase (to build the data dictionary and perform the data compression) as well as during query acceleration (for intermediate results, hash tables, etc.). This temporary use of private memory is not configurable: no upper limit can be specified. The threads of IWA nodes allocate, use, and free this private memory on an as-needed basis, possibly until all memory of the system is exhausted. If the memory needed for a certain task cannot be allocated because the memory resources of the SMP machine or the cluster node are exhausted, that task will fail accordingly. It is therefore important to configure the size of the SHM so that enough memory remains for private allocation by IWA threads during the execution of their tasks.
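The sizing principle can be made concrete with a small worked example. The 40% headroom figure below is purely an assumption chosen for illustration, not an official recommendation: it stands in for whatever the operating system, a co-located Informix server, and the temporary private memory of IWA threads will need on a given machine.

```python
# Hypothetical sizing sketch. The headroom fraction is an assumed value,
# not an IBM recommendation: it represents memory reserved for the OS,
# a co-located Informix server, and the temporary private memory that
# IWA threads allocate during data load and query acceleration.

def suggest_worker_shm_mb(total_ram_mb, headroom_fraction=0.4):
    """Return a candidate combined worker SHM size (MB), leaving headroom."""
    if not 0 < headroom_fraction < 1:
        raise ValueError("headroom_fraction must be between 0 and 1")
    return int(total_ram_mb * (1 - headroom_fraction))

# On a 128 GB machine, reserving 40% for everything else:
print(suggest_worker_shm_mb(128 * 1024))   # -> 78643
```

The exact headroom needed depends on workload: heavy loads and complex queries drive up the temporary private allocations, so a machine that is tight on memory is better served by a smaller SHM setting than by one that looks generous on paper.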
The differences described above between a cluster system and an SMP system require IWA configurations adapted to the type of system on which IWA is installed. While the following description does not replace the manual, it provides general considerations for an SMP system environment. Aspects of configuring IWA on a cluster system will be discussed at a later time.
In the next blog entry, New Configuration Tips (2), we will look at some specifics of configuring IWA on an SMP system.