These configuration tips for a first-time setup apply to Informix Server version 11.70.xC2.
The basic configuration is mainly concerned with the first-time setup of an Informix Warehouse Accelerator (IWA) instance. Taking the time to understand the configuration parameters will help to create a good setup for a new IWA instance before it is started.
All configuration information for an IWA instance is contained in the file dwainst.conf, located in the directory $INFORMIXDIR/dwa/etc. All nodes (worker as well as coordinator nodes) are created equally based on this configuration information. Any configuration needs that are specific to individual nodes are handled by the ondwa utility. The ondwa utility is used to set up, start and stop the IWA instance.
Number of Nodes
The number of nodes is configured with the parameter NUM_NODES in the dwainst.conf file. The default is 2, which means one coordinator and one worker node.
Each configured node corresponds to a process that will be started. However, the processes themselves are multithreaded. This means that even with only the two default nodes configured, IWA will use multiple CPU resources of the machine as they are available.
On the other hand, some tasks can be performed with better parallelism when more nodes, especially more worker nodes, are configured. This notably applies to loading data into a data mart, because the data is distributed among the worker nodes. However, the distribution is not limited to the load phase. The data of fact tables remains distributed among the worker nodes for the lifetime of the data mart. Therefore the same number of worker nodes is needed to properly query the data mart even after the load has completed and the mart has become active. Adding more worker nodes at this point will not really benefit this data mart, as the newly added nodes will not hold any share of this mart's data. Another, newly created data mart, however, can benefit from the added worker nodes.
Having too many worker nodes may actually degrade query execution performance as the overhead of internal communication between the processes can outweigh the advantage of increased parallelism. A still higher number of worker nodes can further decrease the performance due to individual internal threads having to wait for CPU resources and other threads having to wait even longer for such CPU-starved threads.
One also has to consider whether the Informix server is running on the same machine as IWA. In that case the Informix server also needs at least some CPU resources, which should remain available rather than be used up by the IWA processes. How many CPU resources the Informix server needs depends very much on the workload it has to process. If the workload is purely a data warehouse workload, and the bulk of it can be executed by IWA, then the Informix server will not need many CPU resources for the remaining tasks. However, if the Informix server still has to do more extensive post-processing on data received from IWA, or some (unrelated) OLTP workload has to progress in parallel, then it is important that the Informix server has sufficient access to the CPU resources it needs.
A possible approach to finding a sensible configuration is to first determine the CPU resources needed by tasks other than the IWA acceleration. This should include a possibly co-resident Informix server. Based on this, IWA can be configured to make good use of the remaining resources. If only a rather small amount of CPU resources remains for IWA, like one or two CPUs, the default configuration of two nodes (one coordinator and one worker node) should benefit most workloads. If a larger amount of CPU resources remains available for IWA, like three or more CPUs, then setting NUM_NODES to 5 (one coordinator and four worker nodes) may be useful.
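The rule of thumb above can be written down as a small helper. This is only a sketch: the function name suggest_num_nodes and its two inputs are made up for illustration and are not part of IWA or the ondwa utility, and the thresholds are simply the ones stated in the text, not the result of any measurement.

```python
def suggest_num_nodes(total_cpus: int, reserved_cpus: int) -> int:
    """Suggest a NUM_NODES value from the rule of thumb in the text.

    total_cpus    -- CPUs in the machine (hypothetical input)
    reserved_cpus -- CPUs set aside for everything that is not IWA,
                     for example a co-resident Informix server
    """
    if total_cpus < 1 or reserved_cpus < 0:
        raise ValueError("CPU counts must be positive")

    remaining = total_cpus - reserved_cpus
    if remaining < 1:
        raise ValueError("no CPU resources remain for IWA")

    # One or two CPUs left: stay with the default of one coordinator
    # and one worker node.
    if remaining <= 2:
        return 2

    # Three or more CPUs left: one coordinator and four worker nodes
    # "may be useful" according to the text; treat it as a starting point.
    return 5


if __name__ == "__main__":
    print(suggest_num_nodes(total_cpus=4, reserved_cpus=2))
    print(suggest_num_nodes(total_cpus=8, reserved_cpus=2))
```

The returned value is only a first guess for NUM_NODES. The memory considerations for dimension tables still have to be checked before settling on a higher number of nodes.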
However, it is important to remember that dimension tables get duplicated on each worker node. A higher number of worker nodes can therefore cause excessive memory consumption, especially when the dimension tables are not really small compared to the fact tables of the data warehouse schema. (The data of fact tables gets distributed evenly among the worker nodes.) We therefore strongly recommend also reading the later blog entry "Configuring CPU Utilization", which describes an alternative to increasing the number of (worker) nodes.
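To get a feeling for how the duplication of dimension tables adds up, the following sketch estimates the data held by the worker nodes. It is a deliberately simplified model: the function name and the input sizes are hypothetical, the sizes are meant as the in-memory size of the data, and anything else an IWA process needs (coordinator memory, working memory for queries) is ignored.

```python
def estimate_worker_memory(fact_gb: float, dim_gb: float, num_nodes: int):
    """Estimate mart data per worker node and across all worker nodes.

    fact_gb   -- total in-memory size of the fact table data (assumed input)
    dim_gb    -- total in-memory size of the dimension table data (assumed input)
    num_nodes -- value of NUM_NODES (one coordinator, the rest worker nodes)
    """
    workers = num_nodes - 1
    if workers < 1:
        raise ValueError("NUM_NODES must allow for at least one worker node")

    # Fact data is split evenly among the workers, while every worker
    # holds a full copy of the dimension data.
    per_worker = dim_gb + fact_gb / workers
    total = per_worker * workers
    return per_worker, total


if __name__ == "__main__":
    for nodes in (2, 5):
        per_worker, total = estimate_worker_memory(100.0, 10.0, nodes)
        print(f"NUM_NODES={nodes}: {per_worker} GB per worker, {total} GB in total")
```

In this model the total grows by one extra copy of the dimension data for every worker node added, so the larger the dimension tables are relative to the fact tables, the more expensive each additional worker node becomes.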
The next blog entry will consider the configuration of memory resources.