The configuration tips for a first-time setup apply to the Informix Server Version 11.70.xC2.
A main factor for the processing speed of IWA is that IWA keeps all its data in memory. During query execution it is not necessary to access the disks. The data is also written to disk, but that is only for persistence, to avoid having to reload the data for the data marts from the database server after a restart of IWA. (Normally IWA just keeps running, but certain reconfiguration tasks or even machine maintenance may be simple reasons for having to bring down and restart IWA.)
IWA uses a special compression technique that normally reduces the data to about one half or one third of its original size. The compression ratio may be even better, but that depends heavily on the original data, e.g. when there are many fixed-length character columns that contain many trailing spaces. The compression method allows IWA to process the data for query joins in its compressed format, which avoids decompression and the memory that decompression would need. Only once the result set for a query has been determined is the data uncompressed, just before it is sent to the database server.
But memory is not only used to keep the data of data marts. As the goal is to do all processing without the need for disk access, additional memory is also needed for temporary processing results. Intermediate query results in particular, like the selected data that must be sorted due to an ORDER BY clause in the query, may need a significant amount of memory. And such intermediate results will not be in the compressed format. For example, after having collected the result data from the worker nodes, the coordinator node is responsible for sorting and/or grouping the result set. The coordinator node will need some amount of memory to complete this task. The size of this memory depends on the number of rows in the result set and the number of keys in the ORDER BY and/or GROUP BY clauses of the query.
Additional memory is also used for intermediate data created before and during the data compression phase when the data is loaded into the data mart.
With this concept it is clear that memory is a crucial resource for IWA. If IWA runs short of memory, its proper function is at risk. Therefore IWA constantly monitors the free memory on the machine. It issues warnings whenever memory is running short, and errors when needed memory cannot be obtained from the operating system. IWA does not have a concept of using disk space in case of a memory shortage. It should also be clear that it is absolutely counterproductive if the operating system starts to transparently swap or page even small pieces of IWA's memory out to disk. Though it may not cause immediate out-of-memory errors from IWA itself, it would seriously impact the performance of IWA. (Informix server is much more adept at managing only a subset of data in memory and accessing the disk in an efficient way.)
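Whether the operating system has started to swap can be checked from outside IWA. The following is only a minimal sketch, assuming a Linux machine where /proc/meminfo is available. The function names are made up for this illustration and are not part of IWA. Note that the numbers are system-wide, so swap in use does not prove that IWA's own memory was swapped out. It is merely a hint to look closer.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are in kB; a few counters (e.g. HugePages_Total) have no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_in_use_kb(meminfo):
    """Return how much swap space is currently occupied, in kB."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory statistics (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On the IWA machine, calling swap_in_use_kb(read_meminfo()) from time to time and seeing a growing value would be a reason to review the memory configuration.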
The IWA nodes use shared memory mainly for the data of data marts and to communicate with each other. The worker nodes keep the compressed data of the data marts in their shared memory segments. This normally is the biggest part of shared memory usage. The coordinators use their shared memory mainly for communication and hence need less of it. Worker as well as coordinator nodes use shared memory gradually, as needed. Not all of the configured shared memory is allocated and used in advance.
The parameter WORKER_SHM in the dwainst.conf file is used to configure the size of the shared memory for all worker nodes combined. The configured amount will then be distributed evenly among the configured number of worker nodes. All the data marts have to fit into this memory. As the data is compressed during the data mart loading phase, the amount of worker shared memory needed for a specific data mart can only be estimated before the actual load has taken place. The ISAO Studio GUI makes this estimate while the user is creating a data mart definition.
The dwainst.conf file parameter COORDINATOR_SHM determines the size of the shared memory for all the coordinator nodes combined, which then will be distributed evenly among all coordinator nodes. Usually COORDINATOR_SHM should be much smaller than WORKER_SHM. If more than one coordinator node exists (i.e. NUM_NODES is greater than 7) then a coordinator can assume the role of a worker node in case a worker node fails and terminates. For this to work correctly, the coordinator node will also create a file in the /dev/shm directory with the same size as that of a worker node. But in normal operation it will not use all this memory. In fact a coordinator will start using shared memory only when it is needed. Therefore it is normal that a single coordinator will start using shared memory as data marts get loaded with data. But additional coordinators will not immediately start using shared memory at this time.
A typical approach for configuration is to first figure out how much memory of the machine is needed for other tasks, like the operating system and possibly a co-resident Informix server. The remaining memory (which normally should still be the bigger part) is then available for IWA. As a rule of thumb, between 55% and 60% of this can be configured as WORKER_SHM and 5% to 6% of it as COORDINATOR_SHM, leaving the remaining 34% to 40% for temporary memory requirements during data load or query processing. This temporary memory needed for processing cannot be configured. Nor is it, unlike with Informix server, part of the shared memory. Instead it is allocated and released as private memory by each process as needed.
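The rule of thumb above is simple arithmetic, which can be sketched as follows. This is only an illustration: the function names are invented here, the default percentages are the lower ends of the ranges given above, and the values are assumed to be in megabytes (please check the IWA documentation for the unit that dwainst.conf expects).

```python
def iwa_memory_plan(total_mb, reserved_mb, worker_percent=55, coordinator_percent=5):
    """Split the memory left over for IWA according to the rule of thumb.

    total_mb    -- physical memory of the machine
    reserved_mb -- memory set aside for the operating system and,
                   possibly, a co-resident Informix server
    """
    available_mb = total_mb - reserved_mb
    if available_mb <= 0:
        raise ValueError("no memory left for IWA")
    worker_shm = available_mb * worker_percent // 100
    coordinator_shm = available_mb * coordinator_percent // 100
    return {
        "available_mb": available_mb,
        "WORKER_SHM": worker_shm,
        "COORDINATOR_SHM": coordinator_shm,
        # Not configurable: private memory for data load and query processing.
        "temporary_mb": available_mb - worker_shm - coordinator_shm,
    }


def per_node_mb(total_shm_mb, node_count):
    """Even share of a combined shared memory setting for one node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return total_shm_mb // node_count
```

For a hypothetical machine with 100000 MB of memory, of which 20000 MB are reserved for other tasks, iwa_memory_plan(100000, 20000) suggests 44000 MB for WORKER_SHM and 4000 MB for COORDINATOR_SHM, leaving 32000 MB for temporary memory. With four worker nodes, each worker would then get 11000 MB.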
The shared memory segments are also reflected as memory mapped files in /dev/shm. This enables fast loading of the shared memory with the data at process start-up. If the files in /dev/shm do not exist at start time of an IWA process, the data will be loaded from disk into the newly created shared memory segments. A prerequisite for this to work properly is that /dev/shm is configured large enough. If it is not, its size needs to be increased. A later blog entry in the series of configuration tips will give more detail on this. Apart from this, note that /dev/shm should not be used as a 'fast temporary file storage place', because this may deprive IWA of needed space.
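A quick way to see whether /dev/shm is large enough is to compare its size with the configured shared memory. The sketch below uses only the Python standard library. The function names and the idea of comparing against the sum of WORKER_SHM and COORDINATOR_SHM are assumptions made for this illustration, not an official IWA check.

```python
import shutil

MB = 1024 * 1024


def headroom_mb(total_bytes, required_mb):
    """Megabytes left in a file system of total_bytes after required_mb."""
    return total_bytes // MB - required_mb


def dev_shm_headroom_mb(required_mb, path="/dev/shm"):
    """Compare the size of the file system at path with required_mb.

    A negative result means the file system is too small to hold the
    configured shared memory and should be enlarged.
    """
    usage = shutil.disk_usage(path)
    return headroom_mb(usage.total, required_mb)
```

With the hypothetical values of 44000 MB for WORKER_SHM and 4000 MB for COORDINATOR_SHM, dev_shm_headroom_mb(44000 + 4000) should return a positive number on the IWA machine. The check looks at the total size only, so files that other programs have placed in /dev/shm are not accounted for.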
If the Informix server co-resides with IWA on the same machine then it will again be useful to look at the remaining workload for this Informix server. If the Informix server is to be used almost exclusively for OLAP queries, and these queries now get executed by IWA, then the Informix server's memory requirements may be reduced to the necessary minimum. Mainly the buffer pool(s) can be configured smaller. Possibly some other parameters can be changed, depending on the existing configuration. If the Informix server still has to process OLTP workload, and the data marts often get re-created and loaded with new data, then any changes to the Informix server's memory configuration should be made carefully.
In my next blog entry we will see what this means for some memory-related parameters of the Linux kernel.