I/O optimization and the pf module

The pf module is a user-space cache that uses a simple LRU (least recently used) mechanism for page preemption. The pf module also monitors cache page usage to anticipate future needs for file data, issuing aio_read calls to preload the cache with data.
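The LRU policy described above can be sketched as follows. This is a minimal illustration, not the pf module's implementation: the class name `LRUPageCache`, the `load_page` callback (standing in for a synchronous read on a cache miss), and the use of page numbers as keys are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LRUPageCache:
    """Fixed-capacity page cache with least-recently-used eviction (illustrative only)."""

    def __init__(self, capacity: int, load_page: Callable[[int], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least one page")
        self.capacity = capacity
        self.load_page = load_page  # hypothetical: fetches one page on a cache miss
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, page_no: int) -> bytes:
        """Return a page, loading it on a miss and marking it most recently used."""
        if page_no in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        self.misses += 1
        data = self.load_page(page_no)
        self.put(page_no, data)
        return data

    def put(self, page_no: int, data: bytes) -> None:
        """Insert or refresh a page; evict the least recently used page if over capacity."""
        if page_no in self.pages:
            self.pages.move_to_end(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # front of the OrderedDict = least recently used


if __name__ == "__main__":
    cache = LRUPageCache(capacity=2, load_page=lambda n: bytes([n]) * 4)
    cache.get(0)
    cache.get(1)
    cache.get(0)              # page 0 becomes most recently used
    cache.get(2)              # cache is full, so page 1 (least recent) is evicted
    print(list(cache.pages))  # [0, 2]
```

In this sketch `put` is kept separate from `get` so that pages arriving from asynchronous read-ahead could be inserted into the cache without being counted as misses.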

A common I/O pattern is the sequential reading of very large files (tens of gigabytes). Applications that exhibit this pattern benefit little from operating system buffer caches: because there is little or no data reuse, a large operating system buffer pool is ineffective. The MIO library can address this by invoking the pf module, which detects the sequential access pattern and asynchronously preloads a much smaller cache with data that will be needed. The pf cache only needs to hold enough pages to maintain sufficient read-ahead (prefetching). The pf module can optionally use direct I/O, which avoids an extra memory copy to the system buffer pool and also spares the system buffers from this one-time-access traffic, allowing them to be used more productively. Early experience with the JFS and JFS2 file systems of AIX has consistently shown that using direct I/O from the pf module is very beneficial to system throughput for large, sequentially accessed files.
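Sequential-pattern detection and read-ahead can be illustrated with a minimal sketch, assuming pages are numbered consecutively through the file. The class `SequentialPrefetcher` and its `threshold` and `depth` parameters are hypothetical; the pf module's actual detection heuristics and tuning options are not described here. The page numbers it returns are the ones a cache would then request asynchronously, for example with aio_read.

```python
from typing import List, Optional


class SequentialPrefetcher:
    """Detects forward sequential page access and proposes pages to read ahead.

    threshold: number of consecutive +1 page steps required before prefetching.
    depth:     how many pages beyond the current page to keep requested.
    """

    def __init__(self, threshold: int = 2, depth: int = 4) -> None:
        self.threshold = threshold
        self.depth = depth
        self.last_page: Optional[int] = None
        self.run_length = 0       # consecutive sequential steps seen so far
        self.issued_through = -1  # highest page already requested

    def on_access(self, page_no: int) -> List[int]:
        """Record an access and return the page numbers to request asynchronously now."""
        if self.last_page is not None and page_no == self.last_page + 1:
            self.run_length += 1
        else:
            # Pattern broken (or first access): restart detection from this page.
            self.run_length = 0
            self.issued_through = page_no
        self.last_page = page_no

        if self.run_length < self.threshold:
            return []
        target = page_no + self.depth
        start = max(self.issued_through, page_no) + 1
        self.issued_through = max(self.issued_through, target)
        return list(range(start, target + 1))


if __name__ == "__main__":
    p = SequentialPrefetcher(threshold=2, depth=4)
    for page in (10, 11, 12, 13, 50):
        print(page, p.on_access(page))
    # 10 []
    # 11 []
    # 12 [13, 14, 15, 16]
    # 13 [17]
    # 50 []
```

Under this model the cache can stay small: it needs roughly `depth + 1` pages (the page being consumed plus the pages requested ahead of it), independent of the file size, which is consistent with the point above that the pf cache only has to sustain the read-ahead window.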