I am trying to understand a GPFS trace from an AIX Oracle RAC system (I cannot easily read these traces, which is why I am asking). The background is that the system exhibits considerable IO times for the redo log sync writes.
Direct IO is used.
I have a short trace (default trace classes and level) that I ran through trsum.awk. Picking out just the records for the disks of the redo log file system, I get sections like:
70124221 0.227214726 QIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 align 0 by (Direct I/O handler) qlen 1
70124221 0.227289218 QIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000079 da 17:51170304 nSectors 79 align 0 by (Direct I/O handler) qlen 1
70124221 0.231426960 FIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 qlen 0 duration 4212.234 us
70124221 0.231441511 FIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000079 da 17:51170304 nSectors 79 qlen 0 duration 4152.293 us
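To convince myself of how the QIO/FIO pairing works, here is a minimal sketch (assuming the trsum.awk record format shown above) that pairs each QIO record with its FIO record by tag and disk, and computes the elapsed time between queueing and completion:

```python
import re

# The four trsum.awk records from the trace above.
TRACE = """\
70124221 0.227214726 QIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 align 0 by (Direct I/O handler) qlen 1
70124221 0.227289218 QIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000079 da 17:51170304 nSectors 79 align 0 by (Direct I/O handler) qlen 1
70124221 0.231426960 FIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 qlen 0 duration 4212.234 us
70124221 0.231441511 FIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000079 da 17:51170304 nSectors 79 qlen 0 duration 4152.293 us
"""

pat = re.compile(r'\d+\s+(\d+\.\d+)\s+(QIO|FIO):.*tag (\d+ \d+).*disk (\S+)')

queued = {}   # (tag, disk) -> QIO timestamp (seconds)
elapsed = {}  # (tag, disk) -> microseconds between QIO and FIO
for line in TRACE.splitlines():
    m = pat.search(line)
    if not m:
        continue
    ts, kind, tag, disk = m.groups()
    key = (tag, disk)
    if kind == 'QIO':
        queued[key] = float(ts)
    elif kind == 'FIO' and key in queued:
        elapsed[key] = (float(ts) - queued.pop(key)) * 1e6

for (tag, disk), us in elapsed.items():
    print(f"tag {tag} disk {disk}: {us:.3f} us")
```

This reproduces the duration values trsum itself prints (4212.234 us and 4152.293 us), so the duration field is simply the FIO timestamp minus the QIO timestamp.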
I wonder what is involved in these writes: the QIO records already contain everything needed to send the request to the SCSI layer, and the request is so small that it should easily be absorbed by the write cache of the storage system. Nevertheless it takes more than 4 ms to come back.
Am I right that GPFS itself is, from the QIO to the FIO record, not involved in servicing these requests? I.e., if one sees long durations there, GPFS cannot be blamed, and the sources of the delays are further down in the stack? Or could there still be something odd in GPFS?
I realise QIO comes with (unbuffered?) writes, DIOQIO with (unbuffered?) reads, and FIO seems to be related to the completion of the IO.
EDIT / UPDATE Jul 16 2013: I found the assumption that the difference between QIO and FIO is the (GPFS-independent) block device IO time confirmed in another post: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014213603.
As for the rdwr entries in the trsum output, these appear to give the time a file was held open in RDWR mode. E.g., looking in the trace for things related to
91161229 0.024665781 rdwr 23855.442 us +
I find 307 91161229 0.000810339 0.005214 MMFS VNODE: gpfs_v_rdwr enter: vP 0xF1000A0425A10130 op 0 flags 0x0C400003 offset 0x000000012577A000 len 8192 ext 0 vinfoP 0xF1000A043B71A180
where 0.024665781 - (23855.442 / 1000000) = 0.000810339,
and at 0.024665781 the file might have been closed again:
307 91161229 0.024665781 0.003791 MMFS VNODE: gpfs_v_rdwr exit: new offset 0x000000012577C000 resid 0 code 0 rc 0
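The arithmetic linking the two records can be checked directly: subtracting the 23855.442 us rdwr duration from the exit timestamp recovers the timestamp of the enter record:

```python
# Timestamps from the two gpfs_v_rdwr records above.
exit_ts = 0.024665781      # gpfs_v_rdwr exit timestamp (seconds)
duration_us = 23855.442    # rdwr duration reported by trsum (microseconds)

enter_ts = exit_ts - duration_us / 1e6
print(f"{enter_ts:.9f}")   # matches the 0.000810339 of the gpfs_v_rdwr enter record
```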
Some questions remain:
Which queue length is given by qlen? Is it the queue of the respective block device? I suppose so.
Is there a way to relate a WRITE or a READ to the IO records it generates? I could check the disk and sector an IO goes to, find the file occupying that location, and relate the file names to the inodes in the high-level request (by the way: how do I see the file system from the high-level request?). But there might be a simpler way.
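In the absence of a documented mapping, one brute-force sketch would be to correlate by thread and time window: the first column of the trace records looks like a thread id, so one could (this is an unconfirmed assumption, and the window below is filled in by hand for illustration) collect all QIO/FIO records on the same thread whose timestamps fall between a gpfs_v_rdwr enter and its exit:

```python
import re
from collections import defaultdict

# Two of the IO records from the trace above.
TRACE = """\
70124221 0.227214726 QIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 align 0 by (Direct I/O handler) qlen 1
70124221 0.231426960 FIO: write data tag 331543 14823 buf 0x0700000020AE8E00 disk 8000001200000132 da 37:51170176 nSectors 79 qlen 0 duration 4212.234 us
"""

# Hypothetical rdwr windows: thread id -> list of (enter_ts, exit_ts) pairs,
# in practice built from the gpfs_v_rdwr enter/exit records; here hand-made.
windows = {"70124221": [(0.227, 0.232)]}

io_pat = re.compile(r'(\d+)\s+(\d+\.\d+)\s+(QIO|FIO):')
matched = defaultdict(list)
for line in TRACE.splitlines():
    m = io_pat.match(line)
    if not m:
        continue
    tid, ts = m.group(1), float(m.group(2))
    for lo, hi in windows.get(tid, []):
        if lo <= ts <= hi:
            matched[(tid, lo, hi)].append(line)

for key, recs in matched.items():
    print(key, "->", len(recs), "IO records")
```

Whether the thread id really stays constant from the VNODE layer down to the QIO/FIO records is exactly the kind of thing I would like to have confirmed.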