Wa have migrated our Oracle Database from JFS2 filesystems to GPFS filesystems.
Next step is to install Oracle RAC in order to user our two nodes.
We have created two GPFS filesystems :
One for the archives logs (blocksize = 1M)
One for all the remaining data (temp, redo, datafiles, ctl, ...) (blocksize = 1M)
We have observed that the write data rate was multiplied by 3, but the I/O rate is the same. And warning messages appeared in Oracle Logs :
"Warning: log write time 1210ms, size 13KB"
"Warning: log write time 1250ms, size 37KB"
So we have created one other filesystem dedicated to redolog (blocksize = 256K)
=> Write data rate decreased (but is always twice as before GPFS). With nmon tool, we have observed that the blocksize on redolog disk is always 256K). We suspect GPFS to write a full block instead of a single IO.
How do GPFS deal with sequential write (redolog activity) ? Why was write date rate increased so much ?
We want to move redolog on a new GPFS filesystem (blocksize = 16K). What do you think about this idea ?
Is it possible to disable the "sequential write algorithm" on a filesystem ?
PS: We are using GPFS 220.127.116.11
PS2: Please find nmon graphs attached.
NOTICE: developerWorks Community will be offline May 29-30, 2015 while we upgrade to the latest version of IBM Connections. For more information, read our upgrade FAQ.
This topic has been locked.
6 replies Latest Post - 2009-06-15T15:39:13Z by HajoEhlers
Pinned topic GPFS Block Size and Oracle
Answered question This question has been answered.
Unanswered question This question has not been answered yet.
Updated on 2009-06-15T15:39:13Z at 2009-06-15T15:39:13Z by HajoEhlers
Re: GPFS Block Size and Oracle2009-06-09T16:50:33Z in response to VincentLasselinHave you read the notes about Oracle on GPFS
Set the Oracle read-ahead value to prefetch one or two full GPFS blocks. For example, if your GPFS block size is 512 KB, set the Oracle blocks to either 32 or 64 16 KB blocks.
( The oracle block size is equal to one GPFS subblock - 512/32=16 )
I assume that you have already a database. This means also that your DB has a given blocksize. IMHO this blocksize should be not smaller then the GPFS subblock size.
Saying so with a Oracle DB block size of 4 K the GPFS block size should be 32*4=64k . In case you are able to change the block size of the Oracle DB/redolog ( i am not an expert on Oracle ) i would do so and follow the GPFS recommandation.
There is also a redbook ( 2008 ) about oracle/gpfs
hopes it helps a little bit
Re: GPFS Block Size and Oracle2009-06-10T09:35:48Z in response to HajoEhlersHi Hajo,
Thanks for your quick response.
The tuning you are talking about concern only read access.
We have no problem in read, it's only in writing.
Why do GPFS write so much ? How can I check the activity of redolog files ? With filemon ? With a truss on OracleLogWriter process ?
For Oracle, the blocksize for redolog is 512 bytes. But all redbooks and docs I've found advise to create only one GPFS filesystems with a blocksize of 256/512/1024k.
But in our case, it's not working. So we were forced to create a dedicated GPFS filesystem for redo with a 256k blocksize (instead of 1M for others Oracle files).
I think we are in the good way, but it's not good enough (compared to JFS2). Do you think creating a new filesystem with a 16k blocksize can solve my issue ? FYI, for redolog and with JFS2, it's recommanded to create a filesystem with a blocksize of 512 bytes.
Re: GPFS Block Size and Oracle2009-06-10T10:18:16Z in response to VincentLasselin> For Oracle, the blocksize for redolog is 512 bytes
I hope that this is not correct. AFAIK is the standard recommendation that the redo log size on JFS2 should be a *multiple * of 512 bytes meaning it should not be lower then 4k.
In case it is really 512 bytes large i strongly recommend to recreate the redo log with a higher block size which should fit then to the GPFS (sub) block size of GPFS ( Only one is needed )
from a oracle admin guide
A.1.5 Setting the Database Block Size You can configure Oracle Database block size for better I/O throughput. On AIX, you can set the value of the DB_BLOCK_SIZE initialization parameter to between 2 KB and 32 KB, with a default of 4 KB. If Oracle Database is installed on a journaled file system, then the block size should be a multiple of the file system block size (4 KB on JFS, 16 K to 1 MB on GPFS). For databases on raw partitions, Oracle Database block size is a multiple of the operating system physical block size (512 bytes on AIX). Oracle recommends smaller Oracle Database block sizes (2 KB or 4 KB) for online transaction processing (OLTP) or mixed workload environments and larger block sizes (8 KB, 16 KB, or 32 KB) for decision support system (DSS) workload environments. ... A.2.2 Using Journaled File Systems Compared to Raw Logical Volumes Considerations for GPFS If you are using GPFS, then you can use the same file system for all purposes. This includes using it for the Oracle home directory and for storing data files and logs. For optimal performance, you should use a large GPFS block size (typically, at least 512 KB). GPFS is designed for scalability, and there is no requirement to create multiple GPFS file systems as long as the amount of data fits in a single GPFS file system.
Like i said before: The Oracle block size must be at least as large as the GPFS subblock size. In case of your 256K BS GPFS it must be 8k. in cause of an 1MB BS GPFS the oracle bs must be at least 32k. A far as i understand the oracle documentation the oracle block size can even be large as the gpfs block size thus 1024k
This means that in case your redolog block size is really 512 bytes GPFS might write one full gpfs subblock of 8k ( i really do not know how GPFS handle small writes ). Since Oracle says that one GPFS fits to all needs i would really recreate the redo logs DB to fit to your GPFS instead of creating a GPFS which fits to your redo logs.
The step of the recreation of the redo logs is discribed in the previous mentioned redbook.
For trouble shooting:
Use "iostat -a " to get a feeling of the amount of transactions/sec and kilobytes/sec.
Kbps/tps will give you an estimate of the avergae size of an transaction.
fileman can be used as well as mmpmon ( GPFS tool )
All from a user perspective and it might be a good idea to ask a oracle DBA
Re: GPFS Block Size and Oracle2009-06-11T08:07:35Z in response to HajoEhlersHi Hajo,
We have more information and our understanding is better.
Now, we have two filesystems for redologs (for Multiplexing).
We have noticed that the comportement is different between the two filesystems :
In theory :
There is the same write operations in the two filesystems (only done by Oracle LogWriter)
For read, there is two mechanisms only in the first filesystem :
- Copy of redologs in archive filesystem (sequential read in "async" mode)
- Read of redologs for shareplex (It's a tool to handle an Oracle mirror)
We have analyzed TPC reports. See below one example of this report :
IO read (iops): A few in redo1, none in redo2. => It's normal.
IO write (iops) : 20% more for redo2. 90% sequential write for redo1 (only 30% for redo2) => It's strange.
Read Data Rate : 0.43 MB/s for redo1. Nothing for redo2 => It's normal.
Write Date Rate : 5.7 MB/s for redo1. Only 0.2 MB/s for redo2 !!! => It's very strange
Read Transfert Size : 225K for redo1. => It's normal. Only sequential read are done on this filesystem so the I/O is the same that GPFS blocksize.
Write Transfert Size : 250K for redo1. 7.5K for redo2 => It's very strange too and explain the Write Data Rate.
So we think that the same block is accessed for a sequential read and for sequential write. So the full block is buffered and the full block is written to disk when the logwriter do a write I/O.
What do you think about this ?
dlmcnabb 120000P4JT1012 PostsACCEPTED ANSWER
Re: GPFS Block Size and Oracle2009-06-11T14:23:19Z in response to VincentLasselinWhen Oracle sees that it is opening database files on GPFS it uses the O_DIRECT option so that all IO uses Direct IO to prevent GPFS from prefetching or caching blocks in the GPFS pagepool. Oracle uses many AIO threads to do its own prefetching or block flushing to get IO parallelism. I do not know if Oracle distinguishes between logs, databases, and utility files. If it uses O_DIRECT on sequential access files, it will get no parallelism and you will see slow times.
Can you tell from strace whether the open flags have the O_DIRECT flag on? In the trcrpt* file from a GPFS "mmtrace trace=io", it will show DIO (or Direct I/O handler) strings if the IO is done using Direct IO instead of buffered IO.
You will have to ask Oracle support if there is a way to tell Oracle to not use Direct IO on sequentially accessed files.
Re: GPFS Block Size and Oracle2009-06-15T15:39:13Z in response to VincentLasselin> Write Transfert Size : 250K for redo1. 7.5K for redo2 => It's very strange too and explain the Write Data Rate.
Then check/compare the block size of redo1 and redo2. I would bet that redo2 has a very small one.