Topic
  • 9 replies
  • Latest Post - ‏2010-08-25T12:51:41Z by SystemAdmin
SystemAdmin
SystemAdmin
228 Posts

Pinned topic Slow Response Times

‏2010-08-24T11:46:27Z |
Hi,

I'm new to Informix. I have an instance (an OLTP system) that performed fine for months, but recently it has been showing slow response times for any and all SQL queries executed on it. These queries are for CC authorisations that require sub-second responses (typically around 1/10th of a second), but recently they've been taking up to 12 seconds!

Can someone please suggest where best to look for probable causes? The AIX server is only running at 50% CPU/memory, and the other 14 Informix instances running on the prod server aren't having any performance issues. Our network team has also ruled out any problems on the connection between the application and the db server.

Thanks in advance.
Updated on 2010-08-25T12:51:41Z at 2010-08-25T12:51:41Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T08:02:18Z  
    Hi,

    The main goal here is to identify where the bottleneck is. It could be disk I/O, network I/O, or a shortage of resources (such as mutexes, locks or buffers).

    I would suggest analyzing the environment first. Could it have changed (e.g. has the number of connections/users increased significantly)?

    Use the onstat utility to analyze the server status. Check 'onstat -g ioa' for I/O queues: if they are long, the server is stuck on disk I/O.

    Monitor the user threads to see what they are actually doing (onstat -g ath, onstat -g stk).

    Check 'onstat -p' or 'onstat -g buf' for bufwaits. If the value is large, you are short of available buffers.
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T11:12:44Z  
    Hi

    Thanks for the onstat suggestions. Below are the results. The buffer waits number is large, but I'm not sure how many waits count as too many?

    ukbiprodmvrs01[/home/informix]$ onstat -p

    IBM Informix Dynamic Server Version 11.50.FC3 -- On-Line -- Up 4 days 02:09:16 -- 1850848 Kbytes

    Profile
    dskreads   pagreads   bufreads    %cached dskwrits pagwrits bufwrits  %cached
    1176887565 1841116027 16511495965 92.87   13666795 24122934 119306573 88.94

    isamtot     open      start     read        write    rewrite  delete  commit  rollbk
    16192932555 102915248 434689413 13862080158 39012259 10789882 2684051 2020736 780

    gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
    0       0        0        0      0        0       0

    ovlock ovuserthread ovbuff usercpu  syscpu   numckpts flushes
    0      0            0      84009.13 28363.27 1206     1378

    bufwaits lokwaits lockreqs   deadlks dltouts ckpwaits compress seqscans
    82297149 1712     9807705426 1       0       874      5906670  147790

    ixda-RA   idx-RA  da-RA    RA-pgsused lchwaits
    494948436 4650232 41689791 539540757  901118

    ukbiprodmvrs01[/home/informix]$
    ukbiprodmvrs01[/home/informix]$
    ukbiprodmvrs01[/home/informix]$ onstat -g buf

    IBM Informix Dynamic Server Version 11.50.FC3 -- On-Line -- Up 4 days 02:09:59 -- 1850848 Kbytes

    Profile

    Buffer pool page size: 4096
    dskreads   pagreads   bufreads    %cached dskwrits pagwrits bufwrits  %cached
    1176906592 1841138398 16513538062 92.87   13667539 24124030 119316434 88.55

    bufwrits_sinceckpt bufwaits ovbuff flushes
    58025              82300787 0      1378

    Fg Writes  LRU Writes  Avg. LRU Time  Chunk Writes
    0          9522242     0.012          1533059

    Fast Cache Stats
    gets      hits      %hits puts
    732836386 728437231 99.40 84696574

    ukbiprodmvrs01[/home/informix]$ onstat -g ioq

    IBM Informix Dynamic Server Version 11.50.FC3 -- On-Line -- Up 4 days 02:15:00 -- 1850848 Kbytes

    AIO I/O queues:
    q name/id  len maxlen totalops  dskread   dskwrite dskcopy
    drda_dbg 0 0   0      0         0         0        0
    sqli_dbg 0 0   0      0         0         0        0
    kio      0 0   17     353978622 350468414 3510208  0
    kio      1 0   31     281099087 277715892 3383195  0
    kio      2 0   17     329960823 326546537 3414286  0
    kio      3 0   17     225666031 222299692 3366339  0
    adt      0 0   0      0         0         0        0
    msc      0 0   2      116602    0         0        0
    aio      0 0   2      23527     6427      0        0
    pio      0 0   0      0         0         0        0
    lio      0 0   0      0         0         0        0
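
On the question of how many buffer waits are too many: a raw count is hard to judge, so one option is to relate it to the amount of buffer activity. The sketch below parses two of the onstat -p rows posted above and computes the read cache percentage plus a bufwaits ratio. The ratio formula, bufwaits / (pagreads + bufwrits), and the idea that values above roughly 7% deserve a closer look are an often quoted rule of thumb and an assumption here, not something stated in this thread. The function name parse_profile is made up for illustration.

```python
import re

# Two header/value pairs copied from the onstat -p output in this thread.
ONSTAT_P = """\
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
1176887565 1841116027 16511495965 92.87 13666795 24122934 119306573 88.94

bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans
82297149 1712 9807705426 1 0 874 5906670 147790
"""

NUMBER = re.compile(r"^\d+(\.\d+)?$")


def parse_profile(text):
    """Pair each header line with the all-numeric line that follows it."""
    stats = {}
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            header = None
            continue
        is_values = (
            header is not None
            and len(tokens) == len(header)
            and all(NUMBER.match(t) for t in tokens)
        )
        if is_values:
            for name, value in zip(header, tokens):
                # %cached appears twice (reads, then writes); suffix the repeat.
                key = name if name not in stats else name + "_2"
                stats[key] = float(value)
            header = None
        else:
            header = tokens
    return stats


stats = parse_profile(ONSTAT_P)

# Share of buffer reads that did not need a disk read.
read_cache_pct = 100.0 * (stats["bufreads"] - stats["dskreads"]) / stats["bufreads"]

# Assumed rule-of-thumb ratio; the threshold to compare against is not from this thread.
bufwaits_ratio_pct = 100.0 * stats["bufwaits"] / (stats["pagreads"] + stats["bufwrits"])

print(f"read cache: {read_cache_pct:.2f}%")
print(f"bufwaits ratio: {bufwaits_ratio_pct:.2f}%")
```

The read cache figure reproduces the 92.87 that onstat prints, which is a useful sanity check on the parsing. The same calculation can be run against the other instances on the server to see whether this one stands out.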
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T11:36:29Z  
    Hi,

    The I/O queues seem to be all right, though bufwaits may be too high. Try comparing it against bufwaits on the other production instances that don't have any performance issues. If the difference is huge, it may be worth adding some more buffers (BUFFERPOOL parameter).

    You should also check that you have enough poll threads to service the number of connections you have. The recommendation is 1 poll thread per ~250 connections.
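
When comparing bufwaits between instances, keep in mind that the onstat profile counters accumulate from server start (or the last onstat -z), so an instance that has been up longer will show a bigger number. A rough way around that is to compare rates. The sketch below uses the two snapshots posted in this thread (onstat -p at 4 days 02:09:16 and onstat -g buf at 4 days 02:09:59) to get an average rate since startup and a rate over the 43 seconds between them. The helper name uptime_seconds is made up for illustration, and a 43 second window is far too short to draw conclusions from; a longer sampling interval would be needed in practice.

```python
def uptime_seconds(days, hms):
    """Convert an onstat banner uptime such as (4, '02:09:16') to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the two outputs posted in this thread.
first = {"uptime": uptime_seconds(4, "02:09:16"), "bufwaits": 82297149}   # onstat -p
second = {"uptime": uptime_seconds(4, "02:09:59"), "bufwaits": 82300787}  # onstat -g buf

# Average bufwaits per second since the server came up.
average_rate = first["bufwaits"] / first["uptime"]

# Bufwaits per second between the two snapshots.
recent_rate = (second["bufwaits"] - first["bufwaits"]) / (second["uptime"] - first["uptime"])

print(f"average since startup: {average_rate:.1f} bufwaits/sec")
print(f"between snapshots:     {recent_rate:.1f} bufwaits/sec")
```

Taking the same two numbers from a healthy instance gives a like-for-like comparison that does not depend on how long each instance has been running.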
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:01:55Z  
    How do I check the ratio of poll threads to number of connections?
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:16:02Z  
    I take it you mean the NETTYPE parameter:

    NETTYPE soctcp,2,200,NET
    LISTEN_TIMEOUT 60
    MAX_INCOMPLETE_CONNECTIONS 1024
    FASTPOLL 1

    The NUMCPUVPS is 4

    onstat -u returns: 151 active, 256 total, 169 maximum concurrent. Do you think these parameter values are okay?
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:17:30Z  
    The number of poll threads and the number of connections can be set via the NETTYPE parameter.

    For example:

    NETTYPE onsoctcp,1,200,CPU

    In the above example, 1 means one poll thread, 200 is the maximum number of expected connections, and CPU means that the poll threads will be handled by CPU VP(s). If you have more than one poll thread, it is recommended to set NET instead of CPU.

    Check the following link for a detailed description:
    http://publib.boulder.ibm.com/infocenter/idshelp/v115/topic/com.ibm.perf.doc/ids_prf_105.htm?resultof=%22%6e%65%74%74%79%70%65%22%20%22%6e%65%74%74%79%70%22%20
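
To make the field layout concrete, here is a small sketch that splits a NETTYPE value into the four fields described above and applies the "1 poll thread per ~250 connections" guideline from earlier in this thread. The names NetType, parse_nettype and poll_threads_needed are made up for illustration; the input value and the peak of 169 concurrent sessions are the ones the original poster reported.

```python
import math
from typing import NamedTuple


class NetType(NamedTuple):
    protocol: str
    poll_threads: int
    connections: int
    vp_class: str


def parse_nettype(value):
    """Split a NETTYPE value such as 'soctcp,2,200,NET' into named fields."""
    protocol, poll_threads, connections, vp_class = value.split(",")
    return NetType(protocol, int(poll_threads), int(connections), vp_class)


# Guideline quoted in this thread, not a hard limit.
CONNECTIONS_PER_POLL_THREAD = 250


def poll_threads_needed(peak_connections):
    """Poll threads suggested by the ~250 connections per thread guideline."""
    return max(1, math.ceil(peak_connections / CONNECTIONS_PER_POLL_THREAD))


nettype = parse_nettype("soctcp,2,200,NET")
needed = poll_threads_needed(169)  # 169 = maximum concurrent from onstat -u

print(nettype)
print(f"configured poll threads: {nettype.poll_threads}, suggested: {needed}")
```

With 2 poll threads configured and 1 suggested for a peak of 169 sessions, the poster's setting has headroom under this guideline.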
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:19:00Z  
    Ah, just noticed your new post.

    Yep, these settings look valid.
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:31:50Z  
    Good - that's something set right then!

    Back to the bufwaits, below are our BUFFERPOOL params:

    BUFFERPOOL default,buffers=10000,lrus=8,lru_min_dirty=50.000000,lru_max_dirty=60.500000
    BUFFERPOOL size=4K,buffers=132000,lrus=16,lru_min_dirty=0.750000,lru_max_dirty=1.550000

    Would you suggest adding another buffer pool?

    Also, looking at onstat -g cpu, are the entries below normal?

    29 kaio 1cpu* 08/25 13:22:38 184564.2279 443414572 IO Idle
    59 aslogflush 5cpu 08/25 13:22:38 1.9307 358470 sleeping secs: 1
    60 btscanner_0 1cpu 08/25 13:22:36 662.4204 78508340 sleeping secs: 9
    77 kaio 4cpu* 08/25 13:22:38 152400.4080 357141153 IO Idle
    78 kaio 3cpu* 08/25 13:22:38 188091.5425 418102214 IO Idle
    79 kaio 5cpu* 08/25 13:22:38 118105.7210 286945987 IO Idle

    Our instance is using kernel I/O, but should these threads be idle more often than not?
  • SystemAdmin
    SystemAdmin
    228 Posts

    Re: Slow Response Times

    ‏2010-08-25T12:51:41Z  
    'IO Idle' is the normal state for I/O threads; it means there is currently no work for them.

    Regarding the BUFFERPOOL: as I already said, I suspect that 'bufwaits' may be too high, but I can't be sure without real experience with this instance. You mentioned that there are other similar instances which work OK. If there is a big difference between the bufwaits of this instance and a "normal" one, I would suggest increasing the number of buffers (by 10-20k). Be aware that this will affect IDS memory utilization! You also have to bounce the instance for the parameter change to take effect.
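
For a feel of what 10-20k extra buffers cost in memory, the sketch below parses the 4K BUFFERPOOL line posted earlier and multiplies buffer counts by the page size. It counts page data only; the server also keeps some bookkeeping per buffer, so treat the results as a lower bound. The names parse_bufferpool and pool_mib are made up for illustration.

```python
def parse_bufferpool(line):
    """Turn 'BUFFERPOOL size=4K,buffers=132000,...' into a dict of strings."""
    _, _, params = line.partition(" ")
    fields = {}
    for item in params.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip bare tokens such as 'default'
            fields[key] = value
    return fields


pool = parse_bufferpool(
    "BUFFERPOOL size=4K,buffers=132000,lrus=16,"
    "lru_min_dirty=0.750000,lru_max_dirty=1.550000"
)

page_bytes = int(pool["size"].rstrip("Kk")) * 1024  # '4K' -> 4096
buffers = int(pool["buffers"])


def pool_mib(buffer_count):
    """Memory taken by the buffer pages alone, in MiB."""
    return buffer_count * page_bytes / 2**20


print(f"current pool:   {pool_mib(buffers):.1f} MiB")
print(f"+10000 buffers: {pool_mib(10000):.1f} MiB extra")
print(f"+20000 buffers: {pool_mib(20000):.1f} MiB extra")
```

So the suggested increase is on the order of 40 to 80 MiB on top of a pool of roughly 516 MiB, which is worth checking against the memory available on a server that hosts 15 instances.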