SystemAdmin
228 Posts

Checkpoint Blocked during Create Index

2007-09-02T07:35:43Z
When trying to create an index on a table with 150 million records, onstat -g ckp displays:

IBM Informix Dynamic Server Version 11.10.FC1 -- On-Line (CKPT INP) -- Up 01:17:20 -- 272384 Kbytes
Blocked:CKPT

AUTO_CKPTS=On RTO_SERVER_RESTART=Off

                                                                  Critical Sections                         Physical Log  Logical Log
          Clock                              Total  Flush  Block  #      Ckpt  Wait  Long  # Dirty  Dskflu  Total  Avg    Total  Avg
Interval  Time      Trigger    LSN           Time   Time   Time   Waits  Time  Time  Time  Buffers  /Sec    Pages  /Sec   Pages  /Sec
416       11:33:26  CKPTINTVL  7:0x1359e2cc  0.5    0.4    0.0    1      0.0   0.0   0.0   2741     2741    19     0      20     0
417       11:38:36  CKPTINTVL  7:0x1413d018  6.7    6.6    0.0    0      0.0   0.0   0.0   53316    8024    95     0      2975   9
418       11:43:56  CKPTINTVL  7:0x151e7728  12.1   12.1   0.0    0      0.0   0.0   0.0   80903    6688    121    0      4266   13
419       11:49:01  CKPTINTVL  7:0x151e803c  0.4    0.4    0.0    0      0.0   0.0   0.0   4799     4799    0      0      1      0
420       11:54:16  CKPTINTVL  7:0x167693b0  12.8   12.8   0.0    0      0.0   0.0   0.0   74155    5796    176    0      5505   18
421       11:59:16  CKPTINTVL  7:0x1676b304  0.5    0.5    0.0    0      0.0   0.0   0.0   3791     3791    15     0      2      0
422       12:04:33  CKPTINTVL  7:0x17749204  11.9   11.8   0.0    0      0.0   0.0   0.0   98065    8282    9      0      4062   13
423       12:09:45  CKPTINTVL  7:0x1774a03c  0.4    0.4    0.0    0      0.0   0.0   0.0   5403     5403    0      0      1      0
424       12:14:55  CKPTINTVL  7:0x18d11644  9.9    9.9    0.0    0      0.0   0.0   0.0   80966    8214    4      0      5575   18
425       12:20:00  CKPTINTVL  7:0x18d1203c  1.1    1.1    0.0    0      0.0   0.0   0.0   6901     6345    0      0      1      0

Max Plog   Max Llog   Max Dskflush  Avg Dskflush  Avg Dirty  Blocked
pages/sec  pages/sec  Time          pages/sec     pages/sec  Time
200        200        13            6006          131        0

The oninit processes sit at close to 100% CPU. We have left it running for a few hours, but the index creation never returns.
If we set the checkpoint interval to, say, 1 hour, the index is created within 30 minutes. We tried increasing the physical log size to 500 MB (some articles recommend more than 1 GB), but the index creation still fails.

Database logging is turned off until we have the data imported.

The data is in one dbspace, with the index data in another.

Any ideas?

Running on Solaris 10, with cooked chunks on fibre RAID.

Thanks
Updated on 2007-10-10T17:18:10Z by gbowerman
  • SystemAdmin
    228 Posts

    Re: Checkpoint Blocked during Create Index

    2007-09-04T15:42:12Z
    I'd like to understand the problem better, as the description is a bit confusing. The title says that checkpoints are blocked, but the onstat -g ckp output shows checkpoints occurring every 5 minutes or so. The text seems to say that the index create is blocked. Which is it?
    Given that the checkpoints are being triggered by CKPTINTVL, increasing the physical log won't change the checkpoint frequency. Also, onstat -g ckp shows that no transaction blocking is occurring (Block Time is 0.0 for every checkpoint listed). So the problem doesn't appear to be the checkpoints; it appears to be the index create not completing in a timely manner.
    My first suggestion is to run onstat -g stk on the index create threads to see what their stacks are. Given that the system remains 100% busy, they may be stuck in a tight loop. I'd also check onstat -t for the index partitions and see whether the number of pages used continues to increase. An index build uses more pages as it progresses, so you can tell whether it is still making progress by checking that pages keep being added to the index partitions.
    I would not implement this next suggestion yet, as it may mask a problem, but why set CKPTINTVL to 300? I assume this is just the default setting. Why not set RTO_SERVER_RESTART to some value instead? Setting RTO_SERVER_RESTART to, say, 120 and keeping AUTO_CKPTS on will allow the IDS engine to choose the appropriate checkpoint frequency.
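The reasoning in this reply (checkpoints fire roughly every 5 minutes, all triggered by CKPTINTVL, with no blocking) can be checked directly against the rows in the original post. The following is only a minimal Python sketch: the column positions are assumed from the header shown in that post, the helper names are made up for illustration, and it does not handle output that spans midnight.

```python
from datetime import datetime

# Rows copied verbatim from the onstat -g ckp output in the original post.
ROWS = """\
416 11:33:26 CKPTINTVL 7:0x1359e2cc 0.5 0.4 0.0 1 0.0 0.0 0.0 2741 2741 19 0 20 0
417 11:38:36 CKPTINTVL 7:0x1413d018 6.7 6.6 0.0 0 0.0 0.0 0.0 53316 8024 95 0 2975 9
418 11:43:56 CKPTINTVL 7:0x151e7728 12.1 12.1 0.0 0 0.0 0.0 0.0 80903 6688 121 0 4266 13
419 11:49:01 CKPTINTVL 7:0x151e803c 0.4 0.4 0.0 0 0.0 0.0 0.0 4799 4799 0 0 1 0
420 11:54:16 CKPTINTVL 7:0x167693b0 12.8 12.8 0.0 0 0.0 0.0 0.0 74155 5796 176 0 5505 18
421 11:59:16 CKPTINTVL 7:0x1676b304 0.5 0.5 0.0 0 0.0 0.0 0.0 3791 3791 15 0 2 0
422 12:04:33 CKPTINTVL 7:0x17749204 11.9 11.8 0.0 0 0.0 0.0 0.0 98065 8282 9 0 4062 13
423 12:09:45 CKPTINTVL 7:0x1774a03c 0.4 0.4 0.0 0 0.0 0.0 0.0 5403 5403 0 0 1 0
424 12:14:55 CKPTINTVL 7:0x18d11644 9.9 9.9 0.0 0 0.0 0.0 0.0 80966 8214 4 0 5575 18
425 12:20:00 CKPTINTVL 7:0x18d1203c 1.1 1.1 0.0 0 0.0 0.0 0.0 6901 6345 0 0 1 0
"""


def parse_rows(text):
    """Return (interval, clock_time, trigger, total_time, block_time) tuples.

    Assumed column order: Interval, Clock Time, Trigger, LSN, Total Time,
    Flush Time, Block Time, followed by columns this sketch ignores.
    """
    parsed = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # skip header and blank lines
        clock = datetime.strptime(fields[1], "%H:%M:%S")
        parsed.append(
            (int(fields[0]), clock, fields[2], float(fields[4]), float(fields[6]))
        )
    return parsed


def gaps_in_seconds(rows):
    """Seconds between the clock times of consecutive checkpoints."""
    return [int((b[1] - a[1]).total_seconds()) for a, b in zip(rows, rows[1:])]


rows = parse_rows(ROWS)
gaps = gaps_in_seconds(rows)
print("seconds between checkpoints:", gaps)
print("triggers seen:", {r[2] for r in rows})
print("largest block time:", max(r[4] for r in rows))
```

On the posted data every gap falls between 300 and 320 seconds, the only trigger is CKPTINTVL, and the largest Block Time is 0.0, which matches the reading given above.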
  • SystemAdmin
    228 Posts

    Re: Checkpoint Blocked during Create Index

    2007-09-10T09:51:39Z
    It's true that onstat -g ckp shows checkpoints occurring every 5 minutes or so for a reasonable time while the index is being created.

    The problem is that the last checkpoint you see will have been in progress for anything from 5-10 minutes up to 12 hours.

    I.e. the onstat -g ckp command may have been issued much later than 12:20:00.

    We find the Informix server in the state On-Line (CKPT INP), Blocked:CKPT.

    The index creation is continuing, as the indexdbs we use is being populated, but very, very slowly while this checkpoint is in progress.

    Sometimes, if we are lucky, the index creation succeeds in around 30 minutes. Mostly we hit this checkpoint in progress / blocked issue at some point during the index creation, and we are not patient enough to wait for it to complete.

    The only way around the issue so far has been to set a huge CKPTINTVL so that the server never checkpoints during our data load and index build.
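The point made in this reply, that the newest row in onstat -g ckp may be far older than the moment the command was run, can be turned into a simple check. This is a rough Python sketch under stated assumptions: the function names, the `slack` factor and the example observation times are all hypothetical, and the banner string is the one from the original post.

```python
from datetime import datetime, timedelta

# Banner line as shown in the original post.
BANNER = ("IBM Informix Dynamic Server Version 11.10.FC1 -- On-Line (CKPT INP) "
          "-- Up 01:17:20 -- 272384 Kbytes")


def checkpoint_in_progress(banner):
    """True if the onstat banner reports a checkpoint in progress."""
    return "(CKPT INP)" in banner


def seconds_since_last_row(last_clock, observed_at):
    """Seconds between the newest Clock Time row and the time onstat was run."""
    last = datetime.combine(
        observed_at.date(), datetime.strptime(last_clock, "%H:%M:%S").time()
    )
    if last > observed_at:
        last -= timedelta(days=1)  # the row was logged before midnight
    return int((observed_at - last).total_seconds())


def looks_stalled(banner, last_clock, observed_at, ckpt_interval=300, slack=2.0):
    """Flag a checkpoint that is in progress and well overdue.

    ckpt_interval is CKPTINTVL in seconds; slack is an arbitrary multiplier
    chosen for this sketch, not an Informix setting.
    """
    overdue = seconds_since_last_row(last_clock, observed_at) > ckpt_interval * slack
    return checkpoint_in_progress(banner) and overdue


# Hypothetical observation times: one minute and two hours after the last row.
print(looks_stalled(BANNER, "12:20:00", datetime(2007, 9, 2, 12, 21, 0)))
print(looks_stalled(BANNER, "12:20:00", datetime(2007, 9, 2, 14, 20, 0)))
```

With the last row at 12:20:00, an observation one minute later is not flagged, while one taken two hours later is.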
  • SystemAdmin
    228 Posts

    Re: Checkpoint Blocked during Create Index

    2007-09-10T14:56:23Z
    Can you post the onstat -g skt output for main_loop() (the checkpoint thread) and the index create threads? This would help us determine where things are in the process.
  • gbowerman
    5 Posts

    Re: Checkpoint Blocked during Create Index

    2007-09-14T18:04:25Z
    In case you didn't spot the typo: Scott meant, can you run "onstat -g stk all" and supply the output?
    Thanks
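One way to collect that output for posting is to save each run of the command to a timestamped file, so that several snapshots taken a little while apart can be compared. The Python sketch below assumes onstat is on the PATH and the Informix environment is set; the function name, file naming scheme and the injectable `run` and `now` parameters are illustrative choices, not part of any Informix tooling.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# The command quoted in the reply above.
STACK_CMD = ["onstat", "-g", "stk", "all"]


def capture_stacks(out_dir=".", run=subprocess.run, now=datetime.now):
    """Run the stack dump command once and save its stdout to a file.

    run and now are parameters only so the function can be exercised
    without a live Informix instance.
    """
    result = run(STACK_CMD, capture_output=True, text=True, check=False)
    path = Path(out_dir) / f"onstat_stk_{now():%Y%m%d_%H%M%S}.txt"
    path.write_text(result.stdout)
    return path
```

Calling `capture_stacks("/tmp")` a few times, some seconds apart, while the index build is running would leave one file per snapshot. If the index create threads show the same stack in every file, that may support the tight-loop theory raised earlier in the thread.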
  • gbowerman
    5 Posts

    Re: Checkpoint Blocked during Create Index

    2007-10-10T17:18:10Z
    FYI, this problem is resolved in 11.10.xC2. To track it, see APAR IC53856: http://www-1.ibm.com/support/docview.wss?uid=swg1IC53856