Skip to main content

Bringing IBM IDS Internals to the Surface

Mark Scranton (mailto:mscranto@us.ibm.com), Principal Consultant/Trainer , IBM, Software Group
Mark Scranton is a Principal Consultant/Trainer who joined Informix Software in 1995. Mark has focused exclusively on IBM Informix Dynamic Server and Extended Parallel Server engine-related topics since joining Informix. Mark spends a great deal of time in the classroom, teaching clients classes such as "Informix Dynamic Server Internal Architecture," "Informix Dynamic Server Performance Tuning," and the "Informix Dynamic Server Master Series." Mark also contributed to The Informix Handbook, and is a frequent presenter at the International Informix User Conference. Mark was also a nominee for the IIUGY Award through the International User Group. Any comments, questions, or curiosities can be sent to mscranto@us.ibm.com

Summary:  This article will highlight a number of fundamental concepts of the internals of IBM Informix Dynamic Server to help you support and maintain IBM Informix Dynamic Server.

Date:  25 Apr 2002
Level:  Introductory
Activity:  238 views

© 2002 International Business Machines Corporation. All rights reserved.

Introduction

The word "internals" can sometimes make even the most technical people run away in fear. There is a feeling that having knowledge of the internals of the IBM Informix® Dynamic ServerTM (IDS) engine just isn't necessary for troubleshooting, maintenance, and support. I strongly disagree. There are many internals concepts that can help nearly anyone in the day-to-day support and troubleshooting.

In the seven years that I have worked exclusively with the IDS and engine-related areas, I have found that having internals knowledge adds another dimension to my view of the engine and how it works. It also leads me into areas of discovery that I might not have otherwise encountered.

In this article, I will highlight a number of fundamental concepts of the internals of IDS that are helpful for you to know if you are in a position to support and maintain the product. It will focus mostly on areas that we can capture on disk shared memory components. To better appreciate the content of this paper, you should have good experience with the utilities onstat and oncheck, as well as strong experience with IDS.


Reason to know the internals

There are many reasons to know the internals of IDS. Here are just a few:

  • Troubleshooting-understanding concepts like page addresses can help you tremendously in the interpretation of many outputs and diagnostic information.
  • Tech support-when you are talking with our tech support staff, having a good foundation of knowledge can be very beneficial. Being able to understand and use similar terminologies can also be of great value.
  • Problem diagnosis-you can fix problems much faster with basic knowledge of internals concepts and structures.
  • Understanding engine behavior-many times, being able to display the contents of a page at the lowest level will explain a curious or unknown behavior of the engine.

I will use examples to illustrate these reasons. . It will also add more examples and useful application of the IDS internals.


Terminology

The following terms are used in this article:

  • IBM Informix Dynamic Server (IDS)
    Refers to the IBM Informix Dynamic Server product, version 7.x or 9.2x. Sometimes called IDS for short.
  • Engine
    A general term for the IBM IDS product.
  • oncheck
    A utility for looking a disk structures within the IBM IDS engine, as well as repairing some structures.
  • onstat
    A utility for looking at shared memory structures.
  • user
    A session or connection to the IDS engine.
  • tablespace
    In general terms, the same as a table.
  • fragment
    Logical view: part of a table; physical view: a tablespace.
  • partition
    Another name for a table, tablespace or fragment.
  • extent
    Contiguous pages from a chunk allocated to a tablespace.
  • free extent
    Contiguous page from a chunk not yet allocated.

Numeric representation

This section describes some basics of numeric representation and on units of measure.

Hexadecimal versus Decimal

Many outputs from onstat and oncheck have a mixture of hexadecimal (hex) and decimal numbers. A page header for example will have both hex and decimal in the content when output from the oncheck utility. This isn't just an inconsistency-there is actually a great rule of thumb to explain the logic behind it:

  • Hexadecimal output: Any output to do with a page-a physical address, a logical page number for example-will most often be in hexadecimal. Although these numbers are actually stored in decimal, you should prepare yourself for interpretation in hex most of the time.

    Another critical point to remember is that oncheck and onstat don't display leading zeroes. This can be very confusing in some cases. It's imperative to commit to memory the formats presented in this section. They will be invaluable many times over.
  • Decimal output: Any numbers that are "counters," or which keep track of each time a specific event increments or decrements, will typically be in decimal. The free bytes on a page in the page header, for example, will be a decimal number.

Pages, Kbytes, bytes and 8-KB blocks

Another area of confusion when looking at outputs is the unit of measure. You could argue that there are inconsistencies here, but again, there are some rules of thumb that will help a great deal.

  • When you specify a number to the engine, it's typically in kilobytes (KB). When you configure chunk sizes, memory allocations, etc., it's almost always in KB. An exception here is when you configure BUFFERS; in that case it's in 2-KB or 4-KB pages.
  • When interpreting output from the engine, it's typically in pages-2 KB or 4 KB. Under the covers, many structures are organized in pages, so it makes sense. An exception to the above rule are memory-related outputs. This type of output can be in bytes, KB, or even 8-KB blocks, as in the output from the utility onstat -g seg.

Physical pages and logical pages

The physical pPage

The physical page address is a very valuable concept that I use over and over when troubleshooting and diagnosing a problem. You will see more references to physical pages than logical pages (discussed later), although the reference to them typically doesn't identify which type of page is being referenced.

Format: 4 bytes, 0xCCCPPPPP, where

  • CCC
    This is the 1S byte-chunk number, which starts with 1
  • PPPPP
    This is the 2S byte-page offset within the chunk, which starts with 0.

Remember: A physical page is always located with respect to a chunk.

The logical pPage

The logical page concept is very important as well for a number of reasons. Very seldom do you see references to it in output, but when you are troubleshooting and diagnosing, the knowledge of what is a logical page can be very helpful.

There is really no format to be concerned with as in the case of the physical page. A logical page is always with respect to a tablespace. Just imagine a tablespace and its extents. If you gathered the extents together-one or more-and numbered the pages 1 to x, that would be the set of logical pages for that tablespace. It doesn't matter where the pages are physically located in a a chunk or chunks.

Remember: A logical page is always located with respect to a tablespace.

Figure 1. Relationship of logical and physical pages, with one chunk and one tablespace extent


Figure 1

Using the physical page address

A great example of using the physical page address is in the case of an assertion failure due to a "bad page." An assertion failure is a condition in the engine that should not be true, or doesn't make sense based on other conditions. The assertion failure file is usually found in /tmp/af.<unique number>, although you can change this via the onconfig file.

For example, the engine generates an assertion failure if it detects page corruption; that is, a value on the page is wrong or doesn't make sense. The engine typically comes down (is taken offline), and a message is written to the message log showing that a "bad page" has been found, and the message includes the physicall address of the bad page. . The engine produces an assertion failure file, usually in /tmp.

If you can interpret the bad page physical page address, you can drill down and determine what chunk has the error, and even what table has a problem page. The steps for identifying which table are as follows:

  • Convert the requested page address to a chunk number and page offset in decimal.
    Code Example 1
  • Using the output from oncheck -pe, find the extend list for the corresponding chunk.
  • Interpolate to find which extend contains the page offset.
    "code output here"

Usage: Where are my logs?

The physical page address is a great way to quickly viewing where the logical logs are located. More specifically, it indicates in which dbspace a log resides. I've had clients wonder if they have moved their logical logs to another dbspace. You can use onmonitor, although in some releases it's not supported. You can view the reserved pages with oncheck -pe, although if you have many logical logs, it can be cumbersome to wade through all the information. The best way is to simply do an onstat -l, and interpret the physical page address of each logical log.

Take a peek at the following output. Can you tell which dbspace the logical logs are located in? Yes, you can.

Figure 2: Page address of logical logs

Physical Logging 
Buffer bufused  bufsize  numpages numwrits pages/io 
  P-2  0        64       1639     141      11.62 
      phybegin physize  phypos   phyused  %used 
      100107   500      271      0        0.00 
 
Logical Logging 
Buffer bufused  bufsize  numrecs  numpages numwrits recs/pages pages/io 
  L-1  0        64       113765   15746    11169    7.2        1.4 
        Subsystem    numrecs  Log Space used 
        OLDRSAM      113765   11346056 
 
address  number   flags    uniqid   begin        size     used    %used 
a047e50  1        U---C-L  2683     1002fb        750      269    35.87 
a047e6c  2        U-B----  2678     1005e9        750      750   100.00 
a047e88  3        U-B----  2679     1008d7        750      750   100.00 
a047ea4  4        U-B----  2680     100bc5        750      750   100.00 
a047ec0  5        U-B----  2681     100eb3        750      750   100.00 
a047edc  6        U-B----  2682     1011a1        750      750   100.00

Check out the "begin" column in the lower output. This is the physical page address of the beginning of each logical log. Recall that the physical page address format is 4 bytes of the format:

0xCCCPPPPP, where

  • CCC
    This is the 1S byte-chunk number, which starts with 1
  • PPPPP
    This is the 2S byte-page offset within the chunk, which starts with 0.

Therefore, looking at log number 1, the physical page address is 1002fb, or 0x001002fb for the full address. This is a hex number and doesn't display leading zeroes (typical of oncheck or onstat). If you parse this address, you'll see quickly that the 001 (shown as 1) is the dbspace number, and the 002fb is the page offset within the chunk. You don't care about that at this point, just that this logical log falls into dbspace number 1. In all cases, this would be the root dbspace. If the number were something other than 1, a little further interrogation of the dbspaces (I suggest onstat -d) would tell you where the logical logs are located. Note this also works for the physical log file as well. Look back at the output, and towards the top, note the phybegin column. Its address is 100107, or 0x00100107 for the full address. The physical log therefore is also located in dbspace number 1, or the root dbspace.


The tablespace tablespace

If you hang around IDS long enough, you'll hear mention of the tablespace tablespace. It's another fundamental concept that is important to know. The tablespace tablespace keeps track of or describes all the tables and fragments within that specific dbspace. Therefore, each dbspace has one tablespace tablespace.

If you are looking at output from oncheck -pe, you may see what appears to be more than one tablespace tablespace for one dbspace. Not true-these are just multiple extents that have been allocated for that tablespace tablespace as it has grown.

The partition page

Each page in a tablespace tablespace (excluding bitmaps) is typically referred to as a partition page. This comes from the general use of the word "partition" to mean a tablespace or fragment. Typically, each tablespace has one partition page. Thus, this page describes a tablespace.

Understanding the content of a partition page is a very fundamental concept that can be quite illuminating the more you use it. First, let's define a slot. An easy approach is to say a slot equals a row. More accurately, though, a slot is simply a container. There is no better example of this than the partition page. As you'll see below, each of the five slots (and only five slots in a partition page) contains information that describes important pieces of a tablespace. Here is what each partition page for a table contains:

Slot 1: The partition structure. This slot contains what is called "the partition structure," which contains a great deal of information about the tablespace. This includes the number of rows, current SERIAL value, number of indexes, and pages allocated versus pages used.

For version 7.x this slot is 56 bytes, fixed, and for version 9.2, it's 92 bytes, fixed.

Slot 2: General table information. This slot contains information such as table name, owner, GLS information, and database.

The length on this slot is fixed, but varies between some releases.

Slot 3: Special columns. This slot contains information on varchars or blobs, if any.

The length of this slot will also vary. If your table doesn't have any varchars or blobs, the length should be 0.

Slot 4: Indexes. This slot contains information on attached indexes, if any.

The length of this slot could also be 0 if your tablespace doesn't have any indexes. Recall that with a detached index, the partition page for the data fragment of a tablespace will never have any index information associated with it, therefore this slot length will always be 0.

Slot 5: Extent list. This slot contains the extent list. It includes the logical page and physical page allocation for each extent. It also includes a final on-deck , or cap entry. Each extent allocated, including the on-deck or cap entry, is 8 bytes long.

Usage: Reading a partition page

One way to read from the partition page is with an oncheck -pt <database>:<table>. This will show Slot 1 and Slot 5 contents. The oncheck -pT <database>: <table> shows a great deal more information including index details.

Another way to view this information is to query sysmaster:sysptnhdr, or sysmaster:systabinfo, based on partnum.

The information in the partition page is not dependent on the running of UPDATE STATISTICS. So, you can find items such as row count at any time. As always though, use of oncheck should be limited due to resources it can consume and the potential for locking on the structures.

Here's an example of an oncheck -pt output, and Slot 5, or the extent list for a tablespace:

Figure 3. Output from oncheck -pt and Slot 5


Figure 3

This example shows where the table resides physically, or on which chunks. For the two extents shown here, one is on chunk 4, and the other is on chunk 5. The size column represents how large that specific extent is, in unit of pages. So a quick scan of the partition page for a table can tell you how many extents a tablespace has been allocated (not from this output, but the first part of the oncheck -pt command), and what physical location, or chunk, each extent resides on.

For fragmented tables or for tables with detached or fragmented indexes, you must be careful. Each fragment, data (called table fragment), or index (called index fragment) has a partition page associated with it. For example, consider a table "skippy" in database "sparky" as an example. The table is fragmented across three dbspaces, so there would be three data fragments. The indexes are fragmented the same way as the table and, thus would reside in the same dbspace as the data, but in different extents. Therefore, there would be three fragments for the index. An oncheck -pt sparky:skippy command would have six parts-one for each fragment-three for the data fragments, and three for the index fragments.


PARTNUMs

PARTNUM is a term that you see all over IBM Informix Dynamic Server outputs from utilities like onstat and oncheck. Other terms you'll also see for PARTNUM are tblsnum, partn, and even fragid. Here are some fundamentals:

  • Each tablespace has one unique partnum. This includes fragmented and non-fragmented tables. A fragmented table is actually comprised of many fragments, or tablespaces. Each fragment has a unique partnum. To the user, it looks like one logical table. But to the engine, each fragment is a table, or tablespace.
  • A PARTNUM does not point to the table itself, but a page that points to, or describes, the table or fragment. We have called it a double indirect pointer at times. The page that a partnum points to is the partition page for a table, as described in **add pointer to correct heading here**.

The format of PARTNUM is 4 bytes,

0xDDDLLLLL, where

  • DDD
    This is the 1S byte-dbspace number, which starts with 1.
  • LLLLL
    This is the 2S byte-logical page number within the tablespace tablespace for that dbspace.

Usage: The route to data

The following question often comes up: "How does the engine get to the data?" Understanding the route it takes to get there highlights a couple of important engine fundamentals. Here are the steps the engine goes through to get to your data:

user action engine action
DATABASE skippy;1. The engine looks through the database tablespace in rootdbs for database "skippy".
2. If found, get partnum for the systables system catalog for database skippy.
3. Go to the dbspace with the systables tables based on the partnum.
4. Go to the partition page for systables, based on the partnum.
5. Read table info from the partition page, including the extent list. This is just the list of tables in this database, including the system catalogs.
6. Using the extent list for systables, gather the pages from the extent list into cache.
SELECT * from foo;7. Read through the systables pages to find table foo.
8. Use the partnum for foo to locate its partition page.
9. Go to the dbspace with the partition page for foo.
10. Read the partition page for foo, and use the extent list to go to the physical pages.
11. Read as many pages as needed for the query.

Usage: Finding a table by dbspace

Let's bring the tablespace tablespace and the PARTNUM together. Given an example PARTNUM of 0x300021, the table that has this PARTNUM resides in dbspace number 3, and page 0x21 in the tablespace tablespace for dbspace number 3 describes the table. Knowing what the logical page number is that describes this tablespace doesn't help too much. But since there isn't an onstat showing tables by dbspace, having the knowledge of what the partnum really means is handy.


Maximum extents for a table

Tthis is a more advanced topic that relies on the previous three topics for understanding. Recall that each tablespace typically has one partition page. In this partition page, we allocate five slots to track info about the tablespace. Remember that Slot 5 was reserved for the extent list for a tablespace. Here are the slot sizes:

Slot # Description Size
Slot 1 Partition structure56 bytes (7.x), 92 bytes (9.2)
Slot 2 General table information varying size depending on port.
Slot 3 Special columns like varchars and blobs0-x bytes, depending on varchar or blob presence.
Slot 4 Indexes0-x bytes, depending on index presence.
Slot 5 Extent list16-x bytes, depending on extents allocated. (The 16 bytes represents the on-deck/cap entry, and the initial extent allocation. Each of these are 8 bytes).

Fundamentally, a tablespace can have as many extents as can fit in Slot 5. The size of the other slots, and the page size determines the room leftin Slot 5-either 2KB or 4KB. Does that mean a 4-KB port can have more extents per tablespace? Yes. For both ports, page overhead is 28 bytes. Therefore, a 2-KB port has 2020 bytes available and a 4-KB port has 4068 bytes available on each partition page.

Keep in mind that this also means that a tablespace can run out of extents even if there is a large amount of free chunk space in the dbspace.

Usage: A test case

Here is how I proved , that a tablespace can run out of extents due to the room left on the partition page for that tablespace. Here was the setup:

  • I created two tables, named "this" and "that", in a dbspace with many 1-megabyte (MB) chunks.
  • I projected the number of extents available as follows:

    Each new extent takes 8 bytes:
    • 4 bytes for the logical page address.
    • 4 bytes for the physical page address.

      The engine adds a cap/on-deck entry to cap the extent list. It also takes 8 bytes.So, the projection is as follows:

Figure 4: Extent projection for Slot 5

BYTES
Page Size 2048
Header -24
Pg Trailing Timestamp -4
5 slot table entries -20
slot 1: Table Structure -92
slot 2: Gen table Info -32
slot 3: Varchars/Blobs 0
slot 4: Indexes 0
1 Cap Entry for end of extent list -8
Room Left for Slot 5: extent list 1868
(Slot 5 room)/8 bytes=233.50

This shows that the maximum number of extents for the table should be 233.

  • Then I inserted rows alternately to each table to fill the first extent, plus one more row to force a new extent allocation. Eventually, these errors were returned to the application:
 
	271: Could not insert new row into the table. 
 
	136: ISAM error: no more extents

I used oncheck -pt to check the number of extents allocated for each table.

Figure 5: Extent report


Figure 5

This proves that a tablespace can run out of extents due to running out of room on the partition page for that tablespace. Some things to remember:

  • With IDS 9.2, indexes are detached by default, so this possibility is minimized, because the index and data fragment/tablespaces each have their own extents.
  • If a table has many attached indexes, varchars, or blobs, it increases the chance of running out of extents.
  • The best way to avoid running out of extents is to preallocate appropriate extent sizes for your tables or indexes.
  • A 2-KB partition page can hold fewer extents in the Slot 5 extent list than a 4-KB partition page, all things being equal.

Physical log overflow

A topic that seems to be murky to some people is the concept of physical log overflow. I base this conclusion on how many times in class I pose the question "What happens when the engine has wrapped around to the beginning of the physical log, and needs more room to keep writing?" Most students say the engine hangs. That is true in the case of the logical logs, but not the physical log.

Just a basic concept first: the physical log file is used predominantly for fast recovery. It receives the first before-image of pages modified in the buffer cache. That includes user tables, internal engine tables, whatever. This first before-image is needed to start the fast recovery process, because it is the last known point of physical consistency. So, as pages are being modified in the buffer cache (insert, update, delete), the first image is copied to the physical log file. (For simplicity, I've left out the use of the physical log buffers). In a busy environment, many pages can be sent to the physical log file over a short period of time.

Also, remember that when the physical log file becomes 75 percent full, a checkpoint request is raised. This simply means that we have requested a checkpoint when it can be done. From a very basic point of view, this means that when all threads are out of critical sections , we can start the checkpoint. (One could argue, and rightfully so, that the checkpoint started when the flag was raised, but it isn't very important in this context.)

Imagine the following:

  1. Pages are being copied to the physical log file as buffer cache pages are being modified.
  2. The physical log file is equal to or past 75% full. A checkpoint request is raised.
  3. The engine continues to copy pages to the physical log file, beyond the 75 percent mark.
  4. The engine approaches the beginning of the physical log file
  5. The engine keeps writing and, therefore, crosses over the current beginning of the physical log file
  6. The new pages written at page 0, page 1, etc., have overwritten the old pages.
  7. These old pages were needed for fast recovery, but they are literally gone-there is no way to recover them.
  8. Fast recovery will be impossible.

This scenario describes physical log overflow.

This could pose a problem. There are two scenarios that can occur:

  • All threads exit their critical sections. They were blocking the checkpoint.
    1. The checkpoint "starts," and it completes successfully.
    2. At the end of the checkpoint, the physical log file is "flushed." Technically, the engine simply gives the physical log file a new beginning, or a new physical beginning address, via the reserved pages. The engine no longer needs the original before-image pages for recovery, because the checkpoint has synced the changed pages in memory down to their originating chunks.
      In this scenario, all is well. The physical log overflow hasn't caused any harm. It's still not a desired situation, it's just that nothing tragic occurred during the overflow.

Here is a bad scenario:

  • The engine dies after the overflow has occurred, but before the checkpoint has completed.
    1. An attempt to restart the engine will hang. The pages at the beginning of the physical log have no meaning to the fast recovery that is being attempted. They are supposed to be pages that have been changed since the last full checkpoint, but are actually newer pages that have overwritten the other pages that were there.
    2. Any subsequent attempts to start the engine (without tech support intervention) will leave the engine stuck in fast recovery.

To avoid physical log overflow, size the physical log file appropriately. Watch the message log for indications of physical log overflow. Again, if it occurs, and the checkpoint completes, all is well. But if not, you'll most likely call tech support.


Conclusion

I realize that I have touched on a very small number of topics with regard to IBM Informix Dynamic Server internals. I have found "digging in" to the internals to be fascinating, because it brings a great deal of validation about the engine and how it works. An unexpected benefit of the internals knowledge has been the ability to troubleshoot faster when I can interpret page addresses and partnums "on the fly." I will admit, though, sometimes when troubleshooting I lean towards using internals knowledge to troubleshoot, and eventually realize that there is an onstat that would provide the same information much quicker. So, choose your use of the internals carefully. But I think you'll find that as you use the topics in this article you, too, will find it fascinating and informative.


About the author

Mark Scranton is a Principal Consultant/Trainer who joined Informix Software in 1995. Mark has focused exclusively on IBM Informix Dynamic Server and Extended Parallel Server engine-related topics since joining Informix. Mark spends a great deal of time in the classroom, teaching clients classes such as "Informix Dynamic Server Internal Architecture," "Informix Dynamic Server Performance Tuning," and the "Informix Dynamic Server Master Series." Mark also contributed to The Informix Handbook, and is a frequent presenter at the International Informix User Conference. Mark was also a nominee for the IIUGY Award through the International User Group. Any comments, questions, or curiosities can be sent to mscranto@us.ibm.com

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13876
ArticleTitle=Bringing IBM IDS Internals to the Surface
publish-date=04252002
author1-email=mailto:mscranto@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers