System call interception using a standard public IBM AIX interface

IBM AIX® provides a standard public interface named kmod_util to support system call interception since AIX 6.1 TL06 and AIX 7.1. In this tutorial, we provide step-by-step instructions explaining how kmod_util can be used to intercept system calls under various situations.

Jian Jun Wu (bjwjianj@cn.ibm.com), Advisory Software Engineer, IBM China

Jian Jun Wu is a software engineer who has worked at IBM for the past five years, focusing on IBM AIX development support. He holds a PhD in Computer Science from Zhejiang University in China.



26 September 2013


Introduction

System call interception is widely used for purposes such as security and debugging. Since AIX 6.1 TL06 and AIX 7.1, AIX has provided a standard public interface, named kmod_util, to support system call interception. This tutorial explains how to use the kmod_util interface to intercept system calls in different situations.


Understanding the running of a system call instruction

A system call is a routine that allows a user application to request actions that require special privileges. System calls are made by user code running a system call instruction, which causes the system call handler to be entered.

All system calls for a process resolve to dummy function descriptors that cause svc_instr to be called. When a user-level program is linked, the binder inserts glue code (glink.s) for each unresolved external call. When the program is loaded, the loader resolves each such call that is a system call by setting an entry in the user's TOC to point to an entry in the user-space portion of the system call (svc) table. This svc table entry contains a dummy function descriptor that has svc_instr as its IAR and the svc number as its TOC. When the user program makes the system call, the code branches to the glue code, which loads the address of the descriptor (the svc table entry), gets the IAR (svc_instr) and the TOC value (svc number), loads the svc number into register 2, and branches to svc_instr. We must obtain the svc number of a system call before we can intercept it.

Base system call

To better understand the running of a system call, we write a sample program testchmod (see sc_intercept.zip in the Downloads section) calling the base system call, chmod.

Listing 1. The sample program: testchmod.c
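#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    /* Print the function pointer of chmod as a 64-bit hexadecimal value. */
    printf("0x%llx\n", (unsigned long long)chmod);
    chmod("./testchmod.c", S_IRUSR|S_IWUSR|S_IXUSR);
    return 0;
}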

The sample program displays the function pointer of chmod as the output:

# xlc -q64 -o testchmod testchmod.c
# ./testchmod
0x90000000048f3e0

Now, let us use dbx to check what happens when chmod is called.

First, we can get the glue code as shown in Figure 1.

Figure 1. Displaying the glue code

Then, let us check what value is stored in the registers: r0, r2, and r12 as shown in Figure 2.

Figure 2. Getting the svc number

It is clear that the function pointer chmod (0x090000000048f3e0) points to the dummy function descriptor. The first double-word of the dummy function descriptor contains the address of the entry point of the system call (0x3700), and the second double-word contains the svc number of chmod (0x13e).

We can also use kdb to check the content of 0x90000000048f3e0 as shown in Figure 3.

Figure 3. Displaying the svc number

The instruction stored at the entry point (0x3700) is the system call instruction, sc.

System call added by kernel extension

Adding system calls is one of several ways to extend the functions provided by the kernel. For demonstration purposes, we write a kernel extension that adds a system call named demo_syscall (see demo_syscall.zip in the Downloads section) to the kernel. The process calling demo_syscall will be put into sleep until the kernel extension is terminated.

Listing 2. The sample kernel extension: demo_syscall.c
/* scwait is the event word shared by both routines;
   its declaration is not shown in this excerpt. */
int demo_syscall_init(int cmd, struct uio *uio)
{
    if (cmd == CFG_INIT) {
        scwait = EVENT_NULL;     /* no thread is waiting yet */
    } else if (cmd == CFG_TERM) {
        e_wakeup(&scwait);       /* wake the sleeping callers */
    } else {
        return -1;
    }
    return 0;
}

int demo_syscall(int arg)
{
    int rc;

    bsdlog(LOG_DEBUG | LOG_KERN, "demo_syscall %x\n", arg);
    rc = e_sleep_thread(&scwait, NULL, INTERRUPTIBLE);
    return rc;
}

We also write a sample program calling demo_syscall.

Listing 3. The sample program: invoke_syscall.c
int main()
{
    if (demo_syscall(99) < 0) {
        perror("demo_syscall error");
        exit(1);
    }
    printf("demo_syscall 0x%llx\n", (unsigned long long)demo_syscall);
    return 0;
}

The sample program displays the function pointer of demo_syscall as its output:

nimtb157:/home/ext>./invoke_syscall
demo_syscall 0x90000000086fe90

Using dbx, we can get the glue code and check the dummy function descriptor as shown in Figure 4.

Figure 4. Display the glue code and svc number

We can see that a system call added by a kernel extension runs in exactly the same way as a base system call. The first double-word of the dummy function descriptor contains the address of the entry point of the system call (0x3700), and the second double-word contains the svc number of demo_syscall (0x3e9).


kmod_util kernel service

Starting with AIX 6.1 TL06 and AIX 7.1, the kmod_util kernel service allows system calls to be intercepted. Interception is implemented so that all calls to the system call are intercepted, even those made by existing processes. kmod_util is declared in <sys/sysconfig.h>:

int kmod_util(
    int flags,  /* cmd : KU_INTERCEPT, KU_INTERCEPT_STOP, KU_INTERCEPT_CANCEL */
    void *buffer,  /* cmd buffer */
    int length     /* buffer length */
);

The pre-sc and post-sc functions

Routines called pre-sc functions are specified to be called before the intercepted system call. Routines called post-sc functions are specified to be called after the intercepted system call. In addition, a pre-sc function is allowed to abort the system call, providing its own return value and preventing subsequent pre-sc functions and the system call itself from being called. Similarly, each post-sc function can examine and alter the return value. If a system call does not return to the caller (for example, thread_terminate), post-sc functions are not called.

For each intercepted system call, either a pre-sc function or a post-sc function, or both, must be specified. If the pre-sc function and the post-sc function are registered for the same system call in the same kmod_util invocation, they are considered to be paired. All pre-sc and post-sc functions specified in a kmod_util call must be defined in the same kernel extension as that of the caller of the kmod_util kernel service. Other kernel extensions, however, can intercept the same system calls. The most recently registered pre-sc function is called first, and its paired post-sc function is called last.

The prototype of a pre-sc function is:

int pre_sc(uintptr_t *rc, void *parms, uintptr_t cookie, void *buffer);

Here, parms is a pointer to the parameters of the system call, cookie is an opaque value specified by the caller of kmod_util, and buffer is a 128-byte scratch buffer for use by the pre-sc function and its paired post-sc function.

If a pre-sc function returns a non-zero value, all subsequent pre-sc functions and the system call itself are skipped. The rc parameter is the address where an alternate return value can be stored. For pre-sc functions that were already called, their paired post-sc functions are still called.

For example, suppose kernel extension A requests that system call foo() be intercepted, providing pre-sc and post-sc functions A-pre and A-post. Then, kernel extensions B and C request that system call foo() be intercepted, providing pre-sc and post-sc functions B-pre, C-pre, B-post, and C-post.

When the system call foo() is invoked, the following sequence of calls is made, assuming that all pre-sc functions return 0:

  • C-pre
  • B-pre
  • A-pre
  • foo
  • A-post
  • B-post
  • C-post

Now suppose that B-pre returns a non-zero value. The sequence of calls in this case is:

  • C-pre
  • B-pre
  • C-post

The prototype of a post-sc function is:

void post_sc(uintptr_t *rc, void *parms, uintptr_t cookie, void *buffer);

The parameters of the post-sc function are the same as those of the pre-sc function. In particular, the buffer parameter is the same buffer that was passed to the paired pre-sc function. The return value can be modified by a post-sc function.

Intercepting system call

Calls to kmod_util() with the KU_INTERCEPT flag can initiate system call interception.

Stopping system call interception

Calls to kmod_util() with the KU_INTERCEPT_STOP flag suspend the interception of the specified system calls. If a pre-sc function has already been called for a specified system call, its paired post-sc function will still be called, but future calls to the system call will not invoke either the pre-sc or the post-sc function. It is not valid to stop interception of a system call that was not originally intercepted by the calling kernel extension.

Cancelling system call interception

System call interception can be cancelled by specifying the KU_INTERCEPT_CANCEL flag. When the interception is cancelled, the post-sc function is not called even if its paired pre-sc function was called. It is not valid to cancel the interception of a system call that was not originally intercepted by the calling kernel extension, but the interception can be cancelled without first stopping the interception.

Buffer layout

For calls to the kmod_util kernel service, the buffer contains a header and an array of elements about system calls to be intercepted. The layout of these structures is defined in <sys/sysconfig.h>:

struct ku_element {
    void            *ku_svc;      /* function descriptor of system call */
    int            (*ku_pre)();   /* pre-sc function */
    void           (*ku_post)();  /* post-sc function */
    void            *ku_cookie;   /* cookie passed to pre-sc and post-sc function */
    int              ku_bsize;    /* intercept_stack size */
    unsigned short   ku_iflags;
    unsigned short   ku_oflags;
};

struct ku_intercept {
    int              ku_version;      /* KU_VERSION */
    int              ku_hdr_len;      /* sizeof(struct ku_intercept) */
    int              ku_element_len;  /* sizeof(struct ku_element) */
    int              ku_num_elements; /* how many system calls to be intercepted */
};

Sample kernel extension

To demonstrate the usage of kmod_util, we write a sample kernel extension that adds a system call named sc_intercept (see sc_intercept.zip in the Downloads section). The sc_intercept system call takes a cmd parameter, which selects the action (intercept, stop or cancel) that the caller wants, and a svc parameter, which specifies the function descriptor of the intercepted system call.

Listing 4. The sample kernel extension: sc_intercept.c
int myprec(uintptr_t *rc, void *parms, uintptr_t cookie, void *buffer)
{
    /* Log the parameter block, the first parameter (as a string), the
       address and value of the second parameter, the cookie, and the buffer. */
    bsdlog(LOG_DEBUG | LOG_KERN,
        "pre_sc: parms %llx, %s, %llx, %llx, %s, buffer %llx\n",
        parms, *(long *)parms, ((char *)parms+8), *(long *)((char *)parms+8),
        cookie, buffer);
    return 0;   /* 0 lets the system call proceed */
}

void mypost(uintptr_t *rc, void *parms, uintptr_t cookie, void *buffer)
{
    bsdlog(LOG_DEBUG | LOG_KERN,
        "post_sc: parms %llx, %s, %llx, %s, buffer %llx\n",
        parms, *(long *)parms, (long *)((char *)parms+8), cookie, buffer);
}

int sc_intercept(unsigned long long svc, int cmd) 
{
    struct ku_intercept *buffer; 
    struct ku_element *kue; 
    int buflen, rc; 
    
    /* valid cmd: KU_INTERCEPT, KU_INTERCEPT_STOP, KU_INTERCEPT_CANCEL */
    if(cmd<1 || cmd>3)
        return -1;
    buflen = sizeof(struct ku_intercept) + sizeof(struct ku_element);
    buffer = xmalloc(buflen, 0, kernel_heap);
    if(buffer==NULL)
        return -1;
    
    buffer->ku_version = KU_VERSION; 
    buffer->ku_hdr_len = sizeof(struct ku_intercept); 
    buffer->ku_element_len = sizeof(struct ku_element); 
    buffer->ku_num_elements = 1; 
    
    kue = (struct ku_element *)((char *)buffer + buffer->ku_hdr_len); 
    kue->ku_svc = (void *)svc; 
    kue->ku_pre = myprec; 
    kue->ku_post = mypost; 
    kue->ku_cookie = "This is cookie!"; 
    kue->ku_bsize = 0; 
    kue->ku_iflags = 0; 
    kue->ku_oflags = 0; 
    
    rc = kmod_util(cmd, buffer, buflen); 
    bsdlog(LOG_DEBUG | LOG_KERN, "rc: %d\n", rc); 
    xmfree(buffer, kernel_heap); 
    return rc; 
}

Get the function descriptor of a system call

To intercept a system call using kmod_util, we must know its function descriptor. This section demonstrates how to get the function descriptor of a system call in different situations.

Kernel storage-protection keys state

Kernel storage-protection keys provide a storage-protection mechanism for the kernel and kernel extensions. The method for getting the function descriptor of a system call depends on whether kernel storage-protection keys are enabled or disabled. We can use skeyctl to check and change the kernel storage-protection keys state, as shown in Figure 5.

Figure 5. Display and change the kernel keys state

Base system call while kernel keys are enabled

We use the same sample program, testchmod, to demonstrate how to get the function descriptor of chmod. Using kdb, we can get the svc table base address. In the section Understanding the running of a system call instruction, we saw that the svc number of chmod is 0x13e. Thus, we can calculate the svc table entry of chmod and get the function descriptor as shown in Figure 6.

Figure 6. Get function descriptor

In this example, the svc table base address is le_svc64 (0x04CB0000) and the svc table entry of chmod is at 0x04CB09F0. However, because kernel keys are enabled, the value stored at 0x04CB09F0 is not the function descriptor itself. Instead, it points to two double-words before the address of the function descriptor. Thus, the function descriptor is 0x2B8F930.

To test the interception, we write a sample program named demo_intercept to call sc_intercept.

Listing 5. The sample program: demo_intercept.c
x = strtoull(argv[optind], 0, 16); 
rc = sc_intercept(x, cmd);

Figure 7. Test system call interception

To view the interception result, we send the output of syslogd to the system console:

nimtb157:/home/ext/sc> more /etc/syslog.conf
*.debug               /dev/console

Figure 8. Test result of interception

Base system call while kernel keys are disabled

When kernel keys are disabled, the function descriptor of a system call is stored directly in its svc table entry, as shown in Figure 9.

Figure 9. Get function descriptor

We use the same method to test the interception, as shown in Figure 10.

Figure 10. Test system call interception

We can view the interception result on the system console, as shown in Figure 11.

Figure 11. Test result of interception

System call added by kernel extension while kernel keys are enabled

We can use the same method as for a base system call to get the function descriptor of a system call added by a kernel extension, as shown in Figure 12. Interestingly, the svc table entry not only contains the function descriptor, but also points to two double-words before the address of the function descriptor.

Figure 12. Get function descriptor

We use the same method to test the interception, as shown in Figure 13.

Figure 13. Test system call interception

We can view the interception result on the system console, as shown in Figure 14.

Figure 14. Test result of interception

System call added by kernel extension while kernel keys are disabled

Similar to the base system call, the function descriptor is stored in the svc table entry, as shown in Figure 15.

Figure 15. Get function descriptor

We use the same method to test the interception, as shown in Figure 16.

Figure 16. Test system call interception

We can view the interception result on the system console, as shown in Figure 17.

Figure 17. Test result of interception

Downloads

Description    Name                Size
Code sample    demo_syscall.zip    2.45 KB
Code sample    sc_intercept.zip    3.47 KB
