Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

IBM Linux Scholar Challenge: Stephen's story

Constructively deconstructing the Linux kernel

Stephen Evanchik (evanchsa@clarkson.edu), Student, Clarkson University
Stephen Evanchik is currently finishing a Bachelor's Degree in Computer Science at Clarkson University. He is the Technical Lead in the newly founded Clarkson Open Source Institute, a student-driven organization that explores open source software solutions and development. You can contact Stephen at evanchsa@clarkson.edu.

Summary:  Stephen Evanchik designed a group resource tracking system for the Linux operating system kernel. Though his entry didn't win in the Challenge, the points he earned contributed to the school's overall win. In this article, he recounts his experience entering the Challenge and describes his project. He also explains just why you might want to go to school way up in the frozen North.

Date:  01 Jun 2002
Level:  Introductory

Activity:  2092 views
Comments:  

Linux and the North Country

Attending college in the North Country is an excellent way to receive a good education, especially at Clarkson University, because of how isolated the region is from the rest of the world. It's nearly two hours to a major metropolitan area (depending on how fast you drive). This situation is conducive both to study and to Linux development. The conditions here are startlingly similar to those that Linus Torvalds described when he began writing Linux: the weather is invariably cold and dreary, because it is winter during most of a student's time here. Wind chills of 20 to 30 degrees below zero make locking yourself in a computer lab for an all-night coding session sound inviting. In fact, it's because of these all-night coding sessions that I've become very partial to Linux development. One specific coding session is what really got me started on Linux related projects.

I was working on a small personal project that used Linux to cluster some machines. I wanted to boot the machines from the network so that I could centralize the kernel images, but I wasn't having much luck. After many daytime attempts, I decided to come back in the evening after dinner, certain that I could figure out the problem in a couple hours. I was a poor judge of time; 3 in the morning came, and I was still unable to get the machines to boot. However, I learned that I do my best work in the wee hours of the morning -- I was about to have a breakthrough. After reading all of the documentation and double-checking my work, I finally realized why the average person doesn't read the directions: they are often wrong. Instead of giving up, I dove headfirst into the kernel source and found a couple of bugs. I can't remember where they were exactly, but it was a simple fix somewhere in the IP autoconfigure source.

After making the kernel conform to the documentation, my problem was solved. Once I had worked this out, I was able to finally concentrate on what I really wanted to do: set up a small cluster. This small bug was probably already fixed in the newest kernel source, but it started me down a long and often sleepless road, playing with the kernel that would begin with the Advanced Operating Systems class and progress to the IBM Linux Scholar Challenge. Oh, and in-between I helped build the Clarkson Open Source Institute (COSI), ate, went to my other classes, and caught a few hours of sleep if I was lucky.


Advanced Operating Systems: Papers, papers, and still more papers

The idea to develop projects for the IBM Linux Scholar Challenge came out of a presentation at the first Association for Computing Machinery/Linux Users' Group meeting on campus. A few influential attendees expressed interest in the contest. One of those was Dr. Jeanna Matthews, who was teaching my Advanced Operating Systems class. She asked if I thought the IBM Linux Scholar Challenge would be a better project for our class to work on than the proposed project, which involved implementing a fair-share scheduler for Linux. I immediately said that it would be better; after all, who would pass up doing something cool when the alternative is a fair-share scheduler? (I apologize to those who think that implementing a fair-share scheduler would be fun, but I'd rather devise my own project.) With no time to spare (the deadline was less than a week away), the class voted to work on Linux Challenge projects. However, that left us all with the same problem: what do we do for projects?

Lack of ideas didn't last long: how could they in an operating systems class that studied past and current OS research papers? Each day in class we discussed operating system topics ranging from the days of UNIX and Multics to the best paper for the Symposium on Operating System Principles 2001. The amount of information we were consuming was staggering, and from these papers grew our IBM Linux Scholar Challenge projects. My original idea was to implement partial file caching for the Andrew File System. Unfortunately, I could never figure out whether the newly open-sourced AFS, OpenAFS, implemented it. Fortunately for me, there was another project that Bryan Clark had come up with.

Bryan had been scanning the Linux kernel source code during an undergraduate operating systems class we were both in during the spring semester. He found that kernel developers had set up a framework for global user resource tracking, but hadn't quite found the time to start it. His project was plain and simple: implement global user resource tracking. The main goal of the project was to replace the getrlimit and setrlimit system calls with a system that was more flexible and manageable. I liked his idea so much that I abandoned the AFS partial file caching project to work with him on resource tracking.


Who needs sleep?

I believe that programming is a state of mind. The programmer must carefully think through the problem at hand to code a solution that is both functional and manageable. Bryan and I decided that it would be best to begin programming at 8 p.m. or later. This is an unspoken rule among computer scientists: no coding during daylight hours. (In fact, the real coding doesn't begin until 2 a.m. -- which is still very early for productive work to be done.) Now, take this and multiply it by seven and you'll have what we experienced for a week.

With all this talk of how difficult coding can be, you'd think that we were a couple of freshmen writing our first "hello world" program. Well, you're wrong. We had it all planned out. Limiting a user or group to a specific amount of resources was trivial, all we had to do was find where that resource was consumed or released and wrap that in a conditional to account for the user's actions. Here's the process accounting:


Linux process accounting

int do_fork(unsigned long clone_flags, unsigned long stack_start,
            struct pt_regs *regs, unsigned long stack_size)
{
        int retval;
        struct task_struct *p;
        .
        .
        /* Cut to relavent portion of the code */
        .
        .
        atomic_inc(&p->user->__count);

        /* User Resource Tracking
         */
        if(p->euid != 0) {

            /* This is a little bit more complicated with resource groups
             * I must first check to see if the group has enough resources 
             * to allow the user
             * to continue. Then I must see if the user can continue 
             * because the user may
             * have been capped or something else might be limiting the user.
             * If the user can't allocate the resources, we back the resource 
             * out of the group
             */
            if(atomic_read(&get_current_resource_group()->processes) >=
               atomic_read(&get_current_resource_group()->max_processes)) {

                goto bad_fork_free;
            }

            atomic_inc(&get_current_resource_group()->processes);

            if(atomic_read(&get_current_user()->processes) >=
               atomic_read(&get_current_user()->max_processes)) {

                /* Back the newly allocated resource out of the group
                 */
                atomic_dec(&get_current_resource_group()->processes);
                goto bad_fork_free;
            }
        }

        .
        .
        .
}
	from kernel/fork.c

In this example most of my work was done already because it is much more efficient to count the number of processes as the user creates them than walk the process tree each time the data is needed. This made my job easy: the only thing I had to add was the per-group conditional around Bryan's per-user conditional. When the process exited, get_current_user()->processes was decremented. One evening I thought of a possible race condition (I don't recall the situation right now) which made us think we had to lock these variables. Other than that, it was simple. Really it was. In fact, it was so simple the whole process should have taken a couple of days.


Resource tracking: group style

As with any project, dividing the workload is key, as is buying the correct caffeinated beverage and sugar source. For Bryan and me, this entailed a short drive to the local grocery store for some Coke and chewy chocolate chip cookies. I highly recommend both for long programming sessions. Since Bryan thought of the idea to do per-user resource tracking, I decided that I would work on per-group resource tracking. The theory behind my group tracking scheme is best summed up in the following hypothetical situation.

Linux startup

Suppose you run a business that has just invested a lot of money in a fairly powerful server. This machine is capable of serving your entire business's needs and you want to maximize its efficacy. To do so, you analyze the computing resources needed for each of your departments. The finance department only needs to store some records and occasionally needs access to do some database work: everyone in the department is put into a group and they collectively receive 15% of the machine's resources. The engineers are the ones that really need the computing power: they are grouped accordingly and given 75% of the machine. The remaining 10% is reserved for when the system administrator logs on to make sure everything is running smoothly.

This is a simplified and somewhat incorrect portrayal of a business situation, but it shows what I wanted to implement. I was also in charge of the user space utility, chres, used to load the user and group resource limits in to the kernel as well as monitor a specific user or group's resource consumption. Without it, we wouldn't have been able to test the kernel thoroughly. That part of the project was tedious,so I'll leave it your imagination (HINT: It looks and acts like chmod and chgrp and all those tools).


When good code goes bad

The first lesson I learned as programmer is that when someone else writes good code use it whenever possible. As the following shows, the kernel already had the data structure and functions necessary to store per-user resource information:


Per-user datatype


/*
 * UID task count cache, to get fast user lookup in "alloc_uid"
 * when changing user ID's (ie setuid() and friends).
 */
#define UIDHASH_BITS            8
#define UIDHASH_SZ              (1 << UIDHASH_BITS)
#define UIDHASH_MASK            (UIDHASH_SZ - 1)
#define __uidhashfn(uid)        (((uid >> UIDHASH_BITS) ^ uid) & UIDHASH_MASK)
#define uidhashentry(uid)       (uidhash_table + __uidhashfn(uid))

struct user_struct root_user = {
        __count:        ATOMIC_INIT(1),
        processes:      ATOMIC_INIT(1),
        files:          ATOMIC_INIT(0),
        sockets:        ATOMIC_INIT(0),
        mem:            ATOMIC_INIT(0),
        max_files:      ATOMIC_INIT(SYSTEM_GROUP_FILE_MAX),
        max_sockets:    ATOMIC_INIT(SYSTEM_GROUP_SOCKET_MAX),
        max_processes:  ATOMIC_INIT(SYSTEM_GROUP_PROCESS_MAX),
        max_mem:        ATOMIC_INIT(SYSTEM_GROUP_MEM_MAX),
};

static inline void uid_hash_insert(struct user_struct *up, struct user_struct **hashent);
static inline void uid_hash_remove(struct user_struct *up);
static inline struct user_struct *uid_hash_find(uid_t uid, struct user_struct **hashent);
void free_uid(struct user_struct *up);

	from kernel/user.c

Naturally I wanted to find a way to reuse as much of the existing code for the data structure and functions as possible, so that I didn't have to rewrite a lot of bug-tested code. The best way to accomplish this is to think of a group as a normal user with an extremely large pool of resources and then assign each user a group identifier that is stored in their user struct, as seen here:


Modifications for resource groups

struct user_struct root_user = {
        __count:        ATOMIC_INIT(1),
        processes:      ATOMIC_INIT(1),
        files:          ATOMIC_INIT(0),
        sockets:        ATOMIC_INIT(0),
        mem:            ATOMIC_INIT(0),
        max_files:      ATOMIC_INIT(SYSTEM_GROUP_FILE_MAX),
        max_sockets:    ATOMIC_INIT(SYSTEM_GROUP_SOCKET_MAX),
        max_processes:  ATOMIC_INIT(SYSTEM_GROUP_PROCESS_MAX),
        max_mem:        ATOMIC_INIT(SYSTEM_GROUP_MEM_MAX),
        group_uid:      0
};

/* System group; any non-assigned users go in to this group
 * including root
 */
struct user_struct system_group = {
        __count:        ATOMIC_INIT(1),
        processes:      ATOMIC_INIT(1),
        files:          ATOMIC_INIT(0),
        sockets:        ATOMIC_INIT(0),
        mem:            ATOMIC_INIT(0),
        max_files:      ATOMIC_INIT(SYSTEM_GROUP_FILE_MAX),
        max_sockets:    ATOMIC_INIT(SYSTEM_GROUP_SOCKET_MAX),
        max_processes:  ATOMIC_INIT(SYSTEM_GROUP_PROCESS_MAX),
        max_mem:        ATOMIC_INIT(SYSTEM_GROUP_MEM_MAX)
};

/* Resource group hash table information
 */
static kmem_cache_t *resource_group_cachep;
static struct user_struct *grouphash_table[UIDHASH_SZ];
static spinlock_t grouphash_lock = SPIN_LOCK_UNLOCKED;

#define grouphashentry(uid)     (grouphash_table + __uidhashfn(uid))

	from kernel/user.c

Then, whenever the user logs into the system the user's information and group association are loaded into the kernel using the same functions with nearly zero modifications. The group_uid in the user_struct is important. Every user is a member of the default system_group until their group assignment is determined. This is an excellent example of a design where we preserved backwards compatibility, provided new functionality, and didn't introduce untested code. By forcing each user into the system_group by default, the system administrator can compile in support for resource groups and implement them at a later time making the system immediately usable. There is only one minor difference in the user and group access functions, as shown below:


The single function modified for resource groups

/* User Resource Tracking - Resource Group search
 * Find the group specified by the uid passed, if
 * the group cannot be found then return the system
 * group.
 */
struct user_struct * find_group_uid(uid_t uid)
{
        struct user_struct **hashent = grouphashentry(uid);
        struct user_struct *up;

        spin_lock(&grouphash_lock);
        up = uid_hash_find(uid, hashent);
        spin_unlock(&grouphash_lock);

        if (!up) {

                /* User Resource Tracking
                 * This should never happen because everyone is a member
                 * of the system group by default
                 */
                BUG();
        }
        return up;
}

This extra function was necessary due to the way the existing uid_hash_find worked. All of the user hash functions are reused without modification which means the only potential problems arise from the functions' use and not their implementation. One final problem remained: the initalization of the group hash table. The best place for this is where the user hash table is initialized, as shown here:


Initialization of the uid cache

static int __init uid_cache_init(void)
{
     .
     .
     /* Cut to relevant portion of the code */
     .
     .
     resource_group_cachep = kmem_cache_create("res_group_cache", sizeof(struct user_struct),
                                    0,
                                    SLAB_HWCACHE_ALIGN, NULL, NULL);
     if(!resource_group_cachep)
             panic("Cannot create resource group taskcount SLAB cache\n");

     uid_hash_insert(&system_group, grouphashentry(0));

     return 0;
}

from kernel/user.c

In reusing the same functions and data structures I was able to concentrate on tracking resource use. This kind of code reuse is something that programmers dream about -- I certainly do when working on a project. The Linux kernel is beginning to evolve into a system that has many pre-built, bug-free functions or macros at your disposal that can be used to add more features without duplicating any code. This advancement should have made my job easier because a lot of code could have been reused. If this were true, the bean counting would have been easy: find the function or macro that everything else calls for allocation and deallocation and add my conditionals there. Remember, I want to insert conditionals, like the process conditional, not rewrite entire blocks of kernel code so that I can instrument it.

Unfortunately the single allocation function didn't exist. In fact, it seemed like the exception to the rule. Most kernel programmers were inclined to rewrite many of the pre-existing, tested functions into the particular module or subsystem they were working on. The best example I can give is when an mm_struct needs to be allocated. This well-tested (but not quite generic enough) function works great:


A function with great potential

struct mm_struct * mm_alloc(void)
{
        struct mm_struct * mm;

        mm = allocate_mm();
        if (mm) {

          memset(mm, 0, sizeof(*mm));
          return mm_init(mm);
        }
        return NULL;
}
	from kernel/fork.c

But you'll see that kernel programmers don't like to use this function much (it's only used twice in 2.4.12 and 2.4.17) because it zeros out the mm_struct before returning it. This is bad because there are times when the newly zeroed memory is going to contain something immediately after the mm_alloc call returns. Kernel programmers can avoid that extra memset by inlining all the steps necessary in allocating an mm_struct with the exception of the line that zeros the memory. In case you are wondering: THIS IS BAD! Imagine tracking a resource that was allocated or deallocated in such a manner, and you'll understand where I am coming from. The best way to handle this is to have two functions or macros or some combination of function and macro, where one calls the other, that is used to allocate an mm_struct. Not only does this make my job easier, but it cuts down on the amount of duplicate code within the kernel. I mention this because I found a couple of instances where the same lines of code found in mm_alloc were interspersed within another function's code. We had been having this sort of trouble all night while trying to figure out where three file handles were being "magically" released.


Why you should never limit root's resources

Contrary to popular belief, you should never limit root's resources when the machine is coming up. Especially to only five processes. I don't know how many times I forgot to exclude user ID 0 from the quotas and had to recompile the kernel. This small conditional probably cost me a few hours each evening. The simple fact is: kernel programming requires you to actually think about what you've just written and decide if it really works, or if you are just forcing it to work in your mind. Since Bryan and I were working into the wee hours of the morning, we probably weren't able to think clearly about our code; perhaps we needed more Chinese food or caffeinated beverages. In any case, we ended up writing a couple of tools once the machine was booted so that we could test the limits in a simulated environment. The userspace tool chres was designed to load the initial quotas into the kernel and to be able to change them at run time. It worked remarkably well, considering it was more of an afterthought than a pillar of our grand scheme of limitations.

When Bryan and I finally submitted our entries to the contest, we had been without sleep for nearly a week. It was a fun experience actually; I got quite a bit of work done besides my entry, which I'll attribute to my need to not fall asleep. The next step was to wait for the winners to be announced; a group of people I wasn't among but Bryan was. It was great to have so many from Clarkson University win, even if one of them wasn't me. Not winning actually helped me become motivated to do more for the Linux movement -- I ended up writing a small decoder for my remote control to my soundcard. I was finally able to use all of my hardware in Linux, which means I haven't been back to the darkside since winter break. I think for my next project, I'll get force feedback working for my mouse, but that will have to wait until I have some free time. The best piece of news I have now is that I'll be in Cambridge, Massachusetts this summer with the Extreme Blue program, which is going to be a lot of fun! It's been a wild ride this year and I'm glad I was in the middle of everything.


Resources

About the author

Stephen Evanchik is currently finishing a Bachelor's Degree in Computer Science at Clarkson University. He is the Technical Lead in the newly founded Clarkson Open Source Institute, a student-driven organization that explores open source software solutions and development. You can contact Stephen at evanchsa@clarkson.edu.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Sample IT projects
ArticleID=10205
ArticleTitle=IBM Linux Scholar Challenge: Stephen's story
publish-date=06012002
author1-email=evanchsa@clarkson.edu
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).