Programming in the North Country
I first used Linux in the fall of 1997, my freshman year. I came to college with the first computer that I ever built. Constructing my first machine was almost like working with the NASA scientists: I needed to build the best box possible, but I could only afford the lowest bidder. I ended up with a Cyrix 686 166MHz processor and 32MB of SDRAM; according to all the benchmarks, this was one of the fastest chipset combos around. Of course, I had no idea what its floating-point performance would affect, so I suffered through a year of all my friends being able to play Quake and other games, while my computer was best for number crunching the Tucows project. It was really fast, but not for graphics. During this depressing time I downloaded a copy of Linux off the Internet because it was supposed to make slow computers seem fast. Instead, it overwrote my MBR and destroyed my Windows partition. I didn't use Linux again for two years.
My junior year I moved off campus and needed a dial-up connection to the Internet. With my five roommates, I wired up a home network with some CAT-5 I had and called the local Linux nerd to help us set something up. Luckily, I still had my Cyrix machine sitting around, and that became our gateway to the Internet. This guy seemed to have no problem setting it up. I watched him and wanted to learn how it was done. I was really interested in Linux now that I needed it (and because it had beaten me once before), so I started learning as much as I could (hurray for the Man Pages!). I didn't want to have to call someone every time there was a problem, and the idea of Linux and open source software was very attractive to me.
Since then I've set up several home networks for friends and family (I'm the local Linux nerd now!). At home, I use an AMD 350 box I bought after my initial PC for centralizing services such as NIS and NFS, and soon AFS will replace NFS. The Cyrix still routes the internal network that my roommates and I use, but I code and compile on a Dual 450 Intel machine that I picked up cheap from a computer show two years ago. Of course now I'm coding on a new IBM ThinkPad T23! Almost too many computers!
The call of the Linux Scholar Challenge
In September 2001, I enrolled in the graduate operating systems class at Clarkson; it was a great class. We read weekly journals on varied topics in operating systems, and we discussed and debated systems like Mach, Amoeba, and Grapevine. There were six students in the class, and, as a part of our final projects, we entered our work in the Challenge.
As you can see from his comment in the code sample below, Linus Torvalds mentions creating a full-fledged user tracking system. I found this comment when I was doing some hacking in the kernel for my first operating systems class, and I remembered it because I thought it would be a neat project. As luck would have it, I got a chance to code up the idea for the graduate operating systems class and enter it in the Challenge.
The project concept itself is really quite simple. As you know, users occupy a certain amount of resources: things like memory, CPU time, file locks, and sockets. Originally the goal of resource tracking was to preserve the system by preventing users from allocating more resources than the system could afford. This meant strictly monitoring and limiting all resources. Now, however, resource tracking has evolved into a much more robust system that audits users as well as limiting them.
At first, diving into the Linux kernel may seem like a daunting task. There are many files and thousands of lines of code that control every aspect of your computer. Despite the lack of detailed comments, there are plenty of other sources of information about the kernel code. If you confine yourself to a small working area, you can quickly begin to understand a good portion of the kernel and how it works.
I started with the sched.h file, which is usually located in the $LINUX_SRC/include/linux/ directory where $LINUX_SRC is the base directory of your Linux source code, usually something like the /usr/src directory. I didn't make any changes to the task_struct structure; it already has everything we need to implement user tracking. It is shown below because the task_struct is very useful in determining resources that are attached to a process.
include <linux/sched.h> /* task_struct */
struct task_struct {
    /* process credentials */
    uid_t uid,euid,suid,fsuid;
    gid_t gid,egid,sgid,fsgid;
    int ngroups;
    gid_t groups[NGROUPS];
    kernel_cap_t cap_effective, cap_inheritable, cap_permitted;
    int keep_capabilities:1;
    struct user_struct *user;
    /* limits */
    struct rlimit rlim[RLIM_NLIMITS];
    unsigned short used_math;
    char comm[16];
};
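The rlim array in the listing holds the per-process limits that the getrlimit() system call reports, so you can inspect the existing process limit from user space before touching the kernel. The following is a minimal sketch, not part of the contest entry: the helper name read_nproc_limit is hypothetical, and it assumes a system that defines RLIMIT_NPROC (Linux and the BSDs do; it is not part of POSIX).

```c
#include <sys/resource.h>

/*
 * Fetch the soft (rlim_cur) and hard (rlim_max) limits on the number of
 * processes for the calling process's user.
 * Returns 0 on success and -1 on failure, exactly as getrlimit() does.
 */
static int read_nproc_limit(struct rlimit *out)
{
    return getrlimit(RLIMIT_NPROC, out);
}
```

After a successful call, out->rlim_cur is the soft limit and out->rlim_max the hard limit; either may be RLIM_INFINITY when no limit is set.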
Here you can see the user_struct, where all the magic happens. Inside the struct originally were the top three variables as well as the hash table information. While I make use of those top three variables in the struct, I left the arrangement the same and just added my new variables to the bottom so that the changes would be obvious.
As you've probably guessed by now, all the max_* variables hold the maximum values for each resource that we track, while the other variables hold the current value used by the user. I used the atomic_t variable type because it was already in use for the original variables and it gives us atomic operations for adding and subtracting values. The atomic_t type is defined elsewhere as a small structure wrapping a counter; the functions that operate on it, such as atomic_inc() and atomic_dec(), use assembly code to ensure that each increment or decrement is atomic.
include <linux/sched.h> /* user_struct */
/*
* Some day this will be a full-fledged user tracking system..
*/
struct user_struct {
    atomic_t __count;   /* reference count */
    atomic_t processes; /* How many processes does this user have? */
    atomic_t files;     /* How many open files does this user have? */

    /* Hash table maintenance information */
    struct user_struct *next, **pprev;
    uid_t uid;

    /* Group uid for resource groups */
    uid_t group_uid;

    /* Extra user tracking information */
    atomic_t sockets;
    atomic_t mem;
    struct rw_semaphore urt_sem;

    /* User tracking limits */
    atomic_t max_processes;
    atomic_t max_files;
    atomic_t max_sockets;
    atomic_t max_mem;
};
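If you want to experiment with the current/maximum counter pairs outside the kernel, C11's <stdatomic.h> offers a close user-space analogue of atomic_t. This is only a sketch under that assumption: struct user_usage and the usage_* helpers are hypothetical names, not kernel APIs, and only the process counters are modeled.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for two of the user_struct fields. */
struct user_usage {
    atomic_int processes;     /* how many processes the user has now */
    atomic_int max_processes; /* the limit for that counter */
};

/* Rough analogue of atomic_inc(): add one as a single indivisible step. */
static void usage_inc(atomic_int *v)
{
    atomic_fetch_add(v, 1);
}

/* Rough analogue of atomic_dec(): subtract one as a single indivisible step. */
static void usage_dec(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
}

/* Rough analogue of atomic_read(): load the current value. */
static int usage_read(atomic_int *v)
{
    return atomic_load(v);
}
```

Because every update is a single atomic operation, two threads adjusting the same counter cannot lose an update, which is the property the kernel counters rely on.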
The power of the "current" process
To simplify retrieving the user_struct for the currently running process, I defined a macro, get_current_user(), that returns the user_struct belonging to the current process and increments its reference count. If you've never seen current before, don't worry; it's a little strange at first, but once you get used to it, it's a great thing to have. current behaves like a global variable (in the kernel sources it is actually a macro) that points to the task_struct of the currently running task. As tasks get switched in and out, so does the task_struct that current points to.
include <linux/sched.h> /* get_current_user() */
#define get_current_user() ({                       \
    struct user_struct *__user = current->user;     \
    atomic_inc(&__user->__count);                   \
    __user; })
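Because the macro increments __count, every use of it takes a reference that has to be dropped again later (in the 2.4 sources the matching release is free_uid() in kernel/user.c). The get/put pattern can be sketched in user space as follows; struct user_ref, struct task, current_task, user_get, and user_put are all hypothetical stand-ins chosen for illustration, not kernel names.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the reference-counted part of user_struct. */
struct user_ref {
    atomic_int count; /* reference count, like __count */
    int uid;
};

/* Hypothetical stand-in for task_struct: only the user pointer. */
struct task {
    struct user_ref *user;
};

/* Plays the role of the kernel's current pointer in this sketch. */
static struct task *current_task;

/* Like get_current_user(): return the user record and take a reference. */
static struct user_ref *user_get(void)
{
    struct user_ref *u = current_task->user;
    atomic_fetch_add(&u->count, 1);
    return u;
}

/*
 * Drop one reference. Returns 1 when this was the last reference,
 * which is the point where the record could be freed, and 0 otherwise.
 */
static int user_put(struct user_ref *u)
{
    return atomic_fetch_sub(&u->count, 1) == 1;
}
```

A caller that uses user_get() twice and never calls user_put() leaves the count two higher than it should be, so the record can never be freed.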
The next two pieces of code illustrate how simple process accounting really is inside the kernel. The do_fork() function is the only function that creates new processes; therefore, this is where you can insert code for checking the counter against your maximum values. If the user is still under the limit, the counter is incremented; otherwise, the code jumps to the bad_fork_free error path. The incrementing is done outside the #ifdef statements only because the incrementing function was already in place before I inserted my limiter. If it's not broke, don't fix it!
"kernel/fork.c" /* do_fork() */
int do_fork(unsigned long clone_flags, unsigned long stack_start,
            struct pt_regs *regs, unsigned long stack_size)
{
    ....
#ifdef CONFIG_RESOURCE_PROCESS
    /* User Resource Tracking */
    if (p->euid != 0) {
        /*
         * If we've gotten here our user is not the root user.
         * Now we want to check on our process usage
         * and see if we're at our limit.
         *
         * Read p->user directly: get_current_user() takes a
         * reference on every call, and nothing here would drop it.
         */
        if (atomic_read(&p->user->processes) >=
            atomic_read(&p->user->max_processes)) {
            goto bad_fork_free;
        }
    }
#endif
    atomic_inc(&p->user->processes);
    ....
Always check who you are limiting. If you do not check for the presence of the root user then you can and often will cause great headaches and computers that will not complete their boot sequence. On my first test run of the system after implementing the process accounting, it took several reboots to figure out why the computer would not boot up completely. Everything seemed to initialize correctly in the boot sequence. The initialization sequence would just hit a certain spot every time, and it would stop responding, with no errors reported, just silence from the kernel. And then I hit the reset switch to start all over again.
Now that you can allow or refuse a new process by checking the current count against the maximum, you also need to decrement the count for every process that is destroyed. Below you can see some of the code for the release_task() function. This static function frees a task_struct from memory when a process exits. I don't do a uid check here, because it doesn't really matter whose record I'm decrementing; there are no control statements here that would affect users. In fact, since the incrementing and decrementing are done for the root user as well as any other user, you can even record account transactions for the root user. This could be useful to determine how much memory the root user is consuming, but a process count is not very useful: most daemons run by root already log process creation and destruction somewhere under the /var/log directory, making our count a redundant waste of resources.
"kernel/exit.c" /* release_task(struct task_struct *) */
static void release_task(struct task_struct * p)
{
    ....
    atomic_dec(&p->user->processes);
    ....
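The two kernel hooks can be modeled together in a few lines of user-space C. This is a simplified sketch, not the contest code: struct user_acct, acct_fork, and acct_exit are hypothetical names, and the root test uses the uid stored in the record rather than the euid of the forking process.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user accounting record. */
struct user_acct {
    int uid;
    atomic_int processes;     /* current process count */
    atomic_int max_processes; /* limit, ignored for root */
};

/*
 * Models the do_fork() hook: refuse the fork when a non-root user is
 * at the limit, otherwise count the new process.
 * Returns false where the kernel code would jump to bad_fork_free.
 */
static bool acct_fork(struct user_acct *u)
{
    if (u->uid != 0 &&
        atomic_load(&u->processes) >= atomic_load(&u->max_processes))
        return false;
    atomic_fetch_add(&u->processes, 1);
    return true;
}

/* Models the release_task() hook: always decrement, no uid check. */
static void acct_exit(struct user_acct *u)
{
    atomic_fetch_sub(&u->processes, 1);
}
```

Note that the check and the increment are two separate steps here, as in the article's listing, so two forks racing at the limit could both pass the check; a stricter version would combine them in a compare-and-swap loop.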
Now you have a simple user process tracking and limiting system in place. All you have to do now is recompile the kernel, install the new image, and reboot the machine. The limit values can be set in the kernel config as standard values for all users, except for the root user. These values can be changed on the fly through a kernel module created to access the values in the user_struct. The module can increase or decrease the values for the user_struct max_* values for a specified user. If the user is using more resources than the newly specified value then he or she will not be able to consume any more of that particular resource. It is still up to the system administrator to reclaim those resources by killing processes or whatever means are necessary. Once the user's resource use drops down under the limit again, normal functionality resumes.
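The behavior when a limit is lowered below current usage can be illustrated with a small user-space sketch. It only models the semantics described above; struct res_limit, limit_set, and limit_allows_more are hypothetical names and do not reflect the interface of the actual kernel module.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pair of counters for one tracked resource. */
struct res_limit {
    atomic_int used; /* what the user holds right now */
    atomic_int max;  /* the adjustable limit */
};

/* Change the limit on the fly. Current usage is deliberately untouched:
 * nothing is reclaimed, the user just cannot take more. */
static void limit_set(struct res_limit *r, int new_max)
{
    atomic_store(&r->max, new_max);
}

/* True while the user may still consume more of the resource. */
static bool limit_allows_more(struct res_limit *r)
{
    return atomic_load(&r->used) < atomic_load(&r->max);
}
```

Lowering max from 8 to 3 while used is 5 leaves used at 5 but makes limit_allows_more() return false until usage falls below 3 again.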
I used a VMware Linux image running under Windows 2000 as my development platform. I don't recommend this configuration to everyone, especially experienced kernel hackers, because running Linux through a virtual machine slows performance. The benefit of using this virtual Linux machine is that it prevents you from destroying the file system or any other part of the host operating system during your kernel hacking. If you think you know what you're doing, I recommend that you simply create a test kernel image that you develop on and retain a failsafe image to boot from in case of any mistakes. It's also a good idea to use a journaling file system, because it's a pain to wait on ext2 to check itself after reboot each time your kernel crashes.
To design a solution for accounting other kinds of resources, first identify the places where the resource is consumed, as shown in the do_fork example. Then identify the places where it is released, as shown in the release_task example above. The Linux documentation that comes with the source code can be very handy for finding the parts that you need, and often a macro definition is used to perform a certain resource allocation over and over again.
A better way to submit your entry
My last piece of advice is off the topic of kernels and more towards my experience submitting my entry.
Like any good programmer, I waited until the last minute to write the required article that I needed to turn in for the Linux Challenge. Since it was due at midnight I didn't think it would be a big deal to wait until the last day to finish the article and send it away with minutes to spare. One thing that I didn't do before I sent the article was check my computer clock settings. Apparently, my computer's clock was 5 minutes ahead of the actual time. When I sent off my article to IBM, I CCed it to the class mailing list where others could see it had a time stamp on it of 12:02. Of course, there is a server time stamp on the e-mail as well, which was probably the only thing that saved my entry from being rejected right away. Phil Allen e-mailed me a minute after I sent my article off to say that its timestamp placed it past the deadline, despite the fact that I still had two minutes to send it. I was already at the bar by that time (this was a Friday at midnight) celebrating, so I didn't get Phil's message until later that night when I couldn't do anything about it. Fortunately my entry was accepted, but in the future I'll check my computer's time stamp before sending contest entries.
- Read about the other Clarkson entries in our series.
- Read the "htmlized" contest entry writeup I sent to IBM.
- View IBM Linux Scholar Challenge winner list.
- See the Advanced Issues in Operating Systems class home page.
- View the Linux Scholar Challenge entry repository.
- Get more information on the Linux kernel.
- Get more Linux articles from the developerWorks Linux zone.
- Are you a student or a teacher? Visit the IBM Student Portal or the IBM Faculty Portal to find information and offers tailored to your needs.
Bryan Clark was a winner in the IBM Linux Scholar Challenge as an undergraduate in December of 2001 and is now a graduate student in the Computer Science program at Clarkson University. Bryan loves the outdoors and open source software, and has been a Linux and Java technology enthusiast for years. He's now interning in the IBM Extreme Blue program in Austin, Texas. Bryan would like to take a second to thank Daniel Robbins for Gentoo Linux, as he's an avid user and evangelist for Gentoo, his favorite (and only) distribution. You can contact Bryan at clarkbw@clarkson.edu.
