Machine learning in network systems

A top-down approach to autonomic computing using adaptive routing and scheduling

Computer systems, including large enterprise systems, are rapidly becoming so complex that maintaining them with human support staffs will be prohibitively expensive and inefficient. But most computer systems today are still built to rely on static configurations and can be installed, configured, and reconfigured only by human experts. Our work's long-term goal is for large-scale, integrated computer systems, consisting of tens to hundreds of machines with varying functions, to be delivered in a default configuration and then incrementally tune themselves to the needs of a particular enterprise based on observed usage patterns. The systems should also be able to adapt to changes in connectivity resulting from system failures or component upgrades.

We view these goals as fundamentally machine learning challenges. For computer systems to optimize their own performance without human assistance, they need to learn from experience. To adjust continually in response to a dynamic environment, they need the adaptability that only machine learning can offer.

This article explores and substantiates a top-down approach to developing autonomic systems. Our emphasis is on optimizing the entire system by developing autonomic components that work well, not just independently, but in concert with other such components. Because doing so in a real, full-fledged enterprise system isn't feasible, this article introduces a high-level simulator designed to facilitate the study of machine learning in enterprise systems. Our simulator captures some of the key complexities that make system-wide autonomic computing a challenge, while abstracting away the low-level details that make it impractical to create fully autonomic systems on real hardware.

In addition to introducing this new tool, we present machine-learning approaches to optimizing routing and scheduling in the networks represented in the simulator. Our results show that adaptive routers and schedulers can do better together than either can individually, validating the notion of a top-down approach to autonomic computing.

Simulating a network

To pursue our research goals, we designed and implemented a simulator capable of modeling interactions among the many different components of a computer system. The system simulates the way a computer network processes user requests from a high-level perspective and can represent many different kinds of networks. For example, Figure 1 depicts a commercial enterprise system in which a set of users use a Web interface to check their mail or query a database.

Figure 1. Example network implemented in the simulator
Simulated network
Simulated network

The ovals in Figure 1 represent users, and rectangles represent machines; the lines between them represent links that allow communication of jobs or other packets. Users forward their requests to a load balancer that selects a Web server to handle the job. The Web server forwards the job to a mail server or database as appropriate. When completed, the job is sent back to the user who created it.

The simulator includes the following components:

  • Nodes: A node is a component of the network that is connected to other nodes via links. There are two primary types of nodes:
    • Users: Special nodes who create jobs and send them to machines for processing. After a job is completed, it is returned to the user, who computes its score.
    • Machines: A node that can complete portions of a job. Each type of machine is defined by the set of steps it knows how to complete. Completing steps uses the machine's CPU time so each machine must have an algorithm for allocating CPU time among the jobs currently in its possession. If a machine cannot complete the next step of a given job, it must look for a neighboring node that can. If several neighbors qualify, the machine must make a routing decision -- that is, it must try to determine which of the contending neighbors to forward the job to so as to optimize the whole system's performance. An intelligent agent can be used to control a machine and make decisions about how to allocate CPU time and route jobs.
  • Links: A link connects two nodes in the network. It's used to transfer jobs and other packets between nodes.
  • Packets: A packet is a unit of information that travels between nodes along links. The most common type of packet is a job, described below, but nodes can create other types of packets to communicate with other nodes about their status.
  • Jobs: A series of steps that need to be completed in a specified order. For example, a user who wants to buy something from a Web site might create a "purchase job." This job might include steps such as accessing the customer database, confirming credit card information, and generating an order confirmation. Completing these steps could require the job to travel among several machines. A system usually has several types of jobs that differ in the list of steps they require for completion.
  • Steps: A step is one component of a job. Each step can only be carried out by a subset of machines in the network. For example, the retrieval of information in response to a database query must happen at a database server.

In the example shown in Figure 1, mail jobs must travel to a Web server, a mail server, and back to a Web server before returning to the user. Database jobs must travel to a Web server, a database, and back to a Web server before returning to the user. The goal of the agents controlling the system is to process these user requests as efficiently as possible.

A simulation proceeds for a specified number of discrete timesteps. At each timestep, machines can allocate their CPU cycles toward the completion of steps on the jobs in their possession, packets can be sent along links, and packets can arrive at new nodes.

The simulator represents a computer network as a graph. This simulator provides a valuable testbed for new approaches to autonomic computing. Its very general design lets it represent a wide variety of computer systems with arbitrary topology. Its high modularity makes it easy to insert intelligent agents to control any aspect of the system's behavior. Most important, the simulator captures many of the real-world problems associated with complex computer systems while retaining the simplicity that makes experimental research feasible.

Load Updates

Each machine periodically sends a special packet called a Load Update to each of its neighbors. A Load Update indicates how many jobs that machine already has in its queue. The contents of such an update can help an intelligent router make better decisions. However, the very presence of the update is also important information: if a machine does not receive any updates from a given neighbor for a certain period of time, it concludes that that neighbor has gone down and will no longer route any jobs to it until it receives another update. The system has a strong incentive to detect quickly when machines go down, because any jobs routed to a down machine receive a stiff penalty. Because Load Updates are communicated using packets, they incur real network traffic overhead in the simulator. As long as they are not too frequent, including them as a routine occurrence is not unrealistic.

Utility functions

The ultimate goal of our efforts is to improve the network's utility to its users. In the real world, that utility isn't necessarily straightforward. It's safe to assume that users always want their jobs completed as quickly as possible, but the value of reducing a job's completion time isn't always the same. Also, each user might have a different importance to the system.

To capture these complexities, the simulator allows different utility functions for each user or job type. Each utility function maps a job's completion time to its utility. The goal of the agents controlling the network is not to minimize average completion time, but to maximize the cumulative utility over all the jobs it is asked to process.

Intelligent routing

Now we'll summarize our approach to developing intelligent routers and schedulers for networks like the one shown in Figure 1. Achieving good performance in such a network using fixed algorithms and hand-coded heuristics is very difficult and prone to inflexibility. Instead, we used reinforcement learning to develop routers and schedulers that are efficient, robust, and adaptable.

In the network simulation described in Simulating a network, if a machine can't complete the next step required by the job, it must search among its neighbors for machines that can complete that step (or that can forward the job to machines that can complete it). If more than one neighbor qualifies, the machine should make the choice that lets the job be completed as quickly as possible.

The router tries to minimize a packet's travel time given only local information about the network. We want the job to travel along a path that lets the appropriate machines complete its steps in sequence and return to its creator in minimal time. This differs from the traditional routing problem, in which the goal is only to get the packet from its source to a specified destination. In our domain, that goal is irrelevant because the source and destination are the same.

We looked at four ways of addressing this modified routing problem:

  • A random router that forwards jobs to a machine selected randomly from the set of contenders
  • Two heuristic methods:
    • A speed-based router whose likelihood of routing to a given neighbor is in direct proportion to that neighbor's speed
    • A load-based router that always routes to the qualifying neighbor with the lowest currently estimated load
  • Q-routing, a method based on reinforcement learning (see Related topics)

The random router doesn't address any of the complications that make routing a nontrivial problem, so we expected it to perform poorly in real-world scenarios. The speed-based router's algorithm ignores both the load a machine's neighbors might be receiving from other machines and the status of any machines the packet might be sent to later, making it a myopic load balancer. The load-based router will perform very well if the capacity of the neighboring machines is a critical factor in the system's performance. But if the system's bottleneck isn't adjacent to the machine making a routing decision, its neighbors will often have no load and this heuristic will perform identically to the random router.

Many other routing heuristics (for example, considering both load and speed when routing) are feasible. But all such heuristics must make their decisions based only on information about immediate neighbors, which may not be a useful guide to efficient routing. By contrast, a machine using Q-routing can learn to route well even when the critical parts of the networks are not adjacent to it.


Q-routing is an online learning technique in which reinforcement learning modules are inserted into each node of the network. Reinforcement learning agents attempt to learn effective control policies by observing the positive and negative rewards they receive from behaving in different ways in different situations. From this feedback, reinforcement learning agents learn a value function, which estimates the long-term value of taking a certain action in a certain state. After the value function is known, deriving an effective policy is trivial: in each state the agent simply takes the action that the value function estimates will reap the greatest long-term reward.

In Q-routing, the agents do not attempt to maximize a reward function but instead to minimize a time-to-go function, which estimates how long a given packet will take to complete if it is routed to a particular neighbor. Each node maintains a Q-table -- a table of estimates about the time-to-go of different types of packets. Each entry is an estimate of how much additional time a packet will take to travel from the node to its ultimate destination if it is forwarded to a given neighbor. When the node sends a packet to a neighbor, it immediately gets back an estimate of that node's time-to-go, which is based on the values in the neighbor's Q-table. With this information, the originating node can update its estimate of the time-to-go for packets bound for the destination that are sent to that neighbor.

By bootstrapping off the values in its neighbors' Q-tables, this update rule lets each node improve its estimate of a packet's time-to-go without waiting for that packet to reach its final destination. This approach is based directly on the Q-learning method. After reasonable Q-values have been learned, packets can be routed efficiently by simply consulting the appropriate entries in the Q-table and routing to the neighbor with the lowest estimated time-to-go for packets with the given destination.

We invite readers to consult the original paper on which this article is based for details on the ways we adapted the Q-routing technique to suit our unique version of the routing problem (see Related topics).

Smart scheduling

The routing techniques discussed in Intelligent routing all attempt to distribute load on the system evenly to minimize the time that passes between the creation and completion of a job. Doing so correctly plays an important role in overall system performance, but it's not the only factor. Our goal in this article, and the point of introducing a high-level vertical simulator, is to investigate the possibility of employing autonomic elements at more than one level of the system. With this goal in mind, we coupled the routing mechanisms with intelligent schedulers that must determine how to allocate a given machine's CPU cycles most efficiently.

Schedulers determine the order in which jobs sitting at a machine will be processed, and they decide how the machine's CPU time will be scheduled. By determining which jobs are in most pressing need of completion and processing them first, intelligent schedulers can maximize the network's score even when the utility functions are asymmetric and nonlinear. We considered two simple scheduling heuristics and introduced a new technique that uses the time-to-go estimates contained in the router's Q-table to assess a job's priority.

The simulator's default scheduling algorithm is the first-in first-out (FIFO) technique. In this approach, jobs that have been waiting in the machine's queue the longest are always processed first. Clearly, the FIFO algorithm does nothing to address the complications that arise when jobs have differing importance.

An alternative heuristic that does address these concerns is a priority scheduler. This algorithm works just like the FIFO approach except that each job is assigned a priority. When allocating CPU time, the priority scheduler examines only those jobs with the highest priority and selects randomly from among the ones that have been waiting the longest.

If all the utility functions are simply multiples of one another, the priority scheduler can achieve optimal performance by assigning jobs priorities that correspond to the slope of their utility functions. However, when the utility functions are different or nonlinear, the problem of deciding which jobs deserve higher priority becomes much more complicated, and priority scheduler's simplistic approach breaks down.

The insertion scheduler

We developed a simple, fast heuristic -- called the insertion scheduler -- that offers a more sophisticated approach. When a new job arrives, the insertion scheduler does not consider any orderings that are radically different from the current ordering. Instead, it decides at what position to insert the new job into the current ordering such that aggregate utility is maximized. To compute aggregate utility, the scheduler must know how much additional time each job will take to complete after it leaves this machine. This quantity is unknown, but it can be approximated from the time-to-go estimates contained in the Q-router's table. Hence, if n jobs are in the queue, it needs to consider only n of the n! possible orderings. This restriction might prevent the insertion scheduler from discovering the optimal ordering, but it nonetheless allows for intelligent scheduling of jobs, with only linear computational complexity, that exploits learned estimates of completion time.

The sample scheduler

In order to test the value of the insertion scheduler's heuristic, we developed another, similar scheduler. The sample scheduler estimates the utility of each ordering it examines in exactly the same manner as the insertion scheduler. It also selects the ordering that produces the highest estimated utility, just like the insertion scheduler. The only difference is that it selects n orderings from the all the possibilities randomly. If the insertion scheduler's heuristic is worthwhile, we'd expect it to outperform the sample scheduler.

Experimental framework and results

Our experiments tested all of the routing and scheduling methods described above on three different networks. Each network simulates a commercial enterprise system serving two companies whose employees generate requests to access mail and database programs. We designed the first two networks to establish proof-of-concept for our methods; they are the simplest networks we could construct that exhibit the complications our methods attempt to address. The third network is larger and designed to provide some confidence that these methods will scale up.

All three networks have these features in common:

  • The job types: Mail jobs and database jobs, each consisting of three steps.
  • The mechanism for job creation: At each timestep, the users create enough jobs to bring the total number of incomplete jobs up to a threshold, set to 100 in our experiments. Each newly created job has an equal probability of being created by each user in the system.
  • The utility functions that each user uses: In order to capture the complexities raised by users of differing importance, we assigned different utility functions to the two users.

For each network, we determined the results of pairing each routing method with a FIFO scheduler and pairing each scheduling method with a Q-router. Each experiment runs for 20,000 timesteps. Each simulation is preceded by an additional 5,000 "warmup" steps before tallying scores. In the case of Q-routing, a random router is used during the warmup steps; Q-routing is turned on and begins training only at timestep #0. The purpose of the warmup is to ensure that the network is at full capacity before learning begins. Doing so helps distinguish changes in performance that are due to discovering superior policies from those that are due to load building up in an initially empty network. At any point in the simulation, the score for each method represents a uniform moving average over the scores received for the last 100 completed jobs. The scores are averaged over 20 runs.

At timestep #10,000 in each experiment, a system catastrophe is simulated in which the speed of a few critical machines is cut in half. Because our learning methods are designed to work online, we expect them to adapt rapidly to changes in their environment, a feature tested by these simulated catastrophes.

The results from the experiments on the two simpler networks -- detailed in our original paper (see Related topics) -- provide proof-of-concept for the advantages of learning-based routing and scheduling. To demonstrate that these advantages scale up, we also tested the various methods on the larger network. In this network, the same two load balancers must choose among nine Web servers, each of which is connected to its own mail server and database. The Web servers have enough CPU cycles that the bottleneck to system performance lies in the mail servers and databases. In order to keep this larger network busy, the users are allowed to have 500 incomplete jobs at any time.

Figure 2 compares the performance on this larger network of all four routing methods when paired with a FIFO scheduler. The Web servers all have the same speed in this network, so the speed-based router performs similarly to the random and load-based routers. Q-routing achieves by far the best performance, obtaining a statistically significant improvement over the other methods.

Figure 2. All four routing methods paired with a FIFO scheduler
All four routing methods are paired with a FIFO scheduler
All four routing methods are paired with a FIFO scheduler

The catastrophe that occurs at timestep #10,000 consists of cutting in half the speed of the four fastest mail servers and databases. Though its performance inevitably degrades, Q-routing recovers gracefully by adjusting its policy online in response to environmental changes.

Figure 3 compares all four schedulers by pairing them with Q-routing. The insertion scheduler scores the highest, yielding a statistically significant improvement over FIFO scheduling, the fixed heuristic of the priority scheduler, and the random evaluations of the sample scheduler.

Figure 3. All four scheduling methods paired with a Q-router
All four routing methods are paired with a FIFO scheduler
All four routing methods are paired with a FIFO scheduler


Our experimental results indicate clearly that machine-learning methods offer a substantial advantage in optimizing the performance of computer networks. Both the router and scheduler placed in each machine benefit substantially from the time-to-go estimates discovered through reinforcement learning. The best performance is achieved only by placing intelligent, adaptive agents at more than one level of the system: the Q-router and the insertion scheduler perform better together than either could apart. They benefit from a sensible division of optimization tasks; the router focuses on routing jobs efficiently and balancing load throughout the network while the scheduler focuses on prioritizing those jobs whose effect on the score will be most decisive.

This advantage is especially compelling in systems in which the machines that make important routing decisions (the load balancers) are not direct neighbors of the machines that are most critical to system performance (the mail servers and databases). In these scenarios, the information necessary to route efficiently is not directly available, and a good policy can be discovered only by learning from experience. But even when this is not the case and heuristic routing methods perform well, learning-based systems can still reap an advantage by exploiting Q-routing's time-to-go estimates to improve CPU scheduling. The success of the Q-router and the insertion scheduler on a larger system is cause for optimism that these methods will continue to be successful on networks of realistic complexity.

In ongoing research, we plan to investigate new ways of applying machine-learning methods to further automate and optimize networks like the one studied in this article. In particular, we hope to automate the decision of how frequently machines should send updates to their neighbors. We would also like to use machine learning to determine what network topology gives optimal performance. We hope to develop a system in which machine learning helps determine the most efficient structure of the network when it is initially designed, when it needs to be upgraded, or when it is repaired.


The three main contributions of this article are:

  • A concrete formulation of the autonomic computing problem in terms of the representative task of enterprise system optimization.
  • A new vertical simulator designed to abstractly represent all aspects of a computer system. This simulator is fully implemented and tested; it was used for all of the experiments presented in this article.
  • Adaptive approaches to the network routing and scheduling problems in this simulator that outperform reasonable benchmark policies.

The simulator facilitates the study of autonomic computing methods from a top-down perspective. Rather than simply optimizing individual components, we focused on optimizing the interactions among components at a system-wide level. The results presented here, in addition to demonstrating that machine-learning methods can offer a significant advantage for job routing and scheduling, also validate this top-down approach. They provide evidence of the value of combining intelligent, adaptive agents at more than one level of the system. Together, these results offer hope that machine-learning methods, when applied repeatedly and in concert, can produce the robust, self-configuring, and self-repairing systems necessary to meet tomorrow's computing needs.

Downloadable resources

Related topics

ArticleTitle=Machine learning in network systems