Implement lower timer granularity for retransmission of TCP

How a timer wheel algorithm can reduce overhead

Reduce the overhead of per-tick processing with a timer wheel algorithm that implements the retransmission timer. The AIX® Transmission Control Protocol (TCP) has seven timers (per-connection) and uses global timer functions with two granularities to implement the timers. In this article, learn how to get lower granularity with your retransmission timer by using the AIX TCP fast timer, and discover other advantages of lower timer granularity.

Venkat Venkatsubra (venkats@us.ibm.com), Software Developer, IBM

Photo of Venkat VenkatsubraVenkat Venkatsubra is a TCEM and developer on the IBM AIX Network (TCP/IP protocols and STREAMS) team in the AIX I/O development area. Venkat has worked in AIX networking development for eight years. He is a master inventor and is currently working on software exploitation of Infiniband technologies for database clusters. Venkat holds a BS degree in computer science from IIT Kharagpur, India. You can contact him at venkats@us.ibm.com.



G Shantala (gshantal@in.ibm.com), Technical Lead, IBM

Photo Shantala GShantala G is a developer and technical lead on the IBM AIX TCP/IP team in the AIX I/O area. She has worked in the TCP/IP development and support projects on OS/2® and AIX platforms for ten years. Shantala holds a BE degree in electronics and communications from Mangalore University, India. You can contact her at gshantal@in.ibm.com.



09 October 2007

Also available in Chinese Russian

Introduction

AIX® Transmission Control Protocol (TCP) maintains seven timers for each connection:

  • Connection establishment
  • Retransmission
  • Delayed acknowledgment (ACK)
  • Persist
  • Keepalive
  • FIN_WAIT_2
  • TIME_WAIT

To implement these per-connection timers, TCP uses global timer functions that offer two granularities:

  • tcp_fasttimo: It's invoked every 200ms and implements the fast timer.
  • tcp_slowtimo: It's invoked every 500ms and implements the slow timer.

With TCP's retransmission timer implementation, the retransmission timeout is stored in units of ticks in TCP's control block, where one tick = 500ms. When TCP's slow timer expires every 500ms, tcp_slowtimo gets called. This routine walks through the table of TCP control blocks and, for each connection, it decrements the timeout, specified in ticks, by one. When the number of ticks reaches zero for a timer, it calls the timeout handler routine to handle the corresponding timeout for that connection. This implementation imposes a lower limit of 500ms for a TCP timer.

As you know, TCP dynamically calculates timeout based on the round-trip time measured by TCP. Currently, TCP retransmission timeout takes on a minimum value of three seconds. However, with high-speed networks, such as Gigabit Ethernet and 10 Gigabit Ethernet, the round-trip time (hence retransmission timeout) is expected to be much lower. You lose 120MB per second of throughput for each second you are not transmitting. A better method is needed to deal with high-speed and low-latency networks.

One way to achieve lower granularity with a retransmission timer is to use TCP's fast timer (200ms), which could default to 50ms instead of 200ms. The drawback with this approach is handling the overhead of scanning all the protocol control blocks (PCBs) a higher rate. A PCB is an internal structure that holds the control information for a connection. This article explains an alternative method for implementing TCP's retransmission timer based on a timer wheel algorithm.

Timer wheel algorithm

A timing wheel has N number of slots. A slot represents a time unit such as si (slot interval). A cursor in the timing wheel moves one location every time unit, just like a second hand on a clock. Whenever a cursor moves to a slot, for example, cs (current slot), it implies that the list of timers in that slot, if any, expire at that instant or when the cursor reaches the same slot in the subsequent cycles.

When a new timer with a timer interval, such as ti (time interval), is to be added to this wheel, the slot for the new timer, ts (timer slot), is calculated, as follows:

ts = ( cs + (ti / si)) % N

Assume that the maximum timer interval value any timer takes on does not exceed an upper limit (tmax). If N is large enough to accommodate tmax within a rotation from the current cursor position, then when the cursor moves to a specific slot, all timers in that slot expire at the same instant (no subsequent cycles). This saves you from traversing the list to check which of the timers expire now and which of them expire in the subsequent cycles.

For example, in the timing wheel shown in Figure 1 below, there are 8 slots, 0 through 7. You see timer entries pivoted to slots 1, 3, 5, 4, and 6. Currently, the cursor is at slot 1. The three timers pivoted to this slot expire at this instant or when the cursor reaches the same slot in the subsequent cycle.

Figure 1. Timing wheel
Timing wheel

RTO implementation using the timer wheel algorithm

This section explains a retransmission time-out (RTO) implementation using a timer wheel algorithm on AIX. On AIX, the number of slots in the timer wheel is chosen as N=7000. The slot interval (si) of the timer wheel implementation is configurable using the no option with timer_wheel_tick. The range of values for timer_wheel_tick is zero to 100 ticks, where one tick = 10ms. Thus, if timer_wheel_tick is set to one, for example 10ms, the timer-period per cycle is N x si = 70 seconds. Since the maximum value (tmax) for the AIX TCP RTO is 64 seconds, this is large enough to ensure that all retransmission timers in a slot expire at the same time and not in the subsequent cycles.

The RTO value is configurable using the no option with tcp_low_rto. The range of values for tcp_low_rto is zero to 3000 ms. If configured, the RTO value would be set as the initial retransmission timeout for all the TCP connections on the system.

Even though tcp_low_rto is configured, the algorithm takes effect for a connection only when it experiences packet drops, as shown in Figure 2 below. It is not efficient to use this algorithm otherwise. In the traditional method of retransmission timer implementation for connections that do not experience packet drops, all you do is set the number of ticks in TCP's control block to start a retransmission timer. When the acknowledgment arrives, the retransmission timer is stopped by setting the value to zero. This method works best for start or stop cases where the retransmission timer never expires.

You can continue to use the traditional method of retransmission timer implementation until either of these conditions occurs:

  • Retransmission timer expires
  • Fast-retransmit phase is turned on for a connection
Figure 2. Low granularity connection
Low granularity connection
A. Trigger low RTO feature
Retransmission timer expires for a connection for the first time or the fast retransmit phase is turned on. You enable lower granularity for the retransmission timer for this connection and resend the packet. Since this is a retransmission, you do not time this segment for response time tracking (RTT) calculation. The retransmission timer is started by inserting the timer into an appropriate slot in the timer wheel.
B. Receipt of acknowledgment
Acknowledgment for the retransmitted packet is received. Cancel the retransmission timer (remove the timer from the timer wheel).
C. Time a segment
A segment is sent with sequence 'b' and the retransmission timer is started. You time this segment for RTT calculation by storing the timestamp. The retransmission timer is started by inserting the timer into an appropriate slot in the timer wheel.
D. Process ACK received for a timed segment
An ACK with acknowledgment 'c' greater than 'b' is received. You stop the retransmission timer by removing the timer from the timer wheel. You measure new RTT as current time minus the stored timestamp when the segment with sequence 'b' was sent. The measured RTT is used to compute the new retransmission timeout value.
E. Transmit a segment with RTO calculated as per low RTO feature
The retransmission timer is inserted into an appropriate slot in the timer wheel.
F. Retransmission timer expiry
Retransmission timer expires for a connection that is using the timer wheel algorithm. The cursor moves to the slot in the timer wheel where the retransmission timer was inserted previously. A timer handling routine is called to process the retransmission timer expiry.

Summary

With the timer wheel approach, starting and stopping a timer becomes expensive. If the timers in a slot are implemented as linked lists, then the addition or deletion of a timer to or from the list has to be synchronized using a slot chain lock.

On the other hand, using a timer wheel algorithm for implementing a retransmission timer provides the advantage of reduced overhead of per-tick processing. With a timer wheel, you have only those timers in a slot that expire when the cursor reaches the slot. You do not need to scan through all PCBs to check which of the timers are set for each connection.

The lower timer granularity implementation lets the administrator configure an RTO value as low as 10ms. This is particularly useful for TCP connections running over high-speed and low latency networks, such as Gigabit and 10 Gigabit Ethernet, that experience occasional packet drops.

Resources

Learn

Get products and technologies

  • IBM® trial software: Build your next development project with software for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into AIX and Unix on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=AIX and UNIX
ArticleID=260238
ArticleTitle=Implement lower timer granularity for retransmission of TCP
publish-date=10092007