Deep-protocol analysis of UNIX networks

Whether you are monitoring your network to identify performance issues, debugging an application, or investigating an application on your network that you do not recognize, you occasionally need to look deep into the protocols being used on your UNIX® network to understand what they are doing. Some protocols are easy to identify and understand, even when used on non-standard ports. Others need more investigation to understand what they are doing and what information they are exchanging. In this article, we will take a look at techniques for performing detailed analysis of the protocols in use on your UNIX network.

Martin Brown (mc@mcslp.com), Freelance Writer, Author

Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.



08 June 2010


Introduction

Networks have become so ubiquitous that we often take for granted the ability to communicate with other machines, both inside and outside our own network. Most of the time this isn't an issue, but there are times when you need to take a closer look at your network and find out what is going on.

There are a number of reasons to take a closer look at the contents of the network traffic. The first is that you may simply be debugging an existing network application or one you are developing, and want to monitor the traffic going past on your network. The second reason is to identify traffic on your network that may be using up bandwidth and resources. For the former, you probably already know the contents of the protocol, but you may want to get a more detailed look at the actual data being transferred, for example, when using web services. For the latter, identifying the contents of the packets requires some extensive knowledge of the protocols being used.

With both TCP/IP and UDP/IP communications, the key elements are the IP addresses used to identify the hosts and the port number. The port number is used to provide additional communication channels so that you can support multiple connections between two hosts. There are some standards in the port definitions. For example, port 25 is for email (SMTP) traffic, and most websites operate on port 80 (HTTP). These conventions are used to allow programs to communicate with each other over a known channel in the same way as you would choose a phone or fax number.

While these conventions exist, there is no limit or restriction on which ports you use. In fact, subversive network applications and some security methods deliberately use non-standard ports. Some hide content by misusing a standard port with a different protocol, such as running HTTP over port 25. Others use a port other than the standard one so that it is not obvious what the traffic is (such as using port 99 for HTTP), or encapsulate one protocol's traffic within another protocol. This last method is the one used by network tunneling and virtual private networks (VPNs).

Regardless of the reasons or complexities of the network traffic, the first step is always to start recording the data.


Recording raw data

There are a number of different tools available if you want to record the raw network data so that you can examine the information yourself. Most of the network sniffers will also decode and decipher specific packet contents, which will help you when you want to study the content of a recognized protocol.

Under Solaris you can use the snoop tool, or under AIX the iptrace tool. You can also try the cross-platform tcpdump tool, which is supported on most UNIX and Linux operating systems. These tools both capture and decode content, often performing the bulk of the protocol analysis for you. Note that modern switches do not echo Ethernet packets to every port, which often limits what you can capture to the traffic to and from the current host. Many managed switches can be configured with a monitoring (mirror) port that carries a copy of all packets for exactly this type of monitoring.

The main difficulty in decoding network traffic is the number of layers of information within each packet. Much of this information is also encoded in binary, so picking the data you need out of raw captured packets requires a significant amount of work. By using a tool that provides some of the processing, you can simplify the process of decoding network data.

To give an example, on an Ethernet network when looking at a typical TCP/IP protocol, the data transmitted over the network will include:

  • Ethernet packet headers, including the Ethernet source and destination address, packet size and the Ethernet packet type.
  • IP header, consisting of the IP addressing (source and destination), protocol identity and IP flags. You will also get information about the fragmentation and packet sequence.
  • TCP header, which includes information on the port, implied protocol, flags and sequencing numbers.

Even with all this information, we still haven't hit the actual content. Inside the TCP (or UDP) payload there will be further protocols: standard data protocols (such as HTTP, SMTP and FTP), or encapsulating protocols such as Remote Procedure Call (RPC), which in turn carries protocols such as NFS.
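
The layering described above can be sketched in a few lines of code. The following is a minimal illustration in Python (standard library only), not the method used by snoop or tcpdump. The function names parse_frame and build_sample_frame are invented for this example, and the sample frame is hand-built rather than captured. It assumes an untagged Ethernet frame carrying IPv4 and TCP.

```python
import socket
import struct


def mac(raw: bytes) -> str:
    """Format six raw bytes as a colon-separated Ethernet address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_frame(frame: bytes) -> dict:
    """Peel the Ethernet, IPv4 and TCP headers off one raw frame."""
    # Ethernet header: destination (6 bytes), source (6 bytes), type (2 bytes)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {"eth_dst": mac(dst), "eth_src": mac(src), "ethertype": ethertype}
    if ethertype != 0x0800:              # not IPv4, stop here
        return info

    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4   # header length is counted in 32-bit words
    total_len = struct.unpack("!H", ip[2:4])[0]
    info["ip_proto"] = ip[9]
    info["ip_src"] = socket.inet_ntoa(ip[12:16])
    info["ip_dst"] = socket.inet_ntoa(ip[16:20])
    if info["ip_proto"] != 6:            # not TCP, stop here
        return info

    tcp = ip[ip_header_len:total_len]
    info["src_port"], info["dst_port"] = struct.unpack("!HH", tcp[:4])
    tcp_header_len = (tcp[12] >> 4) * 4  # data offset, also in 32-bit words
    info["payload"] = tcp[tcp_header_len:]
    return info


def build_sample_frame() -> bytes:
    """Hand-built test frame; checksums are left at zero and are not verified."""
    payload = b"hello"
    eth = bytes.fromhex("001aee0101c0") + bytes.fromhex("0021283cc061")
    eth += struct.pack("!H", 0x0800)
    tcp = struct.pack("!HHIIBBHHH", 2049, 889, 1, 1, 5 << 4, 0x18, 32806, 0, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload),
                     0, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.168.0.112"),
                     socket.inet_aton("192.168.0.2"))
    return eth + ip + tcp + payload


print(parse_frame(build_sample_frame()))
```

Each layer tells you how long its own header is and what kind of data follows it, which is why the headers have to be decoded in order before the payload can be reached.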

Often these tools rely on the protocol and/or port number to identify the content being transferred. So, if traffic is being transferred on a non-standard port, the information may not be decoded properly.


Basic network analysis

Many of the network sniffing tools already mentioned in this article provide varying levels of protocol decoding by looking at the port and content details to determine the protocol being used.

For example, snoop and tcpdump both provide detailed information, to varying levels, on different protocols under both UDP and TCP. In snoop, you can get detailed information about NFS operations, from the top level of the protocol right down to the individual data blocks transferred. To monitor NFS traffic with snoop, specify RPC using the NFS protocol: $ snoop -v rpc nfs.

The output from this is quite detailed for each packet and deserves some closer investigation. Listing 1 provides the Ethernet header data.

Listing 1. Ethernet header data
ETHER:  ----- Ether Header -----
ETHER:  
ETHER:  Packet 64 arrived at 16:14:41.79434
ETHER:  Packet size = 238 bytes
ETHER:  Destination = 0:1a:ee:1:1:c0, 
ETHER:  Source      = 0:21:28:3c:c0:61, 
ETHER:  Ethertype = 0800 (IP)
ETHER:

The output here specifies that the Ethernet packet contains IP data, specifies the overall packet size and time, and the destination and source Ethernet addresses for the packet.

Listing 2 shows the IP header. Much of the IP data is not useful, beyond the protocol and source/destination address information.

Listing 2. IP header
IP:   ----- IP Header -----
IP:   
IP:   Version = 4
IP:   Header length = 20 bytes
IP:   Type of service = 0x00
IP:         xxx. .... = 0 (precedence)
IP:         ...0 .... = normal delay
IP:         .... 0... = normal throughput
IP:         .... .0.. = normal reliability
IP:         .... ..0. = not ECN capable transport
IP:         .... ...0 = no ECN congestion experienced
IP:   Total length = 224 bytes
IP:   Identification = 27460
IP:   Flags = 0x4
IP:         .1.. .... = do not fragment
IP:         ..0. .... = last fragment
IP:   Fragment offset = 0 bytes
IP:   Time to live = 64 seconds/hops
IP:   Protocol = 6 (TCP)
IP:   Header checksum = 4d11
IP:   Source address = 192.168.0.112, tiger.mcslp.pri
IP:   Destination address = 192.168.0.2, bear.mcslp.pri
IP:   No options
IP:
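
The header checksum is one field you can verify by hand. As a minimal sketch in Python (the function name ipv4_checksum is chosen for this example), the following rebuilds the 20-byte header from the values in Listing 2, with the checksum field set to zero, and computes the one's-complement sum defined for IPv4. The only assumption is that "do not fragment" with a zero offset encodes as 0x4000 in the flags and fragment offset field.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of the 16-bit header words."""
    if len(header) % 2:
        header += b"\x00"
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Field values taken from Listing 2; the checksum field itself is zero
LISTING2_HEADER = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,     # version 4, header length 20 bytes (5 words)
    0x00,             # type of service
    224,              # total length
    27460,            # identification
    0x4000,           # flags: do not fragment, fragment offset 0
    64,               # time to live
    6,                # protocol (TCP)
    0,                # checksum placeholder
    socket.inet_aton("192.168.0.112"),
    socket.inet_aton("192.168.0.2"),
)

print(f"{ipv4_checksum(LISTING2_HEADER):04x}")  # prints 4d11
```

The result matches the Header checksum line in the listing, which is a quick way to confirm that the fields have been read correctly.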

In Listing 3 you can see the TCP header. Again, this information is usually useful only for the source port and destination port numbers as these will either identify the expected protocol or provide the information you need to further investigate traffic on this port.

Listing 3. TCP header
TCP:  ----- TCP Header -----
TCP:  
TCP:  Source port = 2049
TCP:  Destination port = 889 (Sun RPC)
TCP:  Sequence number = 2834727685
TCP:  Acknowledgement number = 2654368001
TCP:  Data offset = 32 bytes
TCP:  Flags = 0x18
TCP:        0... .... = No ECN congestion window reduced
TCP:        .0.. .... = No ECN echo
TCP:        ..0. .... = No urgent pointer
TCP:        ...1 .... = Acknowledgement
TCP:        .... 1... = Push
TCP:        .... .0.. = No reset
TCP:        .... ..0. = No Syn
TCP:        .... ...0 = No Fin
TCP:  Window = 32806
TCP:  Checksum = 0x4852
TCP:  Urgent pointer = 0
TCP:  Options: (12 bytes)
TCP:    - No operation
TCP:    - No operation
TCP:    - TS Val = 34449495, TS Echo = 253458642
TCP:
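
The Flags value is a bit field, and snoop simply prints one line per bit. A small Python sketch of the same decoding follows; the table and function names are chosen for this example.

```python
# Bit masks for the TCP flags byte, highest bit first
TCP_FLAGS = [
    (0x80, "CWR"),
    (0x40, "ECE"),
    (0x20, "URG"),
    (0x10, "ACK"),
    (0x08, "PSH"),
    (0x04, "RST"),
    (0x02, "SYN"),
    (0x01, "FIN"),
]


def decode_tcp_flags(value: int) -> list:
    """Return the names of the flags that are set in the flags byte."""
    return [name for mask, name in TCP_FLAGS if value & mask]


print(decode_tcp_flags(0x18))  # prints ['ACK', 'PSH']
```

The value 0x18 from Listing 3 decodes to the Acknowledgement and Push bits, matching the two lines marked with a 1 in the snoop output.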

The penultimate section, Listing 4, shows the RPC header data.

Listing 4. RPC header data
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Record Mark: last fragment, length = 168
RPC:  Transaction id = 3041181596
RPC:  Type = 1 (Reply)
RPC:  This is a reply to frame 63
RPC:  Status = 0 (Accepted)
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  Accept status = 0 (Success)
RPC:

Finally, Listing 5 provides the content of the NFS packet, including the permissions (file mode), file size, ownership and other information. In this case, the NFS operation requested is for the filesystem statistics (triggered by the equivalent of an ls operation), hence the level of detail.

Listing 5. Content of NFS packet
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 18 (Get filesystem statistics)
NFS:  Status = 0 (OK)
NFS:  Post-operation attributes: 
NFS:    File type = 2 (Directory)
NFS:    Mode = 0777
NFS:     Setuid = 0, Setgid = 0, Sticky = 0
NFS:     Owner's permissions = rwx
NFS:     Group's permissions = rwx
NFS:     Other's permissions = rwx
NFS:    Link count = 24, User ID = 502, Group ID = 10
NFS:    File size = 29, Used = 2560
NFS:    Special: Major = 4294967295, Minor = 4294967295
NFS:    File system id = 781684113418, File id = 4304616
NFS:    Last access time      = 28-Feb-10 15:49:51.042953989 GMT
NFS:    Modification time     = 25-Feb-10 09:39:07.965422590 GMT
NFS:    Attribute change time = 25-Feb-10 09:39:07.965422590 GMT
NFS:  
NFS:  Total space = 759567510016 bytes
NFS:  Available space = 659048374272 bytes
NFS:  Available space - this user = 659048374272 bytes
NFS:  Total file slots = 1288161604
NFS:  Available file slots = 1287203856
NFS:  Available file slots - this user = 1287203856
NFS:  Invariant time = 0 sec
NFS:

In this case, we can see that the file being looked up was in fact a directory (see the File type line). Although we do not get the actual path to the file, we can find the directory in question by using find to look for the file or path with the corresponding inode number, reported here as the File id (see Listing 6).

Listing 6. Looking for a file with the corresponding inode number
$ find /scratch -xdev -inum 4304616
/scratch/installed/mysql-6.0.11
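
If you need the same lookup inside a script, it can be approximated with a directory walk. This is a rough Python sketch, not a replacement for find: the function name find_by_inode is invented for this example, and entries that cannot be read are silently skipped.

```python
import os


def find_by_inode(root: str, inode: int):
    """Yield paths under root with the given inode number, staying on one filesystem."""
    root_stat = os.lstat(root)
    if root_stat.st_ino == inode:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        keep = set()
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                    # entry vanished or is unreadable
            if st.st_dev != root_stat.st_dev:
                continue                    # different filesystem, like -xdev
            keep.add(name)
            if st.st_ino == inode:
                yield path
        # Do not descend into directories that were skipped above
        dirnames[:] = [d for d in dirnames if d in keep]
```

Calling list(find_by_inode("/scratch", 4304616)) would correspond to the command in Listing 6. Inode numbers are unique only within a single filesystem, which is why both the command and the sketch avoid crossing mount points.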

The best way to use these tools if you are trying to identify traffic is first to run them and collect as much data as possible, and then manually examine the content looking for items that you don't expect to see on your network.

Once you have identified suspicious traffic, you can add specifications on the command line to zero in on the detail of the traffic. For example, you can display only the traffic to or from a given host using either of the commands shown in Listing 7.

Listing 7. Displaying only the traffic for a given host
$ snoop host 192.168.0.2
$ tcpdump host 192.168.0.2

To further restrict things, you could add the port details: $ snoop host 192.168.0.2 and port 25.


Parsing the raw data to understand the content

Another way to process the content from tcpdump is to save the raw network packet data to a file and then process the file to find and decode the information that you want.

There are a number of modules in different languages that provide functionality for reading and decoding the data captured by tcpdump and snoop. For example, within Perl, there are two modules: Net::SnoopLog (for snoop) and Net::TcpDumpLog (for tcpdump). These read the raw data content. The basic interface for both of these modules is the same.

To start, first you need to create a binary record of the packets going past on the network by writing out the data to a file using either snoop or tcpdump. For this example, we'll use tcpdump and the Net::TcpDumpLog module: $ tcpdump -w packets.raw.
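
To show what is inside such a file, here is a minimal Python sketch that reads it directly. It assumes the classic libpcap format with microsecond timestamps: a 24-byte file header followed by a 16-byte header in front of each packet. The function name read_pcap is invented for this example, the sample capture is built in memory rather than recorded, and newer formats such as pcapng are not handled.

```python
import io
import struct


def read_pcap(stream):
    """Yield (timestamp, link_type, raw_bytes) for each packet in a classic pcap stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("file too short for a pcap header")
    # The magic number reveals the byte order used by the writer
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        order = "<"
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        order = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    # version major/minor, time zone offset, accuracy, snapshot length, link type
    fields = struct.unpack(order + "HHiIII", header[4:])
    link_type = fields[5]
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break                           # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(order + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break                           # truncated final packet
        yield ts_sec + ts_usec / 1_000_000, link_type, data


# Hand-built capture with two tiny "packets" (link type 1 is Ethernet)
sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
for number, body in enumerate([b"first", b"second!"]):
    sample += struct.pack("<IIII", 1275955200 + number, 500000, len(body), len(body))
    sample += body

for timestamp, link_type, data in read_pcap(io.BytesIO(sample)):
    print(timestamp, link_type, data)
```

To read a real capture, open the file in binary mode and pass the file object to read_pcap, for example open("packets.raw", "rb").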

Once you have amassed the network data, you can start to process its contents to find the information you want. The Net::TcpDumpLog module parses the raw network data saved by tcpdump. Because the data is in its raw binary format, extracting information means processing this binary data. For convenience, another suite of modules, NetPacket::*, provides decoding of the raw data.

For example, Listing 8 shows a simple script that prints out the IP address information for all of the packets.

Listing 8. Simple script that prints out the IP address info for all packets
use strict;
use warnings;

use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;

# Load the capture written by tcpdump -w packets.raw
my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    # Raw data for the whole packet, starting at the Ethernet header
    my $packet = $log->data($index);
    my $ethernet = NetPacket::Ethernet->decode($packet);

    # An Ethernet type of 0x0800 means the payload is an IP packet
    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        printf("  %s to %s protocol %s \n",
               $ip->{src_ip},$ip->{dest_ip},$ip->{proto});
    }
}

The first step is to extract each packet. The Net::TcpDumpLog module indexes the packets in the file, so we can read each one by using its index. The data() method then returns the raw data for the entire packet.

As with the output from snoop, we have to extract each block of data from the raw network packet. In this example, we first extract the Ethernet packet, including the data payload, from the raw network packet. The NetPacket::Ethernet module does this for us.

Since we are looking for IP packets, we check the Ethernet packet type. IP (version 4) packets have a type of 0x0800.

The NetPacket::IP module can then be used to extract the IP information from the data payload of the Ethernet packet. The module provides the source IP, destination IP and protocol information, among others, which we can then print.

Using this basic framework you can perform more complex lookups and decoding that do not rely on the automated solutions provided by tcpdump or snoop. For example, if you suspect that there is HTTP traffic going past on a non-standard port (i.e., not port 80), you could look for the string HTTP on ports other than 80 from the suspected host IP using the script in Listing 9.

Listing 9. Looking for the string HTTP on ports other than 80
use strict;
use warnings;

use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;
use NetPacket::TCP;

my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    my $packet = $log->data($index);
    my $ethernet = NetPacket::Ethernet->decode($packet);

    # Only IP packets (Ethernet type 0x0800)
    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        # Only packets sent by the suspected host
        if ($ip->{src_ip} eq '192.168.0.2')
        {
            # Only TCP (IP protocol 6)
            if ($ip->{proto} == 6)
            {
                my $tcp = NetPacket::TCP->decode($ip->{data});

                # Only the source port is tested, so requests sent from
                # a client port to a server on port 80 match as well
                if (($tcp->{src_port} != 80) &&
                    ($tcp->{data} =~ m/HTTP/))
                {
                    print("Found HTTP traffic on non-port 80\n");
                    printf("%s (port: %d) to %s (port: %d)\n%s\n",
                           $ip->{src_ip},
                           $tcp->{src_port},
                           $ip->{dest_ip},
                           $tcp->{dest_port},
                           $tcp->{data});
                }
            }
        }
    }
}

Running the above script on a sample packet set produced the output shown in Listing 10.

Listing 10. Running the script on a sample packet set
$ perl http-non80.pl
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39280) to 168.143.162.100 (port: 80)
GET /statuses/user_timeline.json HTTP/1.1
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39282) to 168.143.162.100 (port: 80)
GET /statuses/friends_timeline.json HTTP/1

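In this particular case we're seeing traffic from the host to an external website (Twitter). Note that both requests are going to port 80: because the script tests only the source port, ordinary client requests to a standard web server are reported too. To restrict the output to traffic that avoids port 80 altogether, test $tcp->{dest_port} as well.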
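
One way to tighten the test, so that ordinary requests to port 80 like the two in Listing 10 are ignored, is to compare both port numbers and to match the start of the payload rather than the string HTTP anywhere in it. The following Python sketch illustrates the idea. The function names and the list of markers are choices made for this example, and the marker list is deliberately incomplete.

```python
# Byte strings that commonly begin an HTTP request or response
HTTP_MARKERS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"HTTP/1.")


def looks_like_http(payload: bytes) -> bool:
    """Heuristic: does the TCP payload start like an HTTP message?"""
    return payload.startswith(HTTP_MARKERS)


def is_http_on_odd_port(src_port, dst_port, payload, standard_ports=(80,)):
    """True only when neither end of the connection uses a standard HTTP port."""
    if src_port in standard_ports or dst_port in standard_ports:
        return False
    return looks_like_http(payload)


request = b"GET /statuses/user_timeline.json HTTP/1.1\r\n"
print(is_http_on_odd_port(39280, 80, request))    # prints False
print(is_http_on_odd_port(39280, 8099, request))  # prints True
```

Matching only the start of the payload misses messages that are split across packets, so treat a sketch like this as a first filter rather than a complete detector.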

Obviously, in this example, we are dumping out the raw data, but you could use the same basic structure to decode the data in any format using any public or proprietary protocol structure. If you are using or developing a protocol, and know its format, you could use this method to extract and monitor the data being transferred.
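
As an illustration of decoding a protocol whose format you know, the following Python sketch splits a payload into messages. The format is entirely hypothetical and invented for this example: a one-byte version, a one-byte message type, and a two-byte payload length in network byte order, followed by the payload itself. A real protocol would need its own header layout.

```python
import struct
from typing import NamedTuple


class Message(NamedTuple):
    version: int
    msg_type: int
    body: bytes


# Hypothetical header: version (1 byte), type (1 byte), body length (2 bytes)
HEADER = struct.Struct("!BBH")


def decode_messages(data: bytes):
    """Yield each complete message found in a buffer of payload bytes."""
    offset = 0
    while offset + HEADER.size <= len(data):
        version, msg_type, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break                           # message is truncated, stop here
        yield Message(version, msg_type, data[start:end])
        offset = end


stream = HEADER.pack(1, 2, 5) + b"hello" + HEADER.pack(1, 3, 0)
for message in decode_messages(stream):
    print(message)
```

The same pattern of reading a fixed header, taking the length from it, and slicing out the body applies to most length-prefixed binary protocols.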


Using a protocol analyzer

Although, as already mentioned, tools like tcpdump, iptrace and snoop provide basic network analysis and decoding, there are GUI-based tools that make the process even easier. Wireshark is one such tool that supports a vast array of network protocol decoding and analysis.

One of the main benefits of Wireshark is that you can capture packets over a period of time (just as with tcpdump) and then interactively analyze and filter the content based on the different protocols, ports and other data. Wireshark also supports a huge array of protocol decoders, enabling you to examine in minute detail the contents of the packets and conversations.

You can see the basic screenshot of Wireshark showing all of the packets of all types being listed in Figure 1. The window is divided into three main sections: the list of filtered packets, the decoded protocol details, and the raw packet data in hex/ASCII format.

Figure 1. Wireshark interface
Screenshot of Wireshark interface

As an example of the level of information and decoding that is provided by the Wireshark tool, while writing this article I noticed that there were some error packets being returned by one of the MySQL servers on the network.

To zero in on the content, I first applied the MySQL filter to the output. You can do this either by typing an expression (like those provided to tcpdump, snoop or iptrace) into the Filter box or by clicking the Expression button and choosing the filter from the built-in list. You can see a sample of the filters available in Figure 2. Once you have chosen the filter, click Apply to filter the packet list.

Figure 2. Choosing a Wireshark filter
Screenshot of choosing a Wireshark filter

By filtering on the MySQL protocol, I was able to identify the error packets. The MySQL protocol returns a specific packet type with the error information. In this case, error 1242 (Subquery returns more than 1 row) means that the query execution failed because of a problem in a subquery. You can see the MySQL protocol content details by expanding the MySQL protocol section of the Wireshark window, as seen here in Figure 3.

Figure 3. Examining a MySQL error packet
Screenshot showing how to examine a MySQL error packet

Here we can see the detail of the error. By tracking back to the previous 'Request Query' packet, it is possible to determine the query that triggered the error response (Figure 4).

Figure 4. The MySQL query that triggered the error response
Screenshot of MySQL query that triggered the error response

By drilling down into the packets, I could identify a problem with the code I hadn't previously noticed and identify both the error and the query that triggered the problem.
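
The error packet that Wireshark decodes in Figure 3 has a simple layout, which can also be unpacked by hand. The following Python sketch assumes the layout used by the MySQL 4.1 and later protocol: a three-byte little-endian length, a one-byte sequence number, a 0xff marker, a two-byte little-endian error code, an optional # followed by a five-character SQLSTATE, and then the message text. The function name is invented for this example, and the sample packet is hand-built rather than taken from the capture described here.

```python
import struct


def parse_mysql_error(packet: bytes):
    """Decode a MySQL error packet; return None if it is some other packet type."""
    length = int.from_bytes(packet[0:3], "little")
    sequence = packet[3]
    payload = packet[4:4 + length]
    if not payload or payload[0] != 0xFF:
        return None                         # not an error packet
    code = struct.unpack_from("<H", payload, 1)[0]
    rest = payload[3:]
    sqlstate = None
    if rest[:1] == b"#":                    # SQLSTATE marker
        sqlstate = rest[1:6].decode("ascii")
        rest = rest[6:]
    return {
        "sequence": sequence,
        "code": code,
        "sqlstate": sqlstate,
        "message": rest.decode("utf-8", "replace"),
    }


# Hand-built sample: error 1242 with its usual SQLSTATE and message text
payload = b"\xff" + struct.pack("<H", 1242) + b"#21000"
payload += b"Subquery returns more than 1 row"
sample_packet = len(payload).to_bytes(3, "little") + b"\x01" + payload

print(parse_mysql_error(sample_packet))
```

A decoder like this could be attached to the TCP payloads extracted by the earlier scripts to log every error a server returns, without needing the GUI.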

Because Wireshark supports such a wide variety of protocols and filters, you can get detailed information on most of the traffic you are likely to capture. Another common use is to monitor the exact content of detailed protocols, such as web services. Figure 5 shows the detailed (and structured) content from a SOAP request used to log status information.

Figure 5. Looking at the details of SOAP web service request
Screenshot of the details of SOAP web service request

This kind of detail can be invaluable when trying to debug any network protocol that you are using.

Another useful feature is that Wireshark can work with live traffic as well as record traffic for later filtering and processing. This means that you can capture specific periods of suspicious traffic and then drill down into the information at your leisure to find out exactly what was occurring on your network.


Summary

Protocol analysis of the information going across the wire of your UNIX network can be a complex process. However, with a combination of simple and widely available tools, you can decode and examine the details of your network traffic, from the basics of the source and destination through to the specific protocol and data being exchanged.

As shown in this article, using tools like tcpdump, snoop or iptrace, you can extract a wide range of data at the command line. With tools like Wireshark, you can go even deeper and get more detailed information on a much wider range of protocols and content. For custom protocols and data structures, you can use Perl to extract the raw data and get all the information you need.
