Build a Python app for parsing shared memory dumps

Use the struct utility to extract system data for analysis

Learn how to parse a machine-readable shared memory dump on a Linux platform and extract your expected data format using Python and the struct utility. In this article, you'll first see how to determine the format of the data by reading the binary file format of the dump file; you need this in order to parse, extract, and analyze the data. Next, you'll see how to parse the file based on the format, and then match the results with the expected format to output a validation result.

Update: In the Download section, you'll find a working Python application and dump file that you can use as is or modify for your own needs. We changed the name of the dump file throughout this article to match the name used in the download. -Ed.

Asha Shivalingaiah, Software Engineer, IBM

Asha Shivalingaiah is a Software Engineer working as a tester at the Australia Development Lab for IBM Security Solution, Tivoli. Since joining the IBM Rational team in 2008, she has worked with Rational Rhapsody and other Rational portfolio tools for managing the software development life cycle. She is well versed in modeling languages such as UML 2 and SysML, and in using Rational Rhapsody for model-driven development.



30 May 2011 (First published 17 May 2011)


Memory dumps reveal the recorded state of working memory at a specific point in operation. They are an important tool in system administration because they provide "forensic" evidence of the system's condition.

Before you begin

For the instructions, code samples, and code download (see Download, below) in this article, I used Python version 2.4, which you can download from the Python site (see Resources below). You may get different results with other versions.

Before you start, make sure you are familiar with the following:

  • The /dev/shm implementation of traditional shared memory
  • Viewing shared memory data dumps manually on a Linux system
  • Certain dependencies (Linux file open, read, write, close concepts; using the file descriptor and the modes that the file can be opened in; basic Python structure concepts)
  • GNU/Linux, in general

Understanding shared memory dumps in Linux

/dev/shm is an implementation of the traditional shared memory concept. It is a widely used and accepted means of passing data between programs. In /dev/shm, one program or daemon creates a memory portion that other processes (at relevant permission levels) can access. This is a quick and easy method of sharing data between processes.

Each program creates its own file; in my examples, I use the file name devmem located at /dev/shm/devmem.

Viewing the shared memory dump manually on Linux

You cannot usefully view shared memory files (commonly referred to as shm files) with the cat utility, which is generally used for file display in Linux, because shm files are in a binary format. They look like a chunk of garbled characters if you try to view them with generic file-viewing methods. I use the hexdump utility to read the mem files and view them in a readable format; other utilities are available for this purpose.

For this article, the usage pattern for hexdump looks like this:

hexdump <optional switches> /dev/shm/devmem

For the supported switches, see Resources for a link to more information on hexdump.


Defining the scenario

The scenario we'll work with is a network sniffer that analyzes the packets received by the host and stores the data in a shared mem file, /dev/shm/devmem. This data contains information about the packet received.

The file looks generally like this:

  • The memory file storage is /dev/shm/devmem
  • The devmem file format contains:
    • 4 bytes of source address to identify who sent the packet
    • 4 bytes of destination address to identify where the packet is going
    • 2 bytes of source port (in other words, the port on the source that the packet used)
    • 2 bytes of destination port (similarly, the port on the destination that the packet will use)
    • 2 bytes of protocol (the protocol that the packet is a part of)
    • 4 bytes of time to indicate the time stamp at which the packet was seen by the network sniffer
  • 1 record length = sum of the devmem specs (that is, 4 + 4 + 2 + 2 + 2 + 4 = 18 bytes)
  • The maximum size of the memory file is 1KB (1024 bytes), so it can contain 56 complete records (1024 / 18 = 56, with 16 bytes left over)
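
The record layout above maps naturally onto a single struct format string. The following is a minimal sketch for Python 3, not the code from the download. The little-endian prefix (<) and the names RECORD_FORMAT, RECORD_SIZE, DUMP_SIZE, MAX_RECORDS, and LEFTOVER_BYTES are assumptions made for illustration, because the scenario does not state the byte order the sniffer writes.

```python
import struct

# Assumed layout: little-endian, no padding between fields.
# Fields: source address, destination address, source port,
# destination port, protocol, time stamp.
RECORD_FORMAT = "<4s4sHHHI"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

DUMP_SIZE = 1024  # maximum size of the memory file in the scenario

MAX_RECORDS = DUMP_SIZE // RECORD_SIZE    # complete records that fit
LEFTOVER_BYTES = DUMP_SIZE % RECORD_SIZE  # bytes after the last complete record
```

Letting calcsize compute the record length keeps the arithmetic in one place: if a field is added later, the record count follows automatically.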

If you hexdump and display the file manually on a Linux terminal, it will look something like this:

Listing 1. Displaying a dump file
# hexdump /dev/shm/devmem
0000000 0004 0000 0400 0000 fc64 0a00 00fb e000
0000010 14e9 14e9 0011 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0800
0000030 1668 0000 0000 0000 0032 0000 0000 0000
0000040 0000 0000 0001 e000 0000 0000 0002 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0800 0100 0000 0000 0000
0000070 0008 0000 0000 0000 fc64 0a00 fd64 0a00
0000080 2328 03ea 0006 0000 0000 0000 0000 0000
0000090 0000 0000 0000 0000 0000 0000 0000 0800
00000a0 7700 0001 0000 0000 0040 0000 0000 0000
00000b0 fd64 0a00 fc64 0a00 03ea 2328 0006 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0800 0a00 0000 0000 0000
00000e0 0040 0000 0000 0000 fc64 0a00 fd64 0a00
00000f0 2328 03ec 0006 0000 0000 0000 0000 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0800
0000110 7700 0001 0000 0000 0040 0000 0000 0000
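
When hexdump is run without switches, it prints the file as two-byte words in the byte order of the host, which is why the bytes of each pair appear swapped on a little-endian machine. The sketch below (Python 3) mimics that view so you can compare a buffer read in Python with the terminal output. The function name hexdump_words is hypothetical, and the sketch assumes a little-endian host.

```python
import struct

def hexdump_words(data):
    """Format bytes as rows of eight 16-bit little-endian words."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        if len(chunk) % 2:
            chunk += b"\x00"  # pad an odd trailing byte so it forms a word
        words = struct.unpack("<%dH" % (len(chunk) // 2), chunk)
        lines.append("%07x %s" % (offset, " ".join("%04x" % w for w in words)))
    return lines

# The 16 bytes that would produce the first row of Listing 1
# on a little-endian machine.
first_row_bytes = bytes.fromhex("0400 0000 0004 0000 64fc 000a fb00 00e0")
rows = hexdump_words(first_row_bytes)
```

Here rows[0] reproduces the first line of Listing 1, which shows that the bytes stored in the file are in the opposite order within each displayed pair.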

Let's look at the steps involved in parsing the file.


Parsing the dump file

The steps to understanding the data in a memory dump file (identifying the format, parsing, and reading the file) are relatively simple:

  1. Open the file.
  2. Read the bytes with the file descriptor.
  3. Convert data to a readable string format when necessary.
  4. Verify whether the buffer that has been read is intact and whether it has any truncations or errors.
  5. Unpack the data from the buffer.
  6. Extract the information.
  7. Print the data.
  8. Build a loop to do steps 2 to 7 on each record in a shared data dump, keeping the file from step 1 open. (You don't want to do it manually, do you?)

Let's go through the process flow in more detail.

Open the file

To open a shared memory file, use the general form fd = open(fileName, mode). Here fd is the file object that open returns; it plays the role of a file descriptor, a handle to the open file. For this example, use the following:

  • fileName: /dev/shm/devmem
  • mode: rb (read only in binary mode)

Listing 2. Opening a shared memory file
fd = open('/dev/shm/devmem', 'rb')
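
If you use a newer Python than the one in this article, a with block closes the file for you even when parsing fails partway through. The sketch below is written for Python 3 and uses a temporary stand-in file so that it runs on any system; on the system described here the path would be /dev/shm/devmem. The 1024 zero bytes are placeholder data, not a real dump.

```python
import os
import tempfile

# Create a stand-in for the dump file (placeholder contents).
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"\x00" * 1024)
tmp.close()
path = tmp.name

# Open read-only in binary mode; the file is closed when the block ends.
with open(path, "rb") as fd:
    first_record = fd.read(18)

file_closed = fd.closed
size_on_disk = os.path.getsize(path)
os.remove(path)  # clean up the stand-in file
```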

Read the bytes

To read the bytes using the file descriptor obtained in the previous call, I use the following code. It reads the indicated number of bytes from the file passed as a parameter:

Listing 3. Reading bytes from the file
def ReadBytes(fd, noBytes):
    '''
    Read the file and return the number of bytes specified
    '''
    data = fd.read(noBytes)
    return data

buffer = ReadBytes(fd, 18)
# Pass the file descriptor and the number of bytes to read
# Number of bytes is 18 since in the example scenario each record
# is of length 18
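
One detail worth seeing in action is that read returns fewer bytes than requested near the end of the file, and an empty result at the end. The sketch below (Python 3) uses an in-memory io.BytesIO object as a stand-in for the dump file; the names read_bytes and stream and the sample contents are hypothetical.

```python
import io

RECORD_SIZE = 18  # record length from the example scenario

def read_bytes(fd, count):
    """Return up to count bytes from fd; fewer (or none) near end of file."""
    return fd.read(count)

# Stand-in for the dump: one full record followed by 7 stray bytes.
stream = io.BytesIO(b"\x01" * 18 + b"\x02" * 7)

full = read_bytes(stream, RECORD_SIZE)     # a complete record
partial = read_bytes(stream, RECORD_SIZE)  # only the 7 remaining bytes
at_eof = read_bytes(stream, RECORD_SIZE)   # nothing left to read
```

This is the reason the buffer length is checked before unpacking: a short read is silent unless you test for it.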

Reading the bytes is not enough to extract the necessary information; the call returns the raw bytes as a string buffer. The buffer needs to be parsed and converted to an understandable format.

Convert the data

The Python struct module can be used to handle binary data stored in files or from network connections, among other sources. It has two broad functionalities: pack and unpack.

The job of pack is to return a string containing the values v1, v2, ... packed according to the given format. The arguments must exactly match the values required by the format.

The role of unpack is to unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format: len(string) must equal calcsize(fmt).
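
A short round trip makes the relationship between pack and unpack concrete. This is a sketch for Python 3; the format string and the two values are chosen only for illustration.

```python
import struct

fmt = "<HH"  # two unsigned 16-bit values, little-endian (illustrative choice)

packed = struct.pack(fmt, 9000, 1002)     # values -> 4 bytes
unpacked = struct.unpack(fmt, packed)     # 4 bytes -> tuple of values
single = struct.unpack("<H", packed[:2])  # one value still comes back as a tuple
expected_length = struct.calcsize(fmt)    # length that unpack insists on
```

In Python 3 the packed result is a bytes object rather than a str, but the behavior is otherwise the same as described above.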

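Commonly used format characters and their standard sizes (the sizes that apply when the format string begins with =, <, >, or !; without such a prefix, struct uses the native sizes of the platform's C compiler, so l and L are 8 bytes on 64-bit Linux):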

  • 1-byte formats:
    • b for signed char
    • B for unsigned char
  • 2-byte formats:
    • h for short integer
    • H for unsigned short
  • 4-byte formats:
    • l for long
    • L for unsigned long
  • 8-byte formats:
    • q for long long
    • Q for unsigned long long

For the other formats supported for packing and unpacking the buffer bytes, refer to the Python literature listed in Resources.
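
You can ask struct itself for these sizes instead of memorizing them. The sketch below (Python 3) compares the standard sizes, selected with the = prefix, against the native sizes, selected with @ (which is also the default when no prefix is given). The variable names are hypothetical.

```python
import struct

codes = "bBhHlLqQ"

# Standard sizes are the same on every platform.
standard_sizes = dict((c, struct.calcsize("=" + c)) for c in codes)

# Native sizes follow the platform's C compiler and may differ.
native_sizes = dict((c, struct.calcsize("@" + c)) for c in codes)

# Codes whose native size differs from the standard size on this machine.
differing = sorted(c for c in codes if standard_sizes[c] != native_sizes[c])
```

On a 64-bit Linux system, differing typically contains l and L, which is why a format meant to describe a fixed file layout is safer with an explicit prefix.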

Verify the buffer

To verify that the buffer of 18 bytes that has been read is intact and that it does not have any truncations or errors, you can use the calcsize function to check that the buffer is still 18 bytes long, as expected. You can use the Python assert statement for this purpose.

Listing 4. Verifying that the buffer is correct
import struct

assert len(buffer) == struct.calcsize('=llllh')

# The = prefix selects standard sizes, so 4 l's is 4*4 bytes = 16 bytes
# and h is 2 bytes, which makes 18 bytes
# we could use =QQh, which is 2*8 + 2 = 18 bytes, as well
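
Assertions are skipped when Python runs with the -O switch, so for a tool that must always validate its input an explicit check may be preferable. The following is one possible sketch for Python 3; the function name verify_buffer and the error message are hypothetical.

```python
import struct

RECORD_FORMAT = "=llllh"  # standard sizes: 4 * 4 + 2 = 18 bytes

def verify_buffer(buffer, fmt=RECORD_FORMAT):
    """Return buffer unchanged, or raise ValueError if its length is wrong."""
    expected = struct.calcsize(fmt)
    if len(buffer) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(buffer)))
    return buffer

intact = verify_buffer(b"\x00" * 18)

try:
    verify_buffer(b"\x00" * 10)  # simulate a truncated record
    truncated_message = None
except ValueError as exc:
    truncated_message = str(exc)
```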

Unpack the data

Now that you have verified the buffer is indeed 18 bytes, you can unpack your data from the buffer. struct provides a helpful unpack_from function that takes the format, the buffer, and the offset at which to start reading:

struct.unpack_from(fmt, buffer[, offset=0])
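
To see how the offset argument works, the sketch below (Python 3) builds a made-up 18-byte record and then reads individual fields from it by position. The record contents and the little-endian prefix are assumptions for illustration, not values taken from the dump.

```python
import struct

# Hypothetical record: two 4-byte addresses, two ports, protocol, time stamp.
record = struct.pack("<4B4BHHHI",
                     10, 0, 100, 252,   # source address bytes
                     10, 0, 100, 253,   # destination address bytes
                     9000, 1002,        # source and destination ports
                     6,                 # protocol
                     1305700000)        # time stamp

# unpack_from reads only what the format needs, starting at the offset.
src_port = struct.unpack_from("<H", record, 8)[0]
dst_port, protocol = struct.unpack_from("<HH", record, 10)
timestamp = struct.unpack_from("<I", record, 14)[0]
```

Unlike unpack, unpack_from does not require the buffer to be exactly the size of the format, so one buffer can be read field by field.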

Extract the details

In our scenario, these are the details we want to extract:

Listing 5. Details to be extracted
sourceAddress = (struct.unpack_from('B', buffer,0),
                     struct.unpack_from('B', buffer,1),
                     struct.unpack_from('B', buffer,2),
                     struct.unpack_from('B', buffer,3))
destinationAddress = (struct.unpack_from('B', buffer,4),
                     struct.unpack_from('B', buffer,5),
                     struct.unpack_from('B', buffer,6),
                     struct.unpack_from('B', buffer,7))
sourcePort = (struct.unpack_from('B', buffer,8),
                     struct.unpack_from('B', buffer,9))
destinationPort = (struct.unpack_from('B', buffer,10),
                     struct.unpack_from('B', buffer,11))
protocolUsed = (struct.unpack_from('B', buffer,12),
                     struct.unpack_from('B', buffer,13))
timeStamp = (struct.unpack_from('B', buffer,14),
                     struct.unpack_from('B', buffer,15),
                     struct.unpack_from('B', buffer,16),
                     struct.unpack_from('B', buffer,17))

Note: Depending on the platform and whether the mem structure is big endian or little endian, you may need to swap the order in which bytes are read.
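
The effect of byte order is easy to demonstrate on a two-byte field. In this Python 3 sketch the same two bytes are read three ways; the sample bytes are hypothetical.

```python
import struct
import sys

raw = b"\x23\x28"  # two bytes of a hypothetical port field

as_big_endian = struct.unpack(">H", raw)[0]     # most significant byte first
as_little_endian = struct.unpack("<H", raw)[0]  # least significant byte first
as_native = struct.unpack("=H", raw)[0]         # byte order of this machine
host_order = sys.byteorder                      # "little" or "big"
```

The same bytes give 9000 when read as big endian and 10275 when read as little endian, so choosing the wrong prefix produces plausible-looking but wrong port numbers.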

Print the output

Now that you have the unpacked values from the binary buffer that you read, you can use the standard print commands to get the output necessary.

Listing 6. Printing the details
print "sourceAddress =", (struct.unpack_from('B', buffer, 0),
      struct.unpack_from('B', buffer, 1),
      struct.unpack_from('B', buffer, 2), struct.unpack_from('B', buffer, 3))
print "destinationAddress = ", (struct.unpack_from('B', buffer, 4),
      struct.unpack_from('B', buffer, 5),
      struct.unpack_from('B', buffer, 6), struct.unpack_from('B', buffer, 7))
print "sourcePort = ", struct.unpack_from('H', buffer, 8)
print "destinationPort = ", struct.unpack_from('H', buffer, 10)
print "protocolUsed = ", struct.unpack_from('H', buffer, 12)
print "timeStamp =", (struct.unpack_from('B', buffer, 14),
      struct.unpack_from('B', buffer, 15),
      struct.unpack_from('B', buffer, 16), struct.unpack_from('B', buffer, 17))

The expected output from Listing 6 should be in this format:

Listing 7. Output from printing
sourceAddress =  ((192,), (168,), (10,), (102,))
destinationAddress =  ((207,), (168,), (1,), (103,))
sourcePort =  (11299,)
destinationPort =  (11555,)
protocolUsed =  (256,)
timeStamp =  ((1,), (12,), (0,), (1,))
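
The nested one-item tuples are faithful to what unpack_from returns, but they are awkward to read. If you prefer dotted-decimal addresses, one possible approach is sketched below for Python 3. The helper name dotted_quad is hypothetical, and the record bytes are made up to match the address values shown in Listing 7.

```python
import struct

def dotted_quad(buffer, offset):
    """Join four consecutive bytes as a dotted-decimal address string."""
    octets = struct.unpack_from("4B", buffer, offset)
    return ".".join(str(octet) for octet in octets)

# Hypothetical record: 8 address bytes followed by 10 zero bytes.
record = bytes([192, 168, 10, 102, 207, 168, 1, 103]) + b"\x00" * 10

source = dotted_quad(record, 0)
destination = dotted_quad(record, 4)
```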

Automate the process for all the records

Now, to read and print all the records from the entire shared memory file, create a loop:

Listing 8. Creating a loop to read and print all records
import struct

# ReadBytes is the function defined in Listing 3
fd = open('/dev/shm/devmem', 'rb')
for element in range(0, 56):
    # loop 56 times since we know the file size and
    # the record length: 1024/18 = 56 records

    buffer = ReadBytes(fd, 18)
    assert len(buffer) == struct.calcsize('=llllh')

    sourceAddress = (struct.unpack_from('B', buffer, 0),
                     struct.unpack_from('B', buffer, 1),
                     struct.unpack_from('B', buffer, 2),
                     struct.unpack_from('B', buffer, 3))
    destinationAddress = (struct.unpack_from('B', buffer, 4),
                          struct.unpack_from('B', buffer, 5),
                          struct.unpack_from('B', buffer, 6),
                          struct.unpack_from('B', buffer, 7))
    sourcePort = (struct.unpack_from('B', buffer, 8),
                  struct.unpack_from('B', buffer, 9))
    destinationPort = (struct.unpack_from('B', buffer, 10),
                       struct.unpack_from('B', buffer, 11))
    protocolUsed = (struct.unpack_from('B', buffer, 12),
                    struct.unpack_from('B', buffer, 13))
    timeStamp = (struct.unpack_from('B', buffer, 14),
                 struct.unpack_from('B', buffer, 15),
                 struct.unpack_from('B', buffer, 16),
                 struct.unpack_from('B', buffer, 17))

    print "sourceAddress = ", sourceAddress
    print "destinationAddress =  ", destinationAddress
    print "sourcePort =  ", struct.unpack_from('H', buffer, 8)
    print "destinationPort =  ", struct.unpack_from('H', buffer, 10)
    print "protocolUsed =  ", struct.unpack_from('H', buffer, 12)
    print "timeStamp = ", timeStamp

fd.close()
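
As an alternative to unpacking one byte at a time, the whole record can be described once and unpacked in a single call. The following is a compact sketch for Python 3, not the code from the download. The little-endian layout, the names RECORD, Packet, and parse_records, and the two sample records are all assumptions for illustration; an in-memory io.BytesIO object stands in for /dev/shm/devmem.

```python
import io
import struct
from collections import namedtuple

# Assumed layout: little-endian, 18 bytes per record.
RECORD = struct.Struct("<4s4sHHHI")
Packet = namedtuple("Packet", "src dst src_port dst_port protocol timestamp")

def dotted(raw):
    """Render raw address bytes in dotted-decimal form."""
    return ".".join(str(b) for b in bytearray(raw))

def parse_records(fd):
    """Yield one Packet per complete record; stop at EOF or a partial record."""
    while True:
        chunk = fd.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        src, dst, sport, dport, proto, stamp = RECORD.unpack(chunk)
        yield Packet(dotted(src), dotted(dst), sport, dport, proto, stamp)

# Stand-in dump: two made-up records followed by 5 leftover bytes.
fake_dump = io.BytesIO(
    RECORD.pack(bytes([10, 0, 100, 252]), bytes([10, 0, 100, 253]), 9000, 1002, 6, 1)
    + RECORD.pack(bytes([10, 0, 100, 253]), bytes([10, 0, 100, 252]), 1002, 9000, 6, 2)
    + b"\x00" * 5
)
packets = list(parse_records(fake_dump))
```

Because the loop stops when a read comes back short, it does not need to know the record count in advance, and the leftover bytes at the end of the file are ignored rather than misparsed.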

That's all there is to it! We parsed a binary mem dump of a known format in Linux, and used the Python struct module to read the binary data dump and display it in a readable format.


Download

Description                          Name                      Size
Python app for parsing memory dump   ParseBinaryInPython.zip   6KB

Resources

Learn

Get products and technologies

  • Download Python from the Python website.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
