Build a Python app for parsing shared memory dumps

Use the struct utility to extract system data for analysis

Asha Shivalingaiah, Software Engineer, IBM
Asha Shivalingaiah is a Software Engineer working as a tester at the Australia Development Lab for IBM Security Solution, Tivoli. Since joining the IBM Rational team in 2008, she has worked with Rational Rhapsody and other Rational portfolio tools for managing the software development life cycle. She is well versed in modeling languages such as UML 2 and SysML, and in using Rational Rhapsody for model-driven development.

Summary:  Learn how to parse a machine-readable shared memory dump on a Linux platform and extract your expected data format using Python and the struct utility. In this article, you'll first see how to determine the format of the data by reading the binary file format of the dump file; you need this in order to parse, extract, and analyze the data. Next, you'll see how to parse the file based on the format, and then match the results with the expected format to output a validation result.

Update: In the Download section, you'll find a working Python application and dump file that you can use as is or modify for your own needs. We changed the name of the dump file throughout this article to match the name used in the download. -Ed.

Date:  30 May 2011 (Published 17 May 2011)
Level:  Intermediate

Memory dumps reveal the recorded state of working memory at a specific point in operation. They are an important tool in system administration because they provide "forensic" evidence of the system's condition.

Before you begin

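For the instructions, code samples, and code download (see Download, below) in this article, I used Python version 2.4, which you can download from the Python site (see Resources below). You may get different results with other versions. Note that the struct.unpack_from function used later in this article was added to the struct module in Python 2.5, so the listings that call it need that version or later.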

Before you start, make sure you are familiar with the following:

  • The /dev/shm implementation of traditional shared memory
  • Viewing shared memory data dumps manually on a Linux system
  • Certain dependencies (Linux file open, read, write, close concepts; using the file descriptor and the modes that the file can be opened in; basic Python structure concepts)
  • GNU/Linux, in general

Understanding shared memory dumps in Linux

/dev/shm is an implementation of the traditional shared memory concept. It is a widely used and accepted means of passing data between programs. In /dev/shm, one program or daemon creates a memory portion that other processes (at relevant permission levels) can access. This is a quick and easy method of sharing data between processes.

Each program creates its own file; in my examples, I use the file name devmem located at /dev/shm/devmem.

Viewing the shared memory dump manually on Linux

You cannot view shared memory files (commonly referred to as shm files) with the cat utility that is generally used for file display in Linux, because shm files are in a binary format. They look like a chunk of garbled characters if you try to view them with generic file-viewing methods. I use the hexdump utility to read the mem files and view them in a readable format; other utilities are available for this purpose.

For this article, the usage pattern for hexdump looks like this:

hexdump <optional switches> /dev/shm/devmem

See Resources for a link to more information on hexdump and the switches it supports.
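
If you would rather stay in Python, a similar view can be approximated without hexdump. The sketch below is only an illustration: hex_lines is a hypothetical helper, not part of the article's download, and it prints single bytes rather than the two-byte words that hexdump shows by default. It is written for Python 3.

```python
def hex_lines(data, width=16):
    """Return text lines, each holding an offset and up to `width` bytes in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = ' '.join('%02x' % b for b in chunk)
        lines.append('%07x  %s' % (offset, hex_bytes))
    return lines


# Stand-in data; in practice this would be the bytes read from the dump file.
sample = bytes(range(20))
for line in hex_lines(sample):
    print(line)
```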


Defining the scenario

The scenario we'll work with is a network sniffer that analyzes the packets received by the host and stores the data in a shared memory file, /dev/shm/devmem. Each record in the file describes one received packet.

The file is laid out like this:

  • The memory file is stored at /dev/shm/devmem
  • Each record in the devmem file contains:
    • 4 bytes of source address (who sent the packet)
    • 4 bytes of destination address (who the packet is going to)
    • 2 bytes of source port (in other words, the port on the source that the packet used)
    • 2 bytes of destination port (similarly, the port on the destination that the packet will use)
    • 2 bytes of protocol (the protocol that the packet is a part of)
    • 4 bytes of time stamp (when the packet was seen by the network sniffer)
  • 1 record length = sum of the fields above (4 + 4 + 2 + 2 + 2 + 4 = 18 bytes)
  • The maximum size of the memory file is 1 KB (1024 bytes), so it can hold 56 complete records (1024 / 18 = 56, with 16 bytes left over)
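
The arithmetic in the list above can be checked with the struct module. This is a minimal sketch, assuming little-endian byte order with no padding (the < prefix); the constant names are my own and do not come from the download. It is written for Python 3.

```python
import struct

# Assumption: little-endian, unpadded layout.
# 4s = 4-byte address, H = 2-byte unsigned value, I = 4-byte unsigned value.
RECORD_FORMAT = '<4s4sHHHI'
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

FILE_SIZE = 1024                             # maximum size of the memory file
MAX_RECORDS = FILE_SIZE // RECORD_SIZE       # complete records that fit
LEFTOVER_BYTES = FILE_SIZE % RECORD_SIZE     # bytes that cannot form a record

print(RECORD_SIZE, MAX_RECORDS, LEFTOVER_BYTES)
```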

If you hexdump and display the file manually on a Linux terminal, it will look something like this:


Listing 1. Displaying a dump file

# hexdump /dev/shm/devmem
0000000 0004 0000 0400 0000 fc64 0a00 00fb e000
0000010 14e9 14e9 0011 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0800
0000030 1668 0000 0000 0000 0032 0000 0000 0000
0000040 0000 0000 0001 e000 0000 0000 0002 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0800 0100 0000 0000 0000
0000070 0008 0000 0000 0000 fc64 0a00 fd64 0a00
0000080 2328 03ea 0006 0000 0000 0000 0000 0000
0000090 0000 0000 0000 0000 0000 0000 0000 0800
00000a0 7700 0001 0000 0000 0040 0000 0000 0000
00000b0 fd64 0a00 fc64 0a00 03ea 2328 0006 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0800 0a00 0000 0000 0000
00000e0 0040 0000 0000 0000 fc64 0a00 fd64 0a00
00000f0 2328 03ec 0006 0000 0000 0000 0000 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0800
0000110 7700 0001 0000 0000 0040 0000 0000 0000

Let's look at the steps involved in parsing the file.


Parsing the dump file

The steps to understanding the data in a memory dump file (identifying the format, parsing, and reading the file) are relatively simple:

  1. Open the file.
  2. Read the bytes with the file descriptor.
  3. Convert data to a readable string format when necessary.
  4. Verify whether the buffer that has been read is intact and whether it has any truncations or errors.
  5. Unpack the data from the buffer.
  6. Extract the information.
  7. Print the data.
  8. Build a loop to do steps 2 to 7 on each record in a shared data dump; the file only needs to be opened once. (You don't want to do it manually, do you?)

Let's go through the process flow in more detail.

Open the file

To open a shared memory file, use the general form fd = open(fileName, mode). Here fd is the file object that open returns (referred to in this article as the file descriptor); all later reads go through it. For this example, use the following:

  • fileName: /dev/shm/devmem
  • mode: rb (read only in binary mode)

Listing 2. Opening a shared memory file

fd = open('/dev/shm/devmem', 'rb')
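
On newer Python versions, a with block closes the file for you even if a read fails. The sketch below is written for Python 3 and, so that it runs on any machine, uses a temporary file as a stand-in for /dev/shm/devmem; the variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical stand-in for /dev/shm/devmem: 1024 zero bytes in a temp file.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'\x00' * 1024)
tmp.close()
dump_path = tmp.name

# 'rb' opens the file read-only in binary mode; the with block closes it.
with open(dump_path, 'rb') as fd:
    first_record = fd.read(18)

os.remove(dump_path)
print(len(first_record))
```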

Read the bytes

To read the bytes using the file object obtained in the previous call, I use the following code. It reads the indicated number of bytes from the open file that is passed in:


Listing 3. Reading bytes from the file

def ReadBytes(fd, noBytes):
 '''
 Read noBytes bytes from the open file fd and return them
 '''
 data = fd.read(noBytes)
 return data

buffer = ReadBytes(fd, 18)
# Pass the open file object and the number of bytes to read.
# The number of bytes is 18 since in the example scenario each record
#  is of length 18

Reading the bytes alone is not enough to extract the necessary information: the read call returns a raw string buffer, which still needs to be parsed and converted into an understandable format.

Convert the data

The Python struct module can be used to handle binary data stored in files or received from network connections, among other sources. It has two broad functions: pack and unpack.

The job of pack is to return a string containing the values v1, v2, ... packed according to the given format. The arguments must exactly match the values required by the format.

The role of unpack is to unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format: len(string) must equal calcsize(fmt).
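
A short round trip makes the relationship between pack, unpack, and calcsize concrete. This sketch is written for Python 3, where the packed result is a bytes object rather than a string; the little-endian < prefix and the sample port numbers are assumptions chosen for illustration.

```python
import struct

# '<HH' = two unsigned 16-bit values, little-endian (an assumed byte order).
packed = struct.pack('<HH', 11299, 11555)

# unpack needs exactly calcsize(fmt) bytes and always returns a tuple.
ports = struct.unpack('<HH', packed)
single = struct.unpack('<H', packed[:2])    # a tuple with exactly one item

print(len(packed), ports, single)
```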

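Commonly used format characters and their standard sizes are listed below. These sizes apply when the format string begins with =, <, >, or !; without a prefix, the platform's native sizes are used, and l and L can be 8 bytes on 64-bit Linux.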

  • 1-byte formats:
    • b for signed char
    • B for unsigned char
  • 2-byte formats:
    • h for short integer
    • H for unsigned short
  • 4-byte formats:
    • l for long
    • L for unsigned long
  • 8-byte formats:
    • q for long long
    • Q for unsigned long long

For the other formats supported for packing and unpacking the buffer bytes, refer to the Python literature listed in Resources.
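
You can confirm the sizes for yourself with calcsize. The sketch below (Python 3) checks the standard sizes by prefixing each format character with =, and also reports the native size of l, which depends on the platform.

```python
import struct

# '=' selects standard sizes; with no prefix, native C sizes are used.
standard_sizes = {}
for code in 'bBhHlLqQ':
    standard_sizes[code] = struct.calcsize('=' + code)

native_long = struct.calcsize('l')    # 4 or 8, depending on the platform

print(standard_sizes)
print('native size of l:', native_long)
```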

Verify the buffer

To verify that the buffer of 18 bytes that has been read is intact and that it does not have any truncations or errors, you can use the calcsize function to check that the byte size is still 18, as expected. Inside a unittest test case, you can use the assertEqual method for this purpose; elsewhere, a plain assert statement does the same job.


Listing 4. Verifying that the buffer is correct

self.assertEqual(len(buffer), struct.calcsize('=llllh'))

# The '=' prefix selects standard sizes, so the result is the same on
# 32-bit and 64-bit systems (a native l can be 8 bytes on 64-bit Linux).
# 4 l's is 4*4 bytes = 16 bytes and h is 2 bytes so that is 18 bytes
# we could use '=QQh' which is 2*8 + 2 = 18 bytes as well
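
Outside of a test case, the same check can be written as an ordinary function. The sketch below is one possible approach, not the code from the download: read_record is a hypothetical helper that returns None at end of file and raises ValueError when fewer bytes than a full record come back. It is written for Python 3 and uses io.BytesIO as a stand-in for the open dump file.

```python
import io
import struct

RECORD_SIZE = struct.calcsize('=llllh')    # 18 bytes with standard sizes


def read_record(fd, record_size=RECORD_SIZE):
    """Read one record; return None at end of file, raise on a partial record."""
    data = fd.read(record_size)
    if not data:
        return None
    if len(data) != record_size:
        raise ValueError('truncated record: got %d of %d bytes'
                         % (len(data), record_size))
    return data


# Two synthetic 18-byte records stand in for the dump file.
fake_dump = io.BytesIO(b'\x01' * 18 + b'\x02' * 18)
print(len(read_record(fake_dump)), len(read_record(fake_dump)), read_record(fake_dump))
```

Because 1024 is not a multiple of 18, a full-size dump ends with 16 bytes that do not form a record, so a caller using this helper on the real file would need to decide whether to treat that tail as an error or ignore it.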

Unpack the data

Now that you have verified the buffer is indeed 18 bytes, you can unpack your data from the buffer. struct provides a helpful unpack_from function that takes a format string, the buffer, and the offset in the buffer at which to start reading:

struct.unpack_from(fmt, buffer[, offset=0])
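
As a quick illustration of the offset argument, the sketch below builds one synthetic 18-byte record and reads individual fields out of it. The field values and the little-endian < prefix are assumptions made for the example; the code is written for Python 3.

```python
import struct

# Synthetic record: two 4-byte addresses, two ports, protocol, time stamp.
record = struct.pack('<4B4BHHHI',
                     192, 168, 10, 102,     # source address bytes
                     207, 168, 1, 103,      # destination address bytes
                     11299, 11555,          # source and destination ports
                     6,                     # protocol field
                     1305000000)            # time stamp

source_port = struct.unpack_from('<H', record, 8)         # bytes 8-9
destination_port = struct.unpack_from('<H', record, 10)   # bytes 10-11
first_byte = struct.unpack_from('B', record)              # offset defaults to 0

print(source_port, destination_port, first_byte)
```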

Extract the details

In our scenario, these are the details we want to extract:


Listing 5. Details to be extracted

sourceAddress = (struct.unpack_from('B', buffer,0),
                     struct.unpack_from('B', buffer,1),
                     struct.unpack_from('B', buffer,2),
                     struct.unpack_from('B', buffer,3))
destinationAddress = (struct.unpack_from('B', buffer,4),
                     struct.unpack_from('B', buffer,5),
                     struct.unpack_from('B', buffer,6),
                     struct.unpack_from('B', buffer,7))
sourcePort = (struct.unpack_from('B', buffer,8),
                     struct.unpack_from('B', buffer,9))
destinationPort = (struct.unpack_from('B', buffer,10),
                     struct.unpack_from('B', buffer,11))
protocolUsed = (struct.unpack_from('B', buffer,12),
                     struct.unpack_from('B', buffer,13))
timeStamp = (struct.unpack_from('B', buffer,14),
                     struct.unpack_from('B', buffer,15),
                     struct.unpack_from('B', buffer,16),
                     struct.unpack_from('B', buffer,17))

Note: Depending on the platform and whether the mem structure is big endian or little endian, you may need to swap the order in which bytes are read.
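
With struct, you usually do not need to swap bytes by hand: a < or > prefix on the format string selects little-endian or big-endian interpretation. The sketch below (Python 3) decodes the same two bytes both ways; the sample bytes are chosen for illustration only.

```python
import struct

raw = b'\x23\x2c'                        # two bytes as they sit in the buffer

little = struct.unpack('<H', raw)[0]     # low byte first:  0x2c23
big = struct.unpack('>H', raw)[0]        # high byte first: 0x232c

print(little, big)
```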

Print the output

Now that you have the unpacked values from the binary buffer that you read, you can use the standard print commands to get the output necessary.


Listing 6. Printing the details

print "sourceAddress = ", (
      struct.unpack_from('B', buffer,0),struct.unpack_from('B', buffer,1),
      struct.unpack_from('B', buffer,2),struct.unpack_from('B', buffer,3))
print "destinationAddress = ", (
      struct.unpack_from('B', buffer,4),struct.unpack_from('B', buffer,5),
      struct.unpack_from('B', buffer,6),struct.unpack_from('B', buffer,7))
print "sourcePort = " , (struct.unpack_from('H',buffer,8))
print "destinationPort = " , (struct.unpack_from('H',buffer,10))
print "protocolUsed = " , (struct.unpack_from('H',buffer,12))
print "timeStamp = ", (
      struct.unpack_from('B', buffer,14),struct.unpack_from('B', buffer,15),
      struct.unpack_from('B', buffer,16),struct.unpack_from('B', buffer,17))

The expected output from Listing 6 should be in this format:


Listing 7. Output from printing

sourceAddress =  ((192,), (168,), (10,), (102,))
destinationAddress =  ((207,), (168,), (1,), (103,))
sourcePort =  (11299,)
destinationPort =  (11555,)
protocolUsed =  (256,)
timeStamp =  ((1,), (12,), (0,), (1,))
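
The nested one-item tuples are a direct result of calling unpack_from once per byte. If you prefer the familiar dotted notation for addresses, a small helper can flatten them. This is an optional sketch for Python 3; dotted_quad is a hypothetical name and is not part of the download.

```python
def dotted_quad(octets):
    """Join a tuple of one-item tuples, such as ((192,), (168,), ...), with dots."""
    return '.'.join(str(octet[0]) for octet in octets)


source_address = ((192,), (168,), (10,), (102,))
print('sourceAddress =', dotted_quad(source_address))
```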

Automate the process for all the records

Now, to read and print all the records from the entire shared memory file, open the file once and create a loop:


Listing 8. Creating a loop to read and print all records

fd = open('/dev/shm/devmem', 'rb')

for element in range(0, 56):
      # Loop 56 times since we know the file size and
      # the record length: 1024/18 = 56 complete records

      buffer = ReadBytes(fd, 18)
      self.assertEqual(len(buffer), struct.calcsize('=llllh'))

      # Addresses and the time stamp are read one byte at a time
      sourceAddress = (struct.unpack_from('B', buffer,0),
                       struct.unpack_from('B', buffer,1),
                       struct.unpack_from('B', buffer,2),
                       struct.unpack_from('B', buffer,3))
      destinationAddress = (struct.unpack_from('B', buffer,4),
                            struct.unpack_from('B', buffer,5),
                            struct.unpack_from('B', buffer,6),
                            struct.unpack_from('B', buffer,7))
      # Ports and protocol are read as 2-byte unsigned values
      sourcePort = struct.unpack_from('H', buffer,8)
      destinationPort = struct.unpack_from('H', buffer,10)
      protocolUsed = struct.unpack_from('H', buffer,12)
      timeStamp = (struct.unpack_from('B', buffer,14),
                   struct.unpack_from('B', buffer,15),
                   struct.unpack_from('B', buffer,16),
                   struct.unpack_from('B', buffer,17))

      print "sourceAddress = ", sourceAddress
      print "destinationAddress = ", destinationAddress
      print "sourcePort = ", sourcePort
      print "destinationPort = ", destinationPort
      print "protocolUsed = ", protocolUsed
      print "timeStamp = ", timeStamp

fd.close()

That's all there is to it! We parsed a binary memory dump of a known format in Linux, using the Python struct module to read the binary data and display it in a readable format.
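
For comparison, here is a more compact way to express the same loop on Python 3. It is a sketch under stated assumptions rather than the code from the download: the record is taken to be little-endian and unpadded, the whole record is unpacked in one call with a precompiled struct.Struct, and the names RECORD, Packet, and parse_dump are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed layout: 4-byte source, 4-byte destination, three 2-byte fields,
# and a 4-byte time stamp, little-endian with no padding (18 bytes).
RECORD = struct.Struct('<4s4sHHHI')
Packet = namedtuple(
    'Packet',
    'source destination source_port destination_port protocol timestamp')


def parse_dump(data):
    """Return a list of Packet tuples for every complete record in data."""
    usable = len(data) - len(data) % RECORD.size    # ignore a partial tail
    return [Packet._make(RECORD.unpack_from(data, offset))
            for offset in range(0, usable, RECORD.size)]


# Three identical synthetic records followed by 5 stray bytes.
one = RECORD.pack(bytes([192, 168, 10, 102]), bytes([207, 168, 1, 103]),
                  11299, 11555, 6, 1)
packets = parse_dump(one * 3 + b'\x00' * 5)
for packet in packets:
    print(packet)
```

Unpacking the whole record at once avoids repeating offsets by hand, at the cost of committing to one byte order for every field.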



Download

Description                          Name                      Size   Download method
Python app for parsing memory dump   ParseBinaryInPython.zip   6KB    HTTP


Resources

Learn

Get products and technologies

  • Download Python from the Python website.

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

