LoP/Cell/B.E.: Buffer overflow vulnerabilities, Part 1: Understanding buffer overflow issues for Linux on Power-based systems

Get acquainted with buffer overflow vulnerabilities in Linux® running on Power™/Cell Broadband Engine™ Architecture processor-based servers. Buffer overflows occur when a process tries to store data outside of the bounds of a fixed-length buffer. When that happens, all sorts of erratic system behavior can result, and some can be detrimental to your system's security. Part 1 of this article series briefly discusses buffer overflows and the Power and Cell/B.E.™ architectures, and then shows how you can change the process-execution flow in the target systems and overwrite a local variable in 32- and 64-bit modes. (Part 2 will show how to overwrite a function pointer in 32- and 64-bit modes and illustrate assembly components through shell, network, and socket code samples.)

Share:

Ramon de Carvalho Valle (rcvalle@br.ibm.com), Software Engineer, IBM

Ramon is a Software Engineer at the IBM Linux Technology Center in Sao Paulo, Brazil. He is a Founder/Security Researcher at RISE Security and has extensive experience in vulnerability research, exploitation techniques, exploit development, and reverse engineering on a wide range of operating systems and architectures. He also contributes to open source projects like The Metasploit Framework.



06 January 2009

Also available in Russian

In this article, all examples of buffer overflow vulnerabilities in Linux running on Power/Cell Broadband Engine Architecture processor-based servers were developed and executed on an IBM BladeCenter® JS22 Express server, an IBM BladeCenter QS21 server, and a Sony Playstation 3 running Red Hat Enterprise Linux 4 Update 7.

Review of buffer overflows

Let's start with a quick review of buffer overflows. A buffer overflow, or buffer overrun, occurs when a process attempts to store data beyond the boundaries of a fixed-length buffer. The result is that the extra data overwrites adjacent memory locations. The overwritten data can include other buffers, variables, program flow data, etc. Overwriting this data can cause such problems as erratic program behavior, memory-access exceptions, program terminations of the crash variety, the wrong returned results, or the most dangerous thing for systems integrity: a breach of security.

Buffer overflows cause many software weaknesses and, therefore, are the basis of malicious exploits. C/C++ systems are especially prone to overflows. They provide no built-in protection to stop accessing or overwriting data in any part of memory, and they don't automatically check that data written to a built-in buffer array is within the boundaries of that array. That's why you should always support a system that does bounds checking, either by you or by the compiler and runtime.

To learn more about buffer overflows and how to avoid them, read the developerWorks article "Secure programmer: Countering buffer overflows."


Power Architecture

The POWER (Performance Optimization With Enhanced RISC) Architecture, originally developed by IBM, was introduced with the RISC System/6000 product family in early 1990. In 1991, Apple, IBM, and Motorola, known as the AIM alliance, began the collaboration to evolve to the PowerPC® Architecture, expanding the architecture's applicability. In 1997, Motorola and IBM began another collaboration focused on optimizing PowerPC for embedded systems. At the end of 2004, the Power.org consortium was launched with the goal of developing community specifications and supporting development tools that work together to facilitate integration and enhanced implementations focused on the Power Architecture.

The Power Architecture is an open architecture defined by the Power Instruction Set Architecture (Power ISA) maintained by the Power Architecture Advisory Council, which ensures compatibility among implementations and allows anyone to design and fabricate Power Architecture-compliant processors. The Xbox 360 processor and the Cell Broadband Engine processor both shine as excellent examples.

A processor implementation that conforms to the Power Architecture has four basic classes of instructions:

  • Branch instructions
  • Fixed-point instructions and other instructions that use the fixed-point registers
  • Floating-point instructions and decimal floating-point instructions
  • Vector instructions

Fixed-point instructions operate on byte, halfword, word, and doubleword operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. Vector instructions operate on vectors of scalar quantities and on scalar quantities, where the scalar size is byte, halfword, word, and quadword. The processor uses instructions that are four bytes long and word-aligned. It provides for byte, halfword, word, and doubleword operand fetches and stores between storage and a set of 32 General Purpose Registers (GPRs). It provides for word and doubleword operand fetches and stores between storage and a set of 32 Floating-Point Registers (FPRs). It also provides for byte, halfword, word, and quadword operand fetches and stores between storage and a set of 32 Vector Registers (VRs).

  • The Condition Register (CR) is a 32-bit register that reflects the result of certain operations and provides a mechanism for testing (and branching).
  • The Link Register (LR) is a 64-bit register. It can be used to provide the branch target address for the Branch Conditional to Link Register instruction, and it holds the return address after Branch instructions.
  • The Count Register (CTR) is a 64-bit register. It can be used to hold a loop count that can be decremented during execution of Branch instructions.
  • The Machine State Register (MSR) is a 64-bit register. This register defines the state of the processor. The 64th bit defines whether the processor is in 32-bit or 64-bit mode (0 or 1).

Processors provide two execution modes: 64-bit and 32-bit mode. In both modes, instructions that set a 64-bit register affect all 64 bits. The computational mode controls how the effective address is interpreted, how status bits are set, how the Link Register is set by Branch instructions, and how the Count Register is tested by Branch Conditional instructions. Nearly all instructions are available in both modes. In both modes, effective address computations use all 64 bits of the relevant registers (GPRs, LRs, CTRs, etc.) and produce a 64-bit result. However, in 32-bit mode, the high-order 32 bits of the computed effective address are ignored for the purpose of addressing storage.

All instructions are four bytes long and word-aligned. Thus, whenever instruction addresses are presented to the processor (as in Branch instructions), the low-order two bits are ignored. Similarly, whenever the processor develops an instruction address, the low-order two bits are zero.

Bits 0:5 always specify the opcode. Many instructions also have an extended opcode. The remaining bits of the instruction contain one or more fields for the different instruction formats.

A program references storage using the effective address computed by the processor when it executes a Storage Access or Branch instruction or when it fetches the next sequential instruction. Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte. The byte ordering (Big-Endian or Little-Endian) for a storage access is specified by the operating system.


Cell Broadband Engine Architecture (CBEA)

The Cell Broadband Engine (Cell/B.E.) processor is the first implementation of a new multiprocessor family conforming to the Cell Broadband Engine Architecture (CBEA). The CBEA is an architecture that extends the 64-bit Power Architecture. The CBEA and the Cell/B.E. processor are the result of a collaboration between Sony, Toshiba, and IBM, known as STI, formally started in early 2001.

Although the Cell/B.E. processor was initially intended for applications in media-rich consumer-electronics devices, such as game consoles and high-definition televisions, the architecture is designed to enable fundamental advances in processor performance. These advances are expected to support a broad range of applications in both commercial and scientific fields.

The most distinguishing feature of the Cell/B.E. processor is that, although all processor elements share memory, their function is specialized into two types: the Power Processor Element (PPE) and the Synergistic Processor Element (SPE). The processor has one PPE and eight SPEs.

The first type of processor element, the PPE, contains a 64-bit Power Architecture core. It complies with the 64-bit Power Architecture and can run 32-bit and 64-bit operating systems and applications.

The second type of processor element, the SPE, is optimized for running compute-intensive SIMD applications; it is not optimized for running an operating system. The SPEs are independent processor elements, each running their own individual application programs or threads. Each SPE has full access to coherent shared memory, including the memory-mapped I/O space.

There is a mutual dependence between the PPE and the SPEs. The SPEs depend on the PPE to run the operating system and, in many cases, the top-level thread control for an application. The PPE depends on the SPEs to provide the bulk of the application performance.

The most significant difference between the SPE and PPE lies in how they access memory. The PPE accesses main storage (the effective-address space) with load and store instructions that move data between main storage and a private register file, the contents of which may be cached.

The SPEs access main storage with direct memory access (DMA) commands that move data and instructions between main storage and a private local memory, called a local store or local storage (LS). An SPE's instruction-fetches and load and store instructions access its private LS rather than shared main storage; the LS has no associated cache. This three-level organization of storage (register file, LS, main storage), with asynchronous DMA transfers between LS and main storage, is a radical break from conventional architecture and programming models because it explicitly parallelizes computation with the transfers of data and instructions that feed computation and store the results of computation in main storage.


Controlling buffer overflows

Now we'll look at:

  • Changing the process-execution flow
  • Overwriting a local variable in 32-bit mode
  • Overwriting a local variable in 64-bit mode

(And in Part 2, we'll cover overwriting a function pointer in both 32- and 64-bit modes and provide some example code.)

Changing process execution flow

Similarly to the x86/x86_64 architectures, a given process's execution flow in Power/CBEA can be changed by the following:

  • Overwriting a local variable that is near the buffer in memory of a given process's virtual address space to change the behavior of the application.
  • Overwriting the saved return-instruction pointer in a stack frame. Upon returning from the called function, execution will resume at the saved return-instruction pointer as specified.
  • Overwriting a function pointer or exception handler that is subsequently executed.

Overwriting a local variable in 32-bit mode

This section discusses how a given process's execution flow can be changed by overwriting a local variable.

The following example is vulnerable to a heap-based buffer overflow:

Listing 1. example1.c (vulnerable to a heap-based buffer overflow)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mystruct {
    unsigned char buffer[16];
    unsigned long cookie;
};

int
main(int argc, char **argv)
{
    struct mystruct *s;

    if ((s = malloc(sizeof(struct mystruct))) == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    s->cookie = 0;

    if (argc > 1)
        strcpy(s->buffer, argv[1]);

    if (s->cookie == 0x42424242) {
        printf("Congratulations! You won a cookie!\n");
        exit(EXIT_SUCCESS);
    }

    printf("Hello world!\n");

    exit(EXIT_SUCCESS);
}

Listing 1 does not validate user-supplied data when copying it to the buffer member of the previously allocated struct mystruct using the strcpy function, resulting in a heap-based buffer overflow. Normal execution of the example writes the "Hello world!" string to stdout.

Listing 2. Compilation and execution of Listing 1
$ gcc -Wall -o example1 example1.c
$ ./example1
Hello world!
$

Figure 1 represents the struct mystruct and its members in the heap segment.

Figure 1. The struct mystruct
Lesser                                                              Greater
addresses                                                         addresses

                         struct mystruct
                         buffer            cookie
                         [                ][    ]

Bottom of                                                            Top of
heap                                                                   heap

The process's execution flow can be changed by overwriting the cookie member of struct mystruct that is located right after the buffer in memory with the 0x42424242 value (BBBB in the ASCII character set).

Listing 3. Overwriting the cookie member
$ ./example1 AAAAAAAAAAAAAAAABBBB
Congratulations! You won a cookie!
$

Figure 2 represents the struct mystruct and its members in the heap segment after the overflow.

Figure 2. The struct mystruct after the overflow
Lesser                                                              Greater
addresses                                                         addresses

                         struct mystruct
                         buffer            cookie
                         [AAAAAAAAAAAAAAAA][BBBB]

Bottom of                                                            Top of
heap                                                                   heap

Overwriting a local variable in 64-bit mode

Now let's talk about how a given process's execution flow can be changed by overwriting a local variable in 64-bit mode. In the C language, only long and pointer data types are changed between 32-bit and 64-bit modes. Any pointer arithmetic should be performed using variables of type long regardless if in 32-bit or 64-bit mode. Pointer assignment should only be performed between other pointers or variables of type long.

The following example is vulnerable to a heap-based buffer overflow:

Listing 4. example2.c (vulnerable to a heap-based buffer overflow)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mystruct {
    unsigned char buffer[16];
    unsigned long cookie;
};

int
main(int argc, char **argv)
{
    struct mystruct *s;

    if ((s = malloc(sizeof(struct mystruct))) == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    s->cookie = 0;

    if (argc > 1)
        strcpy(s->buffer, argv[1]);

    if (s->cookie == 0x4242424242424242) {
        printf("Congratulations! You won a cookie!\n");
        exit(EXIT_SUCCESS);
    }

    printf("Hello world!\n");

    exit(EXIT_SUCCESS);
}

The process's execution flow can be changed by overwriting the cookie member of struct mystruct that is located right after the buffer in memory with the 0x4242424242424242 value (BBBBBBBB in the ASCII character set).

Listing 5. Overwriting the cookie member
$ gcc -Wall -m64 -o example2 example2.c
$ ./example2 AAAAAAAAAAAAAAAABBBBBBBB
Congratulations! You won a cookie!
$

Figure 3 represents the struct mystruct and its members in the heap segment after the overflow.

Figure 3. The struct mystruct after the overflow
Lesser                                                              Greater
addresses                                                         addresses

                       struct mystruct
                       buffer            cookie
                       [AAAAAAAAAAAAAAAA][BBBBBBBB]

Bottom of                                                            Top of
heap                                                                   heap

In the next installment

We've just scratched the surface. In Part 2, I'll show how to overwrite a function pointer and cover assembly components and some juicy shell, network, socket code samples.

Resources

Learn

Get products and technologies

  • With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Linux on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux
ArticleID=362392
ArticleTitle=LoP/Cell/B.E.: Buffer overflow vulnerabilities, Part 1: Understanding buffer overflow issues for Linux on Power-based systems
publish-date=01062009