
Migrating from x86 to PowerPC, Part 6: Add vision to your robot submarine

The better to see you with, my dear

Lewin Edwards (sysadm@zws.com), Author, Freelance
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless security/fire safety device design engineer. Prior to that, he spent five years developing x86, ARM and PA-RISC-based networked multimedia appliances at Digi-Frame Inc. He has extensive experience in encryption and security software and is the author of two books on embedded systems development. He can be reached at sysadm@zws.com.

Summary:  In this episode of the ongoing Kuro Box project, learn how to add a USB camera to the machine. This article includes example Linux code to initialize and read from a USB camera through Video4Linux. Also find a brief introduction to edge detection techniques in captured images.


Date:  21 Jul 2005
Level:  Introductory

The last few articles in this series discussed topics that fall under the heading of "infrastructure." You've seen how the Kuro Box is put together, from a hardware and software perspective, and how to build the scaffolding that lets you talk to it over TCP/IP using a Web browser or FTP client.

Part 6 looks at some code that has a real, observable function in the submarine project, demonstrating how to interface and talk to a digital camera connected to the PowerPC® board over USB. I'll also share some introductory material about image preprocessing techniques used in machine vision applications. (Don't worry; I'll keep the math very light.)

Why such an overpowered processor?

But first, this is an appropriate moment to explain my rationale for using a PowerPC (or, in the original incarnation, an x86 single-board computer) in my submarine. One major downside of these high-end processors in a battery-powered application is that they're much more energy-hungry than a simple 8-bit microcontroller. And anyone who has built a vehicle application like this (say, NASA) knows you can do an awful lot of navigation and autonomous control with an 8-bit chip; Sojourner, the first Mars rover, ran on an Intel® 80C85. (Note, however, that the two rovers currently prowling the surface of Mars are controlled by the RAD6000, a radiation-hardened RS/6000 variant. The Pathfinder probe that carried Sojourner also ran on this processor.)

So, why did I build a hungry 32-bit hippo into my submarine? The main reason is that it's vastly easier to interface standard consumer peripherals to these standardized platforms. It's therefore much cheaper and considerably faster to build a prototype, working demonstration, or one-off production piece around a standard hardware platform running an off-the-shelf operating system. Moreover, while you're in the proof-of-concept phase of any complex embedded project that has a science aspect (as opposed to pure engineering), you probably don't have a perfect understanding of the CPU resources required to carry out the tasks you'd like the device to do. Using an overpowered system lets you get a better feel for the data sizes and processing requirements of the application. You can develop your code rapidly in a high-level language like C, and when you're ready to commercialize the product, you can port it down to a single microcontroller or a group of smaller microcontrollers.

A practical example

A concrete example illustrates my point. I want to run some simple image capture and analysis code in the submarine so I can determine whether an interesting sea beast is swimming past. To do this, I need at least one digital camera from which I can acquire image data for processing. (I spell out the requirement in those precise terms to distinguish it from simply attaching a camera with a computer-controlled shutter button; that's easy to interface and great for acquiring images, but useless for processing them during the mission.)

Now, I know that all the navigation and system maintenance functions in my submarine can run on a reasonably frisky, high-end 8-bit microcontroller. You can certainly take an off-the-shelf CMOS image sensor chip and interface it to the same 8-bit micro, and it's possible (though difficult and restrictive) to squeeze some image-processing functionality into the microcontroller. But getting all this working is an incredibly complex job. First, you need to find somewhere you can actually buy small hobbyist quantities of an image sensor IC (this step is considerably harder than it sounds; the easiest route is usually to buy a camera and cannibalize it). Then you need to work out the hardware and software details of interfacing to the sensor. You probably need to convert the data from the sensor's Bayer pattern to simple RGB; you must handle white balance and exposure; and you might have to solve annoying timing problems to get the chip to play nice. (As a free bonus, the real-time performance of the rest of your code suffers.)
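To make the Bayer step concrete: in a Bayer-pattern sensor, each photosite sits behind a single red, green, or blue filter, so full-color pixels have to be reconstructed from neighboring sites. The sketch below shows the crudest approach, 2x2 binning, which turns each 2x2 cell into one RGB pixel at half the resolution. It assumes an RGGB cell layout (real sensors may use GRBG, GBRG, or BGGR instead) and even image dimensions. The function name is hypothetical. A real pipeline would interpolate at full resolution and would still have to deal with white balance and exposure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a raw Bayer mosaic to packed RGB24 by
 * 2x2 binning. ASSUMES an RGGB layout, i.e. every 2x2 cell is
 *     R G
 *     G B
 * Output is (width/2) x (height/2) pixels, 3 bytes each (R, G, B).
 */
void bayer_rggb_to_rgb24_binned(const uint8_t *raw, int width, int height,
                                uint8_t *rgb)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    for (int y = 0; y < height; y += 2) {
        const uint8_t *top = raw + (size_t)y * width;   /* R G R G ... */
        const uint8_t *bot = top + width;               /* G B G B ... */
        for (int x = 0; x < width; x += 2) {
            uint8_t *out = rgb + ((size_t)(y / 2) * (width / 2) + x / 2) * 3;
            out[0] = top[x];                                /* red */
            out[1] = (uint8_t)((top[x + 1] + bot[x]) / 2);  /* mean of the two greens */
            out[2] = bot[x + 1];                            /* blue */
        }
    }
}
```

Even this simple version needs a full scanline pair in RAM and a division per output pixel, which adds up quickly on an 8-bit part.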

By this time, you're also probably running up against RAM and ROM size limits inside your 8-bit microcontroller, so you have to start juggling textbook algorithms so that they can work in your system, which is a huge hassle. You don't want to be optimizing your code at the front end of a research project -- you want to develop and test your algorithms, decide if the idea even works at all, and only then make an informed decision as to whether to optimize into a smaller chip or plunk down the cash for a high-end part.

Therefore, it's a much more efficient use of your time to buy a fairly generic hardware platform with an off-the-shelf OS (RTOS or non-RTOS according to your needs) because it will come with driver support for a bunch of standard consumer peripherals. You can go out and buy these cheap peripherals, plug them right in, and start working.

Please note, by the way, that I don't at all advocate using consumer-grade equipment in a fielded commercial product. I've got some years of bitter experience with this approach to manufacturing; be advised that consumer peripherals change every few months, and it is a very frustrating exercise to keep shipping a consistent product if you're relying on a volatile component. Servicing field returns is even harder, because units produced in month A won't contain the same parts as units from month A+1.


Introducing the USB pencam

The specific camera I have chosen to use for this article is a Jazz Digi-Stix JDC11 "pencam." The same code I provide here will, however, work practically unmodified on many other USB and older parallel port cameras supported by Linux™.

The JDC11 is a very cheap and simple USB-connected camera based on the STV0680 chip (literally dozens of other cameras use the same chip and are practically identical to the JDC11). The STV0680 is one of several Webcam chipsets supported natively by the Linux kernel. It supports the capture of color images at a resolution up to PAL CIF (352x288 pixels, or a quarter of standard TV resolution; 101 kilopixels if you prefer that sort of metric). This might sound horribly low compared to the multimegapixel snapshots expected from modern digital cameras, but in fact this image size is more than satisfactory for tasks like simple visual navigation and motion detection.

Important: Like most pencams, the JDC11 can function either as a tethered Webcam-type device (no batteries required) or as a stand-alone digital camera, running off two AAA cells. If you have the power switch in the "on" position, the camera assumes you want to run in stand-alone mode. To run the code in this article, you want the camera in Webcam mode, so leave the power switch in the "off" position and don't install batteries.

Note that by default, Kuro Box's software bundle does not include the driver modules for USB cameras. Before going any further, make sure that you have followed the instructions in the third article of this series, where I described how to install the complete set of modules from linkstation.yi.org. Remember to run depmod -qa to fix module dependencies after you install the updates.

You'll also need to create nodes in /dev to access the camera driver. The video capture devices /dev/video0 through /dev/video3 are character devices with major number 81 and minor numbers 0 through 3. Unfortunately, the software distribution on Kuro Box doesn't include mknod(1), so you need to either download and build that utility or copy ready-made device nodes from a working Linux system. In the source code archive linked in Resources, I've included a tarball called devices.tar.gz. Simply copy it into the root directory of your Kuro Box and run tar zxvf on it (from the root) to create the necessary nodes.

Once the drivers are installed, you can load them by plugging in the camera and, if necessary, force-loading the drivers using modprobe stv680 ; modprobe videodev. Or you can simply restart the Kuro Box with the camera connected.


Video4Linux and you

Video input devices are supported in Linux through the Video4Linux APIs (commonly abbreviated as V4L). Unfortunately, these APIs don't seem to be very well documented. The canonical reference "document" for this programming interface is the source code for xawtv; if you ask around for help with V4L you'll invariably be told to read the xawtv source. The problem with this is that xawtv is a fiendishly complicated piece of software with a lot of features and workarounds for device-specific oddities, and it's not exactly easy to learn from it.

A further difficulty is that there are two flavors of V4L: the original version (V4L or V4L1) and a newer version called V4L2. The newer version is included in kernel 2.5 and beyond. I'm dealing here with V4L1, mainly because of the vintage of the kernel shipped with Kuro Box and the difficulty and risk of upgrading to a 2.6.x kernel. (V4L2 is, mercifully, somewhat better documented than V4L1, but it isn't yet the de facto standard.)

Here is a thumbnail description of how to use V4L to acquire an image from a USB video input device (note that all of these structures are defined in linux/videodev.h):

  1. Open the appropriate /dev/video device (video0 by default).
  2. Use the VIDIOCGCAP ioctl to populate a video_capability structure with information about the device. Among other information, this will tell you the range of supported resolutions and whether the device supports audio capture.
  3. Use the VIDIOCGPICT ioctl to populate a video_picture structure with information about the device's current settings. The code I provide here requires that the device deliver image data in 24bpp RGB format. It ensures this by checking the vp.palette member. If the palette is anything other than VIDEO_PALETTE_RGB24, the code sets it to that value and passes the same video_picture structure to the VIDIOCSPICT ioctl. Note that most V4L devices don't support this format in hardware, since it's byte-inefficient. However, the underlying driver can do the conversion in software with surprisingly low processing overhead. Since you have to do this conversion step anyway, you may as well let the driver do it. In this step you might also want to modify other picture parameters, such as white level, hue, and brightness. Again, not all devices have hardware support for these settings.
  4. Create a video_window structure describing the desired image capture size. My code simply picks the largest possible capture window (as returned in Step 2). Pass this structure to the VIDIOCSWIN ioctl.
  5. At this point, do a sanity check by using the VIDIOCGWIN ioctl to verify that your window request was honored. This is essential because not all devices support every conceivable combination of resolutions and color formats. Just because the device told you that it supports a resolution of up to (say) 1024x768 pixels, and that it also supports 24bpp RGB data, does not mean that it supports both of those options at once. You might be limited to black-and-white capture if you bump the resolution up to 1024x768, for example.
  6. Allocate RAM for the captured image.
  7. Use read(2) on the file descriptor opened in Step 1 to gather the frame data. The number of bytes to read is width x height x 3 (one byte each for red, green, and blue); for a 352x288 CIF frame, that's 304,128 bytes.
  8. You may continue reading frames until you don't need any more.
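The steps above can be sketched in C as follows. This is a minimal illustration, not the vidcap source from the archive, and the function names are hypothetical. It assumes the V4L1 definitions from linux/videodev.h (VIDIOCGCAP, VIDIOCGPICT, VIDIOCSPICT, VIDIOCSWIN, VIDIOCGWIN, VIDEO_PALETTE_RGB24). That header has since been removed from mainline kernel headers, so the capture function is compiled only when the header can be found. On an older compiler without __has_include, pass -DHAVE_V4L1 yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Step 7: bytes in one frame of 24bpp RGB (3 bytes per pixel). */
size_t rgb24_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    return (size_t)width * (size_t)height * 3;
}

/* Build the capture code only where the V4L1 header exists (older
 * kernels such as the one shipped with the Kuro Box). */
#if !defined(HAVE_V4L1) && defined(__has_include)
#  if __has_include(<linux/videodev.h>)
#    define HAVE_V4L1 1
#  endif
#endif

#ifdef HAVE_V4L1
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev.h>

/*
 * Hypothetical helper following Steps 1-7: grab one RGB24 frame at the
 * largest size the device reports. Returns a malloc()ed buffer (caller
 * frees it) and stores the size in *w and *h, or returns NULL on error.
 */
unsigned char *v4l1_grab_rgb24(const char *dev, int *w, int *h)
{
    struct video_capability cap;
    struct video_picture pic;
    struct video_window win = {0};
    unsigned char *buf = NULL;
    size_t n;
    int fd = open(dev, O_RDWR);                            /* Step 1 */

    if (fd < 0)
        return NULL;
    if (ioctl(fd, VIDIOCGCAP, &cap) < 0)                   /* Step 2 */
        goto fail;
    if (ioctl(fd, VIDIOCGPICT, &pic) < 0)                  /* Step 3 */
        goto fail;
    if (pic.palette != VIDEO_PALETTE_RGB24) {
        pic.palette = VIDEO_PALETTE_RGB24;
        pic.depth = 24;
        if (ioctl(fd, VIDIOCSPICT, &pic) < 0)
            goto fail;
    }
    win.width = cap.maxwidth;                              /* Step 4 */
    win.height = cap.maxheight;
    if (ioctl(fd, VIDIOCSWIN, &win) < 0)
        goto fail;
    /* Step 5: read back what the driver actually accepted. */
    if (ioctl(fd, VIDIOCGWIN, &win) < 0 || ioctl(fd, VIDIOCGPICT, &pic) < 0 ||
        pic.palette != VIDEO_PALETTE_RGB24)
        goto fail;
    n = rgb24_frame_bytes((int)win.width, (int)win.height);
    buf = malloc(n);                                       /* Step 6 */
    if (buf == NULL || read(fd, buf, n) != (ssize_t)n)     /* Step 7 */
        goto fail;
    close(fd);
    *w = (int)win.width;
    *h = (int)win.height;
    return buf;

fail:
    free(buf);
    close(fd);
    return NULL;
}
#endif /* HAVE_V4L1 */
```

Step 8 amounts to calling read() again on the same descriptor for each further frame, reusing the buffer. The sketch closes the device after one frame for simplicity.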

Note that this is not a completely generic description of how to acquire an image from just any old V4L-supported device. Many V4L input devices work differently from the above: They acquire data into an internal buffer (often in your video card's frame buffer RAM), which you have to map into your process's address space. The great thing about the USB cameras is that they can be accessed very simply, as you see above.

A simple program to grab video

If you extract the source code tarball linked in Resources, you'll find two directories: vidcap and vidproc. Ignore vidproc for the moment, and build the application in vidcap. This application simply looks at /dev/video0, gives you a bunch of information about the device attached there, captures a single image, and saves it in Windows® BMP format to a filename specified on the command line.

I use BMP because it's a very simple file format (in other words, it's easy for other little applets to work with BMPs), and it's lossless. JPEG files would be much smaller, but the quantization noise would make subsequent edge-detection steps very unreliable. Here's a quick description of the BMP file format header.


Listing 1. The BMP header

#include <stdint.h>

// All multi-byte fields are little-endian. On a big-endian CPU such as the
// Kuro Box's PowerPC, write them byte by byte (or byte-swap them) rather
// than fwrite()ing this structure directly. The structure must also be
// packed so the compiler inserts no padding; the attribute is GCC syntax.
struct bmp_header {
    char     signature[2];      // 0x00 Type header, "BM"
    uint32_t size;              // 0x02 Size of file, including this header and all data
    uint16_t reserved1;         // 0x06 Reserved (0x00)
    uint16_t reserved2;         // 0x08 Reserved (0x00)
    uint32_t bitsoffset;        // 0x0A Offset of bitmap data from start of file (0x36)
    uint32_t headersize;        // 0x0E Size of BITMAPINFOHEADER (0x28)
    int32_t  width;             // 0x12 Horizontal pixel count
    int32_t  height;            // 0x16 Vertical pixel count
    uint16_t planes;            // 0x1A Number of planes (0x0001 for 24-bit BMPs)
    uint16_t bitsperpixel;      // 0x1C Number of bits per pixel (0x0018 for 24-bit BMPs)
    uint32_t compression;       // 0x1E Compression method (0)
    uint32_t imagesize;         // 0x22 Size of bitmap data area, excluding header
    int32_t  xres;              // 0x26 Target device x-resolution, pixels per meter (0x0b12, about 72 dpi)
    int32_t  yres;              // 0x2A Target device y-resolution, pixels per meter (0x0b12)
    uint32_t color_indices;     // 0x2E Number of color indices (N/A - use 0)
    uint32_t important_indices; // 0x32 Number of "important" color indices (N/A - use 0)
} __attribute__((packed));      // 0x36 (54) bytes in total

For 24-bit BMPs, the actual bitmap data comes next, in left-to-right order with the bottom scanline first. The first byte is the BLUE value, the next byte the GREEN value, and the next byte the RED value for the leftmost pixel of the bottom scanline. This continues to the rightmost pixel of that scanline, followed by the leftmost pixel of the second-from-bottom scanline, and so forth. Each scanline is padded to a multiple of four bytes. At the JDC11's 352-pixel width, a row is exactly 1,056 bytes, so no padding is needed. BMPs at color depths other than 24bpp have a more complex family of formats, which I won't get into here. (Note also that OS/2® has a somewhat different BMP format that isn't compatible with Windows tools.)
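Putting the header and pixel layout together, here is a minimal sketch of a 24bpp BMP encoder that writes into a memory buffer. It is hypothetical and is not the bmplib.c from the archive. It serializes each field byte by byte, so it gives the same little-endian file on the big-endian PowerPC as on x86. It assumes the captured pixels arrive top-down in R, G, B order, as described above. Some V4L1 drivers may deliver RGB24 in B, G, R order instead, in which case the per-pixel swap should be dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store little-endian fields byte by byte, so this also works on a
 * big-endian host such as the Kuro Box's PowerPC. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xffff));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* One scanline: 3 bytes per pixel, padded to a multiple of 4 bytes. */
static size_t bmp24_row_bytes(int width)
{
    return ((size_t)width * 3 + 3) & ~(size_t)3;
}

/* Total file size: 54-byte header followed by the padded scanlines. */
size_t bmp24_file_size(int width, int height)
{
    return 54 + bmp24_row_bytes(width) * (size_t)height;
}

/* Hypothetical helper: encode top-down R,G,B pixels into a 24bpp BMP.
 * out must hold bmp24_file_size(width, height) bytes. */
void bmp24_encode(const uint8_t *rgb, int width, int height, uint8_t *out)
{
    assert(width > 0 && height > 0);
    size_t row = bmp24_row_bytes(width);
    uint32_t image = (uint32_t)(row * (size_t)height);

    memset(out, 0, 54);                     /* reserved, compression, color counts = 0 */
    out[0] = 'B';
    out[1] = 'M';
    put_le32(out + 0x02, 54 + image);       /* file size */
    put_le32(out + 0x0A, 54);               /* offset of bitmap data */
    put_le32(out + 0x0E, 40);               /* BITMAPINFOHEADER size */
    put_le32(out + 0x12, (uint32_t)width);
    put_le32(out + 0x16, (uint32_t)height); /* positive: bottom scanline first */
    put_le16(out + 0x1A, 1);                /* planes */
    put_le16(out + 0x1C, 24);               /* bits per pixel */
    put_le32(out + 0x22, image);            /* bitmap data size */
    put_le32(out + 0x26, 0x0b12);           /* x pixels per meter (~72 dpi) */
    put_le32(out + 0x2A, 0x0b12);           /* y pixels per meter */

    for (int y = 0; y < height; y++) {
        /* File row y holds image row (height - 1 - y). */
        const uint8_t *src = rgb + (size_t)(height - 1 - y) * (size_t)width * 3;
        uint8_t *dst = out + 54 + (size_t)y * row;
        for (int x = 0; x < width; x++) {
            dst[3 * x + 0] = src[3 * x + 2];   /* blue */
            dst[3 * x + 1] = src[3 * x + 1];   /* green */
            dst[3 * x + 2] = src[3 * x + 0];   /* red */
        }
        memset(dst + (size_t)width * 3, 0, row - (size_t)width * 3);  /* padding */
    }
}
```

A 2x2 image shows the padding at work: each 6-byte row grows to 8 bytes, so the file is 54 + 16 = 70 bytes.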

By the way, avid readers should be aware that the bmplib.c library I've included in these source files is a slightly newer version than the code that appeared in my second book; I optimized the write speed considerably.


Why BMP?

Uncompressed BMPs are almost nothing more than a dump of display memory with a little header that describes the dimensions. You can load a BMP into RAM without doing any decoding whatsoever, manipulate the pixel data, and write it out again -- it's a good choice for little proto-applets that do algorithmic magic on images.

TIFF doesn't give great compression ratios on photographic images, and both TIFF and PNG are complex to decode. Of course, libraries can do the dirty work, but I don't see a reason to invoke them in this sort of situation.


From pixels to objects

So, now that you have a means of acquiring images, let's see what you can do with them. The prices of CMOS image sensors and of microcontrollers with enough RAM to work with images are both dropping rapidly, so many simple machine vision applications are poised to appear in the consumer arena. I have seen a demonstration of a chipset, costing less than US$5 in production quantities, that can recognize shapes and colors and announce them. For example, you can hold up a green triangle and the unit will say "Green triangle."

My specific area of interest, however, is shape and motion processing. You can find a lot of rather dry and complicated literature on shape recognition. I won't get into that, because the main purpose of this article is to introduce you to acquiring the raw image data. I would, however, like to focus on one of the building blocks of shape processing: edge detection. You'll find a simple demo of the sort of preprocessor you might build in the vidproc directory.

Here's a sample image (a picture of a floppy disk) before and after being processed:


Figure 1. Floppy in good lighting

Figure 2. Floppy in good lighting, processed

Now here's the same scene, shot in lower light, with an LED flashlight aimed at part of the picture to confuse things.


Figure 3. Floppy in poor lighting

Figure 4. Floppy in poor lighting, processed

The important points to glean from these images are:

  • The object has been successfully picked out of its background despite the fact that the background has noise introduced by the low-quality image sensor and the uneven lighting.
  • The low-light image looks very different from the bright-light image before processing, but after processing, the two look almost identical (there's slightly more noise in the low-light image, but nothing major).
  • The LED flashlight had absolutely no effect on the processed images despite being clearly visible in the unprocessed low-light image.

To see how this was achieved, look at the FindEdgeScanline function in edge.c. This function works by looking at adjacent pixels in a scanline in groups of three. First, it converts the color image to an unweighted grayscale value, which is a simple arithmetic average of the red, green, and blue values. (More accurate results could be obtained by using a weighted average calibrated for the color sensitivity of the image sensor.)
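A minimal sketch of that first step follows, assuming packed R, G, B input. The function names are hypothetical. The weighted variant takes per-channel weights in 1/256 units. The 77/150/29 set used in the example below is an integer approximation of the ITU-R BT.601 luma coefficients. It is only a generic starting point, not a calibration for the JDC11's sensor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unweighted grayscale: plain average of R, G and B. */
void rgb24_to_gray_avg(const uint8_t *rgb, size_t npixels, uint8_t *gray)
{
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < npixels; i++)
        gray[i] = (uint8_t)((rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3);
}

/*
 * Weighted grayscale. wr, wg and wb are per-channel weights in 1/256
 * units and must sum to 256, so white (255,255,255) stays at 255.
 * Calibrating them for a particular sensor is left to the reader.
 */
void rgb24_to_gray_weighted(const uint8_t *rgb, size_t npixels, uint8_t *gray,
                            unsigned wr, unsigned wg, unsigned wb)
{
    assert(wr + wg + wb == 256);
    for (size_t i = 0; i < npixels; i++)
        gray[i] = (uint8_t)((wr * rgb[3 * i] + wg * rgb[3 * i + 1] +
                             wb * rgb[3 * i + 2]) >> 8);
}
```

Both versions use only integer arithmetic, which keeps them cheap on a processor without a fast FPU and makes them easy to port to a microcontroller later.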

Next, it turns the three pixels into two derivative values representing the gradient between pixels 1 and 2, and the gradient between pixels 2 and 3. At this point, a special "fudge" factor is added; if the absolute value of the gradient is less than 8, it is deemed to be zero. This is kind of a sharpness factor; the larger the fudge factor, the sharper an edge has to be to remain visible in the processed output. (This fudge factor erased the smooth gradient of the LED flashlight beam).

Finally, the code looks at the difference between these two gradients; in other words, it finds the second derivative. If the sign of this second derivative changes, the corresponding output pixel is made black, indicating an edge. If there is no change in sign, the output pixel is made white. The end result is a fairly good separation of objects, except for edges parallel to the top and bottom of the frame. Because the function only compares pixels along a scanline, a horizontal edge produces no change along the row and goes undetected. Basically, what we are doing is taking the second derivative (arithmetically), quantizing the result, and looking for a change of sign.
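Here is a minimal sketch of that scanline edge finder under one reading of the description, working on an already-grayscale row. It quantizes each first derivative with the fudge factor of 8, takes the second derivative at each interior pixel, and marks a pixel black where that second derivative changes sign from the previous pixel. This is a hypothetical reimplementation, not the FindEdgeScanline code from edge.c. The 0/255 output values and the choice of which pixel in the pair gets marked are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EDGE_FUDGE 8  /* gradients with |d| < EDGE_FUDGE count as zero */

/* First derivative between two neighbours, with the fudge dead zone. */
static int fudged_gradient(int a, int b)
{
    int d = b - a;
    return abs(d) < EDGE_FUDGE ? 0 : d;
}

/* Second derivative at pixel i (needs 1 <= i <= n-2). */
static int second_derivative(const uint8_t *g, int i)
{
    return fudged_gradient(g[i], g[i + 1]) - fudged_gradient(g[i - 1], g[i]);
}

/*
 * Hypothetical scanline edge finder. gray holds n grayscale pixels;
 * out gets 0 (black) at edges and 255 (white) elsewhere.
 */
void find_edges_scanline(const uint8_t *gray, int n, uint8_t *out)
{
    assert(gray != NULL && out != NULL && n > 0);

    for (int i = 0; i < n; i++)
        out[i] = 255;
    if (n < 4)
        return;  /* need two second derivatives to see a sign change */

    int prev = second_derivative(gray, 1);
    for (int i = 2; i <= n - 2; i++) {
        int cur = second_derivative(gray, i);
        if ((prev > 0 && cur < 0) || (prev < 0 && cur > 0))
            out[i] = 0;  /* sign change between pixels i-1 and i: edge */
        prev = cur;
    }
}
```

For a step such as {0, 0, 0, 100, 100, 100}, this marks index 3 as an edge. A gentle ramp such as {10, 14, 18, 22, 26, 30} mimics the flashlight beam: every 4-level step falls inside the fudge factor, so nothing is marked. Running the same routine down each column and combining the results is one way to recover the horizontal edges that a row-only scan misses.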

Great! Now you have a submarine that can see the world around it, and that you can connect to (and, by implication, control) using a Web browser. In the next article, you'll see how to start building some circuits, so limber up your soldering iron. I'll introduce the "real" hardware block diagram of the E-2 submarine and start building both sides of the interface firmware that drives the sensors and actuators in the vehicle.



Download

Description    Name                      Size   Download method
Source code    pa-migrate6code.tar.gz    7 KB   HTTP



Resources

