The last few articles in this series discussed topics that fall under the heading of "infrastructure." You've seen how the Kuro Box is put together, from a hardware and software perspective, and how to build the scaffolding that lets you talk to it over TCP/IP using a Web browser or FTP client.
Part 6 looks at some code that has a real, observable function in the submarine project, demonstrating how to interface and talk to a digital camera connected to the PowerPC® board over USB. I'll also share with you some introductory material about image preprocessing techniques used in machine vision applications. (Don't worry, I'll keep the math very light).
Why such an overpowered processor?
But first, this is an appropriate moment to explain further my rationale for using a PowerPC (or, in the original incarnation, an x86 single-board computer) in my submarine. One major downside of these high-end processors in a battery-powered application is that they're much more energy-hungry than a simple 8-bit microcontroller. And anyone who has built a vehicle application like this (say, NASA) knows you can do an awful lot of navigation and autonomous control with an 8-bit chip; Sojourner, the first Mars rover, ran on an Intel® 80C85. (Note, however, that the two rovers currently prowling the surface of Mars are controlled by the RAD 6000, a radiation-hardened RS/6000 variant. The Pathfinder probe that carried Sojourner also ran on this processor).
So, why did I build a hungry 32-bit hippo into my submarine? The main reason is that it's vastly easier to interface standard consumer peripherals to these standardized platforms. This means that it's much cheaper and considerably faster to build a prototype, working demonstration, or one-off production piece if you build it around a standard hardware platform running an off-the-shelf operating system. Moreover, while you're in the proof-of-concept phase of any complex embedded project that has a science aspect (as opposed to pure engineering), you probably don't have a perfect understanding of the CPU resources required to carry out the tasks you'd like the device to do. Using an over-powered system allows you to get a better feel for the data sizes and processing requirements in the application. You can develop your code rapidly in a high-level language like C, and when you're ready to commercialize the product, you can port it down into a single microcontroller or a group of smaller microcontrollers.
A practical example of this illustrates my point. I want to be able to run some simple image capture and analysis code in the submarine so I can determine if an interesting sea beast is swimming past. To do this, I need at least one digital camera from which I can acquire image data for processing. (The reason I spelled out the requirement in those precise terms is to distinguish it from the path of simply attaching a camera with a computer-controlled shutter button -- that's easy to interface and great for acquiring images, but useless for processing them during the mission).
Now, I know that all the navigation and system maintenance functions in my submarine can run on a reasonably frisky, high-end 8-bit microcontroller. You can certainly take an off-the-shelf CMOS image sensor chip and interface it to the same 8-bit micro. It's possible (though difficult and restrictive) to squeeze some image-processing functionality into the microcontroller. But getting this all working is an incredibly complex job. First, you need to find someplace where you can actually buy small hobbyist quantities of an image sensor IC (this step is considerably harder than it sounds; the easiest route is usually to buy a camera and cannibalize it). Then, you need to work out the hardware and software details of interfacing to the sensor. You probably need to convert the data from Bayer pattern to simple RGB; you must handle white balance and exposure, and you might have to solve annoying timing problems in order to get the chip to play nice. (As a free bonus, the realtime performance of the rest of your code suffers).
By this time, you're also probably running up against RAM and ROM size limits inside your 8-bit microcontroller, so you have to start juggling textbook algorithms so that they can work in your system, which is a huge hassle. You don't want to be optimizing your code at the front end of a research project -- you want to develop and test your algorithms, decide if the idea even works at all, and only then make an informed decision as to whether to optimize into a smaller chip or plunk down the cash for a high-end part.
Therefore, it's a much more efficient use of your time to buy a fairly generic hardware platform with an off-the-shelf OS (RTOS or non-RTOS according to your needs) because it will come with driver support for a bunch of standard consumer peripherals. You can go out and buy these cheap peripherals, plug them right in, and start working.
Please note, by the way, that I don't at all advocate using consumer-grade equipment in a fielded commercial product. I've got some years of bitter experience with this approach to manufacturing; be advised that consumer peripherals change every few months, and it is a very frustrating exercise to keep shipping a consistent product if you're relying on a volatile component. Servicing field returns is even harder, because units produced in month A won't contain the same parts as units from month A+1.
The specific camera I have chosen to use for this article is a Jazz Digi-Stix JDC11 "pencam." The same code I provide here will, however, work practically unmodified on many other USB and older parallel port cameras supported by Linux™.
The JDC11 is a very cheap and simple USB-connected camera based on the STV0680 chip (literally dozens of other cameras use the same chip and are practically identical to the JDC11). The STV0680 is one of several Webcam chipsets supported natively by the Linux kernel. It supports the capture of color images at a resolution up to PAL CIF (352x288 pixels, or a quarter of standard TV resolution; 101 kilopixels if you prefer that sort of metric). This might sound horribly low compared to the multimegapixel snapshots expected from modern digital cameras, but in fact this image size is more than satisfactory for tasks like simple visual navigation and motion detection.
Important: Like most pencams, the JDC11 can function either as a tethered Webcam-type device (no batteries required) or as a stand-alone digital camera, running off two AAA cells. If you have the power switch in the "on" position, the camera assumes you want to run in stand-alone mode. To run the code in this article, you want the camera in Webcam mode, so leave the power switch in the "off" position and don't install batteries.
Note that by default, Kuro Box's software bundle does not include the
driver modules for USB cameras. Before going any further, please ensure
that you followed the instructions in the third article of this series,
where I described how to install the complete set of modules from
linkstation.yi.org; remember to run depmod -qa to fix module dependencies after
you install the updates.
You'll also need to create nodes in /dev in order to be able to access the
camera driver. The video capture devices /dev/video0 through video3 are
character devices, major 81, minor 0 through 3. Unfortunately, however, the
software distribution on Kuro Box doesn't include mknod(1), so you either
need to download and build that utility, or steal ready-made device nodes
off a working Linux system. In the source code archive linked in
Resources,
I've included a tarball called devices.tar.gz -- simply copy it into the
root directory of your Kuro and tar zxvf it (from the root) in order to
create the necessary nodes.
Once the drivers are installed, you can load them by plugging in the camera
and, if necessary, force-loading the drivers using modprobe stv680 ;
modprobe videodev. Or you can simply restart the Kuro Box with the camera
connected.
Video input devices are supported in Linux through the Video4Linux APIs (commonly abbreviated as V4L). Unfortunately, these APIs don't seem to be very well documented. The canonical reference "document" for this programming interface is the source code for xawtv; if you ask around for help with V4L you'll invariably be told to read the xawtv source. The problem with this is that xawtv is a fiendishly complicated piece of software with a lot of features and workarounds for device-specific oddities, and it's not exactly easy to learn from it.
A further difficulty is that there are two flavors of V4L: the original version (V4L or V4L1), and a newer version called V4L2. The newer version is included in kernel 2.5 and beyond. I'm dealing here with V4L1, mainly because of the vintage of the kernel shipped with Kuro Box and the difficulty and risk of upgrading to a 2.6.x kernel. (V4L2 is, mercifully, somewhat better documented than V4L1, but it's not yet the de facto standard).
Here is a thumbnail description of how to use V4L to acquire an image from a USB video input device (note that all of these structures are defined in linux/videodev.h):
1. Open the appropriate /dev/video device (video0 by default).
2. Use the VIDIOCGCAP ioctl to populate a video_capability structure with information about the device. Among other information, this will tell you the range of supported resolutions and whether the device supports audio capture.
3. Use the VIDIOCGPICT ioctl to populate a video_picture structure with information about the device's current settings. The code I provide here requires that the device can deliver image data in 24bpp RGB format. We ensure this by checking the vp.palette member and if it's something other than VIDEO_PALETTE_RGB24, we set it to that value and pass the same video_picture structure to the VIDIOCSPICT ioctl. Note that most V4L devices don't support this format in hardware, since it's byte-inefficient. However, the underlying driver can do the conversion in software with a surprisingly low processing overhead. Since you have to do this conversion step anyway, you may as well let the driver do it. By the way, in this step you might also want to modify some other picture parameters such as white level, hue, brightness, and so on. Again, not all devices have hardware support for these settings.
4. Create a video_window structure describing the desired image capture size. My code simply picks the largest possible capture window (as returned in Step 2). Pass this structure to the VIDIOCSWIN ioctl.
5. At this point, you should do a sanity check by using the VIDIOCGWIN ioctl to verify that your window request was honored. The reason this is essential is because not all devices support every conceivable combination of resolutions and color formats. Simply because the device told you that it supports a resolution of up to (say) 1024x768 pixels and that it also supports 24bpp RGB data does not imply that it supports both of those options at once. You might be limited to black and white capture if you bump the resolution up to 1024x768, for example.
6. Allocate RAM for the captured image.
7. Use read(2) on the file descriptor opened in Step 1 to gather the frame data. The data size to be read, in bytes, is width x height x 3 (RGB).
8. You may continue reading frames until you don't need any more.
Note that this is not a completely generic description of how to acquire an image from just any old V4L-supported device. Many V4L input devices work differently from the above: They acquire data into an internal buffer (often in your video card's frame buffer RAM), which you have to map into your process's address space. The great thing about the USB cameras is that they can be accessed very simply, as you see above.
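The last steps of the list (allocating the buffer and reading a frame) can be sketched in a few lines of C. This is a minimal illustration, not the code from the vidcap directory: the function names rgb24_frame_size and grab_rgb24_frame are invented here, and the sketch assumes the descriptor has already been opened and configured for 24bpp RGB at the given width and height. Treating a short read as a failure is also an assumption; the article does not say how a given driver behaves in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Bytes in one 24bpp RGB frame: width x height x 3. */
size_t rgb24_frame_size(unsigned int width, unsigned int height)
{
    return (size_t)width * height * 3;
}

/*
 * Allocate a buffer and read one complete frame from fd into it.
 * Returns the buffer (the caller frees it), or NULL if allocation
 * fails, read(2) fails, or fewer bytes than a full frame arrive.
 */
unsigned char *grab_rgb24_frame(int fd, unsigned int width, unsigned int height)
{
    size_t size = rgb24_frame_size(width, height);
    unsigned char *frame;
    ssize_t got;

    assert(size > 0);
    frame = malloc(size);
    if (frame == NULL)
        return NULL;

    got = read(fd, frame, size);
    if (got < 0 || (size_t)got != size) {
        free(frame);
        return NULL;
    }
    return frame;
}
```

At the camera's maximum resolution of 352x288, rgb24_frame_size returns 304,128 bytes per frame. To keep capturing, call grab_rgb24_frame again on the same descriptor, or reuse one buffer and call read(2) directly.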
A simple program to grab video
If you extract the source code tarball linked in Resources , you'll find two directories: vidcap and vidproc. Ignore vidproc for the moment, and build the application you'll find in vidcap. This application simply looks at /dev/video0, gives you a bunch of information about the device attached there, captures a single image, and saves it in Windows® BMP format to a filename specified on the command line.
I use BMP because it's a very simple file format (in other words, it's easy for other little applets to work with BMPs), and it's lossless. JPEG files would be much smaller, but the quantization noise would make subsequent edge-detection steps very unreliable. Here's a quick description of the BMP file format header.
Listing 1. The BMP header
    00 char signature[] = "BM";          // type header
    02 unsigned int size;                // (32 bits) Size of file, including this header and all data
    06 unsigned short reserved1;         // (16 bits) Reserved! (0x00)
    08 unsigned short reserved2;         // (16 bits) Reserved! (0x00)
    0A unsigned int bitsoffset;          // (32 bits) Offset of bitmap data from start of file (0x36)
    0E unsigned int headersize;          // (32 bits) Size of BITMAPINFOHEADER (0x28)
    12 unsigned int width;               // (32 bits) Horizontal pixel count
    16 unsigned int height;              // (32 bits) Vertical pixel count
    1A unsigned short planes;            // (16 bits) Number of planes (0x0001 for 24-bit BMPs)
    1C unsigned short bitsperpixel;      // (16 bits) Number of bits per pixel (0x0018 for 24-bit BMPs)
    1E unsigned int compression;         // (32 bits) Compression method (0)
    22 unsigned int imagesize;           // (32 bits) Size of bitmap data area, excluding header
    26 unsigned int xres;                // (32 bits) Target device x-resolution (0x0b12)
    2A unsigned int yres;                // (32 bits) Target device y-resolution (0x0b12)
    2E unsigned int color_indices;       // (32 bits) Number of color indices (N/A - use 0)
    32 unsigned int important_indices;   // (32 bits) Number of "important" color indices (N/A - use 0)
For 24-bit BMPs, the actual bitmap data is stored next, in left-to-right order with the bottom scanline first. The first byte is BLUE, the next byte is GREEN, and the next byte is RED data for the leftmost pixel of bottom scanline, then so on to the rightmost pixel of the scanline, followed by the leftmost pixel of the second-from-bottom scanline, and so forth. BMPs at color depths other than 24bpp have a more complex family of formats, which I won't get into here. (Note also that OS/2® has a somewhat different BMP format that isn't compatible with Windows tools).
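To make the layout concrete, here is a minimal sketch of building the Listing 1 header and reordering one scanline. It is not the bmplib.c code from the source archive, and all of the function names are invented for illustration. Two details go beyond the description above and are worth stating as such: BMP fields are stored least-significant byte first, so on a big-endian PowerPC they have to be written byte by byte rather than dumped from a C structure, and each BMP scanline is padded to a multiple of four bytes (no padding is needed at 352 pixels wide). The sketch also assumes the input buffer holds R, G, B bytes in that order; check what your driver actually delivers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BMP_HEADER_SIZE 54   /* 14-byte file header + 40-byte BITMAPINFOHEADER */

/* Store 16- and 32-bit values least-significant byte first. */
void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Bytes per BMP scanline: 3 per pixel, rounded up to a multiple of 4. */
uint32_t bmp_row_bytes(uint32_t width)
{
    return (width * 3 + 3) & ~(uint32_t)3;
}

/* Fill hdr with the Listing 1 fields for a 24bpp image. */
void bmp_build_header(unsigned char hdr[BMP_HEADER_SIZE],
                      uint32_t width, uint32_t height)
{
    uint32_t imagesize = bmp_row_bytes(width) * height;

    assert(width > 0 && height > 0);
    memset(hdr, 0, BMP_HEADER_SIZE);        /* reserved/unused fields stay 0 */
    hdr[0] = 'B';
    hdr[1] = 'M';
    put_le32(hdr + 0x02, BMP_HEADER_SIZE + imagesize);  /* size */
    put_le32(hdr + 0x0A, BMP_HEADER_SIZE);              /* bitsoffset (0x36) */
    put_le32(hdr + 0x0E, 40);                           /* headersize (0x28) */
    put_le32(hdr + 0x12, width);
    put_le32(hdr + 0x16, height);
    put_le16(hdr + 0x1A, 1);                            /* planes */
    put_le16(hdr + 0x1C, 24);                           /* bitsperpixel */
    put_le32(hdr + 0x22, imagesize);
    put_le32(hdr + 0x26, 0x0b12);                       /* xres */
    put_le32(hdr + 0x2A, 0x0b12);                       /* yres */
}

/*
 * Convert one RGB scanline into BMP order (B, G, R per pixel).
 * out must hold bmp_row_bytes(width) bytes; padding bytes are zeroed.
 */
void bmp_pack_row(unsigned char *out, const unsigned char *rgb, uint32_t width)
{
    uint32_t x;

    memset(out, 0, bmp_row_bytes(width));
    for (x = 0; x < width; x++) {
        out[x * 3 + 0] = rgb[x * 3 + 2];    /* blue */
        out[x * 3 + 1] = rgb[x * 3 + 1];    /* green */
        out[x * 3 + 2] = rgb[x * 3 + 0];    /* red */
    }
}
```

To write a file, emit the 54 header bytes and then call bmp_pack_row for each scanline of the captured frame, starting with the last (bottom) scanline and working up to the first.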
By the way, avid readers should be aware that the bmplib.c library I've included in these source files is a slightly newer version than the code that appeared in my second book; I optimized the write speed considerably.
So, now that you have a means of acquiring images, see what you can do with them. Since the prices of CMOS image sensors and of microcontrollers with enough RAM to work with images are both dropping rapidly, many simple machine vision applications are poised to appear in the consumer arena. I have seen demonstrated a chipset costing less than US$5 in production quantity, which can recognize shapes and colors and announce them. For example, you can hold up a green triangle and the unit will say "Green triangle."
My specific area of interest is, however, shape and motion processing. You can find a lot of rather dry and complicated literature on shape recognition; I won't get into that, because the main purpose of this article is to introduce you to acquiring the raw image data. I would, however, like to focus on one of the building blocks of shape processing, which is edge detection. You'll find a simple demo of the sort of preprocessor you might build in the vidproc directory.
Here's a sample image (a picture of a floppy disk) before and after being processed:
Figure 1. Floppy in good lighting
Figure 2. Floppy in good lighting, processed
Now, here's a copy of the same image, shot in lower light conditions, with an LED flashlight aimed at part of the picture to confuse things.
Figure 3. Floppy in poor lighting
Figure 4. Floppy in poor lighting, processed
The important points to glean from these images are:
- The object has been successfully picked out of its background despite the fact that the background has noise introduced by the low-quality image sensor and the uneven lighting.
- The low-light image looks very different from the bright-light image before processing, but after processing, the two look almost identical (there's slightly more noise in the low-light image, but nothing major).
- The LED flashlight had absolutely no effect on the processed images despite being clearly visible in the unprocessed low-light image.
To see how this was achieved, look at the FindEdgeScanline function in
edge.c. This function works by looking at adjacent pixels in a scanline in
groups of three. First, it converts the color image to an unweighted
grayscale value; this is a simple arithmetic average. (More accurate
results could be obtained by using a weighted average calibrated for the
color sensitivity of the image sensor).
Next, it turns the three pixels into two derivative values representing the gradient between pixels 1 and 2, and the gradient between pixels 2 and 3. At this point, a special "fudge" factor is added; if the absolute value of the gradient is less than 8, it is deemed to be zero. This is kind of a sharpness factor; the larger the fudge factor, the sharper an edge has to be to remain visible in the processed output. (This fudge factor erased the smooth gradient of the LED flashlight beam).
Finally, the code looks at the difference between these two derivatives (in other words, it finds the second derivative). If there is a change in sign of this second derivative, the corresponding output pixel is made black, indicating an edge. If there is no change in sign, the output pixel is made white. The end result is a fairly good separation of objects (except lines parallel to the top and bottom of the frame). Basically, what we are doing is taking the second derivative (arithmetically), quantizing the result, and looking for a change of sign.
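The three stages just described can be sketched as follows. This is an independent illustration, not the FindEdgeScanline function from edge.c: the names are invented, and the way a "change in sign" is detected (comparing the second derivative at each pixel with the one at the pixel to its left, and marking an edge only when one is positive and the other negative) is one reading of the description, so the real code may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define EDGE_THRESHOLD 8   /* the "fudge" factor: smaller gradients become 0 */

/* Unweighted grayscale: plain average of the three color bytes of a pixel. */
int gray_of(const unsigned char *px)
{
    return (px[0] + px[1] + px[2]) / 3;
}

/* Gradient from gray level a to gray level b, with small values zeroed. */
int quantized_gradient(int a, int b)
{
    int g = b - a;
    return (g > -EDGE_THRESHOLD && g < EDGE_THRESHOLD) ? 0 : g;
}

/*
 * Process one scanline of `width` 24bpp pixels (in) into `width`
 * output bytes (out): 0 = black (edge), 255 = white (no edge).
 */
void find_edges_in_scanline(const unsigned char *in, unsigned char *out,
                            size_t width)
{
    int prev_d2 = 0;
    size_t x;

    assert(width >= 3);
    out[0] = out[width - 1] = 255;      /* no full group of three at the ends */

    for (x = 1; x + 1 < width; x++) {
        int left   = gray_of(in + (x - 1) * 3);
        int centre = gray_of(in + x * 3);
        int right  = gray_of(in + (x + 1) * 3);
        int g1 = quantized_gradient(left, centre);
        int g2 = quantized_gradient(centre, right);
        int d2 = g2 - g1;               /* discrete second derivative */

        /* Edge when d2 flips between positive and negative. */
        if ((d2 > 0 && prev_d2 < 0) || (d2 < 0 && prev_d2 > 0))
            out[x] = 0;
        else
            out[x] = 255;
        prev_d2 = d2;
    }
}
```

Because the function only ever compares horizontal neighbors, a line that runs parallel to the scanlines produces no gradient at all, which is why such lines are missing from the processed output. Raising EDGE_THRESHOLD suppresses more of the gentle gradients, such as the flashlight beam in the example images.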
Great! Now you have a submarine that can see the world around it, and that you can connect to (and, by implication, control) using a Web browser. In the next article, you'll see how to start building some circuits, so limber up your soldering iron. I'll introduce the "real" hardware block diagram of the E-2 submarine and start building both sides of the interface firmware that drives the sensors and actuators in the vehicle.
| Description | Name | Size | Download method |
|---|---|---|---|
| Source code | pa-migrate6code.tar.gz | 7 KB | HTTP |
- Participate in the discussion forum.
- Migrating from x86 to PowerPC is the only
developerWorks Power Architecture technology series on the entire Internet
that will help you build your own remote-controlled robot submarine army.
Missed a previous installment? Don't dismay: it's astonishingly easy to
read them all now.
- Need to
hack a serial
port onto your Kuro box? Lewin has posted all of the details to
his site.
- The xawtv homepage
is purportedly the best reference material you'll find on using the
Video4Linux APIs.
- The datasheet for the STV0680
is no longer available at ST's US web site, but you can download it from their
Japanese web site, as
well as from a number of subscription-only datasheet archives.
- If you're determined to go down to the bare metal,
this interesting site describes
how to interface an
OV6620-based camera module directly to an Atmel
ATmega16 microcontroller.
- Again for the bare-metal mavens, this article discusses
the native pixel
format used by many digital imaging devices (and practically all of
the cheap image sensors).
- The
RAD6000 microprocessor
that Lewin mentioned is a rad-hardened version of the
IBM RS/6000 microprocessor,
an ancestor of the PowerPC, and an early member of the
Power Architecture family.
- The hardening was done by RAD6000 Space Computers, who
say "There is, of course, now intelligent life on Mars:
We put
it there." Also some interesting discussion of the mission, from back
in the day, and a
fact
sheet about the RAD6000.
- Edge detection isn't just used for robot submarines. Image analysis
can be useful in data hiding as well.
- Once Lewin is ready to mass-produce his robot submarines for purposes
either commercial or nefarious, he may want
to contact
IBM
Silicon Solutions (nee IBM Microelectronics)
for customized CMOS image sensor chips
with a 2.5-micron copper stack incorporating an on-chip color filter and
microlens.
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless security/fire safety device design engineer. Prior to that, he spent five years developing x86, ARM and PA-RISC-based networked multimedia appliances at Digi-Frame Inc. He has extensive experience in encryption and security software and is the author of two books on embedded systems development. He can be reached at sysadm@zws.com.