Level: Introductory
Lewin Edwards (sysadm@zws.com), Author, Freelance
21 Jul 2005
In this episode of the ongoing Kuro Box project, learn how to add a USB camera to the machine. This article includes example Linux code to initialize and read from a USB camera through Video4Linux. Also find a brief introduction to edge detection techniques in captured images.
The last few articles in this series discussed topics that fall under
the heading of "infrastructure." You've seen how the Kuro Box is put
together, from a hardware and software perspective, and how to build the
scaffolding that lets you talk to it over TCP/IP using a Web browser or
FTP client.
Part 6 looks at some code that has a real, observable function in
the submarine project, demonstrating how to interface
and talk to a digital camera connected to the PowerPC® board over USB. I'll
also share with you some introductory material about image preprocessing
techniques used in machine vision applications. (Don't worry, I'll keep the
math very light).
Why such an overpowered processor?
But first, this is an appropriate moment to explain further my rationale
for using a PowerPC (or, in the original incarnation, an x86 single-board
computer) in my submarine. One major downside of these high-end processors
in a battery-powered application is that they're much more energy-hungry
than a simple 8-bit microcontroller. And anyone who has built a vehicle
application like this (say, NASA) knows you can do an awful
lot of navigation and autonomous control with an 8-bit chip; Sojourner, the
first Mars rover, ran on an Intel® 80C85. (Note, however, that the two
rovers currently prowling the surface of Mars are controlled by the RAD
6000, a radiation-hardened RS/6000 variant. The Pathfinder probe that
carried Sojourner also ran on this processor).
So, why did I build a hungry 32-bit hippo into my submarine? The main
reason is that it's vastly easier to interface standard consumer
peripherals to these standardized platforms. This means that it's much
cheaper and considerably faster to build a prototype, working
demonstration, or one-off production piece if you build it around a
standard hardware platform running an off-the-shelf operating system.
Moreover, while you're in the proof-of-concept phase of any complex
embedded project that has a science aspect (as opposed to pure
engineering), you probably don't have a perfect understanding of the CPU
resources required to carry out the tasks you'd like the device to do.
Using an over-powered system allows you to get a better feel for the data
sizes and processing requirements in the application. You can develop your
code rapidly in a high-level language like C, and when you're ready to
commercialize the product, you can port it down into a single
microcontroller or a group of smaller microcontrollers.
A practical example
Here is a case that illustrates my point. I want to be
able to run some simple image capture and analysis code in the submarine
so I can determine if an interesting sea beast is swimming past. To do
this, I need at least one digital camera from which I can acquire image
data for processing. (The reason I spelled out the requirement in those
precise terms is to distinguish it from the path of simply attaching a
camera with a computer-controlled shutter button -- that's easy to interface
and great for acquiring images, but useless for processing them during the
mission).
Now, I know that all the navigation and system maintenance functions in my
submarine can run on a reasonably frisky, high-end 8-bit microcontroller.
You can certainly take an off-the-shelf CMOS image sensor chip and
interface it to the same 8-bit micro. It's possible (though difficult and
restrictive) to squeeze some image-processing functionality into the
microcontroller. But getting this all working is an incredibly complex
job. First, you need to find someplace where you can actually buy small
hobbyist quantities of an image sensor IC (this step is considerably harder
than it sounds; the easiest route is usually to buy a camera and
cannibalize it). Then, you need to work out the hardware and software
details of interfacing to the sensor. You probably need to convert the data
from Bayer pattern to simple RGB; you must handle white balance and
exposure, and you might have to solve annoying timing problems in order to
get the chip to play nice. (As a free bonus, the realtime performance of the
rest of your code suffers).
By this time, you're also probably running up against RAM and ROM size
limits inside your 8-bit microcontroller, so you have to start juggling
textbook algorithms so that they can work in your system, which is a huge
hassle. You don't want to be optimizing your code at the front end of a
research project -- you want to develop and test your algorithms, decide if
the idea even works at all, and only then make an informed decision as to
whether to optimize into a smaller chip or plunk down the cash for a
high-end part.
Therefore, it's a much more efficient use of your time to buy a fairly
generic hardware platform with an off-the-shelf OS (RTOS or non-RTOS
according to your needs) because it will come with driver support for a
bunch of standard consumer peripherals. You can go out and buy these cheap
peripherals, plug them right in, and start working.
Please note, by the way, that I don't at all advocate using consumer-grade
equipment in a fielded commercial product. I've got some years of bitter
experience with this approach to manufacturing; be advised that consumer
peripherals change every few months, and it is a very frustrating exercise
to keep shipping a consistent product if you're relying on a volatile
component. Servicing field returns is even harder, because units produced
in month A won't contain the same parts as units from month A+1.
Introducing the USB pencam
The specific camera I have chosen to use for this article is a Jazz
Digi-Stix JDC11 "pencam." The same code I provide here will, however, work
practically unmodified on many other USB and older parallel port cameras
supported by Linux™.
The JDC11 is a very cheap and simple USB-connected camera based on the
STV0680 chip (literally dozens of other cameras use the same
chip and are practically identical to the JDC11). The STV0680 is one of
several Webcam chipsets supported natively by the Linux kernel. It supports
the capture of color images at a resolution up to PAL CIF (352x288 pixels,
or a quarter of standard TV resolution; 101 kilopixels if you prefer that
sort of metric). This might sound horribly low compared to the
multimegapixel snapshots expected from modern digital cameras, but in
fact this image size is more than satisfactory for tasks like simple visual
navigation and motion detection.
Important: Like most pencams, the JDC11 can function either as a tethered
Webcam-type device (no batteries required) or as a stand-alone digital
camera, running off two AAA cells. If you have the power switch in the "on"
position, the camera assumes you want to run in stand-alone mode. To run the code in this article, you want the camera in Webcam mode, so
leave the power switch in the "off" position and don't install batteries.
Note that by default, Kuro Box's software bundle does not include the
driver modules for USB cameras. Before going any further, please ensure
that you followed the instructions in the third article of this series,
where I described how to install the complete set of modules from
linkstation.yi.org; remember to run depmod -qa to fix module dependencies
after you install the updates.
You'll also need to create nodes in /dev in order to be able to access the
camera driver. The video capture devices /dev/video0 through video3 are
character devices, major 81, minor 0 through 3. Unfortunately, however, the
software distribution on Kuro Box doesn't include mknod(1), so you either
need to download and build that utility, or steal ready-made device nodes
off a working Linux system. In the source code archive linked in
Resources,
I've included a tarball called devices.tar.gz -- simply copy it into the
root directory of your Kuro and tar zxvf it (from the root) in order to
create the necessary nodes.
Once the drivers are installed, you can load them by plugging in the camera
and, if necessary, force-loading the drivers using modprobe stv680 ;
modprobe videodev. Or you can simply restart the Kuro Box with the camera
connected.
Video4Linux and you
Video input devices are supported in Linux through the Video4Linux APIs
(commonly abbreviated as V4L). Unfortunately, these APIs don't seem to be
very well documented. The canonical reference "document" for this
programming interface is the source code for xawtv; if you ask around for
help with V4L you'll invariably be told to read the xawtv source. The
problem with this is that xawtv is a fiendishly complicated piece of
software with a lot of features and workarounds for device-specific
oddities, and it's not exactly easy to learn from it.
A further difficulty is that there are two flavors of V4L:
the original version (V4L or V4L1), and a newer version called V4L2. The
newer version is included in kernel 2.5 and beyond. I'm dealing here with
V4L1, mainly because of the vintage of the kernel shipped with Kuro Box and
the difficulty and risk of upgrading to a 2.6.x kernel. (V4L2 is,
mercifully, somewhat better documented than V4L1, but it is not yet the de
facto standard).
Here is a thumbnail description of how to use V4L to acquire an image from a
USB video input device (note that all of these structures are defined in
linux/videodev.h):
1. Open the appropriate /dev/video device (video0 by default).
2. Use the VIDIOCGCAP ioctl to populate a video_capability structure with
information about the device. Among other information, this will tell you
the range of supported resolutions and whether the device supports
audio capture.
3. Use the VIDIOCGPICT ioctl to populate a video_picture structure with
information about the device's current settings. The code I provide here
requires that the device can deliver image data in 24bpp RGB format. We
ensure this by checking the vp.palette member; if it's something other
than VIDEO_PALETTE_RGB24, we set it to that value and pass the same
video_picture structure to the VIDIOCSPICT ioctl. Note that most V4L
devices don't support this format in hardware, since it's byte-inefficient.
However, the underlying driver can do the conversion in software with a
surprisingly low processing overhead. Since you have to do this conversion
step anyway, you may as well let the driver do it. By the way, in this step
you might also want to modify some other picture parameters such as white
level, hue, brightness, and so on. Again, not all devices have hardware
support for these settings.
4. Create a video_window structure describing the desired image capture
size. My code simply picks the largest possible capture window (as returned
in Step 2). Pass this structure to the VIDIOCSWIN ioctl.
5. At this point, you should do a sanity check by using the VIDIOCGWIN ioctl
to verify that your window request was honored. This check is essential
because not all devices support every conceivable combination
of resolutions and color formats. Simply because the device told you that
it supports a resolution of up to (say) 1024x768 pixels and that it also
supports 24bpp RGB data does not imply that it supports both of those
options at once. You might be limited to black and white capture if you
bump the resolution up to 1024x768, for example.
6. Allocate RAM for the captured image.
7. Use read(2) on the file descriptor opened in Step 1 to gather the frame
data. The data size to be read, in bytes, is width x height x 3 (RGB).
8. You may continue reading frames until you don't need any more.
Note that this is not a completely generic description of how to acquire an
image from just any old V4L-supported device. Many V4L input devices work
differently from the above: They acquire data into an internal buffer
(often in your video card's frame buffer RAM), which you have to map into
your process's address space. The great thing about the USB cameras is that
they can be accessed very simply, as you see above.
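The final steps of the procedure above (sizing the buffer and calling read(2)) need nothing beyond POSIX, so they are easy to sketch in isolation. This is a hedged illustration, not the vidcap source: rgb24_frame_size and read_frame are hypothetical helper names, the ioctl setup from the earlier steps is deliberately left out, and fd is assumed to be a /dev/video descriptor that has already been switched to 24bpp RGB. Treating a short read as an error is also my assumption about how a read-based driver behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes in one 24bpp RGB frame: width x height x 3. */
size_t rgb24_frame_size(unsigned width, unsigned height)
{
    return (size_t)width * height * 3;
}

/* Gather one frame of `size` bytes from fd into buf with read(2).
 * The call is retried if a signal interrupts it. Returns 0 when exactly
 * `size` bytes arrived, or -1 on an error or a short read. */
int read_frame(int fd, unsigned char *buf, size_t size)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, size);
    } while (n < 0 && errno == EINTR);

    return (n >= 0 && (size_t)n == size) ? 0 : -1;
}
```

At the 352x288 maximum quoted earlier for the STV0680, rgb24_frame_size returns 304,128 bytes, which is the amount of RAM to allocate before each call to read_frame.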
A simple program to grab video
If you extract the source code tarball linked in Resources, you'll find two
directories: vidcap and vidproc. Ignore vidproc for the moment, and build
the application you'll find in vidcap. This application simply looks at
/dev/video0, gives you a bunch of information about the device attached
there, captures a single image, and saves it in Windows® BMP format to a
filename specified on the command line.
I use BMP because it's a very simple file format (in other words, it's
easy for other little applets to work with BMPs), and it's lossless. JPEG
files would be much smaller, but the quantization noise would make
subsequent edge-detection steps very unreliable. Here's a quick description
of the BMP file format header.
Listing 1. The BMP header
00 char signature[] = "BM"; // type header
02 unsigned int size; // (32 bits) Size of file, including this header and all data
06 unsigned short reserved1; // (16 bits) Reserved! (0x00)
08 unsigned short reserved2; // (16 bits) Reserved! (0x00)
0A unsigned int bitsoffset; // (32 bits) Offset of bitmap data from start of file (0x36)
0E unsigned int headersize; // (32 bits) Size of BITMAPINFOHEADER (0x28)
12 unsigned int width; // (32 bits) Horizontal pixel count
16 unsigned int height; // (32 bits) Vertical pixel count
1A unsigned short planes; // (16 bits) Number of planes (0x0001 for 24-bit BMPs)
1C unsigned short bitsperpixel; // (16 bits) Number of bits per pixel (0x0018 for 24-bit BMPs)
1E unsigned int compression; // (32 bits) Compression method (0)
22 unsigned int imagesize; // (32 bits) Size of bitmap data area, excluding header
26 unsigned int xres; // (32 bits) Target device x-resolution (0x0b12)
2A unsigned int yres; // (32 bits) Target device y-resolution (0x0b12)
2E unsigned int color_indices; // (32 bits) Number of color indices (N/A - use 0)
32 unsigned int important_indices;// (32 bits) Number of "important" color indices (N/A - use 0)
For 24-bit BMPs, the actual bitmap data is stored next, in left-to-right
order with the bottom scanline first. The first byte is BLUE, the next byte
is GREEN, and the next byte is RED data for the leftmost pixel of bottom
scanline, then so on to the rightmost pixel of the scanline, followed by
the leftmost pixel of the second-from-bottom scanline, and so forth. BMPs at
color depths other than 24bpp have a more complex family of formats, which
I won't get into here. (Note also that OS/2® has a somewhat different BMP
format that isn't compatible with Windows tools).
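One detail worth spelling out: every multi-byte field in the BMP header is stored little-endian, while the PowerPC in the Kuro Box is big-endian, so dumping a C struct straight to disk would produce a corrupt file. The sketch below fills in the 54-byte header one byte at a time at the offsets given in Listing 1. It is an illustration written for this explanation, not the bmplib.c code; bmp24_stride and bmp24_fill_header are hypothetical names. It also pads each scanline to a multiple of four bytes, a BMP requirement that the description above does not mention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BMP_HEADER_SIZE 0x36  /* 14-byte file header + 40-byte info header */

/* Store 16-bit and 32-bit values least significant byte first. */
static void put16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}

static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Bytes per scanline of a 24bpp BMP, rounded up to a multiple of 4. */
uint32_t bmp24_stride(uint32_t width)
{
    return (width * 3 + 3) & ~(uint32_t)3;
}

/* Fill hdr with a Windows BMP header for a 24bpp image. */
void bmp24_fill_header(unsigned char hdr[BMP_HEADER_SIZE],
                       uint32_t width, uint32_t height)
{
    uint32_t imagesize = bmp24_stride(width) * height;

    assert(width > 0 && height > 0);
    memset(hdr, 0, BMP_HEADER_SIZE);     /* reserved and N/A fields stay 0 */
    hdr[0] = 'B';
    hdr[1] = 'M';
    put32(hdr + 0x02, BMP_HEADER_SIZE + imagesize); /* size */
    put32(hdr + 0x0A, BMP_HEADER_SIZE);             /* bitsoffset */
    put32(hdr + 0x0E, 0x28);                        /* headersize */
    put32(hdr + 0x12, width);
    put32(hdr + 0x16, height);
    put16(hdr + 0x1A, 1);                           /* planes */
    put16(hdr + 0x1C, 24);                          /* bitsperpixel */
    put32(hdr + 0x22, imagesize);
    put32(hdr + 0x26, 0x0b12);                      /* xres */
    put32(hdr + 0x2A, 0x0b12);                      /* yres */
}
```

For a 352x288 capture the stride is 1,056 bytes (already a multiple of four, so no padding is added) and the complete file comes to 304,182 bytes.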
By the way, avid readers should be aware that the bmplib.c library I've
included in these source files is a slightly newer version than the code
that appeared in my second book; I optimized the write speed considerably.
Why BMP?
Uncompressed BMPs are almost nothing more than a dump of display memory
with a little header that describes the dimensions. You can load a BMP
into RAM without doing any decoding whatsoever, manipulate the pixel data,
and write it out again -- it's a good choice for little proto-applets that
do algorithmic magic on images.
TIFF doesn't give great compression ratios on photographic images, and
both TIFF and PNG are complex to decode. Of course, libraries can do the
dirty work, but I don't see a reason to invoke them in this sort of
situation.
From pixels to objects
So, now that you have a means of acquiring images, see what you can do with
them. Since the prices of CMOS image sensors and of microcontrollers with
enough RAM to work with images are both dropping rapidly, many
simple machine vision applications are poised to appear in the
consumer arena. I have seen demonstrated a chipset costing less than US$5 in
production quantity, which can recognize shapes and colors and announce
them. For example, you can hold up a green triangle and the unit will say
"Green triangle."
My specific area of interest is, however, shape and motion processing.
You can find a lot of rather dry and complicated literature on shape
recognition; I won't get into that, because the main purpose of this
article is to introduce you to acquiring the raw image data. I would,
however, like to focus on one of the building blocks of shape processing,
which is edge detection. You'll find a simple demo of the sort of
preprocessor you might build in the vidproc directory.
Here's a sample image (a picture of a floppy disk) before and after being
processed:
Figure 1. Floppy in good lighting
Figure 2. Floppy in good lighting, processed
Now, here's a copy of the same image, shot in lower light conditions, with
an LED flashlight aimed at part of the picture to confuse things.
Figure 3. Floppy in poor lighting
Figure 4. Floppy in poor lighting, processed
The important points to glean from these images are:
- The object has been successfully picked out of its background despite the
fact that the background has noise introduced by the low-quality image
sensor and the uneven lighting.
- The low-light image looks very different from the bright-light image
before processing, but after processing, the two look almost identical
(there's slightly more noise in the low-light image, but nothing major).
- The LED flashlight had absolutely no effect on the processed images
despite being clearly visible in the unprocessed low-light image.
To see how this was achieved, look at the FindEdgeScanline function in
edge.c. This function works by looking at adjacent pixels in a scanline in
groups of three. First, it converts the color image to an unweighted
grayscale value; this is a simple arithmetic average. (More accurate
results could be obtained by using a weighted average calibrated for the
color sensitivity of the image sensor).
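To make the difference between the two averages concrete, here is a small sketch of both. It is not lifted from edge.c, and the function names are hypothetical. The weighted version uses the generic luma weights of roughly 0.299, 0.587, and 0.114 for red, green, and blue, scaled to integers that sum to 256. Those numbers are an assumption standing in for a real calibration against the STV0680's sensor, which I have not done.

```c
#include <assert.h>

/* Unweighted grayscale: the plain arithmetic mean of the three channels. */
unsigned char gray_unweighted(unsigned char r, unsigned char g,
                              unsigned char b)
{
    return (unsigned char)((r + g + b) / 3);
}

/* Integer weights approximating 0.299 R + 0.587 G + 0.114 B.
 * These are generic values, not calibrated for any particular sensor. */
enum { W_RED = 77, W_GREEN = 150, W_BLUE = 29 };

/* Weighted grayscale: green counts most and blue least. */
unsigned char gray_weighted(unsigned char r, unsigned char g,
                            unsigned char b)
{
    /* The weights must total 256 so that pure white still maps to 255. */
    assert(W_RED + W_GREEN + W_BLUE == 256);
    return (unsigned char)((W_RED * r + W_GREEN * g + W_BLUE * b) >> 8);
}
```

The unweighted form costs two additions and a divide per pixel, which is part of its appeal if the code ever has to move down to a small microcontroller.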
Next, it turns the three pixels into two derivative values representing the
gradient between pixels 1 and 2, and the gradient between pixels 2 and 3.
At this point, a special "fudge" factor is applied; if the absolute value of
the gradient is less than 8, it is deemed to be zero. This is kind of a
sharpness factor; the larger the fudge factor, the sharper an edge has to
be to remain visible in the processed output. (This fudge factor erased the
smooth gradient of the LED flashlight beam).
Finally, the code looks at the difference between these two derivatives
(in other words, it finds the second derivative). If there is a change in sign of this
second derivative, the corresponding output pixel is made black, indicating
an edge. If there is no change in sign, the output pixel is made white. The
end result is a fairly good separation of objects (except lines parallel to
the top and bottom of the frame). Basically, what we are doing is taking
the second derivative (arithmetically), quantizing the result, and looking
for a change of sign.
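The description above can be turned into a compact sketch. Be warned that this is one reading of the prose, not a copy of FindEdgeScanline from edge.c: the function names are hypothetical, the input is assumed to be one scanline that has already been converted to grayscale, and I have interpreted "change in sign" as two consecutive second-derivative values with opposite signs. Only the threshold of 8 and the black-for-edge, white-for-background convention come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EDGE_FUDGE 8  /* gradients smaller than this are deemed to be zero */

/* Gradient between two neighboring gray values, with gentle slopes
 * flattened to zero by the fudge factor. */
static int quantized_gradient(int from, int to)
{
    int g = to - from;
    return (abs(g) < EDGE_FUDGE) ? 0 : g;
}

/* gray: `width` grayscale samples making up one scanline.
 * out:  `width` bytes; 0 (black) marks an edge, 255 (white) no edge. */
void find_edges_in_scanline(const unsigned char *gray, unsigned char *out,
                            size_t width)
{
    int prev_d2 = 0;  /* second derivative at the previous pixel */
    size_t x;

    assert(gray != NULL && out != NULL);
    for (x = 0; x < width; x++)
        out[x] = 255;

    /* Work on groups of three adjacent pixels: x-1, x, x+1. */
    for (x = 1; x + 1 < width; x++) {
        int g1 = quantized_gradient(gray[x - 1], gray[x]);
        int g2 = quantized_gradient(gray[x], gray[x + 1]);
        int d2 = g2 - g1;  /* difference of gradients: second derivative */

        /* Opposite signs on consecutive values mean a zero crossing. */
        if ((prev_d2 > 0 && d2 < 0) || (prev_d2 < 0 && d2 > 0))
            out[x] = 0;
        prev_d2 = d2;
    }
}
```

Fed a hard step such as 10, 10, 10, 200, 200, 200, this marks a single black pixel at the step, while a gentle ramp that climbs by 2 per pixel falls under the fudge factor and stays entirely white. Because it only ever compares horizontal neighbors, it shares the limitation noted above for lines parallel to the top and bottom of the frame.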
Great! Now you have a submarine that can see the world around it, and that
you can connect to (and, by implication, control) using a Web browser. In
the next article, you'll see how to start building some circuits, so limber up
your soldering iron. I'll introduce the "real" hardware block diagram of
the E-2 submarine and start building both sides of the interface
firmware that drives the sensors and actuators in the vehicle.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Source code | pa-migrate6code.tar.gz | 7 KB | HTTP |
Resources
- Participate in the discussion forum.
- Migrating from x86 to PowerPC is the only developerWorks Power
Architecture technology series on the entire Internet that will help you
build your own remote-controlled robot submarine army. Missed a previous
installment? Don't dismay: it's astonishingly easy to read them all now.
- Need to hack a serial port onto your Kuro Box? Lewin has posted all of
the details to his site.
- The xawtv homepage is purportedly the best reference material you'll
find on using the Video4Linux APIs.
- The datasheet for the STV0680 is no longer available at ST's US web
site, but you can download it from their Japanese web site, as well as
from a number of subscription-only datasheet archives.
- If you're determined to go down to the bare metal, this interesting site
describes how to interface an OV6620-based camera module directly to an
Atmel ATmega16 microcontroller.
- Again for the bare-metal mavens, this article discusses the native pixel
format used by many digital imaging devices (and practically all of the
cheap image sensors).
- The RAD6000 microprocessor that Lewin mentioned is a rad-hardened version
of the IBM RS/6000 microprocessor, an ancestor of the PowerPC, and an early
member of the Power Architecture family.
- The hardening was done by RAD6000 Space Computers, who say "There is, of
course, now intelligent life on Mars: We put it there." Also some
interesting discussion of the mission, from back in the day, and a fact
sheet about the RAD6000.
- Edge detection isn't just used for robot submarines. Image analysis can
be useful in data hiding as well.
- Once Lewin is ready to mass-produce his robot submarines for purposes
either commercial or nefarious, he may want to contact IBM Silicon
Solutions (nee IBM Microelectronics) for customized CMOS image sensor chips
with a 2.5-micron copper stack incorporating an on-chip color filter and
microlens.
About the author
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless
security/fire safety device design engineer. Prior to that, he spent five years
developing x86, ARM and PA-RISC-based networked multimedia appliances at
Digi-Frame Inc. He has extensive experience in encryption and security
software and is the author of two books on embedded systems development.
He can be reached at sysadm@zws.com.