In the not-too-distant future, we are likely to rely on a rapidly evolving machine-to-machine infrastructure with advanced sensing that closely rivals (and in some cases surpasses) human capabilities. Intelligent transportation systems and self-driving cars have already been demonstrated: Toyota and Audi are joining Google in field testing this technology in Nevada. Today, autonomous operation is enabled by costly light-detection and ranging (LIDAR) technology in addition to simpler instruments and software. But the cost of that instrumentation is decreasing, and visible-spectrum computer vision solutions have the potential to lower the cost of implementation further.
Even more interesting are applications in which vehicles transmit fleet behavior to optimize traffic flow or communicate with infrastructure such as traffic control systems (stop lights, for example), airport air traffic control equipment, and the sensors, automation, and control systems found in buildings, on roadways, and in ports, airports, and transportation systems in general.
As is already the case, no one needs to stop or even slow down for toll collection. This technology is known as machine-to-machine automation.
Congested and heavily traveled corridors such as Interstate 5 in California could benefit from fleet automation for trucking. Private vehicles might be able to take advantage of this function in less than a decade (see Resources). The Google car has now been around for more than eight years, since Sebastian Thrun's team at Stanford University won the 2005 Defense Advanced Research Projects Agency Challenge, and work goes on. In places much less congested, such as Alaska, similar benefits can come from machine-to-machine advanced sensing to monitor arctic operations, to provide better safety and security for ports, to do environmental surveys and protect resources, and to explore for energy on the north slope (see Resources).
The largest payoff for machine-to-machine systems is likely to be realized in some of the most remote, harsh, and congested environments. In these locations, the savings from automation will most quickly outweigh the initial costs. For remote applications where infrastructure is much more limited, drop-in-place self-powering safety and security monitors have the highest value.
Drop-in-place sensor networks (see Resources) have also been in development for some time. Perhaps the best known are sensor networks often called motes, such as the Berkeley Network Embedded Systems Technology (NEST) and Smart Dust projects (see Resources). Likewise, machine vision has been in use for decades, with automation that employs the full spectrum, from thermal imaging to x-ray. Machine vision often surpasses human ability to inspect fabrication processes and improve safety with remote operations.
This article focuses on drop-in-place deployment of low-cost 3D and multi-spectral imaging for machine-to-machine systems.
The value of drop-in-place computer vision
Many safety and security applications that can benefit from computer and machine vision exist outside urban environments with power and data network infrastructure. A simple example is field operations such as those found on the north slope of Alaska for energy production and exploration, as well as for the pipeline (and proposed new natural gas pipeline). Another example is disaster areas where power and data infrastructure has been damaged. Environmental compliance, resource management, and surveys (handled by the U.S. Geological Survey in the United States; see Resources) involve field data collection. Finally, construction projects where power and data might not be fully developed can benefit from drop-in-place computer vision.
In the past, safety and security services relied on piloted aircraft, human hikers, and arduous land navigation at high cost and low frequency. To provide truly useful data, a drop-in-place computer vision platform must rival and even surpass human observation by extending vision into the infrared spectrum and by employing 3D imaging to produce ranging and point-cloud models of environments (see Resources). Point clouds provide a 3D data model of a scene based on multiple camera observations, often from binocular cameras or from several coordinated viewpoints on the same scene. Any camera that can be dropped in place (perhaps on a tripod) or mounted on a tree can also be adapted for use in unmanned aerial systems. Imagine a multi-spectral 3D camera similar to the popular GoPro wearable cameras (see Resources), but with more sophisticated detectors and computational capability built in.
Along with some research sponsors, and graduate and undergraduate students at the University of Colorado and University of Alaska Anchorage, I have been working on this type of device. Although you can create such a device with off-the-shelf hardware and software, our team decided we needed something better. The following sections describe the steps and required technology.
How to build a drop-in-place 3D camera
Building an off-the-shelf, drop-in-place, 3D camera is fairly straightforward. To make the device practical, however, the power requirements, size, uplink mechanism, data storage, and intelligence of the computer vision processing must be workable. To build this device with open source software and off-the-shelf hardware, you must integrate the following components:
- Computational photometer open hardware can be built readily using the BeagleBoard xM platform from Texas Instruments running Linux®. In fact, I provide a preconfigured distribution based on Ubuntu and Debian for this purpose (see Resources), and I use this simple recipe as a platform for mobile, embedded computer vision teaching and lab work in university classes. The xM platform has a built-in digital camera port for HD cameras.
- Solid-state recording with the BeagleBoard xM simply requires space on the Linux file system or an external USB flash drive, plus the proper codecs (encoding and decoding software that uses Texas Instruments Open Multimedia Applications Platform [TI-OMAP] hardware acceleration to compress digital video and images with MPEG and JPEG, respectively). The reference image available in the Downloads section includes FFmpeg (also known as avconv) and OpenCV; a minimal capture-and-record sketch follows this list.
- GStreamer streams video off the open computational photometer when it can't be stored locally or must be stored in the cloud. This is a new feature that student researchers working on the Open Computational Photometer project intend to integrate in future reference Linux configurations.
- Integration of field-programmable gate array (FPGA) or GPU co-processing for the camera interface is a key feature to support low-power, highly concurrent pixel transformations for the Open Computational Photometer concept. The project uses the Altera Development and Education boards DE0 and DE2i. This work is unfinished, but you can explore on your own or look for updates from the project in 2014 (see Resources). This integration step is tricky but important and has required the Open Computational Photometer project at the University of Colorado and University of Alaska Anchorage to build custom hardware that will be released as an open hardware reference design by our research sponsors.
- Battery power for the BeagleBoard xM TI-OMAP and FPGA co-processing is required for the drop-in-place aspect of the open computational photometer. Luckily, many mobile Linux users have already invented battery power options for the BeagleBoard xM. Likewise, Altera has suitable battery-powered solutions for the DE0 Nano (see Resources).
- Wireless uplink has simple off-the-shelf USB device solutions, including Xbee, ZigBee, and cellular GSM modems (see Resources). The challenge is to enable wireless uplink from truly remote locations at MPEG transport bandwidths of 1Mbps and up for standard-definition video. The higher compression available from H.264 and the newer H.265 (a standard released in 2013) will help, but uplinks will carry lossy video. Urban deployments will of course be much easier and might simply use an 802.11 USB adapter and an urban wireless hotspot.
- Advanced Linux power management and awareness is also required for the drop-in-place aspect of the open computational photometer. Significantly more work is needed than is presented here, but a starting point is general-purpose I/O (GPIO) control to turn off external devices when they are not in use (see Resources). A primary reason for using the Altera DE0 and DE0 Nano for the image-processing features in the open computational photometer is higher-efficiency frame transformation compared with the digital camera port or USB cameras.
- Custom binocular camera interface board is the one element that is not off the shelf. The University of Colorado and University of Alaska Anchorage team is working to build this board as open hardware. It is not available to the public yet, but we intend to publish a reference design and would like to identify a manufacturer that might make it available for purchase through a university program and for developers, much like the DE0. Look for it in 2014, after we test our first revision. The team decided to build a custom camera board for the following reasons:
- Fully open design down to the signal level (for education)
- Fully time-coordinated image capture from two or more cameras for accurate time and image registration
- Reliable frame rates and buffer delays
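As a concrete starting point for the solid-state recording item above, here is a minimal capture-and-record sketch using the OpenCV 2.x API. It illustrates the software path only, not the project's TI-OMAP-accelerated pipeline; the camera index, codec, frame rate, and file name are assumptions.

```cpp
// Minimal capture-and-record sketch (OpenCV 2.x API): software path only,
// not the project's TI-OMAP-accelerated pipeline. The camera index, codec,
// frame rate, and file name are assumptions.
#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cam(0);                  // first camera (camera port or USB)
    if (!cam.isOpened()) {
        std::fprintf(stderr, "no camera found\n");
        return 1;
    }

    cv::Mat frame;
    cam >> frame;                             // grab one frame to learn the resolution

    // Record Motion-JPEG to the Linux file system (or a mounted USB flash drive).
    cv::VideoWriter rec("capture.avi", CV_FOURCC('M', 'J', 'P', 'G'),
                        30.0, frame.size());
    if (!rec.isOpened())
        return 1;

    for (int i = 0; i < 300 && cam.read(frame); ++i)   // about 10 seconds at 30 fps
        rec << frame;

    return 0;
}
```

On the BeagleBoard xM, the same code records to the Linux file system or to a mounted USB flash drive simply by changing the output path.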
This is by no means an exhaustive list of components required for computational photometer technology, but you can find more to explore in Resources. The goal here is to reset your thinking on computational photography and photometry, to lower cost, and to make computational 3D and multi-spectral drop-in-place instrumentation more accessible for education and research.
Design concept for the computational photometer
In the embedded systems and Capstone programs at many universities, students often build stereo vision systems based on frame grabbers (and a VxWorks driver I adapted from Linux for the Bt878).
Capstone students build custom interface solutions as well, but the cameras are often difficult to integrate (at full frame rate), difficult to embed for mobile use, or simply difficult to deal with because they are proprietary hardware with poor documentation. Additional challenges of low-cost off-the-shelf webcams and mobile cameras include limited options for advanced optics configurations (most lenses are built in, and you have to live with them), awkward packaging, and frame encoding that makes computer vision a challenge (for example, MJPEG with no raw RGB or YUV output).
The old analog National Television System Committee (NTSC) cameras and frame grabbers like the Bt878 remained one of the better options because of large-gamut NTSC color, consistent frame rates, low-level driver integration for direct memory access (DMA) and interrupts, and a huge number of options for optics and complementary metal-oxide semiconductor (CMOS)/charge-coupled device (CCD) detectors ranging from $10 (U.S.) up to $1,000 (U.S.), with clear optical advantages for the higher-cost cameras. The webcam is a poor replacement: proprietary, fixed optics; inconsistent frame rates and buffering; limited documentation; and pixel color formats (YCbCr, for example) that are lossy compared with full Alpha, Red, Green, Blue (ARGB) channels for color pixel illumination. Furthermore, for research and education, it would be ideal to feed captured pixel data directly through a FIFO, with data transformation provided by highly concurrent FPGA state machine operations and with highly accurate stereo vision two-channel coordination, time-stamping, and registration.
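To make the pixel-format point concrete, the following sketch shows the full-range ITU-R BT.601 YCbCr-to-RGB conversion that must be applied to every pixel a webcam delivers in YCbCr form. The function and helper names are mine, purely for illustration; this per-pixel arithmetic is exactly the kind of highly concurrent transformation we want to push into an FPGA state machine rather than run on the CPU.

```cpp
// Illustrative full-range BT.601 YCbCr-to-RGB conversion for a single pixel.
// This is the kind of per-pixel arithmetic that is cheap and highly parallel
// in an FPGA state machine but adds up quickly on a general-purpose CPU.
#include <algorithm>

struct Rgb { unsigned char r, g, b; };

static unsigned char clamp8(double v)
{
    return static_cast<unsigned char>(std::min(255.0, std::max(0.0, v)));
}

Rgb ycbcr_to_rgb(unsigned char y, unsigned char cb, unsigned char cr)
{
    Rgb out;
    out.r = clamp8(y + 1.402 * (cr - 128));
    out.g = clamp8(y - 0.344 * (cb - 128) - 0.714 * (cr - 128));
    out.b = clamp8(y + 1.772 * (cb - 128));
    return out;
}
```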
A camera with these characteristics does not seem to be available off the shelf, so we started a project to build a two-channel analog camera capture board (using Texas Instruments TVP decoders) for the DE line of Altera FPGA boards. The goal of this project is to produce open hardware that the research and education community can use for computational photometry. Alternatives like Camera Link (see Resources) are available for higher-end Altera DE4 boards using daughter cards, but that solution is expensive and not great for students. One project goal is to keep the cost of the two-channel card at no more than the price of a textbook.
Figure 1 shows the conceptual design for the open computational photometer. The key feature to note is that the analog cameras feed directly through dual Texas Instruments analog video decoders into the DE0 or DE2i I/O header and into an FPGA FIFO. This configuration allows the open computational photometer FPGA design to transform data in the FIFO and to produce associated metadata, such as timestamps, for cross-link on the DE2i over PCIe to the mobile Intel Atom multi-core microprocessor or over dual USB 2.0 links to the TI-OMAP BeagleBoard processors (or any Linux laptop, for that matter). The open computational photometer includes an update to the USB Video Class (UVC) driver to allow this custom binocular camera to interface easily with Linux. Much more technical information will be published in the future, and the reference design will be made available through a university program working with our industry sponsors. Exactly how this will work remains to be determined, but we are planning to test and integrate in spring 2014, with a release sometime in 2014.
Figure 1. The partially off-the-shelf open computational photometer: Design concept
So why not just use webcams, OpenCV, and Linux for the computational photometer?
OpenCV, the Open Computer Vision application programming interface (API) originally developed by Intel and later released as open source, came about because of the observation that universities researching computer vision and interactive systems benefited greatly from reusable algorithms for image processing. Based on the numerous excellent references for OpenCV, I have created a stereo webcam example (see Downloads).
The example works, but as you'll probably see, the frame rate is not a deterministic 29.97Hz, as color NTSC closed-circuit cameras provide; the frame rate tends to vary. Second, all of the image transformation is done after the data has been transferred through direct memory access (DMA) from the webcam to the Linux CPU. Assume that the left and right frames are roughly synchronized in time (probably a safe assumption unless it is a high-motion scene). The real problem with CPU Canny edge transformations, and even blurring, sharpening, and more advanced transformations like the Hough linear transform, is that CPUs aren't optimized for this processing. Much as a GPU offloads rasterization of render data from the CPU with vector processing and purpose-built multi-core stream processors, we envision a computer vision processing unit (CVPU), and so does Khronos with OpenVX.
This purpose-built co-processing to offload OpenCV is a huge advantage. OpenCV has hardware acceleration features that use GPUs and general-purpose GPUs, but why not apply the processing directly on the path from the cameras rather than moving data from the camera to the CPU, out to a GPU, and back to the CPU? A coprocessor that directly interfaces with low-cost cameras is highly useful, and it is possible with Camera Link and DE4 daughter cards, but not for the cost of a textbook (comparable with a high-end webcam). For now, let's proceed with webcams and OpenCV on a Linux laptop (at most about $60, if you don't already have webcams) to explore further. I used Logitech C270 cameras (see Resources).
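If you want to try this before downloading the example, the following is a minimal two-webcam capture-and-Canny sketch using the OpenCV 2.x C++ API. It is not the stereo_capture.cpp download; the device indexes and Canny thresholds are assumptions for a pair of USB webcams on a Linux laptop.

```cpp
// Minimal two-webcam capture-and-Canny sketch (OpenCV 2.x C++ API).
// This is not the stereo_capture.cpp download; the device indexes and
// Canny thresholds are assumptions for two USB webcams on a Linux laptop.
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture left(0), right(1);       // assumed /dev/video0 and /dev/video1
    if (!left.isOpened() || !right.isOpened())
        return 1;

    cv::Mat frameL, frameR, grayL, grayR, edgesL, edgesR;
    for (;;) {
        left  >> frameL;                       // "simultaneous" grabs are only
        right >> frameR;                       // roughly synchronized in time
        if (frameL.empty() || frameR.empty())
            break;

        cv::cvtColor(frameL, grayL, CV_BGR2GRAY);
        cv::cvtColor(frameR, grayR, CV_BGR2GRAY);
        cv::Canny(grayL, edgesL, 50, 150);     // edge transform runs on the CPU
        cv::Canny(grayR, edgesR, 50, 150);

        cv::imshow("left edges",  edgesL);
        cv::imshow("right edges", edgesR);
        if (cv::waitKey(10) == 27)             // press Esc to quit
            break;
    }
    return 0;
}
```

Watching the two edge windows side by side also makes the rough (rather than exact) time synchronization between the cameras easy to see.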
Frame acquisition and disparity image generation using OpenCV and webcams
Another challenge with webcam stereo vision is computation of intrinsics and extrinsics for the cameras, which is needed to compute distances to objects based on binocular disparity. This process is complicated and requires observations of reference scenes (the OpenCV chessboard) to calibrate camera pixel coordinates to physical coordinates, compute dimensions of the camera optics (detector size and focal length), and account for any out-of-plane orientation of the two focal planes separated on a baseline.
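OpenCV handles most of this flow for you. The sketch below shows the single-camera half of it: detect the chessboard's interior corners in several views, associate them with their known physical positions, and solve for the intrinsics. The board dimensions, square size, and file names are assumptions; OpenCV's samples/cpp/stereo_calib.cpp shows the complete two-camera version that also recovers the extrinsics.

```cpp
// Sketch of the OpenCV chessboard calibration flow (single camera shown).
// Board dimensions, square size, and file names are assumptions; OpenCV's
// samples/cpp/stereo_calib.cpp shows the full two-camera version.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    const cv::Size boardSize(9, 6);       // interior chessboard corners
    const float squareSize = 0.025f;      // 25mm squares (assumed)

    std::vector<std::vector<cv::Point3f> > objectPoints;  // known physical corners
    std::vector<std::vector<cv::Point2f> > imagePoints;   // observed pixel corners
    cv::Size imageSize;

    for (int view = 1; view <= 10; ++view) {              // 10 or more views in practice
        char name[32];
        std::sprintf(name, "left%02d.png", view);
        cv::Mat img = cv::imread(name, 0);                 // load as grayscale
        if (img.empty())
            continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners))
            continue;

        std::vector<cv::Point3f> object;
        for (int i = 0; i < boardSize.height; ++i)
            for (int j = 0; j < boardSize.width; ++j)
                object.push_back(cv::Point3f(j * squareSize, i * squareSize, 0.0f));

        objectPoints.push_back(object);
        imagePoints.push_back(corners);
    }

    // Solve for the intrinsics: camera matrix (focal lengths, principal point)
    // and lens distortion coefficients.
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::printf("RMS reprojection error: %f\n", rms);

    // Repeat for the second camera; cv::stereoCalibrate then recovers the
    // extrinsics (rotation and translation between the two cameras).
    return 0;
}
```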
Figure 2 depicts the stereo vision ranging calculation for perfectly planar cameras (detectors lie in the same plane, separated by a known baseline, with known detector size and identical focal lengths). Non-linearities in the cameras and lens distortions (radial distortion that produces fish-eye or hourglass effects in the scene), as well as out-of-plane detectors and differences between the camera detectors and focal lengths, all cause significant error in the simple triangulation. Furthermore, for a webcam, you might have trouble finding the focal length and detector size (except by indirect characterization of the camera). Chapter 12 of Learning OpenCV and numerous excellent OpenCV references cover this in much more detail (see Resources).
Figure 2. Simple planar stereo triangulation
Figure 2 shows the computation for distance to an object registered on each camera detector with a common center pixel. (Some processing is required to find the common point. This topic is not covered in this article, but you can learn more in OpenCV samples/cpp/stereo_match.cpp.) The diagram is conceptual and works only for perfectly aligned cameras and focal planes (a feat that is not really possible), but it shows the basic math for deriving distance to objects observed by a binocular camera, such as the one proposed here, or by two simple webcams. Chapter 12 of Learning OpenCV and stereo_match.cpp provide much better computational examples; in fact, stereo_match.cpp can produce a point cloud from a left and right image. A good model for the optical intrinsics and extrinsics of binocular cameras is a huge help, along with accurate alignment. But even with a better optical design, characterization and calibration of each camera is still required and well supported by OpenCV.
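For the ideal geometry in Figure 2, the math reduces to a single expression: the range Z equals the focal length times the baseline divided by the disparity. A minimal sketch of that calculation follows; the function and parameter names are mine, for illustration only.

```cpp
// Range from disparity for the ideal planar rig in Figure 2: Z = f * B / d.
// With the focal length f in pixels (as OpenCV's camera matrix reports it)
// and the disparity d in pixels, the pixel units cancel and Z comes out in
// the same units as the baseline B.
double range_from_disparity(double focalLengthPixels,   // f
                            double baselineMeters,      // B, camera separation
                            double disparityPixels)     // d = x_left - x_right
{
    if (disparityPixels <= 0.0)
        return -1.0;   // no usable disparity: object at or beyond "infinity"
    return (focalLengthPixels * baselineMeters) / disparityPixels;
}
```

For example, with a focal length of 1,000 pixels, a 6cm baseline, and a 20-pixel disparity, the object is roughly 3 meters away.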
In stereo ranging, two cameras (like two eyes) see a common object offset between the left and right images, and that offset is inversely proportional to the object's distance from the camera baseline. This is a major distance cue that allows fine judgment of distance to close objects and much less accurate judgment for objects far away. As shown in Figure 3, it is possible to compute a disparity image that shades a gray-map image based on the disparity between common points in a left and right image. The example has many errors resulting from misalignment, blurring in the right image because of a rapidly moving subject (my 3-year-old son), and lack of calibration, but it does at least produce a disparity estimate. With significantly more work to learn the parameters of the disparity algorithms in OpenCV and to calibrate and align the cameras using tripods or an optical bench, we could get a good disparity map.
Figure 3. Right, left, and disparity image from webcam software
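For reference, a disparity image like the one in Figure 3 can be produced with OpenCV's block-matching routine. The sketch below uses the OpenCV 2.4-era cv::StereoBM class with assumed parameter values and file names; these are not the settings used for Figure 3, and in real use the input pair should first be calibrated and rectified.

```cpp
// Minimal disparity-map sketch using OpenCV 2.4-era block matching. The
// parameter values and file names are assumptions (not the Figure 3 setup),
// and the input pair should really be calibrated and rectified first.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat left  = cv::imread("left.png",  0);    // grayscale left image
    cv::Mat right = cv::imread("right.png", 0);    // grayscale right image
    if (left.empty() || right.empty())
        return 1;

    cv::StereoBM bm(cv::StereoBM::BASIC_PRESET, 64, 21);  // 64 disparities, 21x21 blocks
    cv::Mat disparity16, disparity8;
    bm(left, right, disparity16);                  // 16-bit fixed-point disparities

    // Scale to 8 bits for display as a gray map (brighter means closer).
    disparity16.convertTo(disparity8, CV_8U, 255.0 / (16.0 * 64.0));
    cv::imshow("disparity", disparity8);
    cv::waitKey(0);
    return 0;
}
```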
The main point of the stereo_capture.cpp download (see example-stereo.zip in Downloads) is not only to enable you to experiment with and learn OpenCV but also to underscore the value of a high-quality, low-cost binocular camera that is not proprietary and includes a computational coprocessor. As Figure 4 shows, webcam frame rates are often not reliable.
Figure 4. Low and inconsistent frame rates typical of webcams
The goal of the computational photometer effort under way at the University of Colorado Boulder and the University of Alaska Anchorage is to create a reference design for use by other researchers and educators interested in computer vision and in instrument building and design. The emphasis is on visible-spectrum 3D image capture, but we hope this work opens up more academic open hardware efforts that include multi-spectral computational photometers as well, especially infrared. Such devices could be useful, for example, in Alaska for ice observation in the Arctic. Watch the comments section of this article for updates.
Drop-in-place advanced computer vision systems are valuable for use in safety and security applications. Join us in supporting open hardware and the Open Computational Photometer project through soon-to-be announced university and developer programs. Even if our design does not work perfectly, everyone will have access to improve it, much as with open source Linux. High-quality, affordable stereo computer vision will be accessible to all. The value, of course, goes beyond safety and security to include applications such as agriculture (crop damage assessment from unmanned aerial vehicles and automated pesticide application, for example). For geophysical surveys and monitoring, the ability to create high-quality 3D images and models is valuable for observing waterways, volcanoes, forests, and ecosystems. For remote social interaction, these devices are valuable for telemedicine and social networking applications.
One challenge that remains is how to present and display 3D information. To that end, the Point Cloud Library offers options, as does a growing pool of consumer 3D displays. I encourage you to explore these topics further and to share your suggestions for low-cost, hardware-accelerated computational photometry.
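As one hedged sketch of that path, the code below converts a disparity image into a plain ASCII list of 3D points that a tool built on the Point Cloud Library (or any 3D viewer) could ingest. It assumes you already have a floating-point disparity image and the 4x4 reprojection matrix Q produced by cv::stereoRectify; the output format and threshold are illustrative choices, not part of any library.

```cpp
// Sketch: turn a disparity image into a plain ASCII list of 3D points that a
// Point Cloud Library tool (or any 3D viewer) could ingest. Assumes you
// already have a floating-point disparity image and the 4x4 reprojection
// matrix Q produced by cv::stereoRectify; the output format is illustrative.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>

void write_point_cloud(const cv::Mat &disparity, const cv::Mat &Q, const char *path)
{
    cv::Mat xyz;
    cv::reprojectImageTo3D(disparity, xyz, Q, true);   // 3-channel float X, Y, Z

    std::FILE *out = std::fopen(path, "w");
    if (!out)
        return;
    for (int y = 0; y < xyz.rows; ++y) {
        for (int x = 0; x < xyz.cols; ++x) {
            cv::Vec3f p = xyz.at<cv::Vec3f>(y, x);
            if (std::fabs(p[2]) < 10000.0f)            // skip points with no disparity
                std::fprintf(out, "%f %f %f\n", p[0], p[1], p[2]);
        }
    }
    std::fclose(out);
}
```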
The use of drop-in-place computational photometers and digital video analytics processing in the cloud is only getting started. The promise to provide a safer and more secure world is exciting. At the same time, many people worry that there will never be privacy again. Keep in mind that the more than 7 billion people of the world are basically organic, binocular vision systems that invade your privacy every day. Sites such as EarthCam enable you to view the world from your web browser. It's not clear how many Internet-connected cameras exist today. Join the discussion on vision (one of the most valued human senses) and the efforts to extend, replicate, or even improve on it. It all starts with better and smarter cameras.
Downloads

|Description|Name|Size|
|OpenCV stereo camera examples|example-stereo.zip|96KB|
|Beagle xM Natty Linux Image|beagle-xm-image.zip|2397372KB|

Resources
- Learn more about the concept of computational photography. My team and I suggest a similar concept for computational photometry for instrumentation using a CVPU to hardware-accelerate image transformation between the camera interface and computing.
- See footage from the Chelyabinsk meteor from National Public Radio and CBS in the United States.
- In one sense, you can think of all of these cameras, much like the eyes of the world's 7 billion humans, as helping us observe the earth. The idea of being observed by cameras regularly, however, will take some time to gain social acceptance; much as with human observation and witnesses of events, policies and legislation are required to prevent invasion of privacy and address related concerns.
- For information about field testing a self-driving car in Nevada, as Google, Toyota, and Audi are doing, see how to apply here and find general updates on this topic from AUVSI. Watch some video of recent testing of the Google car. Likewise, for unmanned aerial vehicles, Alaska is helping lead the way on integration with air traffic; you can find more information at the Alaska state website along with U.S. Federal Aviation Administration initiatives.
- Sensor networks and motes (the nodes in the network) have been in development for a long time — for example, NEST and Smart Dust.
- Learn more about computational photography.
- The OpenCV API for computer vision is well documented in numerous books, such as Learning OpenCV (Gary Bradski and Adrian Kaehler, O'Reilly Media, 2008) and Mastering OpenCV with Practical Computer Vision Projects (Shervin Emami et al., Packt Publishing, 2012). Note that OpenCV was originally implemented in C but now provides a C++ API that is encouraged for new applications. You can learn computer and machine vision theory from a wide variety of excellent academic texts, including Computer Vision: Algorithms and Applications (Richard Szeliski), Computer Vision: Models, Learning, and Inference (Simon J.D. Prince), and Computer and Machine Vision (E. R. Davies).
- Wearable cameras such as the GoPro are rapidly becoming ubiquitous, and devices such as Google Glass are adding AR-type features. Some cameras are designed for animals, such as National Geographic's Crittercam, now commoditized for dogs and cats with products like Eyenimal.
Get products and technologies
- The use of drop-in-place sensors like the computational photometer will require low-power mobile compute platforms and batteries such as those provided by BeagleJuice, 2nd generation, and small, low-power FPGA co-processors like the Altera DE0 Nano.
- The use of drop-in-place sensors will also require wireless uplink such as cellular GSM modules and Xbee or ZigBee.
- Texas Instruments has useful tips on power management for the OMAP in the Sitara Power Management User Guide, in the OMAP Power Management wiki, and in the TI E2E Community pages.
- Many computer vision researchers use MATLAB, but as you can see from my method for verifying my DCT and iDCT code for two-dimensional spatial transformations, I prefer GNU Octave for teaching because it works well and is open source. I often use GIMP, avconv (FFmpeg), VLC, and GStreamer when working on open source digital media applications and systems.