Cloud scaling, Part 3: Explore video analytics in the cloud

Using methods, tools, and system design for video and image analysis, monitoring, and security

Explore and consider methods, tools, and system design for video and image analysis with cloud scaling. As described in earlier articles in this series, video analytics requires a more balanced data-centric compute architecture compared to traditional compute-centric, scalable, high-performance computing. The author examines the use of OpenCV and similar tools for digital video analysis and methods to scale this analysis using cluster and distributed system design.

The use of coprocessors designed for video analytics and the new OpenVX hardware acceleration discussed in previous articles can be applied to the computer vision (CV) examples presented in this article. This new data-centric technology for CV and video analytics requires the system designer to re-think application software and system design to meet demanding requirements, such as real-time monitoring and security for large, public facilities and infrastructure as well as a more entertaining, interactive, and safer world.

Sam B. Siewert (ssiewert@UAA.Alaska.EDU), Assistant Professor, University of Alaska Anchorage

Sam Siewert photoDr. Sam Siewert is an assistant professor in the Computer Science and Engineering department at the University of Alaska Anchorage. He is also an adjunct assistant professor at the University of Colorado at Boulder and teaches several summer courses in the Electrical, Computer, and Energy Engineering department. As a computer system design engineer, Dr. Siewert has worked in the aerospace, telecommunications, and storage industries since 1988. Ongoing interests as a researcher and consultant include scalable systems, computer and machine vision, hybrid reconfigurable architecture, and operating systems. Related research interests include real-time theory, digital media, and fundamental computer architecture.



04 June 2013

Also available in Chinese Russian

Public safety and security

The integration of video analytics in public places is perhaps the best way to ensure public safety, providing digital forensic capabilities to law enforcement and the potential to increase detection of threats and prevention of public safety incidents. At the same time, this need has to be balanced with rights to privacy, which can become a contentious issue if these systems are abused or not well understood. For example, the extension of facial detection, as shown in Figure 1, to facial recognition has obvious identification capability and can be used to track an individual as he or she moves from one public place to another. To many people, facial analytics might be seen an invasion of privacy, and use of CV and video analytics should adhere to surveillance and privacy rights laws and policies, to be sure—any product or service developer might want to start by considering best practices outlined by the Federal Trade Commission (FTC; see Resources).

Digital video using standards such as that from Motion Picture Experts Group (MPEG) for encoding video to compress, transport, uncompress, and display it has led to a revolution in computing ranging from social networking media and amateur digital cinema to improved training and education. Tools for decoding and consuming digital video are widely used by all every day, but tools to encode and analyze uncompressed video frames are needed for video analytics, such as Open Computer Vision (OpenCV). One of the readily available and quite capable tools for encoding and decoding of digital video is FFmpeg; for still images, GNU Image Processing (GIMP) is quite useful (see Resources for links). With these three basic tools, an open source developer is fully equipped to start exploring computer vision (CV) and video analytics. Before exploring these tools and development methods, however, let's first define these terms better and consider applications.

The first article in this series, Cloud scaling, Part 1: Build your own and scale with HPC on demand, provided a simple example using OpenCV that implements a Canny edge transformation on continuous real-time video from a Linux® web cam. This is an example of a CV application that you could use as a first step in segmenting an image. In general, CV applications involve acquisition, digital image formats for pixels (picture elements that represent points of illumination), images and sequences of them (movies), processing and transformation, segmentation, recognition, and ultimately scene descriptions. The best way to understand what CV encompasses is to look at examples. Figure 1 shows face and facial feature detection analysis using OpenCV. Note that in this simple example, using the Haar Cascade method (a machine learning algorithm) for detection analysis, the algorithm best detects faces and eyes that are not occluded (for example, my youngest son's face is turned to the side) or shadowed and when the subject is not squinting. This is perhaps one of the most important observations that can be made regarding CV: It's not a trivial problem. Researchers in this field often note that although much progress has been made since its advent more than 50 years ago, most applications still can't match the scene segmentation and recognition performance of a 2-year-old child, especially when the ability to generalize and perform recognition in a wide range of conditions (lighting, size variation, orientation and context) is considered.

Figure 1. Using OpenCV for facial recognition
Image showing facial recognition analysis

To help you understand the analytical methods used in CV, I have created a small test set of images from the Anchorage, Alaska area that is available for download. The images have been processed using GIMP and OpenCV. I developed C/C++ code to use the OpenCV application programming interface with a Linux web cam, precaptured images, or MPEG movies. The use of CV to understand video content (sequences of images), either in real time or from precaptured databases of image sequences, is typically referred to as video analytics.

Defining video analytics

Video analytics is broadly defined as analysis of digital video content from cameras (typically visible light, but it could be from other parts of the spectrum, such as infrared) or stored sequences of images. Video analytics involves several disciplines but at least includes:

  • Image acquisition and encoding. As a sequence of images or groups of compressed images. This stage of video analytics can be complex, including photometer (camera) technology, analog decoding, digital formats for arrays of light samples (pixels) in frames and sequences, and methods of compressing and decompressing this data.
  • CV. The inverse of graphical rendering, where acquired scenes are converted into descriptions compared to rendering a scene from a description. Most often, CV assumes that this process of using a computer to "see" should operate wherever humans do, which often distinguishes it from machine vision. The goal of seeing like a human does most often means that CV solutions employ machine learning.
  • Machine vision. Again, the inverse of rendering but most often in a well-controlled environment for the purpose of process control—for example, inspecting printed circuit boards or fabricated parts to make sure they are geometrically correct within tolerances.
  • Image processing. A broad application of digital signal processing methods to samples from photometers and radiometers (detectors that measure electromagnetic radiation) to understand the properties of an observation target.
  • Machine learning. Algorithms developed based on the refinement of the algorithm through training data, whereby the algorithm improves performance and generalizes when tested with new data.
  • Real-time and interactive systems. Systems that require response by a deadline relative to a request for service or at least a quality of service that meets SLAs with customers or users of the services.
  • Storage, networking, database, and computing. All required to process digital data used in video analytics, but a subtle, yet important distinction is that this is an inherently data-centric compute problem, as was discussed in Part 2 of this series.

Video analytics, therefore, is broader in scope than CV and is a system design problem that might include mobile elements like a smart phone (for example, Google Goggles) and cloud-based services for the CV aspects of the overall system. For example, IBM has developed a video analytics system known as the video correlation and analysis suite (VCAS), for which the IBM Travel and Transportation Solution Brief Smarter Safety and Security Solution for Rail [PDF] is available; it is a good example of a system design concept. Detailed focus on each system design discipline involved in a video analytics solution is beyond the scope of this article, but many pointers to more information for system designers are available in Resources. The rest of this article focuses on CV processing examples and applications.


Basic structure of video analytics applications

You can break the architecture of cloud-based video analytics systems down into two major segments: embedded intelligent sensors (such as smart phones, tablets with a camera, or customized smart cameras) and cloud-based processing for analytics that can't be directly computed on the embedded device. Why break the architecture into two segments compared to fully solving in the smart embedded device? Embedding CV in transportation, smart phones, and products is not always practical. Even when embedding a smart camera is smart, so often, the compressed video or scene description may be back-hauled to a cloud-based video analytics system, just to offload the resource-limited embedded device. Perhaps more important, though, than resource limitations is that video transported to the cloud for analysis allows for correlation with larger data sets and annotation with up-to-date global information for augmented reality (AR) returned to the devices.

The smart camera devices for applications like gesture and facial expression recognition must be embedded. However, more intelligent inference to identify people and objects and fully parse scenes is likely to require scalable data-centric systems that can be more efficiently scaled in a data center. Furthermore, data processing acceleration at scale ranging from the Khronos OpenVX CV acceleration standards to the latest MPEG standards and feature-recognition databases are key to moving forward with improved video analytics, and two-segment cloud plus smart camera solutions allow for rapid upgrades.

With sufficient data-centric computing capability leveraging the cloud and smart cameras, the dream of inverse rendering can perhaps be realized where, in the ultimate "Turing-like" test that can be demonstrated for CV, scene parsing and re-rendered display and direct video would be indistinguishable for a remote viewer. This is essentially done now in digital cinema with photorealistic rendering, but this rendering is nowhere close to real time or interactive.


Video analytics apps: Individual scenarios

Killer applications for video analytics are being thought of every day for CV and video analytics, some perhaps years from realization because of computing requirements or implementation cost. Nevertheless, here is a list of interesting applications:

  • AR views of scenes for improved understanding. If you have ever looked at, for example, a landing plane and thought, I wish I could see the cockpit view with instrumentation, this is perhaps possible. I worked in Space Shuttle mission control long ago, where a large development team meticulously re-created a view of the avionics for ground controllers that shadowed what astronauts could see—all graphical, but imaging fusion of both video and graphics to annotate and re-create scenes with meta-data. A much simplified example is presented here in concept to show how an aircraft observed via a tablet computer camera could be annotated with attitude and altitude estimation data (see the example in this article).
  • Skeletal transformations to track the movement and estimate the intent and trajectory of an animal that might jump onto a highway. See the example in this article.
  • Fully autonomous or mostly autonomous vehicles with human supervisory control only. Think of the steps between today's cruise control and tomorrow's full autonomous car. Cars that can parallel park themselves today are a great example of this stepwise development.
  • Beyond face detection to reliable recognition and, perhaps more importantly, for expression feedback. Is the driver of a semiautonomous vehicle aggravated, worried, surprised?
  • Virtual shopping (AR to try products). Shoppers can see themselves in that new suit.
  • Signage that interacts with viewers. This is based on expressions, likes and dislikes, and data that the individual has made public.
  • Two-way television and interactive digital cinema. Entertainment for which viewers can influence the experience, almost as if they were actors in the content.
  • Interactive telemedicine. This is available any time with experts from anywhere in the world.

I make no attempt in this article to provide an exhaustive list of applications, but I explore more by looking closely at both AR (annotated views of the world through a camera and display—think heads-up displays such as fighter pilots have) and skeletal transformations for interactive tracking. To learn more beyond these two case studies and for more in-depth application-specific uses of CV and video analytics in medicine, transportation safety, security and surveillance, mapping and remote sensing, and an ever-increasing list of system automation that includes video content analysis, consult the many entries in Resources. The tools available can help anyone with computer engineering skills get started. You can also download a larger set of test images as well as all OpenCV code I developed for this article.

Example: Augmented reality

Real-time video analytics can change the face of reality by augmenting the view a consumer has with a smart phone held up to products or our view of the world (for example, while driving a vehicle) and can allow for a much more interactive experience for users for everything from movies to television, shopping, and travel to how we work. In AR, the ideal solution provides seamless transition from scenes captured with digital video to scenes generated by rendering for a user in real time, mixing both digital video and graphics in an AR view for the user. Poorly designed AR systems distract a user from normal visual cues, but a well-designed AR system can increase overall situation awareness, fusing metrics with visual cues (think fighter pilot heads-up displays).

The use of CV and video analytics in intelligent transportation systems has significant value for safety improvement, and perhaps eventually CV may be the key technology for self-driving vehicles. This appears to be the case based on the U.S. Defense Advanced Research Projects Agency challenge and the Google car, although use of the full spectrum with forward-looking infrared and instrumentation in addition to CV has made autonomous vehicles possible. Another potentially significant application is air traffic safety, especially for airports to detect and prevent runway incursion scenarios. The imagined AR view of an aircraft on final approach at Ted Stevens airport in Anchorage shows a Hough linear transform that might be used to segment and estimate aircraft attitude and altitude visually, as shown in Figure 2. Runway incursion safety is of high interest to the U.S. Federal Aviation Administration (FAA), and statistics for these events can be found in Resources.

Figure 2. AR display example
Image showing an example of video augmentation

For intelligent transportation, drivers will most likely want to participate even as systems become more intelligent, so a balance of automation and human participation and intervention should be kept in mind (for autonomous or semiautonomous vehicles).

Skeletal transformation examples: Tracking movement for interactive systems

Skeletal transformations are useful for applications like gesture recognition or gate analysis of humans or animals—any application where the motion of a body's skeleton (rigid members) must be tracked can benefit from a skeletal transformation. Most often, this transformation is applied to bodies or limbs in motion, which further enables the use of background elimination for foreground tracking. However, it can still be applied to a single snapshot, as shown in Figure 3, where a picture of a moose is first converted to a gray map, then a threshold binary image, and finally the medial distance is found for each contiguous region and thinned to a single pixel, leaving just the skeletal structure of each object. Notice that the ears on the moose are back—an indication of the animal's intent (higher-resolution skeletal transformation might be able to detect this as well as the gait of the animal).

Figure 3. Skeletal transformation of a moose
Image showing an example of a skeletal transformation

Skeletal transformations can certainly be useful in tracking animals that might cross highways or charge a hiker, but the transformation has also become of high interest for gesture recognition in entertainment, such as in the Microsoft® Kinect® software developer kit (SDK). Gesture recognition can be used for entertainment but also has many practical purposes, such as automatic sign language recognition—not yet available as a product but a concept in research. Certainly skeletal transformation CV can analyze the human gait for diagnostic or therapeutic purposes in medicine or to capture human movement for animation in digital cinema.

Skeletal transformations are widely used in gesture-recognition systems for entertainment. Creative and Intel have teamed up to create an SDK for Windows® called the Creative* Interactive Gesture Camera Developer Kit (see Resources for a link) that uses a time-of-flight light detection and ranging sensor, camera, and stereo microphone. This SDK is similar to the Kinect SDK but intended for early access for developers to build gesture-recognition applications for the device. The SDK is amazingly affordable and could become the basis from some breakthrough consumer devices now that it is in the hands of a broad development community. To get started, you can purchase the device from Intel, and then download the Intel® Perceptual Computing SDK. The demo images are included as an example along with numerous additional SDK examples to help developers understand what the device can do. You can use the finger tracking example shown in Figure 4 right away just by installing the SDK for Microsoft Visual Studio® and running the Gesture Viewer sample.

Figure 4. Skeletal transformation using the Intel Perceptual Computing SDK and Creative Interactive Gesture Camera Developer Kit
Image showing a skeletal and blob transformation of a hand

The future of video analytics

This article makes an argument for the use of video analytics primarily to improve public safety; for entertainment purposes, social networking, telemedicine, and medical augmented diagnostics; and to envision products and services as a consumer. Machine vision has quietly helped automate industry and process control for years, but CV and video analytics in the cloud now show promise for providing vision-based automation in the everyday world, where the environment is not well controlled. This will be a challenge both in terms of algorithms for image processing and machine learning as well as data-centric computer architectures discussed in this series. The challenges for high-performance video analytics (in terms of receiver operating characteristics and throughput) should not be underestimated, but with careful development, this rapidly growing technology promises a wide range of new products and even human vision system prosthetics for those with sign impairments or loss of vision. Based on the value of vision to humans, no doubt this is also fundamental to intelligent computing systems.


Downloads

DescriptionNameSize
OpenCV Video Analytics Examplesva-opencv-examples.zip600KB
Simple images for use with OpenCVexample-images.zip6474KB

Resources

Learn

Get products and technologies

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Cloud computing on developerWorks


  • Bluemix Developers Community

    Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.

  • Cloud digest

    Complete cloud software, infrastructure, and platform knowledge.

  • DevOps Services

    Software development in the cloud. Register today to create a project.

  • Try SoftLayer Cloud

    Deploy public cloud instances in as few as 5 minutes. Try the SoftLayer public cloud instance for one month.

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Cloud computing
ArticleID=932698
ArticleTitle=Cloud scaling, Part 3: Explore video analytics in the cloud
publish-date=06042013