Level: Introductory
Lewin Edwards (sysadm@zws.com), Design Engineer, Freelance
07 Mar 2006
Explore the technical issues in video playback, and see how a blend of hardware and software achieves good performance at a reasonable cost. Also, Lewin Edwards reveals that MP3 does not mean MPEG-3, which alone is worth the price of admission.
The previous articles in this series show you how to create a scriptable, network-connected appliance that can play back still images and
scale them to fit an arbitrary display window. I promised a while ago to show you how to get the device to show movies, and this is the episode where I
fulfill that promise.
Movie playback is one of those functions where you really don't want to
have to reinvent the wheel. The next few paragraphs discuss -- very briefly -- the history behind the common digital video
formats and some of the data processing steps they contain. This
information should put the task of movie playback in
context and give you some idea of the complexities involved. Since we're
using a hardware platform of known capabilities, detailed video decoding
specifications don't figure into the design equation the way they might in
a "real" (commercial) product; our design has hardware with XYZ
capabilities, and we have to make do with that.
MPEG standards
Most of the digital video content you need to care about will be encoded
with one of the following three standards, developed by the Moving Picture
Coding Experts Group (MPEG):
- MPEG-1 (finalized in 1992). The fundamental goal of MPEG-1 was to
encode audio and video signals for storage on standard 650MB compact disks,
with a visual quality approximating that of VHS video, at a data rate of
1.5Mbit/sec. You can store roughly an hour of MPEG-1 video on a single
CD-ROM at this compression level. The baseline resolution for NTSC MPEG-1
video is 352x240 pixels; this resolution is referred to as SIF (Standard
Interchange Format). MPEG-1 is used commonly in the low-cost VideoCD (VCD)
movie format, once wildly popular (and still supported to a certain extent)
in Asia.
- MPEG-2 (finalized in 1994). The goal of MPEG-2 was to achieve higher
(broadcast) quality at higher bitrates, from 3 to 10 Mbit/sec. MPEG-2 is
typically used at the full-frame NTSC resolution of 720x480 pixels. It has
better support for interlaced video than MPEG-1, and offers greatly
superior video quality at higher bitrates. Note that a compliant MPEG-2
decoder can handle MPEG-1 bitstreams, offering good backwards
compatibility. You'll probably be very familiar with MPEG-2 implementations
in the form of DVD players and satellite TV broadcasts.
- MPEG-4 (finalized in 1998). This is a much more flexible and advanced
encoding system, offering a wide selection of bitrate versus quality setpoints
for different types of content. It performs particularly well (compared to
MPEG-1 and -2) at very low-bitrate, low-resolution settings. Unfortunately,
it is also an exceedingly complicated standard, rarely fully implemented.
The standard includes support for programmable visual objects, content copy
protection and other intellectual property management features, and much
more.
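The "roughly an hour" figure quoted for MPEG-1 is easy to sanity-check with a little arithmetic. The sketch below is only an illustration: the function names are made up, it assumes the 650MB capacity is counted in binary megabytes, and it assumes the whole 1.5Mbit/sec stream is stored with no filesystem or container overhead.

```c
#include <stdio.h>

/* Hypothetical helper: minutes of playback that fit on a medium holding
 * capacity_mb megabytes (assumed 1MB = 1024 * 1024 bytes) at a constant
 * stream rate of rate_mbit megabits per second (1Mbit = 10^6 bits). */
double playback_minutes(double capacity_mb, double rate_mbit)
{
    double capacity_bits = capacity_mb * 1024.0 * 1024.0 * 8.0;
    double seconds = capacity_bits / (rate_mbit * 1000000.0);
    return seconds / 60.0;
}

/* Example use: print the estimate for a 650MB disc at 1.5Mbit/sec. */
void print_mpeg1_estimate(void)
{
    printf("%.1f minutes\n", playback_minutes(650.0, 1.5));
}
```

Under these assumptions the result comes out a little over 60 minutes, which agrees with the rule of thumb above.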
If you start researching this topic, you might find mention of a couple of
other MPEG standards: MPEG-3, which is an abandoned standard intended to
apply to high-definition TV signals (MPEG-2 overtook this role), and
MPEG-7, which refers, essentially, to a standardized method of encoding
metadata for media objects. Neither of these standards has any direct part
to play in a media playback appliance of the type we're building.
Video encoder/decoder compatibility with MPEG standards is specified in
terms of "profiles" and "levels," and these are written in the form
"profile@level." For example, MPEG-2 "Main Profile, Main Level" (MP@ML)
specifies, among other things, a video resolution of 720x480 at 30 frames
per second (for NTSC systems). Profiles and levels are defined in
excruciating detail in the MPEG standards documents (find some exemplar
material in Resources). The main
reason for specifying these different protocol levels is to define the
minimum system requirements (RAM and processing horsepower) needed to
implement a basic player for standard off-the-shelf video media; in fact,
some compromises in MPEG-1 profiles specifically address the
(then) high cost of moving up from a single DRAM chip to multiple chips, or
the next size bigger. Note, by the way, that MP3 refers not to MPEG-3,
but to MPEG Audio Layer 3. Most MP3 files you would find for download are
probably MPEG-1, Layer 3, but some (those encoded at lower sampling rates)
are MPEG-2, Layer 3. The "layers" are different audio codecs, with Layer 2
used mainly in broadcast applications. These "layers" have nothing to do
with the video encoder "level." Confused yet?
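The MPEG version and the audio layer are stored as separate fields in every MPEG audio frame header, which is one way to see that "Layer 3" is not "MPEG-3." The sketch below decodes those two fields from the first two bytes of a frame. The function and enum names are invented for illustration, and the code assumes the bytes really do start at a frame boundary (actual files often begin with an ID3 tag that has to be skipped first).

```c
#include <stddef.h>

/* Values of the 2-bit version field in an MPEG audio frame header.
 * "2.5" is an unofficial low-sampling-rate extension. */
enum mpeg_audio_version { MPA_V2_5 = 0, MPA_RESERVED = 1, MPA_V2 = 2, MPA_V1 = 3 };

/* Hypothetical helper: decode the MPEG version and layer from the start of
 * an audio frame. Returns 0 and fills *version (enum above) and *layer
 * (1, 2, or 3) on success; returns -1 if the bytes are not a plausible
 * frame header. */
int mpa_parse_header(const unsigned char *hdr, size_t len, int *version, int *layer)
{
    int version_bits, layer_bits;

    if (len < 2)
        return -1;
    /* 11 sync bits: all of byte 0 plus the top three bits of byte 1. */
    if (hdr[0] != 0xFF || (hdr[1] & 0xE0) != 0xE0)
        return -1;
    version_bits = (hdr[1] >> 3) & 0x03;
    layer_bits = (hdr[1] >> 1) & 0x03;
    if (version_bits == MPA_RESERVED || layer_bits == 0)
        return -1;
    *version = version_bits;
    *layer = 4 - layer_bits;  /* field 3 = Layer 1, 2 = Layer 2, 1 = Layer 3 */
    return 0;
}
```

For example, a frame beginning with the bytes 0xFF 0xFB decodes as MPEG-1, Layer 3, while one beginning 0xFF 0xF3 decodes as MPEG-2, Layer 3.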
You'll observe that the various MPEG encoding formats are open, though not
public domain, ISO standards. For the purposes of this article, I'm ignoring the closed video encoding formats such as QuickTime, RealPlayer
and Windows® Media Video (and a number of others used in various industries
that distribute video content for money -- no, I won't be providing links in
Resources!). Note that many so-called proprietary
formats are technologically practically identical to MPEG-4, but are
wrapped in slight modifications or customized digital rights management
shims.
Performance requirements
This leads nicely into a discussion of how to dimension the system for
smooth video playback. How much processing power is necessary to play back
movies smoothly? Unfortunately, you can't answer this question with a
simple MHz rating; it leads to a complicated matrix of CPU multimedia
acceleration features versus formats supported versus resolution, audio
quality, and so forth. In general, however, motion video decoding encounters
the following bottlenecks:
- File transfer speed. Using lower data rates (implying better
compression ratios or lower quality reproduction) can improve
performance here. MPEG-4, which can generate good output from exceedingly
low data rates, wins on this point.
- Decryption. This only applies to encrypted video streams, obviously.
DVDs and broadcast digital video content are usually encrypted.
- Stream parsing. This is usually not particularly compute-intensive,
but it's one of the tasks you need to do.
- iDCT (inverse Discrete Cosine Transform) performance. The Discrete
Cosine Transform is used to transfer the input image data from the spatial
domain into the frequency domain; the output data is then quantized and
run-length compressed. Decoding the video stream involves reversing these
processes, which requires use of the iDCT.
- MC (motion compensation). You'll find a reasonably informative link
about this process in Resources, but in brief: you can find big savings in bitrate by locating objects that are moving onscreen and encoding the
resultant frame changes as motion vectors. For example, if you have a
spaceship moving across a static background, it takes fewer bits to say
"take the shape at position x,y in frame #1 and move it ten pixels right in
frame #2" than it does to communicate a list of exactly which pixels have
changed between the two frames. (For purists, this description is an
oversimplification, especially for MPEG-1 and MPEG-2, but it gets the idea
across.)
- Colorspace conversion. Digital video is compressed in the YUV
colorspace, for various reasons -- one of which is that it facilitates
cunning space-saving hacks such as keeping all the luminance information
but throwing away most of the chrominance, since the human eye is less
sensitive to color scales than overall light intensity. Computer monitors
almost universally operate in the RGB colorspace, so a conversion is
necessary.
- Output scaling. This is possibly more important for the multimedia
player niche than for single-purpose devices like DVD players. Find more detail on this issue below.
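To make the colorspace conversion step in the list above concrete, here is a minimal per-pixel sketch. It is not how the Radeon hardware or mplayer does it; the function names are invented, and it assumes full-range BT.601 coefficients with 8-bit samples for simplicity (MPEG video normally uses a restricted "studio" range and subsampled chrominance planes, and real decoders use integer math or dedicated hardware).

```c
/* Clamp a value to the 0..255 range of an 8-bit color channel,
 * rounding to the nearest integer. */
static unsigned char clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (unsigned char)(v + 0.5);
}

/* Hypothetical helper: convert one pixel from YUV (Y plus two chrominance
 * samples centered on 128) to RGB using full-range BT.601 coefficients. */
void yuv_to_rgb(unsigned char y, unsigned char u, unsigned char v,
                unsigned char *r, unsigned char *g, unsigned char *b)
{
    double cb = (double)u - 128.0;  /* blue-difference chrominance */
    double cr = (double)v - 128.0;  /* red-difference chrominance */

    *r = clamp_u8(y + 1.402 * cr);
    *g = clamp_u8(y - 0.344136 * cb - 0.714136 * cr);
    *b = clamp_u8(y + 1.772 * cb);
}
```

Even this simple form costs several multiplications per pixel, which is why handing the job to the video hardware is attractive.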
As it happens, PowerPC® cores are encountered frequently in embedded
digital video applications such as set-top boxes. However, in these
applications the PowerPC is always surrounded by external hardware that assists
with the decoding process (all this magic is usually integrated into a
single chip). The PowerPC core mainly coordinates system
activities, drives the user interface, and perhaps runs provider-specific
middleware. For example, your satellite TV box might contain a Java
bytecode interpreter so it can run custom applets from your service
provider. Because the PowerPC isn't handling the workload of actual video
decompression, you won't find multi-GHz cores in such applications; a
typical speed range would be in the 233 to 400MHz ballpark.
So, why would you choose to use a high-performance general-purpose PowerPC (as
we're doing here) rather than application-specific decoder hardware plus a
lower-end PowerPC core running at a more sedate pace? For this application,
the situation is simple -- the hardware platform is already defined,
and you want to use it for this new functionality. However, you might
intentionally elect to go this route if you're only building a small number of
units (perhaps only a prototype run), or you don't have a lot of startup
capital.
The reason for this is licensing. The chipsets that are used in DVD players
and set-top boxes are very cheap, but you can't buy them, or even see the
full datasheets, until you have (a) decided to commit to a huge volume, and
(b) acquired all the relevant licenses for the technologies implemented in
these parts -- even if you don't intend to use those features. These license
fees are tens of thousands of dollars (the DVD standard license is US$10,000
by itself), and that's just to get your foot in the door and acquire the
relevant technical documents. You must pay additional per-unit royalty fees as well! By choosing a pure-software approach, you can sidestep this
problem by buying software licenses in whatever quantity you need. For an
educational project or in-house prototype, you might not even need to
negotiate licenses at all.
A second point to consider is flexibility. Hardware decoders -- especially
at the low end -- can be quite constrained with respect to the bitstreams
they can handle. If you've ever experimented with homemade VideoCD or DVD
disks in regular living room DVD players, you've most likely seen all kinds
of weird symptoms such as disks that can't be resumed if you pause them,
audio synchronization doing strange things, bizarre video corruption
artifacts in certain scenes, and so forth. If you're implementing most of
the decode operation in software, you can tweak both ends of the
encode-decode equation as much as you like without affecting playability.
In fact, the compute-intensity situation isn't quite as bad as you might
have been led to infer from the previous couple of paragraphs. In these
decadent modern times, all general-purpose computer video chipsets are
designed to assist with digital video decoding algorithms. The Radeon 9200
used in the Mac mini, for instance, has hardware support for iDCT and MC,
as well as YUV-to-RGB colorspace conversion. Very roughly speaking, this
hardware offloads 85% of the decode effort from the CPU. Actually using the
acceleration features at a register level is not trivial (especially
because the datasheets for these video chips are, again, largely secret),
but fortunately the XFree86 folks have done the research and the hard
implementation work for us. Note, by the way, that support for ATI products
in Linux® is generally quite good; many other vendors' graphics hardware is
significantly irksome to get working.
I mentioned at the start of this discourse that you really don't want to
reinvent the wheel when it comes to digital video decoding, and hence you
need to use off-the-shelf software to do the playback. The package I've
chosen to use is mplayer (see Resources), an open source player that is
incredibly flexible and very amenable to integration into other programs.
If you're pursuing an actual commercial project, you might prefer to look
at xine (see Resources). The reason for this is that Linspire has created a
fully (DVD) licensed version of this player for their consumer Linux
variant, and in the past they have expressed a willingness to sell batches
of serial numbers (covering the content-scrambling license) for embedded
applications. The binary they make available is, of course, for x86 -- but
it seems likely you could negotiate the licensing independently of the
specific build that's available from the Linspire Web site.
In case you haven't already gleaned it, licensing in general is a very
itchy issue around multimedia formats, especially digital video. MPEG-2 by
itself -- not including the various goodies related to DVD playback -- lives
on a raft of more than 600 patents held by different parties. This is why
central clearinghouses like the MPEG Licensing Authority (see Resources)
are, unfortunately, absolutely necessary.
To use mplayer and one of the other external programs we'll add a
little later, you'll have to install three additional libraries: libao,
libmad, and libid3tag. These are, respectively, a cross-platform audio
library, an MPEG audio-decoder library, and a library for extracting ID3
tag information out of MP3 files. For your convenience, I've gathered these
libraries together into a single archive for you to download (see Downloads). To
make and install each library, simply unpack it, and from within the
directory thus created, run:
Listing 1. Building a library
./configure && make all && make install
You use the exact same steps to build mplayer. I used the MPlayer-1.0pre7try2 version, but if you find a newer
version by the time you read this, you almost certainly won't have to
change anything in these procedures to get it working properly.
At this point, you can also build and install the new version of the
slide show program; download and unpack it, run make all; make install, and
you're there. You can integrate MPEG video content (with .MPE, .MPG, or
.MPEG extensions) into your slide shows using the same HTML-style syntax described in the previous article. Note, however, that most Web browsers
will not properly preview an image link that points to a .MPG file, so you
will probably just see a blank square for any MPEG items in your slide show.
Addressing this problem would require adding significant intelligence to
the inbuilt HTML parser, so that it could recognize the HTML tags for
embedded movies, and that's not a high priority just at the moment.
Take a closer look at how I modified the slide show program. The first
change I made was in filescan.h (where I simply added a new media type,
MT_MPEG, and a new media_type field in the slide show item structure). I
also modified the FL_Identify function in filescan.c so that it would
identify the appropriate extensions for MPEG files. The other modifications
were to main.c, first in the beginning where we parse the script file,
and later to the slide show playback code.
But why go to the trouble of explicitly stating the media format
with a separate field in the slide show entry? Surely you can just determine
that by looking at the file extension? This is exactly how the FL_Identify
function works, after all. However, it turns out that you can gain considerable
convenience by keeping track of the media type as a separate
attribute. An almost trivial benefit is that you don't have to re-parse the
filename every time you need to make a decision based on media formats, and
you can use a simple switch statement to control execution flow for the
different supported formats. (There aren't many such decision points in the
code as it stands, but more will come as we add new features.) A more
subtle, but also more important issue is that you might not always be able
to identify the media based on the last few characters of the filename,
particularly for content that is being fetched directly off the Internet.
The slide show program as it stands doesn't support this yet, but eventually
you want to be able to embed URLs in the slide show script that refer to
remote content. URLs served up by remote programs might have quite
indecipherable filenames, and you need another mechanism to know what
type of content you're actually trying to display.
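The idea of carrying the media type as its own field can be sketched as follows. This is not the code from filescan.h or filescan.c; only the names MT_MPEG and media_type come from the description above. The other enumerators, the struct layout, the function names, and the .jpg/.jpeg image extensions are assumptions made for illustration.

```c
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* MT_MPEG is the media type named in the article; the other enumerators
 * are placeholders invented for this sketch. */
typedef enum { MT_UNKNOWN, MT_IMAGE, MT_MPEG } media_type_t;

/* Hypothetical slide show item: the type is stored once, alongside the
 * path, so later code never needs to re-parse the filename. */
typedef struct {
    const char *path;
    media_type_t media_type;
} slide_item_t;

/* Hypothetical stand-in for FL_Identify: classify a file by its
 * extension, ignoring case. */
media_type_t identify_by_extension(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    if (dot == NULL)
        return MT_UNKNOWN;
    if (strcasecmp(dot, ".mpe") == 0 || strcasecmp(dot, ".mpg") == 0 ||
        strcasecmp(dot, ".mpeg") == 0)
        return MT_MPEG;
    if (strcasecmp(dot, ".jpg") == 0 || strcasecmp(dot, ".jpeg") == 0)
        return MT_IMAGE;
    return MT_UNKNOWN;
}

/* A decision point then becomes a simple switch on the stored type. */
const char *describe_item(const slide_item_t *item)
{
    switch (item->media_type) {
    case MT_MPEG:  return "movie";
    case MT_IMAGE: return "still image";
    default:       return "unknown";
    }
}
```

Because the type lives in the item rather than in the filename, a future loader for remote URLs could set media_type from some other source (for instance, a content type reported by the server) without touching the playback code.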
The last change I made to main.c is the meaty part that actually spawns
mplayer, when appropriate, to play your embedded movie files. It's worth
dissecting the command-line switches I pass to mplayer, because they were
selected with some care.
The first two switches are -noconsolecontrols, which prevents mplayer from
looking at its input stream for command characters, and -really-quiet,
which suppresses the generation of a lot of printf'd information that you
won't be able to see when mplayer is spawned from your program.
The next switch is -vo xv, which tells mplayer to use the X Video
extensions for playback. This gives you access to the video card's
colorspace conversion (overlay) hardware, which saves a little CPU time. It
also opens the door for the -vm switch mentioned below.
If you type mplayer -vo help to get a list of supported output methods,
you'll see that this program supports a multiplicity of different outputs --
it can render video directly to the framebuffer (-vo fbdev -- observe that
this mode can be coaxed into running on practically any device that runs
Linux and supports bitmapped graphics), to vanilla X devices (-vo x11),
using the X Video extensions (-vo xv), through OpenGL, and even to various
animated file formats such as animated GIF. The output mode you select
depends on what you want to achieve and exactly what capabilities your
hardware offers. For instance, if your machine has some kind of OpenGL
hardware acceleration, you might choose OpenGL output -- that way, you can
use the acceleration hardware to handle scaling (and possibly colorspace
conversion).
Next, we have -fs (full screen) and -vm (which allows mplayer to choose a
different video mode; it automatically switches back to the previous mode
when playback is done, although this behavior can be overridden if desired). This latter feature alone is worth X11's price of admission. The sweet
spot for VHS-quality digital video is VideoCD format, which is MPEG-1 at
352x240 (for NTSC, anyway). Even the little old 233MHz iMac can decode this
smoothly in software, with accompanying audio. (By the way, that's really
impressive performance. An equivalent x86 system would be groaning under
the same workload.)
Rendering this low-resolution bitstream on your 1024x768 screen, however,
would either result in a tiny video window, or waste a lot of CPU cycles
scaling the video output up to the full-display resolution. The solution is
to allow mplayer to change X's output resolution temporarily. No software
scaling is necessary; the actual pixel clock sent out to the monitor is
altered, and the monitor's electronics change the raster's speed to match
the sync rates coming from the computer. Observe that without this
functionality, you would have two competing design goals: still images
generally require very high resolution if they are to appear attractive,
whereas video content can be much lower-resolution (and in fact, needs to
be -- since your platform would have difficulty decoding a full-resolution
1024x768 video file).
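A quick calculation shows how much work software scaling would involve. The helpers below are hypothetical and only count pixels; they say nothing about the cost of any particular scaling filter.

```c
/* Hypothetical helper: pixels that must be produced per second for a
 * given frame size and frame rate. */
double pixel_rate(int width, int height, double fps)
{
    return (double)width * (double)height * fps;
}

/* Hypothetical helper: how many output pixels a scaler must generate for
 * each decoded source pixel when stretching src to fill dst. */
double scaling_cost_ratio(int src_w, int src_h, int dst_w, int dst_h)
{
    return ((double)dst_w * (double)dst_h) / ((double)src_w * (double)src_h);
}
```

For a 352x240 source on a 1024x768 display the ratio is about 9.3, so a software scaler would have to write roughly nine pixels for every pixel the decoder produces. Switching the video mode avoids that work entirely.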
The final switch is -framedrop, which simply allows mplayer to skip
rendering steps on some frames if the video output starts to lag audio.
This can help keep up the appearance of smooth playback if your system is
right on the performance borderline for a particular bitstream; an
occasional dropped frame here or there won't be noticed as readily as
out-of-sync audio or stuttering output.
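Putting the switches together, spawning the player might look something like the sketch below. This is not the code from main.c, which is not reproduced in this article; the function names are invented, and the fork/execvp/waitpid approach is one assumption about how to launch an external program and block until it finishes. Only the switches themselves are taken from the discussion above.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fill argv[] with the mplayer command line discussed
 * in the text. Returns the argument count (excluding the terminating
 * NULL), or -1 if the array is too small. */
int build_mplayer_argv(const char *movie_path, const char **argv, size_t max_args)
{
    static const char *const switches[] = {
        "mplayer", "-noconsolecontrols", "-really-quiet",
        "-vo", "xv", "-fs", "-vm", "-framedrop"
    };
    size_t n = sizeof switches / sizeof switches[0];
    size_t i;

    if (max_args < n + 2)  /* switches + movie path + NULL terminator */
        return -1;
    for (i = 0; i < n; i++)
        argv[i] = switches[i];
    argv[n] = movie_path;
    argv[n + 1] = NULL;
    return (int)(n + 1);
}

/* Hypothetical helper: run mplayer on one file and wait for it to finish.
 * Returns mplayer's exit status, or -1 if it could not be run normally. */
int play_movie(const char *movie_path)
{
    const char *argv[16];
    int status;
    pid_t pid;

    if (build_mplayer_argv(movie_path, argv, 16) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);  /* reached only if mplayer could not be executed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Keeping the argument list in one table makes it easy to experiment with a different output driver (for example, swapping xv for x11) without touching the process-handling code.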
That's it for this episode. In the next article (which will take less time
to release than this one did -- honest!), you'll see how to add Web-based remote
control functionality, and as a special bonus I'll show you how to put the elinks embedded
Web browser onto the player itself so the same user interface is presented
at both local and remote locations. In the meantime, as an exercise, try
adding AVI support by modifying filescan.c appropriately, and experiment
with various different media files to see what your hardware can handle
efficiently.
Downloads
| Description | Name | Size | Download method |
|---|---|---|---|
| Example source code | pa-madmac4code.tar.gz | 7KB | HTTP |
| Libraries for mplayer | pa-madmac4code-mplayer-libs.tar.gz | 1.17MB | HTTP |
Resources Learn
Get products and technologies
About the author
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless security/fire safety device design engineer. Prior to that, he spent five years
developing x86, ARM and PA-RISC-based networked multimedia appliances at
Digi-Frame Inc. He has extensive experience in encryption and security
software and is the author of two books on embedded systems development.
He can be reached at sysadm@zws.com.