Level: Introductory
Lewin Edwards (sysadm@zws.com), Design Engineer, Freelance
10 Jan 2006
Lewin Edwards looks at the history and design of X and why it matters for an embedded graphics system, and introduces a basic scripting language for controlling a multimedia display device.
The previous article showed you how to get your PowerPC®-based multimedia device
up and running, loading and displaying JPEG images. This episode shows you how to add image scaling and -- more importantly -- scripting
functionality, and I'm going to describe for you some of the design theory
and engineering difficulties to be encountered in building a multimedia
appliance. I'm also going to go into a bit more detail about some material
I glossed over last time.
In the last episode, I
started off by describing the Linux® framebuffer device and showing you how
to attach your application to it. I then donned a stunningly lifelike
weasel costume and ran our application inside X, while promising to explain
this decision later.
I introduced X into the equation in the last episode for two reasons:
- There's apparently a bug in the kernel framebuffer driver for the ATI
Mach64, which is the graphics chip on my "backup" development system. You can work around this
bug by starting the X server, which correctly
initializes the graphics chip.
- Some of the later applications I'm going to discuss are easiest to
implement by pulling in external programs that require an X environment. You
can get a much faster, more seamless end result by having the X server and
window manager (see below) running at all times, although most of the time
you'll bypass it.
So, what is X? Variously known as XWindows, the X Window System, X11 or
just X, it is a highly platform-independent graphical environment. The
principal interesting design goal of X11 is to separate the display
subsystem (referred to as the "server") and the application needing display
services (referred to as the "client").
Mistaking a client for a server
In general, client/server applications use a small, lightweight, client
front end on the user's machine, and a big heavy server application on a
back end somewhere far away. X reverses this, with the server being the
program running on the desktop. This is because the X server is an
interface server. This nonetheless confuses people sometimes, because
the "client" is most often the more processor-intensive application.
If you're used to X, you probably already knew this. I mention it only
because it seems to stay counterintuitive.
The X protocol as such only describes the raw mechanism for getting blocks
of pixels displayed on a screen. To organize those pixels into a
coherent GUI with functional radio buttons, checkboxes, windows with title
bars and resize gadgets, and so forth, you need another piece of software
called a window manager. The window manager handles presentation of windows
and user interaction with those windows, and provides a consistent look and
feel to different applications.
X toolkits
Because it's excruciatingly difficult to program X
directly, most programs interact with toolkits that handle the low-level
details in a consistent manner. (Note that in the Bad Old Days, this used
to result in utterly incomprehensible user interface differences; no two
X-based boxes looked or felt the same because of the disparity in window
managers and toolkits. This problem has been ameliorated in recent years,
mainly due to the rising popularity of specific desktop environments such
as GNOME and KDE).
All this makes X rather schizophrenic. The underlying design assumption is
that your program and the GUI that's rendering its output are on opposite
ends of a network connection. Such a design is neatly compartmentalized but
horribly inefficient on a modern system where everything -- GUI code and
application code -- is running inside a single physical address space on one
machine. As a result, mouse holes of various sizes provide fast
transport across the territory between the application and the graphics
hardware. These range from simple shared memory areas to complicated
multilayer hardware-access APIs like the XFree86 Direct Rendering Interface
(see Resources).
Note that the X protocol does not describe any interface for non-graphical
multimedia functionality, such as audio input and output. A separate piece
of software, referred to as a sound server, handles this. For this example, I will not be
using a sound server, since I know that my application is going to be
running on the same machine where I want the audio to be heard.
Once you start getting into the specifics of X implementations on a given
platform (like, for example, the version included with Yellow Dog), things become
even more complicated, because there are all sorts of different ways you
can be talking to X and all sorts of ways X can be talking to your
hardware. For example, you can run X on top of the kernel framebuffer
driver in such a way that X never actually touches the video control
registers; it just uses framebuffer ioctls to find video memory and then
writes to it.
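As a rough illustration of that framebuffer-only approach, the sketch below asks the kernel driver where video memory is and maps it, without touching any video control registers. FBIOGET_FSCREENINFO and FBIOGET_VSCREENINFO are the standard Linux ioctls from <linux/fb.h>; the names fb_map, fb_open, and fb_pixel_offset are hypothetical and do not come from X or from this series' source code.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping structure for one mapped framebuffer. */
struct fb_map {
    int fd;
    uint8_t *mem;                    /* start of mapped video memory */
    size_t len;                      /* number of bytes mapped */
    struct fb_fix_screeninfo fix;    /* fixed info: memory length, stride */
    struct fb_var_screeninfo var;    /* variable info: resolution, depth */
};

/* Byte offset of pixel (x, y), given the stride and depth the driver reports. */
static size_t fb_pixel_offset(unsigned x, unsigned y,
                              unsigned line_length, unsigned bits_per_pixel)
{
    return (size_t)y * line_length + (size_t)x * (bits_per_pixel / 8);
}

/* Open a framebuffer device and map its memory. Returns 0 or -1. */
static int fb_open(const char *path, struct fb_map *fb)
{
    assert(path != NULL && fb != NULL);

    fb->mem = NULL;
    fb->len = 0;
    fb->fd = open(path, O_RDWR);
    if (fb->fd < 0)
        return -1;

    /* Ask the driver for the memory layout instead of programming the chip. */
    if (ioctl(fb->fd, FBIOGET_FSCREENINFO, &fb->fix) < 0 ||
        ioctl(fb->fd, FBIOGET_VSCREENINFO, &fb->var) < 0) {
        close(fb->fd);
        fb->fd = -1;
        return -1;
    }

    void *p = mmap(NULL, fb->fix.smem_len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fb->fd, 0);
    if (p == MAP_FAILED) {
        close(fb->fd);
        fb->fd = -1;
        return -1;
    }
    fb->mem = p;
    fb->len = fb->fix.smem_len;
    return 0;
}
```

After a successful fb_open, a pixel would be written at fb.mem + fb_pixel_offset(x, y, fb.fix.line_length, fb.var.bits_per_pixel), in whatever pixel format the driver reports.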
At the other extreme, you can run a "native" X server that does all its own
initialization of the graphics hardware. Each method has advantages and disadvantages, and I'll dig quite deep into those issues in
the next article when I talk about movie support.
For now, all this information is by way of background. The example application is
not going to use any X APIs; I need to run alongside X for the external
dependency reasons I discuss above, but I'm not actually going to use X
for anything directly. So, enough -- for the time being -- about X.
Defining the scripting language
When someone says "I want a programming language in which I need only say
what I wish done," give him a lollipop. -- Alan Perlis
Now, what of the scripting language? Since we're talking here about
something that is at least potentially a consumer appliance, the user interface and scripting system need careful thought.
There's rather a lot of prior art in the field of digital picture frames,
and most of it is not very good. (I can say this with such authority
because I'm the engineer behind several fielded commercial digital picture
frame products and a few unreleased prototypes; I was intimately involved
with the product category for some five years and had the opportunity to
test a lot of competitors' products). Let me begin by giving you a
thumbnail sketch of the design thinking behind user interfaces on these
sorts of products.
Multimedia appliances can broadly be divided into three categories:
- "Dumb" devices -- of diverse capabilities -- that exclusively play
commercially produced media containing implicit execution instructions,
or which use a paradigm that is intuitively understood by the average
consumer. A DVD player is a good example of this: it plays preprogrammed
media and uses a very familiar VCR metaphor for most of its operational
functions. Although it is possible to create your own DVDs, of course, the
design intent of the player itself doesn't encompass this functionality and
its user interface is concomitantly simple.
- "Smart, small" appliances that hold a library of user-supplied
content and don't offer much flexibility in how that content is played
back. These devices usually have a limited capability to display
information to the user. They always have a very narrow range of program
content options that can be programmed on the local user interface; an MP3
player or pocket video player is a good example of this category. These
appliances might connect to a computer and use the computer as an
enhanced user interface.
- "Smart, big" appliances, usually with lots of storage for
user-supplied content. These devices generally have the same order of
magnitude of processing capability as a desktop computer, and might have
fairly sophisticated controls allowing the user to set up complex content
playback options.
My categories here sound terribly dogmatic but if you think about specific
examples familiar to you, you'll realize that (with a few exceptions) the
devices that work the best in consumer hands tend to fall into one of these
pigeonholes. If you've been a gadget-hound for any length of time, you'll
certainly have bitter memories of devices that try to cross categories and
make a very poor job of it.
The most common design error -- not just in multimedia appliances, either --
is to take a category 2 appliance and squeeze category 3 functionality into
it. You can always devise some arcane sequence of button presses
and other events to achieve some sophisticated configuration feature, but
it's also always irritating for the user to have to deal with these
shoehorned user interfaces. Apple's iPod device is an excellent example of
how choosing not to implement complex functionality on a physically small
device leads to an elegant, simple, and commercially successful design. The
iPod hardware is quite capable of running a PDA-like OS with enormously
complicated applications on it, but it would be much harder to use than the
simple hierarchical tree menu system that Apple chose to develop.
The converse case is to develop a category 3 appliance with lots of
functionality and ample system resources, but make the configuration of
most of this functionality inaccessible to the user. The exasperation
arises here because the appliance is more than capable of running a user
interface giving full access to its capabilities, but the manufacturers
choose to force you to use some kind of external object -- usually custom
content authoring software on a personal computer -- to access the full
range of device features. Often, this step is added solely in order to
provide cross-marketing opportunities. For instance, a family of
products in this digital picture frame category required a live
Internet connection, and could only be administered by tinkering with a
back-end Web site. The reason for this was so that the frame always had
guaranteed access to a fresh stream of advertising images in addition to
the user-supplied images. There was no value added to the consumer, and it
was a very irksome sort of product.
Design requirements
With this in mind, consider the goals of a scripting language that
will fulfil the needs of a flexible, category 3 appliance of the type in this series:
- The file format should be easily read, decoded, re-encoded from
memory structure, and written back to storage media.
- It should be extensible and general-purpose; you might need to provide
special, currently unknown parameters when playing back some kinds of
media.
- It should support easy and rapid seeking from one entry to the next,
and random-access seeking.
- Ideally, it should be editable using a commonly available tool,
without exotic programming knowledge.
Unfortunately, the best way of satisfying the need for simple random
seeking is by using a binary file with a fixed record length. (I used this
method in the first generation of consumer digital picture frames I
developed). The problem with this is that although binary files are easily
read into memory, and it's easy to seek randomly within them (you just seek
to a byte offset calculated as record number * record size), it's
impossible to edit those files in a user-friendly way without writing
custom software to do it, and extending the file format to handle new
options involves tedious translations. In my previous life, I solved this
issue by making the user write the script in a text editor and compiling
the text into a temporary (volatile) binary file for easy random access at
runtime. I jumped through this hoop because these scripts
were permitted to be arbitrarily large.
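To make the fixed-record idea concrete, here is a minimal sketch of seeking to record number * record size. The slide_record layout, its field names, and both helper functions are invented for illustration; they are not the format used in the earlier products.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-length record; fields and sizes are illustrative only. */
struct slide_record {
    char filename[64];
    unsigned int display_seconds;
};

/* Read record number `index` (0-based). Returns 0 on success, -1 on failure. */
static int read_record(FILE *fp, long index, struct slide_record *out)
{
    assert(fp != NULL && out != NULL);

    long offset = index * (long)sizeof(*out);   /* record number * record size */
    if (index < 0 || fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof(*out), 1, fp) == 1 ? 0 : -1;
}

/* Append one record; used here only to build a file to seek in. */
static int write_record(FILE *fp, const char *filename, unsigned int seconds)
{
    struct slide_record rec;

    memset(&rec, 0, sizeof(rec));
    strncpy(rec.filename, filename, sizeof(rec.filename) - 1);
    rec.display_seconds = seconds;
    return fwrite(&rec, sizeof(rec), 1, fp) == 1 ? 0 : -1;
}
```

Writing a C struct straight to disk ties the file to one compiler's padding and one byte order, which is part of why extending such a format is tedious.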
For this appliance I'm going to assume arbitrarily that the script file
will always be small enough to fit entirely into RAM. If you look at
html.c, you'll see that the load-script function simply determines the size
of the script file, allocates memory to hold it, then reads the whole file
in at once. In order to make on-the-fly parsing easier, the load function
also strips out control characters.
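The load step described above might look roughly like the following. This is a sketch written for this explanation, not the function from html.c: the name load_script and its signature are assumptions, and it deletes control characters outright, which is only one plausible reading of "strips out" (replacing them with spaces would be another).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Load a whole script file into a NUL-terminated buffer, dropping control
 * characters on the way. Returns NULL on failure; the caller frees the result.
 */
static char *load_script(const char *path, size_t *out_len)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Pull the entire file in with a single read. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* Compact in place, keeping only non-control characters. */
    size_t kept = 0;
    for (size_t i = 0; i < got; i++) {
        if (!iscntrl((unsigned char)buf[i]))
            buf[kept++] = buf[i];
    }
    buf[kept] = '\0';
    if (out_len != NULL)
        *out_len = kept;
    return buf;
}
```

Because line breaks and tabs are control characters, a tag split across lines arrives at the parser as one unbroken run of text.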
Note, by the way, that the limit of 16,384 slides imposed by the previous
version of the slide show application still exists in this version; I've merely
changed the way that 16,384-element structure gets populated.
For this appliance, I've decided to use a subset of HTML
and a simple HTML parser. The reason for this is that you can use any HTML
editor -- OpenOffice.org, for example -- to create the script file in a
WYSIWYG manner.
The application -- which you should download and build at this point --
simply loads the file /web/script.html and attempts to play it. The
parser I've implemented scans the script file for <IMG> tags using a very
simplistic algorithm:
- Scan for the opening angle bracket.
- Wait for the characters "img" (case-insensitive, whitespace
ignored).
- Wait for the characters "src" (case-insensitive).
- Wait for a quotation mark.
- Assume the characters from this point up to the next quotation mark
are a filename to be loaded.
- Paths to image files are assumed to be relative to /web.
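The steps above can be sketched in C as follows. This is an illustrative reimplementation rather than the parser shipped with the article: the function names are hypothetical, and one detail is an added assumption, namely that the search for "src" stops at the closing angle bracket of the current tag.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does `s` start with the lowercase word `word`? */
static int starts_with_ci(const char *s, const char *word)
{
    while (*word != '\0') {
        if (tolower((unsigned char)*s) != *word)
            return 0;
        s++;
        word++;
    }
    return 1;
}

/*
 * Find the next <IMG ... SRC="..."> at or after `p` and copy the quoted
 * filename into `out`. Returns a pointer just past the closing quote so the
 * caller can continue scanning, or NULL when no further image is found.
 */
static const char *next_img_src(const char *p, char *out, size_t out_size)
{
    assert(p != NULL && out != NULL && out_size > 0);

    while ((p = strchr(p, '<')) != NULL) {
        p++;                                  /* step past '<' */
        while (isspace((unsigned char)*p))
            p++;                              /* whitespace ignored */
        if (!starts_with_ci(p, "img"))
            continue;                         /* some other tag */
        p += 3;

        /* Wait for "src", giving up at the end of this tag. */
        while (*p != '\0' && *p != '>' && !starts_with_ci(p, "src"))
            p++;
        if (*p == '\0')
            return NULL;
        if (*p == '>')
            continue;                         /* an IMG tag with no SRC */
        p += 3;

        /* Wait for the opening quotation mark. */
        while (*p != '\0' && *p != '"')
            p++;
        if (*p == '\0')
            return NULL;
        p++;

        /* Everything up to the next quotation mark is the filename. */
        const char *end = strchr(p, '"');
        if (end == NULL)
            return NULL;
        size_t len = (size_t)(end - p);
        if (len >= out_size)
            len = out_size - 1;               /* truncate to fit */
        memcpy(out, p, len);
        out[len] = '\0';
        return end + 1;
    }
    return NULL;
}
```

A caller would invoke next_img_src repeatedly, feeding each returned pointer back in, and prepend /web/ to each filename before loading it.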
Observe that the syntax of the IMG tag supports any number of parameters
for each image. HTML defines some of these; for instance you can specify
WIDTH= and HEIGHT= to tell a browser to render the image with a specific
geometry. You can also add practically any meta-information your
slide show might need by means of custom attributes.
Assuming you have two JPEGs called PIC001.JPG and PIC002.JPG, an example
slide show file would be the following:
Listing 1. A sample slide show script
<HTML>
<BODY>
<IMG SRC="PIC001.JPG"><BR>
<IMG SRC="PIC002.JPG"><BR>
</BODY>
</HTML>
Note that the HTML, BODY, and BR tags are just syntactic candy to make the
file palatable to a regular Web browser; the slide show program will run
fine without them.
The other major feature I've added to this version of the slide show
program is image scaling, an important feature for any digital photo album.
The scaling algorithm implemented here is a simple decimation system. The
code is a little tortuous to read because it uses integer arithmetic for
speed reasons; I always find it easiest to visualize this type of algorithm
in semi-algebraic terms.
Consider that you have a single line of an image which is m pixels wide,
and which you want to scale to n pixels onscreen. Further consider that you
have a source pointer s, pointed at the left-hand pixel of the source line
and a destination pointer d, pointed at the left-hand pixel of the target
rendering line. Clearly, you only want to render each destination pixel
once, so you are going to loop n times and increment d on each iteration of
the loop. The source pointer has m/n added to it on each loop iteration.
(That's the tricky part to do with integer arithmetic, by the way -- it's
easy with floating-point, but not as fast).
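One common way to add m/n per iteration with integers only is to split it into a whole step (m / n) and a remainder (m % n), and carry the remainder in an error accumulator. The sketch below takes that approach; it is not necessarily how the downloadable code does it, and the scale_line name and the 32-bit pixel type are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scale one line of `m` source pixels to `n` destination pixels by picking
 * the nearest source pixel to the left, using only integer arithmetic.
 * Destination pixel i receives source pixel floor(i * m / n).
 */
static void scale_line(const uint32_t *s, size_t m, uint32_t *d, size_t n)
{
    assert(s != NULL && d != NULL && m > 0 && n > 0);

    size_t step = m / n;    /* whole pixels to advance per output pixel */
    size_t frac = m % n;    /* leftover, in units of 1/n of a pixel */
    size_t err = 0;         /* accumulated fractional part, always < n */

    for (size_t i = 0; i < n; i++) {
        *d++ = *s;          /* render each destination pixel exactly once */
        s += step;
        err += frac;
        if (err >= n) {     /* the fractions have added up to a whole pixel */
            err -= n;
            s++;
        }
    }
}
```

The same loop handles enlargement: when m is smaller than n the whole step is zero, and the source pointer advances only when the accumulator overflows.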
So, you now have a rather more controllable slide show program. The next
article shows you how to add support for movies. As part of this process, I'll
delve into a little more detail of the love-hate relationship we currently
have with X, and I'll enhance the script file format even more.
Downloads
The downloads for this article are being updated. Please try to download later.
Resources
Get products and technologies
- Download the source code referenced in this article from the table above.
The usual warning applies to Internet Explorer users: make sure to save the
file as something.tar.gz!
About the author
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless security/fire safety device design engineer. Prior to that, he spent five years
developing x86, ARM and PA-RISC-based networked multimedia appliances at
Digi-Frame Inc. He has extensive experience in encryption and security
software and is the author of two books on embedded systems development.