Multipurpose multimedia processing with GStreamer

A universal solution for your multiple needs

Maciej Katafiasz (ibmdw@mathrick.org), Student, Computer Science
Maciej Katafiasz is a graduate student in Computer Science and has been using open source technologies since high school. Annoyed by the lack of simple, working ways to watch movies in his GNOME desktop, he picked up interest in the (then still young) GStreamer, and stayed around to help a bit. You can contact him at ibmdw@mathrick.org.

Summary:  This article introduces you to GStreamer, a universal multimedia processing library that makes multimedia handling easy.

Date:  11 Jul 2006
Level:  Intermediate

Multimedia, by definition, means a variety of media types. You can store audio, video, and metadata in a myriad of file formats. However, this also means learning to use many tools to manipulate such diverse content.

This is where GStreamer comes to the rescue. By hiding all the different tools and libraries inside its plug-ins and using the general concept of a media pipeline, GStreamer is able to present the manipulation of different types of media in a uniform way. This allows you to concentrate on the media at hand, instead of wondering what pipe diameter your plumbing should have.

The benefits of such a unified approach are immediate. Instead of writing an MP3 player or an AVI/DivX player, you can write a music or a video player. When you want to support another format, you don't need to learn and then code for a new library. Instead, you simply install a plug-in for that format. That's all -- you don't even need to recompile. All GStreamer applications are able to pick up the new format on the fly.

GStreamer can answer many problems, such as "I need to store all audio samples coming from various sources in a common format." Because all formats are treated alike, you only need to write one tool. This saves time and makes the solution more robust and easier to maintain. Moreover, after you learn the GStreamer concepts, there's almost no limit to what you can apply it to. If you want to stream audio over a network, you only need to think about the network, because the API (application programming interface) you use for sound and everything else stays the same.

The concepts

Because of its nature, GStreamer sits a bit above the level of a normal library. Thus, it's important to understand exactly what it is, what it isn't, as well as what it does.

GStreamer is a media processing library. That means it gives you an abstract model of some transformation -- composed of input, output, and different stages -- and allows you to construct concrete instances of such transformations to fit a particular end result and a particular media type. The following are examples of such processing:

  • Transcoding an MP3 audio file to Ogg Vorbis
  • Playing back an AVI movie file
  • Capturing a live performance with an IEEE1394 digital video (DV) camera and saving it as an MPEG-2 stream

To achieve such diverse results, GStreamer operates on the abstract notion of a pipeline. A pipeline is a directed graph in which media flows in a defined direction from the input to the output. Pipelines consist of elements -- another core concept. An element is an object that you can put inside a pipeline, wrapping some operation on the media inside. You can link elements together so they collectively yield a process that transforms the input into the desired output. By convention, pipelines are depicted with data flowing from the left (upstream) to the right (downstream). That is the same way they are written using gst-launch, which is described later in this article.
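
The pipeline idea can be sketched in a few lines of Python. This is a toy model only, not the GStreamer API: the names source, upper, and run_pipeline are made up for illustration, and each "element" is just a function that transforms a stream of buffers.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes an iterator of buffers and yields transformed buffers.
Stage = Callable[[Iterator[bytes]], Iterator[bytes]]


def source(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Upstream end of the pipeline: produces buffers, consumes none."""
    yield from chunks


def upper(buffers: Iterator[bytes]) -> Iterator[bytes]:
    """A transforming stage: modifies each buffer that passes through."""
    for buf in buffers:
        yield buf.upper()


def run_pipeline(src: Iterator[bytes], stages: List[Stage]) -> List[bytes]:
    """Link the stages left to right and collect what arrives downstream."""
    stream = src
    for stage in stages:
        stream = stage(stream)
    return list(stream)  # the list plays the role of the final output


result = run_pipeline(source([b"ab", b"cd"]), [upper])
print(result)
```

Swapping upper for a different stage changes what the pipeline does without touching the input or output ends, which is the point of the abstraction.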

It is important to note that everything so far is completely abstract. There has been no mention of video or audio, and there's a good reason. The model described above is not restricted to any specific media type. As long as you can express it in terms of input, output, and transformation, your pipeline can manipulate it. For instance, your desktop can be a media source, and you can record a screencast of your operation to a video file. In fact, that's what the Istanbul application is designed to do (see the Resources section).

The core of GStreamer itself has no elements. All it provides is knowledge of plumbing. Everything specific is provided by plug-ins. A plug-in is a piece of compiled code, usually distributed as a shared library (.so on UNIX® and .dll on Microsoft® Windows®), that provides one or more elements. At startup, GStreamer queries all installed plug-ins to derive the set of elements available for applications. Plug-ins usually call other libraries for specific tasks (for instance, an MPEG-2 decoder probably uses an existing library for handling the MPEG format), but the application doesn't need to know that. All it sees are elements that all look and behave the same.
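
As a rough illustration of that division of labor, the sketch below models a core that knows no elements until plug-ins register them. The Registry class, its methods, and the two placeholder element classes are hypothetical and exist only for this example. Real GStreamer plug-ins are compiled libraries discovered at startup, not Python dictionaries.

```python
from typing import Callable, Dict, List


class Registry:
    """Hypothetical stand-in for the core: it ships with no elements."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def load_plugin(self, elements: Dict[str, Callable[[], object]]) -> None:
        """A plug-in contributes one or more named element factories."""
        self._factories.update(elements)

    def available(self) -> List[str]:
        """Names of all elements that applications can currently create."""
        return sorted(self._factories)

    def make(self, name: str) -> object:
        """Create an element by name, failing if no plug-in provides it."""
        if name not in self._factories:
            raise LookupError(f"no element named {name!r}; is its plug-in installed?")
        return self._factories[name]()


class FileSrc:
    """Placeholder element used only in this sketch."""


class VorbisDec:
    """Placeholder element used only in this sketch."""


registry = Registry()
print(registry.available())                 # nothing until a plug-in is loaded
registry.load_plugin({"filesrc": FileSrc})
registry.load_plugin({"vorbisdec": VorbisDec})
print(registry.available())
```

The application only ever asks the registry for an element by name, so installing another plug-in extends what it can do without any change to the application.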

Some plug-ins are distributed in the core source packages and compiled into the library itself, even though they are conceptually separate entities. Other basic plug-ins are distributed in a gst-plugins-base package. Those are present in most installations of GStreamer. Then there are the gst-plugins-good, -bad, and -ugly packages, where different plug-ins, depending on the level of support they get and licensing terms, are collected. Finally, there are plug-ins that are distributed by third-party vendors or registered for private use by a specific application.

Putting it all together

Now that you understand the pipeline, you need to understand how it maps to GStreamer's implementation. You also get to learn some more terminology along the way.

Why are source and sink swapped?

They aren't. A sink pad is a point where data flows into an element, and a source pad is where it originates. Thus, an element with only source pads is called a source, and one with only sink pads is called a sink. It really is quite logical, even if awkward at first.

As I mentioned, the basic unit of processing is an element, represented by the GstElement class. GStreamer is written in C, but it uses the same GObject library known from the GTK+ toolkit to get object-oriented features (see the Resources section). An element has pads, which are the linking points for other elements. There are two types of pads:

  • Sink pads provide input for an element.
  • Data produced by an element is available from source pads.
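
The naming is easier to keep straight with a small model. The following sketch is an assumption-laden illustration, not the GstElement class: the Element dataclass and its kind method are invented here to show how the pads an element has determine what it is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Toy element: sink pads are where data enters, source pads where it leaves."""
    name: str
    sink_pads: List[str] = field(default_factory=list)
    src_pads: List[str] = field(default_factory=list)

    def kind(self) -> str:
        if self.sink_pads and self.src_pads:
            return "filter"   # data flows in and out
        if self.src_pads:
            return "source"   # only produces data
        if self.sink_pads:
            return "sink"     # only consumes data
        return "no pads"


reader = Element("filesrc", src_pads=["src"])
decoder = Element("vorbisdec", sink_pads=["sink"], src_pads=["src"])
output = Element("alsasink", sink_pads=["sink"])

for element in (reader, decoder, output):
    print(element.name, "is a", element.kind())
```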

The pads have capabilities, called caps. Capabilities dictate what kind of data can flow through a pad. For instance, if you inspect a vorbisdec element, which is a decoder for the free Vorbis codec, you see the output shown in Listing 1. A dollar sign ($) at the beginning of a line means it's a normal UNIX shell command.


Listing 1. A snippet from vorbisdec element information
$ gst-inspect-0.10 vorbisdec

[...]

Pad Templates:
  SRC template: 'src'
    Availability: Always
    Capabilities:
      audio/x-raw-float
                   rate: [ 8000, 50000 ]
               channels: [ 1, 6 ]
             endianness: 1234
                  width: 32

  SINK template: 'sink'
    Availability: Always
    Capabilities:
      audio/x-vorbis

[...]

You can see there are two pad templates: one for source (src) and one for sink. The source pad is always available (other possible availability values are sometimes and request) and can output raw float audio at rates between 8 kHz and 50 kHz, with one to six channels, in little-endian order, and with 32-bit-wide samples. The sink pad, on the other hand, simply accepts a Vorbis-encoded audio stream.

These templates are crucial for the pipeline to function properly. Whenever you attempt to link two elements together to form a pipeline, GStreamer checks to see if their pads' templates are compatible. This process is called negotiation. During negotiation, elements try to come up with the best possible format that they both support. If there is none, linking fails. Otherwise, they agree on a common format. That format is no longer a template, but something called fixed caps -- meaning all values are concrete and unambiguous. The data can then pass from one to the other.
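
Negotiation can be pictured as intersecting two templates and then collapsing whatever ranges remain. The sketch below is a simplified model under stated assumptions: caps are a media type plus integer fields, each field is either a fixed value or an inclusive range, and fixate arbitrarily picks the highest allowed value. The functions intersect and fixate and the downstream template are invented for illustration. GStreamer's real negotiation is considerably richer.

```python
from typing import Dict, Optional, Tuple, Union

Value = Union[int, Tuple[int, int]]   # a fixed value or an inclusive range
Caps = Tuple[str, Dict[str, Value]]   # (media type, fields)


def _as_range(value: Value) -> Tuple[int, int]:
    """Treat a fixed value as a range of width zero."""
    return value if isinstance(value, tuple) else (value, value)


def intersect(a: Caps, b: Caps) -> Optional[Caps]:
    """Return the caps both sides accept, or None if linking must fail."""
    if a[0] != b[0]:
        return None
    fields: Dict[str, Value] = {}
    for key in sorted(set(a[1]) | set(b[1])):
        if key in a[1] and key in b[1]:
            lo = max(_as_range(a[1][key])[0], _as_range(b[1][key])[0])
            hi = min(_as_range(a[1][key])[1], _as_range(b[1][key])[1])
            if lo > hi:
                return None
            fields[key] = lo if lo == hi else (lo, hi)
        else:
            # A field only one side mentions is unconstrained on the other.
            fields[key] = a[1].get(key, b[1].get(key))
    return (a[0], fields)


def fixate(caps: Caps) -> Caps:
    """Collapse remaining ranges to one value (arbitrarily, the highest)."""
    return (caps[0], {key: _as_range(value)[1] for key, value in caps[1].items()})


# The vorbisdec source template from Listing 1.
vorbisdec_src: Caps = ("audio/x-raw-float", {
    "rate": (8000, 50000), "channels": (1, 6), "endianness": 1234, "width": 32,
})
# A hypothetical downstream sink template, made up for this example.
downstream_sink: Caps = ("audio/x-raw-float", {
    "rate": (44100, 48000), "channels": (1, 2), "width": 32,
})

agreed = intersect(vorbisdec_src, downstream_sink)
fixed = fixate(agreed) if agreed is not None else None
print(fixed)
print(intersect(vorbisdec_src, ("audio/x-vorbis", {})))  # incompatible types
```

Every field in the fixed result is a single concrete value, which is what distinguishes fixed caps from a template.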

Now you know what you need to get started. For that, I'll introduce the Swiss Army knife of GStreamer, the gst-launch tool.


Using gst-launch

gst-launch is one of the most versatile tools you'll come across. It is to GStreamer what the shell is to UNIX. With it, you can construct even complex pipelines using a special syntax, appropriately called gst-launch syntax, as shown in Listing 2.


Listing 2. An example of a gst-launch line
$ gst-launch-0.10 filesrc location="concept.mp3" ! decodebin ! alsasink

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: audioclock0

Listing 2 is one of the simplest possible audio players. Here, I'm using it to listen to concept.mp3. Table 1 explains how to read it from left to right.


Table 1. Element descriptions for the syntax shown in Listing 2

  • gst-launch-0.10: This is the name of the command. The -0.10 indicates that a version specific to GStreamer 0.10 should be used, in case the older 0.8 release is also installed.
  • filesrc location="concept.mp3": This creates an element of class filesrc and sets its location property to concept.mp3. Because the filesrc element can read a file specified by location, this creates a reader for the concept.mp3 file in the current directory.
  • !: The exclamation mark means link to. Similar to shell's pipe symbol (|), it was chosen for its visual similarity and the fact that it doesn't have to be escaped when written in the shell, as long as there are spaces around it.
  • decodebin: This is an autoplugger provided by GStreamer. An autoplugger is an element that, given a data type on its input and output, uses other available elements to find a sub-pipeline that provides the requested results. Remember that all links in GStreamer must be typed, thus the exclamation mark (!) implicitly carries the type information of the elements it links. Because filesrc has caps of type ANY, decodebin first attempts to typefind the stream. That is, it looks for characteristic marks that indicate the type. All this is hidden from the user.
  • alsasink: This is the correct element to use for audio output on my Linux® system. It talks to the soundcard and feeds it with raw audio samples. It also times the whole pipeline, because the soundcard has a natural rate at which it can consume data.

When I press Enter, it prints several status messages until the pipeline reaches the PLAYING state. Then, the data starts flowing, and I hear the sound, as timed by my soundcard (audioclock0).
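
To make the structure of such a line concrete, here is a minimal sketch that splits a simple description into element names and properties. It handles only the subset used in Listing 2 (elements, prop=value pairs, and ! links) and is not how gst-launch itself parses its input. The function name parse_launch_line is made up for this example.

```python
import shlex
from typing import Dict, List, Tuple


def parse_launch_line(line: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split 'elem prop=value ! elem ! elem' into (name, properties) pairs."""
    elements: List[Tuple[str, Dict[str, str]]] = []
    current: List[str] = []
    # A trailing "!" is appended so the last element is flushed like the others.
    for token in shlex.split(line) + ["!"]:
        if token != "!":
            current.append(token)
            continue
        if not current:
            raise ValueError("empty element between links")
        name = current[0]
        props: Dict[str, str] = {}
        for assignment in current[1:]:
            key, sep, value = assignment.partition("=")
            if not sep:
                raise ValueError(f"expected prop=value, got {assignment!r}")
            props[key] = value
        elements.append((name, props))
        current = []
    return elements


parsed = parse_launch_line('filesrc location="concept.mp3" ! decodebin ! alsasink')
for name, props in parsed:
    print(name, props)
```

The quotes around concept.mp3 are consumed by shlex in the same way the shell would consume them, so the property value arrives without them.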

As you can see, GStreamer saves you a lot of work. You don't even need to know what type of media you're attempting to decode. Remember that, just as shell can't replace all your C programs, the gst-launch tool can't replace a full GStreamer application. For instance, gst-launch doesn't let you control the pipeline in any way after it's launched, so you can't skip parts of the stream. Nonetheless, it's still incredibly useful -- particularly for quick jobs, such as recoding an audio file to another format or simply experimenting with pipelines.


Going deeper

Get to know your tools

In addition to gst-launch, GStreamer is distributed with some other tools, such as gst-inspect and gst-typefind. Use them; they're your best friends.

Whenever in doubt about how to use an element, use gst-inspect. Given the name of any element or plug-in, it prints all the information GStreamer has about it, which is a lot.

Using gst-typefind, which is the GStreamer version of the good old file(1) UNIX utility, you can find out the file type (or to be precise, what GStreamer believes it to be).
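
The idea behind typefinding, looking for characteristic marks at the start of a stream, can be sketched as follows. This is a deliberately tiny illustration covering a few well-known signatures. The typefind function and the choice of returned type names are assumptions for this example, and GStreamer's own typefinders check far more formats with more care.

```python
from typing import Optional


def typefind(head: bytes) -> Optional[str]:
    """Guess a media type from the first bytes of a stream, or return None."""
    if head.startswith(b"OggS"):
        return "application/ogg"       # Ogg page capture pattern
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "video/x-msvideo"       # RIFF container holding AVI
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/x-wav"           # RIFF container holding WAVE
    if head.startswith(b"ID3"):
        return "audio/mpeg"            # ID3v2 tag, which commonly precedes MP3 audio
    return None                        # unknown: no signature matched


print(typefind(b"OggS\x00\x02"))
print(typefind(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(typefind(b"plain text"))
```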

This article provides just a teaser of what you can do with GStreamer. Obviously, creating an audio player using a simple shell command is cool. However, it's a rather poor player with no user interface or controls. To add those items and much more, you do need to use some code. Even so, GStreamer's API is simple and well thought out. And, if you don't fancy C, you can choose from several other bindings, including a vigilantly maintained set of Python language bindings.

Read the gst-launch man page. The full syntax is richer than what is shown here, and you can use it to create much more complex and interesting pipelines -- including the ones you create from your code. Yes, you can even have your own gst-launch (check out the gst_parse_launch() function documentation to see how).

Also, join the mailing list and drop by the IRC channel (#gstreamer@irc.freenode.net). GStreamer developers are a lively bunch, and there is always someone to help -- or be helped by -- you.


Resources

Get products and technologies

  • GStreamer homepage: Visit this site for the latest information and downloads.

  • Istanbul: Istanbul is a desktop session recorder that uses GStreamer.

  • IBM trial software: Build your next development project with IBM software, available for download directly from developerWorks.
