Digital media processing in the cloud using IBM Business Process Manager

The purpose of this article is to describe some of the challenges inherent in building a cloud-based digital media processing factory, and how those challenges can be addressed using IBM® Business Process Manager and related products. This content is part of the IBM WebSphere Developer Technical Journal.


Paul Ilechko, Senior Solutions Architect, IBM

Paul Ilechko is a Senior Solutions Architect with IBM Software Services for WebSphere. Mr. Ilechko has over 25 years of experience in the IT industry, including a background in both mainframe and distributed technologies. He has been involved with WebSphere and J2EE technology almost since their inception. His primary goal is to help IBM clients be successful with these products. Mr. Ilechko has a B.Sc. in Mathematics from the University of London.

developerWorks Contributing author

Werner Vanzyl, Senior Managing Consultant, IBM

Werner Vanzyl is a Certified IT Specialist with IBM Software Services for WebSphere. He has over 15 years of experience working with a variety of software products, including IBM software and other development platforms. Currently, he consults on a daily basis with IBM customers in the areas of business process and workflow development, implementation, and deployment.

26 February 2014



While we were working on a project for a major communications company, it quickly became clear that building a cloud-based digital media processing factory posed some inherent challenges. The goal of the project was to create a fully automated digital media processing utility for video on demand that could support:

  • Automated media ingestion, processing, and delivery.
  • All the critical processing steps, including transcoding, digital rights management (DRM), quality control, asset registration, and packaging.
  • Delivery of content to "edge" servers for highly available distributed access across all relevant consumer formats.


This article describes some of these challenges and how they can be addressed using IBM Business Process Manager and related products.

About media processing

Digital media files are typically created by film or television studios or similar content creators. Video can be converted from film, or it might be shot using a digital camcorder. In the latter case, the camcorder converts images from analog to digital as part of the filming process and stores them in an encoded format. The captured file is then transferred from the camera to external storage. During this copy, the content could also be transcoded; that is, converted to a different format. The new format might use a different codec that is compatible with the editing tools being used, or it might change the quality, for example to a more compressed format with a lower data rate or a different resolution.

Various editing steps can then take place, producing the final output of the editing process, typically known as a mezzanine file. A mezzanine file is a mid-resolution transcode of the original video asset that is of sufficient quality to produce all subsequent output files, but is compressed enough to be practical to move, copy, and archive. Typical mezzanine formats are MPEG-2, MPEG-4, and QuickTime H.264. A mezzanine file is still fairly large, and it is not in a form that is suitable for streaming over the Internet. That's where the factory comes into play.

In order to produce a streamable version of the file, several additional processes need to be performed in the digital media processing factory. Depending on the source of the media, the target audience, and the delivery mechanism, these can include some or all of the following (Figure 1):

Figure 1. Media Processing Components


  • Ingest

    Content is ingested into the factory using some type of file transfer protocol. There are tools specific to the media industry (such as Aspera and Signiant) that are designed for high-speed transfer of video files, as well as standard protocols such as SFTP and rsync. The upload typically consists of a package of files: the video asset itself, metadata that describes the asset, and possibly other related files, such as a .jpg image of the movie poster or box cover and a trailer or preview of the movie. The job that performs all subsequent processing takes this package of assets as its input.

  • Asset management

    Cataloging the content in an asset management system gives the system a way to track which assets have been ingested and processed, and where the output "renditions" of each asset have been delivered. An asset management system is essentially a database with a set of supporting services.

  • Metadata processing

    Media is only as useful as the information that describes it. Metadata has three functions in media processing:

    • Classify the assets as something that can be acted upon; for example, a movie for transcoding, an image for resizing, or the metadata itself.
    • Describe the technical characteristics of the assets. These characteristics can range from something simple, such as file size or checksum, to more complex characteristics, such as encryption information.
    • Describe the commercial characteristics of the media. This can be information about the title (for example, a description of the movie, actors, and directors), but it can also include rights information, such as the license window during which the movie can be purchased and the price of the movie.

    Metadata processing is important not only for ensuring that the correct files are processed and for classifying the media, but also for identifying the media when it is published. The metadata is often used as input to the media catalog on the storefront.

  • Transcoding

    A mezzanine file, which is the typical upload format to the factory, is too large and unwieldy for media delivery, so it must be converted to the appropriate format (or formats) for consumption. There are two main classes of delivery formats: adaptive bit rate (ABR) and progressive download (PDL). PDL is the typical HTTP-based single-file delivery, in which the media starts playing while the file is still being downloaded. If you have watched a YouTube video, then you have seen this model: the shaded bar in the player window shows how far ahead of your playback the file has been buffered. The problem is that if the download is not fast enough, playback can catch up with the buffered content and stall. ABR addresses this issue by splitting the video into small chunks. In addition, chunks can be created at multiple bit rates (that is, different video qualities), so that the client can automatically switch to a higher or lower quality depending on the available network bandwidth. This is particularly important given how widely wireless devices, such as tablets and phones, are now used for watching video.

  • Encryption or Digital Rights Management

    Much of the video content being processed is intended for a restricted audience, such as users who subscribe to a service or who have purchased the right to view that specific content. Supporting this requirement calls for integration with a DRM scheme. Some of the more popular ones are SecureMedia from Arris, Microsoft® PlayReady®, and Google Widevine.

  • Advertising insertion

    One common model for digital content is the ad-supported "free" content model. In this scenario, the user does not need to purchase or license the content, but advertisements are streamed at certain predefined locations within the asset. Setting up the content so that ads stream in the right places requires extra steps in the media processing flow. The metadata for the media must carry additional information that defines the ad breaks. This information can be passed in several ways; for example, as ADI 3.0 signal point metadata or as a standalone edit decision list (EDL) file. A stream conditioning step uses this information to examine the transcoded output and ensure that there is a chunk boundary at every point where an ad should be inserted. If there is not, the stream conditioner splits the existing chunk at that point and updates the manifest file that indexes the chunks. When the player later plays the movie, the manifest indicates when the ad service should be called for a suitable commercial at each insertion point. From the viewer's perspective, the commercial is part of the video stream and the transition is seamless.

  • Quality control

    Because the factory is intended to be fully automated, quality control (QC) is essential. Both automated and manual QC steps should be built into the processing flow. Automated QC tools such as Interra's Baton can use a test plan to verify that media content meets defined quality criteria. If the automated tool finds a problem, a human quality expert might need to review the media file to determine the cause.

  • Packaging and distribution

    Once media processing is complete, the media file needs to be delivered to a location from which it can be viewed; typically, this means placing the file on the origin server of a content delivery network. In addition, the metadata describing the content might need to be delivered to a storefront so that the content can be made available to users for purchase, subscription, or rental. Other files, such as images or trailers, might also need to be delivered. The packaging process might need to rename the output files from the media processing workflow, or place them in the folder structures required by the delivery mechanism. Some delivery channels (such as YouTube or iTunes) have very specific format requirements.
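The stream conditioning step described under Advertising insertion can be sketched as follows. This is a minimal illustration under stated assumptions: the manifest is modeled only as an ordered list of chunk durations in seconds, and the ad breaks arrive as offsets from the start of the asset (for example, already parsed from ADI 3.0 signal points or an EDL). The function name condition_for_ad_breaks is hypothetical, and a real stream conditioner would also have to re-segment the media data itself at a suitable frame, not just adjust the chunk timeline.

```python
from typing import List


def condition_for_ad_breaks(chunk_durations: List[float],
                            ad_breaks: List[float],
                            tolerance: float = 1e-6) -> List[float]:
    """Return new chunk durations with a chunk boundary at every ad break.

    chunk_durations: durations in seconds of consecutive chunks, in manifest order.
    ad_breaks: offsets in seconds from the start of the asset where ads belong.
    Breaks that already fall on a boundary leave the chunk list unchanged.
    """
    breaks = sorted(ad_breaks)
    result: List[float] = []
    start = 0.0  # offset of the current chunk's start
    i = 0        # index of the next ad break not yet handled
    for duration in chunk_durations:
        end = start + duration
        # Breaks at (or before) this chunk's start already sit on a boundary.
        while i < len(breaks) and breaks[i] <= start + tolerance:
            i += 1
        piece_start = start
        # Split the chunk at every break that falls strictly inside it.
        while i < len(breaks) and breaks[i] < end - tolerance:
            result.append(breaks[i] - piece_start)
            piece_start = breaks[i]
            i += 1
        result.append(end - piece_start)
        start = end
    return result


# Three 6-second chunks; ads at 6 s (already a boundary) and 9 s (mid-chunk).
print(condition_for_ad_breaks([6.0, 6.0, 6.0], [6.0, 9.0]))  # prints [6.0, 3.0, 3.0, 6.0]
```

The total duration is preserved; only the chunk containing the 9-second break is split, which is the condition the player later relies on when the manifest signals an ad insertion point.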

Technical challenges

As mentioned at the outset, delivering the solution involved a number of challenges arising from both the functional and the non-functional requirements of the project. On the functional side, the requirements included the ability to create customized workflows for individual customers and to onboard those customers to the platform with a minimum of effort. The non-functional requirements centered on creating a platform with "carrier-level" reliability and availability. Just as a dial tone is guaranteed to be there when you pick up a traditional telephone, and your communications are never lost, the system should always be available, and once a file has been uploaded it should never be necessary to ask the customer to send it again.

Consistency and availability

The requirement for a completely available system makes it necessary to partition processing across multiple physical sites. While it is possible to build a highly available system in a single location, disaster recovery remains an issue. Because carrier-level availability is a system commitment, the disaster site cannot be passive, since bringing a passive site online takes time; an active-active datacenter configuration is therefore necessary.

IBM Business Process Manager does not handle a single cell spanning geographically dispersed datacenters well. As a result, there are multiple Business Process Manager cells active across the different locations, each with its own BPM database, and the complete operational view of the system is the sum of the activity in all of the datacenters. The challenge was how to provide all users (both operators and customers) with a consistent view of that overall system state, or at least of the part of it they were entitled to view.

The CAP theorem tells us that a distributed computer system cannot simultaneously guarantee all three of consistency, availability, and partition tolerance. In our case, partition tolerance is mandatory: the system is engineered to run in multiple locations, and it must not fail if one location is down. Availability is also a critical requirement, which rules out a single store of operational state that is shared by all sites. The only option was therefore to build multiple operational activity stores that are allowed to be temporarily inconsistent but that converge toward a consistent view. This was achieved by deploying a separate instance of IBM Business Monitor at each location and feeding the common event infrastructure (CEI) events from each site to both the local and the remote Business Monitor. This typically provides a consistent and complete view, except when the cross-cell queues are backed up or communication between cells is unavailable. A user viewing system status would see the view from one site or the other, chosen at random.
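The dual-feed arrangement can be illustrated with a small simulation of the eventual-consistency idea. This is a sketch under assumptions, not a model of IBM Business Monitor or CEI: every event is assumed to carry a globally unique ID and a logical timestamp, each site's monitor applies events idempotently and keeps the newest status per job, local events are applied immediately, and remote events wait in a cross-cell queue (here just a list) until it drains. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    event_id: str    # globally unique, so redelivered events can be ignored
    job_id: str
    status: str
    timestamp: int   # logical clock; ties broken by event_id


@dataclass
class SiteMonitor:
    name: str
    seen: Set[str] = field(default_factory=set)
    latest: Dict[str, Event] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        if event.event_id in self.seen:
            return  # idempotent: duplicates change nothing
        self.seen.add(event.event_id)
        current = self.latest.get(event.job_id)
        # Keep the newest event per job, whatever order events arrive in.
        if current is None or (event.timestamp, event.event_id) > (current.timestamp, current.event_id):
            self.latest[event.job_id] = event

    def view(self) -> Dict[str, str]:
        return {job: ev.status for job, ev in self.latest.items()}


def emit(event: Event, local: SiteMonitor, to_remote: List[Event]) -> None:
    """Apply an event locally at once and queue it for the remote site's monitor."""
    local.apply(event)
    to_remote.append(event)


def drain(queue: List[Event], remote: SiteMonitor) -> None:
    """Simulate the cross-cell queue catching up."""
    while queue:
        remote.apply(queue.pop(0))


east, west = SiteMonitor("east"), SiteMonitor("west")
to_west: List[Event] = []
to_east: List[Event] = []
emit(Event("e1", "job-1", "TRANSCODING", 1), east, to_west)
emit(Event("w1", "job-2", "QC", 1), west, to_east)
emit(Event("e2", "job-1", "DRM", 2), east, to_west)
print(east.view(), west.view())    # temporarily inconsistent
drain(to_west, west)
drain(to_east, east)
print(east.view() == west.view())  # prints True once both queues have drained
```

Because applying an event is idempotent and order-independent, both monitors end in the same state no matter how long the cross-cell queues were backed up; that property is what lets the stores be temporarily inconsistent without diverging permanently.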

The system also creates human tasks in certain error cases. Human tasks are maintained within each cell's BPM database, which does not allow the kind of cross-site merging that the Business Monitor solution provides. Instead, task data from all cells must be retrieved and federated into a consistent user view on request.
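Federating human tasks on request might look something like the following sketch. It assumes that each cell is reachable through a hypothetical query callable returning that cell's open tasks (in practice this would wrap whatever task query interface the cells expose; no specific BPM API is implied). The cells are queried in parallel, the result is filtered to what the requesting customer is entitled to see, and an unreachable cell is reported rather than allowed to fail the whole view.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    task_id: str
    cell: str
    customer: str
    subject: str


# Hypothetical: each cell is represented by a callable that returns its open tasks.
CellQuery = Callable[[], List[Task]]


def federated_tasks(cells: Dict[str, CellQuery],
                    customer: str) -> Tuple[List[Task], List[str]]:
    """Merge open tasks from every cell into one view for one customer.

    Returns (visible tasks, names of cells that could not be queried), so an
    outage at one site degrades the view instead of failing it.
    """
    tasks: List[Task] = []
    unavailable: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        futures = {name: pool.submit(query) for name, query in cells.items()}
        for name, future in futures.items():
            try:
                cell_tasks = future.result(timeout=5)
            except Exception:  # includes timeouts and connection errors
                unavailable.append(name)
                continue
            # Entitlement filter: customers see only their own tasks.
            tasks.extend(t for t in cell_tasks if t.customer == customer)
    tasks.sort(key=lambda t: (t.cell, t.task_id))
    return tasks, unavailable


def east_cell() -> List[Task]:
    return [Task("t1", "east", "acme", "QC failed"), Task("t2", "east", "other", "DRM error")]


def west_cell() -> List[Task]:
    raise ConnectionError("cell unreachable")


visible, down = federated_tasks({"east": east_cell, "west": west_cell}, "acme")
print([t.task_id for t in visible], down)  # prints ['t1'] ['west']
```

Note that a timeout here only stops waiting for a result; the executor still waits for the slow query to finish when the with block exits, so a production version would need cancellable or time-bounded calls to each cell.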

Content ingestion and processing

To meet the requirement that customers should never have to re-upload content, the content must be copied automatically to more than one location. Then, if the datacenter that is processing a specific piece of media content fails, the processing job can be restarted at another location where the media is stored. The customer who is providing content should not be told that the upload is complete until multiple copies have been made successfully. In practice, this means that duplication needs to be an inherent part of the file copy from the customer. This is relatively simple to achieve for customers who are willing and able to use the Aspera transport, which has built-in "splitter" capabilities for sending to multiple locations. It is more complex when the customer wants to use a less "intelligent" transport such as secure FTP; in that case, a secondary Aspera copy needs to be made (Figure 2).

Figure 2. Consistency and availability

One alternative to Aspera-based copying, which was not investigated in detail for this project, would be a parallel file system technology such as IBM General Parallel File System (GPFS).
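Whatever the copy mechanism, the rule that a customer is not told the upload is complete until multiple copies exist can be expressed as a small gate. This sketch assumes hypothetical per-site copy callables that store the bytes and return a SHA-256 digest of what was actually written (standing in for an Aspera transfer or any other transport); the upload is acknowledged only when at least required_copies sites return a digest matching the source.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical transfer hook: stores the bytes at one site and returns the
# SHA-256 hex digest computed over what was actually written there.
CopyFn = Callable[[bytes], str]


def replicate_then_ack(data: bytes, sites: Dict[str, CopyFn],
                       required_copies: int = 2) -> Dict[str, object]:
    """Copy an upload to every site; acknowledge only if enough copies verify."""
    expected = hashlib.sha256(data).hexdigest()
    confirmed: List[str] = []
    for name, copy_to_site in sites.items():
        try:
            if copy_to_site(data) == expected:
                confirmed.append(name)  # only a verified copy counts
        except OSError:
            pass  # site unreachable; the remaining sites may still suffice
    return {"acknowledged": len(confirmed) >= required_copies,
            "confirmed_sites": confirmed}


def healthy(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def unreachable(data: bytes) -> str:
    raise OSError("link down")


print(replicate_then_ack(b"mezzanine", {"east": healthy, "west": unreachable}))
# prints {'acknowledged': False, 'confirmed_sites': ['east']}
```

The copies are made sequentially here for clarity; a splitter-style transport would send to all sites concurrently, but the acknowledgment rule stays the same.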

Once the file has been saved in more than one location, it needs to be processed, which requires a cross-cell decision-making process to determine where to run the processing job. This was implemented as a custom solution for the client. The resulting component used a pluggable algorithm to decide where to execute each job. It also maintained a database in each location that could be used in a site disaster scenario to restart in-progress jobs at another location.
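The client's placement algorithm is not described here, so the following is only a sketch of the shape such a component might take. It assumes a placement strategy passed in as a plain function (a hypothetical least-loaded rule restricted to live sites that hold a copy of the media) and a job registry replicated to every live site, so that a surviving site can use its own copy to restart the jobs that were running at a failed site. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Site:
    name: str
    up: bool = True
    active_jobs: int = 0
    media: Set[str] = field(default_factory=set)  # media IDs replicated here
    # Local copy of the job registry: job_id -> (media_id, assigned site).
    # Removing finished jobs from the registry is omitted from this sketch.
    job_db: Dict[str, Tuple[str, str]] = field(default_factory=dict)


# A placement strategy is any function that chooses a site for a media item.
Strategy = Callable[[str, List[Site]], Site]


def least_loaded(media_id: str, sites: List[Site]) -> Site:
    """Example strategy: the least busy live site that holds a copy of the media."""
    candidates = [s for s in sites if s.up and media_id in s.media]
    if not candidates:
        raise RuntimeError(f"no live site holds {media_id}")
    return min(candidates, key=lambda s: (s.active_jobs, s.name))


def dispatch(job_id: str, media_id: str, sites: List[Site], strategy: Strategy) -> Site:
    target = strategy(media_id, sites)
    target.active_jobs += 1
    for s in sites:
        if s.up:
            s.job_db[job_id] = (media_id, target.name)  # record at every live site
    return target


def fail_over(failed: str, survivor: Site, sites: List[Site], strategy: Strategy) -> Dict[str, str]:
    """Use the survivor's own registry to restart jobs that were running at `failed`."""
    for s in sites:
        if s.name == failed:
            s.up, s.active_jobs = False, 0
    moved: Dict[str, str] = {}
    for job_id, (media_id, assigned) in list(survivor.job_db.items()):
        if assigned == failed:
            moved[job_id] = dispatch(job_id, media_id, sites, strategy).name
    return moved


east = Site("east", media={"m1", "m2"})
west = Site("west", media={"m1", "m2"})
sites = [east, west]
dispatch("j1", "m1", sites, least_loaded)  # tie on load, so "east" wins by name
dispatch("j2", "m2", sites, least_loaded)  # "west" is now less busy
print(fail_over("east", west, sites, least_loaded))  # prints {'j1': 'west'}
```

Because the strategy is just a parameter, a different rule (for example, data locality or licensing of a particular transcoder at a site) could be swapped in without touching dispatch or fail-over.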

The workflow process

The complexity of the solution stems from the fact that, although the steps that can be part of the process are well known, the specific steps needed for any particular customer vary. The process therefore has to be easy to customize as part of onboarding each customer onto the platform.

This was achieved by building common reusable components, or work units, as Business Process Execution Language (BPEL) processes. These processes encapsulate the complexity of actually carrying out each activity and hide it from the workflow. A simple advanced integration service is used to package these work unit interfaces as toolkit items that can be added to the palette in IBM Integration Designer for reuse in any flow.

Each work unit fulfills one generic system function, such as file ingest, transcode, DRM, and so on. Because the function can be fulfilled by any number of network elements, each with its own strengths, the generic work unit calls an adapter that is specific to the product being used. Each adapter exposes the same common interface to the work unit, but supports the specific API made available by the network element that it uses.
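In the project, work units and adapters were BPEL processes, not hand-written code. Purely to illustrate the shape of the design, here is the same idea in Python: a generic transcode work unit that depends only on a common adapter interface, and one adapter per product that translates that interface into the product's own conventions. The two vendor products, their preset names, and all class and method names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class TranscodeAdapter(ABC):
    """Common interface every transcoder adapter exposes to the work unit."""

    @abstractmethod
    def transcode(self, source: str, preset: str) -> str:
        """Transcode `source` with `preset`; return the output location."""


class VendorAAdapter(TranscodeAdapter):
    # Hypothetical product whose API takes a job description dictionary.
    def transcode(self, source: str, preset: str) -> str:
        job = {"input": source, "profile": preset}
        return f"vendorA://out/{job['profile']}/{source.rsplit('/', 1)[-1]}"


class VendorBAdapter(TranscodeAdapter):
    # Hypothetical product that uses its own names for presets.
    PRESET_NAMES = {"hls-1080p": "HLS_FULLHD"}

    def transcode(self, source: str, preset: str) -> str:
        native = self.PRESET_NAMES.get(preset, preset)
        return f"vendorB://{native}/{source.rsplit('/', 1)[-1]}"


class TranscodeWorkUnit:
    """Generic work unit: selects the configured adapter; contains no product logic."""

    def __init__(self, adapters: Dict[str, TranscodeAdapter]):
        self.adapters = adapters

    def run(self, source: str, preset: str, product: str) -> str:
        return self.adapters[product].transcode(source, preset)


unit = TranscodeWorkUnit({"A": VendorAAdapter(), "B": VendorBAdapter()})
print(unit.run("/landing/movie.mxf", "hls-1080p", "B"))  # prints vendorB://HLS_FULLHD/movie.mxf
```

Adding support for a new transcoder then means writing one more adapter; the work unit, and every workflow that uses it, stays unchanged.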

The adapters are developed as BPEL processes, leveraging the multi-transport support and quality of service capabilities of Service Component Architecture (SCA) to implement a number of integration patterns. These interactions range from simple request/response to more complex request/notify implementations. Fault handling is extremely important, because the errors returned by the underlying products range from generic exceptions to detailed errors that are actually meaningful to a user.
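Fault handling in an adapter largely amounts to mapping each product's errors onto a small set of faults that the workflow understands. The sketch below is not the SCA fault model; it assumes hypothetical vendor error codes and a hypothetical canonical fault carrying a category, a retryable flag, a user-facing message, and the raw product detail kept for diagnosis. Unknown codes fall back to a generic internal fault.

```python
from typing import Dict, Tuple


class WorkUnitFault(Exception):
    """Canonical fault returned to the workflow, whatever product raised it."""

    def __init__(self, category: str, retryable: bool, message: str, detail: str):
        super().__init__(message)
        self.category = category    # e.g. "TRANSIENT", "BAD_INPUT", "INTERNAL"
        self.retryable = retryable  # whether the workflow may retry automatically
        self.message = message      # meaningful to an operator or customer
        self.detail = detail        # raw product error, kept for diagnosis


# Hypothetical per-product error tables: code -> (category, retryable, message).
ERROR_MAP: Dict[str, Dict[str, Tuple[str, bool, str]]] = {
    "vendorA": {
        "E503": ("TRANSIENT", True, "Transcoder busy; the job will be retried."),
        "E415": ("BAD_INPUT", False, "The source file uses an unsupported codec."),
    },
}

FALLBACK = ("INTERNAL", False, "Unexpected error from the media processing system.")


def normalize(product: str, code: str, raw_message: str) -> WorkUnitFault:
    """Translate a product-specific error into the canonical fault."""
    category, retryable, message = ERROR_MAP.get(product, {}).get(code, FALLBACK)
    return WorkUnitFault(category, retryable, message, f"{product} {code}: {raw_message}")


fault = normalize("vendorA", "E415", "codec prores not supported")
print(fault.category, fault.retryable, fault)
# prints BAD_INPUT False The source file uses an unsupported codec.
```

Separating the retryable flag from the message lets the workflow decide automatically between retrying and creating a human task, while the customer sees a message that means something to them.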


Adding a new customer to the system involves creating and customizing workflows to meet that customer's specific requirements. This process involves several steps:

  1. Set up a communication channel with the customer for file transfer, and provision access to the landing pad locations.
  2. Build a workflow process with the appropriate steps.
  3. Customize the workflow profile. This is an XML file that includes all the detailed parameters for the customer, such as file locations, transcoder presets, notification options, distribution locations, and so on.
  4. Build XSLT maps to transform customer metadata into the canonical format, based on ADI 3.0.
  5. Define any packaging rules needed for the output media or metadata content.
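The workflow profile in step 3 is an XML file, but its schema is not given here, so every element and attribute name below is invented for illustration. This sketch shows how such a profile might be loaded with Python's standard xml.etree.ElementTree module and checked for required parameters, so that an incomplete onboarding is caught before the first job runs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical profile layout; not the project's actual schema.
SAMPLE_PROFILE = """<workflowProfile customer="acme">
  <landingPad>/landing/acme</landingPad>
  <transcode>
    <preset>hls-1080p</preset>
    <preset>hls-720p</preset>
  </transcode>
  <notify email="ops@example.com"/>
  <distribution target="cdn-origin" path="/origin/acme"/>
</workflowProfile>"""

# Parameters a workflow cannot run without (paths relative to the root element).
REQUIRED = ("landingPad", "transcode/preset", "distribution")


def load_profile(xml_text: str) -> Dict[str, object]:
    """Parse a customer workflow profile and fail fast if required parts are missing."""
    root = ET.fromstring(xml_text)
    missing: List[str] = [path for path in REQUIRED if root.find(path) is None]
    if missing:
        raise ValueError(f"profile for {root.get('customer')} is missing {missing}")
    return {
        "customer": root.get("customer"),
        "landing_pad": (root.findtext("landingPad") or "").strip(),
        "presets": [(p.text or "").strip() for p in root.findall("transcode/preset")],
        "notify": [n.get("email") for n in root.findall("notify")],
        "distribution": [(d.get("target"), d.get("path")) for d in root.findall("distribution")],
    }


profile = load_profile(SAMPLE_PROFILE)
print(profile["presets"])  # prints ['hls-1080p', 'hls-720p']
```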

Leveraging Smarter Process products

Several products from IBM's Smarter Process portfolio were used to great advantage in the execution of this project. Some of these key software elements are listed below and shown in Figure 3.

Figure 3. Smarter Process components
  • IBM Business Process Manager for defining the workflow for each customer's specific needs. Each activity in the process calls a work unit that implements a specific action on the media. The workflow uses an advanced integration service (AIS) to call a work unit, and the AIS routes the call based on a service version passed into it from the workflow.
  • Advanced integration services in the form of BPEL processes and SCA to implement the work units and adapter logic. The transport and transaction quality of service capabilities of SCA and BPEL make this an obvious choice.
  • IBM Integration Bus implements service routing based on IBM WebSphere Service Registry and Repository, which is used for service governance between the adapters and the applications that perform the work on the media.
  • IBM Operational Decision Manager implements business rules, configuration, and data transformation with run time management capabilities.
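The version-based routing performed by the AIS can be illustrated with a minimal sketch. It assumes a hypothetical in-memory registry keyed by (work unit, version), with the call routed to the newest registered version when the workflow does not pin one; in the project itself this routing was done by the AIS together with IBM Integration Bus and the service registry, and the names here are invented.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[dict], dict]
_REGISTRY: Dict[Tuple[str, int], Handler] = {}


def register(work_unit: str, version: int) -> Callable[[Handler], Handler]:
    """Decorator that records a handler for one version of a work unit."""
    def decorator(fn: Handler) -> Handler:
        _REGISTRY[(work_unit, version)] = fn
        return fn
    return decorator


def route(work_unit: str, request: dict, version: Optional[int] = None) -> dict:
    """Call the requested version, or the latest registered one if none is given."""
    if version is None:
        versions = [v for (name, v) in _REGISTRY if name == work_unit]
        if not versions:
            raise LookupError(f"no service registered for {work_unit}")
        version = max(versions)
    try:
        handler = _REGISTRY[(work_unit, version)]
    except KeyError:
        raise LookupError(f"{work_unit} v{version} is not registered") from None
    return handler(request)


@register("drm", 1)
def drm_v1(req: dict) -> dict:
    return {"asset": req["asset"], "scheme": "legacy"}


@register("drm", 2)
def drm_v2(req: dict) -> dict:
    return {"asset": req["asset"], "scheme": req.get("scheme", "widevine")}


print(route("drm", {"asset": "m1"}, version=1))  # prints {'asset': 'm1', 'scheme': 'legacy'}
```

Pinning a version in the workflow lets existing customer workflows keep calling version 1 of a work unit while newly onboarded customers pick up version 2, so a work unit can evolve without forcing every customer's workflow to change at once.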


IBM Business Process Manager provides many of the features needed to build an effective cloud-based media processing factory. However, you should be prepared to address some complex technical challenges along the way: certain capabilities that you are likely to need might not be available out of the box and will require custom development. We hope that the experience from this project, presented here, will set you off in the right direction.


