Thinking XML: Manage metadata with MusicBrainz

Digital media metadatabase uses RDF

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  Since its emergence in the mid-1980s, digital music has seen plenty of controversy, and even the management of digital music metadata has been subject to its own share of drama. But sometimes out of political dust-ups, good technical solutions emerge. In this article, Uche Ogbuji introduces MusicBrainz, a project for managing digital media metadata. MusicBrainz uses RDF in its core data formats and, in so doing, offers some important technical advantages over its predecessors.

Date:  01 Dec 2002
Level:  Introductory


Digital music has continued to be one of the big stories of the information age, important both for the convenience it affords music lovers and for the business opportunities it has opened up for high-tech companies. You can put dozens or hundreds of albums in digital storage and catalog this music in any way you like. Since so much music these days is sold in the form of CDs, countless tools exist for gathering information on the artists and tracks to be maintained, or tagged, in the resulting digital formats (MP3, Ogg Vorbis, and so forth). This information is the common metadata of digital music.

In the early '90s, the Internet Compact Disc Database (CDDB) was born as a distributed database that matched CD characteristics to metadata. It grew rapidly through the efforts of many casual users, who contributed information on their CDs on the assumption that the CDDB system and software were open and free. In a very controversial move, a commercial interest now known as Gracenote imposed licensing restrictions on CDDB, prompting the development of several truly open alternatives. The most prominent of these initiatives are freedb.org and MusicBrainz. The former continues to use the CDDB format for its database, whereas MusicBrainz made a fresh start and completely revamped its digital music metadata format and system. The project chose RDF to play an important role in this effort.

MusicBrainz aims to be a metadatabase of digital audio and video that covers more than just CD track information. It's billed as an "open music encyclopedia." The openness is ensured by an explicit OpenContent license that's assigned to all MusicBrainz information. It is decentralized and ties together information at multiple Web locations. The server software is all readily available as open source. Currently there is information on about a million tracks. The basis of this data in RDF gives the service some unique advantages. First of all, each track, and all the other important concepts, have unique identifiers available in the form of URIs. With the URIs, a universal playlist can exist. This playlist can be published in compact forms and uniquely identifies a particular sequence of songs. CDDB does not have such global identifiers. MusicBrainz also defines RDF vocabularies for querying the encyclopedia.

Name that tune

The RDF subsystem of MusicBrainz is defined in the MusicBrainz Metadata Initiative 2.0 specification, which defines RDF for encyclopedia entries and for queries. MusicBrainz defines several base URIs (which they call namespaces) for the different (though related) RDF vocabularies it provides.

  • http://musicbrainz.org/mm/mm-2.0#: MusicBrainz Metadata namespace, usually associated with the prefix mm.
  • http://musicbrainz.org/mm/cdmp-1.0#: Compact Disc Lookup namespace, usually associated with the prefix cdmp.
  • http://musicbrainz.org/mm/mq-1.0#: MusicBrainz Query namespace, usually associated with the prefix mq.
  • http://musicbrainz.org/mm/mem-1.0#: MusicBrainz Extended Metadata namespace, usually associated with the prefix mem.

Let's concentrate on the mm and cdmp namespaces, as these are the most complete. mem is set up for extensions and refinements that are not yet in use. mq will probably become the focus of immediate activity in the project, but is not yet fully in place.

The MusicBrainz Metadata namespace covers core music metadata, using the following classes:

  • Artist: includes properties for the common name and for the name to be used for sorting (for example, "The Roots" could be sorted as "Roots, The"), as well as an RDF bag of the artist's albums.
  • Album: includes the dc:title property for the album title, as well as relationships to the artist and to an RDF sequence with the track listing.
  • Track: includes properties for track title, the creator, and the track number in the album.

MusicBrainz uses Dublin Core metadata elements wherever they make sense. As I discussed in the previous article, this allows MusicBrainz metadata to be somewhat accessible even to generic RDF agents.

Tracks are also given a property to connect them to their TRM Acoustic Fingerprint. TRM is a technology developed by Relatable, LLC as a unique bar code for digital media. Each TRM ID is a universally unique identifier (UUID). For example, the TRM for "Mellow My Man" by The Roots is f13069e3-da60-4782-82dd-a9f375e5c374. This information may optionally be used in digital rights management (DRM), though MusicBrainz is neutral on DRM issues.
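
Because a TRM ID has the textual form of a UUID, a client can at least check that a candidate ID is well formed before storing or transmitting it. The following is a minimal sketch using Python's standard uuid module. The helper name is_well_formed_trm is hypothetical, and the check says nothing about whether the ID is actually known to MusicBrainz or Relatable.

```python
import uuid

# TRM ID quoted in the text above for "Mellow My Man" by The Roots.
TRM_ID = "f13069e3-da60-4782-82dd-a9f375e5c374"


def is_well_formed_trm(value):
    """Return True if value is a UUID in canonical hyphenated form.

    Hypothetical helper: it checks syntax only, not whether the ID exists.
    """
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a urn:uuid: prefix, and missing hyphens,
    # so compare against the canonical form to reject those variants.
    return str(parsed) == value.lower()


print(is_well_formed_trm(TRM_ID))
print(is_well_formed_trm("not-a-trm-id"))
```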

Listing 1 is an example of a MusicBrainz Metadata record.


Listing 1. Snapshot from a music metadata example
<rdf:RDF xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc  = "http://purl.org/dc/elements/1.1/"
         xmlns:mm  = "http://musicbrainz.org/mm/mm-2.0#">

  <mm:Artist rdf:about=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11">
    <dc:title>Portishead</dc:title>
    <mm:sortName>Portishead</mm:sortName>
    <mm:albumList>
      <rdf:Bag>
        <rdf:li rdf:resource=
"http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/album/3677c7a6-03a6-4709-a7aa-edaea95ce473"/>
      </rdf:Bag>
    </mm:albumList>
  </mm:Artist>

  <mm:Album rdf:about=
"http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e">
    <dc:title>Dummy</dc:title>
    <dc:creator rdf:resource=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11"/>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/8facb8ab-0b31-4d06-907f-0a9c9a72383c"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/44d90dca-5290-4cb3-af38-518818835f23"/>
<!--
Rest of the tracks snipped for brevity...
-->
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>

  <mm:Album rdf:about=
"http://musicbrainz.org/album/3677c7a6-03a6-4709-a7aa-edaea95ce473">
    <dc:title>Roseland NYC Live</dc:title>
    <dc:creator rdf:resource=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11"/>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/1cf34447-7731-40a4-a2ba-347866a13c44"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/f71a27a7-4845-463c-9c67-ffb96a6b5a8f"/>
<!--
Rest of the tracks snipped for brevity...
-->
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>

</rdf:RDF>

The album list is a bag because order is not pertinent. The track list is a sequence in order to preserve the track order. This is a bit redundant because each track already has a property with its track number.
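
To make the structure concrete, here is a small sketch that reads a trimmed copy of Listing 1 with Python's standard xml.etree.ElementTree module and returns each album's title with its track URIs in sequence order. It treats the record as plain XML rather than using a real RDF parser, so it assumes the exact element nesting shown in Listing 1. The function name albums and the SAMPLE constant are illustrative, not part of any MusicBrainz client library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "dc": "http://purl.org/dc/elements/1.1/",
    "mm": "http://musicbrainz.org/mm/mm-2.0#",
}

# A trimmed copy of Listing 1: one album with its first two tracks.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:mm="http://musicbrainz.org/mm/mm-2.0#">
  <mm:Album rdf:about=
"http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e">
    <dc:title>Dummy</dc:title>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/8facb8ab-0b31-4d06-907f-0a9c9a72383c"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/44d90dca-5290-4cb3-af38-518818835f23"/>
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>
</rdf:RDF>"""


def albums(rdf_xml):
    """Return one dict per mm:Album: its URI, title, and ordered track URIs."""
    root = ET.fromstring(rdf_xml)
    result = []
    for album in root.findall("mm:Album", NS):
        # Document order of rdf:li inside rdf:Seq is the track order.
        items = album.findall("mm:trackList/rdf:Seq/rdf:li", NS)
        result.append({
            "uri": album.get(f"{{{RDF}}}about"),
            "title": album.findtext("dc:title", namespaces=NS),
            "tracks": [li.get(f"{{{RDF}}}resource") for li in items],
        })
    return result


for album in albums(SAMPLE):
    print(album["title"], len(album["tracks"]))
```

An application that needs to handle arbitrary RDF/XML serializations of the same data would be better served by a full RDF parser, since the same statements can legally be written in several different XML shapes.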


Querying for CD information

MusicBrainz also defines a query service for CD metadata: the Compact Disc Metadata Proposal (CDMP). The protocol is very simple. You can HTTP POST an RDF query document to a MusicBrainz server, and get a response back in MusicBrainz metadata form similar to that in Listing 1, but with CDMP wrapper elements. You can also use an HTTP GET with some special query parameters. The most common scenario for CDMP is the case where a user puts a CD into a computer, and the CD player application fires up. It then reads the CD to determine the offsets of each track, which in many cases can be used to uniquely identify the CD. It sends these offsets to the MusicBrainz server in order to get the CD and track information of the CD that matches the offset data. Listing 2 is an example of such a query.


Listing 2. Sample query for CD and track information
<rdf:RDF xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc   = "http://purl.org/dc/elements/1.1/"
     xmlns:cdmp = "http://musicbrainz.org/mm/cdmp-1.0#"
     xmlns:mm   = "http://musicbrainz.org/mm/mm-2.0#">

 <cdmp:LookupCD>
  <cdmp:offsets>150-17895-34567-51432-68025-87365-106380-123452-140620-157792-175650
  </cdmp:offsets>
 </cdmp:LookupCD>

</rdf:RDF>
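
A document like the one in Listing 2 is simple enough to generate programmatically. The sketch below builds the same cdmp:LookupCD structure from a list of track offsets using Python's standard library. The function name build_lookup_query is hypothetical, and the output declares only the namespaces it actually uses, so it is not byte-for-byte identical to Listing 2. How a real player reads the offsets from the disc is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CDMP = "http://musicbrainz.org/mm/cdmp-1.0#"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("cdmp", CDMP)


def build_lookup_query(offsets):
    """Serialize a cdmp:LookupCD query for the given track offsets."""
    root = ET.Element(f"{{{RDF}}}RDF")
    lookup = ET.SubElement(root, f"{{{CDMP}}}LookupCD")
    node = ET.SubElement(lookup, f"{{{CDMP}}}offsets")
    # Offsets are joined with hyphens, as in Listing 2.
    node.text = "-".join(str(offset) for offset in offsets)
    return ET.tostring(root, encoding="unicode")


# The offsets used in Listing 2.
offsets = [150, 17895, 34567, 51432, 68025, 87365,
           106380, 123452, 140620, 157792, 175650]
query = build_lookup_query(offsets)
print(query)
```

The resulting string would be the body of the HTTP POST described above. The server address is deliberately left out because the article does not specify one.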
        

In effect, the query in Listing 2 is an RDF object whose properties serve as the query parameters. This is a common approach to representing queries in RDF, although it gets unwieldy as queries get more complex. Luckily, most MusicBrainz queries are pretty simple. Listing 3 is a sample response to the query in Listing 2.


Listing 3. Sample response from CDMP lookup
<rdf:RDF xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc   = "http://purl.org/dc/elements/1.1/"
         xmlns:cdmp = "http://musicbrainz.org/mm/cdmp-1.0#"
         xmlns:mm   = "http://musicbrainz.org/mm/mm-2.0#">
<cdmp:ResultCD>
  <cdmp:cd>
    <cdmp:CDMetadata>
      <dc:title>Rubycon</dc:title>
      <cdmp:cdmpId>ivDFb2Tw6HzN.XdYZFj5zr1Q9EY-</cdmp:cdmpId>
      <mm:Artist>
         <rdf:Description>
            <dc:title>Tangerine Dream</dc:title>
         </rdf:Description>
      </mm:Artist>
      <mm:trackList>
        <rdf:Seq>
          <rdf:li>
            <mm:Track>
               <dc:title>Rubycon (Part I)</dc:title>
               <mm:trackNum>1</mm:trackNum>
            </mm:Track>
          </rdf:li>
          <rdf:li>
            <mm:Track>
               <dc:title>Rubycon (Part II)</dc:title>
               <mm:trackNum>2</mm:trackNum>
            </mm:Track>
          </rdf:li>
        </rdf:Seq>
      </mm:trackList>
    </cdmp:CDMetadata>
  </cdmp:cd>
</cdmp:ResultCD>
</rdf:RDF>
        

The response is pretty much the MusicBrainz Metadata namespace format in CDMP wrapper classes. One advantage over CDDB is that multiple CD results can be returned from such a query, to deal with possible collisions between track offset details for different CDs.
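
As an illustration of consuming such a response, the following sketch parses a condensed copy of Listing 3 with Python's standard library and collects every CD it contains. It assumes that multiple results appear as additional cdmp:cd or cdmp:ResultCD elements, which the article does not spell out, and it treats the response as plain XML with the nesting shown in Listing 3. The function name parse_result is hypothetical. It also reports whether each track's position in the rdf:Seq agrees with its mm:trackNum, since the two carry the same information.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cdmp": "http://musicbrainz.org/mm/cdmp-1.0#",
    "mm": "http://musicbrainz.org/mm/mm-2.0#",
}

# Listing 3, condensed (the cdmp:cdmpId element is omitted).
RESPONSE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cdmp="http://musicbrainz.org/mm/cdmp-1.0#"
         xmlns:mm="http://musicbrainz.org/mm/mm-2.0#">
<cdmp:ResultCD><cdmp:cd><cdmp:CDMetadata>
  <dc:title>Rubycon</dc:title>
  <mm:Artist><rdf:Description>
    <dc:title>Tangerine Dream</dc:title>
  </rdf:Description></mm:Artist>
  <mm:trackList><rdf:Seq>
    <rdf:li><mm:Track>
      <dc:title>Rubycon (Part I)</dc:title><mm:trackNum>1</mm:trackNum>
    </mm:Track></rdf:li>
    <rdf:li><mm:Track>
      <dc:title>Rubycon (Part II)</dc:title><mm:trackNum>2</mm:trackNum>
    </mm:Track></rdf:li>
  </rdf:Seq></mm:trackList>
</cdmp:CDMetadata></cdmp:cd></cdmp:ResultCD>
</rdf:RDF>"""


def parse_result(rdf_xml):
    """Return one dict per CD: title, artist, tracks, and an in_sync flag."""
    root = ET.fromstring(rdf_xml)
    cds = []
    for cd in root.findall("cdmp:ResultCD/cdmp:cd/cdmp:CDMetadata", NS):
        tracks = []
        in_sync = True
        track_nodes = cd.findall("mm:trackList/rdf:Seq/rdf:li/mm:Track", NS)
        for position, track in enumerate(track_nodes, start=1):
            num_text = track.findtext("mm:trackNum", namespaces=NS)
            number = int(num_text) if num_text else None
            # The sequence position and mm:trackNum should agree.
            if number != position:
                in_sync = False
            tracks.append((number, track.findtext("dc:title", namespaces=NS)))
        cds.append({
            "title": cd.findtext("dc:title", namespaces=NS),
            "artist": cd.findtext(
                "mm:Artist/rdf:Description/dc:title", namespaces=NS),
            "tracks": tracks,
            "in_sync": in_sync,
        })
    return cds


for cd in parse_result(RESPONSE):
    print(cd["artist"], "/", cd["title"], cd["tracks"], cd["in_sync"])
```

Returning a list rather than a single record leaves the choice among colliding results to the calling application, for example by prompting the user.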

You can also make CDMP queries to search for CDs by exact or partial matches of title, artist, or other data. CDMP users can also submit new CD information. Usually, if your CD player does a lookup and cannot find the matching CD information, the software allows you to enter the track data manually. You can then submit this data as a contribution to MusicBrainz. MusicBrainz has a moderation system in place to minimize abuse and unintentional errors in submissions. This is important, as demonstrated by recent cases where CDDB data was tainted by gag entries using foul language. Most bad data is much less egregious, and is more often a result of typos, transposed tracks, and the like. MusicBrainz allows users to edit entries after initial submission, subject to moderation.

CDMP was originally designed for cooperation with other open CD lookup systems, but such collaboration has been less healthy than hoped, so CDMP might be replaced with mq namespace queries, which are more focused on the general MusicBrainz encyclopedia concept.


Conclusion

MusicBrainz is important on several levels. For one thing, it demonstrates the power of communities dedicated to open technologies. They can often route efficiently around the damage caused by unscrupulous commercial interests. MusicBrainz was born when CDDB moved to a restrictive license, and the developers took the opportunity to redesign the CD information system to be more flexible, to have more features, and to support broader types of information. Users have contributed a huge amount of data to support the effort, and the database is a great public asset.

The use of RDF in MusicBrainz means that it can be readily integrated into other metadata initiatives. There are a few awkward things about the RDF forms. For one thing, they inherit all the awkwardness that comes with RDF 1.0 containers, and this clumsiness is combined with the fact that container relationships are sometimes used in addition to other relationships which need to be properly synchronized. As an example, the ordering in the sequence specified by mm:trackList is redundant against the equivalent mm:trackNum properties. Despite these small technical flaws, the data ends up being very clean and readily usable by a lot of tools. For example, MusicBrainz borrows decent internationalization from basic XML capabilities, as opposed to the murky status of internationalization in the original CDDB. And even those who are not familiar with RDF can take advantage of this because of the open-source client libraries available for MusicBrainz. If you develop any applications for handling digital media, consider using MusicBrainz formats and protocols for metadata.

