Cultured Perl: Fun with MP3 and Perl, Part 1

Manipulating and guessing MP3 tags with Perl

Teodor Zlatanov (tzz@bu.edu), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, Unix system administration, CORBA, and project management. Contact Teodor at tzz@bu.edu.

Summary:  Every self-respecting computer and music fan needs to be able to manipulate MP3s -- the de facto standard for recreational digital music use. In this article, Ted looks at ways to manage and manipulate MP3s (searching, tagging, renaming, commenting, etc.) using the autotag.pl application. Ted walks through the application, showing how CPAN modules make it possible.

Date:  11 Dec 2003
Level:  Intermediate


Manipulating MP3 files is a necessity for computer-savvy music lovers today. Although other formats exist and are flourishing, this article will concentrate on the MP3 format because it is by all appearances the most popular one today. However, the general approaches shown will work with other music file formats that allow tags as well. In fact, many file formats that use tags could benefit from an application like mine, autotag.pl. I welcome your suggestions.

The discussion in this article will cover Perl issues in general, manipulating MP3 files in particular, and the autotag.pl application specifically.

I used only the MP3::Tag and WebService::FreeDB CPAN modules, even though the MP3::Info, MP3::ID3Lib, MusicBrainz::Client, and AudioFile::Identify::MusicBrainz modules also exist and can be useful. I did not use MP3::ID3Lib primarily because it requires the id3lib software (see Resources). While MP3::Info is pure Perl and simple to install, I found MP3::Tag more powerful. I did not use MusicBrainz::Client and AudioFile::Identify::MusicBrainz because MusicBrainz appears to be a less comprehensive database of released CDs than FreeDB. In the end, the choice of ID3 tagging module and track information module is up to you. My experience, painfully gained through trial and error, is that MP3::Tag and WebService::FreeDB will serve you best.

I made the choice not to use the CDDB (Gracenote) disc database, even though it is very comprehensive. Gracenote is a company that keeps proprietary databases of CD track lists (only searching -- no wholesale downloading -- of those databases is allowed). Quite a bit of those databases' contents were contributed by volunteers in the early days when Gracenote was just CDDB. FreeDB is a volunteer effort organized to provide a free, unrestricted database of CD tracklists. The entire contents of the FreeDB databases are available for download without any copyright restrictions -- so you could set up your own FreeDB server if you wanted.

The modules that I did not use were not necessarily inferior, so if you like you can use them. I simply liked MP3::Tag and WebService::FreeDB better based on personal experience with them and for the reasons above. The actual reading and writing of tags is abstracted in functions, so you won't have to change a lot if you use a different module for MP3 tag reading and writing.

I should also mention that the Term::ReadLine::Gnu CPAN module works better for me than the default module, Term::ReadLine::Perl, in Linux inside xterm and Eterm terminal emulators. You may want to install it on top of Term::ReadLine if you notice strange behavior at the prompts that expect text.

A word about MP3 tags

First, there was music. Then came computers. Computers were slow, and they beeped. Even with such sad tools as the PC speaker (oh, how jealous I was of Apple and Amiga users), programs were written to produce music for games and entertainment. Then came better and better sound cards, and office walls around the world now shake with surround-sound and THX-certified speakers.

In parallel with these hardware developments came a multitude of sound formats. There was .mid for MIDI melodies, .voc, .mod, .wav, and so on. The proprietary MP3 format, which involves many patents owned by the German Fraunhofer institute, became popular over time -- it offered decent compression and performance. There are formats other than MP3, notably Ogg Vorbis, but today MP3 still appears to be the top choice for music storage.

One nice thing about MP3 files was that they could be tagged with ID3 tags. Inside the file was information about it -- what's commonly known as metadata. The album, artist, track name, comments, and (with ID3 version 1.1) even the track number could be stored in the ID3 tag as long as they were under a certain limit of characters.

The successor to ID3 version 1.1 was ID3 version 2 (ID3v2 for short), which is much better in almost every aspect except simplicity. ID3v2 can handle multiple languages, store arbitrarily long data in each tag element, and even store pictures as part of the tag. Unfortunately, dealing with ID3v2 involves learning that TALB is the album name, and TIT2 is the track title. It makes one long for the Ogg Vorbis format, where the artist tag element is called...wait for it...ARTIST! (To be fair, this is just a convention -- Ogg Vorbis comments are as free-form as you want to make them.) Unfortunately, the billions of MP3 files in existence can't be converted without loss of quality to Ogg Vorbis or any other format, so at the very least the next five years will find us dealing with MP3 files in addition to whatever the next "hot" format is.

I have tried very hard to abstract tags as content from the actual ID3 tags. It will be easy, when the time comes, to modify autotag.pl so it will handle other tagging formats besides ID3.
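To make the frame names less cryptic, here is a minimal sketch of a lookup table for the ID3v2 frame IDs that appear in this article. The %frame_label hash and the frame_label() and describe_tag() helpers are hypothetical names for illustration, not part of autotag.pl; the labels are shortened forms of the ID3v2.3 frame descriptions.

```perl
use strict;
use warnings;

# Friendly labels for the ID3v2 frame IDs used in this article.
# %frame_label and both helpers are hypothetical, not part of autotag.pl.
my %frame_label = (
    TIT1 => 'Content group',
    TIT2 => 'Title',
    TPE1 => 'Artist',
    TALB => 'Album',
    TRCK => 'Track number',
    TYER => 'Year',
    COMM => 'Comment',
    WXXX => 'URL',
);

# Return the friendly label, or the raw frame ID if we do not know it
sub frame_label {
    my $id = shift;
    return exists $frame_label{$id} ? $frame_label{$id} : $id;
}

# Format a hash tag (frame ID => value) as printable lines, sorted by frame ID
sub describe_tag {
    my $tag = shift;
    return map { sprintf('%-13s: %s', frame_label($_), $tag->{$_}) } sort keys %$tag;
}

print "$_\n" for describe_tag({ TPE1 => 'Pink Floyd', TALB => 'The Wall', TIT2 => 'Mother' });
```

Keeping the cryptic IDs as hash keys and translating them only for display means the same hash could later be filled from another tagging format.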

Fundamental autotag.pl functions

I put a few things in autotag.pl into separate functions. First, contains_word_char() decides whether some text contains a word (\w in Perl) character. It also handles undefined values quietly: with warnings turned on, a regular expression match on an undefined value prints a warning, so without the function you would have to check whether the string is defined before every match.


Listing 1. The contains_word_char() function
# {{{ contains_word_char: return 1 if the text contains a word character
sub contains_word_char
{
 my $text = shift @_;
 # test with "defined" rather than for truth, so that a value of "0" still counts
 return defined $text && $text =~ m/\w/;
}
# }}}
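To see why the wrapper is worth having, here is a small sketch that runs a handful of values through a stand-in for it with warnings enabled. The has_word_char() name is hypothetical, and the stand-in tests with defined, so it treats the string "0" as containing a word character.

```perl
use strict;
use warnings;

# Stand-in for contains_word_char(); has_word_char is a hypothetical name.
# No "uninitialized value" warning is printed, even for undef.
sub has_word_char {
    my $text = shift;
    return defined $text && $text =~ m/\w/ ? 1 : 0;
}

foreach my $value (undef, '', '---', 'abc', '0') {
    my $shown = defined $value ? "'$value'" : 'undef';
    printf "%-6s => %d\n", $shown, has_word_char($value);
}
```

Only 'abc' and '0' report a word character; undef, the empty string, and '---' do not.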

Next come the input routines. These are pretty verbose, and they attempt to handle most cases of user interaction the program will need.



The only slightly unusual input routine is read_yes_no(), which can be given a Y or 1 default parameter to make the default true, and any other parameter to make the default false. That default is the answer returned when the user presses Enter or Space, while the Backspace or Delete keys reverse it. It's not flashy code, but it's very useful.
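As an illustration of the default handling described above, here is a simplified, line-based sketch. It is not the autotag.pl implementation: it reads a whole line rather than single keys (so Space, Backspace, and Delete are not handled), and parse_yes_no() and the optional input filehandle argument are assumptions made so the logic can be exercised without a terminal.

```perl
use strict;
use warnings;

# Decide the answer from a line of input and a default.
# A default of "Y" or 1 means true; anything else means false.
sub parse_yes_no {
    my ($answer, $default) = @_;
    my $default_true = defined $default && $default =~ m/^(?:y|1)$/i ? 1 : 0;

    return $default_true unless defined $answer;
    $answer =~ s/^\s+|\s+$//g;                 # drop the newline and stray spaces
    return $default_true if $answer eq '';     # just Enter: take the default
    return 1 if $answer =~ m/^y/i;
    return 0 if $answer =~ m/^n/i;
    return $default_true;                      # anything else: take the default
}

# Line-based stand-in for read_yes_no(); $in defaults to STDIN
sub read_yes_no {
    my ($prompt, $default, $in) = @_;
    $in ||= \*STDIN;
    my $hint = parse_yes_no(undef, $default) ? '[Y/n]' : '[y/N]';
    print "$prompt $hint ";
    my $answer = <$in>;
    return parse_yes_no($answer, $default);
}

# Simulate the user pressing Enter by reading from an in-memory file
open my $fake_input, '<', \ "\n" or die "Cannot open in-memory input: $!";
print read_yes_no('Upgrade the tag?', 'Y', $fake_input) ? "yes\n" : "no\n";
```

Separating the decision (parse_yes_no) from the I/O keeps the default logic easy to check on its own.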


autotag.pl preliminaries

The autotag.pl application begins with some initialization routines.


Listing 3. Initialization
use constant SEARCH_ALL   => 'all';

my %freedb_searches = (
   artist  => { keywords => [], abbrev => 'I', tagequiv => 'TPE1' },
   title   => { keywords => [], abbrev => 'T', tagequiv => 'TALB' },
   track   => { keywords => [], abbrev => 'K', tagequiv => 'TIT2' },
   rest    => { keywords => [], abbrev => 'R', tagequiv => 'COMM' },
      );

# maps ID3 v2 tag info to WebService::FreeDB info
my %info2freedb = (
   TALB  => 'cdname',
   TPE1  => 'artist',
      );

my %supported_frames = (
   TIT1 => 1,
   TIT2 => 1,
   TRCK => 1,
   TALB => 1,
   TPE1 => 1,
   COMM => 1,
   WXXX => 1,
   TYER => 1,
      );

my @supported_frames = keys %supported_frames;

my $term = Term::ReadLine->new('Input> '); # global input

The SEARCH_ALL constant is what I use when the user wants to search for a word everywhere -- track names, artist names, etc. I made it a constant in case anyone wants to change it to something else, but it could have been hard-coded as "all" as well.

The %freedb_searches hash maps FreeDB fields to information about them, including ID3v2 tag elements. For instance, it says that what FreeDB calls "artist" is known as "TPE1" in an MP3 tag. The "abbrev" field in the hash entry is used to define command-line switches, so later I can define an -artist switch that can be abbreviated to -i based on the %freedb_searches information.
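The relationship between a search field, its abbreviated switch, and its tag element can be sketched as a small table built from the same hash. The switch_table() helper is a hypothetical name, and the lowercase abbreviations follow the help text rather than anything AppConfig requires.

```perl
use strict;
use warnings;

my %freedb_searches = (
    artist => { keywords => [], abbrev => 'I', tagequiv => 'TPE1' },
    title  => { keywords => [], abbrev => 'T', tagequiv => 'TALB' },
    track  => { keywords => [], abbrev => 'K', tagequiv => 'TIT2' },
    rest   => { keywords => [], abbrev => 'R', tagequiv => 'COMM' },
);

# Build one line per search field: long switch, short switch, ID3v2 frame.
# switch_table() is a hypothetical helper, not part of autotag.pl.
sub switch_table {
    my $searches = shift;
    return map {
        sprintf('-%s (-%s) -> %s',
                $_, lc $searches->{$_}{abbrev}, $searches->{$_}{tagequiv})
    } sort keys %$searches;
}

print "$_\n" for switch_table(\%freedb_searches);
```

Because the switches are derived from the hash, adding a new search field there is enough to give it both a long and a short option.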

The %info2freedb hash maps ID3v2 tag elements to the WebService::FreeDB fields that are common across all tracks on a disc. This is not the same mapping as %freedb_searches; it says that "TALB" and "TPE1," which FreeDB knows as "cdname" and "artist," respectively, are the same for all tracks in an album.

The %supported_frames hash and the @supported_frames list will be used to figure out what ID3v2 tag elements I support. I could have generated the hash from the list instead of getting the list from the hash, but I feel the difference is irrelevant. The supported frames are used for mass tagging and when writing ID3v2 tags (I only modify the supported frames).
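For comparison, the other direction mentioned above (building the hash from the list) might look like the following sketch. It has the side benefit of keeping the frames in a fixed order, whereas keys returns them in no particular order; is_supported() is a hypothetical helper.

```perl
use strict;
use warnings;

# Start from the ordered list and derive the lookup hash from it
my @supported_frames = qw(TIT1 TIT2 TRCK TALB TPE1 COMM WXXX TYER);
my %supported_frames = map { $_ => 1 } @supported_frames;

# Hypothetical helper: true if the frame ID is one we handle
sub is_supported {
    my $frame = shift;
    return exists $supported_frames{$frame} ? 1 : 0;
}

foreach my $frame (qw(TALB TCON)) {
    printf "%s is %s\n", $frame, is_supported($frame) ? 'supported' : 'not supported';
}
```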

Finally, I create a Term::ReadLine object for user input throughout the application.

Next, I initialize the AppConfig options. Bear with me, this is useful.


Listing 4. AppConfig initialization
# {{{ set up AppConfig and process -help

my $config = AppConfig->new();

$config->define(
   DEBUG       =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => 0, ALIAS => 'D' },

   CONFIG_FILE       =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => 0, ALIAS => 'F' },

   HELP        =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'H' },

   DUMP        =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0 },

   ACCEPT_ALL  =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'C' },

   DRYRUN      =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'N' },

   GUESS_TRACK_NUMBERS_ONLY  =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'G' },

    STRIP_COMMENT_ONLY =>
    { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'SC' },

   MASS_TAG_ONLY =>
   { ARGCOUNT => ARGCOUNT_HASH, ALIAS => 'M' },

   RENAME_ONLY =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'RO' },

   RENAME_MAX_CHARS =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => 30},

   RENAME_FORMAT =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => '%a-%t-%n-%c-%s.mp3'},

   RENAME_BADCHARS =>
   { ARGCOUNT => ARGCOUNT_LIST, ALIAS => 'RB' },

   RENAME_REPLACECHARS =>
   { ARGCOUNT => ARGCOUNT_LIST, ALIAS => 'RR' },

   RENAME_REPLACEMENT =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => '_' },

   FREEDB_HOST =>
   { ARGCOUNT => ARGCOUNT_ONE, DEFAULT => 'http://www.freedb.org', },

   OR =>
   { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => '0', },

   SEARCH_ALL()  =>
   { ARGCOUNT => ARGCOUNT_LIST, ALIAS => 'A' },
      );

foreach my $search (keys %freedb_searches)
{
 $config->define($search => {
      ARGCOUNT => ARGCOUNT_LIST,
      ALIAS => $freedb_searches{$search}->{abbrev},
      });
}
$config->args();

$config->file($config->CONFIG_FILE())
 if $config->CONFIG_FILE();

unless (scalar @{$config->RENAME_BADCHARS()})
{
 push @{$config->RENAME_BADCHARS()}, split(//, "\"`!'?&[]()/;\n\t");
}

unless (scalar @{$config->RENAME_REPLACECHARS()})
{
 push @{$config->RENAME_REPLACECHARS()}, split(//, " ");
}

if ($config->HELP())
{
 print <<EOHIPPUS;
$0 [options] File1.mp3 File2.mp3 ...

Options:
 -help (-h)          : print this help
 -config_file (-f) N : use this config file, see AppConfig module docs for format
 -debug (-d) N       : print debugging information (level N, 0 is lowest)
 -dump               : just dump the list of albums and tracks within them
 -dryrun (-n)        : do everything but modify the MP3 files
 -freedb_host H      : set the FreeDB host, default "${\$config->FREEDB_HOST()}"
 -or                 : search for keyword A or keyword B, not A and B as usual

 -accept_all (-c)    : accept all search results for consideration for each file,
                       also accept all renames without asking

 -rename_badchars (-rb) A -rb B     : characters A and B to remove when renaming

 -rename_replacechars (-rr) A -rr B : characters A and B to replace
                                      when renaming

 -rename_max_chars N : use at most this many characters from a tag
                       element when renaming, default: ${\$config->RENAME_MAX_CHARS()}

 -rename_replacement X : character to use when replacing,
                      default: [${\$config->RENAME_REPLACEMENT()}]

 -rename_format F    : format for renaming; default "${\$config->RENAME_FORMAT()}"
                         %a -> Artist
                         %t -> Album title
                         %n -> Track number
                         %c -> Comment
                         %s -> Song title

 -guess_track_numbers_only (-g) : guess track numbers using the file
                     name, then exit

 -rename_only (-ro)  : rename tracks using the given format (see
                       -rename_format), then exit

 -mass_tag_only (-m) A=X -m B=Y : mass-tag files (tag element A is X,
                                  B is Y), then exit (tag elements
                                  available: @supported_frames)

 -strip_comment_only (-sc) : strip comments and URLs, then exit

Repeatable options (you can specify them more than once, K is the keyword):

 -all (-a)    K : search everywhere
 -artist (-i) K : search for these artists
 -title (-t)  K : search for these titles
 -track (-k)  K : search for these tracks
 -rest (-r)   K : search for these keywords everywhere else

Note that the repeatable options are cumulative, so artist A and title
B will produce matches for A and B, not A or B. In the same way,
artist A and artist B will produce matches for A and B, not A or B.
If you want to match A or B terms, use -or, for instance:

$0 -or -artist "pink floyd" -artist "fred flintstone"

EOHIPPUS

 exit;
}

# }}}

Yes, all that code just initialized the command-line options. With AppConfig, those options can be used and modified throughout the program; there are many benefits to using AppConfig that are outside the scope of this article (see Resources for more information on AppConfig).

Also, I use the entries in the %freedb_searches hash to create the appropriate configuration options, which makes life easier for the user and for the programmer.

After loading a configuration file, if the user specified it, I populate the character replacement and bad character arrays with a sensible default.

Finally, I handle the -help switch. Note how the default values for various options are printed inside the help text, using variable interpolation. This makes for a very readable help message. I always update my help message right after I add a feature, and sometimes even before. I believe that help should be synchronized with the functionality of the program; otherwise the program is confusing and the help is misleading. The autotag.pl program in particular needs more documentation -- POD-style documentation would be nice, and that may be in place by the time you read this article. POD documentation is a part of the script, so downloading autotag.pl (see Resources) will include the POD documentation if I have written it already.
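The interpolation trick in the help text deserves a closer look: a method call does not interpolate inside a string or here-document, but a reference to its result, dereferenced on the spot with ${\ ... }, does. The sketch below shows the idiom in isolation. Demo::Config is a hypothetical stand-in class with two accessors, used so the example runs without AppConfig installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the AppConfig object, with two accessors
package Demo::Config;
sub new                { my $class = shift; return bless {@_}, $class; }
sub RENAME_MAX_CHARS   { return $_[0]->{RENAME_MAX_CHARS} }
sub RENAME_REPLACEMENT { return $_[0]->{RENAME_REPLACEMENT} }

package main;

# Build help text whose defaults always reflect the live configuration
sub help_text {
    my $config = shift;
    return <<"EOHELP";
 -rename_max_chars N   : default: ${\$config->RENAME_MAX_CHARS()}
 -rename_replacement X : default: [${\$config->RENAME_REPLACEMENT()}]
EOHELP
}

my $config = Demo::Config->new(RENAME_MAX_CHARS => 30, RENAME_REPLACEMENT => '_');
print help_text($config);
```

Since the defaults are read from the configuration object when the help is printed, changing a default in one place cannot leave the help text out of date.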


ID3v2 tag-related functions

The get_tag() function is essential to autotag.pl. Given an MP3 file name, it builds a hash tag from the file. If the tag is only ID3v1, get_tag() will offer to upgrade the ID3 tag for free (what a deal!). If there is no ID3 tag, get_tag() will create one. Furthermore, get_tag() knows to look at the Text and URL sub-elements of the COMM and WXXX tag elements, respectively.


Listing 5. The get_tag() function
# {{{ get_tag: get a ID3 V2 tag, using V1 if necessary
sub get_tag
{
 my $file    = shift @_;
 my $upgrade = shift @_;
 my $mp3 = MP3::Tag->new($file);

 return undef unless defined $mp3;

 $mp3->get_tags();

 my $tag = {};

 if (exists $mp3->{ID3v2})
 {
  my $id3v2 = $mp3->{ID3v2};
  my $frames = $id3v2->supported_frames();
  while (my ($fname, $longname) = each %$frames)
  {
   # only grab the frames we know
   next unless exists $supported_frames{$fname};

   # COMM and WXXX come back as hashes; keep only the part we use
   my $value = $id3v2->get_frame($fname);
   $value = $value->{Text} if $fname eq 'COMM' && ref $value eq 'HASH';
   $value = $value->{URL}  if $fname eq 'WXXX' && ref $value eq 'HASH';
   $tag->{$fname} = defined $value ? $value : '';
  }
 }
 elsif (exists $mp3->{ID3v1})
 {
  warn "No ID3 v2 TAG info in $file, using the v1 tag";
  my $id3v1 = $mp3->{ID3v1};
  $tag->{COMM} = $id3v1->comment();
  $tag->{TIT2} = $id3v1->song();
  $tag->{TPE1} = $id3v1->artist();
  $tag->{TALB} = $id3v1->album();
  $tag->{TYER} = $id3v1->year();
  $tag->{TRCK} = $id3v1->track();
  $tag->{TIT1} = $id3v1->genre();

  if ($upgrade && read_yes_no("Upgrade ID3v1 tag to ID3v2 for $file?", 1))
  {
   set_tag($file, $tag);
  }
 }
 else
 {
  warn "No ID3 TAG info in $file, creating it";
  $tag = {
      TIT2 => '',
      TPE1 => '',
      TALB => '',
      TYER => 9999,
      COMM => '',
      };
 }
 print "Got tag ", Dumper $tag
  if $config->DEBUG();
 return $tag;
}
# }}}

The set_tag() function is the sibling of get_tag(). It writes an ID3v2 tag, observing the COMM and WXXX frames' sub-elements. It takes a hash reference such as get_tag() might produce.


Listing 6. The set_tag() function
# {{{ set_tag: set a ID3 V2 tag on a file
sub set_tag
{
 my $file = shift @_;
 my $tag  = shift @_;
 my $mp3 = MP3::Tag->new($file);
 print "Setting tag ", Dumper $tag
  if $config->DEBUG();
 my $tags = $mp3->get_tags();
 my $id3v2;

 if (ref $tags eq 'HASH' && exists $tags->{ID3v2})
 {
  $id3v2 = $tags->{ID3v2};
 }
 else
 {
  $id3v2 = $mp3->new_tag("ID3v2");
 }

 my %old_frames = %{$id3v2->get_frame_ids()};

 foreach my $fname (keys %$tag)
 {
  $id3v2->remove_frame($fname)
   if exists $old_frames{$fname};

  if ($fname eq 'WXXX')
  {
   $id3v2->add_frame('WXXX', 'ENG', 'FreeDB URL', $tag->{WXXX}) ;
  }
  elsif ($fname eq 'COMM')
  {
   $id3v2->add_frame('COMM', 'ENG', 'Comment', $tag->{COMM}) ;
  }
  else
  {
   $id3v2->add_frame($fname, $tag->{$fname});
  }
 }

 $id3v2->write_tag();
 return 0;
}
# }}}

The print_tag_info() function simply prints out a summary of the tag. Unlike Data::Dumper, which I've used elsewhere in autotag.pl (sometimes needlessly, I must say), print_tag_info() provides a nice, user-oriented printout of the hash tag elements. Note that this function works from a hash reference; the file name it takes is used only to label the output, and the file itself is never read.

The guess_track_number() and guess_artist_and_track() functions do the best they can, given a file name and possibly some ID3 tag information. Note that when guess_track_number() falls back to the file name, it only looks for a number from 0 through 29, on the assumption that track numbers are very rarely higher than that.


Listing 7. The print_tag_info(), guess_track_number(), and guess_artist_and_track() functions
# {{{ print_tag_info: print the tag info

sub print_tag_info
{
 my $filename = shift @_;
 my $tag      = shift @_;
 my $extra    = shift @_ || 'Track info';

 # argument checking
 return unless ref $tag eq 'HASH';

 print "$extra for '$filename':\n";

 foreach (keys %$tag)
 {
  printf "%10s : %s\n", $_, $tag->{$_};
 }
}

# }}}

# {{{ guess_track_number: guess track number from ID3 tag and file name
sub guess_track_number
{
 my $filename = shift @_;
 my $tag      = shift @_ || return undef;

 $filename = basename($filename);   # directories can contain confusing data

 # first try to guess the track number from the old tag
 if (exists $tag->{TRCK} && contains_word_char($tag->{TRCK}))
 {
  my $n = $tag->{TRCK} + 0;    # fix tracks like 1/10
  return $n;
 }
 elsif ($filename =~ m/([012]?\d).*\.[^.]+$/)
                     # now look for numbers in the filename (0 through 29)
 {
  print "Guessed track number $1 from filename '$filename'\n"
   if $config->DEBUG();
  return $1;
 }

 return undef; # if all else fails, return undef
}
# }}}

# {{{ guess_artist_and_track: guess artist and track from file name
sub guess_artist_and_track
{
 my $filename = shift @_;
 my $artist;
 my $track;

 $filename = basename($filename);   # directories can contain confusing data

 if ($filename =~ m/([^-_]{3,})\s*-\s*(.{3,})\s*\.[^.]+$/)
 {
  print "Guessed artist $1 from filename '$filename'\n"
   if $config->DEBUG();
  $artist = $1;
  $track = $2;
 }

 return ($artist, $track);
}
# }}}
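To get a feel for what the two file name patterns accept, here is a sketch that applies the same regular expressions to a few made-up file names. The demo_ functions are hypothetical stand-ins that skip the tag lookup and debugging output. Trimming the surrounding whitespace is an addition of this sketch: the artist pattern also matches the space before the dash.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Same file name pattern as guess_track_number(), without the tag lookup
sub demo_guess_track_number {
    my $name = basename(shift);
    return $name =~ m/([012]?\d).*\.[^.]+$/ ? $1 : undef;
}

# Same pattern as guess_artist_and_track(); the trimming is an addition
sub demo_guess_artist_and_track {
    my $name = basename(shift);
    return unless $name =~ m/([^-_]{3,})\s*-\s*(.{3,})\s*\.[^.]+$/;
    my ($artist, $track) = ($1, $2);
    s/^\s+|\s+$//g for $artist, $track;
    return ($artist, $track);
}

foreach my $file ('/music/03 - Money.mp3', '/music/track12.mp3', '/music/Money.mp3') {
    my $number = demo_guess_track_number($file);
    printf "%-24s track number: %s\n", $file, defined $number ? $number : '(no guess)';
}

my ($artist, $track) = demo_guess_artist_and_track('/music/Pink Floyd - Money.mp3');
print "artist '$artist', track '$track'\n";
```

Note that the "3" in the ".mp3" extension is never mistaken for a track number, because the pattern requires the extension's dot to come after the digits.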

I use the data returned from the FreeDB search to make an anonymous hash with the appropriate elements. The mapping between WebService::FreeDB fields and ID3v2 tag elements is tentative, but it has worked very well for me.


Listing 8. The make_tag_from_freedb() function
# {{{ make_tag_from_freedb: make the ID3 tag info from a FreeDB entry
sub make_tag_from_freedb
{
 my $disc  = shift @_;
 my $track = shift @_;

 # argument checking
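 # track numbers are 1-based; a 0 would wrap around to the last track below
 return undef unless defined $track && $track =~ m/^\d+$/ && $track > 0;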

 # note that the user inputs track "1" but WebService::FreeDB gives us that
 # track at position 0, so we decrement $track
 $track--;

 return undef unless exists $disc->{trackinfo};

 return undef unless exists $disc->{trackinfo}->[$track];

 my $track_data = $disc->{trackinfo}->[$track];

 return {
      TIT1 => $disc->{genre},
      TIT2 => $track_data->[0],
      TRCK => $track+1,
      TPE1 => $disc->{artist},
      TALB => $disc->{cdname},
      TYER => $disc->{year},
      WXXX => $disc->{url},
      COMM => $disc->{rest}||'',
   };

}
# }}}
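The off-by-one handling is easier to follow with data in hand. The sketch below feeds a hand-made disc structure to a compact variant of the function. The structure's keys are the ones the listing reads, but the sample values, the example URL, and the tag_from_disc() name are made up for illustration; a real WebService::FreeDB result may carry more fields.

```perl
use strict;
use warnings;

# Compact variant of make_tag_from_freedb(); tag_from_disc is a hypothetical name
sub tag_from_disc {
    my ($disc, $track) = @_;

    # the user counts tracks from 1, the trackinfo array from 0
    return undef unless defined $track && $track =~ m/^\d+$/ && $track > 0;
    my $tracks = $disc->{trackinfo};
    return undef unless ref $tracks eq 'ARRAY' && $track <= @$tracks;

    my $track_data = $tracks->[ $track - 1 ];
    return {
        TIT1 => $disc->{genre},
        TIT2 => $track_data->[0],
        TRCK => $track,
        TPE1 => $disc->{artist},
        TALB => $disc->{cdname},
        TYER => $disc->{year},
        WXXX => $disc->{url},
        COMM => $disc->{rest} || '',
    };
}

# Made-up sample data shaped like the fields the listing uses
my $disc = {
    artist    => 'Pink Floyd',
    cdname    => 'The Wall',
    year      => 1979,
    genre     => 'Rock',
    url       => 'http://example.org/disc',
    trackinfo => [ ['In the Flesh?'], ['The Thin Ice'] ],
};

my $tag = tag_from_disc($disc, 2);
print "Track $tag->{TRCK} of '$tag->{TALB}' is '$tag->{TIT2}'\n";
```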


Mass tagging, mass renaming, stripping comments, and guessing track numbers

The main functionality of autotag.pl is to identify MP3 files. In the course of that process, however, minor adjustments often need to be made to large groups of files. Enter the Four Autotagging Horsemen.

Stripping comments is a very simple process. I get a hash tag with get_tag(), empty the COMM and WXXX fields, and write it back with set_tag(). In fact, comment stripping could have been done through mass tagging, but it's used so often that I felt I needed a separate option for it.

Guessing track numbers is also quite simple. Get the hash tag, use guess_track_number() on the file and the hash tag, ask for confirmation, and write the tag back to the file.

Mass tagging sets one or more keys (e.g. TALB) on a series of files. You say, for instance,

autotag.pl -m "TALB=Best" *.mp3

and all the files that have the mp3 extension will be assigned that TALB value in their ID3v2 tag. Mass-tagging is very nice when, for example, you have a directory full of music by an artist and want to tag all that music with the artist's name. Only supported tag elements can be mass-tagged. Again, I get the hash tag, make my changes, and write it back. The goal is to make it simple and easy to maintain.
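The "get the hash tag, make my changes, write it back" cycle can be sketched without touching any files. In the real script AppConfig turns the repeated A=X arguments into a hash; the parse_assignments() helper below hand-rolls that step, and both helper names are hypothetical.

```perl
use strict;
use warnings;

my %supported_frames = map { $_ => 1 } qw(TIT1 TIT2 TRCK TALB TPE1 COMM WXXX TYER);

# Turn "A=X" strings into a hash (hypothetical; AppConfig does this in autotag.pl)
sub parse_assignments {
    my %set;
    foreach my $pair (@_) {
        my ($frame, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $set{$frame} = $value;
    }
    return \%set;
}

# Return a copy of the tag with the supported assignments applied
sub apply_mass_tag {
    my ($tag, $set) = @_;
    my %new = %$tag;
    foreach my $frame (sort keys %$set) {
        unless (exists $supported_frames{$frame}) {
            warn "Unsupported tag element $frame requested for mass tagging, skipping\n";
            next;
        }
        $new{$frame} = $set->{$frame};
    }
    return \%new;
}

my $old = { TIT2 => 'Money', TALB => '' };
my $new = apply_mass_tag($old, parse_assignments('TALB=Best', 'XXXX=ignored'));
print join(', ', map { sprintf('%s=%s', $_, $new->{$_}) } sort keys %$new), "\n";
```

The unsupported XXXX element triggers a warning and is left out, while TALB is updated and TIT2 passes through unchanged.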


Listing 9. Mass tagging, comment stripping, and guessing track numbers
# {{{ handle the one-shot options
if ($config->GUESS_TRACK_NUMBERS_ONLY() ||
    $config->STRIP_COMMENT_ONLY() ||
    scalar keys %{$config->MASS_TAG_ONLY()})
{
 foreach my $file (@ARGV)
 {
  my $tag = get_tag($file, 1);
  unless (defined $tag)
  {
   warn "No ID3 TAG info in '$file', skipping";
   next;
  }

  next if $config->DRYRUN();

  # delegate stripping comments to the mass tagging function
  if ($config->STRIP_COMMENT_ONLY())
  {
   $config->MASS_TAG_ONLY()->{COMM} = '';
   $config->MASS_TAG_ONLY()->{WXXX} = '';
  }

  if (scalar keys %{$config->MASS_TAG_ONLY()})
  {
   foreach (keys %{$config->MASS_TAG_ONLY()})
   {
    unless (exists $supported_frames{$_})
    {
     warn "Unsupported tag element $_ requested for mass tagging, skipping";
     next;
    }
    $tag->{$_} = $config->MASS_TAG_ONLY()->{$_};
   }
   set_tag($file, $tag);
  }
  else
  {
   my $track_number_guess = guess_track_number($file, $tag);

   if (!defined $track_number_guess)
   {
    warn "Could not guess a track number for file $file, sorry";
   }
   elsif (read_yes_no("Is track number $track_number_guess OK for '$file'?", 1))
   {
    $tag->{TRCK} = $track_number_guess;
    set_tag ($file, $tag);
   }
  }
 }

 exit 0;
}
# }}}

Ah, the mass renaming option. I left it for last because it's the most complex one. Before substituting each tag value into the format, I make each "%" in the value appear as "{{{%}}}", because otherwise a "%" followed by one of the renaming letters could be misinterpreted as another renaming parameter. Take a track name of "100%true", for instance: without the escaping, a later substitution of %t would turn it into "100ALBUMrue", where ALBUM is the album name I get from the hash tag.

Mass renaming also eliminates bad characters, and replaces certain characters with "_" to ensure a reasonable file name. Finally, unless the -c (accept_all) option is given from the command line, autotag.pl will ask if it's okay to rename the file.
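The effect of the "{{{%}}}" escaping is easiest to see side by side. The sketch below expands a format with and without the protection. The expand_format() function and its third argument are hypothetical, and the keys are processed in sorted order here so that the result is repeatable.

```perl
use strict;
use warnings;

# Same placeholder-to-frame mapping as the renaming code
my %map = (
    '%c' => 'COMM',
    '%s' => 'TIT2',
    '%a' => 'TPE1',
    '%t' => 'TALB',
    '%n' => 'TRCK',
);

# Expand a rename format; $protect turns the {{{%}}} escaping on or off
sub expand_format {
    my ($format, $tag, $protect) = @_;
    my $name = $format;

    foreach my $key (sort keys %map) {
        my $value = $tag->{ $map{$key} };
        $value = '' unless defined $value;
        $value =~ s/%/{{{%}}}/g if $protect;   # hide literal % signs in the value
        $name =~ s/\Q$key\E/$value/;
    }

    $name =~ s/\Q{{{%}}}\E/%/g;                # bring the literal % signs back
    return $name;
}

my $tag = { TIT2 => '100%true', TALB => 'Best' };
print expand_format('%s-%t.mp3', $tag, 0), "\n";   # unprotected: the title gets mangled
print expand_format('%s-%t.mp3', $tag, 1), "\n";   # protected: the title survives
```

Without protection the "%t" inside the title is expanded first, giving "100Bestrue-%t.mp3"; with it, the result is "100%true-Best.mp3".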


Listing 10. Mass renaming
# {{{ handle the -rename_only option
if ($config->RENAME_ONLY())
{
 foreach my $file (@ARGV)
 {
  my $tag = get_tag($file, 1);
                 # the extra parameter will ask us about upgrading V1 to V2
  unless (defined $tag)
  {
   warn "No ID3 TAG info in '$file', skipping";
   next;
  }

  my %map = (
     '%c' => 'COMM',
     '%s' => 'TIT2',
     '%a' => 'TPE1',
     '%t' => 'TALB',
     '%n' => 'TRCK',
    );

  my $name = $config->RENAME_FORMAT();

  foreach my $key (keys %map)
  {
   my $tagkey = $map{$key};
   my $replacement = '';
   if (exists $tag->{$tagkey})
   {
    $replacement = substr $tag->{$tagkey}, 0, $config->RENAME_MAX_CHARS();
                    # limit to N characters
    if ($tagkey eq 'TRCK' && $replacement =~ m/^\d$/)
    {
     $replacement = "0$replacement";
    }
   }

   $replacement =~ s/%/{{{%}}}/g;
                    # this is how we preserve %a in the fields, for example

   $name =~ s/$key/$replacement/;
  }

  $name =~ s/\Q{{{%}}}\E/%/g;   # turn the {{{%}}} back into % in the fields

  print "The name after % expansion is $name\n" if $config->DEBUG();

  foreach my $char (map { quotemeta } @{$config->RENAME_BADCHARS()})
  {
   $name =~ s/$char//g;
  }

  print "The name after character removals is $name\n" if $config->DEBUG();

  my $newchar = $config->RENAME_REPLACEMENT();   # inserted literally, so no quotemeta

  foreach my $char (map { quotemeta } @{$config->RENAME_REPLACECHARS()})
  {
   $name =~ s/$char/$newchar/g;
  }

  print "The name after character replacements is $name\n" if $config->DEBUG();


  if ($name eq $file)
  {
   # do nothing
   print "Renaming $file is unnecessary, it already answers to our high standards\n"
    if $config->DEBUG();
  }
  elsif (-e $name)
  {
   warn "Could not use name $name for '$file', it's already taken by an existing file or directory";
  }
  elsif ($config->ACCEPT_ALL() || read_yes_no("Is name $name OK for '$file'?", 1))
  {
   next if $config->DRYRUN();
   print "Renaming $file -> $name\n";
   rename($file, $name) or warn "Could not rename $file to $name: $!";
  }
  else
  {
   # do nothing
  }
 }

 exit 0;
}
# }}}
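The character clean-up step can also be tried in isolation. The sketch below applies the same two passes, removal and then replacement, to a sample name. The sanitize_name() function is a hypothetical helper; it escapes only the characters being searched for, since the replacement is inserted literally.

```perl
use strict;
use warnings;

# Remove the "bad" characters, then replace the "replace" characters
sub sanitize_name {
    my ($name, $bad, $replace, $with) = @_;

    foreach my $char (map { quotemeta } @$bad) {
        $name =~ s/$char//g;
    }
    foreach my $char (map { quotemeta } @$replace) {
        $name =~ s/$char/$with/g;
    }
    return $name;
}

my @bad     = split //, q{"`!'?&[]()/;};   # visible characters from the script's default
my @replace = (' ');

print sanitize_name("Who's Next? (Live).mp3", \@bad, \@replace, '_'), "\n";
print sanitize_name("Who's Next? (Live).mp3", \@bad, \@replace, '-'), "\n";
```

The two calls produce "Whos_Next_Live.mp3" and "Whos-Next-Live.mp3"; the second shows that a replacement such as "-" arrives in the name without a stray backslash.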


Conclusion

The second part of this article will discuss the main loop of autotag.pl, and show common usage of the program.


Resources
