Manipulating and guessing MP3 tags with Perl
Every self-respecting computer and music fan needs to be able to manipulate MP3s -- the de facto standard for recreational digital music use. In this article, Ted looks at ways to manage and manipulate MP3s (searching, tagging, renaming, commenting, etc.) using the autotag.pl application. Ted takes you through the application, illustrating how CPAN modules enable the application.
Manipulating MP3 files is a necessity for computer-savvy music lovers today. Although other formats exist and are flourishing, this article will concentrate on the MP3 format because it is by all appearances the most popular one today. However, the general approaches shown will work with other music file formats that allow tags as well. In fact, many file formats that use tags could benefit from an application like mine, autotag.pl. I welcome your suggestions.
The discussion in this article covers Perl issues in general, manipulating MP3 files in particular, and the autotag.pl application specifically.
I used only the MP3::Tag and WebService::FreeDB CPAN modules, even though the MP3::ID3Lib, MP3::Info, and AudioFile::Identify::MusicBrainz modules also exist and can be useful. The primary reason MP3::ID3Lib was not used is that it requires the id3lib software (see Resources). While MP3::Info is pure Perl and simple to install, I found MP3::Tag more powerful. AudioFile::Identify::MusicBrainz was not used because MusicBrainz appears to be a less comprehensive database of released CDs than FreeDB. In the end, the choice of ID3 tagging module and track information module is up to you. My experience, painfully gained through trial and error, is that MP3::Tag and WebService::FreeDB will serve you best.
I made the choice not to use the CDDB (Gracenote) disc database, even though it is very comprehensive. Gracenote is a company that keeps proprietary databases of CD track lists (only searching -- no wholesale downloading -- of those databases is allowed). Quite a bit of those databases' contents were contributed by volunteers in the early days when Gracenote was just CDDB. FreeDB is a volunteer effort organized to provide a free, unrestricted database of CD tracklists. The entire contents of the FreeDB databases are available for download without any copyright restrictions -- so you could set up your own FreeDB server if you wanted.
The modules that I did not use are not necessarily inferior, so feel
free to use them if you like. I simply liked MP3::Tag and
WebService::FreeDB better, based on personal experience with them and
for the reasons above.
The actual reading and writing of tags is abstracted in functions, so you
won't have to change a lot if you use a different module for MP3 tag
reading and writing.
I should also mention that the Term::ReadLine::Gnu CPAN module works better for me than the default module, Term::ReadLine::Perl, on Linux inside the xterm and Eterm terminal emulators. You may want to install it on top of Term::ReadLine if you notice strange behavior at the prompts.
A word about MP3 tags
First, there was music. Then came computers. Computers were slow, and they beeped. Even with such sad tools as the PC speaker (oh, how jealous I was of Apple and Amiga users), programs were written to produce music for games and entertainment. Then came better and better sound cards, and office walls around the world now shake with surround-sound and THX-certified speakers.
In parallel with these hardware developments came a multitude of sound formats. There was .mid for MIDI melodies, .voc, .mod, .wav, and so on. The proprietary MP3 format, which involves many patents owned by the German Fraunhofer institute, became popular over time -- it offered decent compression and performance. There are formats other than MP3, notably Ogg Vorbis, but today MP3 still appears to be the top choice for music storage.
One nice thing about MP3 files was that they could be tagged with ID3 tags. Inside the file was information about it -- what's commonly known as metadata. The album, artist, track name, comments, and (with ID3 version 1.1) even the track number could be stored in the ID3 tag as long as they were under a certain limit of characters.
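To make the fixed-size limits concrete, here is a minimal sketch of reading an ID3 version 1 or 1.1 tag in plain Perl. It is not code from autotag.pl (which relies on MP3::Tag); the function names parse_id3v1 and read_id3v1 are made up for illustration. The sketch assumes the commonly documented layout: a 128-byte block at the end of the file that starts with "TAG", followed by 30-byte title, artist, and album fields, a 4-byte year, a 30-byte comment, and a 1-byte genre index. In version 1.1 the last two bytes of the comment hold a zero byte and the track number.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Cut a fixed-width field at its first NUL and drop trailing space padding.
sub _clean {
    my ($s) = @_;
    $s =~ s/\0.*\z//s;
    $s =~ s/\s+\z//;
    return $s;
}

# Parse a 128-byte ID3v1/ID3v1.1 block into a hash reference.
# Returns nothing (undef in scalar context) if the block is not an ID3v1 tag.
sub parse_id3v1 {
    my ($block) = @_;
    return unless defined $block && length($block) == 128;

    my ($magic, $title, $artist, $album, $year, $comment, $genre)
        = unpack 'a3 a30 a30 a30 a4 a30 C', $block;
    return unless $magic eq 'TAG';

    # ID3v1.1: comment byte 28 is NUL and byte 29 is a non-zero track number.
    my $track;
    if (substr($comment, 28, 1) eq "\0" && ord(substr($comment, 29, 1)) != 0) {
        $track   = ord substr($comment, 29, 1);
        $comment = substr($comment, 0, 28);
    }

    return {
        title   => _clean($title),
        artist  => _clean($artist),
        album   => _clean($album),
        year    => _clean($year),
        comment => _clean($comment),
        track   => $track,     # undef for plain ID3v1
        genre   => $genre,     # numeric index into the genre list
    };
}

# Read the last 128 bytes of a file and parse them as an ID3v1 tag.
sub read_id3v1 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return;
    return if (-s $fh) < 128;
    seek($fh, -128, SEEK_END) or return;
    my $block;
    my $n = read($fh, $block, 128);
    close $fh;
    return unless defined $n && $n == 128;
    return parse_id3v1($block);
}

# Usage: perl id3v1.pl song1.mp3 song2.mp3 ...
for my $file (@ARGV) {
    my $tag = read_id3v1($file) or next;
    printf "%s: %s - %s (track %s)\n",
        $file, $tag->{artist}, $tag->{title}, $tag->{track} // '?';
}
```

Because every field has a fixed width, a long album or track name is simply cut off, which is the character limit mentioned above.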
The successor to ID3 version 1.1 was ID3 version 2 (ID3v2 for short), which is much better in almost every respect except simplicity. ID3v2 can handle multiple languages, store arbitrarily long data in each tag element, and even store pictures as part of the tag. Unfortunately, dealing with ID3v2 involves learning that TALB is the album name, and TIT2 is the track title. It makes one long for the Ogg Vorbis format, where the artist tag element is called...wait for it...ARTIST! (To be fair, this is just a convention -- Ogg Vorbis comments are as free-form as you want to make them.) Unfortunately, the billions of MP3 files in existence can't be converted without loss of quality to Ogg Vorbis or any other format, so at the very least the next five years will find us dealing with MP3 files in addition to whatever the next "hot" format is.
I have tried very hard to abstract tags as content from the actual ID3 tags. It will be easy, when the time comes, to modify autotag.pl so it will handle other tagging formats besides ID3.
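One way to picture that separation is a lookup table from abstract field names to the element names each format uses. The sketch below is an illustration only, not the code in autotag.pl; %FIELD_MAP, to_native, and from_native are hypothetical names. It assumes the ID3v2.3 frame IDs (TPE1, TALB, TIT2, TRCK, TCON) and the conventional Ogg Vorbis comment names.

```perl
use strict;
use warnings;

# Hypothetical table: abstract field name => element name in each tag format.
my %FIELD_MAP = (
    artist => { id3v2 => 'TPE1', vorbis => 'ARTIST'      },
    album  => { id3v2 => 'TALB', vorbis => 'ALBUM'       },
    title  => { id3v2 => 'TIT2', vorbis => 'TITLE'       },
    track  => { id3v2 => 'TRCK', vorbis => 'TRACKNUMBER' },
    genre  => { id3v2 => 'TCON', vorbis => 'GENRE'       },
);

# Convert { artist => ..., title => ... } into format-specific element names.
# Fields the table does not know about are skipped.
sub to_native {
    my ($format, $tags) = @_;
    my %native;
    for my $field (keys %$tags) {
        my $names = $FIELD_MAP{$field} or next;
        my $name  = $names->{$format}  or next;
        $native{$name} = $tags->{$field};
    }
    return \%native;
}

# Convert format-specific element names back into abstract field names.
sub from_native {
    my ($format, $native) = @_;
    my %tags;
    for my $field (keys %FIELD_MAP) {
        my $name = $FIELD_MAP{$field}{$format};
        next unless defined $name && exists $native->{$name};
        $tags{$field} = $native->{$name};
    }
    return \%tags;
}

# Example: the same abstract tags expressed for two formats.
my %song = (artist => 'Some Artist', title => 'Some Song', track => 3);
for my $format (qw(id3v2 vorbis)) {
    my $native = to_native($format, \%song);
    print "$format: ",
        join(', ', map { "$_=$native->{$_}" } sort keys %$native), "\n";
}
```

With a table like this, supporting another tagging format would mostly mean adding one more column to the table and a reader/writer for that format, while the rest of the program keeps working with names like artist and title.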