Saturday, May 26, 2007

HOWTO organize a music library of 10,000 songs

Several days ago, I set out on an epic quest to organize my music library of roughly 10,000 songs. It's a daunting task, and one I've put off for a while, because my files are in several different places, in several different formats, with several different ID3 tag/naming standards. There are more problems too: identifying mislabeled files I got from other places, weeding out corrupted files, etc., etc. Suffice it to say that it's been an intense few days, with me in a zen-like state of focus trying to get everything in its right place. So here's the short version of how I did it.

The first (major) problem I faced was the massive swaths of files that had incorrect ID3 tags. For this problem I needed a competent ID3 tagger. About a year ago I asked on ubuntu-users what the list's tool of choice was. Suggestions ranged from Quod Libet to Foobar2000 on WINE (outstanding reviews: "best program ever" and "the only program i miss since switching to ubuntu") to gtkpod to EasyTag (mixed reviews: "tolerable" to "great") to id3v2 (command line based) to normal music players like Amarok and Banshee. There is also a Python module for accessing and editing ID3 tags, if I felt like scripting some things myself.
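For the "script it myself" route, the old ID3v1 format is simple enough to read without any library: it's a fixed 128-byte block at the very end of the file, starting with the bytes TAG. Here's a minimal stdlib-only sketch (not the Python module mentioned above; `read_id3v1` is a hypothetical name). It ignores the richer ID3v2 tags that most modern taggers write at the start of the file.

```python
import os
from typing import Optional

ID3V1_SIZE = 128  # ID3v1 is a fixed 128-byte block at the end of the file


def _field(raw: bytes) -> str:
    """Decode a fixed-width ID3v1 text field (NUL/space padded, usually Latin-1)."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(path: str) -> Optional[dict]:
    """Return the ID3v1 tag of an mp3 as a dict, or None if it has none."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        block = f.read(ID3V1_SIZE)
    if block[:3] != b"TAG":
        return None
    tag = {
        "title": _field(block[3:33]),    # 30 bytes
        "artist": _field(block[33:63]),  # 30 bytes
        "album": _field(block[63:93]),   # 30 bytes
        "year": _field(block[93:97]),    # 4 bytes
        "genre": block[127],             # index into the fixed ID3v1 genre list
    }
    # ID3v1.1: a zero byte at offset 125 means offset 126 holds the track number
    if block[125] == 0 and block[126] != 0:
        tag["track"] = block[126]
    return tag
```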

Eventually I settled on Picard, the tagger that MusicBrainz (a great music metadata site) recommends. Thanks to Ubuntu and the Picard team, installation is simple: add deb http://ftp.musicbrainz.org/pub/musicbrainz/users/luks/ubuntu edgy musicbrainz to /etc/apt/sources.list (even if you're not running edgy), import the PGP key with wget http://ftp.musicbrainz.org/pub/musicbrainz/users/luks/public.key -O- | sudo apt-key add -, and then run sudo aptitude update and sudo aptitude install picard.

Picard can perform only a few operations, but they are very powerful. It queries the MusicBrainz database to identify tracks album by album. It correctly identified and summarily tagged more than 95% of the CDs I threw at it, and I listen to some pretty obscure/out-there music. The ones it couldn't recognize were mostly tracks without much existing context to go with them; I don't know exactly what information Picard looks at to determine the ID3 tags, but I suspect that was the problem.

Picard has a few problems, however. First of all, the UI sucks. Second, its capabilities are very limited. I would like to see, say, options that move files around on the file system depending on how Picard tags (or fails to tag) them. Albums whose ID3 tags Picard has written aren't organized in any meaningful way, except in the currently running instance of the program, where they are added to the bottom of the list. That is also annoying, because I kept having to scroll to the top to add files to tag, then to the bottom to click and drag the files it missed, and back again. More automation would be nice, too, as I found myself performing the same operations over and over again. There are also minor issues, like opening a new browser tab on each database lookup (at some points I had Firefox open with ~100 or more tabs). Fortunately for future Picard users, I saw most of these issues acknowledged and marked for future improvement in the documentation.
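Picard doesn't move files for you, but the kind of tag-driven sorting I wished for is easy to sketch. Assuming the tags have already been read into a plain dict by some tag reader, the hypothetical helpers below compute an Artist/Album/NN - Title path and send anything missing an artist or album tag to an "Unsorted" folder (my own convention for illustration, not a Picard feature), so the misses are easy to find and retag.

```python
import os
import re
import shutil
from typing import Mapping

UNSORTED = "Unsorted"  # hypothetical holding folder for files the tagger couldn't identify
_UNSAFE = re.compile(r'[\\/:*?"<>|]')  # characters that break or confuse file names


def _clean(part: str) -> str:
    """Make a tag value safe to use as a single path component."""
    return _UNSAFE.sub("_", part).strip() or "Unknown"


def target_path(root: str, tags: Mapping[str, str], filename: str) -> str:
    """Return root/Artist/Album/NN - Title.ext, or root/Unsorted/filename if untagged."""
    stem, ext = os.path.splitext(filename)
    artist, album = tags.get("artist"), tags.get("album")
    if not artist or not album:
        return os.path.join(root, UNSORTED, filename)
    title = tags.get("title") or stem
    track = str(tags.get("track", "")).split("/")[0]  # "3/12" -> "3"
    name = f"{int(track):02d} - {title}" if track.isdigit() else title
    return os.path.join(root, _clean(artist), _clean(album), _clean(name) + ext.lower())


def organize(path: str, root: str, tags: Mapping[str, str]) -> str:
    """Move one file to its tag-derived location and return the new path."""
    dest = target_path(root, tags, os.path.basename(path))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if os.path.exists(dest):
        raise FileExistsError(dest)  # never overwrite a file that is already in place
    shutil.move(path, dest)
    return dest
```

Raising on an existing destination is deliberate: with 10,000 files, a silent overwrite is much harder to notice than an error.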

The second (major) task ahead of me was conversion of all of my files into the near-universal mp3 format (yes, I know I should be gunning for ogg, but you can't use those files on your iPod... well, without the free firmware you can load onto it... another day, another day). For m4a files, I used the script found at the end of this thread which happily copied my ID3 tags as well. For ogg files, I would need something more general. At first glance, SoX seemed like a reasonable tool, but it needs to be recompiled for mp3 support. I also found some other scripts, but those didn't copy the ID3 tags, which made them unacceptable for my purposes. I asked in the freenode #ubuntu channel regarding my quandary, and someone mentioned Soundconverter as an option.
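For the record, here is roughly what an ogg-to-mp3 conversion that keeps the tags looks like if you script it yourself. This is a hypothetical sketch, not how Soundconverter works internally: it assumes the vorbis-tools programs (`vorbiscomment`, `oggdec`) and the `lame` encoder are installed, reads the Vorbis comments, decodes to a temporary WAV, and hands the tags to lame's `--tt`/`--ta`/`--tl`/`--ty`/`--tn` options. Genre and comments are left out to keep it short.

```python
import os
import subprocess
import tempfile
from typing import Dict, List

# Vorbis comment field -> lame ID3 tagging option
VORBIS_TO_LAME = {
    "TITLE": "--tt",
    "ARTIST": "--ta",
    "ALBUM": "--tl",
    "DATE": "--ty",
    "TRACKNUMBER": "--tn",
}


def parse_vorbiscomment(output: str) -> Dict[str, str]:
    """Parse `vorbiscomment -l` output (one FIELD=value per line) into a dict."""
    tags: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        # field names are case-insensitive; keep the first value of a repeated field
        if sep and key.upper() not in tags:
            tags[key.upper()] = value
    return tags


def lame_tag_args(tags: Dict[str, str]) -> List[str]:
    """Translate Vorbis comments into lame's tag options."""
    args: List[str] = []
    for field, option in VORBIS_TO_LAME.items():
        value = tags.get(field, "")
        if field == "DATE":
            value = value[:4]  # "1992-09-22" -> "1992"; keep just the year
        if value:
            args += [option, value]
    return args


def ogg_to_mp3(src: str, dst: str) -> None:
    """Decode src to a temporary WAV with oggdec, then encode it with lame plus tags."""
    listing = subprocess.run(["vorbiscomment", "-l", src],
                             capture_output=True, text=True, check=True).stdout
    tags = parse_vorbiscomment(listing)
    fd, wav = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(["oggdec", "-Q", "-o", wav, src], check=True)
        subprocess.run(["lame", "--quiet", *lame_tag_args(tags), wav, dst], check=True)
    finally:
        os.remove(wav)
```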

Soundconverter is a simple tool, but it gets the job done. I installed it with a simple sudo aptitude install soundconverter. It provides a nice drag-and-drop interface that doesn't require knowledge of ID3 tag layouts or encoding/decoding details. Most of the options are located in the Edit->Preferences menu; the rest of the program is idiot-proof. Just drag the files in and hit Convert. It does do some annoying things, like saving files under URL-encoded names (example: files in the folder "Cannibal Corpse", converted with the "save in the same folder" option turned on, were saved in a newly created folder, "Cannibal%20Corpse"), but generally it works well.
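If you'd rather fix those names after the fact, Python's `urllib.parse.unquote` decodes the %XX escapes. A minimal sketch, assuming you point it at the folder Soundconverter wrote to (`unescape_names` is a hypothetical helper): it renames bottom-up so folder renames don't break paths still to be visited, merges a "Cannibal%20Corpse" into an existing "Cannibal Corpse", and never overwrites a file that already exists.

```python
import os
from typing import List, Tuple
from urllib.parse import unquote


def unescape_names(root: str) -> List[Tuple[str, str]]:
    """Undo %XX escapes (e.g. "Cannibal%20Corpse") in names under root.

    Returns the (old, new) pairs that were renamed or fully merged.
    """
    changes: List[Tuple[str, str]] = []
    # bottom-up, so renaming a folder never invalidates a path still to be visited
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = unquote(name)
            if fixed == name or "/" in fixed:  # nothing to do, or "%2F" would add a level
                continue
            old, new = os.path.join(dirpath, name), os.path.join(dirpath, fixed)
            if not os.path.exists(new):
                os.rename(old, new)
                changes.append((old, new))
            elif os.path.isdir(old) and os.path.isdir(new):
                # the unescaped folder already exists: move children across, skipping clashes
                for child in os.listdir(old):
                    if not os.path.exists(os.path.join(new, child)):
                        os.rename(os.path.join(old, child), os.path.join(new, child))
                if not os.listdir(old):
                    os.rmdir(old)
                    changes.append((old, new))
    return changes
```

Anything that would clash is left in the escaped folder, so you can compare the two copies by hand.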

Things were a bit more complicated than this, of course. I was making nightly backups of my changes to a local FTP server, wrote some Python scripts to help me keep track of where everything was going, and manually edited some outlying tags in Amarok -- but now my music library is (almost) officially organized. Success! Now to just incorporate all of my friends' libraries...
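The scripts themselves aren't in this post, but as a hypothetical sketch of the same idea, "keeping track of where everything was going" can be as simple as a snapshot (manifest) of every file's size and modification time, diffed against the previous snapshot, e.g. to see what changed since the last nightly backup. The helper names and the JSON file format are assumptions for illustration.

```python
import json
import os
from typing import Dict, List, Tuple

Manifest = Dict[str, List[int]]  # relative path -> [size in bytes, mtime in whole seconds]


def build_manifest(root: str) -> Manifest:
    """Record the size and modification time of every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = [st.st_size, int(st.st_mtime)]
    return manifest


def diff_manifests(old: Manifest, new: Manifest) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed


def save_manifest(manifest: Manifest, path: str) -> None:
    """Write a snapshot to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=1, sort_keys=True)


def load_manifest(path: str) -> Manifest:
    """Read a snapshot written by save_manifest."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Modification time is included alongside size because a retag can leave the file size unchanged (ID3v2 tags often have padding to absorb small edits).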

Some cool links I found along the way
Some relevant command line tricks
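  • find . -type f ! -iregex '.*\.\(flac\|ogg\|m4a\|mp3\)' -delete -or -empty -delete (deletes every regular file that isn't FLAC, Ogg, M4A or MP3, then any empty files and directories left behind; there's no undo, so double-check which directory you're in)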
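Since that command deletes for real, it's worth previewing first. This hypothetical Python sketch (`cleanup_plan` is my name for it, not a real tool) follows the one-liner's intent: it walks the tree bottom-up and lists what would be removed, including directories that only become empty after their files go, without touching anything.

```python
import os
import re
from typing import List

# The extensions the find command keeps, matched case-insensitively like -iregex
KEEP = re.compile(r".*\.(flac|ogg|m4a|mp3)", re.IGNORECASE)


def cleanup_plan(root: str) -> List[str]:
    """List what the find one-liner would delete, without deleting anything.

    Regular files that aren't flac/ogg/m4a/mp3 go, as do empty files, and
    (because -delete works depth-first) any directory left empty afterwards.
    Symlinks and other non-regular files are left alone, as with -type f/-empty.
    """
    doomed: List[str] = []
    survivors = set()  # directories that will still contain something
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        keeps_something = False
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                keeps_something = True
            elif not KEEP.fullmatch(name) or os.path.getsize(path) == 0:
                doomed.append(path)
            else:
                keeps_something = True
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or path in survivors:
                keeps_something = True
        if keeps_something:
            survivors.add(dirpath)
        elif dirpath != root:  # leave the starting directory itself out of the plan
            doomed.append(dirpath)
    return doomed
```

Usage: `for path in cleanup_plan("."): print(path)`, then run the real find command once the list looks right.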

1 comment:

Matt said...

Amarok handles large libraries very well. It can organize your library based on your criteria (album artist / artist / genre / whatever you want), is already set up to tag with MusicBrainz, has Last.fm suggestions built in, and is integrated with Wikipedia for artist information. What more could you want? A script for FTP support, that's what... still working on that one. Cheers!