Bulk CD Ripping — Part Two: FLAC images to MP3 files

Part one of this series showed how I ripped a large pile of CDs into FLAC image files. The FLAC image files are playable, but most of our hardware wants MP3 files and I’d prefer one file per track instead of one file per album. The FLAC image files also have crummy tags from CDDB so there will be a lot of typos in artist names and track titles.

A friend and I started tackling this problem a few years ago and wrote a pair of Python scripts called tag.py and transcode.py that converted FLAC images into MP3s. When we wrote these scripts FLAC images were a fairly new concept, and we came up with our own way of tagging them that no one else used. Tag.py read an EAC-generated CUE sheet (including CDDB tags) and entered the tags into the FLAC image. We could (painfully) hand edit the tags in the FLAC image after this step. Transcode.py read the FLAC image, including the tags embedded by tag.py, and generated MP3s (or any other format that we wanted, including per-track FLAC files).

The process worked, but it was clumsy. We also still relied on CDDB tags for our music, followed by a lot of hand editing to fix them.

Since we wrote those scripts, a CDDB replacement called MusicBrainz has matured. MusicBrainz is a much stricter tag database where all entries are reviewed and there are tight relationships between artists, releases (albums) and tracks. This tight relationship means that each artist only has one name, so you won’t have problems with one CD being tagged “The Beatles” while another is tagged “Beatles” and a third is tagged “Beatles, The”.

We talked about our old scripts and came up with a better system:

  • tag.py would find the CD in the MusicBrainz database and get the MusicBrainz ID for it. This is a unique ID that identifies that album.
  • tag.py would embed the MusicBrainzID into the FLAC image.
  • transcode.py would read the MusicBrainzID, get the track metadata from MusicBrainz, and encode our MP3s.

Once you’ve run tag.py, you can regenerate MP3s (or any other type of music file) just by rerunning transcode.py. This future-proofs our music.
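
As a rough sketch (not the actual tag.py code), storing the ID in a FLAC image and reading it back could go through metaflac like this. DISC_MUSICBRAINZ_ID is the tag name tag.py uses; the function names are made up for illustration, and the sketch assumes Python 3 with metaflac on the PATH.

```python
import subprocess

TAG_NAME = "DISC_MUSICBRAINZ_ID"  # the tag tag.py writes into the FLAC image


def set_id_command(flac_path, mb_id, metaflac="metaflac"):
    """Build the metaflac command that replaces any old ID with mb_id."""
    return [
        metaflac,
        "--remove-tag=" + TAG_NAME,               # avoid leaving a duplicate tag
        "--set-tag=%s=%s" % (TAG_NAME, mb_id),
        flac_path,
    ]


def show_id_command(flac_path, metaflac="metaflac"):
    """Build the metaflac command that prints the stored ID, if any."""
    return [metaflac, "--show-tag=" + TAG_NAME, flac_path]


def parse_show_tag(output):
    """metaflac prints NAME=value lines; return the value, or None if absent."""
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.upper() == TAG_NAME:
            return value.strip()
    return None


def read_id(flac_path, metaflac="metaflac"):
    """Run metaflac and return the stored ID (None when the tag is missing)."""
    result = subprocess.run(show_id_command(flac_path, metaflac),
                            capture_output=True, text=True, check=True)
    return parse_show_tag(result.stdout)
```

Because the ID lives inside the FLAC image itself, anything that can call read_id can look up fresh metadata later without needing the original tagging session.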

I rewrote tag.py to have a small GUI. When opening a FLAC file, tag.py first computes the DiscId (a mostly-unique identifier used to find the disc in the MusicBrainz database) and sends this to MusicBrainz. If the DiscId isn’t found, it searches based on the artist and title that we already have in the FLAC image from CDDB. The GUI lets you see each of these matches and pick the right one. If you don’t find the right one, you can search on your own, get the MusicBrainz ID, and paste that into the GUI. Once you’ve found the right tags for this album you hit save and it writes the ID out to the FLAC.
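
The lookup order described above (DiscId first, then artist and title) can be sketched in a few lines. This is only an illustration of the fallback logic, not the code in tag.py: lookup_by_disc_id and search_by_name are hypothetical callables standing in for the real MusicBrainz queries.

```python
def find_candidates(disc_id, artist, title, lookup_by_disc_id, search_by_name):
    """Return (source, matches).

    Try the exact DiscId lookup first; only when it finds nothing fall
    back to a text search on the CDDB artist and title.
    """
    matches = list(lookup_by_disc_id(disc_id)) if disc_id else []
    if matches:
        return "discid", matches
    return "search", list(search_by_name(artist, title))


# Example with stand-in query functions (no network access involved).
def no_disc_match(disc_id):
    return []


def fake_search(artist, title):
    return ["%s / %s" % (artist, title)]


source, hits = find_candidates("some-disc-id", "Artist", "Album",
                               no_disc_match, fake_search)
print(source, hits)
```

Passing the query functions in as arguments keeps the fallback logic easy to try out without talking to MusicBrainz.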

Note that my day job is writing server software, not GUIs, and it shows in the ugliness of this tool. Also, even though this has a few buttons along the bottom, it is really designed to be keyboard driven. A lot of output is written to the console that you started the tool from. Consider it a half-GUI, half-command-line tool.

You start the tagger by running tag.py and passing it some filename globs. For instance “tag.py *.flac” will have it work on all of the FLAC files in the current directory.
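
The Windows command prompt does not expand wildcards itself, so a script started this way receives the literal pattern and has to expand it. A minimal sketch of that step, assuming Python 3 (expand_globs is a name I made up, not a function from tag.py):

```python
import glob


def expand_globs(patterns):
    """Expand each glob pattern into file names.

    Patterns are handled in the order given, matches within a pattern are
    sorted, and a file matched by several patterns is listed only once.
    """
    seen = set()
    files = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files


# Typical use from a script: expand_globs(sys.argv[1:])
```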

The screen is divided into three sections. The top section shows you the important data from the FLAC image (filename, CDDB artist/title, DiscId, embedded MusicBrainz ID and number of tracks). The fields used for a MusicBrainz search will show up in red. The second section shows you the results for the current MusicBrainz query. You can enter a new MusicBrainz ID if you find something better on their search page. The last section has buttons that map to some of the key presses and a status bar that tells you what the tool is doing.

Here are the keys that matter:

  • Control-N and Control-P — Go to the next and previous FLAC file
  • Alt-N and Alt-P — Go to the next and previous hit from a MusicBrainz search
  • Alt-S — Save the current MusicBrainz ID to the current FLAC
  • Alt-L — Reload MusicBrainz matches (this searches by MusicBrainz ID, Disc ID, and CDDB artist and title)
  • Control-Shift-N — Find the first FLAC without a MusicBrainzID
  • Alt-Q — quit
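
To make the file navigation keys concrete, here is a small toolkit-free sketch. It is not how tag.py is written: the Navigator class, the KEYMAP table, and the choice to stop at the ends of the list (rather than wrap around) are all assumptions for illustration.

```python
class Navigator:
    """Tracks the current FLAC file; has_id is a callable path -> bool."""

    def __init__(self, files, has_id):
        self.files = list(files)
        self.has_id = has_id
        self.index = 0

    @property
    def current(self):
        return self.files[self.index] if self.files else None

    def next_file(self):
        if self.index < len(self.files) - 1:
            self.index += 1

    def prev_file(self):
        if self.index > 0:
            self.index -= 1

    def first_untagged(self):
        """Jump to the first file that has no MusicBrainz ID yet."""
        for i, path in enumerate(self.files):
            if not self.has_id(path):
                self.index = i
                break


# Key names as written in the list above, mapped to actions.
KEYMAP = {
    "Control-N": Navigator.next_file,
    "Control-P": Navigator.prev_file,
    "Control-Shift-N": Navigator.first_untagged,
}


def handle_key(nav, key):
    """Run the action bound to key (if any) and return the current file."""
    action = KEYMAP.get(key)
    if action is not None:
        action(nav)
    return nav.current
```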

With this tool I can find the right tags for our FLAC images from MusicBrainz at a rate of about 100 CDs every 20 minutes. This includes entering releases into MusicBrainz for CDs that we own, but which they don’t have in their database.

Once tag.py has been run on a bunch of FLAC files, you just run transcode.py and walk away. This will transcode each of your FLAC images into separate MP3 files and put them in the right directory. The exact method for doing this is controlled by the file transcode.cfg. Here is my version of the file:

[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F

This tells transcode.py that we are going to use one encoder and that it will make MP3 files. The files will go under the path given by Directory (f:/music-rerip/mp3/New in my config) and then be sorted by performer and release title. If the album is a compilation (tracks coming from multiple artists), then CommandVA, the second command listed, is used to encode it; otherwise Command is. If you want two different qualities of MP3 (high for home use, crappy for portable device use), you can just make another config that looks like this:

[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3,mp3crappy
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
[mp3crappy]
Directory: f:/music-rerip/mp3crappy/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
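
A config in this shape can be read with Python’s standard configparser module. The sketch below is one way to do it in Python 3 and is not taken from transcode.py; load_encoders is a made-up name. The one detail worth noting is interpolation=None, since the values are full of % placeholders that configparser would otherwise try to interpret.

```python
import configparser


def load_encoders(text):
    """Parse transcode.cfg-style text; return {encoder name: its settings}."""
    # interpolation=None: leave %t, %P, ... alone instead of treating them
    # as configparser's own %(name)s interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names as written (FilenameVA, ...)
    parser.read_string(text)

    raw = parser.get("GeneralConfig", "Encoders")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Each encoder listed under Encoders has a section of the same name.
    return {name: dict(parser[name]) for name in names}


if __name__ == "__main__":
    with open("transcode.cfg") as handle:
        for name, settings in load_encoders(handle.read()).items():
            print(name, "->", settings["Directory"])
```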

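Here is a sketch of how the placeholders in those Command lines might be filled in. The meanings are inferred from the lame options they follow (%t track title, %P album performer, %p track performer, %T release title, %Y year, %n track number); treating %f and %F as the input and output files, and the $ form as a filesystem-safe variant of the same value, are my assumptions for this example. The function names are hypothetical and this is not the code in transcode.py.

```python
import re
import shlex


def expand(template, values):
    """Replace %x and $x with values[x]; unknown placeholders are left alone."""
    def substitute(match):
        prefix, key = match.group(1), match.group(2)
        if key not in values:
            return match.group(0)
        text = str(values[key])
        if prefix == "$":
            # Assumed meaning of $: make the value safe for a file name.
            text = re.sub(r'[\\/:*?"<>|]', "_", text)
        return text

    return re.sub(r"([%$])([A-Za-z])", substitute, template)


def build_argv(command_template, values):
    """Split the template first, then expand each argument.

    Splitting before substituting means a title with spaces stays a single
    argument and never needs shell quoting.
    """
    return [expand(token, values) for token in shlex.split(command_template)]


values = {"t": "Some Title", "P": "Some Artist", "n": "03",
          "f": "track03.wav", "F": "03-Some Title.mp3"}
print(build_argv("lame --tt %t --ta %P --tn %n %f %F", values))
```

The resulting list can be handed straight to subprocess.run without going through a shell.
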
All of these scripts are at http://www.phred.org/~alex/transcode. Note that they are likely to change a lot in the next couple of weeks. Here is what you’ll find there today:

  • MusicBrainzHelper.py — A helper class for Python to make it easier to work with MusicBrainz
  • FlacHelper.py — A helper class for Python to make it easier to work with FLAC files
  • tag.py — The GUI tool described above for adding the DISC_MUSICBRAINZ_ID tag to FLAC images
  • transcode.py — The command line tool to convert FLAC images to MP3 or other files
  • cache-tags.py — This will cache the tags for each FLAC image if you want to run transcode.py while disconnected from the internet.
  • TODO — Known bugs

alex

7 Comments

  1. Rob says:

    What a difference a few months makes… MusicBrainz has matured and has been incorporated into several apps as has AccurateRip, and Replay Gain.

    If you were all to do it again today from scratch (read: if you were me), what would you do differently?

  2. AlexWetmore says:

    I haven’t looked at any new software released in the last 6 months. As far as I know no one has released an app that does EAC-quality ripping while controlling a CD/DVD changer to rip in bulk.

    MusicBrainz support was pretty widespread when I wrote this stuff, but I didn’t find anything that worked on FLAC CD images instead of on a track-by-track basis.

    Based on what I know today I’d have written the same thing.

  3. Smiley says:

    Alex … This has been wonderful software. Most of my CDs tagged without any problem. Unfortunately, my soundtracks (Pulp Fiction, Blade, etc.) do not work properly in your tagger. I am also unable to manually enter a MusicBrainz ID. When the tagger gets to these titles I get a “list index out of range” error. How do I either add these CDs to MusicBrainz or force save a manually entered MusicBrainz URL? I have done this for non-soundtracks with success but it does not seem to work for soundtracks. Thanks.

  4. Jasper says:

    Error Message:

    I get the following error message when running your really helpful tagging software. Any Ideas ?

    M:\FLAC-images>tag.py
    Various – Pump Up the Volume.wav.flac
    Traceback (most recent call last):
      File "C:\Python25\tag.py", line 435, in <module>
        app = Application(["*.flac"])
      File "C:\Python25\tag.py", line 41, in __init__
        self.loadFlac()
      File "C:\Python25\tag.py", line 178, in loadFlac
        [artist, title] = self.flac.getArtistAndTitle()
      File "C:\Python25\FlacHelper.py", line 91, in getArtistAndTitle
        artist = artist[0].partition("=")[2]
    IndexError: list index out of range

  5. AlexWetmore says:

    What happens when you run “metaflac --show-tag=DISC_PERFORMER” on that FLAC?

    You can set the MusicBrainz ID in the FLAC from the command line by running:
    “metaflac --set-tag=DISC_MUSICBRAINZ_ID=” with the MusicBrainz URL appended directly after the final equals sign. Look at the function setMusicBrainzId in FlacHelper.py.

  6. Rob says:

    Yay! I got it working finally. Big problem I had was that EAC 0.99pb1 doesn’t work with REACT2 (filenames for @cuesheet@ and @eaclog@ don’t resolve right). Turns out that Synthetic Sound has a REACT2 mod that also happens to fix this.

    Now onto replay gain: does the stock transcode.cfg apply replay gain? I see lame print replaygain info during encoding, but I don’t see the replaygain tag in my tag viewer. If not, how do you suggest modding this to use the replay gain from the FLAC encoding (since it’s already been computed)?

  7. this looks great. You should check out a similar tool that my friend wrote which deals with the same problem. Also, I am a huge fan of your website and as a tinker-er and hobby framebuilder have been appreciating it for years. Thanks.