Bulk CD Ripping — Part Two: FLAC images to MP3 files
Part one of this series showed how I ripped a large pile of CDs into FLAC image files. The FLAC image files are playable, but most of our hardware wants MP3 files and I’d prefer one file per track instead of one file per album. The FLAC image files also have crummy tags from CDDB so there will be a lot of typos in artist names and track titles.
A friend and I started tackling this problem a few years ago and wrote a pair of Python scripts called tag.py and transcode.py that converted FLAC images into MP3s. When we wrote these scripts FLAC images were a fairly new concept, and we came up with our own way of tagging them that no one else used. The tag.py script read an EAC-generated cue sheet (including its CDDB tags) and wrote those tags into the FLAC image. We could then (painfully) hand edit the tags in the FLAC image. The transcode.py script read the FLAC image, including the tags embedded by tag.py, and generated MP3s (or any other format that we wanted, including per-track FLAC files).
The process worked, but it was clumsy. We also still relied on CDDB tags for our music and then a lot of hand editing to fix them.
Since we wrote those scripts, a CDDB replacement called MusicBrainz has matured. MusicBrainz is a much stricter tag database where all entries are reviewed and there are tight relationships between artists, releases (albums) and tracks. This tight relationship means that each artist only has one name, so you won’t have problems with one CD being tagged “The Beatles” while another is tagged “Beatles” and a third is tagged “Beatles, The”.
We talked about our old scripts and came up with a better system:
- tag.py would find the CD in the MusicBrainz database and get the MusicBrainz ID for it. This is a unique ID that identifies that album.
- tag.py would embed the MusicBrainz ID into the FLAC image.
- transcode.py would read the MusicBrainz ID, get the track metadata from MusicBrainz, and encode our MP3s.
Once you’ve run tag.py you can regenerate MP3s (or any other type of music file) just by rerunning transcode.py. This future proofs our music.
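The embed-then-read round trip can be sketched with metaflac's tag options. This is only an illustration, not the code in tag.py or FlacHelper.py: it assumes the ID is stored in a Vorbis comment named DISC_MUSICBRAINZ_ID (the tag name given in the file list at the end of this post) and that metaflac is on the PATH. The function names are made up for this example.

```python
import subprocess

TAG_NAME = "DISC_MUSICBRAINZ_ID"  # tag name from the tag.py description


def set_tag_commands(flac_path, release_id, metaflac="metaflac"):
    """Build the metaflac calls that replace any old ID with a new one."""
    return [
        [metaflac, f"--remove-tag={TAG_NAME}", flac_path],
        [metaflac, f"--set-tag={TAG_NAME}={release_id}", flac_path],
    ]


def show_tag_command(flac_path, metaflac="metaflac"):
    """Build the metaflac call that prints the stored ID."""
    return [metaflac, f"--show-tag={TAG_NAME}", flac_path]


def parse_show_tag(output):
    """metaflac prints NAME=value lines; return the value or None."""
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.upper() == TAG_NAME:
            return value.strip()
    return None


def read_release_id(flac_path):
    """Run metaflac and return the embedded ID (None if it is not set)."""
    result = subprocess.run(
        show_tag_command(flac_path), capture_output=True, text=True, check=True
    )
    return parse_show_tag(result.stdout)
```

Because the only thing stored in the image is the ID, everything else (artist, titles, track names) can be fetched fresh each time the files are regenerated.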
I rewrote tag.py to have a small GUI. When opening a FLAC file tag.py first computes the DiscId (a mostly-unique identifier used to find the disc in the MusicBrainz database) and sends this to MusicBrainz. If the DiscId isn’t found then it searches based on the artist and title that we already have in the FLAC image from CDDB. The GUI lets you see each of these matches and pick the right one. If you don’t find the right one then you can search on your own, get the MusicBrainz ID, and paste it into the GUI. Once you’ve found the right tags for this album you hit save and it writes the ID out to the FLAC.
Note that my day job is writing server software, not GUIs, and it shows in the ugliness of this tool. Also, even though this has a few buttons along the bottom, it is really designed to be keyboard driven. A lot of output is written to the console that you started the tool from. Consider it a half-GUI/half command line tool.
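For the curious, here is roughly how a DiscId is derived from a disc's table of contents, following the calculation MusicBrainz publishes as I understand it. It is a sketch, not the code in tag.py: the function name is mine, and it assumes you already have the track offsets (in CD frames, including the usual 150-frame lead-in) from the cue sheet stored in the FLAC image.

```python
import base64
import hashlib


def musicbrainz_disc_id(first_track, last_track, lead_out, track_offsets):
    """Compute a MusicBrainz DiscId from a table of contents.

    lead_out and track_offsets are frame offsets; track_offsets has one
    entry per track, in order from first_track to last_track.
    """
    if len(track_offsets) != last_track - first_track + 1:
        raise ValueError("need exactly one offset per track")
    # Slot 0 holds the lead-out; slots 1..99 hold track offsets (0 if unused).
    slots = [0] * 100
    slots[0] = lead_out
    for number, offset in zip(range(first_track, last_track + 1), track_offsets):
        slots[number] = offset
    # Everything is hashed as upper-case hexadecimal ASCII text.
    text = f"{first_track:02X}{last_track:02X}" + "".join(f"{s:08X}" for s in slots)
    digest = hashlib.sha1(text.encode("ascii")).digest()
    # MusicBrainz uses a URL-friendly base64 variant: + / = become . _ -
    encoded = base64.b64encode(digest).decode("ascii")
    return encoded.translate(str.maketrans("+/=", "._-"))
```

The result is always 28 characters long. Two different pressings with identical track offsets will produce the same DiscId, which is why it is only "mostly" unique and why the GUI shows every match.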
You start the tagger by running tag.py and passing it some filename globs. For instance “tag.py *.flac” will have it work on all of the FLAC files in the current directory.
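Since the Windows command prompt does not expand wildcards for you, a script that accepts globs has to expand them itself. A minimal sketch of that step, with a function name of my own choosing (this is not lifted from tag.py):

```python
import glob


def expand_patterns(patterns):
    """Expand filename globs, keeping order and dropping duplicates."""
    seen = set()
    files = []
    for pattern in patterns:
        # A pattern with no matches is kept as-is so that opening it
        # later produces a clear "file not found" error.
        matches = sorted(glob.glob(pattern)) or [pattern]
        for path in matches:
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files
```

Calling `expand_patterns(sys.argv[1:])` at startup would give the list of FLAC files that Control-N and Control-P step through.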
The screen is divided into three sections. The top section shows you the important data from the FLAC image (filename, CDDB artist/title, DiscId, embedded MusicBrainz ID and number of tracks). The fields used for a MusicBrainz search will show up in red. The second section shows you the results for the current MusicBrainz query. You can enter a new MusicBrainz ID here if you find something better on their search page. The last section has buttons that map to some of the key presses and a status bar that tells you what the tool is doing.
Here are the keys that matter:
- Control-N and Control-P — Go to the next and previous FLAC file
- Alt-N and Alt-P — Go to the next and previous hit from a MusicBrainz search
- Alt-S — Save the current MusicBrainz ID to the current FLAC
- Alt-L — Reload MusicBrainz matches (this searches by MusicBrainz ID, Disc ID, and CDDB artist and title)
- Control-Shift-N — Find the first FLAC without a MusicBrainz ID
- Alt-Q — quit
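The reload on Alt-L can be pictured as running each lookup that has data available and merging the hits. The sketch below is an assumption about how such a merge might work, not the code in tag.py; the three lookup callables are hypothetical stand-ins for whatever MusicBrainzHelper.py provides, and each is assumed to return a list of release IDs.

```python
def gather_matches(flac_info, lookups):
    """Run each applicable lookup in order and merge the results.

    flac_info: dict with optional keys 'musicbrainz_id', 'disc_id',
               'artist' and 'title'.
    lookups:   dict of hypothetical callables 'by_id', 'by_disc_id' and
               'by_name', each returning a list of release IDs.
    """
    results = []
    if flac_info.get("musicbrainz_id"):
        results += lookups["by_id"](flac_info["musicbrainz_id"])
    if flac_info.get("disc_id"):
        results += lookups["by_disc_id"](flac_info["disc_id"])
    if flac_info.get("artist") and flac_info.get("title"):
        results += lookups["by_name"](flac_info["artist"], flac_info["title"])
    # Keep the first occurrence of each release, so hits from the most
    # specific key (the embedded ID) are listed before fuzzier ones.
    return list(dict.fromkeys(results))
```

Alt-N and Alt-P would then simply move an index through the returned list.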
With this tool I can find the right tags for our FLAC images from MusicBrainz at a rate of about 100 CDs every 20 minutes. This includes entering releases into MusicBrainz for CDs that we own, but which they don’t have in their database.
Once tag.py has been run on a bunch of FLAC files you just run transcode.py and walk away. This will transcode each of your FLAC images into separate MP3 files and put them in the right directory. The exact method for doing this is controlled by the file transcode.cfg. Here is my version of the file:
[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
This tells transcode.py that we are going to use one encoder and that it will make MP3 files. The files will go into the directory named on the Directory line (f:/music-rerip/mp3/New here) and then be sorted into subdirectories by performer and release title. If the album is a compilation (tracks from multiple artists) then CommandVA is used to encode it; otherwise Command is. If you wanted two different qualities of MP3 (high for home use, crappy for portable device use) you could just make another config that looks like this:
[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3,mp3crappy
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
[mp3crappy]
Directory: f:/music-rerip/mp3crappy/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
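A file in this shape can be read with Python's standard configparser module. The sketch below is not the parsing code in transcode.py; it only shows one way to load the encoder sections and choose between Command and CommandVA. The sample text is a shortened version of the config above, and the function names are my own. The placeholders are left untouched, since their expansion belongs to transcode.py.

```python
import configparser

# Shortened sample in the same layout as transcode.cfg.
SAMPLE = """\
[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3,mp3crappy

[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Command: lame.exe --alt-preset standard --tt %t --ta %P %f %F
CommandVA: lame.exe --alt-preset standard --tt %t --ta %p %f %F

[mp3crappy]
Directory: f:/music-rerip/mp3crappy/New/$P/$T
Command: lame.exe --alt-preset 96 --tt %t --ta %P %f %F
CommandVA: lame.exe --alt-preset 96 --tt %t --ta %p %f %F
"""


def load_encoders(text):
    """Return {encoder name: settings} for every name listed in Encoders."""
    # interpolation=None keeps the % placeholders literal; the default
    # interpolation would try to interpret them and raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    names = [n.strip() for n in parser["GeneralConfig"]["Encoders"].split(",")]
    return {name: dict(parser[name]) for name in names}


def pick_command(encoder, is_compilation):
    """Choose the various-artists command for compilations."""
    # configparser lower-cases option names, hence 'commandva'.
    return encoder["commandva" if is_compilation else "command"]
```

With this layout, adding a third quality level is just a matter of adding another name to Encoders and another section with that name.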
All of these scripts are at http://www.phred.org/~alex/transcode. Note that they are likely to change a lot in the next couple of weeks. Here is what you’ll find there today:
- MusicBrainzHelper.py — A helper class for Python to make it easier to work with MusicBrainz.
- FlacHelper.py — A helper class for Python to make it easier to work with FLAC files.
- tag.py — The GUI tool shown above for adding the DISC_MUSICBRAINZ_ID tag to FLAC images.
- transcode.py — The command line tool to convert FLAC images to MP3 or other files.
- cache-tags.py — This will cache the tags for each FLAC image if you want to run transcode.py while disconnected from the internet.
- TODO — Known bugs.
alex