Archive for January, 2009

Running out of original titles on the subject of replay gain

31st January 2009

Now that Amarok can read replay gain tags from almost all the files that it can read metadata from, I feel replay gain support in Amarok is pretty much there.  And, yes, all this will be in Amarok 2.1.

A small caveat: the reading of replay gain tags from MP4 files only works if Amarok was built against libMP4v2.  Amarok uses its own (home-brewed?) code for reading MP4 tags when it can’t find libMP4v2 at build-time, but it doesn’t support freeform tags and I don’t see the point of implementing that support when libMP4v2 does the job.

Amarok 2.1 will support all the file formats that Amarok 1.4’s replay gain script did (and more), apart from Musepack (mpc).  Ironically, this is because of Musepack’s native support for replay gain – rather than abusing metadata tags to store the replay gain information, Musepack has a special field in the file header to specify the values.  However, TagLib doesn’t let us at this field, so we’re a bit stuck.

So the state of replay gain support in Amarok is that, for files in the main collection or elsewhere on your computer’s filesystem (so not streaming media and not portable music players), replay gain tags (both album and track mode) will be read from the following formats:

  • MP3 (as written by mp3gain/aacgain, Foobar2000 and Mutagen/Quod Libet – yes, there are three different ways of writing the tags to MP3 files)
  • OGG Vorbis (as written by vorbisgain and anything compatible)
  • OGG FLAC (assuming the tags are stored in the same way as in OGG Vorbis)
  • OGG Speex (assuming the tags are stored in the same way as in OGG Vorbis)
  • FLAC (as written by flac [with its --replay-gain switch] and anything compatible)
  • WMA/ASF (as written by the Amarok 1.4 replay gain script)
  • MP4 (as written by aacgain and anything compatible, providing Amarok was built with libMP4v2)
  • WavPack (assuming the tags are stored in a similar way to how mp3gain stores them in MP3 files)
  • TrueAudio (assuming the tags are stored in a similar way to how Foobar2000 or QuodLibet/Mutagen store the tags in MP3 files)

In addition, Amarok will use the track mode tags (or the album mode ones if there aren’t any track mode tags) to adjust the gain during playback.  I haven’t added an option to switch to album mode (or even disable it, if you really want to) yet, but that will come.  I just have to figure out where it should go in the user interface…
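
The fallback between the two modes can be sketched in a few lines of Python.  This is only an illustration, not Amarok's actual code: the function names, the album_mode flag and the use of None for a missing tag are all invented here.  The conversion from decibels to a linear factor is the usual 10^(dB/20).

```python
def choose_gain(track_gain_db, album_gain_db, album_mode=False):
    """Pick the gain (in dB) to apply; None stands for a missing tag."""
    if album_mode:
        preferred, fallback = album_gain_db, track_gain_db
    else:
        preferred, fallback = track_gain_db, album_gain_db
    if preferred is not None:
        return preferred
    # May still be None, in which case no adjustment is possible.
    return fallback


def gain_to_factor(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    if gain_db is None:
        return 1.0  # no tags at all: leave the volume alone
    return 10.0 ** (gain_db / 20.0)
```

With this sketch, a file carrying only an album gain of -3 dB still gets adjusted in track mode, and a file with no tags plays back unchanged.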

Of course, this is only half the functionality provided by the Amarok 1.4 replay gain script.  That also parsed and tagged files in the playlist.  This could be implemented nicely as a script (I certainly don’t believe it belongs in the core of Amarok).  But that’s for another day, and probably another person.


Replay This

18th January 2009

So, after some trial-and-error, I’ve cracked the RVA2 frame format as written by Quod Libet / Mutagen.

  • All values are stored big-endian (most significant byte first).  This tripped me up, because this is not true for all data stored in all tag formats.
  • If we think of the peak value in terms of the “units, tens, hundreds” format you learned at school (only, of course, in binary this is “units, twos, fours”), the first bit is the units, the second is the halves, the third quarters and so on.  Simple, yes?  However, it’s not so easy to infer from the code that parses it.  So I’ve put plenty of comments in the amarokcollectionscanner code.
  • Everything else you could wish to know about the format is in the ID3v2.4 frames list, section 4.11.
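
Put together, the points above suggest a parser along the following lines.  This is a rough Python sketch, not the amarokcollectionscanner code: parse_rva2 is a made-up name, the field layout follows section 4.11 of the ID3v2.4 frames list (identification string, then channel type, a signed 16-bit gain in units of 1/512 dB, a bit count and the peak), and it assumes the peak is left-aligned in its bytes with any padding bits set to zero.

```python
import struct


def parse_rva2(body):
    """Parse an RVA2 frame body (frame header already stripped).

    Returns (identification, [(channel_type, gain_db, peak), ...]).
    """
    ident, sep, rest = body.partition(b"\x00")
    if not sep:
        raise ValueError("identification string is not terminated")
    channels = []
    pos = 0
    while pos < len(rest):
        if pos + 4 > len(rest):
            raise ValueError("truncated channel entry")
        # 1 byte channel type, signed 16-bit gain, 1 byte peak bit count,
        # all big-endian.
        channel_type, raw_gain, peak_bits = struct.unpack_from(">BhB", rest, pos)
        pos += 4
        n_bytes = (peak_bits + 7) // 8
        if pos + n_bytes > len(rest):
            raise ValueError("truncated peak value")
        raw_peak = int.from_bytes(rest[pos:pos + n_bytes], "big")
        pos += n_bytes
        gain_db = raw_gain / 512.0
        # Fixed point: the first bit is the units, the next the halves, ...
        peak = raw_peak / (1 << (8 * n_bytes - 1)) if n_bytes else None
        channels.append((channel_type, gain_db, peak))
    return ident.decode("latin-1"), channels
```

For a 16-bit peak this amounts to dividing the stored integer by 2^15, so the bytes 0x80 0x00 decode to exactly 1.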

So Amarok’s collection now has complete support for replay gain tags in Ogg Vorbis (as written by vorbisgain), FLAC (as written by flacenc) and MP3 (as written by Foobar2000 [ID3v2.3 TXXX frames], Quod Libet / Mutagen [ID3v2.4 RVA2] or mp3gain [APEv2]).

Other formats will follow.  As will replay gain support for files that aren’t in the collection, but are on the local machine.

Reverse Engineering Datatypes

17th January 2009

Amarok now reads the tags written by Foobar2000 and by mp3gain (written when you call mp3gain without the -a or -r options) from MP3 files.  However, the final part of MP3 support is trickier: the RVA2 tag in the ID3v2.4 spec.

Naturally, the specification leaves out an all-important detail: the format of the peak volume field.  It tells you which bits represent the peak volume, but not how to interpret them.

Luckily, mutagen, a Python audio metadata library, supports this tag, so its implementation can serve as a reference.  However, they try to be clever with their implementation, so reverse-engineering it to arrive at the format of the original data requires some work.

The documentation on the Python class implementing the RVA2 frame support says that the peak volume is a float between 0 and 1.  So 0 is silent, 1 is full volume (digital full scale).  This doesn’t seem right to me, because the replay gain specification points out that it is possible to have a peak volume over 1 in some circumstances in a compressed audio file.  But we’ll leave that aside for the moment.

Let’s start with the code.  data contains the raw bytes, the first of which is a number specifying how many bits (not bytes) of the remaining data are occupied by the number representing the peak volume.

        peak = 0
        bits = ord(data[0])
        bytes = min(4, (bits + 7) >> 3)
        # not enough frame data
        if bytes + 1 > len(data): raise ID3JunkFrameError
        shift = ((8 - (bits & 7)) & 7) + (4 - bytes) * 8
        for i in range(1, bytes+1):
            peak *= 256
            peak += ord(data[i])
        peak *= 2**shift
        return (float(peak) / (2**31-1))

Let’s start with bytes.  This is simply bits (the number of bits representing the peak volume) rounded up to the nearest multiple of 8, then divided by 8 (and capped at 4 by the min).  So if bits is 8n + k with 0 ≤ k < 8, bytes is n in the case that k = 0 and (n+1) in the case that k > 0.

The next variable is the shift.  This is the first bit of clever magic, and it takes some time spent staring at it (preferably with a pad and paper to hand) to arrive at the following conclusion:

  • if k = 0, shift is 8(4 – n)
  • if k > 0, shift is (8 – k) + 8(4 – (n + 1))

Then we read the bytes into peak.  Remember that if k > 0, the last (8 – k) bits will be junk.  Now we shift it left by multiplying by 2^shift (shift is always at least 0, because bytes is constrained to be at most 4), which, at least when k = 0, pads the value out to 32 bits (I assume here that Python is treating peak as an integer).  Then we turn peak into a float and divide it by (2^31) – 1.  This constant is a magic number, being the largest value that can be stored in a signed 32-bit integer.

Something that might shed light on this is that, when it writes the peak volume out, it simply writes the value multiplied by 2^15 as a 16-bit unsigned integer.  This would make interpreting the value as simple as placing a “decimal” point after the first binary digit (so we get 1 digit before the point and 15 after).  Note that this does indeed allow a peak volume greater than 1 (but less than 2).
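
To see how the two halves fit together, here is a Python 3 port of the reader quoted above next to a writer that follows the description in the previous paragraph.  The writer is a sketch based on that description only, not mutagen's actual code, and both function names are invented.

```python
def read_peak(data):
    """Python 3 port of the quoted reader; data is a bytes object."""
    bits = data[0]
    n_bytes = min(4, (bits + 7) >> 3)
    if n_bytes + 1 > len(data):
        raise ValueError("not enough frame data")
    shift = ((8 - (bits & 7)) & 7) + (4 - n_bytes) * 8
    peak = 0
    for i in range(1, n_bytes + 1):
        peak = peak * 256 + data[i]
    peak *= 2 ** shift
    return peak / (2 ** 31 - 1)


def write_peak(peak):
    """Store peak * 2^15 in 16 unsigned bits, preceded by the bit count.

    Only valid for 0 <= peak < 2; larger values do not fit in 16 bits.
    """
    raw = int(peak * 32768)
    return bytes([16]) + raw.to_bytes(2, "big")


for value in (0.25, 0.5, 1.0):
    print(value, read_peak(write_peak(value)))
```

Each value comes back very slightly too large, by a factor of 2^31 / (2^31 – 1), which is exactly the oddity behind the first question below.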

I’m left with two questions:

  1. Why do we divide the number by MAX_INT_32, rather than simply 2^31? (I just made up that constant name now, don’t complain that it’s wrong.)
  2. Why does mutagen put a 32-bit minimum on the number, and then write a 16-bit number when it writes out RVA2 tags?

Answers on a postcard (or just in the comments).

Gaining Gain for Personal Gain

15th January 2009

SVN commit 911684 by alexmerry:

ReplayGain FTW!

Make replay gain support actually do something by
(a) getting the data we stored out of the collection database
(b) using it when the track changes

Also, improve the storage of replay gain tags by storing NULL when they weren’t present on the original track metadata. This allows us to substitute the track gain for the album gain when the latter is requested but doesn’t exist.

This closes the most popular feature request on bugzilla.

Of course, there’s still work to be done.

  • Currently, it’s fixed in track mode.
  • It only works on files in the collection – it should be possible to extend it to other files on the local computer at least.
  • Finally, it only works on Ogg Vorbis and FLAC files (mp3gain and aacgain modify MP3/MP4 files in such a way that we don’t need to do anything special in Amarok for them to sound right, providing you pass either the “-r” or “-a” option). Adding other file types is simply a case of adding the relevant code to amarokcollectionscanner.

But, right now, I’m listening to my music all playing at the right volume (apart from the WMA files, but what can you do?).

[edit] I should point out that it works without moving your volume slider up and down like the Amarok 1.4 script did.  It just works magically and invisibly, like it should.  This part of the implementation (actually changing the volume) was a doddle, thanks to the wonder of Phonon[/edit]

The Task-Oriented Revolution

12th January 2009

TheBlackCat posted this on an earlier post on this blog, and I thought it was worth sharing more prominently:

What I think will be a key revolution KDE will bring about is the task-oriented desktop. Plasma, Akonadi, Nepomuk, these are all parts of that. It will make computers smarter. Up until now computers did not care what you are doing, they cared about what you were using to do it. They organized themselves around what program you are using, not what you are doing with that program.

But people have different programs to accomplish the same task, and tasks often involve multiple programs. A computer that knows what you are doing and reorganizes itself to make that task easier is a huge leap forward in the way we work with computers. The Office 2007 ribbon interface is another example of that, but it is still embedded in the application-oriented desktop paradigm we have had up until now. It can even be taken a step further, allowing a computer to learn how you like to do certain tasks and organize itself appropriately. For instance such a computer could realize when you are chatting with your IT guy for a certain amount of time you generally pull up certain configuration programs, send him an email with an attachment, and check certain system monitoring applets. Let me get that all ready for you and stick them on a virtual desktop so you can get to it easily. KDE 4 provides the potential for a computer that automatically adapts itself to your work flow instead of you having to adapt yourself to its work flow. Imagine the benefit to businesses if you don’t have to train users to work with the system, the system will train itself to work with the users.

Everybody has different ways they like to do different things, but up until now the best they could do is try to set up their desktop as best they can to make their most common tasks as easy as possible within the limits imposed by the system. Most people do not even bother to take advantage of the limited abilities their system provides, they simply use the default configuration. They never learn how they can modify and improve their computer experience, their efficiency, and their enjoyment. But a system that knows what users are trying to do, how they like to do it, and knows how to take advantage of its own abilities to make those tasks easier would not need to rely on users spending the time and effort to learn the intricacies of the system, it would simply provide what they need when they need it.

Such a system requires four parts, I feel. First it needs a flexible and easily adaptable desktop. Plasma provides that. Second it needs something to track the relationships between data. Nepomuk and Akonadi provide that, or will soon. Third it needs programs that are able to understand how you perceive their relationship with the data and with each other. As I understand it, this is a major goal of KDE 4 over the long run.

Finally it needs to understand how you like to physically interact with the computer’s hardware. This, I feel, is still where KDE has serious limitations, and I think it is holding back the flexibility found in the rest of the desktop. The ability to configure the UI is amazing, but the ability to configure how the computer’s hardware interacts with and impacts the UI is very limited. Essentially we have keyboard shortcuts, that is it. Little else can be configured by the user.

The way we interact using the mouse is not flexible at all, we have three buttons and a scroll wheel. Modern mice generally have at least 5 buttons and a tilt wheel. The ability to dictate how the mouse interacts with the computer is pretty much limited to touching screen edges to activate a couple of effects, dragging windows across virtual desktops, and a few button presses on windows titles. Shortcuts involving mouse buttons are essentially unsupported. The ability to dictate how certain modes of interaction using the mouse affect the desktop environment is limited, in the relatively few cases where mouse interaction is configurable at all it has at most a couple of options.

Compiz has fairly extensive mouse interaction configuration, allowing pretty much any mouse button to be combined with a modifier key to control most aspects of the window manager. Windows 7 has some interesting ideas about moving windows, like the shaking windows to minimize others and dragging windows to screen edges to maximize them across half the screen. Of course certain people may not like these specific interactions, and in Windows 7 they do not appear very flexible, even compiz does not really support combining keyboard and mouse button presses beyond the use of modifier keys. But in KDE 4 the ability to dictate what effect a certain mouse interaction with a window or with a desktop will have is practically non-existent if you compare it to those examples, and is even more striking next to the extreme flexibility of the rest of the KDE 4 experience. So I think it important to be able to tell the system things like “shaking a window will have this effect”, “moving it to a screen edge will have this effect”, “tilting the scroll wheel left on the desktop will have this effect”, “meta+C+mouse button 5 on an applet will have this effect”.

This is even more limited when it comes to other types of devices. For instance there is no way at all to dictate what pushing a button on a joystick or a bluetooth device will do. They are simply not integrated into the KDE desktop interaction framework at all.

Another biggie that KDE, and Linux in general, essentially does not have at all is voice interaction. But I think this is an extremely natural way for people to interact, giving vocal commands is something people learn from a very early age. It is something that Microsoft has been working hard on supporting, and even most modern cell phones have it, but Linux in general and KDE in particular do not. Things like launching programs, switching desktops, and organizing windows seem particularly suited to voice commands since they are fairly simple and generally do not do anything terrible if there is a mistake.

The output side of hardware interaction is important as well. An example is having the computer know which printer you like to use when doing certain tasks (for instance a black and white office copier when printing PDFs, a color inkjet printer when printing photos). Or knowing that when you go into full screen when viewing a photo you want it to go full screen on your monitor, while if you go into full screen mode with a video you want it to go full screen on your TV. Phonon and Solid seem to be trying to provide this to the audio side of things, but it has applications for just about any output device.

Once you have the framework for being able to have a flexible method of interaction between input devices, output devices, programs, the desktop, and windows, it should become much easier for the computer to learn how you like to interact with it and adapt appropriately. For instance it could learn that when you are working with a text document and push the “play” button on your lirc remote you want amarok to open and start playing music, but if you stick a DVD in the drive and immediately after push the same button you want to open Dragon Player and play the DVD. It is extending the task-oriented desktop to the hardware side of things, to learn not only what you do and the process you use to do it but also what devices you use in that process and how you use them.

The Gains of Replay Gain

11th January 2009

So… no-one responded with anything helpful on my last post.  There was a Dot article about the semantic desktop (and hence Nepomuk), however.

On to more interesting things.  I’ve got fed up recently with changing volumes for different songs.  I actually notice it more on my portable MP3 player, but Amarok 2 has long bugged me by not having any Replay Gain support.  And I’ve been thinking about implementing it for about the same amount of time.  Yesterday, I finally got around to starting it.

The first hurdle I came up against was how to get the information I needed from the files.  If you run mp3gain or aacgain on your MP3 or MP4 files, or if you have MusePack files, there’s no problem since the information about changing the volume is stored in such a way that it is interpreted by the decoder, and so Amarok doesn’t need to do anything.  But with other file formats, and when you use other tools for calculating the gain for MP3 and MP4 files, the information is stored as tags.  Fine, we can read tags.  We can store the information we read in the database, and then access it later.  But we’re left with the question of what tags we’re looking for.

The official Replay Gain standard is incredibly unhelpful in this regard.  It gives you all sorts of details about the mathematics behind calculating how much gain to apply to a track, but almost nothing about how to store or retrieve that value.  On top of that, it is woefully out of date, not having been updated since 2001.

OK, so over to the Replay Gain page on the slightly more active Hydrogenaudio Knowledgebase wiki.  However, which tag format (ID3v2 or APE, say) the relevant tags are stored in by various tools and for various formats is about as detailed as it gets.  None of the web pages for the various tools are any more helpful.

The vorbisgain man page is more forthcoming: the relevant tags are REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN and REPLAYGAIN_ALBUM_PEAK. The REPLAYGAIN_*_GAIN values are in decibels and have ” dB” appended to the end. The REPLAYGAIN_*_PEAK values are floats (no further information given).
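
Reading these values back is mostly string handling.  A minimal Python sketch, assuming the tags have already been read into a plain dictionary of strings (parse_replaygain is an invented name, and matching tag names case-insensitively is a choice made here rather than something the man page specifies):

```python
WANTED = {
    "REPLAYGAIN_TRACK_GAIN",
    "REPLAYGAIN_TRACK_PEAK",
    "REPLAYGAIN_ALBUM_GAIN",
    "REPLAYGAIN_ALBUM_PEAK",
}


def parse_replaygain(tags):
    """Return {TAG_NAME: float} for the replay gain tags found in tags."""
    result = {}
    for name, value in tags.items():
        key = name.upper()
        if key not in WANTED:
            continue
        text = value.strip()
        # Gain values carry a trailing " dB"; peak values are bare floats.
        if key.endswith("_GAIN") and text.lower().endswith("db"):
            text = text[:-2].strip()
        try:
            result[key] = float(text)
        except ValueError:
            pass  # malformed value: treat the tag as absent
    return result
```

Tags that are missing or unparseable are simply left out of the result, so the caller can tell "no tag" apart from a gain of zero.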

So I turned to the old Amarok 1.4 Replay Gain script.  This is what I found:

  • Vorbis and FLAC work as described above. To get the peak gain relative to the adjusted gain, you apparently need to take the log of the float, base 10, and multiply the result by 20.
  • MP3s can have the data stored in one of several ways (apart from the method detailed above, which we don’t need to concern ourselves with):
    • In the APEv2 tags in the same manner as with Vorbis (possibly not all uppercase)
    • In the ID3v2 TXXX (user-defined text) tags in the same manner as Vorbis (possibly not all uppercase) – the TXXX frames have a description field and value field, which are used for the tag name and the tag value. Total of 4 TXXX tags to store all the values. These are written by Foobar2000.
    • In the ID3v2.4 RVA2 tags (only Quod Libet does this AFAIK). Each tag has an identification string, into which Quod Libet writes the values “album” and “track”. The format of the rest of the frame is specified in section 4.11 of the ID3v2.4 frames specification.
  • MP4s are done in the same way as Vorbis.
  • AAC (not in an MP4 container): APEv2, presumably in the same manner as MP3 APE tags (the ReplayGain script doesn’t support AAC files).
  • MusePack: values are stored in the header in fields designated for the purpose. Should be supported natively by the decoder, according to the Hydrogenaudio Knowledgebase, but the 1.4 ReplayGain script still reads these values. The peak value apparently needs to be multiplied by 2 to get the peak gain relative to the adjusted gain.
  • WMA: same as Vorbis, stored in the ASF tags (this may well be only files tagged by the ReplayGain script itself).
  • WavPack: stored in APEv2 tags, presumably in the same manner as MP3 APE tags (the ReplayGain script doesn’t support WavPack files).
  • MOD files: Foobar2000 stores these in APEv2 format, presumably in the same manner as MP3 APE tags (the ReplayGain script doesn’t support MOD files).
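
The peak conversion mentioned for Vorbis and FLAC is the standard amplitude-to-decibel formula.  As a small Python sketch (the function names are invented, and the clipping check is just one obvious use for the result, not something taken from the 1.4 script):

```python
import math


def peak_to_db(peak):
    """Convert a linear peak (1.0 = digital full scale) to decibels."""
    if peak <= 0:
        return float("-inf")  # silence has no finite level in dB
    return 20.0 * math.log10(peak)


def would_clip(gain_db, peak):
    """True if applying gain_db would push the peak above full scale."""
    return gain_db + peak_to_db(peak) > 0.0
```

A peak of 0.5 comes out at roughly -6 dB, so a track with that peak could take about 6 dB of positive gain before clipping.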

Don’t you feel all enlightened now?