CD understanding
I copied the text below from http://musicbrainz.org/doc/DiscIDCalculation.
It seems that Gracenote and freedb hold different CD information. The
example here is the freedb entry for the CD Moby - Hotel: freedb lists the
last track at offset 270030, but Gracenote shows another number. freedb's
number is the TOC offset + 150, while Gracenote's also subtracts 11400,
yet the two still differ by 150. This looks like a bug on Windows, which
shows 5:22 for the track, while QNX and Linux show 5:20 and the CD-ROM
drive can only play 5:20 of it.
On a multi-session or mixed CD there are 150*75 frames of lead-out and
lead-in plus another 2*75 frames of pre-gap, so the gap between the 2
sessions is 152*75 = 11400 frames.
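The arithmetic can be checked with a small C sketch. The function names are made up for this note, and it assumes that offset 270030 in the freedb record below is the start of the data track, so that track 15 ("35 Minutes") ends one session gap earlier:

```c
#define NOTE_FRAMES_PER_SECOND 75L

/* Gap between two sessions in frames: lead-out and lead-in, then pre-gap. */
static long session_gap_frames(void)
{
    long lead_out_in = 150L * NOTE_FRAMES_PER_SECOND; /* 11250 frames */
    long pre_gap = 2L * NOTE_FRAMES_PER_SECOND;       /* 150 frames */
    return lead_out_in + pre_gap;
}

/* Length in frames of the last audio track, given its own offset and the
   offset of the data track that follows it in the next session. */
static long last_audio_track_frames(long track_offset, long data_offset)
{
    return data_offset - session_gap_frames() - track_offset;
}
```

last_audio_track_frames(234618, 270030) gives 24012 frames, which is 320 seconds and 12 frames. That agrees with the 320.2 (05:20.1) total in the MPlayer log further down. Adding 150 frames (2 seconds) to it would give the 5:22 seen on Windows.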
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ freedb \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
# Track frame offsets:
# 150
# 8795
# 25745
# 40138
# 55085
# 70238
# 92398
# 109293
# 124405
# 141120
# 158079
# 175458
# 191905
# 208331
# 234618
# 270030
#
# Disc length: 4440 seconds
#
# Revision: 16
# Processed by: cddbd v1.5.2PL0 Copyright (c) Steve Scherf et al.
# Submitted via: ExactAudioCopy v0.99pb4
#
DISCID=dd115610
DTITLE=Moby / Hotel
DYEAR=2005
DGENRE=Ambient
TTITLE0=Hotel Intro
TTITLE1=Raining Again
TTITLE2=Beautiful
TTITLE3=Lift Me Up
TTITLE4=Where You End
TTITLE5=Temptation
TTITLE6=Spiders
TTITLE7=Dream About Me
TTITLE8=Very
TTITLE9=I Like It
TTITLE10=Love Should
TTITLE11=Slipping Away
TTITLE12=Forever
TTITLE13=Homeward Angel
TTITLE14=35 Minutes
TTITLE15=ghost 16
EXTD=
EXTT0=
EXTT1=
EXTT2=
EXTT3=
EXTT4=
EXTT5=
EXTT6=
EXTT7=
EXTT8=
EXTT9=
EXTT10=
EXTT11=
EXTT12=
EXTT13=
EXTT14=
EXTT15=
PLAYORDER=
.
Found audio CD with 16 tracks.
================ CD INFO === start =========
artist=[Moby]
album=[Hotel]
genre=[Ambient]
nb_tracks=16
length= 73:58.00
# 1 1:55.21 @ 150 [Hotel Intro]
# 2 3:46.01 @ 8795 [Raining Again]
# 3 3:11.69 @ 25745 [Beautiful]
# 4 3:19.23 @ 40138 [Lift Me Up]
# 5 3:22.04 @ 55085 [Where You End]
# 6 4:55.36 @ 70238 [Temptation]
# 7 3:45.21 @ 92398 [Spiders]
# 8 3:21.38 @ 109293 [Dream About Me]
# 9 3:42.66 @ 124405 [Very]
#10 3:46.10 @ 141120 [I Like It]
#11 3:51.55 @ 158079 [Love Should]
#12 3:39.23 @ 175458 [Slipping Away]
#13 3:39.02 @ 191905 [Forever]
#14 5:50.38 @ 208331 [Homeward Angel]
#15 7:52.13 @ 234618 [35 Minutes]
#16 13:59.46 @ 270030 [ghost 16]
================ CD INFO === end =========
35 Minutes
rawaudio file format detected.
==========================================================================
Opening audio decoder: [pcm] Uncompressed PCM audio decoder
AUDIO: 44100 Hz, 2 ch, s16le, 1411.2 kbit/100.00% (ratio: 176400->176400)
Selected audio codec: [pcm] afm: pcm (Uncompressed PCM)
==========================================================================
AO: [oss] 44100Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
A: 243.6 (04:03.6) of 320.2 (05:20.1) 46.7%
MPlayer interrupted by signal 2 in module: decode_audio
A: 244.0 (04:03.9) of 320.2 (05:20.1) 47.2%
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
An Audio CD (CD-DA) can hold up to 99 audio tracks.
Sampling is done at a rate of 44.1 kHz with 16 bits of resolution per
channel, so there are 44100 x 2 bytes x 2 channels (stereo) = 176400
bytes of PCM data stored per second.
This audio data is contained in logical blocks of 2352 bytes each on the
CD, each holding 2352 / 176400 = 1 / 75 seconds of sound.
A logical block plus 882 bytes of error correction and control data forms
a raw block of 3234 bytes that is spread among 98 frames of 33 bytes each,
all written along one spiral track on the CD.
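The numbers in this paragraph can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the constant and function names are invented here and do not come from any library:

```c
#define CDDA_SAMPLE_RATE_HZ   44100L
#define CDDA_BYTES_PER_SAMPLE 2L     /* 16 bits */
#define CDDA_CHANNELS         2L     /* stereo */
#define CDDA_BLOCK_BYTES      2352L  /* one logical block */

/* Bytes of PCM audio stored per second of sound. */
static long pcm_bytes_per_second(void)
{
    return CDDA_SAMPLE_RATE_HZ * CDDA_BYTES_PER_SAMPLE * CDDA_CHANNELS;
}

/* Logical blocks needed for one second of sound. */
static long blocks_per_second(void)
{
    return pcm_bytes_per_second() / CDDA_BLOCK_BYTES;
}

/* Size of a raw block: 98 frames of 33 bytes each. */
static long raw_block_bytes(void)
{
    return 98L * 33L;
}
```

pcm_bytes_per_second() is 176400 and blocks_per_second() is 75, which is why all the offsets in the TOC count in units of 1/75 second. raw_block_bytes() is 3234, that is 2352 bytes of audio plus 882 bytes of error correction and control data.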
Reading the TOC
Audio CD
Now let us have a look at a real-world CD-DA. Using a tool like cdrecord
on this CD will yield:
$ cdrecord dev=/dev/cdrom -toc
...
first: 1 last 6
track: 1 lba: 0 ( 0) 00:02:00 adr: 1 control: 2 mode: 0
track: 2 lba: 15213 ( 60852) 03:24:63 adr: 1 control: 2 mode: 0
track: 3 lba: 32164 ( 128656) 07:10:64 adr: 1 control: 2 mode: 0
track: 4 lba: 46442 ( 185768) 10:21:17 adr: 1 control: 2 mode: 0
track: 5 lba: 63264 ( 253056) 14:05:39 adr: 1 control: 2 mode: 0
track: 6 lba: 80339 ( 321356) 17:53:14 adr: 1 control: 2 mode: 0
track:lout lba: 95312 ( 381248) 21:12:62 adr: 1 control: 2 mode: -1
You can see that the CD has 6 audio tracks and the special lead-out track
(it has number 170 in the CD TOC). Also note that the LBA (Logical Block
Address) offsets start at address 0, but the first track actually starts
at 00:02:00 (the standard length of the lead-in track). So we need to add
150 frames to each LBA offset. The resulting data needed to calculate a
MusicBrainz disc ID are:
track 1: 150 (150 + 0)
track 2: 15363 (150 + 15213)
track 3: 32314 (150 + 32164)
track 4: 46592 (150 + 46442)
track 5: 63414 (150 + 63264)
track 6: 80489 (150 + 80339)
lead-out track: 95462 (150 + 95312)
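The conversion above can be sketched in C as follows. The helper names are hypothetical. The second function formats an offset as minutes:seconds:frames, which makes it possible to compare the result with the time column printed by cdrecord:

```c
#include <stdio.h>

#define TOC_PREGAP_FRAMES     150
#define TOC_FRAMES_PER_SECOND 75

/* Convert an LBA from the TOC into the frame offset used for the disc ID. */
static int lba_to_offset(int lba)
{
    return lba + TOC_PREGAP_FRAMES;
}

/* Write a frame offset as MM:SS:FF into buf (at least 9 bytes for
   offsets below 100 minutes). */
static void offset_to_msf(int offset, char *buf, size_t size)
{
    int minutes = offset / (60 * TOC_FRAMES_PER_SECOND);
    int seconds = (offset / TOC_FRAMES_PER_SECOND) % 60;
    int frames = offset % TOC_FRAMES_PER_SECOND;
    snprintf(buf, size, "%02d:%02d:%02d", minutes, seconds, frames);
}
```

For track 2, lba_to_offset(15213) is 15363, and offset_to_msf() turns that into 03:24:63, the same time cdrecord prints for that track.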
Multi-session (audio + data) CD
Because MusicBrainz doesn't include data tracks in Disc IDs, reading the
TOC from a multi-session disc is a little more complicated. Running
cdrecord on this CD will give us:
$ cdrecord dev=/dev/cdrom -toc
...
first: 1 last 8
track: 1 lba: 0 ( 0) 00:02:00 adr: 1 control: 0 mode: 0
track: 2 lba: 13959 ( 55836) 03:08:09 adr: 1 control: 0 mode: 0
track: 3 lba: 33436 ( 133744) 07:27:61 adr: 1 control: 0 mode: 0
track: 4 lba: 52927 ( 211708) 11:47:52 adr: 1 control: 0 mode: 0
track: 5 lba: 65631 ( 262524) 14:37:06 adr: 1 control: 0 mode: 0
track: 6 lba: 77742 ( 310968) 17:18:42 adr: 1 control: 0 mode: 0
track: 7 lba: 99024 ( 396096) 22:02:24 adr: 1 control: 0 mode: 0
track: 8 lba: 125824 ( 503296) 27:59:49 adr: 1 control: 6 mode: 1
track:lout lba: 188333 ( 753332) 41:53:08 adr: 1 control: 6 mode: -1
This is an example of a CD with an extra track of data (what you know as
CD-ROM), marketed in this case as CD-Extra, featuring a video and some
pictures. More precisely, this is a disc with two sessions, audio and data.
This CD has 8 tracks, but only the first 7 audio tracks should be used to
calculate a MusicBrainz disc ID. The problem is that we can't use the
offset of track 8 as the "lead-out track" offset, because there is a gap
between the audio session and the data session. This gap is 11400 frames
long (11250 frames for lead-out/lead-in + 150 frames of pre-gap), so we
need to subtract this value from the offset of track 8 to get the end of
track 7. The result is:
track 1: 150 (150 + 0)
track 2: 14109 (150 + 13959)
track 3: 33586 (150 + 33436)
track 4: 53077 (150 + 52927)
track 5: 65781 (150 + 65631)
track 6: 77892 (150 + 77742)
track 7: 99174 (150 + 99024)
lead-out track: 114574 (150 + 125824 - 11400)
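A possible way to code this rule is sketched below. The struct and function names are hypothetical. It assumes that a data track can be recognised by bit 0x04 of the control field (control is 6 for track 8 above and 0 or 2 for the audio tracks), and that the data track comes after the audio tracks in its own session:

```c
#include <stddef.h>

#define MS_PREGAP_FRAMES      150
#define MS_SESSION_GAP_FRAMES 11400 /* 11250 lead-out/lead-in + 150 pre-gap */
#define MS_CONTROL_DATA_BIT   0x04  /* assumed: set for data tracks */

struct toc_entry {   /* hypothetical: one TOC line as printed by cdrecord */
    int lba;
    int control;
};

/* entries[0..track_count-1] are the tracks, entries[track_count] is the
   real lead-out. Returns the lead-out offset to use for the disc ID. */
static int audio_leadout_offset(const struct toc_entry *entries,
                                size_t track_count)
{
    for (size_t i = 0; i < track_count; i++) {
        if (entries[i].control & MS_CONTROL_DATA_BIT) {
            /* The audio session ends one session gap before the data track. */
            return entries[i].lba + MS_PREGAP_FRAMES - MS_SESSION_GAP_FRAMES;
        }
    }
    /* Plain audio CD: use the real lead-out. */
    return entries[track_count].lba + MS_PREGAP_FRAMES;
}
```

With the TOC above this returns 114574, and with the plain audio CD from the previous example it returns 95462. A disc whose first track is data (mixed mode) is not handled by this sketch.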
Calculating the Disc ID
The CD Index algorithm simply takes the following pieces of data and runs
them through the SHA-1 hash function:
* First track number (normally one): 1 byte
* Last track number: 1 byte
* Lead-out track offset: 4 bytes
* 99 frame offsets: 4 bytes for each track
  o If there are fewer than 99 tracks (almost certainly), the value 0
    will be used for the unused tracks.
Before the data is fed through the SHA-1 hash, it is converted to
upper-case hex ASCII using printf("%02X", value); for single-byte values
and printf("%08X", value); for 4-byte values.
Code is a better definition than English, so here is the code that
calculates the DiscID:
    /* First and last track numbers: two hex digits each */
    sprintf(temp, "%02X", pCDInfo->FirstTrack);
    sha_update(&sha, (unsigned char*) temp, strlen(temp));
    sprintf(temp, "%02X", pCDInfo->LastTrack);
    sha_update(&sha, (unsigned char*) temp, strlen(temp));
    /* Lead-out plus 99 track slots: eight hex digits each */
    for (i = 0; i < 100; i++) {
        sprintf(temp, "%08X", pCDInfo->FrameOffset[i]);
        sha_update(&sha, (unsigned char*) temp, strlen(temp));
    }
    sha_final(digest, &sha);
Note that the lead-out track is stored in pCDInfo->FrameOffset[0].
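To see exactly which characters go into the hash, the same formatting can be done into one buffer. This is a sketch, not the libdiscid code: build_discid_input and the DISCID_* constants are names invented here, and the SHA-1 step itself is left out.

```c
#include <stdio.h>

#define DISCID_SLOTS   100                        /* [0] lead-out, [1..99] tracks */
#define DISCID_HEX_LEN (2 + 2 + DISCID_SLOTS * 8) /* 804 characters */

/* Build the upper-case hex string that is fed to SHA-1. Unused track
   slots in offsets[] must be 0. out needs DISCID_HEX_LEN + 1 bytes. */
static void build_discid_input(int first, int last,
                               const unsigned offsets[DISCID_SLOTS],
                               char out[DISCID_HEX_LEN + 1])
{
    int pos = 0;
    pos += sprintf(out + pos, "%02X", (unsigned)first);
    pos += sprintf(out + pos, "%02X", (unsigned)last);
    for (int i = 0; i < DISCID_SLOTS; i++)
        pos += sprintf(out + pos, "%08X", offsets[i]);
}
```

For the audio CD example (first 1, last 6, lead-out 95462, then 150, 15363, ...) the string starts with 0106000174E60000009600003C03 and is always 804 characters long, most of them 0 for the unused track slots.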
The resulting 20-byte SHA-1 signature is converted to a base 64 encoded
ASCII string that is the disc ID. Base 64 encoding maps arbitrary bytes
onto a string of printable ASCII characters. It does this by
redistributing the 24 bits of three 8-bit codes into four 6-bit codes
taken from a table of 64 very common ASCII characters. Thus we end up
with an ID string of 28 characters; for the audio CD example above it is
49HHV7Eb8UKF3aQiNmu1GR8vKTY-.
Note: This base 64 string is not the same as the standard MIME base 64
encoding (RFC 2045). The standard alphabet uses the +, /, and = characters,
all of which are special HTTP/URL characters. To avoid the problems of
dealing with them, I (Rob) used ., _, and - instead. For details on this,
please refer to base64.c in the libdiscid source code.
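A minimal sketch of such an encoder is shown below. It is not the libdiscid code. It assumes the usual base 64 table with . and _ in the places of + and /, and - as the padding character in place of =. The names are invented for this note.

```c
#include <stddef.h>

/* Assumed alphabet: standard base 64 with '+' -> '.' and '/' -> '_'. */
static const char DISCID_B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._";

/* Encode len bytes from src into out and return the number of characters
   written. out must hold 4 * ((len + 2) / 3) + 1 bytes. */
static size_t discid_base64(const unsigned char *src, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        size_t rest = len - i;
        /* Pack up to three bytes into 24 bits. */
        unsigned long v = (unsigned long)src[i] << 16;
        if (rest > 1) v |= (unsigned long)src[i + 1] << 8;
        if (rest > 2) v |= (unsigned long)src[i + 2];
        /* Emit four 6-bit codes, padding with '-' (assumed, in place of '='). */
        out[o++] = DISCID_B64[(v >> 18) & 63];
        out[o++] = DISCID_B64[(v >> 12) & 63];
        out[o++] = rest > 1 ? DISCID_B64[(v >> 6) & 63] : '-';
        out[o++] = rest > 2 ? DISCID_B64[v & 63] : '-';
    }
    out[o] = '\0';
    return o;
}
```

A 20-byte SHA-1 digest is six full groups of three bytes plus two leftover bytes, so the output is 6*4 + 4 = 28 characters and the last one is always the padding character. That matches the trailing - in the example ID above.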
This scheme has the advantage of being very simple (simple to understand,
simple to implement) and it is not ambiguous. However, two different
pressings of the same disc may have different IDs. To handle this case,
the MusicBrainz system will let a user check to see if the CD already
exists in the system under a different ID. If so, the system creates a
new association for the different pressing of the same CD.
If you'd like to know more about this, please download the client
source code and check it out. The code is clean and self-documenting.
If you are interested in creating other MusicBrainz clients and need the
SHA-1 source code, you can either dig through the CD Index source, or
check the W3C page on SHA-1.