Well, this is a bit stronger than a proposal; this is "what I plan to do unless people see obvious flaws I missed"...

The text comment header is the second (of three) header packets that begin a Vorbis bitstream. It is meant for short text comments, not arbitrary metadata; arbitrary metadata will be put in a metadata stream, likely an XML stream type. We've discussed this at length-- several times :-)

The comment header is a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32 and the length of each vector is limited to 2^32 bytes. The vector length is encoded; the vector is not null terminated. In addition to the vector list, there is a single vector for vendor name (also 8 bit clean, length encoded in 32 bits). Libvorbis currently sets the vendor string to "Xiphophorus libVorbis I 20000508"

(note: although the vector space in the ogg format is 8 bit clean, libvorbis currently assumes during encoding that the comments submitted for encapsulation are C style strings)

Libvorbis comments are 'unstructured', so it's time to impose a little convention before things get out of hand. Given that the comments are meant for *simple*, *short* fields (think 'title', 'artist', etc), the structure should be simple. I say we pattern this after a simple UNIX style environment array with common 'variable' names agreed upon ahead of time.

That is, fields look like:

comment[0]="ARTIST=me";
comment[1]="TITLE=the sound of vorbis";

For the sake of completeness, I'm proposing:

A case-insensitive field name that may consist of ASCII 0x20 through 0x7D, 0x3D ('=') excluded. ASCII 0x41 through 0x5A inclusive (A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (a-z).

The field name is immediately followed by ASCII 0x3D ('='); this equals sign is used to terminate the field name.

0x3D is followed by 8 bit clean field contents to the end of the field.
Implications: field names should not be 'internationalized'; this is a concession to simplicity, not an attempt to piss off the majority of the world that doesn't speak English. Field *contents*, however, should be internationalizable... suggestions on the proper encoding for that?

We have the length of the entirety of the field and restrictions on the field name so that the field name is bounded in a known way. Thus we also have the length of the field contents.

Individual 'vendors' may use non-standard field names within reason. The proper use of comment fields should be clear through context at this point. Abuse will be discouraged.

Now all we need is a list of 'conventional' field names. A stream is not required to use any/all of these field names; they're suggested for interoperability. The suggestions below are also biased toward contemporary music album usage; analogous use for non-music albums should be easy enough for people to figure out on their own...

TRACK
ALBUM
ARTIST
LABEL
CONTENT

(so there's the seed of a list. Please submit obvious ones I've forgotten...)

Monty

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
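The field convention proposed in this message can be sketched in a few lines of Python. This is only an illustration, not libvorbis code: the function name `split_field` is hypothetical, and raising an exception on a bad name is an assumption, since the proposal does not say what a reader should do with one.

```python
NAME_MIN = 0x20  # lowest byte the proposal allows in a field name
NAME_MAX = 0x7D  # highest byte the proposal allows in a field name


def split_field(field: bytes):
    """Split one comment vector into (NAME, contents).

    Hypothetical helper; error handling is an assumption.
    """
    # The first '=' (0x3D) terminates the name; any later '=' bytes
    # belong to the 8-bit-clean contents.
    name, sep, contents = field.partition(b"=")
    if not sep:
        raise ValueError("field has no '=' separator")
    for byte in name:
        if not NAME_MIN <= byte <= NAME_MAX:
            raise ValueError("byte 0x%02X not allowed in a field name" % byte)
    # Names are case-insensitive: fold a-z onto A-Z. bytes.upper() only
    # changes ASCII letters, so every other byte is left alone.
    return name.upper(), contents
```

Because the whole vector is length-encoded, the length of the contents follows directly: it is the vector length minus the name length minus one byte for the '='.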
On Fri, May 12, 2000 at 02:10:19PM -0700, Monty wrote:
> Implications: field names should not be 'internationalized'; this is a
> concession to simplicity not an attempt to piss off the majority of the world
> that doesn't speak English. Field *contents*, however, should be
> internationalizable... suggestions on the proper encoding for that?

I can't see any reason to not use UTF-8. It's 100% backwards compatible with the 7 bit US-ASCII that we all know, love, and use without thinking.

> We have the length of the entirety of the field and restrictions on the field
> name so that the field name is bounded in a known way. Thus we also have the
> length of the field contents.
>
> Individual 'vendors' may use non-standard field names within reason. The
> proper use of comment fields should be clear through context at this point.
> Abuse will be discouraged.
>
> Now all we need is a list of 'conventional' field names. A
> stream is not required to use any/all of these field names; they're
> suggested for interoperability. The suggestions below are also
> biased toward contemporary music album usage; analogous use for non
> music albums should be easy enough for people to figure out on their
> own...
>
> TRACK
> ALBUM
> ARTIST
> LABEL
> CONTENT

I'd like "track number" for album encoding. I'm presuming "TRACK" is track name.

--
David Terrell            | "War is peace,
Prime Minister, Nebcorp  | freedom is slavery,
dbt@meat.net             | ignorance is strength
http://wwn.nebcorp.com/  | Dishes are clean." - Chris Fester
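The backwards-compatibility point above is easy to check. The sketch below assumes UTF-8 is adopted for field contents (still only a suggestion at this point in the thread); `make_field` and the sample artist name are made up for illustration.

```python
def make_field(name: str, text: str) -> bytes:
    """Build a 'NAME=contents' vector with UTF-8 encoded contents.

    Hypothetical helper: the name stays plain ASCII, as the proposal
    requires, and only the contents are encoded as UTF-8.
    """
    return name.encode("ascii") + b"=" + text.encode("utf-8")


# Pure 7-bit text produces exactly the same bytes under ASCII and UTF-8.
plain = "the sound of vorbis"
same_bytes = plain.encode("utf-8") == plain.encode("ascii")

# Non-ASCII text becomes multi-byte sequences, which the 8-bit-clean
# vectors can carry unchanged.
field = make_field("ARTIST", "Bj\u00f6rk")
round_trip = field.partition(b"=")[2].decode("utf-8")
```

An existing ASCII-only tag therefore needs no conversion, while a reader that decodes contents as UTF-8 gets the accented name back intact.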
I personally think it is a mistake to place any metadata in the comment section. It will be analogous to id3v1 and id3v2 tags on the mp3 side... even though people are moving to id3v2 we have to write out both to make sure we are readable by most players.

Metadata should wait until it can be done correctly through the XML stream definition.

elrod

Monty wrote:
> Well, this is a bit stronger than a proposal; this is "what I plan to do unless
> people see obvious flaws I missed"...
>
> [Snip]
Monty wrote:
> Implications: field names should not be 'internationalized'; this is a
> concession to simplicity not an attempt to piss off the majority of the world
> that doesn't speak English. Field *contents*, however, should be
> internationalizable... suggestions on the proper encoding for that?

UTF-8 is your friend. UTF-8 encodes the entire Unicode character set, and it's an identity mapping for 7 bit ASCII.

> [A variable name is a] case-insensitive field name that may consist
> of ASCII 0x20 through 0x7D, 0x3D ('=') excluded. ASCII 0x41 through
> 0x5A inclusive (A-Z) is to be considered equivalent to ASCII 0x61
> through 0x7A inclusive (a-z).

Any reason you aren't restricting this to alphanumeric plus underscore? Do you really want variable names like ":-)" and "'%^]\''"?

> TRACK
> ALBUM
> ARTIST
> LABEL
> CONTENT
>
> (so there's the seed of a list. Please submit obvious ones I've forgotten...)

Um, TITLE? (Or is that what you mean by TRACK?)

Are variables required to be unique? Or can I have,

ARTIST=Dizzy Gillespie
ARTIST=Sonny Rollins
ARTIST=Sonny Stitt

--
K<bob>
kbob@jogger-egg.com, http://www.jogger-egg.com/
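Both questions in this message can be made concrete with a short sketch. Neither part reflects a decision in the thread: the stricter name rule is only the suggestion made above, and `group_fields` assumes repeated names are allowed and simply keeps every value in order. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# The stricter rule suggested above: letters, digits and underscore only.
STRICT_NAME = re.compile(rb"[A-Za-z0-9_]+")


def is_strict_name(name: bytes) -> bool:
    """Return True if the name passes the alphanumeric-plus-underscore rule."""
    return STRICT_NAME.fullmatch(name) is not None


def group_fields(fields):
    """Group 'NAME=contents' vectors by case-folded name.

    Assumes duplicates are legal; each name maps to a list of contents
    in the order the fields appeared.
    """
    grouped = defaultdict(list)
    for field in fields:
        name, sep, contents = field.partition(b"=")
        if not sep:
            raise ValueError("field has no '=' separator")
        grouped[name.upper()].append(contents)
    return dict(grouped)


artists = group_fields([
    b"ARTIST=Dizzy Gillespie",
    b"ARTIST=Sonny Rollins",
    b"artist=Sonny Stitt",
])
```

Under the stricter rule a name such as ":-)" is rejected, while all three ARTIST fields end up in one list under the same key.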
On Fri, 12 May 2000, Monty wrote:
> (so there's the seed of a list. Please submit obvious ones I've forgotten...)

ID3, of course, has "GENRE", which I've never once found useful but some people might. (The following is AFAIK and could be completely wrong...) The genre is just a byte or two which gets mapped via a lookup table to some "official" genre list. I personally think it would be better to have just arbitrary text, since the list of genres is likely to end up somewhat silly and biased towards western music types (According to XMMS, ID3 has things like "Booty Bass" and "Dream" but nowhere to possibly fit my Ninja Tune CDs, which really deserve their own genre :)

Lookup tables would have their benefit if everyone used them and if all music was easily grouped (I could search my downloaded .ogg files for all "Industrial" songs, or whatever) but I really think arbitrary text is better.

I was also thinking of some sort of... VERSION (maybe?) tag? Say, for example, I have 8 different versions of Orbital's "Nothing Left" (I do!)... It might be nice if I could set the artist and title to that and then in some other field, put "Tsunami One Remix". I could always cram that in the title, but... I'm not sure.

So those are the only unmentioned two I can think of and I'm not even sure myself if they should be included. I'm leaning towards yes for both of them, but they could end up just being extra clutter.

I wish CDs had something like an ISBN number. That would be nice to have, too.
Monty <xiphmont@xiph.org> wrote:
> The text comment header is the second (of three) header packets that begin a
> Vorbis bitstream. It is meant for short, text comments, not arbitrary metadata;
> arbitrary metadata will be put in a metadata stream, likely an XML stream type.
> We've discussed this at length-- several times :-)

Is the comment header optional? I'd like it to be. I always remove the id3 tag from mp3s, since the only thing it's "good" for in my experience is displaying a misspelled or wrong artist/song name in WinAmp, even though I've fixed the filename. A tool to edit/remove/insert the comment header would be nice.

[Snip]

> Individual 'vendors' may use non-standard field names within reason. The
> proper use of comment fields should be clear through context at this point.
> Abuse will be discouraged.

Perhaps a vendor prefix should be mandatory, in order to avoid polluting the global namespace? Private individuals could use their name or handle as the vendor string; companies should of course use the company name. I.e. something like this:

J_RANDOM_HACKER.FIELDNAME=Something
RED_HAT.FIELDNAME=Something else
MICROSOFT.FIELDNAME= ;)

--
Patrik Rådman · patrik at iki dot fi · http://www.iki.fi/patrik/
"With sufficient thrust, pigs fly just fine." -- RFC 1925
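A reader could separate such a prefix from the field name as sketched below. This is purely illustrative: the thread has not adopted vendor prefixes, and `split_vendor_prefix` and the choice of the first '.' as the separator are assumptions taken from the examples above. Note that '.' (0x2E) falls inside the 0x20 through 0x7D range Monty proposed for names, so such names would already be legal.

```python
def split_vendor_prefix(name: bytes):
    """Split 'VENDOR.FIELDNAME' into (vendor, field name).

    Hypothetical helper. Names without a '.' are treated as unprefixed
    'conventional' names and returned with vendor set to None.
    """
    vendor, sep, field_name = name.partition(b".")
    if not sep:
        return None, name
    return vendor, field_name
```

With this, `RED_HAT.FIELDNAME` splits into the vendor `RED_HAT` and the field `FIELDNAME`, while a plain `ARTIST` is left untouched.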
>The comment header is a list of eight-bit-clean vectors; the number of vectors
>is bounded to 2^32 and the length of each vector is limited to 2^32 bytes. The
>vector length is encoded; the vector is not null terminated. In addition to the
>vector list, there is a single vector for vendor name (also 8 bit clean, length
>encoded in 32 bits). Libvorbis currently sets the vendor string to "Xiphophorus
>libVorbis I 20000508"

The vector length is encoded? This information isn't available, currently, if so. The vorbis_comments structure gives the NUMBER of comments, the vendor string, and the comments themselves - but right now, we HAVE to assume they're null terminated.

This would be a good thing to fix soon, if it's not meant to be like this, before there's too much software around that assumes they will always be null terminated.

Michael
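The difference between length-encoded vectors and null-terminated strings can be shown with a small sketch. It is not the libvorbis API and not the full header packet: the order (vendor, count, comments) and the little-endian 32-bit lengths are assumptions made for illustration, and both function names are hypothetical.

```python
import struct


def pack_vectors(vendor: bytes, comments) -> bytes:
    """Serialize a vendor vector and a list of comment vectors.

    Assumed layout: u32 vendor length, vendor bytes, u32 comment count,
    then a u32 length plus bytes for each comment (little-endian).
    """
    parts = [struct.pack("<I", len(vendor)), vendor,
             struct.pack("<I", len(comments))]
    for comment in comments:
        parts.append(struct.pack("<I", len(comment)))
        parts.append(comment)
    return b"".join(parts)


def unpack_vectors(data: bytes):
    """Inverse of pack_vectors; returns (vendor, list of comments)."""
    pos = 0

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read_bytes(count):
        nonlocal pos
        chunk = data[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("truncated vector")
        pos += count
        return chunk

    vendor = read_bytes(read_u32())
    comments = [read_bytes(read_u32()) for _ in range(read_u32())]
    return vendor, comments


# A comment with an embedded zero byte survives the round trip...
original = b"TITLE=a\x00b"
vendor, comments = unpack_vectors(pack_vectors(b"vendor", [original]))
# ...but a reader that treats it as a C string stops at the zero byte.
as_c_string = comments[0].split(b"\x00")[0]
```

This is the mismatch described above: the stream can carry the full vector, but an interface that only hands out null-terminated strings cannot expose anything past the first zero byte.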
On Fri, 12 May 2000, Monty wrote:
> Well, this is a bit stronger than a proposal...
>
> ...the length of each vector is limited to 2^32 bytes...

You may have meant 2^32-1, assuming 0 is a valid length, which it probably ought to be, allowing a user to make positional assignments within the comment vector, leaving some comments null.

I have used a counted-string format in the past where the count is variable length, something like UTF-8. If the count is encoded in N bytes, the high N-1 bits of the leading byte are 1, the next bit is 0, and the count is encoded in the remaining bits plus the following N-1 bytes, with the Nth byte being the LSB of the count. This has the pleasing result that almost all counts are just a single byte.

It takes only a tiny amount of code to encode and decode this format and this just seems like a nice thing to do for a format whose whole raison d'etre is, after all, compression.

Another nice effect is to eliminate endianness considerations, at least in this one place.

On the other hand, and inconsistently, I'd think that compressing the text fields would be an unnecessary awkwardness.

--
Daniel
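The variable-length count described above could look roughly like this. It is a sketch of that description, not of anything in Vorbis: the function names are hypothetical, and limiting N to 8 bytes (so the leading byte always has room for the terminating 0 bit) is an added assumption. An N-byte encoding carries 7*N bits of count.

```python
from typing import Tuple


def encode_count(value: int) -> bytes:
    """Encode a count in the smallest N (1..8) bytes that hold it."""
    if value < 0:
        raise ValueError("count must be non-negative")
    for n in range(1, 9):
        if value < (1 << (7 * n)):
            # Leading byte: N-1 one bits, then a zero bit, then payload.
            prefix = (0xFF << (9 - n)) & 0xFF
            raw = value.to_bytes(n, "big")  # Nth byte is the LSB
            return bytes([raw[0] | prefix]) + raw[1:]
    raise ValueError("count needs more than 8 bytes")


def decode_count(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one count; return (value, offset of the next byte)."""
    lead = data[offset]
    n = 1
    # Count the run of one bits at the top of the leading byte.
    while n < 9 and lead & (0x80 >> (n - 1)):
        n += 1
    if n > 8:
        raise ValueError("leading byte 0xFF is not used in this sketch")
    if len(data) < offset + n:
        raise ValueError("truncated count")
    value = lead & (0xFF >> n)  # payload bits of the leading byte
    for byte in data[offset + 1:offset + n]:
        value = (value << 8) | byte
    return value, offset + n
```

Counts below 128 take a single byte, which covers most comment lengths; a full 32-bit count needs five bytes instead of four.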
Hello

I have just read through the archive on this topic. One thing I didn't see is a field for genre. Is it possible to incorporate a field for that also?

Thanks

Karl
---------- Forwarded Message ----------
Subject: Re: [vorbis-dev] comment field proposal
Date: Sat, 13 May 2000 13:35:22 -0700
From: Monty <xiphmont@xiph.org>

> > ...the length of each vector is limited to 2^32 bytes...
>
> You may have meant 2^32-1, assuming 0 is a valid length which it probably
> ought to be, allowing a user to make positional assignments within the comment
> vector, leaving some comments null.

Yes, point taken, correction noted.

> It takes only a tiny amount of code to encode and decode this format and this
> just seems like a nice thing to do for a format whose whole raison d'etre is,
> after all compression.

There are two reasons not to do this. First, there's no reason. This is not a place where saving any bits is vital. Second, the format is frozen... we can't change now unless the implications of not changing are apocalyptic ;-)

> Another nice effect is to eliminate endianness considerations, at least in this
> one place.

This is already eliminated. Vorbis is a true bitstream, and the mapping to an octet stream is well defined (in summary, least significant bit of least significant byte first). All vorbis storage goes through this mapping.

Monty

-------------------------------------------------------

--
Daniel
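The mapping Monty summarizes (least significant bit of least significant byte first) can be illustrated with a toy bit writer. This is a sketch of the stated rule only, not the libvorbis bitpacker; the `BitWriter` class and its methods are hypothetical.

```python
class BitWriter:
    """Pack values into octets, least significant bit first."""

    def __init__(self):
        self._acc = 0        # bits not yet flushed; oldest bit is bit 0
        self._nbits = 0      # number of valid bits in _acc
        self._out = bytearray()

    def write(self, value: int, nbits: int) -> None:
        """Append the low nbits of value to the stream."""
        if value < 0 or value >> nbits:
            raise ValueError("value does not fit in nbits")
        self._acc |= value << self._nbits
        self._nbits += nbits
        # Flush every completed octet, lowest bits first.
        while self._nbits >= 8:
            self._out.append(self._acc & 0xFF)
            self._acc >>= 8
            self._nbits -= 8

    def getvalue(self) -> bytes:
        """Return the octets so far, zero-padding a final partial octet."""
        out = bytes(self._out)
        if self._nbits:
            out += bytes([self._acc & 0xFF])
        return out


writer = BitWriter()
writer.write(0x12345678, 32)
length_bytes = writer.getvalue()
```

Written on a byte boundary, a 32-bit length comes out low byte first regardless of the host's byte order, which is why the format itself leaves no endianness choice to the implementation.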