thr3ads.net - Vorbis - [vorbis] Obtaining tag-independent track uniqueness? [Mar 2002]

Tom Wadzinski

2002-Mar-11 17:43 UTC

[vorbis] Obtaining tag-independent track uniqueness?

Hello:

As seen in some of the MP3-oriented P2P programs and audio organizing
tools, the underlying uniqueness of a given mp3 file can be learned (for
the most part) by, for instance, taking a hash of the first 300,000
bytes of the non-id3 tag content of an mp3 file to obtain a content
signature (This hash could then further be paired with the length of the
non tag portion of the entire file for an even more unique signature).
This signature could then be stored in an organizer program DB (or in a
p2p system DB) such that even though the filenames and tag content can
change or be from different sources, the underlying audio content can be
tied back to a DB entry via the signature.  Note that this scheme is
understood to not work for identifying identical content encoded under
different bitrate/quality settings.
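
A minimal sketch of the MP3 scheme described above, for illustration only. It assumes the tags to skip are a leading ID3v2 block and a trailing 128-byte ID3v1 block; the choice of SHA-1 and the names `id3v2_size` and `mp3_signature` are hypothetical, not taken from any particular P2P program.

```python
import hashlib

SIGNATURE_BYTES = 300_000  # prefix length mentioned in the post


def id3v2_size(data):
    """Return the length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # Bytes 6..9 hold the tag size as a "syncsafe" integer (7 bits per byte).
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    footer = 10 if data[5] & 0x10 else 0
    return 10 + size + footer


def mp3_signature(data):
    """Return (hash of the first non-tag bytes, length of the non-tag portion)."""
    start = id3v2_size(data)
    end = len(data)
    # An ID3v1 tag is a fixed 128-byte block at the end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    audio = data[start:end]
    digest = hashlib.sha1(audio[:SIGNATURE_BYTES]).hexdigest()
    return digest, len(audio)
```

Two copies of a file that differ only in their ID3 tags would produce the same (digest, length) pair, which is the property the signature relies on.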

Can anyone guide me on whether or not there is any way to accomplish the
same goal with Vorbis using the existing APIs, that is, getting at the
first x bytes of non-tagging/metadata content of a stream, and
similarly, getting the length of the non-tagging/metadata portion of an
entire file stream?  Or, if not that, any ideas on obtaining
"uniqueness" through another means in Vorbis?

One might say, "Why not just put a unique identifier in a tag in each
file, and not worry about this hash business?"  To preemptively respond
to this, arguments against this approach follow:
1) The DB program (organizer or p2p system) might not have write access
to the files, and thus can't set an identifier tag.  For instance, users
with large collections (let's call large 20,000 - 30,000 files) are likely
to have a good portion of it set to read-only (not to mention read-only
media), for archival purposes.  Also, large collection holders probably
have a specific tagging/metadata program that they trust, and don't want
a program that they just downloaded deciding to write to every single
one of their content files.
2) Files can't be checked for underlying audio content duplication,
other than through tagging / file size methods, which is generally
inadequate, due to different tagging/filename schemes.

Another might say, "How about decoding the first x seconds, and taking a
hash of that, to get uniqueness?".  This could work, except that
different decoder implementations/versions might produce different
hashes for the same file, and decoding is likely to be a much slower
technique.

Tom Wadzinski

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Greg Wooledge

2002-Mar-11 18:07 UTC

[vorbis] Obtaining tag-independent track uniqueness?

Tom Wadzinski (twadzins@yahoo.com) wrote:
> Can anyone guide me on whether or not there is any way to accomplish the
> same goal with Vorbis using the existing APIs, that is, getting at the
> first x bytes of non-tagging/metadata content of a stream
> One might say, "Why not just put a unique identifier in a tag in each
> file, and not worry about this hash business?"
Because you can't write to the files.  You covered that in the part I
snipped.
> Another might say, "How about decoding the first x seconds, and taking a
> hash of that, to get uniqueness?".
Because MP3 decoding, at least, is nondeterministic.  I'm not sure about
Vorbis.

Why don't you just hash the first 300 kbytes like giFT does?  Then you
can be content-neutral, and your implementation is greatly simplified.
Third-party implementations might even get it right! ;-)
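
A rough sketch of that content-neutral approach, hashing only the first 300 kbytes of the raw file regardless of format. The exact byte count, the choice of SHA-256, and the name `prefix_hash` are assumptions for illustration; this is not giFT's actual algorithm.

```python
import hashlib
import io

PREFIX_BYTES = 300 * 1024  # "300 kbytes"; the exact count is an assumption


def prefix_hash(stream, limit=PREFIX_BYTES):
    """Hash at most `limit` bytes from the start of a binary stream."""
    h = hashlib.sha256()
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(65536, remaining))
        if not chunk:  # file is shorter than the limit
            break
        h.update(chunk)
        remaining -= len(chunk)
    return h.hexdigest()


# Usage: any binary file object works, e.g. open(path, "rb").
original = io.BytesIO(b"x" * 400_000)
retagged = io.BytesIO(b"TAG!" + b"x" * 400_000)  # bytes inserted near the start
print(prefix_hash(original) == prefix_hash(retagged))
```

Because nothing is skipped, any edit inside the hashed prefix, including a tag change, yields a different hash.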

Besides that, imagine I download an Ogg from a P2P network and find that
it's not tagged.  I might decide to add at least ARTIST and TITLE tags.
If you hash my tagged version against the original, but ignore the tags,
then you get the same hash for both files.  That's no good -- you can't
swarm the file if it's not the same file on both sources.

So, if you skip over some part of the file arbitrarily, you ruin the
whole purpose of making a hash in the first place.

-- 
Greg Wooledge                  |   "Truth belongs to everybody."
greg@wooledge.org              |    - The Red Hot Chili Peppers
http://wooledge.org/~greg/     |

Ross Levis

2002-Mar-11 18:10 UTC

[vorbis] Obtaining tag-independent track uniqueness?

This is an interesting question which I can't fully answer.

I presume this technique is used for MP3 files in some file sharing apps
such as Morpheus & Audiogalaxy.  A similar routine will have to be
developed for Ogg files.

Something like this will not be too difficult.  I think all that is
needed is to omit all page headers and comments.  We would need to hear
from the ogg file format experts out there to confirm this.  I doubt it
is possible with existing library functions.
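
A hedged sketch of what "omit all page headers and comments" could look like without the libraries, by walking the Ogg pages by hand. It assumes a single, unmultiplexed Vorbis stream held in memory, does not verify page checksums, and treats the packet beginning with the bytes 0x03 "vorbis" as the comment header. The names `ogg_packets` and `vorbis_signature` and the use of SHA-256 are hypothetical.

```python
import hashlib


def ogg_packets(data):
    """Yield the packets of a single-stream Ogg file, without page headers."""
    pos = 0
    pending = bytearray()
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost Ogg page sync at offset %d" % pos)
        nsegs = data[pos + 26]                 # number of lacing values
        lacing = data[pos + 27:pos + 27 + nsegs]
        body = pos + 27 + nsegs                # first byte after the page header
        for seg_len in lacing:
            pending += data[body:body + seg_len]
            body += seg_len
            if seg_len < 255:                  # a value below 255 ends a packet
                yield bytes(pending)
                pending = bytearray()
        pos = body


def vorbis_signature(data, limit=300_000):
    """Return (hash of the first `limit` non-comment bytes, their total length)."""
    h = hashlib.sha256()
    hashed = 0
    total = 0
    for packet in ogg_packets(data):
        if packet.startswith(b"\x03vorbis"):
            continue                           # skip the comment (tag) header
        total += len(packet)
        take = packet[:max(0, limit - hashed)]
        h.update(take)
        hashed += len(take)
    return h.hexdigest(), total
```

Since the identification and setup headers are kept, two files with identical audio packets but different encoder setups would still hash differently; only the comment packet and page framing are ignored.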

Regards,
Ross Levis.

