I'd search freshmeat/sourceforge/cpan before doing anything... there might
already be a library or package for general-purpose web-grepping.
If not, you can probably use an "HTML to ASCII" converter (or some
Perl/awk tag-stripping construct), which will reduce the fluff in the pages.
I find it easier to throw away data I know I _don't_ want first (though if
I knew regular expressions better, I probably wouldn't need this crutch :-).
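As a rough illustration of the tag-strip approach (in Python rather than Perl/awk), the sketch below uses the standard library's `html.parser` to throw away the markup and keep only the visible text, then looks for the publisher with a single regular expression. The `Label:` field it searches for is an assumption about how a vendor page might be laid out, not the format of any real site; you'd adapt `LABEL_RE` after looking at a few saved result pages.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, dropping all tags."""

    # The contents of these elements are never shown to the reader.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside <script>/<style>.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_lines(html):
    """Reduce an HTML page to a list of non-empty text chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical pattern: assumes the page shows "Label: Some Records",
# or a "Label:" cell followed by a separate cell holding the value.
LABEL_RE = re.compile(r"^Label:\s*(.*)$", re.IGNORECASE)


def find_label(html):
    """Return the first label value found in the page text, or None."""
    lines = html_to_lines(html)
    for i, line in enumerate(lines):
        match = LABEL_RE.match(line)
        if not match:
            continue
        value = match.group(1).strip()
        if value:
            return value
        # "Label:" stood alone (e.g. in its own table cell): use the next chunk.
        if i + 1 < len(lines):
            return lines[i + 1]
    return None
```

Matching against plain text rather than raw HTML means a purely cosmetic markup change on the vendor's side is less likely to break the pattern, although a change in wording still will.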
I'd worry about how to download the pages as the very last thing (just save
yourself a few documents using the web browser).
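Following the "download last" advice, a minimal sketch might feed the parsing code from pages saved by the browser, and only wire in fetching once that works. `build_query_url`, `fetch_page` and the `artist`/`track` parameter names are hypothetical; a real vendor's search URL will look different.

```python
import pathlib
import urllib.parse
import urllib.request


def load_saved_page(path):
    """Read an HTML page previously saved from a web browser."""
    # errors="replace" keeps going if the page isn't valid UTF-8.
    return pathlib.Path(path).read_text(encoding="utf-8", errors="replace")


def build_query_url(base_url, **query):
    """Build a search URL (hypothetical layout: base?artist=...&track=...)."""
    return base_url + "?" + urllib.parse.urlencode(query)


def fetch_page(base_url, **query):
    """Download a results page; only worth adding once parsing works."""
    url = build_query_url(base_url, **query)
    with urllib.request.urlopen(url, timeout=30) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Because both helpers return the same kind of string, the extraction code doesn't care whether a page came from disk or from the network.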
If you're not on UNIX, I'd recommend investigating Cygwin. It's
overkill for this task, but it has many other uses.
-Scott
> -----Original Message-----
> From: Leo Currie [mailto:leo.currie@strath.ac.uk]
> Sent: Thursday, November 21, 2002 12:04 PM
> To: icecast@xiph.org
> Subject: Re: [icecast] [OT] Online music database
>
>
> Scott Prive wrote:
> > (Already noting the posts that say this is not available)
> >
> > Are you willing to deal with "non-authoritative" answers in your query?
>
> Yep. Basically, we're running a student 'radio' station (online, mainly
> music, multiple streams), most of which will be playlisted. At the
> moment I'm cobbling together a script to let folk upload .ogg files to
> the database. It presents them with a form containing all the tags from
> the file, and a few other details we need (including the publisher), for
> them to confirm / edit before it's stored.
> The publisher details are required for our returns to the music
> licensing people (who will no doubt double-check everything!), so it
> would be nice for the script to at least _guess_ at which label the
> track was licensed by.
>
> > If so, you can write some scripts that talk CGI to a commercial music
> > vendor. I use a (no longer available) MP3 tagger called "MP3 Internet
> > Renamer". It would go out and request a web page from the "All Music
> > Guide" ( http://www.allmusic.com/ ).
>
> Sounds good. My skills aren't up to it, but there is only one way to
> learn...
>
> > The application would then parse out information for the tags. The
> > tagger is no longer available (and was closed source anyway), so you'll
> > need to write your own regular expressions/parser, but this is very
> > doable using a lightweight web browser such as "links" (or lynx, w3m,
> > etc).
>
> I wonder if I could do it using libwww-perl?
>
> > This method breaks if the music vendor has incorrect information, or
> > if they change the page formatting sufficiently that your text search
> > no longer finds the data.
>
> I'll give it a shot! I guess it's just a case of taking a close look at
> the HTML results from a bunch of queries, and working out a strategy for
> extracting the right info.
> Thanks for the link! I'll let you know how I get on...
>
> Leo
>
>
> >
> >
> >
> >>-----Original Message-----
> >>From: Leo Currie [mailto:leo.currie@strath.ac.uk]
> >>Sent: Wednesday, November 20, 2002 7:04 AM
> >>To: icecast@xiph.org
> >>Subject: [icecast] [OT] Online music database
> >>
> >>
> >>Sorry this isn't really icecast related, but
> >>I'm looking for an online music database that will let me search by
> >>artist+track and retrieve the likely record label / music publisher
> >>that owns the copyright.
> >>I want to do this from within a script as well :)
> >>Hoping somebody might know of one off the top of their head...
> >>
> >>Cheers
> >>
> >>Leo
> >>
> >>
> >>
> >
> > --- >8 ----
> > List archives: http://www.xiph.org/archives/
> > icecast project homepage: http://www.icecast.org/
> > To unsubscribe from this list, send a message to 'icecast-request@xiph.org'
> > containing only the word 'unsubscribe' in the body. No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.
>