thr3ads.net - Rails - [ANN] scRUBYt! 0.2.0 - WWW::Mechanize and Hpricot on steroids [Feb 2007]

Peter Szinek
2007-Feb-05 23:40 UTC
[ANN] scRUBYt! 0.2.0 - WWW::Mechanize and Hpricot on steroids

Hello,

I am pleased to announce the first public release of scRUBYt!, an
easy-to-learn and easy-to-use, yet very powerful, web extraction framework
written in Ruby. Details follow from the README:

================================================
scRUBYt! - WWW::Mechanize and Hpricot on steroids
================================================
Navigate the Web and extract, query, transform and save relevant data
from the Web pages you are interested in, using a concise and easy-to-use DSL.

Do you think that Mechanize and Hpricot are powerful libraries? You're
right, they are indeed - hats off to their authors: without these libs
scRUBYt! could not exist! I have been wondering whether their
functionality could be enhanced even further - so I took these two
powerful ingredients, threw in a handful of smart heuristics, wrapped
them in a chunky DSL coating and sprinkled the whole thing with
lots of convention over configuration(tm) goodies - and... enter
scRUBYt! and decide for yourself.

==================================================
Wait... why do we need one more web-scraping toolkit?
==================================================
After all, we have Hpricot, and Rubyful-soup, and Mechanize, and scrAPI,
and ARIEL and scrapes and... Well, because scRUBYt! is different. It has
an entirely different philosophy, underlying techniques, theoretical
background, use cases, todo list, real-life scenarios etc. - in short, it
should be used in different situations with different requirements than
the previously mentioned ones.

If you need something quick and/or would like to have maximal control
over the scraping process, I recommend Hpricot. Mechanize shines when it
comes to interaction with Web pages. Since scRUBYt! operates on XPaths,
sometimes you will choose scrAPI because CSS selectors will
better suit your needs. The list goes on and on, boiling down to the
good old mantra: use the right tool for the right job!

I hope there will also be times when you will want to experiment with
Pandora's box and reach for the power of scRUBYt! :-)

================================
Sounds fine - show me an example!
================================
Let's apply the "show, don't tell" principle. Okay, here we go:
----------------------------------------
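require 'rubygems'
require 'scrubyt'

ebay_data = Scrubyt::Extractor.define do

   # Navigation: load eBay, search for 'ipod', narrow down the results
   fetch 'http://www.ebay.com/'
   fill_textfield 'satitle', 'ipod'
   submit
   click_link 'Apple iPod'

   # Extraction: one example record describes what to look for
   record do
     item_name 'APPLE NEW IPOD MINI 6GB MP3 PLAYER SILVER'
     price '$71.99'
   end
   # Follow the 'Next >' link, for at most 5 result pages
   next_page 'Next >', :limit => 5

end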
----------------------------------------
output:

<root>
     <record>
       <item_name>APPLE IPOD NANO 4GB - PINK - MP3 PLAYER</item_name>
       <price>$149.95</price>
     </record>
     <record>
       <item_name>APPLE IPOD 30GB BLACK VIDEO/PHOTO/MP3 PLAYER</item_name>
       <price>$172.50</price>
     </record>
     <record>
       <item_name>NEW APPLE IPOD NANO 4GB PINK MP3 PLAYER</item_name>
       <price>$171.06</price>
     </record>
     <!-- another 200+ results -->
</root>

This was a relatively beginner-level example (scRUBYt! knows a lot more
than this, and there are much more complicated extractors than the one
above) - yet it did a lot of things automagically. First of all, it
automatically loaded the page of interest (by going to ebay.com,
automatically searching for iPods and narrowing down the results by
clicking on 'Apple iPod'), then it extracted all the items that looked
like the specified example (which, btw, also described what the output
structure should look like) - on the first 5 result pages. Not so bad
for about 10 lines of code, eh?
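
To give a rough feel for what "extracting all the items that look like
the specified example" can mean, here is a minimal sketch. It is not
scRUBYt!'s actual implementation, just one simple heuristic written with
REXML: find the element holding the example text, note its tag path and
its position among its siblings, then collect every element with the
same signature. The sample page and the names SAMPLE_PAGE, signature and
extract_like are made up for this illustration.

```ruby
require 'rexml/document'

# Hypothetical sample page (not real eBay markup).
SAMPLE_PAGE = <<~'XML'
  <html><body><table>
    <tr><td>APPLE IPOD NANO 4GB</td><td>$149.95</td></tr>
    <tr><td>APPLE IPOD 30GB BLACK</td><td>$172.50</td></tr>
    <tr><td>APPLE IPOD MINI 6GB</td><td>$71.99</td></tr>
  </table></body></html>
XML

# Signature of an element: the tag names from the root down to the
# element, plus the element's position among its parent's child elements.
def signature(el)
  names = []
  node = el
  until node.nil? || node.is_a?(REXML::Document)
    names.unshift(node.name)
    node = node.parent
  end
  position = el.parent.elements.to_a.index { |sibling| sibling.equal?(el) }
  [names, position]
end

# Locate the element whose text equals the example, then return the text
# of every element that shares its signature (in document order).
def extract_like(doc, example)
  sample = doc.root.find_first_recursive { |el| el.text.to_s.strip == example }
  return [] unless sample

  wanted = signature(sample)
  matches = []
  doc.root.each_recursive do |el|
    matches << el.text.to_s.strip if signature(el) == wanted
  end
  matches
end

doc = REXML::Document.new(SAMPLE_PAGE)
p extract_like(doc, 'APPLE IPOD MINI 6GB')
p extract_like(doc, '$71.99')
```

Given a single example price, the sketch returns all three prices,
because every price cell sits at the same place in its row. A real
extractor has to cope with much messier pages than this, which is where
the smart heuristics come in.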

=======================================
OK, OK, I believe you, what should I do?
=======================================
Check out the online README at:

http://scrubyt.rubyforge.org/files/README.html

There, scroll to the online version of this section (OK, OK, I believe
you, what should I do?) - there are plenty of links to get you started.

Enjoy!

Cheers,
Peter

__
http://www.rubyrailways.com
