Hi, guys:

Can you please recommend a good crawler for Ferret? Nutch is pretty powerful on the Java side; do we have something similar in Ruby? It would be great if the crawler also handled incremental index updates easily.

Thanks

Victor
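For readers wondering what "incremental index update" involves on the crawler side, here is a minimal sketch. It assumes a content-fingerprint approach: remember a digest of each page's body, re-index only pages that are new or whose digest changed, and report URLs that disappeared so they can be deleted from the index. `IncrementalIndexer` is a hypothetical name. This is not the API of RDig, Suckr, or Ferret, and it is not necessarily how those tools do it. The actual indexing is left to the caller's block.

```ruby
require "digest/sha1"

# Hypothetical helper (not part of Ferret, RDig, or Suckr): remembers a
# SHA-1 fingerprint per URL so that only new or changed pages are
# handed to the indexing block on each crawl pass.
class IncrementalIndexer
  def initialize
    @fingerprints = {} # url => hex digest of the body last indexed
  end

  # Yields (url, body) to the caller's block only when the page is new
  # or its content changed. Returns :added, :updated, or :unchanged.
  def process(url, body)
    digest = Digest::SHA1.hexdigest(body)
    previous = @fingerprints[url]
    return :unchanged if previous == digest

    yield url, body # caller adds or replaces the document in its index
    @fingerprints[url] = digest
    previous.nil? ? :added : :updated
  end

  # URLs indexed in an earlier pass but not seen in this one; their
  # documents are candidates for deletion from the index.
  def stale(urls_seen_this_pass)
    @fingerprints.keys - urls_seen_this_pass
  end
end
```

Keying on a content digest rather than on HTTP headers means pages are skipped even when a server does not send reliable `Last-Modified` or `ETag` values. The trade-off is that every page must still be fetched on each pass.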
On 19.03.2009, at 22:32, Huang, Zijian(Victor) wrote:
> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is
> pretty powerful on the Java side; do we have something similar
> in Ruby? It would be great if the crawler also handled incremental
> index updates easily.

RDig can do HTTP crawling, but it cannot really be compared with Nutch in features or performance, since it was designed for intranet use, say, indexing the web pages of a few hosts.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold
I wrote one called Suckr: http://goddard.net.nz/projects/suckr/

It does the crawling, including incremental updates, and provides a command-line search interface. I've had some periodic stability issues with it on the old Debian box I've been running it on myself, so please test thoroughly. There is some documentation in the README file.

Please let me know if you have any questions.

Cheers,
Tim

On Friday 20 March 2009 Huang, Zijian(Victor) wrote:
> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is pretty
> powerful on the Java side; do we have something similar in Ruby? It
> would be great if the crawler also handled incremental index updates
> easily.
>
> Thanks
>
> Victor
On Thu, 19 Mar 2009, Huang, Zijian(Victor) wrote:
> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is pretty
> powerful on the Java side; do we have something similar in Ruby? It
> would be great if the crawler also handled incremental index updates
> easily.

And then this shows up in my news feeds:

http://www.rubyinside.com/building-a-search-engine-in-200ish-lines-of-ruby-1655.html

I've not followed the links off it, though, so YMMV.

Hugh