Jens Kraemer
2006-Mar-25 13:33 UTC
[Ferret-talk] [ANN] RDig - ferret-based website crawler/indexer
Hi!

RDig is a small tool to build a Ferret index for the contents of a website or intranet. It contains a simple HTTP crawler and some support for extracting textual content from the fetched pages.

I built this to implement a site-wide search for a recent project that combined a Rails application with lots of static HTML files generated by a CMS.

Any feedback is very welcome!

Rubyforge project page: http://rubyforge.org/projects/rdig
RDocs: http://rdig.rubyforge.org/

`gem install rdig` should work once the gem has reached the Rubyforge mirrors.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
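For readers new to the idea, the crawl-and-extract step described above can be sketched roughly as follows. This is not RDig's code and does not use its API. The helper names `extract_text` and `extract_links` are hypothetical, and the regex-based HTML handling is an assumption made only to keep the sketch within the Ruby standard library.

```ruby
require "uri"

# Hypothetical helper: reduce an HTML page to plain text suitable for indexing.
def extract_text(html)
  html
    .gsub(%r{<(script|style)\b.*?</\1>}mi, " ") # drop script/style blocks entirely
    .gsub(/<[^>]+>/, " ")                       # drop the remaining tags
    .gsub(/\s+/, " ")                           # collapse runs of whitespace
    .strip
end

# Hypothetical helper: collect absolute, same-host links a crawler could visit next.
def extract_links(html, base_url)
  base_host = URI.parse(base_url).host
  hrefs = html.scan(/<a\b[^>]*\bhref\s*=\s*["']([^"'#]+)/i).flatten
  hrefs.filter_map do |href|
    begin
      uri = URI.join(base_url, href) # resolve relative links against the page URL
    rescue URI::Error
      next                           # skip hrefs that are not valid URIs
    end
    uri.to_s if uri.host == base_host
  end.uniq
end
```

A real crawler would fetch each page over HTTP, pass the body through helpers like these, hand the extracted text to the indexer, and queue the returned links. A proper HTML parser would be more robust than regular expressions for anything beyond a quick experiment.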
Jan Prill
2006-Mar-25 15:30 UTC
[Ferret-talk] [ANN] RDig - ferret-based website crawler/indexer
Hi Jens,

great stuff. I just installed it and ran a short test as described in the README. It works as announced. Thanks for sharing this!

The crawler has problems with frames, but that is quite a common problem. I had to point it at the main content frame.

You probably know Nutch already, but here is a pointer anyway, in case you are looking for some inspiration: http://lucene.apache.org/nutch/

Nutch is a great tool for web crawling. I've used it and it worked great...

Best Regards
Jan Prill