I need to parse Wikipedia articles (formatted in the Wikipedia style) and redisplay them as HTML. Has anyone come across a Ruby library that is good at that?

Thanks

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
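For the wiki-markup half of the question, here is a rough Ruby sketch of what "parse and redisplay as HTML" could look like for a very small subset of the markup. Everything in it is an assumption made for illustration: the `MiniWikitext` module and its methods are hypothetical names, not an existing library, and it only understands `== headings ==`, `'''bold'''`, `''italic''`, `[[links]]` and plain paragraphs. Templates, tables, lists and references are ignored, so real articles need a far more complete parser.

```ruby
# Hypothetical helper, not an existing gem: converts a tiny subset of
# wiki markup to HTML using only the Ruby standard library.
module MiniWikitext
  # Assumption: internal links should point at English Wikipedia.
  BASE_URL = "http://en.wikipedia.org/wiki/"

  # Escape characters that are special in HTML before adding our own tags.
  def self.escape(text)
    text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
  end

  # Build an anchor tag; spaces in the target become underscores.
  def self.link(target, label)
    %(<a href="#{BASE_URL}#{target.strip.tr(' ', '_')}">#{label}</a>)
  end

  # Convert inline markup: links first, then bold, then italic.
  def self.inline(text)
    html = escape(text)
    html = html.gsub(/\[\[([^\]|]+)\|([^\]]+)\]\]/) { link($1, $2) }
    html = html.gsub(/\[\[([^\]|]+)\]\]/) { link($1, $1) }
    html = html.gsub(/'''(.+?)'''/, '<b>\1</b>')
    html.gsub(/''(.+?)''/, '<i>\1</i>')
  end

  # Walk the source line by line: heading lines become <hN>, and runs of
  # other non-blank lines are joined into one <p>.
  def self.to_html(source)
    out = []
    paragraph = []
    flush = lambda do
      out << "<p>#{inline(paragraph.join(' '))}</p>" unless paragraph.empty?
      paragraph = []
    end

    source.each_line do |line|
      line = line.strip
      if line =~ /\A(={2,6})\s*(.+?)\s*\1\z/
        level, title = $1.length, $2
        flush.call
        out << "<h#{level}>#{inline(title)}</h#{level}>"
      elsif line.empty?
        flush.call
      else
        paragraph << line
      end
    end
    flush.call
    out.join("\n")
  end
end

puts MiniWikitext.to_html("== History ==\n'''Ruby''' is a ''dynamic'' language.")
```

The regex approach is chosen only because it fits in a few lines; it does not handle nested or unbalanced markup.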
David wrote:
> I need to parse and redisplay in html wikipedia articles (formatted
> with the wikipedia style). Has anyone encountered such a library in
> ruby ? Any libraries that are good at that?
>
> Thanks

Check out http://shanesbrain.net/articles/2006/10/02/screen-scraping-wikipedia
Makes it dead easy to roll your own.

Chris
---------------------------------------
http://www.autopendium.co.uk
Stuff about old cars
Usually you shouldn't use bots on Wikipedia; you should download the free database instead and use that. Read about their policy here: http://en.wikipedia.org/wiki/Wikipedia:Bots

If you have your own MediaWiki install and want to use a bot, you can check out pywikipediabot: http://sourceforge.net/projects/pywikipediabot/
It's not in Ruby, but it works great.

On Apr 12, 2007, at 8:24 AM, David wrote:
> I need to parse and redisplay in html wikipedia articles (formatted
> with the wikipedia style). Has anyone encountered such a library in
> ruby ? Any libraries that are good at that?
>
> Thanks
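To illustrate the database-download route, here is a minimal Ruby sketch that pulls page titles and raw wiki text out of a MediaWiki-style XML export using REXML. The `SAMPLE_DUMP` string and the `pages_from_dump` method are made up for this example, and the sample is heavily simplified: real dumps carry an XML namespace and much more metadata per page, and they are far too large to load into memory in one go, so a streaming parser would be needed in practice.

```ruby
require "rexml/document"

# Simplified stand-in for a MediaWiki XML export (assumption: real dumps
# use the same page/title/revision/text nesting, plus much more metadata).
SAMPLE_DUMP = <<~XML
  <mediawiki>
    <page>
      <title>Ruby</title>
      <revision>
        <text>'''Ruby''' is a programming language.</text>
      </revision>
    </page>
    <page>
      <title>Hpricot</title>
      <revision>
        <text>'''Hpricot''' is an HTML parser.</text>
      </revision>
    </page>
  </mediawiki>
XML

# Hypothetical helper: returns a hash of page title => raw wiki text.
# Loads the whole document into memory, so it only suits small files.
def pages_from_dump(xml)
  doc = REXML::Document.new(xml)
  pages = {}
  doc.elements.each("mediawiki/page") do |page|
    title = page.elements["title"].text
    text  = page.elements["revision/text"].text
    pages[title] = text
  end
  pages
end

pages_from_dump(SAMPLE_DUMP).each do |title, text|
  puts "#{title}: #{text}"
end
```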
Actually, I'm not entirely sure that you shouldn't use bots at all on Wikipedia. According to the link you provided:

"*Robots* or *bots* are automatic processes <http://en.wikipedia.org/wiki/Process_%28computing%29> that interact with Wikipedia as though they were human editors"

That last bit sounds like they're talking about a very specific kind of bot and not just a scraper.

RSL

On 4/12/07, Andy Triboletti <andy-kRVt9sEkMUDQT0dZR+AlfA@public.gmane.org> wrote:
> Usually you shouldn't use bots on wikipedia, but should download the
> free database instead and use that.
> Read about their policy here:
> http://en.wikipedia.org/wiki/Wikipedia:Bots
>
> If you have your own mediawiki install and want to use a bot, you can
> check out pywikipedia bot:
> http://sourceforge.net/projects/pywikipediabot/ It's not in ruby,
> but it works great.
njmacinnes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Apr-12 21:11 UTC
Re: Wikipedia Parser
"*Robots* or *bots* are automatic processes <http://en.wikipedia.org/wiki/Process_%28computing%29> that interact with Wikipedia as though they were human editors."

There's nothing against screen-scraping there. That policy is about bots that edit content. Otherwise, Google would be breaking WP policy. This is taking the discussion a little off topic, though.

-Nathan

On 12/04/07, Andy Triboletti <andy-kRVt9sEkMUDQT0dZR+AlfA@public.gmane.org> wrote:
> Usually you shouldn't use bots on wikipedia, but should download the
> free database instead and use that.
> Read about their policy here:
> http://en.wikipedia.org/wiki/Wikipedia:Bots
>
> If you have your own mediawiki install and want to use a bot, you can
> check out pywikipedia bot:
> http://sourceforge.net/projects/pywikipediabot/ It's not in ruby,
> but it works great.
I wrote that article a while ago. It would be interesting to use WWW::Mechanize or, better yet, scRUBYt, which use Hpricot in the backend anyway.

Shane
http://shanesbrain.net

On 4/12/07, Chris T <ctmailinglists-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> Check out
> http://shanesbrain.net/articles/2006/10/02/screen-scraping-wikipedia
> Makes it dead easy to roll your own.
> Chris
> ---------------------------------------
> http://www.autopendium.co.uk
> Stuff about old cars
If you just need to cache some pages for displaying later, screen scraping Wikipedia is a good choice compared to downloading the db. If you're going to be parsing and redisplaying the content in real time, that is against Wikipedia's policy. See http://en.wikipedia.org/wiki/Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F

On Apr 12, 2007, at 11:11 AM, njmacinnes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> "Robots or bots are automatic processes that interact with
> Wikipedia as though they were human editors." There's nothing
> against screen-scraping there. That policy is about bots which edit
> content. Otherwise, Google would be breaking WP policy.
> This is taking the discussion a little off topic though.
> -Nathan
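The "cache pages for displaying later" idea can be sketched in Ruby roughly as follows. The `PageCache` class is a hypothetical name invented for this example, and the fetcher block is a stand-in: in real use it would wrap whatever scraping code retrieves the article, while here it just builds a string so the sketch runs without network access. The point is only that each title is fetched once and later requests are served from disk rather than from Wikipedia at runtime.

```ruby
require "digest/sha1"
require "fileutils"
require "tmpdir"

# Hypothetical file-backed cache: each page is fetched once via the
# supplied block and then served from disk on later requests.
class PageCache
  def initialize(dir, &fetcher)
    @dir = dir
    @fetcher = fetcher
    FileUtils.mkdir_p(@dir)
  end

  # Return the cached HTML for a title, calling the fetcher only on a miss.
  def fetch(title)
    path = File.join(@dir, Digest::SHA1.hexdigest(title) + ".html")
    return File.read(path) if File.exist?(path)

    html = @fetcher.call(title)
    File.write(path, html)
    html
  end
end

# Usage with a stand-in fetcher (assumption: real code would scrape here).
fetch_count = 0
cache = PageCache.new(Dir.mktmpdir) do |title|
  fetch_count += 1
  "<h1>#{title}</h1>"
end

cache.fetch("Ruby")
cache.fetch("Ruby") # second call is served from disk
puts "fetcher was called #{fetch_count} time(s)"
```

Hashing the title for the file name sidesteps characters that are awkward in paths; there is no expiry here, so stale pages would have to be deleted by hand.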