Hi, What is the best way to parse HTML? Or is there a simple way to convert a table to an array? I tried beautiful_soup and the built-in htmltools, but have trouble getting them to run. Any pointers? Thanks, Hari -- Posted via http://www.ruby-forum.com/.
My experience with Ruby's HTML parsing tools is that they are generally badly documented and buggy. I have had more success piping the HTML I need parsed to a Perl program that uses Perl's much, much better HTML libraries and then reading the output back into my Ruby program. It's ugly, but it works. Check out search.cpan.org to see what Perl modules are available. -Scott On 6/5/06, Hari Nara <nhariraj@yahoo.com> wrote:> What is the best way to parse HTML? > Or is there a simple way to convert a table to an array?
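For what it's worth, the Ruby side of the pipe-to-Perl approach could look roughly like the sketch below. It is only an illustration under assumptions: the script name extract_table.pl is hypothetical, the "one row per line, tab-separated cells" output format is an invented convention for the helper script, and it assumes a Ruby version that provides Open3.capture2.

```ruby
require 'open3'

# Sends +html+ to an external command on stdin and returns what it prints.
# +command+ is an argv array such as ["perl", "extract_table.pl"]; that
# script name is hypothetical and stands for whatever Perl parser you write.
def parse_with_external(html, command)
  output, status = Open3.capture2(*command, stdin_data: html)
  unless status.success?
    raise "#{command.first} exited with status #{status.exitstatus}"
  end
  output
end

# Assumed protocol (not from the thread): the helper prints one table row
# per line, with cells separated by tabs.
def rows_from_tsv(text)
  text.lines.map { |line| line.chomp.split("\t", -1) }
end

# Usage, assuming extract_table.pl exists and follows the protocol above:
#   html = File.read("page.html")
#   rows = rows_from_tsv(parse_with_external(html, ["perl", "extract_table.pl"]))
```

Passing the command as an array avoids shell quoting problems, and checking the exit status means a crashed helper shows up as an error instead of an empty result.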
I used html-tools: http://rubyforge.org/projects/ruby-htmltools/ and found it a pretty simple to use but powerful HTML parser library.
Thanks Scott for your reply. I am looking for a simple Ruby solution, as the page I am trying to parse has only a few tables in it. Igor, I tried to get html-tools to run but couldn't succeed :( I tried to run the ebaysearch.rb demo program and couldn't run it either. I am using Ruby 1.8 with RoR 1.1. Were you able to use html-tools? Can you share some simple sample code? Thanks, Hari
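If the page really has only a few simple tables, a plain-Ruby sketch like the one below might be enough to turn them into arrays. It is not a real HTML parser: the method name tables_to_arrays is made up for this example, and the regexes assume no nested tables and that every <tr>, <td> and <th> is explicitly closed. HTML entities are left undecoded.

```ruby
# Extracts every <table> in +html+ as an array of rows, each row being an
# array of cell strings. Regex-based, so it only copes with simple markup:
# no nested tables, and all <tr>/<td>/<th> tags explicitly closed.
def tables_to_arrays(html)
  html.scan(%r{<table\b[^>]*>(.*?)</table>}im).flatten.map do |table_body|
    table_body.scan(%r{<tr\b[^>]*>(.*?)</tr>}im).flatten.map do |row_body|
      row_body.scan(%r{<t[dh]\b[^>]*>(.*?)</t[dh]>}im).flatten.map do |cell|
        # Drop any inline tags inside the cell and collapse whitespace.
        cell.gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip
      end
    end
  end
end

# Usage:
#   tables = tables_to_arrays(File.read("page.html"))
#   first_table = tables[0]       # array of rows
#   first_cell  = first_table[0][0]
```

For anything messier than this (unclosed tags, nested tables), one of the parser libraries mentioned in this thread is the safer route.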
- Download the gem from http://rubyforge.org/projects/ruby-htmltools/
- Install it: gem install "htmltools-1[1].09.gem"
- Try the demo at http://ruby-htmltools.rubyforge.org/ or this one :-) :

    require 'html/tree'
    p = HTMLTree::Parser.new(true, false)
    p.feed("Hello")
    print p.html.children[0].children[0].children[0].to_s
Hi Igor, Have you quoted/attached something? I don't see it. Thanks, Hari
BeautifulSoup has been ported to Ruby as RubyfulSoup: http://www.crummy.com/software/RubyfulSoup/ It really works wonders when one must screen-scrape. cheers, jean-pierre