I am trying to spider a site using Hpricot, but I keep getting an "out of buffer" error. It will only let me process about two pages at a time. Is there a way to clear the buffer after I process each page so I won't blow the buffer?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
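Ruby has no call that frees one particular parsed page; the garbage collector reclaims it once nothing references it. A minimal sketch of the crawl loop the question describes (one page at a time) follows. It assumes the fetch step is passed in as a callable and the parsing happens in the block. The helper name `crawl_each` and its arguments are hypothetical, not part of Hpricot or open-uri. It also rescues errors per page, so one page that fails to fetch or parse is recorded instead of stopping the whole run.

```ruby
# Hypothetical helper, not part of Hpricot or open-uri.
# Fetches and processes one URL at a time. Each page's HTML goes out of
# scope when the block returns, so the garbage collector can reclaim it
# (unless the block itself keeps a reference to it). An exception raised
# while fetching or processing one page is recorded, and the loop moves on.
def crawl_each(urls, fetcher)
  failures = {}
  urls.each do |url|
    begin
      html = fetcher.call(url)   # e.g. lambda { |u| open(u).read } with open-uri
      yield url, html            # parse and store inside the block
    rescue StandardError => e
      failures[url] = "#{e.class}: #{e.message}"
    end
  end
  failures                       # URL => error description, empty if all succeeded
end
```

With the original script's requires, a call might look like `crawl_each(urls, lambda { |u| open(u).read }) { |url, html| doc = Hpricot(html) }`, after which the returned hash shows which pages failed and with what error.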
Can you post the code you are using?

On 1/24/07, wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> I am trying to spider a site using Hpricot, but I keep getting out of
> buffer error. [...]

--
Thanks,
-Steve
http://www.stevelongdo.com
wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-25 12:45 UTC
Re: Buffer problem
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'active_record'

ActiveRecord::Base.establish_connection(
  # connection info
)

class Major < ActiveRecord::Base
  has_many :courses
end

class Course < ActiveRecord::Base
  belongs_to :major
end

# Read the majors table (the 7th <table> on the page) and save one Major per row.
def scrape(url)
  doc = Hpricot(open(url))
  tables = (doc/"table")
  (tables[6]/"tr").each do |major|
    createMajor major
  end
end

# Build a Major from one table row: title, abbreviation and link.
def createMajor(data)
  newMajor = Major.new
  newMajor.title = data.search("td").first.inner_html
  newMajor.abbrev = data.search("acronym").inner_html
  # First quoted attribute value of the <a> tag, assumed to be the href.
  newMajor.link_to = data.search("a").to_s.split('"')[1]
  puts newMajor.save
end

# Fetch one major's course listing and handle each section heading.
def courses(url)
  puts url
  doc = Hpricot(open("http://courses.tamu.edu/" + url.to_s))
  courses = (doc/"//td[@class='sectionheading']")
  courses.each do |course|
    createCourse course
  end
end

# Split a section heading into major, course number and course name (printed only).
def createCourse(data)
  course = data.inner_html.strip.split(' ')
  major = course[0]
  course_no = course[1]
  puts major, course_no
  course.pop
  course_name = course.slice!(3, course.length).join(' ')
  puts course_name
end

all_majors = Major.find(:all, :limit => 3, :offset => 0)
all_majors.each do |major|
  courses(major.link_to)   # courses takes a single argument
end

#scrape(url goes here)

This is what I was last testing with. I had it set up so that scrape called courses, but that broke the buffer before I even got any output. This version outputs the data from two pages and then breaks.
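The string handling in `createCourse` can be pulled into a function that returns the fields instead of only printing them, which makes it easier to check on its own and to hand to `Course.new` later. This is a sketch under an assumption: the heading format is inferred only from the index arithmetic in `createCourse` (major code, course number, one separator token, the name words, then one trailing token). The names `parse_heading` and `CourseHeading`, and the sample heading below, are hypothetical.

```ruby
# Hypothetical helper mirroring createCourse's index arithmetic.
# Assumed heading shape (inferred from the script, not confirmed by the site):
#   MAJOR NUMBER <separator> COURSE NAME WORDS <trailing token>
CourseHeading = Struct.new(:major, :number, :name)

def parse_heading(text)
  tokens = text.strip.split   # split on runs of whitespace
  # Need major, number, separator, at least one name word and a trailing token.
  return nil if tokens.length < 5
  # Skip the separator (index 2) and the trailing token (index -1).
  CourseHeading.new(tokens[0], tokens[1], tokens[3...-1].join(' '))
end
```

For a made-up heading such as `"CSCE 121 - INTRO PGM DESIGN CONCEPT 4"`, this returns major `"CSCE"`, number `"121"` and name `"INTRO PGM DESIGN CONCEPT"`; a heading too short to contain every field returns `nil` rather than raising.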
wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-26 11:54 UTC
Re: Buffer problem
Anyone? Someone has to know something about the buffer.