I am trying to spider a site using Hpricot, but I keep getting an out-of-buffer error. It will only let me process about two pages at a time. Is there a way to clear the buffer after I process each page so I won't blow the buffer?
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
Can you post the code you are using?

On 1/24/07, wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> I am trying to spider a site using Hpricot, but I keep getting out of
> buffer error. It will only let me do about two sites at a time, is
> there a way to clear the buffer after I process each page so I won't
> blow the buffer?

--
Thanks,
-Steve
http://www.stevelongdo.com
wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-25 12:45 UTC
Re: Buffer problem
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'active_record'

ActiveRecord::Base.establish_connection(
  # connection info
)

class Major < ActiveRecord::Base
  has_many :courses
end

class Course < ActiveRecord::Base
  belongs_to :major
end

def scrape(url)
  doc = Hpricot(open(url))
  tables = (doc/"table")
  (tables[6]/"tr").each do |major|
    createMajor major
  end
end

def createMajor(data)
  newMajor = Major.new
  newMajor.title = data.search("td").first.inner_html
  newMajor.abbrev = data.search("acronym").inner_html
  newMajor.link_to = data.search("a").to_s.split('"')[1]
  puts newMajor.save
end

def courses(url)
  puts url
  doc = Hpricot(open("http://courses.tamu.edu/" + url.to_s))
  courses = (doc/"//td[@class='sectionheading']")
  courses.each do |course|
    createCourse course
  end
end

def createCourse(data)
  course = data.inner_html.strip.split(' ')
  major = course[0]
  course_no = course[1]
  puts major, course_no
  course.pop
  course_name = course.slice!(3, course.length).join(' ')
  puts course_name
end

AllMajors = Major.find(:all, :limit => 3, :offset => 0)
AllMajors.each do |major|
  # courses takes a single argument (the link), so only the link is passed
  courses(major.link_to)
end
#scrape(url goes here)

This is what I was last testing with. I had it where scrape would call
courses, but that broke the buffer before I even got any output. This
version outputs the data from two pages and then breaks.
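
One way to narrow this down is to handle the pages strictly one at a time and record which URL fails, instead of letting the first error stop the whole run. The following is only a sketch under stated assumptions, not a confirmed fix: each_page and raise_parse_buffer are hypothetical helper names, the fetch and handle arguments are plain callables so the loop can be tried without network access, and the buffer setting is only touched if the installed Hpricot actually responds to buffer_size= (an assumption about later Hpricot releases, not the one used in this thread).

```ruby
require 'open-uri'

# Hypothetical helper: enlarge Hpricot's parse buffer, but only when Hpricot
# is loaded and exposes a buffer_size= setter (assumed, not guaranteed).
# Returns true if the setting was applied, false otherwise.
def raise_parse_buffer(bytes = 262_144)
  return false unless defined?(Hpricot) && Hpricot.respond_to?(:buffer_size=)
  Hpricot.buffer_size = bytes
  true
end

# Hypothetical helper: fetch and handle one page at a time.
# Any error is recorded against its URL so the remaining pages still run.
# Returns a hash of url => error message for the pages that failed.
def each_page(urls, fetch:, handle:)
  failures = {}
  urls.each do |url|
    begin
      body = fetch.call(url)   # the page body as a String
      handle.call(url, body)   # e.g. parse with Hpricot(body) and save rows
    rescue StandardError => e
      failures[url] = e.message
    end
  end
  failures
end

# A fetcher using the block form of open-uri, so each handle is closed
# as soon as the body has been read.
FETCH_WITH_OPEN_URI = ->(url) { URI.open(url) { |io| io.read } }
```

With the script above, this could be called as each_page(links, fetch: FETCH_WITH_OPEN_URI, handle: ->(url, body) { ... }), where links is the list of link_to values. The returned hash then shows whether the error is tied to one particular page (for example, one with a very large tag) rather than to the number of pages processed.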
wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-26 11:54 UTC
Re: Buffer problem
Anyone? Someone has to know something about the buffer.