wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-26 14:15 UTC
Mechanize out of buffer space
I am trying to scrape a site and then its child pages to get data that I
relate in tables. The only problem is that I keep getting an "OUT OF BUFFER
SPACE" error. Is there a way to clear the buffer after each iteration,
or am I doing something wrong?
Here's the code:
require 'rubygems'
require 'mechanize'
require 'active_record'

ActiveRecord::Base.establish_connection(
  # connection goes here
)

class Major < ActiveRecord::Base
  has_many :courses
end

class Course < ActiveRecord::Base
  belongs_to :major
end
class Sections
  def scrape(url)
    agent = WWW::Mechanize.new
    page = agent.get(url)
    table = (page/'//table')[6]
    (table/"tr").each do |major|
      @newMajor = Major.new
      @newMajor.title = (major/'//td').first.inner_html
      @newMajor.abbrev = (major/'acronym').inner_html
      @newMajor.link_to = (major/'a').to_s.split('"')[1]
      # Print the values just assigned to the new record
      puts @newMajor.title, @newMajor.abbrev, @newMajor.link_to
    end
  end
end
class Classes
  attr_writer :major_id

  def scrape(url)
    agent = WWW::Mechanize.new
    page = agent.get("http://courses.tamu.edu/" + url.to_s)
    (page/"//td[@class='sectionheading']").each do |course|
      course = course.inner_html.strip.split(' ')
      course.pop
      @newCourse = Course.new
      @newCourse.major_id = @major_id
      @newCourse.course_no = course[1]
      @newCourse.name = course.slice!(3, course.length).join(' ')
      @newCourse.save
    end
  end
end
AllMajors = Major.find(:all)
AllMajors.each do |course|
  start = Time.now
  newClass = Classes.new
  newClass.major_id = course.id
  newClass.scrape(course.link_to)
  puts "Added courses for #{course.title}"
  finish = Time.now
  puts "Took #{finish - start} seconds"
end
puts "Finished scraping courses"
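For readers following the heading-parsing step in Classes#scrape, here is a minimal standalone sketch of the same split/pop/slice logic, with no Mechanize or database involved. The sample heading text and the parse_heading helper are hypothetical, made up for illustration; the real markup on the scraped site may look different, and the guess that the dropped last token is a credit-hours field is only an assumption.

```ruby
# Hypothetical heading text (an assumption, not taken from the real site).
SAMPLE_HEADING = "  ACCT 209 - Survey of Accounting Principles (3)  "

# Hypothetical helper mirroring the parsing done inside Classes#scrape:
# split on spaces, drop the last token, take index 1 as the course number,
# and join everything from index 3 onward as the course name.
def parse_heading(text)
  words = text.strip.split(' ')
  words.pop # drop the trailing token (assumed here to be credit hours)
  {
    :dept      => words[0],
    :course_no => words[1],
    # words[3..-1] is nil when there are fewer than three tokens left
    :name      => (words[3..-1] || []).join(' ')
  }
end

parsed = parse_heading(SAMPLE_HEADING)
puts parsed[:course_no]
puts parsed[:name]
```

Keeping the parsing in a small method like this makes it possible to check it against a few sample strings before running the full scrape.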
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
wbsmith83-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2007-Jan-26 17:08 UTC
Re: Mechanize out of buffer space
After delving into the actual Hpricot source, it turns out there's a predefined buffer size, and you can't change it without editing the source and recompiling.