Hey all! Just to preface, I am fairly new to RoR, and brand new to
using hpricot.
I am using the following code to scrape this xpath:
"/html/body/div/div[5]/div/div[2]/div[2]/div[2]"
from this url:
"http://www.greatnonprofits.org/"
Here is my code to do so (taken from igvita.com's related blog post):
*************
require 'rubygems'
require 'open-uri'
require 'hpricot'

@url = "http://www.greatnonprofits.org/"
@response = ''

begin
  # open-uri RDoc: http://stdlib.rubyonrails.org/libdoc/open-uri/rdoc/index.html
  open(@url, "User-Agent" => "Ruby/#{RUBY_VERSION}",
       "From" => "email-LLpXEq4AMUA@public.gmane.org",
       "Referer" => "http://www.igvita.com/blog/") { |f|
    puts "Fetched document: #{f.base_uri}"
    puts "\t Content Type: #{f.content_type}\n"
    puts "\t Charset: #{f.charset}\n"
    puts "\t Content-Encoding: #{f.content_encoding}\n"
    puts "\t Last Modified: #{f.last_modified}\n\n"
    # Save the response body
    @response = f.read
  }

  # Hpricot RDoc: http://code.whytheluckystiff.net/hpricot/
  doc = Hpricot(@response)

  # Retrieve content: print the HTML of every element matching the XPath
  puts((doc/"/html/body/div/div[5]/div/div[2]/div[2]/div[2]").to_html)
rescue StandardError => e
  # Report fetch or parse errors instead of aborting the load
  print e, "\n"
end
***************
In my irb terminal, I get this:
***************
irb(main):031:0> load 'greatnonprofitsscraper.rb'
Fetched document: http://www.greatnonprofits.org/
Content Type: text/html
Charset: utf-8
Content-Encoding:
Last Modified: Tue Mar 31 23:43:52 -0700 2009
=> true
***************
Does anyone know why nothing is printed for the XPath match? The same
code works with other URLs and XPaths. Can anyone tell me what is
different about www.greatnonprofits.org?
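
One way to narrow this down might be to test the XPath one step at a time and see where the match first comes back empty. The sketch below is only that, a sketch: the helper names xpath_prefixes and first_failing_prefix are made up for illustration and are not part of Hpricot. It assumes the path is absolute and that the block returns the number of matches for a given prefix.

```ruby
# Hypothetical helpers (not part of Hpricot): split an absolute XPath
# into its leading prefixes so each one can be tried on its own.

# Build every leading prefix of an absolute XPath, shortest first.
def xpath_prefixes(xpath)
  steps = xpath.split("/").reject { |s| s.empty? }
  (1..steps.length).map { |n| "/" + steps.first(n).join("/") }
end

# Return the first prefix for which the block reports zero matches,
# or nil when every prefix matches at least one element.
def first_failing_prefix(xpath)
  xpath_prefixes(xpath).find { |prefix| yield(prefix).zero? }
end
```

With the doc object from the script above, this could be called as first_failing_prefix("/html/body/div/div[5]/div/div[2]/div[2]/div[2]") { |p| (doc/p).size }. If a prefix is returned, the raw HTML that open-uri fetched probably has a different structure at that step than the one the XPath was copied from.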
Thanks a million! I am quite frustrated, and I appreciate any help!!!