Hello,
I've been struggling with a scraper that was getting an exception,
"end of file reached".
The solution for me:
The exception was happening after agent.get(uri), when the server
returned a 302 redirect. The "Location" header in the redirect
response looked like this:
    httpS://blah.blahdy-blah.com/blahblahblah/blah.html
For reasons that are not clear to me, the capitalized "httpS" scheme
was throwing Mechanize into a bit of a tail-spin.
To test that adjusting the location to "https://..." would fix the
problem, I made this ugly little test method:
def get_blah(uri)
  save_redirect_ok = @web_agent.redirect_ok
  @web_agent.redirect_ok = false  # follow redirects by hand so we can inspect/fix the Location
  begin
    pg = @web_agent.get(uri)
    while ["301", "302"].include?(pg.code)
      uri = pg.response['location']
      uri = uri.gsub(/^httpS:/, 'https:') # <--- TAH DAH! Fixed it!
      @web_agent.log.info("Redirecting to #{uri}") if @web_agent.log
      pg = @web_agent.get(uri)
    end
    @page = pg
  ensure
    @web_agent.redirect_ok = save_redirect_ok
  end
end
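
For context, the method lives in a scraper class where @web_agent is a
Mechanize instance; a stripped-down call site would look something like
this (the class name and URL are just made up for illustration):

    require 'mechanize'

    class BlahScraper
      def initialize
        @web_agent = Mechanize.new
      end

      # get_blah as defined above would go here.

      def run
        get_blah('http://blah.blahdy-blah.com/blahblahblah/blah.html')
        puts @page.title if @page  # @page is set by get_blah
      end
    end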
So, in case anyone else comes across a mysterious end-of-file
condition, you might check that your redirects are valid URLs, and
deal with them if they're not.
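
If you want something a bit more general than my gsub, a check along
these lines with Ruby's standard uri library should do it
(normalize_location is just a name I'm making up here, not part of
Mechanize):

    require 'uri'

    # Downcase the scheme (e.g. "httpS" -> "https") before following
    # the redirect. Scheme names are case-insensitive per RFC 3986,
    # so this doesn't change where the request actually goes.
    def normalize_location(location)
      uri = URI.parse(location)
      uri.scheme = uri.scheme.downcase if uri.scheme
      uri.to_s
    end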
Cheers,
Aaron