Hello,
I've been struggling with a scraper that was getting an exception,
"end of file reached".
The solution for me:
The exception was happening after agent.get(uri), when the server
returned a 302 redirect. The "Location" header in the redirect
response looked like this:
    httpS://blah.blahdy-blah.com/blahblahblah/blah.html
For reasons that are not clear to me, the capitalized "httpS" scheme
was throwing Mechanize into a bit of a tail-spin.
To test that adjusting the location to "https://..." would fix the
problem, I made this ugly little test method:
def get_blah(uri)
  save_redirect_ok = @web_agent.redirect_ok
  @web_agent.redirect_ok = false  # follow redirects by hand so we can inspect/fix the Location
  begin
    pg = @web_agent.get(uri)
    while ["301", "302"].include?(pg.code)
      uri = pg.response['location']
      uri = uri.gsub(/^httpS:/, 'https:') # <--- TAH DAH! Fixed it!
      @web_agent.log.info("Redirecting to #{uri}") if @web_agent.log
      pg = @web_agent.get(uri)
    end
    @page = pg
  ensure
    @web_agent.redirect_ok = save_redirect_ok
  end
end
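
For context, the method lives in a scraper class where @web_agent is a
Mechanize instance; a stripped-down call site would look something like
this (the class name and URL are just made up for illustration):

    require 'mechanize'

    class BlahScraper
      def initialize
        @web_agent = Mechanize.new
      end

      # get_blah as defined above would go here.

      def run
        get_blah('http://blah.blahdy-blah.com/blahblahblah/blah.html')
        puts @page.title if @page  # @page is set by get_blah
      end
    end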
So, in case anyone else comes across a mysterious end-of-file
condition, you might check that your redirects are valid URLs, and
deal with them if they're not.
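
If you want something a bit more general than my gsub, a check along
these lines with Ruby's standard uri library should do it
(normalize_location is just a name I'm making up here, not part of
Mechanize):

    require 'uri'

    # Downcase the scheme (e.g. "httpS" -> "https") before following
    # the redirect. Scheme names are case-insensitive per RFC 3986,
    # so this doesn't change where the request actually goes.
    def normalize_location(location)
      uri = URI.parse(location)
      uri.scheme = uri.scheme.downcase if uri.scheme
      uri.to_s
    end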
Cheers,
Aaron