[Cross posted to Ruby on Rails Forum and Mechanize mailing list.]
I'm using Mechanize for page scraping (Ruby 1.9.2 / Rails 3.0.5 /
Mechanize 2.0.1). I'm seeing a case where a single
agent.get(url)
generates two HTTP GETs. Why is this happening?
The response to the first GET is a 200 (no redirect) and doesn't have
any meta-refresh. I don't see why Mechanize is issuing the second GET
(which happens to fail with an EOFError caused by a Content-Length /
body length mismatch).
Details: I'm using the nifty Charles web proxy debugger to monitor
browser / server interactions.
====In the original browser + server exchange, I see:
Req: POST /login/Login HTTP/1.1
Rsp: sets two cookies + HTTP/1.1 302 Moved Temporarily =>
https://online.nationalgridus.com/eservice_enu/
Req: GET /eservice_enu/ HTTP/1.1
Rsp: sets a cookie + HTTP/1.1 200 OK
The body contains onLoad JavaScript that sets this.location to
'start.swe?SWECmd=Start'
Req: GET /eservice_enu/start.swe?SWECmd=Start HTTP/1.1
Rsp: sets four cookies + HTTP/1.1 200 OK
====In the Mechanize + server exchange:
My code: page2 = agent.submit(login_form)
Req: POST /login/Login HTTP/1.1
Rsp: sets two cookies + HTTP/1.1 302 Moved Temporarily =>
https://online.nationalgridus.com/eservice_enu/
Req: GET /eservice_enu/ HTTP/1.1
Rsp: sets a cookie + HTTP/1.1 200 OK
The body contains onLoad JavaScript that sets this.location to
'start.swe?SWECmd=Start', but Mechanize can't follow
that automatically.
So I do an agent.get() to emulate it:
My code: page3 =
agent.get("https://online.nationalgridus.com/eservice_enu/start.swe?SWECmd=Start")
Req: GET /eservice_enu/start.swe?SWECmd=Start HTTP/1.1
Rsp: sets four cookies + HTTP/1.1 200 OK
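
The URL passed to agent.get() above is hard-coded. A minimal sketch of deriving it from the previous response body instead, using only the Ruby standard library. The helper names, the regex, and the sample body are assumptions made for illustration; the real page's onLoad script may be written differently.

```ruby
require 'uri'

# Hypothetical helper: pull the target of a `location = '...'`
# assignment out of an HTML body. The regex is an assumption about
# how the onLoad script is written; adjust it to the real markup.
def js_location_target(body)
  match = body.match(/location(?:\.href)?\s*=\s*['"]([^'"]+)['"]/)
  match && match[1]
end

# Hypothetical helper: resolve that target against the URL of the
# page that contained the script. Returns nil if no target is found.
def resolve_js_redirect(base_url, body)
  target = js_location_target(body)
  target && URI.join(base_url, target).to_s
end

# Sample body, invented for illustration only.
sample_body = %q{<body onLoad="this.location = 'start.swe?SWECmd=Start'">}

puts resolve_js_redirect('https://online.nationalgridus.com/eservice_enu/', sample_body)
```

The resulting string could then be handed to agent.get() in place of the literal URL.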
Note that at this point the browser-driven and Mechanize-driven
interactions appear to be identical. But Mechanize appears to generate
another GET all by itself:
Req: GET /eservice_enu/start.swe?SWECmd=Start HTTP/1.1
Rsp: sets four cookies + HTTP/1.1 200 OK
... and this response throws an EOFError:
Content-Length (536) does not match response body length (524) -
EOFError
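
For reference, the condition that message describes can be reproduced with a few lines of plain Ruby. This is only a sketch of the comparison, not Mechanize's own code. The helper name is hypothetical and the 524-byte body is a stand-in. It shows that the declared and received lengths disagree, not which side is at fault.

```ruby
# Hypothetical helper: return a description of the mismatch when the
# declared Content-Length differs from the bytes actually received,
# or nil when they agree (or when no length was declared).
def content_length_mismatch(declared, body)
  actual = body.bytesize
  return nil if declared.nil? || declared.to_i == actual
  "Content-Length (#{declared.to_i}) does not match response body length (#{actual})"
end

body = 'x' * 524 # stand-in for a body that arrived 12 bytes short
puts content_length_mismatch('536', body)
```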
====So: Why did Mechanize generate that last GET without me asking it to?
Did the EOFError actually occur on the first GET, with Mechanize then
retrying? If so, how do I work around the length mismatch?
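
To make the retry hypothesis concrete, here is a toy model of it in plain Ruby. Everything in it (FakeClient, get_with_one_retry) is hypothetical and stands in for whatever the HTTP layer actually does. It only shows that a retry-once-on-EOFError policy would turn one call into two GETs followed by the error, which matches the trace above. It does not establish that Mechanize behaves this way.

```ruby
# Hypothetical stand-in for an HTTP client; it only records requests.
class FakeClient
  attr_reader :requests

  def initialize
    @requests = []
  end

  # Every read fails, as if the body always arrived short.
  def get(url)
    @requests << "GET #{url}"
    raise EOFError, 'Content-Length (536) does not match response body length (524)'
  end
end

# Retry once on EOFError, then let the error propagate to the caller.
def get_with_one_retry(client, url)
  attempts = 0
  begin
    attempts += 1
    client.get(url)
  rescue EOFError
    retry if attempts < 2
    raise
  end
end

client = FakeClient.new
begin
  get_with_one_retry(client, '/eservice_enu/start.swe?SWECmd=Start')
rescue EOFError => e
  puts "#{client.requests.size} requests, then: #{e.message}"
end
```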