Robert Poor
2012-Jul-23 22:58 UTC
[Mechanize-users] Mechanize::Agent#post_connect_hook response != Mechanize#parse response
I'm working through an idea for a db-backed cache for Mechanize#get(). The idea is to use a Mechanize::Agent#post_connect_hook to cache any fetched data, and to override Mechanize#get() in a subclass so that it checks the cache before calling super. I want to store the un-parsed (raw) page in the db, and call Mechanize#parse when there's a cache hit, something along these lines:

    class CachingMechanize < Mechanize

      def initialize
        super
        # Write every fetched response to the cache.
        self.agent.post_connect_hooks << lambda {|agent, uri, response, body|
          CachedWebPage.create!(:uri => uri.to_s, :response => response, :contents => body)
        }
      end

      def get(uri, parameters = [], referer = nil, headers = {})
        page = if (cached = CachedWebPage.find_by_uri(uri.to_s))
                 # cache hit
                 parse(cached.uri, cached.response, cached.contents)
               else
                 # cache miss -- post_connect_hook will write to cache
                 super
               end
        yield page if block_given?
        page
      end

    end

It turns out that Mechanize#parse needs an instance of Mechanize::Headers for its `response` argument. But in post_connect_hook, the `response` argument is an instance of Net::HTTPOK. So:

* I need to store enough information in the db cache so Mechanize#parse(uri, response, contents) can parse it. How do I convert a Net::HTTPOK into a Mechanize::Headers?

* Am I pursuing a fool's errand? That is, has someone already implemented this in some lovely gem?

[Minutia: There are lots of details I've glossed over: I don't want to cache responses that are the result of errors. I want to implement and honor an expires_at: timestamp in the cached record. I'll have to decide if chunked responses need special handling. Probably other things.]
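The two cache rules mentioned in the minutia above (skip error responses, honor an expiry timestamp) could be sketched roughly as follows. This is only an illustration in plain Ruby: `PageCache` and `CacheEntry` are hypothetical names, and the in-memory Hash stands in for the CachedWebPage table, whose real schema is not shown in this thread.

```ruby
# Hypothetical in-memory stand-in for a CachedWebPage record.
CacheEntry = Struct.new(:uri, :code, :contents, :expires_at)

# Minimal cache that refuses error responses and expires old entries.
class PageCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Store only successful (2xx) responses; anything else is skipped.
  def store(uri, code, contents, now = Time.now)
    return nil unless code.to_s.start_with?('2')
    @entries[uri.to_s] = CacheEntry.new(uri.to_s, code.to_s, contents, now + @ttl)
  end

  # Return a fresh entry, or nil on a miss or an expired entry.
  def fetch(uri, now = Time.now)
    entry = @entries[uri.to_s]
    return nil if entry.nil?
    if entry.expires_at <= now
      @entries.delete(uri.to_s)
      return nil
    end
    entry
  end
end
```

Passing `now` explicitly is a design choice that keeps the expiry logic easy to exercise without waiting for real time to pass.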
Robert Poor
2012-Jul-23 23:47 UTC
[Mechanize-users] Mechanize::Agent#post_connect_hook response != Mechanize#parse response
On Mon, Jul 23, 2012 at 3:58 PM, Robert Poor <rdpoor at gmail.com> wrote:
> ... various drivel...
> Mechanize#parse needs an instance of Mechanize::Headers for its `response` argument.

Mea culpa -- I was wrong: both the post_connect_hook method and Mechanize#parse receive an instance of Net::HTTPOK.

So my question becomes: how do I serialize and deserialize a Net::HTTPOK object so I can store and retrieve it from the db? Trying the obvious things with YAML and JSON doesn't work:

[1] YAML.load(YAML.dump(response)) raises "TypeError: allocator undefined for Proc."
[2] ActiveSupport::JSON.decode(response) returns a Hash, not Net::HTTPOK

(P.S.: I'm running Rails 3.2.1, Ruby 1.9.3).
Eric Hodel
2012-Jul-25 02:25 UTC
[Mechanize-users] Mechanize::Agent#post_connect_hook response != Mechanize#parse response
On Jul 23, 2012, at 15:58, Robert Poor wrote:
> I'm working through an idea for a db-backed cache for Mechanize#get().
> The idea is to use a Mechanize::Agent#post_connect_hook to cached any
> fetch data, and create a subclass of Mechanize::Agent#get() that
> checks the cache before calling super. I want to store the un-parsed
> (raw) page in the db, and call Mechanize#parse when there's a cache
> hit, something along these lines:

#content is what you'll want; it's the raw response from the server after handling Content-Encoding.

> [...]
>
> It turns out that Mechanize#parse needs an instance of
> Mechanize::Headers for its `response` argument. But in
> post_connect_hook, the `response` argument is an instance of
> Net::HTTPOK.

I'm responding without the source in front of me, but I think there's an easier way than this approach.

You should be able to implement a subclass of Mechanize::History, which will give you all the checking for free. Mechanize automatically checks the history for the page by URI and returns it from history when available. It doesn't follow HTTP caching rules properly, but it should be equivalent.

> So:
>
> * I need to store enough information in the db cache so
> Mechanize#parse(uri, response, contents) can parse it. How do I
> convert a Net::HTTPOK into a Mechanize::Header?

See below.

> * Am I pursuing a fools errand? That is, has someone already
> implemented this in some lovely gem?

Definitely possible. I haven't heard of anyone implementing such a thing.

> [Minutia: There are lots of details I've glossed over: I don't want to
> cache responses that are the result of errors. I want to implement
> and honor an expires_at: timestamp in the cached record. I'll have to
> decide if chunked responses need special handling. Probably other
> things.]

So long as it isn't an infinite stream, you shouldn't need to worry about chunking, zlib, etc. It should all be handled below the level you care about.

On Jul 23, 2012, at 16:47, Robert Poor wrote:
> On Mon, Jul 23, 2012 at 3:58 PM, Robert Poor <rdpoor at gmail.com> wrote:
>> ... various drivel...
>> Mechanize#parse needs an instance of Mechanize::Headers for its `response` argument.
>
> Mea culpa -- I was wrong: both the post_connect_hook method and
> Mechanise#parse receive an instance of Net::HTTPOK.
>
> So my question becomes: how do I serialize and deserialize a
> Net::HTTPOK object so I can store and retrieve it from the db? Trying
> the obvious things using YAML and JSON don't work:

You can call to_hash and serialize that. Mechanize doesn't require a Net::HTTPResponse, just hash-like access.
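
A minimal sketch of the to_hash suggestion above, using only the Ruby standard library. `dump_headers` and `load_headers` are hypothetical helper names, not part of Mechanize or Net::HTTP. The restored object is a plain Hash with a case-insensitive fallback lookup and comma-joined values, which mimics how Net::HTTPResponse#[] behaves; whether that is enough "hash-like access" for Mechanize#parse is an assumption that would need checking against the Mechanize source.

```ruby
require 'json'
require 'net/http'

# Serialize only the headers of a response.
# Net::HTTPResponse#to_hash returns { "lowercase-name" => ["value", ...] }.
def dump_headers(response)
  JSON.generate(response.to_hash)
end

# Rebuild a hash-like headers object from the stored JSON.
# Lookups that miss fall back to the lowercased key, so both
# 'Content-Type' and 'content-type' find the same entry.
def load_headers(json)
  stored = JSON.parse(json)
  headers = Hash.new { |hash, key| hash.fetch(key.to_s.downcase, nil) }
  # Join repeated header values with ', ' as Net::HTTPResponse#[] does.
  stored.each { |name, values| headers[name] = values.join(', ') }
  headers
end

# Usage: build a response by hand, round-trip its headers through JSON.
response = Net::HTTPOK.new('1.1', '200', 'OK')
response['Content-Type'] = 'text/html; charset=utf-8'
restored = load_headers(dump_headers(response))
puts restored['Content-Type']
```

The body is stored separately (the `body` argument of the hook), so only the headers need this treatment.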