Eduardo Habkost
2008-Nov-13 18:29 UTC
[Ferret-talk] Ferret crash when using lazy_doc after closing IndexReader
Hi,

As the Ferret site has been down for a while, I am reporting this bug here, so it gets documented somewhere and people with more experience with Ferret can comment.

I was hitting this crash easily in Sup, which uses Ferret for its index[1]. After some investigation I've found the cause of the crash, but I don't know what the best behaviour for Ferret would be in this case.

The crash happens when the IndexReader object that a lazy_doc was loaded from gets closed. After the IndexReader is closed, trying to get a field from the lazy_doc triggers a read from a closed and freed InputStream, sometimes causing segfaults, sometimes causing spurious I/O errors.

I first saw the bug using Ferret 0.11.6, but I've also tested Ferret from the git repository[2], and it happens there as well.

Below is a simple script that will trigger the crash:

=======================================
# Example A
require "ferret"

p = "/tmp/ferret-test.#$$"
puts "Using #{p} as storage"
i = Ferret::Index::Index.new(:path => p)
i << { :body => "Loren ipsum dolor "*1000 }
doc = i[0]
# this will cause the IndexReader to be closed by Ferret::Index::Index
i << { :body => "another document" }
puts doc[:body]
=======================================

It happens because writing to the Ferret index closes the IndexReader. A simpler script that triggers the crash is:

=======================================
# Example B
require "ferret"

p = "/tmp/ferret-test.#$$"
puts "Using #{p} as storage"
puts "Generating a simple index"
i = Ferret::Index::Index.new(:path => p)
i << { :body => "Loren ipsum dolor "*1000 }
i.close
puts "Closed it. Will reopen and use it"
i = Ferret::Index::IndexReader.new(p)
doc = i[0]
i.close
puts doc[:body]
=======================================

I see two issues here:

The first one is the crash itself: what should happen to loaded lazy_docs when an IndexReader is closed? Lucene documentation[3] says an exception may be thrown in these cases.
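
To make that option concrete, here is a minimal pure-Ruby sketch of a lazy document that raises instead of reading from a closed reader. It is only an illustration under stated assumptions: SketchReader, SketchLazyDoc and ClosedReaderError are hypothetical names, not Ferret classes, and an in-memory array stands in for the InputStream. Whether fields that were already loaded should stay readable after the close is also an assumption made here.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT Ferret's classes.
class ClosedReaderError < IOError; end

class SketchReader
  def initialize(docs)
    @docs = docs       # array of hashes, standing in for stored fields on disk
    @closed = false
  end

  def closed?
    @closed
  end

  def close
    @closed = true
  end

  # Returns a lazy document that keeps a back-reference to this reader.
  def [](doc_id)
    SketchLazyDoc.new(self, doc_id)
  end

  # Simulates reading one stored field; refuses once the reader is closed.
  def read_field(doc_id, field)
    raise ClosedReaderError, "reader is closed" if @closed
    @docs.fetch(doc_id)[field]
  end
end

class SketchLazyDoc
  def initialize(reader, doc_id)
    @reader = reader
    @doc_id = doc_id
    @cache = {}        # fields already loaded from the reader
  end

  # Loads the field on first access; later accesses are served from the cache.
  def [](field)
    return @cache[field] if @cache.key?(field)
    @cache[field] = @reader.read_field(@doc_id, field)
  end
end

reader = SketchReader.new([{ :title => "first", :body => "Loren ipsum dolor" }])
doc = reader[0]
doc[:title]            # loaded while the reader is still open
reader.close

begin
  doc[:body]           # never loaded, so this has to go back to the reader
rescue ClosedReaderError => e
  puts "caught: #{e.message}"
end
```

In this sketch the failure becomes a well-defined Ruby exception that the caller can rescue, rather than a read from freed memory.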
Throwing an exception in the same way could be the proper fix for Ferret in Example B, which can be considered invalid usage of the IndexReader anyway.

The second issue is what the behaviour of Ferret::Index::Index should be after writing to the index while documents are loaded (Example A). Should it really invalidate all lazy_docs read from the index on every write? That is the current behavior, because its IndexReader is always closed when writing to the index, but I wonder whether it is really desired.

[1] http://rubyforge.org/pipermail/sup-talk/2008-November/001782.html
[2] http://github.com/dbalmain/ferret
[3] http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexReader.html#document(int,%20org.apache.lucene.document.FieldSelector)

--
Eduardo