This program reliably crashes for me (usually a segfault):

  require 'rubygems'
  require 'ferret'

  reader=Ferret::Index::IndexReader.new ARGV
  fields=reader.field_infos.fields
  reader.max_doc.times{|n|
    fields.each{|field|
      reader.term_vector(n,field)
    } unless reader.deleted?(n)
    print "."; STDOUT.flush
  }

As you can see, it just goes through the index, retrieving all the term
vectors. I imagine term vectors must be enabled in at least one field to
trigger this...

I've seen this problem on two different systems, running debian and
ubuntu. It may well be the result of something I've done wrong, but if
so, I don't know what. If anyone can provide some assistance with or
information about this problem, I'd appreciate it.
On Wed, Nov 22, 2006 at 11:47:42AM -0800, Caleb Clausen wrote:
> This program reliably crashes for me (usually a segfault):
>
> require 'rubygems'
> require 'ferret'
>
> reader=Ferret::Index::IndexReader.new ARGV
> fields=reader.field_infos.fields
> reader.max_doc.times{|n|
>   fields.each{|field|
>     reader.term_vector(n,field)
>   } unless reader.deleted?(n)
>   print "."; STDOUT.flush
> }
>
> As you can see, it just goes through the index, retrieving all the term
> vectors. I imagine term vectors must be enabled in at least one field to
> trigger this...
>
> I've seen this problem on two different systems, running debian and
> ubuntu. It may well be the result of something I've done wrong, but if
> so, I don't know what. If anyone can provide some assistance with or
> information about this problem, I'd appreciate it.

hm, I have this snippet running here without problems (Ferret 0.10.13 on
Debian):

  i = Ferret::I.new
  i << 'only a short test'
  i << 'another document'

  reader = i.reader
  fields = reader.field_infos.fields
  reader.max_doc.times{|n|
    fields.each{|field|
      puts reader.term_vector(n,field)
    } unless reader.deleted?(n)
    print "."; STDOUT.flush
  }

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Jens Kraemer wrote:
> hm, I have this snippet running here without problems (Ferret 0.10.13 on
> Debian):
[snippet snipped]

Jens, thanks for trying it. Your snippet works perfectly for me as well,
so I modified it til I could get it to fail again. I should have
mentioned that I'm working with sizable indexes (10000-1000000 entries).

Anyway, here's another version that crashes for me. Here I build an
index from my system's man files:

  require 'rubygems'
  require 'ferret'
  require 'zlib'

  i = Ferret::I.new #:path=>'temp_index'

  manfiles=Dir["/usr/share/man/man*/*.gz"]
  manfiles.each{|mf|
    fd=Zlib::GzipReader.open(mf)
    i<<{:text=>fd.read}
    fd.close
  }

  reader=i.reader
  reader.max_doc.times{|n|
    reader.term_vector(n,:text) unless reader.deleted?(n)
    print "."; STDOUT.flush
  }

This problem seems to be fairly sensitive to initial conditions.
Printing out each term vector as it is found, like you did in your
snippet, makes the problem go away. I also have 0.10.13, but the problem
seems to be common to all the 0.10 series.
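
One way to narrow down a crash that comes and goes with small changes like this is to run the failing loop with the garbage collector forced to run as often as possible. This is only a sketch: the thread does not establish that garbage collection is involved (that is an assumption), `with_gc_stress` is a hypothetical helper rather than anything Ferret provides, and it needs a Ruby version that has `GC.stress`.

```ruby
# Hypothetical helper (not part of Ferret): run a block with GC.stress
# switched on, then restore whatever setting was active before.
# Assumption: if the crash is related to garbage collection, stressing
# the collector should make it show up sooner and more consistently.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  # Runs even if the block raises, so the setting never leaks.
  GC.stress = previous
end

# Usage sketch, assuming `reader` is the IndexReader built in the
# script above (this part needs Ferret and is left as a comment):
#
#   with_gc_stress do
#     reader.max_doc.times do |n|
#       reader.term_vector(n, :text) unless reader.deleted?(n)
#     end
#   end
```

Expect the loop to run much more slowly under stress, so a smaller index is more practical for this kind of check. If the crash then appears on the first few documents, that would point toward object lifetime in the extension; if nothing changes, the cause is probably elsewhere.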