Neville Burnell
2006-Sep-14 06:19 UTC
[Ferret-talk] Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit
I'm playing with "updating" docs in my index, and I think I've found a bug with IndexWriter counting deleted docs. Script and output follow:

====
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@doc = {:id => '44', :name => 'fred', :email => 'abc at ozemail.com.au'}

@dir = Ferret::Store::RAMDirectory.new

def add_then_delete_fred
  @writer = Ferret::Index::IndexWriter.new(:dir => @dir)

  p "adding doc :id=#{@doc[:id]}"
  @writer << @doc
  p "doc_count=#{@writer.doc_count}"

  p "deleting doc :id=#{@doc[:id]}"
  @writer.delete(:id, @doc[:id])
  p "doc_count=#{@writer.doc_count}"

  @writer.commit
  @writer.close
  @writer = nil
end

add_then_delete_fred
add_then_delete_fred
add_then_delete_fred

@reader = Ferret::Index::IndexReader.new(@dir)
p "reader count=#{@reader.num_docs}"

@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
p "writer count=#{@writer.doc_count}"
====

$>ruby test_delete.rb
"0.10.4"
"adding doc :id=44"
"doc_count=1"
"deleting doc :id=44"
"doc_count=1"
"adding doc :id=44"
"doc_count=2"
"deleting doc :id=44"
"doc_count=2"
"adding doc :id=44"
"doc_count=3"
"deleting doc :id=44"
"doc_count=3"
"reader count=0"
"writer count=3"
David Balmain
2006-Sep-14 06:55 UTC
[Ferret-talk] Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit
On 9/14/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> I'm playing with "updating" docs in my index, and I think I've found a bug
> with IndexWriter counting deleted docs.

Hi Neville,

Unfortunately this is the way it has to work. Deleted documents don't get deleted until commit is called, so there is no way to reliably tell how many undeleted documents exist in the index from the IndexWriter. It's a performance thing. I should change IndexWriter#doc_count to IndexWriter#max_doc to be consistent with IndexReader.

Cheers,
Dave
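
The difference between the two counts in the thread (writer 3, reader 0) can be illustrated with a small stand-alone sketch. This is a toy model only, not Ferret's implementation: the class name ToyIndex and its internals are hypothetical, and it simply assumes the common convention that a delete marks a document's slot rather than removing it, so that max_doc counts every slot while num_docs counts only live documents.

```ruby
# Toy model only: NOT Ferret's implementation. ToyIndex is a hypothetical
# class that assumes deletes mark a slot instead of removing it.
class ToyIndex
  def initialize
    @docs = []     # every document ever added keeps its slot
    @deleted = []  # parallel array of deletion flags
  end

  # Append a document; returns self so calls can be chained with <<.
  def <<(doc)
    @docs << doc
    @deleted << false
    self
  end

  # Mark every document whose field equals value as deleted.
  def delete(field, value)
    @docs.each_with_index do |doc, i|
      @deleted[i] = true if doc[field] == value
    end
  end

  # Counts all slots, deleted or not.
  def max_doc
    @docs.size
  end

  # Counts only the documents that are not marked as deleted.
  def num_docs
    @deleted.count(false)
  end
end

index = ToyIndex.new
doc = { :id => '44', :name => 'fred' }

# Same add-then-delete cycle as in the script from the thread.
3.times do
  index << doc
  index.delete(:id, doc[:id])
end

puts "max_doc=#{index.max_doc}"    # slots used, including deleted ones
puts "num_docs=#{index.num_docs}"  # live documents only
```

In the original script, the live count is the one read from IndexReader#num_docs, which is why the reader reports 0 while the writer's doc_count keeps growing.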