Hightman(马明练)
2008-Oct-16 17:50 UTC
[Xapian-discuss] Is there any good way to delete many documents in Xapian Indexed data?
Sometimes I need to delete a lot of documents from Xapian indexed data. To do this I call WritableDatabase::delete_document() with a fake unique_term, because this term refers to many documents rather than only one. But this operation takes a long time and sometimes even fails. So I am looking for a better way to solve this problem. Can you help me?
Olly Betts
2008-Oct-17 03:02 UTC
[Xapian-discuss] Is there any good way to delete many documents in Xapian Indexed data?
On Fri, Oct 17, 2008 at 01:50:48AM +0800, Hightman(马明练) wrote:
> Sometimes I need to delete a lot of documents from Xapian indexed data.
>
> To do this I call WritableDatabase::delete_document() with a fake
> unique_term, because this term refers to many documents rather than
> only one.
>
> But this operation takes a long time and sometimes even fails. So I am
> looking for a better way to solve this problem. Can you help me?

It is inherently a lot of work to delete a lot of documents, as we need to update the posting lists for all the terms they contain. The work is comparable to what would be required to add the same documents.

If this is failing, that sounds like a bug, but I need details before I can usefully comment on that...

Cheers,
Olly
Hightman(马明练)
2008-Oct-17 05:41 UTC
[Xapian-discuss] Is there any good way to delete many documents in Xapian Indexed data?
The reason for the failure is running out of memory, so this may not be a real bug. Thank you...

I think Xapian uses so much memory because of its 'atomic modifications' feature. Can I guarantee a single writer process myself and try to disable this feature?

Hightman(马明练)
hightman at zuaa.zju.edu.cn
2008-10-17
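One approach that may keep memory bounded during a large delete, offered as a sketch rather than a confirmed fix for the out-of-memory failure described in this thread: collect the document IDs from the term's posting list, delete them one at a time by ID, and commit after every batch so pending changes are written out instead of accumulating. The function below uses only the calls postlist_begin(), postlist_end(), delete_document(docid) and commit() as they exist on Xapian::WritableDatabase (commit() was named flush() in the 1.0.x releases). FakeDb, delete_in_batches and the batch size are hypothetical names introduced for illustration; FakeDb is an in-memory stand-in so the sketch compiles without Xapian installed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::WritableDatabase (NOT part of Xapian).
// It mimics only the four calls that delete_in_batches() relies on.
struct FakeDb {
    std::map<std::string, std::vector<unsigned>> postings;  // term -> docids
    std::size_t commits = 0;                                // number of commit() calls
    std::vector<unsigned> empty_;                           // used for unknown terms

    std::vector<unsigned>::const_iterator postlist_begin(const std::string& term) const {
        auto it = postings.find(term);
        return it == postings.end() ? empty_.begin() : it->second.begin();
    }
    std::vector<unsigned>::const_iterator postlist_end(const std::string& term) const {
        auto it = postings.find(term);
        return it == postings.end() ? empty_.end() : it->second.end();
    }
    // Remove the document from every posting list that mentions it.
    void delete_document(unsigned did) {
        for (auto& entry : postings) {
            auto& ids = entry.second;
            ids.erase(std::remove(ids.begin(), ids.end(), did), ids.end());
        }
    }
    void commit() { ++commits; }
};

// Delete every document indexed by `term`, committing after each
// `batch_size` deletions. Returns the number of documents deleted.
template <typename Db>
std::size_t delete_in_batches(Db& db, const std::string& term, std::size_t batch_size) {
    assert(batch_size > 0);

    // Collect the docids first so the posting list is not modified
    // while it is being iterated.
    std::vector<unsigned> ids;
    for (auto it = db.postlist_begin(term); it != db.postlist_end(term); ++it) {
        ids.push_back(*it);
    }

    std::size_t pending = 0;
    for (unsigned did : ids) {
        db.delete_document(did);
        if (++pending == batch_size) {
            db.commit();  // write out this batch before continuing
            pending = 0;
        }
    }
    if (pending > 0) {
        db.commit();  // write out the final partial batch
    }
    return ids.size();
}
```

With a real database, the same template would be instantiated with Xapian::WritableDatabase in place of FakeDb. The total work is unchanged, since every affected posting list still has to be updated; the batching only limits how many uncommitted changes are held at once, and a suitable batch size is an assumption to be tuned by measurement.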