Displaying 14 results from an estimated 14 matches for "delete_document".

2007 Jun 19
2
Deleted documents not deleted
I seem to be seeing cases where I call db.delete_document(somedocid) with no error, then flush() and delete the database object, but the document is still there after process exit. The write lock is normally deleted, so it appears that the database close finished normally. If I then call delete_document(somedocid) from another command/process, this...
2007 Apr 09
1
Re: [Xapian-commits] 8157: trunk/xapian-core/ trunk/xapian-core/backends/flint/ trunk/xapian-core/backends/quartz/
olly wrote: > Log message (6 lines): > backends/flint/flint_database.cc: Delete the corresponding entry > (if any) from doclens in delete_document(). Add assertion to > add_document_() that the corresponding entry in doclens isn't > already set, but in a non-debug build overwrite any existing > entry as that's more likely to be correct. > backends/quartz/quartz_database.cc: Ditto. This fixes an assertion error I was seei...
2011 May 30
1
How to check docid
I have a bit of code (Python) to delete a number of documents: for f in Flist: xapian_store.delete_document(f.pri_key) in which I am using a unique primary key from an SQL database as the docid for the Xapian database. The problem I have is that some of the documents may not have been created - so I get an error. Now I could just ignore the error (try-recover), but what would be the recommended way to...
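A minimal Python sketch of the "just ignore the error" option mentioned in this post. It assumes the bindings raise xapian.DocNotFoundError when the docid is absent; the helper name delete_if_present and its not_found_error parameter are hypothetical, introduced here so the function can be tried without a real database.

```python
def delete_if_present(db, docids, not_found_error):
    """Delete each docid from db, skipping ids that were never indexed.

    db is any object with a delete_document(docid) method, for example a
    xapian.WritableDatabase. not_found_error is the exception type raised
    for a missing docid; with the Xapian bindings this is assumed to be
    xapian.DocNotFoundError.

    Returns the list of docids that were not found.
    """
    missing = []
    for docid in docids:
        try:
            db.delete_document(docid)
        except not_found_error:
            # The document was never created, so there is nothing to delete.
            missing.append(docid)
    return missing
```

With the real bindings the call would look something like delete_if_present(xapian_store, [f.pri_key for f in Flist], xapian.DocNotFoundError).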
2007 Apr 05
1
Re: [Xapian-commits] 8107: trunk/xapian-core/ trunk/xapian-core/backends/
...riend so just use LeafPostList directly > as that seems less bad than pulling in the whole of database.h > or making PostingIterator::internal public. Only problem with this patch is that it looks like it leaks the LeafPostList if an exception is thrown by one of the other methods (such as delete_document()). Actually, does the postlist get deleted at all? I've changed this to wrap the postlist in a RefCntPtr which should fix the leak issue. -- Richard
2009 Feb 12
1
problem when using xapian's static libs in windows
..."public: virtual unsigned int __thiscall RemoteDatabase::add_document(class Xapian::Document const &)" (?add_document@RemoteDatabase@@UAEIABVDocument@Xapian@@@Z) libbackend.lib(dbfactory_remote.obj) : error LNK2001: ????????? "public: virtual void __thiscall RemoteDatabase::delete_document(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?delete_document@RemoteDatabase@@UAEXABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z) libnet.lib(progclient.obj) : error LNK2001: ????????? &q...
2004 Sep 09
2
InMemory backend
I've just added a feature test for the new WritableDatabase methods - replace_document() and delete_document() with a unique term. This initially failed for inmemory due to bugs in the backend. They weren't trivial to fix and my initial attempt at a fix caused other tests to fail. I've come to the conclusion that the code there probably should be retired. It was written early on for testing pu...
2009 Jun 18
1
delete and update
Hi All, I need to update or delete some documents from a Xapian database, and I haven't been able to find anything in the API. Is there a way to do it? What would be the easiest way to do it? Thanks.
2020 Aug 23
0
MultiDatabase shard count limitations
...e up to tens of thousands of shards. > Managing removals of entire inboxes from an all-encompassing > Xapian DB would get much trickier. If each inbox is indexed by its own boolean term you can delete all the documents indexed by a specified term with one API call (Xapian::WritableDatabase::delete_document(term)). It may take a while for a large inbox, but it's more slow than tricky. Cheers, Olly
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6 8 AMD Opteron 64-bit Processors 32GB Memory -------------------------------------------------------------------------------- Environment: ------------------ XAPIAN_FLUSH_THRESHOLD=21000000 XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000 XAPIAN_PREFER_FLINT=True Indexing 20 million documents: --stemmer=none ------------------------------------------- real 79m9.378s user 77m28.696s
2004 May 11
2
"Error reading block xxx: got end of file"
Xapian (0.7.5) is spitting out this error on a regular basis: org.xapian.errors.DatabaseError: Error reading block 136618: got end of file at org.xapian.XapianJNI.writabledatabase_repalce_document(Native Method) at org.xapian.WritableDatabase.replaceDocument(WritableDatabase.java:67) I don't have a gdb backtrace, only the Java
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a
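A rough Python sketch of the "mean word length weighted by frequency of occurrence" described in this message. The function name and the sample term counts are hypothetical; a real estimate would use term frequencies taken from the database in question.

```python
def weighted_mean_term_length(term_freqs):
    """Return the mean term length in bytes, weighted by occurrence count.

    term_freqs maps each term (str) to how many times it occurs.
    """
    total_occurrences = sum(term_freqs.values())
    if total_occurrences == 0:
        raise ValueError("no term occurrences to average over")
    # Weight each term's UTF-8 byte length by how often the term occurs.
    total_bytes = sum(
        len(term.encode("utf-8")) * freq for term, freq in term_freqs.items()
    )
    return total_bytes / total_occurrences


# Made-up counts, for illustration only.
sample = {"the": 3, "xapian": 1}
print(weighted_mean_term_length(sample))  # (3*3 + 6*1) / 4 = 3.75
```

Frequent short terms pull the result down, which is why the weighted figure can be lower than a plain average over distinct terms.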
2023 May 03
1
manual flushing thresholds for deletes?
...; # (also added "NR > 1" to ignore the delve header line) Which gives me 6.00067, so rounding to 6 seems fine either way. My Perl deletion code is something like: my $EST_LEN = 6; ... for my $docid (@docids) { $TXN_BYTES -= $xdb->get_doclength($docid) * $EST_LEN; $xdb->delete_document($docid); if ($TXN_BYTES < 0) { # flush within txn $xdb->commit_transaction; $TXN_BYTES = 8000000; $xdb->begin_transaction; } } > > (that awk bit should be overflow-free) <snip> > Or use a language which supports arbitrary precision > numbers. Actually, I...
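The Perl loop quoted in this message could be sketched in Python roughly as follows. The initial begin_transaction and the final commit_transaction are assumptions, since the quoted snippet elides what surrounds the loop; RecordingDB is a hypothetical stand-in used only to show the call pattern, not part of any Xapian binding.

```python
EST_LEN = 6            # estimated bytes per term, as in the quoted message
TXN_BYTES = 8_000_000  # byte budget per transaction, as in the quoted message


def delete_in_batches(db, docids, est_len=EST_LEN, budget=TXN_BYTES):
    """Delete docids, committing whenever the estimated byte budget runs out."""
    remaining = budget
    db.begin_transaction()  # assumption: not shown in the quoted snippet
    for docid in docids:
        # Charge the document's estimated size before deleting it.
        remaining -= db.get_doclength(docid) * est_len
        db.delete_document(docid)
        if remaining < 0:
            # Budget exhausted: flush by committing, then start a new txn.
            db.commit_transaction()
            remaining = budget
            db.begin_transaction()
    db.commit_transaction()  # assumption: commit whatever is still pending


class RecordingDB:
    """Hypothetical stand-in that records calls instead of touching disk."""

    def __init__(self, doclengths):
        self.doclengths = dict(doclengths)
        self.calls = []

    def begin_transaction(self):
        self.calls.append("begin")

    def commit_transaction(self):
        self.calls.append("commit")

    def get_doclength(self, docid):
        return self.doclengths[docid]

    def delete_document(self, docid):
        del self.doclengths[docid]
        self.calls.append(("delete", docid))
```

As in the original, the document length is read before the delete, since it would no longer be available afterwards.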
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm
2020 Aug 23
2
MultiDatabase shard count limitations
.... > > > Managing removals of entire inboxes from an all-encompassing > > Xapian DB would get much trickier. > > If each inbox is indexed by its own boolean term you can delete all > the documents indexed by a specified term with one API call > (Xapian::WritableDatabase::delete_document(term)). It may take a > while for a large inbox, but it's more slow than tricky. There's actually a good amount of cross-posting on kernel mailing lists, so I think a combined index should be able to deduplicate and reduce storage requirements. I'd rather pay the cost in deletions...