similar to: Deleted documents not deleted

Displaying 20 results from an estimated 1100 matches similar to: "Deleted documents not deleted"

2007 Apr 09
1
Re: [Xapian-commits] 8157: trunk/xapian-core/ trunk/xapian-core/backends/flint/ trunk/xapian-core/backends/quartz/
olly wrote: > Log message (6 lines): > backends/flint/flint_database.cc: Delete the corresponding entry > (if any) from doclens in delete_document(). Add assertion to > add_document_() that the corresponding entry in doclens isn't > already set, but in a non-debug build overwrite any existing > entry as that's more likely to be correct. >
2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000. When trying to force Xapian to flush only after 20 million documents, Xapian ignores the setting and flushes after only 10,000 documents. Data captured from delve at 60-second intervals, with the threshold set as follows: XAPIAN_FLUSH_THRESHOLD=20000000 perl -e ' while(1) { system("delve ."); sleep(60); } '
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2011 Sep 04
5
Ranking and term proximity
Hi, I was reading an article recently about how Google ranks results (among many other things, of course) based on the proximity of the search terms in the source documents. In addition, the position of the search terms in the query string itself is also taken into account when determining how important each term is. Does Xapian do something similar, at least for the first part?
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote: > > Some might notice the 50% index size increase. Excessive index size is > > already one relatively rare, but recurring complaint. Except if I did > > something wrong: I'm actually quite surprised by it. > > Did you try compacting the resulting databases? > >
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6, 8 AMD Opteron 64-bit Processors, 32GB Memory. Environment: XAPIAN_FLUSH_THRESHOLD=21000000 XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000 XAPIAN_PREFER_FLINT=True. Indexing 20 million documents (--stemmer=none): real 79m9.378s user 77m28.696s
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi, I have a user reporting the following error during recoll indexing: flush() failed: Db block overwritten - are there multiple writers? "flush() failed" is from recoll; the rest is, I think, the text of the Xapian exception. This is with Xapian 1.4.3 on Linux (I asked for more details, should be coming). I don't think that I've ever seen this error, and I also
2017 Dec 08
2
xapian 1.4 performance issue
Olly Betts writes: > On Thu, Dec 07, 2017 at 10:29:09AM +0100, Jean-Francois Dockes wrote: > > Recoll builds snippets by partially reconstructing documents out of index > > contents. > > > [...] > > > > The specific operation which has become slow is opening many term position > > lists, each quite short. > > The difference will actually
2024 Mar 15
1
Using multiple temporary indexes during updates
On Fri, Mar 15, 2024 at 08:15:55PM +0100, Jean-Francois Dockes wrote: > I have been playing at converting the index update stage of the Recoll indexer to use > multiple temporary indexes and a final merge. > > This yields an improvement factor of almost 3 (on my quad-core CPU), for the total > indexing time for "easy" files like HTML pages. This is nice (!) and I wanted
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes: > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote: > > I am the recoll user mentioned in the first post above. I still have a copy > > of the (potentially) corrupted index and I did the requested testing. > > > > I ran delve -t '' ./xapiandb on the index and it returned a very long list > > of document IDs, separated
2018 Sep 14
3
How to make database build threaded?
On 14/09/2018 at 09:30, Jean-Francois Dockes wrote: > Hi, > > You may be interested by how Recoll does it: > > https://www.lesbonscomptes.com/recoll/idxthreads/threadingRecoll.html > > A few things in the document are slightly obsolete (esp. the last > paragraph: recollindex now does use vfork()), but it's overall quite close > to how the current indexer works.
2019 Jan 21
2
Amount of writes during index creation
Hi, I have had a problem report from a Recoll user about the amount of writes during index creation. https://opensourceprojects.eu/p/recoll1/tickets/67/ The issue is that the index is on SSD and that the amount of writes is significant compared to the SSD life expectancy (index size > 250 GB). From the numbers he supplied, it seems to me that the total amount of block writes is roughly
2017 Dec 07
2
xapian 1.4 performance issue
Hi, I have had reports that Recoll has become unbearably slow in some instances. After inquiry, this happens with Xapian 1.4 only, and the part which does not work any more is the snippets extraction. Recoll builds snippets by partially reconstructing documents out of index contents. For this, after determining a set of document term positions to be displayed (around the hopefully interesting
2019 Aug 26
2
Commit error with Xapian 1.4.11
A Recoll user gets the following message while indexing: "Attempted to delete or modify an entry in a non-existent posting list for #bannerholder" The exception happens during a commit call. Xapian version 1.4.11, Debian Buster A little more detail here: https://opensourceprojects.eu/p/recoll1/tickets/108/ I asked if this was reproducible, and to run the indexing in single-thread
2017 Jan 12
2
NEAR non-leaf subqueries
Olly Betts writes: > On Wed, Jan 04, 2017 at 07:29:58AM +0100, Jean-Francois Dockes wrote: > > Olly Betts writes: > > > The ticket has a patch which attempts to handle the OR case (which seems > > > to be the part you actually care about) but this suffers from issues with > > > object lifetimes which get a bit involved in the details. Since there >
2018 Sep 13
2
How to make database build threaded?
Hi everybody, I'm the author of a small C++11 program called XDGSearch. The source code is hosted on GitHub; for a quick overview you can visit this link: https://github.com/frank67/XDGSearch/blob/master/README.md I'm writing to the mailing list because I'd like to split the database build process across multiple threads. Is it possible? If you are a C++ programmer you can take a look at
2020 Jun 04
2
xapian-core and Windows non-ASCII paths
Hi, I am attaching a patch against the xapian-core 1.4 branch. On Windows with MSVC (probably mingw too but I did not test), it allows xapian-core to create and use an index located at a path containing arbitrary Unicode characters. As far as I could see, this does not work with the current code, and, from the question I asked on xapian-discuss nobody seems to have an obvious external solution
2016 Jan 08
2
Strange index consistency issue
Hi, A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc. The strange thing here is that xapian-check does not seem to detect anything. In a nutshell, some document numbers seem to point to a data blackhole: the docids are returned when searching for the file/doc unique
2014 May 04
2
Xapian::Document and threads
Hi, While investigating very infrequent crashes in the Recoll indexer, I have come to a very basic question: is it safe to pass a copy of a Xapian::Document from thread to thread (multiple threads queue documents, another thread updates the index)? I don't seem to get directly into trouble while doing this, but I don't see anything either in the RefCntr implementation which would
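The arrangement described in the post above (several threads queue documents, one thread updates the index) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not Recoll's or Xapian's code: plain dicts stand in for Xapian::Document objects, a list stands in for the writable database, and the names `producer`, `writer` and `run` are hypothetical. It does not settle whether handing a real Xapian::Document between threads is safe, which is the open question in the post.

```python
import queue
import threading

SENTINEL = None  # put on the queue once by each producer when it is done


def producer(worker_id, texts, out_queue):
    """Build stand-in documents and hand them to the single writer."""
    for n, text in enumerate(texts):
        # A dict stands in for a Xapian::Document (assumption for illustration).
        out_queue.put({"id": f"{worker_id}-{n}", "data": text})
    out_queue.put(SENTINEL)


def writer(in_queue, n_producers, index):
    """Only this thread touches the index, so the index needs no locking."""
    finished = 0
    while finished < n_producers:
        doc = in_queue.get()
        if doc is SENTINEL:
            finished += 1
        else:
            index.append(doc)  # stands in for adding the document to the database


def run(batches):
    """Start one producer per batch plus one writer; return the filled index."""
    q = queue.Queue(maxsize=100)  # bounded, so producers cannot run far ahead
    index = []
    w = threading.Thread(target=writer, args=(q, len(batches), index))
    w.start()
    producers = [
        threading.Thread(target=producer, args=(i, batch, q))
        for i, batch in enumerate(batches)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    w.join()
    return index
```

Each document is owned by exactly one thread at a time: the producer drops its reference once the item is queued, and the writer is the only thread that ever touches the index.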
2024 Mar 15
1
Using multiple temporary indexes during updates
Hi, I have been playing at converting the index update stage of the Recoll indexer to use multiple temporary indexes and a final merge. This yields an improvement factor of almost 3 (on my quad-core CPU), for the total indexing time for "easy" files like HTML pages. This is nice (!) and I wanted to share my admiration for the "compact()" method. If someone is interested in a
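The general shape of "multiple temporary indexes and a final merge" can be sketched with a toy inverted index. This is an assumption-laden illustration in plain Python, not the Recoll implementation and not the Xapian API: dicts stand in for the temporary databases, the hypothetical `merge` function stands in for the final merge step, and `build_partial` and `index_in_parallel` are likewise made-up names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partial(docs):
    """Build one temporary index: term -> set of document ids."""
    partial = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            partial[term].add(doc_id)
    return partial


def merge(partials):
    """Combine the temporary indexes into one final index with sorted postings."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return {term: sorted(ids) for term, ids in merged.items()}


def index_in_parallel(docs, n_workers=4):
    """Split the documents across workers, index each share, then merge."""
    chunks = [docs[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(build_partial, chunks))
    return merge(partials)
```

No worker shares state with another until the merge, which is what makes the per-chunk indexing step easy to run concurrently. In CPython the threads here only illustrate the structure; they are not expected to reproduce the speed-up reported in the post.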