Displaying 20 results from an estimated 10000 matches similar to: "Xapian::Document and threads"
2014 May 05
2
Xapian::Document and threads
Olly Betts writes:
> On Sun, May 04, 2014 at 08:16:50PM +0200, Jean-Francois Dockes wrote:
> > While investigating very infrequent crashes in the Recoll indexer, I have
> > come to a very basic question: is it safe to pass a copy of a
> > Xapian::Document from thread to thread (multiple threads queue documents,
> > another thread updates the index)?
> >
>
2018 Sep 14
3
How to make database build threaded?
On 14/09/2018 at 09:30, Jean-Francois Dockes wrote:
> Hi,
>
> You may be interested in how Recoll does it:
>
> https://www.lesbonscomptes.com/recoll/idxthreads/threadingRecoll.html
>
> A few things in the document are slightly obsolete (esp. the last
> paragraph: recollindex now does use vfork()), but it's overall quite close
> to how the current indexer works.
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote:
> > The question which remains for me is if I should run xapian-compact
> > after an initial indexing operation. I guess that this depends on the
> > amount of expected updates and that there is no easy answer?
>
> I think it's not obvious whether it's a good plan
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote:
> > Some might notice the 50% index size increase. Excessive index size is
> > already a relatively rare but recurring complaint. Unless I did
> > something wrong: I'm actually quite surprised by it.
>
> Did you try compacting the resulting databases?
>
>
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi,
I have a user reporting the following error during recoll indexing:
flush() failed: Db block overwritten - are there multiple writers?
"flush() failed" is from recoll, the rest is, I think, the text of the Xapian
exception.
This is with Xapian 1.4.3 on Linux (I asked for more details, should be
coming).
I don't think that I've ever seen this error, and I also
2019 Aug 26
2
Commit error with Xapian 1.4.11
A Recoll user gets the following message while indexing:
"Attempted to delete or modify an entry in a non-existent posting list for #bannerholder"
The exception happens during a commit call. Xapian version 1.4.11, Debian Buster
A little more detail here: https://opensourceprojects.eu/p/recoll1/tickets/108/
I asked if this was reproducible, and to run the indexing in single-thread
2007 Jun 19
2
Deleted documents not deleted
I seem to be seeing cases where I call db.delete_document(somedocid) with
no error, then flush() and delete the database object, but the document is
still there after process exit. The write lock is normally deleted, so it
appears that the database close finished normally.
If I then call delete_document(somedocid) from another
command/process, this time it goes away.
I've been seeing
2017 Dec 08
2
xapian 1.4 performance issue
Olly Betts writes:
> On Thu, Dec 07, 2017 at 10:29:09AM +0100, Jean-Francois Dockes wrote:
> > Recoll builds snippets by partially reconstructing documents out of index
> > contents.
> >
> [...]
> >
> > The specific operation which has become slow is opening many term position
> > lists, each quite short.
>
> The difference will actually
2018 Sep 13
2
How to make database build threaded?
Hi everybody,
I'm the author of a small C++11 program called XDGSearch. The source
code is hosted on Github, for a quick overview you can visit this link
https://github.com/frank67/XDGSearch/blob/master/README.md
I'm writing to the mailing list because I'd like to make the database
build process split across multiple threads. Is it possible? If you are a C++
programmer you can take a look at
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes:
> On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote:
> > I am the recoll user mentioned in the first post above. I still have a copy
> > of the (potentially) corrupted index and I did the requested testing.
> >
> > I ran delve -t '' ./xapiandb on the index and it returned a very long list
> > of document IDs, separated
2017 Dec 07
2
xapian 1.4 performance issue
Hi,
I have had reports that Recoll has become unbearably slow in some
instances.
After inquiry, this happens with Xapian 1.4 only, and the part which does
not work any more is the snippets extraction.
Recoll builds snippets by partially reconstructing documents out of index
contents.
For this, after determining a set of document term positions to be
displayed (around the hopefully interesting
2020 Jun 04
2
xapian-core and Windows non-ASCII paths
Hi,
I am attaching a patch against the xapian-core 1.4 branch.
On Windows with MSVC (probably mingw too but I did not test), it allows
xapian-core to create and use an index located at a path containing arbitrary
Unicode characters. As far as I could see, this does not work with the
current code, and, from the question I asked on xapian-discuss, nobody seems
to have an obvious external solution
2024 Apr 22
2
How to use Xapian Omega directly (i.e., without using `recoll` and `xapiandb`) ... Full Set Of Questions Below:
Dear senior ML members and developers of Xapian Omega,
Mr. Olly has helped me cross the bump of the initial learning curve.
(ref: https://lists.xapian.org/pipermail/xapian-discuss/2024-April/010034.html)
How can I use Xapian Omega directly (i.e., without using `recoll` and
`xapiandb`) to index a directory of text files with all strings
greater than 3 characters, to create an index text file
2024 Mar 15
1
Using multiple temporary indexes during updates
On Fri, Mar 15, 2024 at 08:15:55PM +0100, Jean-Francois Dockes wrote:
> I have been experimenting with converting the index update stage of the Recoll indexer to use
> multiple temporary indexes and a final merge.
>
> This yields an improvement factor of almost 3 (on my quad-core CPU), for the total
> indexing time for "easy" files like HTML pages. This is nice (!) and I wanted
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes:
> On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote:
> > I have a user reporting the following error during recoll indexing:
> >
> > flush() failed: Db block overwritten - are there multiple writers?
> >
> > "flush() failed" is from recoll, the rest is, I think, the text of the Xapian
> > exception.
2024 Mar 15
1
Using multiple temporary indexes during updates
Hi,
I have been experimenting with converting the index update stage of the Recoll indexer to use
multiple temporary indexes and a final merge.
This yields an improvement factor of almost 3 (on my quad-core CPU), for the total
indexing time for "easy" files like HTML pages. This is nice (!) and I wanted to share my
admiration for the "compact()" method.
If someone is interested in a
2019 Jan 21
2
Amount of writes during index creation
Hi,
I have had a problem report from a Recoll user about the amount of writes
during index creation.
https://opensourceprojects.eu/p/recoll1/tickets/67/
The issue is that the index is on SSD and that the amount of writes is
significant compared to the SSD life expectancy (index size > 250 GB).
From the numbers he supplied, it seems to me that the total amount of block
writes is roughly
2012 Mar 20
2
Incremental indexing
Hi all,
I am trying to implement an Incremental indexing scheme. The problem
is that usually the modified documents are large but the modifications
are limited. Ideally, I would like to reindex only the modified parts
of these documents. If I am not mistaken, xapian cannot do that. Are
there any other approaches?
It would be nice if xapian supported something like the SQL "group
by".
2006 Jan 13
1
xapian-config --libs outputs libstdc++.la as a dependency
Hello,
I am hearing of users having trouble linking with libxapian (on slackware
and gentoo systems, and 0.9.2 I think), and I am not too sure where the
problem comes from, or what the correct solution could be, so I am just
asking here in case someone has a quick answer.
What happens is that "xapian-config --libs" outputs libstdc++.la in the
list of libraries. Something like:
2024 Dec 12
1
Using a document id as metadata key and merges
Hi,
Following a discussion a few years ago, Recoll stores the documents' text
contents in database metadata entries, with keys derived from document ids.
More recently an index creation method using several temporary indexes
merged on completion was implemented. This is still a bit experimental. It
brings a significant speed increase in some cases.
I just realised that the merge lost many