Displaying 20 results from an estimated 600 matches similar to: "Incremental indexing"
2013 Apr 25
2
Converting MySQL database to Xapian
I am looking for some guidance on converting a large MySQL database to Xapian. The current structure is that the database is broken up into 160 "sub-databases". There are 50,000 or so records in each sub-database. Each record has content that I am full-text indexing. The average size of the text is about 59k characters. The database is broken up into sub-databases because the MySQL
2013 Sep 24
3
2048-bit Diffie-Hellman parameters
Currently, dovecot generates two primes for Diffie-Hellman key
exchanges: a 512-bit one and a 1024-bit one. In light of recent
events, I think it would be wise to add support for 2048-bit primes as
well, or even better, add a configuration option that lets the user
select a file (or files) containing the DH parameters.
In recent years, there has been increased interest in DH especially in
its
2006 Jan 29
1
Prioritizing xapian search results
Hello
Is it possible somehow to give higher priority to recent documents? For
example, when adding a new document to the database, I will add a term
that specifies date of the document. During the search, the date of the
document is taken into account in the algorithm that calculates
document relevancy.
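One way to picture the idea in the question above is a re-ranking step that scales each relevance score by the age of the document. The sketch below is plain Python and does not use the Xapian API; the function names, the `(doc_id, score, doc_date)` tuple layout, and the one-year half-life are all assumptions made for illustration.

```python
from datetime import date


def recency_boosted(score, doc_date, today, half_life_days=365.0):
    """Scale a relevance score by an exponential decay on document age.

    A document exactly one half-life old keeps half of its score.
    """
    age_days = max((today - doc_date).days, 0)  # future dates count as age 0
    return score * 0.5 ** (age_days / half_life_days)


def rerank(hits, today, half_life_days=365.0):
    """Sort (doc_id, score, doc_date) tuples by recency-boosted score, best first."""
    return sorted(
        hits,
        key=lambda h: recency_boosted(h[1], h[2], today, half_life_days),
        reverse=True,
    )


if __name__ == "__main__":
    today = date(2006, 1, 29)
    hits = [
        ("old", 10.0, date(2004, 1, 30)),  # higher raw score, about two years old
        ("new", 6.0, date(2006, 1, 29)),   # lower raw score, added today
    ]
    for doc_id, score, doc_date in rerank(hits, today):
        print(doc_id, recency_boosted(score, doc_date, today))
```

A shorter half-life favours new documents more aggressively; a very long one leaves the original ranking nearly unchanged.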
In another project I am working on, I would like to limit the number of
pages returned from the
2011 Feb 20
0
No subject
"Another use is to group matches in a particular category (e.g. you
might collapse a mailing list search on the Subject: so that there's
only one result per discussion thread). In this case you can use
get_collapse_count() to give the user some idea how many other results
there are. And if you index the Subject: as a boolean term as well as
putting it in a value, you can offer a link to a
2007 Jan 13
1
xapian query group result by domain?
Hi
I know it might not be possible, but just want to try my luck.
Say, for a web search engine backed by xapian....
Is it possible to group the results by domain just like google's [ More
results from www.abc.com ],
when there is more than one result from the same domain?
Or does anyone have a workaround for it?
Cheers
Andrey Kong
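The grouping asked about here amounts to keeping the best-ranked hit per domain and remembering how many further hits were hidden behind it, which is the kind of number a "More results from ..." link needs. The following is a minimal sketch of that logic in plain Python, not Xapian code; `collapse_by_domain` and its return shape are hypothetical names chosen for this example, and the input is assumed to be a list of URLs already sorted by relevance.

```python
from urllib.parse import urlsplit


def collapse_by_domain(ranked_urls):
    """Keep the best-ranked URL per host and count the hits collapsed under it.

    Returns a list of (url, host, collapsed_count) in the original rank order.
    """
    kept = {}  # host -> [first_url, collapsed_count]; dicts keep insertion order
    for url in ranked_urls:
        host = urlsplit(url).hostname or ""
        if host in kept:
            kept[host][1] += 1  # a lower-ranked hit from a host already shown
        else:
            kept[host] = [url, 0]
    return [(url, host, count) for host, (url, count) in kept.items()]


if __name__ == "__main__":
    ranked = [
        "http://www.abc.com/a",
        "http://www.xyz.org/",
        "http://www.abc.com/b",
        "http://www.abc.com/c",
    ]
    for url, host, count in collapse_by_domain(ranked):
        extra = f" [More results from {host}: {count}]" if count else ""
        print(url + extra)
```

In an index-backed setup the host would normally be stored with each document at indexing time, so that the grouping key does not have to be parsed out of the URL for every hit.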
2009 Sep 09
2
InvalidArgumentError throw using Turkish stemmer and posting text "'leri"
Hi all,
I've come across a very strange bug with Xapian 1.0.9.0 and the Turkish
query parser when trying to index a string (as posting) that looks like
this: "...bir araya getiren CD'leri son teknolojiyle piyasaya...". The
actual offending bit of the string is: 'leri
It throws the message I have shown below. The real annoyance is that I can't
seem to catch it because it
2008 Dec 06
1
Obtaining actual match count if using set_collapse_key()
Greets,
Is it possible to obtain the actual match count if you're using
set_collapse_key()? ie, the total count *before* the collapsing
occurs (without using get_mset()).
Alternatively, will MSet::get_matches_estimated() return the true -
pre-collapse - count, or will it also be affected by collapsing?
Thanks
Henry
2011 Apr 21
1
How to Retrieve content of the document?
Hi,
I have just started using xapian and I may sound like a noob. I want to know
how I can access the content of the document retrieved while searching. I
have used the code found on this mailing list itself to index my database.
#!/usr/bin/perl -w
use strict;
use Search::Xapian;
use File::Find;
my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
my $db =
2013 Apr 26
1
remote backend
So, given what I've read in the documentation I would create a text file named document_database.txt that might have the following:
remote 192.168.1.10:30000
chert /var/lib/xapian_database/segment1
remote 192.168.1.10:30000 chert /var/lib/xapian_database/segment2
remote 192.168.1.10:30000 chert /var/lib/xapian_database/segment3
etc.
I would then in my PHP program open
2009 Jun 02
3
search without flush.
Hi,
Is it possible to perform a search without flushing the index? I've got
an application that updates the index every 4 hours but I need to be
able to search the new data fairly quickly after the index is updated.
The problem revolves around the fact that the update is often much less
than 10 000 documents so it isn't being flushed until quite a bit
later. I realise I can do a flush
2011 Feb 22
1
collapsing by a key in a compound database
Hello all.
I have a problem with collapsing by a key in a compound database. I have
2 databases (e.g. clients and client branches), both of them have the
same attribute (with the same valueno), `client_id'.
What I need is to search in both these databases and collapse results by
`client_id' to get client IDs (set_collapse_key is used with
collapse_max=1).
The problem is that I receive 2
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes:
> On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote:
> > I am the recoll user mentioned in the first post above. I still have a copy
> > of the (potentially) corrupted index and I did the requested testing.
> >
> > I ran delve -t '' ./xapiandb on the index and it returned a very long list
> > of document IDs, separated
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi,
I have a user reporting the following error during recoll indexing:
flush() failed: Db block overwritten - are there multiple writers?
"flush() failed" is from recoll, the rest is, I think, the text of the Xapian
exception.
This is with Xapian 1.4.3 on Linux (I asked for more details, should be
coming).
I don't think that I've ever seen this error, and I also
2016 Dec 10
6
Plain requirement: desktop search
Just wondering, what exactly is supported/suggested:
I need a comprehensive desktop search functionality. Not only
searching for file names but also for content and meta data. The
environment is EL6.8 / Gnome2. I have noticed that "beagle" is
not part of the distro anymore. Any suggestions for such a requirement?
Thanks!
LF
2017 Dec 07
2
xapian 1.4 performance issue
Hi,
I have had reports that Recoll has become unbearably slow in some
instances.
After inquiry, this happens with Xapian 1.4 only, and the part which does
not work any more is the snippets extraction.
Recoll builds snippets by partially reconstructing documents out of index
contents.
For this, after determining a set of document term positions to be
displayed (around the hopefully interesting
2016 Jan 10
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes:
>
> You could try:
>
> delve -t '' ./xapiandb
>
> That will list the document lengths, so you can see if document 6 is in
> that list or not.
I am the recoll user mentioned in the first post above. I still have a copy
of the (potentially) corrupted index and I did the requested testing.
I ran delve -t
2019 Aug 26
2
Commit error with Xapian 1.4.11
A Recoll user gets the following message while indexing:
"Attempted to delete or modify an entry in a non-existent posting list for #bannerholder"
The exception happens during a commit call. Xapian version 1.4.11, Debian Buster
A little more detail here: https://opensourceprojects.eu/p/recoll1/tickets/108/
I asked if this was reproducible, and to run the indexing in single-thread
2018 Sep 13
2
How to make database build threaded?
Hi everybody,
I'm the author of a small C++11 program called XDGSearch. The source
code is hosted on Github, for a quick overview you can visit this link
https://github.com/frank67/XDGSearch/blob/master/README.md
I'm writing to the mailing list because I'd like to make the database
build process split across several threads. Is it possible? If you are a C++
programmer you can take a look at
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote:
> > The question which remains for me is if I should run xapian-compact
> > after an initial indexing operation. I guess that this depends on the
> > amount of expected updates and that there is no easy answer ?
>
> I think it's not obvious whether it's a good plan
2018 Sep 14
3
How to make database build threaded?
On 14/09/2018 at 09:30, Jean-Francois Dockes wrote:
> Hi,
>
> You may be interested in how Recoll does it:
>
> https://www.lesbonscomptes.com/recoll/idxthreads/threadingRecoll.html
>
> A few things in the document are slightly obsolete (esp. the last
> paragraph: recollindex now does use vfork()), but it's overall quite close
> to how the current indexer works.
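
As a rough illustration of the general shape of a threaded build, the sketch below runs the per-file work in several worker threads and funnels the results to a single thread that does all the writing. It is not taken from Recoll or XDGSearch and does not call Xapian; `build_index`, `extract`, and `write` are hypothetical names, and keeping all index writes on one thread is an assumption of this sketch.

```python
import queue
import threading


def build_index(paths, extract, write, workers=4):
    """Run extract(path) in worker threads; call write(path, data) from one thread."""
    todo, done = queue.Queue(), queue.Queue()
    for path in paths:
        todo.put(path)

    def worker():
        try:
            while True:
                try:
                    path = todo.get_nowait()
                except queue.Empty:
                    return  # no work left for this worker
                done.put((path, extract(path)))
        finally:
            done.put(None)  # sentinel: this worker has stopped, even after an error

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    finished = 0
    while finished < workers:
        item = done.get()
        if item is None:
            finished += 1
        else:
            write(*item)  # only the calling thread ever touches the writer
    for t in threads:
        t.join()


if __name__ == "__main__":
    index = {}
    build_index(["a.txt", "b.txt", "c.txt"], str.upper, index.__setitem__, workers=2)
    print(sorted(index.items()))
```

In CPython, threads like these help mostly when `extract` spends its time waiting on file I/O or external converter processes; CPU-bound extraction would need processes instead.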