2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr:
http://www.anur.ag/blog/2009/03/xapian-and-solr/
According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB.
I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes:
> On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote:
> > I am the recoll user mentioned in the first post above. I still have a copy
> > of the (potentially) corrupted index and I did the requested testing.
> >
> > I ran delve -t '' ./xapiandb on the index and it returned a very long list
> > of document IDs, separated
2004 May 11
2
"Error reading block xxx: got end of file"
Xapian (0.7.5) is spitting out this error on a regular basis:
org.xapian.errors.DatabaseError: Error reading block 136618: got end of file
        at org.xapian.XapianJNI.writabledatabase_repalce_document(Native Method)
        at org.xapian.WritableDatabase.replaceDocument(WritableDatabase.java:67)
I don't have a gdb backtrace, only the Java
2012 Mar 20
2
Incremental indexing
Hi all,
I am trying to implement an incremental indexing scheme. The problem
is that the modified documents are usually large but the modifications
are limited. Ideally, I would like to reindex only the modified parts
of these documents. If I am not mistaken, Xapian cannot do that. Are
there any other approaches?
It would be nice if Xapian supported something like the SQL "GROUP
BY".
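Xapian indexes and replaces whole documents, so one workaround (an assumption here, not something this thread settles) is to store each large file as several smaller Xapian documents, one per section, and reindex only the sections whose content changed. If each section also stores its parent's ID in a value slot, `Enquire::set_collapse_key()` on that slot returns at most one hit per parent, which may cover part of the "GROUP BY" wish. The pure-Python sketch below does only the change detection; the blank-line splitting rule and the `doc_id#index` section IDs are hypothetical.

```python
import hashlib


def split_sections(text: str) -> list[str]:
    """Hypothetical splitting rule: sections are separated by blank lines."""
    return [s.strip() for s in text.split("\n\n") if s.strip()]


def section_hashes(doc_id: str, text: str) -> dict[str, str]:
    """Map a hypothetical section ID (doc_id#index) to a SHA-1 of its content."""
    return {
        f"{doc_id}#{i}": hashlib.sha1(section.encode("utf-8")).hexdigest()
        for i, section in enumerate(split_sections(text))
    }


def plan_reindex(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (section IDs to (re)index, section IDs to delete)."""
    changed = sorted(sid for sid, h in new.items() if old.get(sid) != h)
    removed = sorted(sid for sid in old if sid not in new)
    return changed, removed


if __name__ == "__main__":
    v1 = "Intro paragraph.\n\nMethods section.\n\nResults section."
    v2 = "Intro paragraph.\n\nMethods section, revised.\n\nResults section."
    print(plan_reindex(section_hashes("report", v1), section_hashes("report", v2)))
    # (['report#1'], [])
```

Each ID in the first list would then be written with `replace_document()` keyed on a unique ID term, and each ID in the second list deleted. Because these section IDs are positional, inserting a section near the top marks every later section as changed; a more stable ID scheme would avoid that at the cost of extra bookkeeping.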
2008 Jan 15
7
PHP indexing, what's the PHP method for indexscript
Currently I have the following indexscript:
pid : unique=Q boolean=Q field=pid
postdate : field=startdate
author_name: unhtml boolean=XAUTHORNAME field=author
author_id: boolean=XAUTHORID field=authorid
url : field=url
sample : weight=1 index field=sample
How can I create the same indexing using PHP?
With this, I can get a searchable index, but I have no idea how to set the fields, so that I
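The indexscript actions map fairly directly onto Xapian API calls: `boolean=...` adds a boolean term with that prefix (`add_boolean_term()`), `index` feeds text to a `TermGenerator`, each `field=` stores a `name=value` line in the document data (`set_data()`), and `unique=Q` means the document is written with `replace_document()` keyed on its Q term. The pure-Python sketch below only builds that plan for one record so the mapping is easy to check; it does not call the Xapian bindings, approximates `unhtml` with a naive tag-stripping regex, and does not try to reproduce every scriptindex detail (term length limits, prefix separators).

```python
import re


def strip_html(value: str) -> str:
    """Naive stand-in for the `unhtml` action (assumption: removes tags only)."""
    return re.sub(r"<[^>]*>", "", value)


def plan_document(record: dict[str, str]) -> dict:
    """Build terms, text and data for one input record, mirroring the indexscript."""
    author = strip_html(record["author_name"])  # author_name: unhtml ...
    boolean_terms = [
        "Q" + record["pid"],                    # pid: unique=Q boolean=Q
        "XAUTHORNAME" + author,                 # author_name: boolean=XAUTHORNAME
        "XAUTHORID" + record["author_id"],      # author_id: boolean=XAUTHORID
    ]
    fields = [                                  # field=... actions, in script order
        ("pid", record["pid"]),
        ("startdate", record["postdate"]),
        ("author", author),
        ("authorid", record["author_id"]),
        ("url", record["url"]),
        ("sample", record["sample"]),
    ]
    return {
        "unique_term": "Q" + record["pid"],         # key for replace_document()
        "boolean_terms": boolean_terms,
        "text_to_index": [(record["sample"], 1)],   # (text, weight) for the TermGenerator
        "data": "\n".join(f"{name}={value}" for name, value in fields),
    }


if __name__ == "__main__":
    rec = {"pid": "42", "postdate": "20080115", "author_name": "<b>ann</b>",
           "author_id": "7", "url": "http://example.com/42", "sample": "hello world"}
    print(plan_document(rec)["boolean_terms"])  # ['Q42', 'XAUTHORNAMEann', 'XAUTHORID7']
```

In the PHP bindings, which use the same method names as C++, this plan should translate into calls such as `$doc->add_boolean_term(...)`, `$doc->set_data(...)`, `$indexer->index_text(...)` and `$db->replace_document(...)`.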
2018 Jan 22
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
>A possible workaround (and perhaps a better approach) would be to
>set BoolWeight as the weighting scheme, then feed in your score as
>a weight using a PostingSource. Then it's available via get_weight()
>on the MSetIterator object:
>
>https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/postingsource.html
>
>You may find that's faster because
2007 Jul 24
1
Xapian::DocNotFoundError on replace_document? (Called from Search::Xapian)
Hello,
I'm using Xapian 1.0.2 (flint) and matching Search::Xapian.
I'm getting:
terminate called after throwing an instance of
'Xapian::DocNotFoundError', which dumps core.
at first it was after adding my 2nd document (to an empty db, although
I don't know if that has any bearing) to the database with a
replace_document() call.
I shifted the first document off the
2017 Dec 15
5
How to get the serialise score returned in Xapian::KeyMaker->operator().
Hi all,
I am a user of Xapian, and now I have a problem using it.
After using boolean terms to select candidate documents (still too many), we want to sort them with a self-defined function implemented in Xapian::KeyMaker->operator(). But how can I get the serialised score back from the Xapian::MSetIterator object?
The C++ code looks like this:
class SortKeyMaker : public Xapian::KeyMaker {
std::string
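One way to get a score back out of a sort key is to make the key decodable. Xapian's own pair of functions for this is `sortable_serialise()` / `sortable_unserialise()`. The Python sketch below illustrates the same property with a different, simpler encoding (big-endian IEEE 754 bits with the sign handled), so its bytes are not compatible with Xapian's format: encoded keys compare bytewise in numeric order, and decoding returns the original value. NaN is not handled.

```python
import struct

SIGN_BIT = 0x8000000000000000
ALL_BITS = 0xFFFFFFFFFFFFFFFF


def encode_key(value: float) -> bytes:
    """Order-preserving 8-byte key for a float (NOT Xapian's sortable_serialise format)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:
        bits ^= ALL_BITS  # negative: invert all bits so larger magnitudes sort first
    else:
        bits |= SIGN_BIT  # non-negative: set the sign bit so it sorts after negatives
    return struct.pack(">Q", bits)


def decode_key(key: bytes) -> float:
    """Inverse of encode_key."""
    (bits,) = struct.unpack(">Q", key)
    if bits & SIGN_BIT:
        bits &= ~SIGN_BIT & ALL_BITS  # was non-negative: clear the sign bit again
    else:
        bits ^= ALL_BITS  # was negative: undo the inversion
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


if __name__ == "__main__":
    scores = [3.5, -1.25, 0.0, 42.0, -100.0]
    keys = sorted(encode_key(s) for s in scores)  # bytewise sort of the keys
    print([decode_key(k) for k in keys])  # [-100.0, -1.25, 0.0, 3.5, 42.0]
```

If the application cannot get at the key for a hit, the BoolWeight plus PostingSource route from the reply quoted earlier in these results exposes the value through `get_weight()` instead.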
2020 Oct 21
2
xapian-check sorted order error
Hi,
We were running xapian-check on one of our Xapian indexes and it
returns the following error:
position:
baseB blocksize=8K items=809896869 lastblock=2090419 revision=3161
levels=3 root=2084903
Failed to check B-tree: DatabaseError: Items not in sorted order
The other tables verify without issue. It looks like our oldest backup
of this database (a month old) has the same issue. Searching and
2024 Dec 13
1
Using a document id as metadata key and merges
On Thu, Dec 12, 2024 at 09:51:44AM +0100, Jean-Francois Dockes wrote:
> Following a discussion a few years ago, Recoll stores the documents text
> contents in database metadata entries, with keys derived from document ids.
>
> More recently an index creation method using several temporary indexes
> merged on completion was implemented. This is still a bit experimental. It
>
2016 Apr 22
2
Weighting recent results
I did some digging and found a thread from 2011 talking about how to
subclass Xapian::PostingSource in order to incorporate the date or
recency of a document in its weighting:
http://thread.gmane.org/gmane.comp.search.xapian.general/8849/focus=8856
As in that thread, I want to be clear that I don't want to sort by date,
but rather incorporate date information into the score by which I
2016 Jan 14
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes:
>
> On Thu, Jan 14, 2016 at 11:04:29AM +0100, Jean-Francois Dockes wrote:
> > Olly Betts writes:
> > > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote:
> > > > I will look into the bug you listed to see if it might be related. If there
> > > > is anything else that I can do, please
2018 Mar 31
2
sorting large msets
Olly Betts <olly at survex.com> wrote:
> On Fri, Mar 30, 2018 at 05:21:43PM +0000, Eric Wong wrote:
> > Hello, is there a way to optimize sorting by certain values
> > for queries which return a huge amount of results?
> [...]
> > $enquire->set_sort_by_value_then_relevance(0, 1);
>
> If you're just wanting the 200 newest, it'll be faster not to
2020 Feb 07
2
prioritizing aggregated DBs
Hey all, I've been using ->add_database for a few years
to tie sharded DBs together and it works great.
Now, I want to be able to search across several DBs
which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
I want to search for something across all of them, but
prioritize results to favor one or some of those DBs over
others. Is there a way to do that without reindexing?
Or
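When several databases are combined with `add_database()`, Xapian interleaves their document IDs: with n databases, document d of database i (counting from 0) appears as docid (d - 1) * n + i + 1. That makes it possible to tell which database a hit came from and, as one possible approach that needs no reindexing, rescale its weight in application code. The per-database multipliers below are hypothetical, and the sketch only re-ranks results that have already been fetched.

```python
def shard_of(docid: int, n_dbs: int) -> tuple[int, int]:
    """Return (database index, docid within that database) for an interleaved docid."""
    return (docid - 1) % n_dbs, (docid - 1) // n_dbs + 1


def rerank(hits: list[tuple[int, float]], multipliers: list[float]) -> list[tuple[int, float]]:
    """Scale each hit's weight by its database's multiplier; sort best-first, ties by docid."""
    n = len(multipliers)
    scaled = [(docid, weight * multipliers[shard_of(docid, n)[0]]) for docid, weight in hits]
    return sorted(scaled, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical order of add_database() calls: linux-DB, glibc-DB, freebsd-DB.
    boosts = [1.5, 1.0, 1.0]  # favour linux-DB
    hits = [(2, 10.0), (4, 8.0), (3, 9.0)]
    print(rerank(hits, boosts))  # [(4, 12.0), (2, 10.0), (3, 9.0)]
```

Since only fetched hits are rescaled, it is worth fetching more results than will be displayed, so that a boosted document just below the cut-off can still move up.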
2006 Aug 11
3
Proposed changes to omindex
Currently Available Items
=========================
1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during
indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental
indexing based on the timestamp and also sorting by date in omega (SORT=0)
a. Currently I store the timestamp
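Taken together, items 1) and 2) could work roughly as in the sketch below: derive the unique term from the MD5 of the full file name, then reindex only files whose stored last-modified time differs from the current one. It is pure Python; the dictionary standing in for value slot 0, and writing the MD5 as hex rather than the 16 raw bytes the proposal mentions, are assumptions made to keep the example readable.

```python
import hashlib


def unique_term(path: str) -> str:
    """Q-prefixed term from the MD5 of the full file name (hex here, not raw bytes)."""
    return "Q" + hashlib.md5(path.encode("utf-8")).hexdigest()


def needs_reindex(path: str, mtime: int, stored_mtimes: dict[str, int]) -> bool:
    """True if the file is new or its modification time differs from the stored one.

    stored_mtimes is a hypothetical stand-in for reading value slot 0 of the
    document found via unique_term(path).
    """
    previous = stored_mtimes.get(unique_term(path))
    return previous is None or mtime != previous


if __name__ == "__main__":
    stored = {unique_term("/docs/a.txt"): 1000}
    print(needs_reindex("/docs/a.txt", 1000, stored))  # False
    print(needs_reindex("/docs/a.txt", 1001, stored))  # True
```

Comparing with `!=` rather than `>` also catches a file restored from a backup with an older timestamp, at the cost of an occasional unnecessary reindex.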
2020 Apr 07
2
crash after running notmuch new
Matt <mattator at gmail.com> writes:
> thanks didn't know about xapian-check !
> the output
> ===
> docdata:
> blocksize=8K items=70 firstunused=3 revision=421 levels=0 root=2
> B-tree checked okay
> docdata table structure checked OK
>
> termlist:
> blocksize=8K items=186136 firstunused=62058 revision=421 levels=2 root=12260
> B-tree checked okay
>
2013 Mar 02
3
How to add a custom weight to the relevancy value and sort by it.
Hello guys,
I have a weight value, calculated from some factor, and I need to add it
to the relevance value of a result and sort by the combined value.
Is that possible in Xapian?
Thanks,
VishnuKumar
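Inside Xapian, adding a custom weight to relevance is usually done by combining a `PostingSource` with the query, so that the matcher itself ranks by the sum. A simpler alternative, sketched below in pure Python, is to fetch a generous window of results and re-rank them in application code by relevance plus the custom weight. The trade-off is that a document outside the fetched window can never be promoted into it. The `custom_weight` mapping and the window size are hypothetical.

```python
def rerank_window(hits: list[tuple[int, float]],
                  custom_weight: dict[int, float],
                  window: int) -> list[tuple[int, float]]:
    """Re-rank the first `window` hits by relevance + custom weight.

    hits: (docid, relevance) pairs in Xapian's relevance order.
    custom_weight: hypothetical docid -> extra weight (missing docids get 0).
    """
    combined = [(docid, rel + custom_weight.get(docid, 0.0)) for docid, rel in hits[:window]]
    return sorted(combined, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    hits = [(10, 5.0), (11, 4.0), (12, 3.0), (13, 2.5)]
    print(rerank_window(hits, {12: 2.5}, window=3))  # [(12, 5.5), (10, 5.0), (11, 4.0)]
```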
2011 Aug 09
3
What is the fastest way to fetch results sorted by timestamp?
I want to use Xapian as my search engine. For each document I call add_boolean_term(something) and add_value(0, sortable_serialise(get_timestamp())).
I search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0, True) so that the results are sorted by timestamp.
This method works, but
2018 Jul 12
1
Error while compacting: Bad position key
Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by
2016 May 16
2
Weighting recent results
I was thinking about this some more: Is there a reason I can't just
weight by some function of recency at indexing time?
$weight = get_weight_based_on_recency(...);
$tg->index_text($txt,$weight);
If I wanted to allow the user the option of searching either in
recency-weighted mode or not, I could index each document into 2
different databases, one with and one without.
This avoids
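One caveat with this approach: the second argument of `index_text()` is a wdf increment (an integer added to each term's within-document frequency), not a score multiplier, so `$weight` has to be an integer, and BM25-style weighting saturates as wdf grows. The Python sketch below evaluates only the wdf-dependent factor of the BM25 term weight, with k1 = 1 and b = 0.5 chosen as assumptions for illustration. It shows that tripling wdf raises that factor by much less than three times, and by even less once the larger wdf also inflates the document's length.

```python
K1 = 1.0  # assumed BM25 wdf-saturation parameter
B = 0.5   # assumed BM25 length-normalisation parameter


def wdf_factor(wdf: float, len_ratio: float = 1.0) -> float:
    """wdf-dependent part of a BM25 term weight.

    len_ratio is document length / average document length.
    """
    k = K1 * ((1 - B) + B * len_ratio)
    return (K1 + 1) * wdf / (k + wdf)


if __name__ == "__main__":
    print(wdf_factor(1))                 # 1.0
    print(wdf_factor(3))                 # 1.5: tripled wdf, only 1.5x the factor
    print(wdf_factor(3, len_ratio=3.0))  # 1.2: the document also got 3x longer
```

So a recency boost applied through the wdf increment is compressed by the weighting scheme, and its effect also depends on the corpus-wide average document length.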