similar to: Most efficient update of already existing document?

Displaying 20 results from an estimated 10000 matches similar to: "Most efficient update of already existing document?"

2009 Jun 18
1
delete and update
Hi All, I need to update or delete some documents from a Xapian database, and I haven't been able to find anything in the API. Is there a way to do it? What would be the easiest way to do it? Thanks.
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I
2024 Dec 12
1
Using a document id as metadata key and merges
Hi, Following a discussion a few years ago, Recoll stores the documents text contents in database metadata entries, with keys derived from document ids. More recently an index creation method using several temporary indexes merged on completion was implemented. This is still a bit experimental. It brings a significant speed increase in some cases. I just realised that the merge lost many
2024 Dec 13
1
Using a document id as metadata key and merges
On Thu, Dec 12, 2024 at 09:51:44AM +0100, Jean-Francois Dockes wrote: > Following a discussion a few years ago, Recoll stores the documents text > contents in database metadata entries, with keys derived from document ids. > > More recently an index creation method using several temporary indexes > merged on completion was implemented. This is still a bit experimental. It >
2010 Feb 15
3
Xapian 1.0.18 released
I've uploaded Xapian 1.0.18 (including Search::Xapian 1.0.18.0), which as usual you can download from: http://xapian.org/download The most notable changes in this release are: QueryParser: * Improve support for languages such as Burmese which use Unicode enclosing mark and combining spacing mark characters. Flint backend: * When updating documents, don't update posting entries
2012 Jan 08
1
Testing document size preallocation.
https://gist.github.com/ad2accc5b4655753923d So here I am creating a database with no values for each small document and one with a bunch of blank values (uuid_blank). Once those are flushed then I reopen them and start replacing the documents of each with identical documents that have an identical large set of values. I am using replace_document and a specific document ID. Is there a specific
2007 Jul 24
1
Xapian::DocNotFoundError on replace_document? (Called from Search::Xapian)
Hello, I'm using Xapian 1.0.2 (flint) and matching Search::Xapian. I'm getting: terminate called after throwing an instance of 'Xapian::DocNotFoundError', which dumps core. at first it was after adding my 2nd document (to an empty db, although I don't know if that has any bearing) to the database with a replace_document() call. I shifted the first document off the
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2004 May 11
2
"Error reading block xxx: got end of file"
Xapian (0.7.5) is spitting out this error on a regular basis: org.xapian.errors.DatabaseError: Error reading block 136618: got end of file at org.xapian.XapianJNI.writabledatabase_repalce_document(Native Method) at org.xapian.WritableDatabase.replaceDocument(WritableDatabase.java:67) I don't have a gdb backtrace, only the Java
2012 Jul 09
1
Question about Document and TermIterator.get_termfreq()
Hi, While porting the unit tests from perl for the node binding I noticed a test failed. I basically create a document, add a few terms, add the document to a database and then call doc->termlist_begin().get_termfreq(). This throws "Can't get term frequency from a document termlist which is not associated with a database." What I think this means is that I can not call
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes: > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote: > > I am the recoll user mentioned in the first post above. I still have a copy > > of the (potentially) corrupted index and I did the requested testing. > > > > I ran delve -t '' ./xapiandb on the index and it returned a very long list > > of document IDs, separated
2013 Mar 18
2
Incremental indexing
Hi all, I am trying to implement an Incremental indexing scheme. The problem is that usually the modified documents are large but the modifications are limited. Ideally, I would like to reindex only the modified parts of these documents. If I am not mistaken, xapian cannot do that. Are there any other approaches? It would be nice if xapian supported something like the SQL "group by".
2007 Jun 12
5
index browser inconsistent with IndexReader
Hi, We have an index of around 1M web pages as part of our web app. The app uses ferret by way of RDig to perform searches. We have noticed anecdotally that some searches don't work the way we thought they should, as if documents were missing from the index. Yesterday we came upon a concrete instance of this. Our documents have several fields, one of which is called :keywords and
2015 Mar 11
2
stub-file and get_doccount
Hello, I switched from one big index to a stub file with many indexes and am running into a problem. I have a tool to fetch a random document via: get_doccount; pick a random id up to get_doccount; get_document with that id. After changing to a stub file this fails. Is there a nice way to get a random document from a stub file? Regards, Felix Ostmann
2006 Oct 19
1
Writing with xapian-tcpsrv and php
Hi, I think there is a missing constructor function supporting remote writing for the XapianWritableDatabase class in the PHP bindings (0.9.7). This code: $db = new XapianWritableDatabase(remote_open($db_host, $db_port), $action); returns: Fatal error: No matching function for overloaded 'new_XapianWritableDatabase' (...) $db = new XapianWritableDatabase($path, $action); works fine.
2008 Jan 15
7
PHP indexing, what's the PHP method for indexscript
Currently I have the following indexscript: pid : unique=Q boolean=Q field=pid postdate : field=startdate author_name: unhtml boolean=XAUTHORNAME field=author author_id: boolean=XAUTHORID field=authorid url : field=url sample : weight=1 index field=sample How can I create the same indexing using PHP? With this, I can get a searchable index, but I have no idea how to set the fields, so that I
2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required. Backstory: I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera. One of the unfinished parts was removing expunged emails from the search database. We moved from having a single search database to supporting multiple
2011 Apr 21
1
How to Retrieve content of the document?
Hi, I have just started using xapian and I may sound like a noob. I want to know how i can access the content of the document retrieved while searching. I have used the code found on this mailing list itself to index my database. #!/usr/bin/perl -w use strict; use Search::Xapian; use File::Find; my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB'; my $db =
2010 Jun 24
1
Quickest way to retrieve data for a large match set?
We're using the Perl binding to access Xapian in a simple search of image metadata (title and keywords). Due to the specification for the search engine, by default we have to sort the results using a function of the search rank, age (well, newness) and popularity (rated by sales of the image). As a result, we have to fetch the complete result set and then calculate a new ranking based on
2007 Sep 30
1
Perl example of using termitrator?
I'm having trouble translating from C++ to Perl objects. From the TermIterator class, it looks like to get the set of terms in a document you might have C++ code like: Enquire::TermIterator termIt = enquire->get_matching_terms_begin(id); for(;termIt != enquire->get_matching_terms_end(id);termIt++) { string term = *termIt; } Or something similar. However when I attempt to translate that