similar to: Using a document id as metadata key and merges

2024 Dec 13
1
Using a document id as metadata key and merges
On Thu, Dec 12, 2024 at 09:51:44AM +0100, Jean-Francois Dockes wrote: > Following a discussion a few years ago, Recoll stores the documents text > contents in database metadata entries, with keys derived from document ids. > > More recently an index creation method using several temporary indexes > merged on completion was implemented. This is still a bit experimental. It >
2016 Jan 14
3
Strange index consistency issue
Olly Betts writes: > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote: > > I am the recoll user mentioned in the first post above. I still have a copy > > of the (potentially) corrupted index and I did the requested testing. > > > > I ran delve -t '' ./xapiandb on the index and it returned a very long list > > of document IDs, separated
2016 Jan 08
2
Strange index consistency issue
Hi, A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc. The strange thing here is that xapian-check does not seem to detect anything. In a nutshell, some document numbers seem to point to a data blackhole: the docids are returned when searching for the file/doc unique
2007 May 15
1
Document ID 0 is invalid... but not always...
Note: this is rather long and not very important and I don't want to prevent the team from releasing version 1.0, so go on reading only if you have too much free time !!! ;-) 0 is not a valid document ID, never, ever, but I just found a special case in which xapian will create a record and return 0 for the newly created record. In fact, I was "hacking", trying to store metadata
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi, I have a user reporting the following error during recoll indexing: flush() failed: Db block overwritten - are there multiple writers? "flush() failed" is from recoll, the rest is, I think the text of the Xapian exception. This is with Xapian 1.4.3 on Linux (I asked for more details, should be coming). I don't think that I've ever seen this error, and I also
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote: > > Some might notice the 50% index size increase. Excessive index size is > > already one relatively rare, but recurring complaint. Except if I did > > something wrong: I'm actually quite surprised by it. > > Did you try compacting the resulting databases? > >
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2018 Apr 06
1
sorting large msets
> > Olly Betts <olly at survex.com> wrote: > > > > > > The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds. > > > This is because in that case we can just stop once we've found 200 > > > matches. With a few million documents, that ENQ_ASCENDING sounds promising :) So, it looks like if I had ideal ordering, I could do
2009 Feb 12
1
problem when using xapian's static libs in windows
I have downloaded the source (1.10) from the internet and built it into a lib. Then I created a project as the helpdoc said. I am using vc2005 (vc8). The source in my test project is as follows (copied from the helpdoc): #include <xapian.h> #include <iostream> using namespace std; int main(int argc, char **argv) { // Simplest possible options parsing: we just require three or more
2016 Jan 14
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes: > > On Thu, Jan 14, 2016 at 11:04:29AM +0100, Jean-Francois Dockes wrote: > > Olly Betts writes: > > > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote: > > > > I will look into the bug you listed to see if it might be related. If there > > > > is anything else that I can do, please
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6 8 AMD Opteron 64-bit Processors 32GB Memory -------------------------------------------------------------------------------- Environment: ------------------ XAPIAN_FLUSH_THRESHOLD=21000000 XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000 XAPIAN_PREFER_FLINT=True Indexing 20 million documents: --stemmer=none ------------------------------------------- real 79m9.378s user 77m28.696s
2008 Jan 15
7
PHP indexing, what's the PHP method for indexscript
Currently I have the following indexscript: pid : unique=Q boolean=Q field=pid postdate : field=startdate author_name: unhtml boolean=XAUTHORNAME field=author author_id: boolean=XAUTHORID field=authorid url : field=url sample : weight=1 index field=sample How can I create the same indexing using PHP? With this, I can get an searchable index, but I have no idea how to set the fields, so that I
2016 Jan 10
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes: > > You could try: > > delve -t '' ./xapiandb > > That will list the document lengths, so you can see if document 6 is in > that list or not. I am the recoll user mentioned in the first post above. I still have a copy of the (potentially) corrupted index and I did the requested testing. I ran delve -t
2004 May 11
2
"Error reading block xxx: got end of file"
Xapian (0.7.5) is spitting out this error on a regular basis: org.xapian.errors.DatabaseError: Error reading block 136618: got end of file at org.xapian.XapianJNI.writabledatabase_repalce_document(Native Method) at org.xapian.WritableDatabase.replaceDocument(WritableDatabase.java:67) I don't have a gdb backtrace, only the Java
2019 Jan 31
4
Amount of writes during index creation
Olly Betts writes: > On Mon, Jan 21, 2019 at 03:25:01PM +0100, Jean-Francois Dockes wrote: > > I have had a problem report from a Recoll user about the amount of writes > > during index creation. > > > > https://opensourceprojects.eu/p/recoll1/tickets/67/ > > > > The issue is that the index is on SSD and that the amount of writes is > >
2007 Jul 24
1
Xapian::DocNotFoundError on replace_document? (Called from Search::Xapian)
Hello, I'm using Xapian 1.0.2 (flint) and matching Search::Xapian. I'm getting: terminate called after throwing an instance of 'Xapian::DocNotFoundError', which dumps core. at first it was after adding my 2nd document (to an empty db, although I don't know if that has any bearing) to the database with a replace_document() call. I shifted the first document off the
2010 Jun 10
0
Exception: Key too long
Started a new thread - don't want to hijack the previous one (or carry on hijacking it). On Thu, June 10, 2010 05:17, Olly Betts wrote: >> My issue is that exceptions (ie, "Exception: Key too long: length >> was...") > > You are hitting the Btree key size limit. For flint and chert, this > translates to a term length limit of 245 bytes. > If you are using
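As a rough illustration of the term length limit quoted above, the following sketch checks a term's UTF-8 byte length before indexing and truncates it on a character boundary if needed. The 245-byte figure comes from the quoted message (flint and chert); the helper names `term_fits` and `truncate_term` are hypothetical and not part of the Xapian API, and the sketch ignores any extra encoding overhead a backend may add.

```python
# Limit quoted in the message above for the flint and chert backends.
MAX_TERM_BYTES = 245


def term_fits(term: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of term is within the byte limit."""
    return len(term.encode("utf-8")) <= limit


def truncate_term(term: str, limit: int = MAX_TERM_BYTES) -> str:
    """Hypothetical helper: shorten term so its UTF-8 form fits the limit.

    The cut is made on the encoded bytes; errors="ignore" drops a trailing
    partial multi-byte sequence so the result is still valid text.
    """
    data = term.encode("utf-8")
    if len(data) <= limit:
        return term
    return data[:limit].decode("utf-8", errors="ignore")
```

Whether truncating is acceptable depends on the application; rejecting or hashing over-long terms would be alternative choices.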
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2006 Sep 14
2
Possible Bug? indexWriter#doc_count counts deleted docs after #commit
Hi David, > Deleted documents don't get deleted until commit is called Ok, but FYI, my experiments show that #commit doesn't affect #doc_count, even across ruby sessions. On a different note, I'd like to request a variation of #add_document which returns the doc_id of the document added, as opposed to self. I'm trying to track down an issue with a large
2019 Feb 03
2
Amount of writes during index creation
Bron Gondwana writes: > This is quite possibly part of the underlying write explosion that we ran into when we wrote: > > https://fastmail.blog/2014/12/01/email-search-system/ > > Which now almost 5 years on, has been running like a champion! We're really pleased with how well it works. Xapian reads from multiple databases are really easy, and the immediate writes onto