search for: xapian_flush_threshold

Displaying 19 results from an estimated 19 matches for "xapian_flush_threshold".

2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000. When trying to force Xapian to flush after 20 million documents, Xapian ignores the setting and flushes after only 10,000 documents. Data captured from delve at 60-second intervals with the variable set as follows: XAPIAN_FLUSH_THRESHOLD=20000000 perl -e ' while(1) { sys...
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
...aying around with a machine that has 2 GB of memory. Indexing about 5GB of data, with an average of 2MB per document. The documents are plain text. I notice that omindex's memory footprint gets bigger and bigger, then the machine starts to swap and it all slows down to a crawl. With regard to export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000. Am I right in saying that for my setup I should be doing export XAPIAN_FLUSH_THRESHOLD=1000 because: 1000 documents * 2MB doc size = 2 GB of memory required before a flush to disk?
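The arithmetic in this question can be checked with a short sketch. It only multiplies the flush threshold by the average document size, as the poster does; it is not a model of how much memory Xapian actually uses, which depends on the terms and postings generated rather than on the raw text. The function names and the 512 MiB budget are illustrative assumptions.

```python
MB = 1024 * 1024  # one mebibyte, used here for the poster's "2MB"


def raw_text_per_batch(flush_threshold: int, avg_doc_bytes: int) -> int:
    """Bytes of raw document text added between two automatic flushes."""
    return flush_threshold * avg_doc_bytes


def threshold_for_budget(budget_bytes: int, avg_doc_bytes: int) -> int:
    """Largest document count whose raw text fits in budget_bytes (at least 1)."""
    return max(1, budget_bytes // avg_doc_bytes)


avg_doc = 2 * MB
for threshold in (10_000, 1_000):
    gib = raw_text_per_batch(threshold, avg_doc) / (1024 * MB)
    print(f"threshold={threshold}: about {gib:.2f} GiB of raw text per batch")

# Hypothetical budget: keep each batch of raw text under 512 MiB.
print("threshold for a 512 MiB budget:", threshold_for_budget(512 * MB, avg_doc))
```

By this estimate a threshold of 1000 already matches the machine's entire 2 GB of RAM, so a lower value may be worth trying on such a setup.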
2012 Dec 29
3
omindex killed
I'm finding that omindex is consistently ending prematurely when indexing certain files. The last output looks like this: [Entering directory /compounds/Acetic_acid] Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.TXT" as text/plain ... added. Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.pdf" as application/pdf ... "pdftotext -enc UTF-8
2012 Nov 21
1
about index speed of xapian
Hi, I use Xapian to index a text file; its size is 268M. I take each line as a document, and each line has two fields, like 13445511 | 111115151. The record count is 10000000. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. The Lucene speed is about 40000 records per second. Code: try { Xapian::WritableDatabase database("testindex", Xapian::DB_CREATE_OR_OPEN); mybase::Timeval now; std::string l...
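To compare the two figures in this post on the same scale, the elapsed time can be converted to records per second. This is a minimal sketch using only the numbers quoted above; the function name is an assumption, and the Lucene rate is the poster's reported figure, not a measurement made here.

```python
def records_per_second(records: int, elapsed_ms: int) -> float:
    """Throughput in records per second for a run that took elapsed_ms."""
    return records / (elapsed_ms / 1000.0)


# Figures quoted in the post: 10,000,000 records in 1,026,544 ms.
xapian_rate = records_per_second(10_000_000, 1_026_544)
lucene_rate = 40_000.0  # records per second, as reported by the poster

print(f"Xapian run: {xapian_rate:.0f} records/s")
print(f"Reported Lucene rate is {lucene_rate / xapian_rate:.1f} times higher")
```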
2004 Oct 08
1
indexing performance
...%, but memory use mounted to VSZ 244M, RSS 180M. Considering we have 2G RAM, I wonder whether there is a way to make fuller use of our machine to get better indexing performance. Question: How can I speed up our indexer? Did I do something wrong with my indexer? BTW, I set the following environment variables: XAPIAN_FLUSH_THRESHOLD_LENGTH=5000000 XAPIAN_FLUSH_THRESHOLD=10000 Many many thanks. Hongyan Ma
2012 Aug 31
1
too slow when create index
I am creating an index for some files. In my program, a document is a line in a file, and I index every line in a file. Is there any method to speed this up?
2008 Aug 21
2
How to speed up indexing ?
I'm new to Xapian & need some help, many thanks if anyone replies. I did a release build from xapian-core-1.0.7 with VS2008 by using Charlie Hull's makefiles. I'm trying to test-index my dataset -- some 200'000 docs, each document being (on average) 50 bytes long and having 6 words. I tried (a) not to use stemmer, (b) commit_transaction() on every 50/100/etc. docs, (c) not
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6, 8 AMD Opteron 64-bit processors, 32GB memory. Environment: XAPIAN_FLUSH_THRESHOLD=21000000 XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000 XAPIAN_PREFER_FLINT=True. Indexing 20 million documents (--stemmer=none): real 79m9.378s, user 77m28.696s, sys 1m36.654s. # delve /home/kevin/index number of docu...
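The timing in this result can be turned into a throughput figure. The sketch below parses a shell `time` value of the form shown in the post and divides the 20 million documents by the wall-clock ("real") time; the helper name is an assumption, and it only handles the minutes-and-seconds format quoted above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '79m9.378s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" time quoted in the post for indexing 20 million documents.
real_seconds = parse_time("79m9.378s")
docs_per_second = 20_000_000 / real_seconds

print(f"{real_seconds:.3f} s elapsed, about {docs_per_second:.0f} documents/s")
```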
2017 Apr 03
3
errors on rebuild
...is from isn't going to be representative. But from the information you give, my guess is that the extra memory used for batching up changes is pushing you over an I/O cliff, and you would get better throughput by reducing the batch size (assuming the "batch size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something equivalent). Especially likely if you tuned that batch size for chert. There are some longer term plans to rework the batching and flush process which should improve matters a lot (and hopefully remove the need for manually tweaking such settings). I'm hoping that will land in t...
2010 Mar 07
2
"Value in posting list too large" error with 1.1.4 (chert and brass, not flint)
Hi, I've a program which: 1. Sets XAPIAN_FLUSH_THRESHOLD=1000 2. Opens a (new) database for write 3. Indexes a few thousand documents 4. Periodically also does queries on the database With 1.1.4, with certain document sets (basically a particular mail folder of mine), Enquire.get_mset() sometimes (but not always) triggers a "RangeErr...
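Step 1 of the program described here, setting the variable from inside the process, might look like the sketch below. It assumes the variable is read when the writable database is opened, so it has to be set before that point; the context manager name is hypothetical, and the database-opening step is left as a comment because the poster's code is not shown.

```python
import os
from contextlib import contextmanager


@contextmanager
def flush_threshold(documents: int):
    """Temporarily set XAPIAN_FLUSH_THRESHOLD for the current process."""
    name = "XAPIAN_FLUSH_THRESHOLD"
    previous = os.environ.get(name)
    os.environ[name] = str(documents)
    try:
        yield
    finally:
        # Restore whatever was there before, including "not set at all".
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with flush_threshold(1000):
    # Open the writable database here (assumption: the setting is picked up
    # at open time), then index documents and run queries as usual.
    print("threshold in effect:", os.environ["XAPIAN_FLUSH_THRESHOLD"])
```

Restoring the previous value on exit keeps the setting from leaking into other databases opened later by the same process.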
2007 Jun 17
2
Flint failed to deliver indexing performance to Quartz.
...ory servers. Flint has so far completely failed to deliver even a fraction of the indexing performance that the Quartz database achieves when indexing a large number of documents in a short time using plenty of memory. Example from my benchmarks: the Quartz database indexes 10 million unique documents with XAPIAN_FLUSH_THRESHOLD=10000000 in less than 1 hour. The Flint database indexes 10 million unique documents with XAPIAN_FLUSH_THRESHOLD=10000000 in less than 16 hours. Please provide settings to remove Flint and add Quartz as the default database. Unless the unacceptable indexing performance using Flint database will be r...
2009 Jun 02
3
search without flush.
Hi, Is it possible to perform a search without flushing the index? I've got an application that updates the index every 4 hours but I need to be able to search the new data fairly quickly after the index is updated. The problem revolves around the fact that the update is often much less than 10 000 documents so it isn't being flushed until quite a bit later. I realise I can do a flush
2017 Dec 29
2
notmuch: Xapian exception during database creation
Running notmuch from git on Debian testing[1] with the mail and database sitting on a ZFS filesystem, adding mail to a new database: > agrajag-testing ~/s/notmuch % ./notmuch new > Found 605510 total files (that's not much mail). > add_file: A Xapian exception occurred36m 37s remaining). > A Xapian exception occurred adding message: Unexpected end of posting list for
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr: http://www.anur.ag/blog/2009/03/xapian-and-solr/ According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB. I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
2007 Oct 16
1
Xapian 1.0.3_svn9466 - OK!
.... 2. Installed Xapian 1.0.3_svn9466. 3. libxapian.so.15 used to be in directory /usr/local/lib64/ however this time the library was in /usr/local/lib/ directory 4. cp /usr/local/lib/libxapian.so.15 /lib Indexing 52 million web sites took approximately 21 hours on Intel 8 core CPU with 12 GB memory XAPIAN_FLUSH_THRESHOLD=1000000 number of documents = 52746432 average document length = 89.6394 You can visit and test Xapian 1.0.3_svn9466 search engine with 52 million of indexed web sites on http://pacific-design.com -- Cheers Kevin Duraj http://pacific-design.com Los Angeles, California
2017 Dec 31
1
notmuch: Xapian exception during database creation
.... >> > position table structure checked OK > > This seems to be for an almost empty database (2 items in the postlist > table and nothing anywhere else) which doesn't really seem consistent > with the amount of data notmuch reports as having processed. Are you > setting XAPIAN_FLUSH_THRESHOLD very high? No, I didn't set any specific value. > You can look at the low level entries in the postlist table with: > > xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass > > (You'll need to build xapian-core from source to get xapian-inspect, > as it's really a...
2017 Apr 03
0
errors on rebuild
...to be representative. > > But from the information you give, my guess is that the extra memory > used for batching up changes is pushing you over an I/O cliff, and > you would get better throughput by reducing the batch size (assuming > the "batch size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something > equivalent). Especially likely if you tuned that batch size for chert. > > There are some longer term plans to rework the batching and flush process > which should improve matters a lot (and hopefully remove the need for > manually tweaking such settings). I'm h...
2017 Dec 29
0
notmuch: Xapian exception during database creation
...> void B-tree checked okay > > position table structure checked OK This seems to be for an almost empty database (2 items in the postlist table and nothing anywhere else) which doesn't really seem consistent with the amount of data notmuch reports as having processed. Are you setting XAPIAN_FLUSH_THRESHOLD very high? You can look at the low level entries in the postlist table with: xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass (You'll need to build xapian-core from source to get xapian-inspect, as it's really a tool for developers). I'd guess the two entries are user metadat...
2017 Mar 02
2
errors on rebuild
Hi Olly, Thanks for the detailed response. I hadn’t realized there was a new xapian haystack backend. I’m going to try that but I have some upgrades to do first. Django 1.8, etc. Thanks, Ryan > On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote: > > On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote: >> I am trying to rebuild an index of 2+