Displaying 19 results from an estimated 19 matches for "xapian_flush_threshold".
2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000.
When trying to force Xapian to flush documents only after 20 million
documents, Xapian ignores the setting and flushes after only 10,000
documents.
Data captured from delve at 60-second intervals, with the variable set as follows:
XAPIAN_FLUSH_THRESHOLD=20000000
perl -e ' while(1) { sys...
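Whether a setting like this takes effect depends on the indexing process actually seeing the variable. The following is a hypothetical helper (flush_threshold_from_env is an illustrative name, not a Xapian function, and this is not Xapian's own parsing code) showing the general pattern of reading a threshold from the environment and falling back to the 10000-document default mentioned elsewhere in these threads.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper, NOT Xapian's source: reads XAPIAN_FLUSH_THRESHOLD
// and falls back to a default of 10000 documents when the variable is
// unset, empty, zero, or not a plain decimal number.
constexpr unsigned long kDefaultFlushThreshold = 10000;

unsigned long flush_threshold_from_env() {
    const char* raw = std::getenv("XAPIAN_FLUSH_THRESHOLD");
    if (raw == nullptr || *raw == '\0') {
        // Variable not present in this process's environment.
        return kDefaultFlushThreshold;
    }
    char* end = nullptr;
    const unsigned long value = std::strtoul(raw, &end, 10);
    if (*end != '\0' || value == 0) {
        // Trailing non-digit characters, or zero: use the default.
        return kDefaultFlushThreshold;
    }
    return value;
}
```

One general point worth checking when a setting seems to be ignored: a shell variable assigned without export is not passed to child processes, so the indexer needs to be started after something like export XAPIAN_FLUSH_THRESHOLD=20000000. The variable should also be set before the database is opened, assuming (as we understand it) that Xapian reads it when the WritableDatabase is opened.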
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
...aying around with a machine that has 2 GB of memory.
Indexing about 5GB of data, averaging 2MB per document.
The documents are plain text.
I notice omindex's memory footprint gets bigger and bigger, then the
machine starts to swap and it all slows down to a crawl.
Regarding export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000.
Am I right in saying that for my setup I should be doing export
XAPIAN_FLUSH_THRESHOLD=1000 because:
1000 documents * 2MB doc size = 2GB of memory required before a flush
to disk?
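The arithmetic in the question can be written as a small helper. This is a naive sketch that assumes (hypothetically) each buffered document costs about its raw text size in memory; the real per-document cost of Xapian's buffered changes is not stated in this thread and will differ, so treat the result as a rough upper bound. estimate_flush_threshold is an illustrative name, not a Xapian API.

```cpp
#include <cassert>
#include <cstdint>

// Naive estimate: how many documents fit in a memory budget if each
// buffered document is assumed (hypothetically) to cost its raw size.
std::uint64_t estimate_flush_threshold(std::uint64_t memory_budget_bytes,
                                       std::uint64_t avg_doc_bytes) {
    assert(avg_doc_bytes > 0);  // precondition: average size must be known
    const std::uint64_t docs = memory_budget_bytes / avg_doc_bytes;
    return docs > 0 ? docs : 1;  // never return a threshold of zero
}
```

With the numbers above, estimate_flush_threshold(2'000'000'000, 2'000'000) gives 1000. Using all of physical RAM as the budget leaves nothing for the OS or the rest of omindex, so a noticeably smaller budget (for example 1GB, giving 500) is probably a safer starting point.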
2012 Dec 29
3
omindex killed
I'm finding that omindex is consistently ending prematurely when
indexing certain files. The last output looks like this:
[Entering directory /compounds/Acetic_acid]
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.TXT" as text/plain ...
added.
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.pdf" as
application/pdf ... "pdftotext -enc UTF-8
2012 Nov 21
1
about index speed of xapian
Hi,
I use Xapian to index a text file which is 268MB in size. I treat each line as a document, and each line has two fields, like 13445511 | 111115151. There are 10000000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene: Lucene indexes about 40000 records per second.
code:
try
{
Xapian::WritableDatabase database("testindex", Xapian::DB_CREATE_OR_OPEN);
mybase::Timeval now;
std::string l...
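To put the timing in the same units as the Lucene figure, here is a minimal helper (records_per_second is a hypothetical name) that converts a record count and an elapsed time in milliseconds into records per second.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in records per second from a count and elapsed milliseconds.
double records_per_second(std::uint64_t records, std::uint64_t elapsed_ms) {
    assert(elapsed_ms > 0);  // avoid division by zero
    return static_cast<double>(records) * 1000.0 /
           static_cast<double>(elapsed_ms);
}
```

records_per_second(10'000'000, 1'026'544) comes out at roughly 9,741 records per second, so the reported Lucene rate of about 40,000 records per second is a little over four times faster.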
2004 Oct 08
1
indexing performance
...%, but memory use rose to VSZ 244M, RSS 180M. Considering we have 2GB of RAM,
I wonder whether there is a way to make better use of our machine to get better
indexing performance.
Question:
How can I speed up our indexer? Did I do something wrong with my indexer?
BTW, I set the following environment variables:
XAPIAN_FLUSH_THRESHOLD_LENGTH=5000000
XAPIAN_FLUSH_THRESHOLD=10000
Many many thanks.
Hongyan Ma
2012 Aug 31
1
too slow when create index
I am creating an index for some files. In my program, a document is a line in a
file, and I index every line in the file. Is there any way to
speed this up?
2008 Aug 21
2
How to speed up indexing ?
I'm new to Xapian and need some help; many thanks to anyone who replies.
I did a release build from xapian-core-1.0.7 with VS2008 by using
Charlie Hull's makefiles.
I'm trying to test-index my dataset: some 200,000 docs, each
document being (on average) 50 bytes long and having 6 words.
I tried (a) not using a stemmer, (b) calling commit_transaction() every
50/100/etc. docs, (c) not
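Both XAPIAN_FLUSH_THRESHOLD and committing every N documents trade memory for fewer flushes. The following is a minimal, library-independent sketch of that batching idea; BatchingIndexer and its flush callback are hypothetical stand-ins (the callback is where a real indexer would commit), not Xapian's implementation. For the 200,000-document set above, committing every 50 documents means 4,000 flushes, versus 20 at a threshold of 10000, and each flush presumably carries some fixed cost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batching helper: buffers documents and calls `flush`
// every `threshold` documents. Not Xapian's implementation.
class BatchingIndexer {
public:
    using FlushFn = std::function<void(const std::vector<std::string>&)>;

    BatchingIndexer(std::size_t threshold, FlushFn flush)
        : threshold_(threshold), flush_(std::move(flush)) {
        assert(threshold_ > 0);
    }

    // Buffer one document; flush automatically once the batch is full.
    void add(std::string doc) {
        pending_.push_back(std::move(doc));
        if (pending_.size() >= threshold_) {
            flush_now();
        }
    }

    // Flush whatever is buffered, e.g. at the end of an indexing run.
    void flush_now() {
        if (pending_.empty()) return;
        flush_(pending_);
        pending_.clear();
        ++flush_count_;
    }

    std::size_t flush_count() const { return flush_count_; }

private:
    std::size_t threshold_;
    FlushFn flush_;
    std::vector<std::string> pending_;
    std::size_t flush_count_ = 0;
};
```

Going the other way, very large batches use more memory, which, as discussed in the errors on rebuild thread, can push a machine over an I/O cliff.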
2007 Feb 07
2
My new record: Indexing 20 million docs = 79m9.378s
Gentoo Linux 2.6
8 AMD Opteron 64-bit Processors
32GB Memory
--------------------------------------------------------------------------------
Environment:
------------------
XAPIAN_FLUSH_THRESHOLD=21000000
XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000
XAPIAN_PREFER_FLINT=True
Indexing 20 million documents:
--stemmer=none
-------------------------------------------
real 79m9.378s
user 77m28.696s
sys 1m36.654s
# delve /home/kevin/index
---------------------------------------
number of docu...
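The timings above use the output format of bash's time keyword. To illustrate the throughput this record implies, the hypothetical parse_time_output helper below converts a duration such as 79m9.378s into seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Parse "<minutes>m<seconds>s" (e.g. "79m9.378s") into seconds.
// Returns -1.0 if the string is not in that form.
double parse_time_output(const std::string& s) {
    if (s.empty() || s.back() != 's') return -1.0;
    const std::size_t m = s.find('m');
    if (m == std::string::npos) return -1.0;
    const std::string minutes = s.substr(0, m);
    const std::string seconds = s.substr(m + 1, s.size() - m - 2);
    if (minutes.empty() || seconds.empty()) return -1.0;
    char* end = nullptr;
    const double min = std::strtod(minutes.c_str(), &end);
    if (*end != '\0') return -1.0;  // reject non-numeric minutes
    const double sec = std::strtod(seconds.c_str(), &end);
    if (*end != '\0') return -1.0;  // reject non-numeric seconds
    return min * 60.0 + sec;
}
```

parse_time_output("79m9.378s") returns 4749.378 seconds, so 20 million documents works out to roughly 4,211 documents per second.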
2017 Apr 03
3
errors on rebuild
...is from isn't going to be representative.
But from the information you give, my guess is that the extra memory
used for batching up changes is pushing you over an I/O cliff, and
you would get better throughput by reducing the batch size (assuming
the "batch size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something
equivalent). Especially likely if you tuned that batch size for chert.
There are some longer term plans to rework the batching and flush process
which should improve matters a lot (and hopefully remove the need for
manually tweaking such settings). I'm hoping that will land in t...
2010 Mar 07
2
"Value in posting list too large" error with 1.1.4 (chert and brass, not flint)
Hi,
I've a program which:
1. Sets XAPIAN_FLUSH_THRESHOLD=1000
2. Opens a (new) database for write
3. Indexes a few thousand documents
4. Periodically also does queries on the database
With 1.1.4, with certain document sets (basically a particular mail
folder of mine), Enquire.get_mset() sometimes (but not always) triggers
a "RangeErr...
2007 Jun 17
2
Flint failed to deliver indexing performance comparable to Quartz.
...ory servers.
So far Flint has failed to deliver even a fraction of the
performance that the Quartz database achieves when indexing
large numbers of documents in a short time using plenty of memory.
Example of my benchmarks:
The Quartz database indexes 10 million unique documents with
XAPIAN_FLUSH_THRESHOLD=10000000 in less than 1 hour.
The Flint database indexes 10 million unique documents with
XAPIAN_FLUSH_THRESHOLD=10000000 in less than 16 hours.
Please provide settings to remove Flint and make Quartz the default
database, unless the unacceptable indexing performance of the Flint
database will be r...
2009 Jun 02
3
search without flush.
Hi,
Is it possible to perform a search without flushing the index? I've got
an application that updates the index every 4 hours but I need to be
able to search the new data fairly quickly after the index is updated.
The problem is that each update is often far fewer than
10,000 documents, so it isn't flushed until quite a bit
later. I realise I can do a flush
2017 Dec 29
2
notmuch: Xapian exception during database creation
Running notmuch from git on Debian testing[1] with the mail and database
sitting on a ZFS filesystem, adding mail to a new database:
> agrajag-testing ~/s/notmuch % ./notmuch new
> Found 605510 total files (that's not much mail).
> add_file: A Xapian exception occurred36m 37s remaining).
> A Xapian exception occurred adding message: Unexpected end of posting list for
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr:
http://www.anur.ag/blog/2009/03/xapian-and-solr/
According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller: 2.5GB versus Xapian's 8.9GB.
I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
2007 Oct 16
1
Xapian 1.0.3_svn9466 - OK!
....
2. Installed Xapian 1.0.3_svn9466.
3. libxapian.so.15 used to be in /usr/local/lib64/; however,
this time the library was in /usr/local/lib/.
4. cp /usr/local/lib/libxapian.so.15 /lib
Indexing 52 million web sites took approximately
21 hours on an Intel 8-core CPU with 12 GB of memory.
XAPIAN_FLUSH_THRESHOLD=1000000
number of documents = 52746432
average document length = 89.6394
You can visit and test the Xapian 1.0.3_svn9466 search engine with 52
million indexed web sites at http://pacific-design.com
--
Cheers
Kevin Duraj
http://pacific-design.com
Los Angeles, California
2017 Dec 31
1
notmuch: Xapian exception during database creation
....
>> > position table structure checked OK
>
> This seems to be for an almost empty database (2 items in the postlist
> table and nothing anywhere else) which doesn't really seem consistent
> with the amount of data notmuch reports as having processed. Are you
> setting XAPIAN_FLUSH_THRESHOLD very high?
No, I didn't set any specific value.
> You can look at the low level entries in the postlist table with:
>
> xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass
>
> (You'll need to build xapian-core from source to get xapian-inspect,
> as it's really a...
2017 Apr 03
0
errors on rebuild
...to be representative.
>
> But from the information you give, my guess is that the extra memory
> used for batching up changes is pushing you over an I/O cliff, and
> you would get better throughput by reducing the batch size (assuming
> the "batch size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something
> equivalent). Especially likely if you tuned that batch size for chert.
>
> There are some longer term plans to rework the batching and flush process
> which should improve matters a lot (and hopefully remove the need for
> manually tweaking such settings). I'm h...
2017 Dec 29
0
notmuch: Xapian exception during database creation
...> void B-tree checked okay
> > position table structure checked OK
This seems to be for an almost empty database (2 items in the postlist
table and nothing anywhere else) which doesn't really seem consistent
with the amount of data notmuch reports as having processed. Are you
setting XAPIAN_FLUSH_THRESHOLD very high?
You can look at the low level entries in the postlist table with:
xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass
(You'll need to build xapian-core from source to get xapian-inspect,
as it's really a tool for developers).
I'd guess the two entries are user metadat...
2017 Mar 02
2
errors on rebuild
Hi Olly,
Thanks for the detailed response. I hadn’t realized there was a new xapian haystack backend. I’m going to try that but I have some upgrades to do first. Django 1.8, etc.
Thanks,
Ryan
> On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote:
>
> On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote:
>> I am trying to rebuild an index of 2+