similar to: Indexing speed benchmark - Xapian, Solr


2017 Dec 29
2
notmuch: Xapian exception during database creation
Running notmuch from git on Debian testing[1] with the mail and database sitting on a ZFS filesystem, adding mail to a new database: > agrajag-testing ~/s/notmuch % ./notmuch new > Found 605510 total files (that's not much mail). > add_file: A Xapian exception occurred36m 37s remaining). > A Xapian exception occurred adding message: Unexpected end of posting list for
2012 Nov 21
1
about index speed of xapian
Hi, I use Xapian to index a text file whose size is 268M. I take each line as a document, and each line has two fields, like 13445511 | 111115151. There are 10000000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. The Lucene speed is about 40000 records per second. code: try { Xapian::WritableDatabase
2017 Apr 03
3
errors on rebuild
On Sat, Mar 25, 2017 at 06:36:25PM -0500, Ryan Cross wrote: > After upgrades my stack is now: > > Python 2.7 > Django 1.8 > Haystack 2.6.0 > Xapian 1.4.3. (latest xapian haystack backend with some modifications) > > Using the same rebuild command as below but with --batch-size=50000 > > The issue has now become one of performance. I am indexing 2.2 million >
2010 Jan 14
1
Latest revision and backwards compatibility
Greetings, I've been wondering about the index format and backwards compatibility. We're using the dev version (for chert) and each svn up means that any indexes created prior to this revision cannot be read. Is this purely a cautious move to prevent errors, and, barring any obvious index format changes, can I safely force the current revision to read existing indexes? eg, by
2004 Oct 08
1
indexing performance
I've some trouble with my indexer, which builds on simpleindex.cc. The problem is that indexing process becomes very slow after we indexed 2000k docs (though the indexer works quite well with first 2000k docs). It took almost three weeks to index 8 million docs. However, we need to index about 20 million docs. I have to stop the indexer due to its performance. I think my question is
2017 Mar 02
2
errors on rebuild
Hi Olly, Thanks for the detailed response. I hadn’t realized there was a new xapian haystack backend. I’m going to try that but I have some upgrades to do first. Django 1.8, etc. Thanks, Ryan > On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote: > > On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote: >> I am trying to rebuild an index of 2+
2016 Jul 06
2
Xapian 1.4.0 released
I have installed the new Xapian 1.4.0. During the installation I didn't see any problems; however, when I execute the commands quest and delve I get different versions, and my Perl-based searches return Exception: Couldn't detect type of database ... and what are these glass things in the index directories? There is no new version of Perl Search::Xapian. $ quest -version quest -
2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000. When trying to force Xapian to flush after 20 million documents, Xapian ignores the setting and flushes after only 10,000 documents. Data captured from delve at 60-second intervals with the threshold set as follows: XAPIAN_FLUSH_THRESHOLD=20000000 perl -e ' while(1) { system("delve ."); sleep(60); } '
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2011 Jun 10
2
Just starting to experiment with php
I took one of the examples and tried to run against my database ls -l /data1/mail/db/cur.1 total 1129624 -rw-r--r-- 1 jwl jwl 0 2011-06-09 02:27 flintlock -rw-r--r-- 1 jwl jwl 28 2011-06-09 02:27 iamchert -rwxrwxrwx 1 jwl jwl 7258 2011-06-09 02:27 position.baseA -rwxrwxrwx 1 jwl jwl 7046 2011-06-09 02:27 position.baseB -rwxrwxrwx 1 jwl jwl 474226688 2011-06-09 02:28
2010 Mar 07
2
"Value in posting list too large" error with 1.1.4 (chert and brass, not flint)
Hi, I've a program which: 1. Sets XAPIAN_FLUSH_THRESHOLD=1000 2. Opens a (new) database for write 3. Indexes a few thousand documents 4. Periodically also does queries on the database With 1.1.4, with certain document sets (basically a particular mail folder of mine), Enquire.get_mset() sometimes (but not always) triggers a "RangeError: Value in posting list too
2014 Mar 28
2
Reducing Xapian memory usage
Hey guys, I noticed Xapian using a lot of memory while indexing [1], so I decided to look at the bottlenecks and where this can be improved. Here are some large spots that I noticed (Chert) - 1. Every document has map<string, OmDocumentTerm> and OmDocumentTerm contains the same string again. This results in every term being stored in memory twice. Additionally multiple documents may
2020 Oct 21
2
xapian-check sorted order error
Hi, We were running xapian-check on one of our Xapian indexes and it returns the following error: position: baseB blocksize=8K items=809896869 lastblock=2090419 revision=3161 levels=3 root=2084903 Failed to check B-tree: DatabaseError: Items not in sorted order The other tables verify without issue. It looks like our oldest backup of this database (a month old) has the same issue. Searching and
2019 Jul 04
2
solr vs fts
On Thursday, 04.07.2019, at 12:27 +0300, Aki Tuomi via dovecot wrote: > On 4.7.2019 12.22, Maciej Milaszewski IQ PL via dovecot wrote: > > Hi > > So you're advised to use a solr or something else? > > > > Using any FTS is advisable, currently suitable ones would be SOLR or > Xapian (see https://github.com/grosjo/fts-xapian) > Hi Aki, I didn't yet
2019 Jul 04
2
solr vs fts
>> A few clients have 25K and more e-mail >> >> I thinking about use solr like: >> ?fts = solr >> ?fts_solr = debug url=http://IP:8983/solr/ (solr in external machine) >> >> Does it make sense ? use dovecot_indexes and fts ? >> What is the difference in performance? >> > Hi! > > Dovecot indexes are not actually related to FTS that
2012 Apr 16
1
Rebuilding corrupt databases from .DB files.
We've had some catastrophic filesystem failures that have left us with corrupted databases with empty files and no backup for about 15TB of our data. Recreating the 15TB from source data backups is possible but will take a very, very long time. I'm hoping that, given all of the .DB files are still intact, there may be some way to extract their contents and rebuild the other tables. This
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
I'm playing around with a machine that has 2 GB of memory. I'm indexing about 5GB of data, with an average of 2MB per document. The documents are plain text. I notice that omindex's memory footprint gets bigger and bigger, then the machine starts to swap and it all slows down to a crawl. In regards to export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000. Am I right in saying that for my setup
2020 Aug 27
4
Xapian on Android?
Friends, I would like to hear from anyone who has experience deploying Xapian on Android. I'm new to Xapian, but I know it is used by a couple partners for offline projects on Linux and Windows. Our small nonprofit, WiderNet, provides off-line access to thousands of Web sites for people who lack Internet connectivity (www.widernet.org). Over 2,000 universities, schools, health care sites,
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote: > > Some might notice the 50% index size increase. Excessive index size is > > already one relatively rare, but recurring complaint. Except if I did > > something wrong: I'm actually quite surprised by it. > > Did you try compacting the resulting databases? > >