Displaying 20 results from an estimated 1200 matches similar to: "How to speed up indexing ?"
2007 Feb 07
2
My new record: indexing 20 million docs = 79m9.378s
Gentoo Linux 2.6
8 AMD Opteron 64-bit Processors
32GB Memory
--------------------------------------------------------------------------------
Environment:
------------------
XAPIAN_FLUSH_THRESHOLD=21000000
XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000
XAPIAN_PREFER_FLINT=True
Indexing 20 million documents:
--stemmer=none
-------------------------------------------
real 79m9.378s
user 77m28.696s
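As a rough sanity check on the timings quoted in this post, the wall-clock figure can be turned into a throughput. The following is only a sketch: the helper name `parse_time` is made up for illustration, and the inputs are copied from the post rather than measured again.

```python
import re


def parse_time(stamp: str) -> float:
    """Convert a `time`-style stamp such as '79m9.378s' to seconds.

    Hypothetical helper, written only for this illustration.
    """
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


DOCS = 20_000_000                      # documents indexed, per the post
real_seconds = parse_time("79m9.378s")   # "real" line of the time output
user_seconds = parse_time("77m28.696s")  # "user" line of the time output

docs_per_second = DOCS / real_seconds
cpu_share = user_seconds / real_seconds  # fraction of wall time spent in user CPU

print(f"{docs_per_second:.0f} docs/s, user CPU share {cpu_share:.1%}")
```

The user time being close to the real time suggests the run was mostly CPU-bound in a single process, although the snippet is cut off before the sys figure, so that reading is tentative.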
2004 May 05
1
buffered tables, sessions, and transactions
Quartz has a QuartzDiskTable class which is a thin wrapper for a pair of
Btree objects (or just one if the table is opened readonly):
http://www.xapian.org/docs/sourcedoc/html/classQuartzDiskTable.html
There's also a QuartzBufferedTable class which adds memory buffering of
changes to this:
http://www.xapian.org/docs/sourcedoc/html/classQuartzBufferedTable.html
However, as of 0.8.0 we now
2004 Oct 08
1
indexing performance
I'm having some trouble with my indexer, which builds on simpleindex.cc. The
problem is that the indexing process becomes very slow after we have indexed
2000k docs (though the indexer works quite well for the first 2000k docs). It
took almost three weeks to index 8 million docs. However, we need to index
about 20 million docs. I had to stop the indexer because of its performance.
I think my question is
2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000.
When trying to force Xapian to flush after 20 million
documents, Xapian ignores the setting and flushes after only 10,000
documents.
Data captured from delve at 60-second intervals with the threshold set as follows:
XAPIAN_FLUSH_THRESHOLD=20000000
perl -e ' while(1) { system("delve ."); sleep(60); } '
2005 Sep 09
7
[PATCH 0/6] jbd cleanup
The following 6 patches clean up the jbd code and remove about 200 lines.
The first 4 patches apply to 2.6.13-git8 and 2.6.13-mm2.
The rest of them apply to 2.6.13-mm2.
fs/jbd/checkpoint.c | 179 +++++++++++--------------------------------
fs/jbd/commit.c | 101 ++++++++++--------------
fs/jbd/journal.c | 11 +-
fs/jbd/revoke.c | 158
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
I'm playing around with a machine that has 2 GB of memory.
I'm indexing about 5GB of data, averaging 2MB per document.
The documents are plain text.
I notice that omindex's memory footprint gets bigger and bigger, then the
machine starts to swap and it all slows down to a crawl.
Regarding export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000.
Am I right in saying that for my setup
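The numbers in this post can be put side by side to see why the default threshold may never trigger a flush here. This is a back-of-the-envelope sketch only: it assumes decimal units and roughly uniform document sizes, and it says nothing about how much memory Xapian actually needs per buffered document.

```python
MB = 1
GB = 1000 * MB  # assumption: decimal units; the post does not say which

corpus_size = 5 * GB               # total data to index, per the post
avg_doc_size = 2 * MB              # average document size, per the post
default_flush_threshold = 10_000   # default document count stated in the post

# How many documents the corpus holds at that average size.
doc_count = corpus_size // avg_doc_size

# How many automatic flushes the default threshold would trigger during the run.
flushes_during_run = doc_count // default_flush_threshold

# How much source text one full batch would cover before a flush.
text_per_batch_gb = default_flush_threshold * avg_doc_size / GB

print(doc_count, flushes_during_run, text_per_batch_gb)
```

Under these assumptions the whole corpus is about 2500 documents, well short of 10000, so every change would stay buffered until the end of the run. A much lower threshold would be the natural thing to try on a 2 GB machine.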
2010 Aug 04
6
[PATCH -v2 0/3] jbd2 scalability patches
This version fixes three bugs in the 2nd patch of this series that
caused kernel BUG when the system was under race. We weren't accounting
with t_oustanding_credits correctly, and there were race conditions
caused by my having overlooked the fact that
__jbd2_log_wait_for_space() and jbd2_get_transaction() require
j_state_lock to be write locked.
Theodore Ts'o (3):
jbd2: Use
2012 Dec 29
3
omindex killed
I'm finding that omindex is consistently ending prematurely when
indexing certain files. The last output looks like this:
[Entering directory /compounds/Acetic_acid]
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.TXT" as text/plain ...
added.
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.pdf" as
application/pdf ... "pdftotext -enc UTF-8
2012 Nov 21
1
about index speed of xapian
hi,
I use Xapian to index a text file of 268M. I treat each line as a document, and each line has two fields, like 13445511 | 111115151. There are 10000000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. Lucene's speed is about 40000 records per second.
code:
try
{
Xapian::WritableDatabase
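To make the comparison in this post concrete, the elapsed time can be converted into records per second and set against the quoted Lucene rate. This is a small sketch using only the figures from the post; the variable names are illustrative.

```python
records = 10_000_000    # number of lines/documents, per the post
elapsed_ms = 1_026_544  # total Xapian indexing time, per the post
lucene_rate = 40_000    # records per second, as quoted in the post

# Convert milliseconds to seconds, then derive the Xapian throughput.
xapian_rate = records / (elapsed_ms / 1000)

# How many times faster the quoted Lucene rate is.
slowdown = lucene_rate / xapian_rate

print(f"Xapian: {xapian_rate:.0f} records/s, Lucene is {slowdown:.1f}x faster")
```

By these figures the gap is roughly a factor of four, not an order of magnitude, which is worth keeping in mind when judging how much tuning is needed.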
2007 Jun 12
1
Empty results OMEGA with XAPIAN 1.0.1
Hi,
I configured XAPIAN 1.0.1 and OMEGA 1.0.1 on my development machine
(first removed the old ones). I recreated my databases (both quartz
and flint) and tried to run original queries against the databases
created by the new versions.
I'm getting empty result sets from OMEGA. If I use the delve tool I
actually see that the records are created fine. No log files are
written as far as I
2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required.
Backstory:
I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera.
One of the unfinished parts was removing expunged emails from the search database.
We moved from having a single search database to supporting multiple
2015 Feb 10
3
Bitsize project - Krovetz stemmer
Hello Xapian devs,
2005 Oct 12
2
Stemmer Modifications
I'm using Xapian as a search back-end on a website. My client has
certain search terms that the stemmer does not stem in a way they would
like. For example "continuity" stems to "continu", which produces
undesirable results in their application. Is there a way to override the
stemming of certain words in a way that is compatible with the indexing
stemmer and the query
2008 Aug 16
1
python how do i stem words in python?
Hi,
I am a newbie to Xapian and am trying to get started with it in Python.
There is no stemmer.stem_word method in the latest Python library. How
do I stem words before doing doc.add_posting?
Is there any sample hello world code in Python that I can use?
Thanks a lot!
>>> stemmer = xapian.Stem('english')
>>> stemmer.
stemmer.__call__
2014 Nov 29
4
Adding Support for Krovetz Stemmer Algo in Xapian
Hello,
As mentioned on the project ideas page, under adding support for more
stemming algorithms,
I found an implementation of the Krovetz stemmer algorithm in C++, but before
working on merging it into Xapian, I need help identifying the
license information associated with the source code.
To avoid licensing issues later, could someone kindly check the link
2007 Jan 09
2
non-snowball stemmer
Hi!
I am going to use a non-Snowball Russian stemmer with Xapian. There is a
good one at http://www.aot.ru. I've found that the current implementation of
Xapian::Stem does not allow it (there is no public interface for
Xapian::Stem::Internal). Do you accept patches? Are there any
recommendations for writing patches?
Regards,
Oleg Obolenskiy
highpower at mail.ru
2002 May 31
2
PATCH for filesys corruption in ext3 with data=journal
Hi,
as I mentioned in earlier mail to ext3-users I have been getting some
corruption on an ext3 filesystem that has been serving NFS. I am now
confident that I fully understand the problem and have a patch.
It only affects data=journal mode and I wonder if it might also be the
cause of the corruption noted by a number of people on linux-kernel.
First I will explain the problem. Then display
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr:
http://www.anur.ag/blog/2009/03/xapian-and-solr/
According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB.
I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
2007 Jun 17
2
Flint failed to deliver indexing performance to Quartz.
Flint failed to deliver indexing performance to Quartz.
I am proposing to remove Flint as the default database and make Quartz
the default again. The catch is not that the Flint database is
smaller and faster during searches than the Quartz database, which is what
the developers were concerned with when measuring, but that they neglected
to measure performance when creating large indexes.
The truth is that Flint
2012 Aug 31
1
too slow when create index
I am creating an index for some files. In my program, a document is a line
in a file, and I index every line in the file. Is there any way to
speed this up?