Displaying 20 results from an estimated 20000 matches similar to: "Large database problem"
2018 Jul 02
2
Is there a large variance in xapian searching?
Dear XAPIAN developers,
I was using Xapian to index more than 13 million Q&A documents (similar to
Quora). I will share some performance data about indexing and searching,
and I would like some help improving search performance.
My computer has 8 i7 cores at 3.4 GHz and 16 GB of memory, running
Ubuntu 16.04. The dataset includes about 13M documents; each document is
split into 35 terms (Chinese
2008 Dec 03
1
Compiling latest svn revision
Greetings,
Before I head off to bed I thought I'd fire off this email wrt
compiling the latest svn revision.
I finally resolved all the dependencies, ran bootstrap/configure, but
make eventually fails with:
/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
Xapian.o: In function `boot_Search__Xapian':
2012 Apr 19
1
Xapian::Database->close() for perl missing
I have a xapian-daemon which can be queried via HTTP. A background process
generates a new index every hour, then removes the old symlink and creates a
new one pointing to the current database.
/path/to/index/20120419010000
/path/to/index/20120419020000
/path/to/index/20120419030000
/path/to/index/default => /path/to/index/20120419030000
So the daemon only checks the mtime of /path/to/index/default/iamchert
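A minimal sketch of the swap-and-detect pattern described in this post, written in Python rather than the poster's Perl. It assumes a POSIX filesystem, where os.replace() renames atomically, so the `default` link never disappears mid-swap. The helper names `swap_default` and `iamchert_mtime` and the `default.tmp` link name are hypothetical. A real daemon would reopen its Xapian database when the mtime changes; here that step is only marked by a comment.

```python
import os
import tempfile

def swap_default(index_root, new_db_dir):
    """Point <index_root>/default at new_db_dir with a single atomic rename."""
    link = os.path.join(index_root, "default")
    tmp_link = link + ".tmp"  # hypothetical temporary link name
    # Clear a stale temporary link left behind by an interrupted swap.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_db_dir, tmp_link)
    # rename(2) replaces the old symlink itself (it does not follow it).
    os.replace(tmp_link, link)

def iamchert_mtime(index_root):
    """mtime of the iamchert file reached through the 'default' symlink."""
    return os.path.getmtime(os.path.join(index_root, "default", "iamchert"))

# Demo: two hourly generations whose iamchert files have different mtimes.
root = tempfile.mkdtemp()
for name, stamp in (("20120419020000", 1_000_000), ("20120419030000", 2_000_000)):
    os.makedirs(os.path.join(root, name))
    chert = os.path.join(root, name, "iamchert")
    open(chert, "w").close()
    os.utime(chert, (stamp, stamp))

swap_default(root, os.path.join(root, "20120419020000"))
before = iamchert_mtime(root)
swap_default(root, os.path.join(root, "20120419030000"))
after = iamchert_mtime(root)
changed = after != before  # the daemon would reopen its database here
```

Creating the new link under a temporary name and renaming it over the old one avoids the remove-then-create gap. Without it, a request arriving between the two steps would find no `default` link at all.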
2011 Jun 10
2
Just starting to experiment with php
I took one of the examples and tried to run it against my database:
ls -l /data1/mail/db/cur.1
total 1129624
-rw-r--r-- 1 jwl jwl 0 2011-06-09 02:27 flintlock
-rw-r--r-- 1 jwl jwl 28 2011-06-09 02:27 iamchert
-rwxrwxrwx 1 jwl jwl 7258 2011-06-09 02:27 position.baseA
-rwxrwxrwx 1 jwl jwl 7046 2011-06-09 02:27 position.baseB
-rwxrwxrwx 1 jwl jwl 474226688 2011-06-09 02:28
2008 Dec 02
1
NFSv4 and locking
Greetings,
We use NFSv4 on our cluster and perform distributed indexing (well, we
used to on our previous system which used a simple touch() locking
mechanism).
I'm having a spot of bother getting Xapian to obtain a lock (hangs on
fcntl64()).
I've read http://trac.xapian.org/wiki/XapianOverNFS and other list
posts, and noted that a lock daemon should be running to allow locks
2008 Nov 21
1
Multiple databases vs Single large database
Hi
I've decided to use Xapian because the files table in my MySQL database is going
to grow very large, and it seems MySQL isn't good at full-text searching. I'm
doing this with the PHP wrapper, by the way.
The way my system is set up, each user has their own set of files, and each
search will be over a specific user's files (based on file
name, title,
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings,
Using latest svn.
I've noticed the following error when performing index merging:
postlist:
baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459
B-tree checked okay
Tag containing meta information is corrupt.
postlist table errors found: 1
I can still search on this index (I've only checked very small indexes),
but merging is now a problem since I check
2010 Jan 20
2
Error when creating trac bug ticket
Greets
Just tried to create a bug ticket on trac.xapian.org and it croaked with
the error:
-----------
Trac detected an internal error:
IntegrityError: columns ticket, name are not unique
The action that triggered the error was:
POST: /newticket
-----------
Clicking on the Create button to report the error results in an invalid URL.
What's the best way to proceed to report my bug?
Thanks
2011 Jun 20
1
Revision: 15699: $tg->index_text ($text, $weight) fails with "No matching function for overloaded 'TermGenerator_index_text'"
Hi,
I've been out of touch recently, so perhaps I've missed something (the last
time I checked the svn pulse the Perl code was under search-xapian/ - looks
like things have moved to swig).
The latest trunk (revision 15699) has a problem with Perl:
$tg->index_text ($text, $weight);
It fails with "No matching function for overloaded 'TermGenerator_index_text'..."
I
2011 Jul 19
1
xapian-compact ok, xapian-check failure
Greets,
I've encountered the following while performing test merges (and writing code
to handle errors, etc., so things can be automated) and am wondering about the best
way to proceed:
xapian-compact -b64k -m src1 src2.... tmp_dst -- works as expected, exit code 0.
xapian-check tmp_dst -- produces the following error for the postlist:
postlist:
baseB blocksize=64K items=28175410
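A sketch of automating the two steps above with exit-status checks, in Python. It assumes `xapian-check` exits non-zero when it reports errors. `compact_and_check` and `run_command` are hypothetical helpers, and the `run` parameter exists only so that a stub can stand in for the real subprocess call, as in the demo.

```python
import subprocess
from typing import Callable, List, Sequence

def run_command(argv: List[str]) -> int:
    """Run argv and return its exit status."""
    return subprocess.run(argv).returncode

def compact_and_check(sources: Sequence[str], dest: str,
                      run: Callable[[List[str]], int] = run_command) -> bool:
    """Merge sources into dest, then verify dest; True only if both steps exit 0."""
    # Same options as the manual run above: 64K blocks, multipass merge.
    if run(["xapian-compact", "-b64k", "-m", *sources, dest]) != 0:
        return False
    # Exit code 0 from compaction does not prove the output is sound,
    # so check the result separately.
    return run(["xapian-check", dest]) == 0

# Demo with a stub runner that reproduces the situation above:
# compaction succeeds, the check fails.
calls = []
def fake_run(argv):
    calls.append(argv)
    return 1 if argv[0] == "xapian-check" else 0

ok = compact_and_check(["src1", "src2"], "tmp_dst", run=fake_run)
```

An automated job can then keep `tmp_dst` only when `ok` is true and leave the sources untouched otherwise.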
2011 Jun 21
3
Error after upgrading to latest xapian distro
I upgraded to the latest Xapian version and have started getting:
xapian.InvalidArgumentError: Term too long (> 245): XTEXT...
This issue was not present in 1.0.16, but it is in the latest version.
Any solutions?
Thanks
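A minimal sketch of one possible workaround, in Python. It assumes that the 245 limit counts the UTF-8 bytes of the whole term, prefix included, and that XTEXT (as in the error above) is the term prefix. Any over-long term body is hashed down to a fixed-length token before it is passed to add_term(). `safe_term`, `MAX_TERM_BYTES` and the choice of SHA-1 are hypothetical. The same function must also be applied at query time, and hashed terms still match exact lookups of the same long string but not prefix or wildcard queries.

```python
import hashlib

MAX_TERM_BYTES = 245  # limit quoted in the error message above

def safe_term(term: str, prefix: str = "") -> str:
    """Return prefix+term, hashing the body if the result would exceed the limit."""
    full = prefix + term
    if len(full.encode("utf-8")) <= MAX_TERM_BYTES:
        return full
    # Replace the over-long body with its 40-character hex digest, so the
    # same input always maps to the same short term.
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return prefix + digest

short = safe_term("hello", "XTEXT")
long_term = safe_term("x" * 300, "XTEXT")
# 130 characters, but 260 bytes in UTF-8, so this one is hashed too.
accented = safe_term("é" * 130)
```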
2008 Dec 06
1
Obtaining actual match count if using set_collapse_key()
Greets,
Is it possible to obtain the actual match count if you're using
set_collapse_key()? I.e., the total count *before* the collapsing
occurs (without using get_mset()).
Alternatively, will MSet::get_matches_estimated() return the true
(pre-collapse) count, or will it also be affected by collapsing?
Thanks
Henry
2008 Nov 26
1
Trying to patch xapian perl add/remove_spelling
Greets,
I'm having a stab at patching the CPAN module to add the missing
WritableDatabase::add_spelling and remove_spelling, but need a bit of
guidance since I'm coming in cold and pressed for time (aren't we all?).
I've modified XS/WritableDatabase.xs and added the two necessary
functions, and also added the two basic tests in t/index.t.
Compilation completes cleanly, but
2010 Feb 02
1
Optimal usage of xapian-compact for merging
Greets,
I've been wondering, what's the sane/optimal use of xapian-compact when
merging many indexes with a view to maximum merging performance?
The obvious:
- only use -F on the final db.
- use -m since I'm merging more than 3 dbs.
Best strategy?
a) loop: merge batches (of, say, 50, where the individual dbs are small)
into a temp index, then merge the (larger) temp into the
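A sketch, in Python, of how the two rules listed above could be applied mechanically when planning batched merges. It shows one possible two-level scheme: fixed-size batches go into temporary indexes, and those are merged once at the end. It is not necessarily the poster's strategy (a), whose description is cut off above. `plan_merges`, `compact_cmd` and the `tmp_N` / `final_db` names are hypothetical. The code only builds command lines and does not run them.

```python
from typing import List, Sequence

def compact_cmd(inputs: Sequence[str], dest: str, fuller: bool = False) -> List[str]:
    """One xapian-compact command line following the rules listed above."""
    cmd = ["xapian-compact"]
    if len(inputs) > 3:
        cmd.append("-m")  # multipass only matters when merging more than 3 dbs
    if fuller:
        cmd.append("-F")  # -F only on the final database
    return cmd + list(inputs) + [dest]

def plan_merges(sources: Sequence[str], batch_size: int = 50,
                final: str = "final_db") -> List[List[str]]:
    """Merge fixed-size batches into temporaries, then merge those into final."""
    sources = list(sources)
    commands, temps = [], []
    for start in range(0, len(sources), batch_size):
        temp = f"tmp_{start // batch_size}"  # hypothetical temp naming
        temps.append(temp)
        commands.append(compact_cmd(sources[start:start + batch_size], temp))
    commands.append(compact_cmd(temps, final, fuller=True))
    return commands

plan = plan_merges([f"db{i}" for i in range(120)])
```

With 120 small databases this yields three batch merges (50, 50 and 20 inputs) and one final `-F` merge of the three temporaries. The final merge has only three inputs, so it gets no `-m`.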
2010 Jun 11
1
Interesting xapian-compact observations
Greets,
I've had xapian-compact (without -F) sessions running for several days now
on 10 'merge' machines, and I've noticed that the compaction ratio
can swing wildly:
18% 76% 10% 19% 39% 13% 69% 43% 19% 42%
The average so far is about 35% (i.e., a 65% reduction in target index sizes,
which is unexpected and very welcome).
I'm curious about the large variance in
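This sketch reads the figures above as compacted size relative to input size, which is how the 65% reduction follows from 35%. On that reading, a few lines of Python confirm the average and show how wide the spread is. The variable names are illustrative.

```python
from statistics import mean, pstdev

# Per-machine figures quoted above (compacted size as % of input size).
ratios = [18, 76, 10, 19, 39, 13, 69, 43, 19, 42]

avg = mean(ratios)                  # 34.8, the "about 35%" above
reduction = 100 - avg               # about 65.2, the "65% reduction"
spread = max(ratios) - min(ratios)  # 66 percentage points between machines
sd = pstdev(ratios)                 # population standard deviation
```

A standard deviation of roughly 22 points on a mean of about 35 is what makes the per-machine numbers look so erratic.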
2011 Jul 13
1
Feature request: Determining source index of xapian-compact DatabaseError exception
Greets,
When merging lots of subindexes in batches like so:
xapian-compact -m idx1 idx2... dstidx
Errors such as:
xapian-compact: DatabaseError: Error reading block 0: got end of file
present a problem since they do not provide the offending path name (of the
broken index) for easy identification/removal in automated/batch scenarios
(the way DatabaseOpeningError:.... does, e.g.). The only way
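Until the exception names the failing path, one workaround is to validate each source on its own before the batch merge, so that the broken index is known up front. The following Python sketch assumes `xapian-check` exits non-zero on a damaged database. `split_sources` and `check_status` are hypothetical helpers, and the demo uses a stub in place of the real command. Checking every source adds a full read of each index, which may be costly for large ones.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

def check_status(argv: List[str]) -> int:
    """Exit status of argv, with its output discarded."""
    return subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode

def split_sources(sources: Sequence[str],
                  run: Callable[[List[str]], int] = check_status
                  ) -> Tuple[List[str], List[str]]:
    """Partition sources into (good, broken) by xapian-check's exit status."""
    good, broken = [], []
    for path in sources:
        (good if run(["xapian-check", path]) == 0 else broken).append(path)
    return good, broken

# Demo: a stub runner that pretends idx2 is the damaged index.
good, broken = split_sources(["idx1", "idx2", "idx3"],
                             run=lambda argv: 1 if argv[1] == "idx2" else 0)
merge_cmd = ["xapian-compact", "-m", *good, "dstidx"]
```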
2011 Sep 30
1
Slow phrase performance
I've been getting excellent performance out of Xapian, but when
searches on phrases of common terms such as [ "north america" ] or [
"art history" ] are run, they take a very long time to return
results.
Examples:
------------------------------
[ south africa ] -- 10379 results found in ~.2 sec
[ white house ] -- 17988 results found in <1 sec
Quoting either of
2010 Apr 16
2
best practices - combining sql database and xapian, size of database?
Newbie-alert: I'm just getting started on a new project involving a
full text search requirement, and my initial investigation points to
xapian being the way to go.
Two questions:
- eventually I'll most likely be indexing towards 50 million
documents - is this reasonable to expect or attempt with xapian?
- each of my documents comes with a set of attributes. These are easily
stored
2008 Nov 28
1
Lucene & Solr
Hi all,
I've been asked to prepare a comparison of Lucene/Solr and Xapian and
I'm trying to find some differences between the two. I'm not that
familiar with Lucene myself but I expect there are lots of people who
will have looked at both before ending up on this mailing list.
Can anyone help? I'm looking for both differences between the two
systems and perhaps the reasons
2011 Aug 09
3
what is the fastest way to fetch results which are sorted by timestamp ?
I want to use Xapian as my search engine. For each doc I call add_boolean_term(something) and add_value(0,sortable_serialise(get_timestamp())).
I search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0,True) to ensure that the results are sorted by timestamp.
This method is OK, but
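Xapian compares value slots as byte strings when sorting. That is why the timestamp goes through sortable_serialise() first: the encoding has to make byte order agree with numeric order. The plain-Python sketch below demonstrates that property with a stand-in encoding (fixed-width, big-endian unsigned integers). It is not Xapian's actual sortable_serialise() format, and `encode_ts` is a hypothetical name.

```python
def encode_ts(ts: int) -> bytes:
    """Encode a non-negative integer timestamp so byte order equals numeric order."""
    if ts < 0:
        raise ValueError("this stand-in only handles non-negative timestamps")
    return ts.to_bytes(8, "big")  # fixed width: no length-dependent ordering

timestamps = [1312848000, 999, 1312761600, 86400]

# Plain decimal strings compare character by character, breaking numeric order.
naive = sorted(str(t) for t in timestamps)
# Fixed-width big-endian bytes preserve it, ascending or (reverse=True) newest first.
oldest_first = sorted(timestamps, key=encode_ts)
newest_first = sorted(timestamps, key=encode_ts, reverse=True)
```

Here `naive` puts "86400" after "1312848000", while both byte-encoded orderings match numeric order. The reverse flag plays the same role as the `True` passed to set_sort_by_value() above.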