Displaying 20 results from an estimated 600 matches similar to: "Xapian Index: 607GB = 219 million of unique documents"
2011 Apr 01
0
Xapian-discuss Digest, Vol 83, Issue 1
I think this is a shining example of how well Xapian works with large
document collections. I was just discussing this with my colleagues here
and one of the issues that came up is that we'd love Xapian to become
a lot more popular but have found that the documentation's a bit
difficult to get into, as is the API.
So I was wondering: do you have any thoughts on improving this and
2011 Apr 02
1
Xapian docs (was Re: Xapian-discuss Digest, Vol 83, Issue 2)
> I think this is a shining example of how well Xapian works with large
> document collections. I was just discussing this with my colleagues here
> and one of the issues that came up is that we'd love Xapian to become
> a lot more popular but have found that the documentation's a bit
> difficult to get into, as is the API.
I agree. There are a few gotchas, as well
2011 May 13
0
Xapian Index 253 million documents = 704G
I just built my largest single Xapian index, with 253 million unique
documents, on a single server using a single hard disk, less than 8G RAM
and a single 2.0 GHz processor. I do not see any decrease in search
performance between indexes of 100 million and 250 million documents,
which indicates that Xapian scales well, and it looks like I can
push it easily
2010 Dec 18
1
Xapian index size 475GB = 170 million documents (URLs)
Xapians,
I am maintaining two indexes for my search engines, each of
approximately the same size. I would like to share this
knowledge with you, since many of you have never seen a Xapian index of
this size. And of course you can search the index yourself at
- http://myhealthcare.com/
- http://find1friend.com/
I need 2 x 100 million more documents into each index, and I hope it
will
2011 Jun 10
2
Just starting to experiment with php
I took one of the examples and tried to run against my database
ls -l /data1/mail/db/cur.1
total 1129624
-rw-r--r-- 1 jwl jwl 0 2011-06-09 02:27 flintlock
-rw-r--r-- 1 jwl jwl 28 2011-06-09 02:27 iamchert
-rwxrwxrwx 1 jwl jwl 7258 2011-06-09 02:27 position.baseA
-rwxrwxrwx 1 jwl jwl 7046 2011-06-09 02:27 position.baseB
-rwxrwxrwx 1 jwl jwl 474226688 2011-06-09 02:28
2012 Apr 16
1
Rebuilding corrupt databases from .DB files.
We've had some catastrophic filesystem failures that have left us with corrupted databases with empty files and no backup for about 15TB of our data. Recreating the 15TB from source data backups is possible but will take a very very long time.
I'm hoping that, given all of the .DB files are still intact, there may be some way to extract their contents and rebuild the other tables.
This
2012 Nov 21
1
about index speed of xapian
Hi,
I use Xapian to index a text file of 268M. I treat each line as a document, and each line has two fields, like 13445511 | 111115151. There are 10000000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. Lucene's speed is about 40000 records per second.
code:
try
{
Xapian::WritableDatabase
2006 Aug 06
1
How to use omega to search remote back end?
Folks,
Having trouble getting this to work. OMEGA cgi is not reading my stub file properly because it is trying to read it as a directory instead of a file. Is there an easy fix? Here is a transcript.
Thanks,
OSC
oscar@epsilon:/svr/xapian/beta$ ls -aFl
total 21335200
drwxr-xr-x 2 oscar oscar 4096 Aug 6 10:15 ./
drwxr-xr-x 5 oscar oscar 4096 Aug 6 12:59 ../
lrwxrwxrwx 1 oscar
2015 Apr 27
2
empty FD after reopen since version 1.2.16
Hi all,
After upgrading Xapian I encountered the same problem as described in
ticket
#645 Read block errors after reopen()
In our setup it's 100% reproducible after each reopen(). I downgraded
again and it seems the problem occurs in version 1.2.16 and above.
In <=1.2.15 everything works fine without seeing this error once.
The attached strace shows read ends on FD.
The strace starts at reopen()
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote:
> > The question which remains for me is if I should run xapian-compact
> > after an initial indexing operation. I guess that this depends on the
> > amount of expected updates and that there is no easy answer?
>
> I think it's not obvious whether it's a good plan
2009 Nov 26
1
Protecting .baseA and .baseB files
Most Xapian database files are locked while the database is open, but it seems
that .baseA and .baseB files are not, so any other application can delete them
(I am talking about the Windows package).
Is there a way to protect them like the rest of the Xapian database files?
Regards,
PK
2010 Aug 16
1
No position.{DB,baseA,baseB}
I've just noticed that new indexes no longer have
position.{DB,baseA,baseB} files; all previous indexes (I roll indexes
every week using xapian-compact) have the position files. The index
seems to work but it is returning some odd results; for example, if I run
a query with the phrase "machine learning" it mostly returns documents
containing "machine learning" but it also
2016 Jan 08
2
Strange index consistency issue
Hi,
A Recoll user is reporting an index corruption problem. In general, index
corruption happens from time to time with Recoll, because of crashes,
reboots, misc Recoll bugs, etc.
The strange thing here is that xapian-check does not seem to detect anything.
In a nutshell, some document numbers seem to point to a data blackhole: the
docids are returned when searching for the file/doc unique
2011 Jul 19
1
xapian-compact ok, xapian-check failure
Greets,
I've encountered the following while performing test merges (and writing code
to handle errors, etc. so things can be automated) and am wondering about the best
way to proceed:
xapian-compact -b64k -m src1 src2.... tmp_dst -- works as expected, exit code 0.
xapian-check tmp_dst -- produces the following error for the postlist:
postlist:
baseB blocksize=64K items=28175410
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes:
> On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote:
> > I have a user reporting the following error during recoll indexing:
> >
> > flush() failed: Db block overwritten - are there multiple writers?
> >
> > "flush() failed" is from recoll, the rest is, I think, the text of the Xapian
> > exception.
2017 Feb 27
2
errors on rebuild
Hello,
I am trying to rebuild an index of 2+ million documents and have not been successful. I am running
Python 2.7
Django 1.7
Haystack 2.1.1
Xapian 1.2.21
The index rebuild command I’m using is: django-admin.py rebuild_index --noinput --batch-size=100000
The rebuild completes but an immediate xapian-check returns this error:
xapian-check ./archive_index
record:
baseB blocksize=8K
2014 Feb 13
2
Re: A beginner in "Posting list encoding improvements"
I think what I did is the same as what you did, except that I used make rather than make -sj8, and I ran it as root.
I did as you wrote again:
root@hurricanetong-VirtualBox:/home/hurricanetong/xapian-1.2.17/xapian-core-1.2.17# ./configure
[...]
root@hurricanetong-VirtualBox:/home/hurricanetong/xapian-1.2.17/xapian-core-1.2.17# make -sj8
Making all in .
Making all in docs
Making all in tests
root@
2011 Jan 11
1
chert-update creates a db with some errors
I have some problems converting a Xapian db, created with core 1.1.3 (using
chert), to the new chert format.
I'm using xapian-chert-update, compiled from core-1.2.4.
The conversion seems to run without errors:
#./xapian-core-1.2.4/bin/xapian-chert-update old new
postlist: Reduced by 33.3333% 16K (48K -> 32K)
record: Size unchanged (8K)
termlist: doesn't exist
position: Size
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote:
> On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote:
> > The advantage of compact - it runs approximately 8 times as fast (we
> > are CPU limited in each case - writing to tmpfs first, then rsyncing
> > to the destination) and it takes approximately 75% of the space of a
> > fresh database with maximum
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings,
Using latest svn.
I've noticed the following error when performing index merging:
postlist:
baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459
B-tree checked okay
Tag containing meta information is corrupt.
postlist table errors found: 1
I can still search on this index (I've only checked very small indexes),
but merging is now a problem since I check