similar to: Xapian Index 253 million documents = 704G

2011 Mar 31
0
Xapian Index: 607GB = 219 million of unique documents
It took approximately five days, using a single process on one CPU core and 6GB of memory, to build this giant 607GB single Xapian index containing 219 million unique documents (web sites). So far I have not found any other implementation that would let me build a single index containing over 200 million documents, having tested Lucene, Solr, MySQL, Hadoop and Oracle. Probably
2010 Dec 18
1
Xapian index size 475GB = 170 million documents (URLs)
Xapians, I maintain two indexes for my search engines, each of approximately the same size. I would like to share this knowledge with you, since many of you have never seen a Xapian index of this size. And of course you can search the indexes yourself at - http://myhealthcare.com/ - http://find1friend.com/ I need 2 x 100 million more documents in each index, and I hope it will
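The sizes quoted in these threads can be turned into a rough per-document figure. The sketch below is only back-of-the-envelope arithmetic: it assumes 1 GB means 10^9 bytes (the posts do not say), and the helper name `bytes_per_document` is made up for illustration.

```python
# Rough index footprint per document, from the figures quoted in the threads.
# Assumption: "GB"/"G" means 10**9 bytes; the original posts do not specify.

GB = 10**9


def bytes_per_document(index_size_gb: float, documents: int) -> float:
    """Return the average on-disk index bytes per indexed document."""
    return index_size_gb * GB / documents


# (index size in GB, number of documents) as stated in the thread subjects
reported = {
    "475GB / 170 million": (475, 170_000_000),
    "607GB / 219 million": (607, 219_000_000),
    "704G / 253 million": (704, 253_000_000),
}

for label, (size_gb, docs) in reported.items():
    print(f"{label}: {bytes_per_document(size_gb, docs):.0f} bytes per document")
```

Under that assumption all three data points come out at a little under 2.8 KB of index per document, which suggests the index grew roughly linearly with the number of documents.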
2011 Apr 01
0
Xapian-discuss Digest, Vol 83, Issue 1
I think this is a shining example of how well Xapian works with large document collections. I was just discussing this with my colleagues here, and one of the issues that came up is that we'd love Xapian to become a lot more popular but have found that the documentation is a bit difficult to get into, as is the API. So I was wondering: do you have any thoughts on improving this and
2011 Apr 02
1
Xapian docs (was Re: Xapian-discuss Digest, Vol 83, Issue 2)
> I think this is a shining example of how well Xapian works with large > document collections. I was just discussing this with my colleagues here > and one of the issues that came up is that we'd love Xapian to become > really lot more popular but have found that the documentation's a bit > difficult to get into, as is the API. I agree. There are a few gotchas, as well
2012 Apr 16
1
Rebuilding corrupt databases from .DB files.
We've had some catastrophic filesystem failures that have left us with corrupted databases with empty files and no backup for about 15TB of our data. Recreating the 15TB from source data backups is possible but will take a very long time. I'm hoping that, given all of the .DB files are still intact, there may be some way to extract their contents and rebuild the other tables. This
2011 Jun 10
2
Just starting to experiment with php
I took one of the examples and tried to run against my database ls -l /data1/mail/db/cur.1 total 1129624 -rw-r--r-- 1 jwl jwl 0 2011-06-09 02:27 flintlock -rw-r--r-- 1 jwl jwl 28 2011-06-09 02:27 iamchert -rwxrwxrwx 1 jwl jwl 7258 2011-06-09 02:27 position.baseA -rwxrwxrwx 1 jwl jwl 7046 2011-06-09 02:27 position.baseB -rwxrwxrwx 1 jwl jwl 474226688 2011-06-09 02:28
2012 Nov 21
1
about index speed of xapian
Hi, I use Xapian to index a text file whose size is 268M. I treat each line as a document, and each line has two fields, like 13445511 | 111115151. The file contains 10000000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. The Lucene speed is about 40000 records per second. code: try { Xapian::WritableDatabase
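To put the two numbers in that post on the same scale, the elapsed time can be converted to records per second. This is a minimal sketch using only the figures quoted above; the function name `records_per_second` is hypothetical, and the Lucene rate is the poster's own estimate, not a measurement reproduced here.

```python
# Convert the reported Xapian indexing run into a throughput figure and
# compare it with the Lucene rate quoted in the same post.

def records_per_second(records: int, elapsed_ms: int) -> float:
    """Return the average indexing throughput in records per second."""
    return records / (elapsed_ms / 1000.0)


XAPIAN_RECORDS = 10_000_000     # records in the 268M text file (from the post)
XAPIAN_ELAPSED_MS = 1_026_544   # total indexing time in milliseconds (from the post)
LUCENE_RATE = 40_000            # records per second, as claimed by the poster

xapian_rate = records_per_second(XAPIAN_RECORDS, XAPIAN_ELAPSED_MS)
slowdown = LUCENE_RATE / xapian_rate

print(f"Xapian: {xapian_rate:.0f} records/s")
print(f"Lucene is about {slowdown:.1f}x faster by these figures")
```

By these figures the Xapian run managed a little under 10000 records per second, so the reported gap is roughly a factor of four.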
2006 Aug 06
1
How to use omega to search remote back end?
Folks, Having trouble getting this to work. OMEGA cgi is not reading my stub file properly because it is trying to read it as a directory instead of a file. Is there an easy fix? Here is a transcript. Thanks, OSC oscar@epsilon:/svr/xapian/beta$ ls -aFl total 21335200 drwxr-xr-x 2 oscar oscar 4096 Aug 6 10:15 ./ drwxr-xr-x 5 oscar oscar 4096 Aug 6 12:59 ../ lrwxrwxrwx 1 oscar
2015 Apr 27
2
empty FD after reopen since version 1.2.16
Hi all, after upgrading Xapian I encountered the same problem as described in ticket #645, "Read block errors after reopen()". In our setup it's 100% reproducible after each reopen(). I downgraded again, and it seems the problem occurs in version 1.2.16 and above; in <=1.2.15 everything works fine without this error appearing once. The attached strace shows read ends on FD. strace starts at reopen()
2009 Nov 26
1
Protecting .baseA and .baseB files
Most Xapian database files are locked while the database is open, but it seems that the .baseA and .baseB files are not, so any other application can delete them (I am talking about the Windows package). Is there a way to protect them like the rest of the Xapian database files? Regards, PK
2010 Aug 16
1
No position.{DB,baseA,baseB}
I've just noticed that new indexes no longer have position.{DB,baseA,baseB} files, all previous indexes (I roll indexes every week using xapian-compact) have the position files. The index seems to work but it is returning some odd results, for example if I run a query with the phrase "machine learning" it mostly returns documents containing "machine learning" but it also
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2016 Jan 08
2
Strange index consistency issue
Hi, A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc. The strange thing here is that xapian-check does not seem to detect anything. In a nutshell, some document numbers seem to point to a data blackhole: the docids are returned when searching for the file/doc unique
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2017 Feb 27
2
errors on rebuild
Hello, I am trying to rebuild an index of 2+ million documents and have not been successful. I am running Python 2.7 Django 1.7 Haystack 2.1.1 Xapian 1.2.21 The index rebuild command I’m using is: django-admin.py rebuild_index --noinput --batch-size=100000 The rebuild completes but an immediate xapian-check returns this error: xapian-check ./archive_index record: baseB blocksize=8K
2007 Nov 08
0
Xapian Search Websites Listings
I came across the Search Websites listing for Xapian search engines: http://xapian.org/users.php Can you please add the MyHealthcare.com search engine to the section: Search Websites MyHealthcare.com using Xapian to crawl and search 50 million web sites on a single 1U server. MyHealthcare.com Url: http://myhealthcare.com General web search engine with 50 million
2014 Feb 13
2
Re: A beginner in "Posting list encoding improvements"
I think what I did is the same as what you did, except that I used make rather than make -sj8, and I did it as root. I have now done it again as you wrote: root@hurricanetong-VirtualBox:/home/hurricanetong/xapian-1.2.17/xapian-core-1.2.17# ./configure [...] root@hurricanetong-VirtualBox:/home/hurricanetong/xapian-1.2.17/xapian-core-1.2.17# make -sj8 Making all in . Making all in docs Making all in tests root at
2011 Jul 19
1
xapian-compact ok, xapian-check failure
Greets, I've encountered the following while performing test merges (and writing code to handle errors, etc so things can be automated) and wondering about the best way to proceed: xapian-compact -b64k -m src1 src2.... tmp_dst -- works as expected, exit code 0. xapian-check tmp_dst -- produces the following error for the postlist: postlist: baseB blocksize=64K items=28175410
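The compact-then-check sequence described in that post could be automated along these lines. This is only a sketch, not the poster's script: the helper names are invented, the `-b64k` and `-m` options are taken from the command quoted in the post, and it assumes that xapian-check reports problems through a non-zero exit status.

```python
import subprocess

# Sketch of automating "xapian-compact, then xapian-check" as in the post.
# Assumption: both tools are on PATH and use a non-zero exit status on failure.


def compact_command(sources, dest, blocksize="64k", multipass=True):
    """Build the xapian-compact argument list used in the post."""
    cmd = ["xapian-compact", f"-b{blocksize}"]
    if multipass:
        cmd.append("-m")
    cmd.extend(sources)
    cmd.append(dest)
    return cmd


def check_command(db_path):
    """Build the xapian-check argument list for one database directory."""
    return ["xapian-check", db_path]


def merge_and_verify(sources, dest, run=subprocess.run):
    """Compact the sources into dest, then check dest.

    Returns "ok", "compact-failed" or "check-failed". The run parameter
    exists so the command runner can be swapped out, for example in tests.
    """
    if run(compact_command(sources, dest)).returncode != 0:
        return "compact-failed"
    if run(check_command(dest)).returncode != 0:
        return "check-failed"
    return "ok"
```

The situation in the post, where compaction exits with 0 but the check then fails, corresponds to the "check-failed" result, which a caller could use to discard the temporary destination instead of promoting it.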
2007 Feb 02
1
Working demo of search engine using boolean query.
Lately I have been reading many articles about using boolean queries for search engines, but I haven't seen any complete working demo. Therefore I put together a very simple working demo of a search engine using a boolean query. Feel free to suggest any performance improvements or point out any errors, while keeping it as simple as possible to understand. Thanks, -Kevin Duraj http://myhealthcare.com
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum