similar to: Xapian index size 475GB = 170 million documents (URLs)

2012 Apr 16
1
Rebuilding corrupt databases from .DB files.
We've had some catastrophic filesystem failures that have left us with corrupted databases with empty files and no backup for about 15TB of our data. Recreating the 15TB from source data backups is possible but will take a very long time. I'm hoping that, given all of the .DB files are still intact, there may be some way to extract their contents and rebuild the other tables. This
2006 Aug 06
1
How to use omega to search remote back end?
Folks, Having trouble getting this to work. OMEGA cgi is not reading my stub file properly because it is trying to read it as a directory instead of a file. Is there an easy fix? Here is a transcript. Thanks, OSC oscar@epsilon:/svr/xapian/beta$ ls -aFl total 21335200 drwxr-xr-x 2 oscar oscar 4096 Aug 6 10:15 ./ drwxr-xr-x 5 oscar oscar 4096 Aug 6 12:59 ../ lrwxrwxrwx 1 oscar
2011 Mar 31
0
Xapian Index: 607GB = 219 million of unique documents
It took approximately five days, using a single process on one CPU core with 6GB of memory, to build this giant 607GB single Xapian index containing 219 million unique documents (web sites). So far I have not found any other implementation that would enable me to build such a single index containing over 200 million documents, while testing Lucene, Solr, MySQL, Hadoop and Oracle. Probably
2011 May 13
0
Xapian Index 253 million documents = 704G
Xapian Index 253 million documents = 704G I just build my largest single Xapian index with 253 million unique documents on single server using single hard disk, less that 8G RAM and single processor 2.0 GHz. I do not see any search performance decreases in searching my indexes between 100 million and 250 million, which indicates a good scalability of Xapian and it looks like, I can push it easily
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2011 Jun 10
2
Just starting to experiment with php
I took one of the examples and tried to run it against my database: ls -l /data1/mail/db/cur.1 total 1129624 -rw-r--r-- 1 jwl jwl 0 2011-06-09 02:27 flintlock -rw-r--r-- 1 jwl jwl 28 2011-06-09 02:27 iamchert -rwxrwxrwx 1 jwl jwl 7258 2011-06-09 02:27 position.baseA -rwxrwxrwx 1 jwl jwl 7046 2011-06-09 02:27 position.baseB -rwxrwxrwx 1 jwl jwl 474226688 2011-06-09 02:28
2011 Apr 01
0
Xapian-discuss Digest, Vol 83, Issue 1
I think this is a shining example of how well Xapian works with large document collections. I was just discussing this with my colleagues here, and one of the issues that came up is that we'd love Xapian to become a lot more popular but have found that the documentation is a bit difficult to get into, as is the API. So I was wondering: do you have any thoughts on improving this and
2012 Nov 21
1
about index speed of xapian
Hi, I use Xapian to index a txt file; its size is 268M. I take each line as a document, and each line has two fields, like 13445511 | 111115151. The number of records is 10000000. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. The Lucene speed is about 40000 records per second. code: try { Xapian::WritableDatabase
2015 Apr 27
2
empty FD after reopen since version 1.2.16
Hi all, after upgrading Xapian I encountered the same problem as described in ticket #645 "Read block errors after reopen()". In our setup it's 100% reproducible after each reopen(). I downgraded again and it seems the problem occurs in version 1.2.16 and above. In <=1.2.15 everything works fine, without seeing this error once. The attached strace shows read ends on FD. The strace starts at reopen()
2011 Apr 02
1
Xapian docs (was Re: Xapian-discuss Digest, Vol 83, Issue 2)
> I think this is a shining example of how well Xapian works with large > document collections. I was just discussing this with my colleagues here > and one of the issues that came up is that we'd love Xapian to become > a lot more popular but have found that the documentation's a bit > difficult to get into, as is the API. I agree. There are a few gotchas, as well
2009 Nov 26
1
Protecting .baseA and .baseB files
Most Xapian database files are locked while the database is open, but it seems that the .baseA and .baseB files are not, so any other application can delete them (I am talking about the Windows package). Is there a way to protect them like the rest of the Xapian database files? Regards, PK
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2019 Aug 26
2
Commit error with Xapian 1.4.11
A Recoll user gets the following message while indexing: "Attempted to delete or modify an entry in a non-existent posting list for #bannerholder" The exception happens during a commit call. Xapian version 1.4.11, Debian Buster A little more detail here: https://opensourceprojects.eu/p/recoll1/tickets/108/ I asked if this was reproducible, and to run the indexing in single-thread
2014 Feb 13
2
A beginner in "Posting list encoding improvements"
I uninstalled xapian1.3 and installed xapian-1.2.17, but it still failed: hurricanetong@hurricanetong-VirtualBox:~/workspace$ g++ `xapian-config --cxxflags --libs` demo2.cc /tmp/cc2wsfDJ.o: In function `main': demo2.cc:(.text+0x4a): undefined reference to `Xapian::WritableDatabase::WritableDatabase(std::basic_string<char, std::char_traits<char>, std::allocator<char> >
2016 Jan 14
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes: > > On Thu, Jan 14, 2016 at 11:04:29AM +0100, Jean-Francois Dockes wrote: > > Olly Betts writes: > > > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote: > > > > I will look into the bug you listed to see if it might be related. If there > > > > is anything else that I can do, please
2010 Aug 16
1
No position.{DB,baseA,baseB}
I've just noticed that new indexes no longer have position.{DB,baseA,baseB} files; all previous indexes (I roll indexes every week using xapian-compact) have the position files. The index seems to work, but it is returning some odd results. For example, if I run a query with the phrase "machine learning" it mostly returns documents containing "machine learning" but it also
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings, Using latest svn. I've noticed the following error when performing index merging: postlist: baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459 B-tree checked okay Tag containing meta information is corrupt. postlist table errors found: 1 I can still search on this index (I've only checked very small indexes), but merging is now a problem since I check
2011 Jul 19
1
xapian-compact ok, xapian-check failure
Greets, I've encountered the following while performing test merges (and writing code to handle errors, etc so things can be automated) and wondering about the best way to proceed: xapian-compact -b64k -m src1 src2.... tmp_dst -- works as expected, exit code 0. xapian-check tmp_dst -- produces the following error for the postlist: postlist: baseB blocksize=64K items=28175410
2016 Jan 08
2
Strange index consistency issue
Hi, A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc. The strange thing here is that xapian-check does not seem to detect anything. In a nutshell, some document numbers seem to point to a data blackhole: the docids are returned when searching for the file/doc unique