Okay Xapians, I found the bug!
flint_database.cc, for whatever reason, is not picking up the
environment variable XAPIAN_FLUSH_THRESHOLD, so it falls back to its
default of flushing every 10,000 documents, which makes indexing VERY
SLOW. I have been going crazy for the past month, ever since we switched
to Flint, trying to figure out why indexing was so slow. So I hard-coded
my own flush_threshold directly into flint_database.cc, and now indexing
is as fast as before!
PS: Sometimes you just have to hack it yourself... welcome to open
source... *hahaha*
-= MY HACK =-
vi flint_database.cc

// Hard-code the static threshold instead of relying on the environment.
size_t FlintWritableDatabase::flush_threshold = 20000000;

FlintWritableDatabase::FlintWritableDatabase(const string &dir, int action,
                                             int block_size)
    : freq_deltas(),
      doclens(),
      mod_plists(),
      database_ro(dir, action, block_size),
      total_length(database_ro.postlist_table.get_total_length()),
      lastdocid(database_ro.get_lastdocid()),
      changes_made(0)
{
    DEBUGCALL(DB, void, "FlintWritableDatabase",
              dir << ", " << action << ", " << block_size);
    // Original code (disabled): read XAPIAN_FLUSH_THRESHOLD, default to 10000.
    //if (flush_threshold == 0)
    //{
    //    const char *p = getenv("XAPIAN_FLUSH_THRESHOLD");
    //    if (p) flush_threshold = atoi(p);
    //}
    //if (flush_threshold == 0) flush_threshold = 10000;

    // Hack: flush only after 20,000,000 documents.
    flush_threshold = 20000000;
}
On 7/17/07, Kevin Duraj <kevin.softdev@gmail.com> wrote:
> There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000
>
> When I try to force Xapian to flush only after 20 million documents,
> Xapian ignores the setting and flushes after only 10,000
> documents.
>
> Data captured from delve at 60-second intervals, with the variable set
> as follows:
> XAPIAN_FLUSH_THRESHOLD=20000000
>
> perl -e ' while(1) { system("delve ."); sleep(60); } '
>
> number of documents = 8510000
> average document length = 13.5538
> number of documents = 8520000
> average document length = 13.5537
> number of documents = 8530000
> average document length = 13.5543
> number of documents = 8530000
> average document length = 13.5543
> number of documents = 8540000
> average document length = 13.5548
> number of documents = 8550000
> average document length = 13.5548
> number of documents = 8550000
> average document length = 13.5548
> number of documents = 8560000
> average document length = 13.5545
> number of documents = 8570000
> average document length = 13.5549
> number of documents = 8570000
> average document length = 13.5549
> number of documents = 8580000
> average document length = 13.5563
> number of documents = 8590000
> average document length = 13.5568
>
> PS: Please do not ask me to create smaller indexes and then merge them. I
> am indexing 500 million documents; 20 million is my small index.
>
> --
> Cheers,
> Kevin Duraj
>
--
Cheers,
Kevin