search for: txn_byte

Displaying 3 results from an estimated 3 matches for "txn_byte".

Did you mean: txn_bytes
2023 May 03
1
manual flushing thresholds for deletes?
...an terms and the term frequency for boolean terms, so that's: xapian-delve -avv1 .|tr -d A-Z|awk '{f = $3 ? $3 : $2; t += length($1)*f; n += f} END {print t/n}' > My Perl deletion code is something like: > > my $EST_LEN = 6; > ... > for my $docid (@docids) { > $TXN_BYTES -= $xdb->get_doclength($docid) * $EST_LEN; However you're using that estimate here, and the document length doesn't include boolean terms (it's sum(wdf) over the terms in the document), so including them in $EST_LEN seems wrong. For you doing so increases $EST_LEN, so you'll t...
2023 May 03
1
manual flushing thresholds for deletes?
...NR > 1 {t += length($1)*($3+1); n += ($3+1)} END {print t/n}' # (also added "NR > 1" to ignore the delve header line) Which gives me 6.00067, so rounding to 6 seems fine either way. My Perl deletion code is something like: my $EST_LEN = 6; ... for my $docid (@docids) { $TXN_BYTES -= $xdb->get_doclength($docid) * $EST_LEN; $xdb->delete_document($docid); if ($TXN_BYTES < 0) { # flush within txn $xdb->commit_transaction; $TXN_BYTES = 8000000; $xdb->begin_transaction; } } > > (that awk bit should be overflow-free) <snip> > Or us...
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a