2023 May 03
manual flushing thresholds for deletes?
...an terms and the term frequency for boolean terms,
so that's:
xapian-delve -avv1 .|tr -d A-Z|awk '{f = $3 ? $3 : $2; t += length($1)*f; n += f} END {print t/n}'
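Here is a minimal sketch of the weighted mean above. It assumes each input line has the form `term termfreq collfreq`, which is the layout the awk expression implies for `xapian-delve -avv1` output. The helper name `mean_term_len` and the sample lines are hypothetical stand-ins for a real database. Prefixes are stripped with `tr`, as in the pipeline.

```shell
#!/bin/sh
# Hypothetical helper: reads "term termfreq collfreq" lines on stdin and prints
# the mean term length weighted by collection frequency, falling back to the
# term frequency when collfreq is 0 (as it is for boolean terms).
mean_term_len() {
    # Strip uppercase term prefixes (e.g. Q) first, as the original pipeline does.
    tr -d 'A-Z' |
    awk '{ f = $3 ? $3 : $2; t += length($1) * f; n += f } END { print t / n }'
}

# Made-up sample input: "abc" is a normal term, "Qxyz" a boolean term.
printf 'abc 2 3\nQxyz 1 0\n' | mean_term_len
```

With this input, "abc" contributes 3 characters times a weight of 3, and "xyz" contributes 3 characters times its termfreq of 1. The script therefore prints 12/4 = 3.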
> My Perl deletion code is something like:
>
> my $EST_LEN = 6;
> ...
> for my $docid (@docids) {
> $TXN_BYTES -= $xdb->get_doclength($docid) * $EST_LEN;
However, you're using that estimate here, and the document length
doesn't include boolean terms (it's sum(wdf) over the terms in the
document), so including them in $EST_LEN seems wrong. For you, doing
so increases $EST_LEN, so you'll te...
2023 May 03
manual flushing thresholds for deletes?
...NR > 1 {t += length($1)*($3+1); n += ($3+1)} END {print t/n}'
# (also added "NR > 1" to ignore the delve header line)
Which gives me 6.00067, so rounding to 6 seems fine either way.
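The variant above can be exercised the same way. This is a sketch only. The helper name `mean_term_len_plus1` and the sample lines are hypothetical, and the input is assumed to be a header line followed by `term termfreq collfreq` lines. The awk program skips the first line and weights each term's length by collfreq + 1, which also gives boolean terms (collfreq 0) a weight of one.

```shell
#!/bin/sh
# Hypothetical helper mirroring the "NR > 1" variant: skip the header line,
# then weight each term's length by (collfreq + 1).
mean_term_len_plus1() {
    # Strip uppercase prefixes; the header line is dropped by NR > 1 below.
    tr -d 'A-Z' |
    awk 'NR > 1 { t += length($1) * ($3 + 1); n += ($3 + 1) } END { print t / n }'
}

# Made-up sample input: a header line, a normal term, and a boolean term.
printf 'HEADER LINE\nabc 2 3\nQde 1 0\n' | mean_term_len_plus1
```

Here "abc" counts 3 characters times 4 and "de" counts 2 characters times 1. The script prints 14/5 = 2.8.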
My Perl deletion code is something like:
my $EST_LEN = 6;
...
for my $docid (@docids) {
    $TXN_BYTES -= $xdb->get_doclength($docid) * $EST_LEN;
    $xdb->delete_document($docid);
    if ($TXN_BYTES < 0) { # flush within txn
        $xdb->commit_transaction;
        $TXN_BYTES = 8000000;
        $xdb->begin_transaction;
    }
}
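The threshold arithmetic in the loop above can be checked without touching a database. The rough shell sketch below replays only the bookkeeping and makes no Xapian calls. The function name `count_flushes` and the small budget used in the example are hypothetical. The real loop resets to 8000000 and calls `delete_document`, `commit_transaction` and `begin_transaction`.

```shell
#!/bin/sh
# Hypothetical simulation of the byte-budget logic; no Xapian calls are made.
# Arguments: the budget, then one document length per deleted document.
# Prints how many times the loop would commit inside the transaction.
count_flushes() {
    est_len=6            # estimated bytes per term occurrence, like $EST_LEN
    budget=$1
    shift
    txn_bytes=$budget
    flushes=0
    for doclen in "$@"; do
        # Charge this document's estimated size (delete_document would go here).
        txn_bytes=$((txn_bytes - doclen * est_len))
        if [ "$txn_bytes" -lt 0 ]; then
            # Stands in for commit_transaction followed by begin_transaction.
            flushes=$((flushes + 1))
            txn_bytes=$budget
        fi
    done
    echo "$flushes"
}

count_flushes 100 10 10 10 10
```

With a budget of 100 and four documents of length 10, each deletion charges 60. The budget goes negative on the second and fourth documents, so the script prints 2.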
> > (that awk bit should be overflow-free)
<snip>
> Or use...
2023 Mar 27
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > 10 seems too long. You want the mean word length weighted by frequency
> > of occurrence. For English that's typically around 5 characters, which
> > is 5 bytes. If we go for +1 that's:
>
> Actually, 10 may be too short in my case since there's a