Displaying 3 results from an estimated 3 matches for "mean_term_length".
2023 Mar 24
manual flushing thresholds for deletes?
...dealing with many deletes at once and hitting OOM
again. Since the raw text is no longer available and I didn't
store its original size anywhere, would calculating something
based on get_doclength be a reasonable approximation?
I'm wondering if something like:
get_doclength * (1 + 3) * mean_term_length
where 1x is for the mean term length itself,
and 3x for the position overhead
And perhaps assume mean_term_length is 10 bytes, so maybe:
get_doclength * 40
?
I'm using Search::Xapian XS since it's in Debian stable,
and I don't think there's a standard way to show the amount...
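
As a rough, hypothetical sketch of how the estimate above could drive manual flushing during a bulk delete (not from the thread itself): the loop below charges get_doclength * 40 bytes per deleted document and calls flush() whenever an assumed budget is exceeded. The threshold value, variable names, and the way the docids arrive are illustrative assumptions; only get_doclength(), delete_document(), and flush() are standard Search::Xapian calls.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Search::Xapian;

    # Assumed budget for pending deletes before forcing a flush; tune to taste.
    my $FLUSH_THRESHOLD_BYTES = 64 * 1024 * 1024;

    my ($dbpath, @docids_to_delete) = @ARGV;   # docids however you collect them
    my $db = Search::Xapian::WritableDatabase->new($dbpath,
                                                   Search::Xapian::DB_OPEN);

    my $estimated_bytes = 0;
    for my $docid (@docids_to_delete) {
        # get_doclength() is still available even though the raw text is gone.
        my $doclen = $db->get_doclength($docid);
        $db->delete_document($docid);

        # Heuristic from above: ~10-byte mean term length, 1x term + 3x positions.
        $estimated_bytes += $doclen * 40;
        if ($estimated_bytes >= $FLUSH_THRESHOLD_BYTES) {
            $db->flush();    # commit pending changes to cap memory use
            $estimated_bytes = 0;
        }
    }
    $db->flush();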
2023 Mar 26
manual flushing thresholds for deletes?
...nd hitting OOM
> again. Since the raw text is no longer available and I didn't
> store its original size anywhere, would calculating something
> based on get_doclength be a reasonable approximation?
>
> I'm wondering if something like:
>
> get_doclength * (1 + 3) * mean_term_length
>
> where 1x is for the mean term length itself,
> and 3x for the position overhead
If I follow, you want an approximation to the number of raw bytes in the
text to match the non-delete case, so I think you want something like:
get_doclength() / 2 * (mean_word_length + 1)
The /2 is...
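
For comparison with the earlier sketch, here is the reply's formula wrapped in a hypothetical helper (not from the thread); the explanation of the /2 factor is cut off in this excerpt, so the formula is simply transcribed as given, and the 10-byte default for mean_word_length is carried over from the first message as an assumption:

    # Hypothetical helper; estimates the raw text size of a document in bytes.
    sub estimated_raw_bytes {
        my ($db, $docid, $mean_word_length) = @_;
        $mean_word_length = 10 unless defined $mean_word_length;  # assumed default
        return $db->get_doclength($docid) / 2 * ($mean_word_length + 1);
    }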
2023 Mar 27
manual flushing thresholds for deletes?
...nce the raw text is no longer available and I didn't
> > store its original size anywhere, would calculating something
> > based on get_doclength be a reasonable approximation?
> >
> > I'm wondering if something like:
> >
> > get_doclength * (1 + 3) * mean_term_length
> >
> > where 1x is for the mean term length itself,
> > and 3x for the position overhead
>
> If I follow, you want an approximation to the number of raw bytes in the
> text to match the non-delete case, so I think you want something like:
>
> get_doclength()...