search for: mean_term_length

Displaying 3 results from an estimated 3 matches for "mean_term_length".

2023 Mar 24
1
manual flushing thresholds for deletes?
...dealing with many deletes at once and hitting OOM
again. Since the raw text is no longer available and I didn't
store its original size anywhere, would calculating something
based on get_doclength be a reasonable approximation?

I'm wondering if something like:

get_doclength * (1 + 3) * mean_term_length

where 1x is for the mean term length itself,
and 3x for the position overhead

And perhaps assume mean_term_length is 10 bytes, so maybe:

get_doclength * 40 ?

I'm using Search::Xapian XS since it's in Debian stable; and don't think there's a standard way to show the amount...
2023 Mar 26
1
manual flushing thresholds for deletes?
...nd hitting OOM
> again. Since the raw text is no longer available and I didn't
> store its original size anywhere, would calculating something
> based on get_doclength be a reasonable approximation?
>
> I'm wondering if something like:
>
> get_doclength * (1 + 3) * mean_term_length
>
> where 1x is for the mean term length itself,
> and 3x for the position overhead

If I follow you want an approximation to the number of raw bytes in the
text to match the non-delete case, so I think you want something like:

get_doclength() / 2 * (mean_word_length + 1)

The /2 is...
2023 Mar 27
1
manual flushing thresholds for deletes?
...nce the raw text is no longer available and I didn't
> > store its original size anywhere, would calculating something
> > based on get_doclength be a reasonable approximation?
> >
> > I'm wondering if something like:
> >
> > get_doclength * (1 + 3) * mean_term_length
> >
> > where 1x is for the mean term length itself,
> > and 3x for the position overhead
>
> If I follow you want an approximation to the number of raw bytes in the
> text to match the non-delete case, so I think you want something like:
>
> get_doclength()...
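
To compare the two estimates quoted in this thread, here is a minimal Python sketch of the arithmetic only. It does not call Xapian: `doclength` stands in for whatever get_doclength returns, the function names are hypothetical, and the sample values (a doclength of 1000 and a mean word length of 5 bytes) are assumptions chosen for illustration. The snippets are cut off before the /2 is explained, so the second function simply reproduces the expression as written.

```python
def estimate_proposed(doclength: int, mean_term_length: int = 10) -> int:
    """First estimate from the thread: get_doclength * (1 + 3) * mean_term_length.

    1x covers the term bytes and 3x the position overhead, per the original post.
    The default of 10 bytes is the value assumed in that post.
    """
    return doclength * (1 + 3) * mean_term_length


def estimate_suggested(doclength: int, mean_word_length: float) -> float:
    """Second estimate from the thread: get_doclength() / 2 * (mean_word_length + 1).

    mean_word_length has no default because the thread snippet does not give one.
    """
    return doclength / 2 * (mean_word_length + 1)


if __name__ == "__main__":
    doclength = 1000  # hypothetical value, not taken from the thread
    print(estimate_proposed(doclength))        # same as doclength * 40
    print(estimate_suggested(doclength, 5.0))  # 5.0 is an assumed mean word length
```

With these assumed inputs the first estimate is over ten times the second, so whichever one is used, the flush threshold would need to be tuned against it.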