On Thu, Aug 28, 2025 at 08:09:07AM -0300, David Bremner
wrote:> void
> _notmuch_message_remove_terms (notmuch_message_t *message, const char
*prefix)
> {
> Xapian::TermIterator i;
> size_t prefix_len = 0;
>
> prefix_len = strlen (prefix);
>
> while (1) {
> i = message->doc.termlist_begin ();
> i.skip_to (prefix);
>
> /* Terminate loop when no terms remain with desired prefix. */
> if (i == message->doc.termlist_end () ||
> strncmp ((*i).c_str (), prefix, prefix_len))
> break;
>
> try {
> message->doc.remove_term ((*i));
> message->modified = true;
> } catch (const Xapian::InvalidArgumentError) {
> /* Ignore failure to remove non-existent term. */
> }
> }
You're creating a new iterator and skipping it for every term removed
whereas you should be able to just increment the iterator to get the
next term you need to consider.
The only wrinkle (which I'm guessing may be why the code is as above) is
that you would need to increment *before* removing the term so as not
to invalidate the iterator, so something like:
size_t prefix_len = strlen (prefix);
Xapian::TermIterator i = message->doc.termlist_begin ();
i.skip_to (prefix);
while (i != message->doc.termlist_end ()) {
const string& term = *i;
/* Terminate loop when no terms remain with desired prefix. */
if (strncmp (term.c_str (), prefix, prefix_len))
break;
++i;
try {
message->doc.remove_term (term);
message->modified = true;
} catch (const Xapian::InvalidArgumentError) {
/* Ignore failure to remove non-existent term. */
}
}
(It shouldn't be possible for the term not to exist here BTW - if
that was added due to actually encountering that situation that's
definitely a bug that would be useful to report...)
That should help but there's unfortunately still a significant
inefficiency here though, as remove_term() takes a string which it then
has to look up in the document terms to refind the position which we
actually already have in the iterator, and that's O(log(n)) (whereas
using the position from the iterator would be O(0) !)
A variant of remove_term() which took an iterator could avoid this
(and automatically increment the iterator for you).
We could also/instead provide a method to directly remove all terms
with a specified prefix (or maybe between two specified terms) from the
document, which is probably better for this specific case as I think it
can be done even more efficiently.
I'm trying to focus on getting a new release series started, so please
can you open a ticket for this so it doesn't get forgotten?
Cheers,
Olly