On Tue, Nov 09, 2021 at 03:11:05AM +0000, Eric Wong wrote:
> Hey all, I'm wondering if there's a way to search for documents
> based on whether a prefix was used or not, regardless of the
> text indexed with that prefix.
>
> I'm already indexing email attachment filenames with the "XFN"
> prefix. However, I may want to construct a query that returns
> emails with any attachment filename in them at all.

There is a way, but it's probably not a good idea for a large system:

    Xapian::Query(Xapian::Query::OP_WILDCARD, "XFN")

The reason you probably don't want to do that is that it is essentially
the same as a big OR of all the terms with the prefix "XFN", so here
that's one for each unique attachment filename (it's a bit more
efficient than that big OR for a few reasons, but that gives you an idea
of what's involved).
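To make that cost concrete, here is a toy C++ model of what a prefix wildcard amounts to. It is not Xapian's code or API: Index, DocId and wildcard_or are hypothetical names for this sketch. Terms are kept sorted, so every term starting with "XFN" sits in one contiguous run, and the query has to walk that whole run and union the posting lists. The real OP_WILDCARD expansion is smarter than this, but the work still grows with the number of distinct XFN terms.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Toy inverted index: term -> set of document ids (hypothetical, not Xapian's API).
// std::map keeps terms sorted, so all terms sharing a prefix are contiguous.
using DocId = unsigned;
using Index = std::map<std::string, std::set<DocId>>;

// Model of a prefix wildcard: visit every term that starts with `prefix`
// and OR (union) their posting lists. `terms_visited` reports how many
// distinct terms had to be expanded, i.e. how many "branches" the big OR has.
std::set<DocId> wildcard_or(const Index& index, const std::string& prefix,
                            std::size_t& terms_visited) {
    std::set<DocId> result;
    terms_visited = 0;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        ++terms_visited;
        result.insert(it->second.begin(), it->second.end());
    }
    return result;
}
```

With three distinct attachment filenames indexed, wildcard_or has to expand all three terms; a single boolean "has attachment" term would instead be one posting-list lookup no matter how many filenames exist.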

> Would I have to add a new boolean term to search against to
> accomplish this?

That's the way to make it fast.

One trick here: if most emails have attachments, you could instead add
the flag term only to those that don't, then use OP_AND_NOT with it to
get emails with attachments, or OP_FILTER to get those without.
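A sketch of the flag-term trick, again as a toy in-memory model rather than Xapian itself. "XNOATT" is a hypothetical term added only to emails without attachments, and filter_by and and_not are hypothetical helpers that mimic what OP_FILTER and OP_AND_NOT do to a set of matching documents (ignoring weighting). Because the flag sits on the minority of documents, its posting list stays short.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Toy inverted index: term -> set of document ids (hypothetical, not Xapian's API).
using DocId = unsigned;
using Index = std::map<std::string, std::set<DocId>>;

// Posting list for `term`, or an empty set if no document carries it.
const std::set<DocId>& postings(const Index& index, const std::string& term) {
    static const std::set<DocId> empty;
    auto it = index.find(term);
    return it == index.end() ? empty : it->second;
}

// Like OP_FILTER (without weights): keep matches that also carry `term`.
std::set<DocId> filter_by(const std::set<DocId>& matches, const Index& index,
                          const std::string& term) {
    const std::set<DocId>& p = postings(index, term);
    std::set<DocId> out;
    std::set_intersection(matches.begin(), matches.end(), p.begin(), p.end(),
                          std::inserter(out, out.end()));
    return out;
}

// Like OP_AND_NOT: keep matches that do NOT carry `term`.
std::set<DocId> and_not(const std::set<DocId>& matches, const Index& index,
                        const std::string& term) {
    const std::set<DocId>& p = postings(index, term);
    std::set<DocId> out;
    std::set_difference(matches.begin(), matches.end(), p.begin(), p.end(),
                        std::inserter(out, out.end()));
    return out;
}
```

If emails 2 and 4 carry "XNOATT", then and_not over {1, 2, 3, 4, 5} gives {1, 3, 5} (emails with attachments) and filter_by gives {2, 4} (those without). In Xapian itself the first would presumably be written as Xapian::Query(Xapian::Query::OP_AND_NOT, query, Xapian::Query("XNOATT")).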

> Using XS Search::Xapian on Debian buster and bullseye.

I don't think Search::Xapian wraps OP_WILDCARD (or, more importantly, the
Xapian::Query constructor for use with it; the OP_WILDCARD constant
itself would be fairly easy to define yourself).

Cheers,
Olly