On Fri, Aug 12, 2005 at 01:28:58PM +0200, Sebastjan Trepca wrote:
> I will be using xapian to index mailboxes and the first problem is
> that I will have to index headers somehow. As I read from previous
> messages the best way is to create some unique terms like
> "from::hehe@hehe.net" and then index that. But what if I have a query
> that wants all messages that has word "hehe" in from header?
> Searching by "from::mirko" doesn't get any results, using wildcards
> doesnt help either.
If you generate suitable "from::"-prefixed terms this will work.
So "From: olly@survex.com" might produce from::olly, from::survex,
from::com and from::olly@survex.com.
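
As a rough sketch of that term generation (plain Python, no Xapian calls;
the function name from_header_terms and the choice to split on anything
that is not a letter or digit are my assumptions for illustration, not
anything the library prescribes):

```python
import re

def from_header_terms(address, prefix="from::"):
    """Return prefixed terms for one address from a From: header.

    Hypothetical helper: emits one term per alphanumeric piece of the
    address, plus one term for the whole address, all lowercased.
    """
    address = address.strip().lower()
    # Split on anything that isn't a letter or digit ("@", ".", "-", ...).
    pieces = re.findall(r"[a-z0-9]+", address)
    terms = []
    for piece in pieces + [address]:
        term = prefix + piece
        if term not in terms:  # avoid duplicates, keep first-seen order
            terms.append(term)
    return terms

print(from_header_terms("olly@survex.com"))
```

Each returned string would then be added to the document as a term, so a
search for from::olly matches on the piece as well as on the full address.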
Incidentally, the convention is to use capital letters as prefixes (as
Omega does) but nothing in the core library forces you to do this - it
makes interworking with Omega much easier though.  The QueryParser class
has a small amount of special handling for capitalised prefixes, but
should work with any prefix I think.
> I will be syncing mailbox with xapian index so I will try to use its
> batching mechanism using flush() etc. I'm just wondering if anyone has
> any experience and tips about handling this problem using xapian. I
> will probably just call flush() on some delay.
I'd suggest a fairly simple approach - decide on an acceptable delay
before a message becomes searchable, and then flush() if you're idle and
haven't flushed for that length of time since adding the first message
of a batch.
Xapian will auto-flush periodically anyway unless you stop it from doing
so, but you can probably ignore that in tracking when you think you need
to flush.
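
The flush policy above could be sketched roughly like this. It is only an
illustration: the class name FlushPolicy, its methods, and the injected
flush and clock callables are all hypothetical, and the flush callable
stands in for whatever actually calls flush() on your database.

```python
import time

class FlushPolicy:
    """Flush when idle, once max_delay seconds have passed since the
    first message of the current batch was added. Hypothetical helper."""

    def __init__(self, max_delay, flush, clock=time.monotonic):
        self.max_delay = max_delay  # acceptable delay in seconds
        self.flush = flush          # callable that performs the flush
        self.clock = clock          # injectable so it can be tested
        self.batch_started = None   # time the first unflushed message arrived

    def add(self):
        """Call after adding a message to the index."""
        if self.batch_started is None:
            self.batch_started = self.clock()

    def on_idle(self):
        """Call when there is nothing else to do; returns True if flushed."""
        if self.batch_started is None:
            return False  # nothing pending
        if self.clock() - self.batch_started < self.max_delay:
            return False  # batch is still young enough
        self.flush()
        self.batch_started = None
        return True
```

This deliberately ignores any automatic flushing the library does on its
own; an extra flush on a batch that was already written out is harmless
for the purposes of this bookkeeping.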
Cheers,
    Olly