On Thu, Aug 14, 2008 at 03:47:56AM +0930, Frank Bruzzaniti wrote:
> Partial searches. In omega if I enter "red" I would get a match with the
> word "red" but not "redhead", how would I search for the pattern within a
> string. E.g. so when I enter "red" it returns "red" & "redhead"
You can't do that as such. Stemming means that different forms of
the same word should match each other (e.g. "red" would match "reds"),
but "red" and "redhead" aren't the same word (though clearly there's a
relationship between them).
Trailing wildcards (e.g. red*) are supported by Xapian, but you
currently need to tweak Omega's source code to add the appropriate flag
to the call to QueryParser::parse_query() if you want Omega to enable
this feature.
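For illustration, here's roughly what passing that flag looks like via
Xapian's Python bindings rather than in Omega's C++ source. Treat it as
a sketch, not the actual Omega change: parse_with_wildcards is a made-up
helper name, it assumes the xapian Python module is installed, and db is
assumed to be an already-open xapian.Database.

```python
def parse_with_wildcards(db, query_text):
    """Parse query_text with trailing wildcards (e.g. "red*") enabled."""
    # Imported here so the module still loads where the bindings are absent.
    import xapian

    parser = xapian.QueryParser()
    # Wildcard expansion looks up matching terms in the database, so the
    # parser needs to be told which database to use.
    parser.set_database(db)
    flags = xapian.QueryParser.FLAG_DEFAULT | xapian.QueryParser.FLAG_WILDCARD
    return parser.parse_query(query_text, flags)
```

With this, a query of "red*" expands to the terms in the database which
start with "red", so it can match both "red" and "redhead". Note it only
handles a trailing wildcard, not a pattern in the middle of a word.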
> Also I made a simple script that runs omindex then xapian-compact, is there
> any issue with this? I thought I might as well compress the database at the
> end of each omindex run.
Note that "compress" isn't really the right term - xapian-compact just
shuffles data around to eliminate as much dead space as it can. The
downside of this is that having a bit of dead space is how a B-tree
achieves its amortised cost of updates, so in simple terms, updates to a
compacted database are slower until the dead space reemerges. In
practice, this probably isn't an issue.
> Also I scheduled the script in crontab, but I'm thinking it might be a bad
> idea if the script's run time is longer than the cron interval.
> E.g. If the script takes 1 hour to run but I've set crontab job to run every
> 15 mins. Would I be better to create a script that runs on boot and keeps
> running with maybe a wait/sleep at the end?
omindex will just fail to get a lock if another omindex is already
running. You'd probably want some lock around the combined
omindex+xapian-compact though.
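One way to get such a lock is a small wrapper which takes an exclusive,
non-blocking flock() on a lock file and simply gives up if a previous
run still holds it. This is only a sketch: run_exclusively is a made-up
name, and the lock file and database paths in the usage comment are
placeholders, not anything Omega defines.

```python
import fcntl
import subprocess


def run_exclusively(lock_path, commands):
    """Run each command in order while holding an exclusive lock.

    Returns False, without running anything, if another process
    already holds the lock on lock_path.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes this fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        for argv in commands:
            # check=True stops the sequence if a command fails.
            subprocess.run(argv, check=True)
        return True
    # The lock is released when lock_file is closed on leaving the block.


# Hypothetical usage from cron (all paths are placeholders):
# run_exclusively("/var/lock/reindex.lock", [
#     ["omindex", "--db", "/var/lib/omega/data/default", "--url", "/", "/srv/docs"],
#     ["xapian-compact", "/var/lib/omega/data/default", "/var/lib/omega/data/compacted"],
# ])
```

If the job overruns the cron interval, the next invocation just returns
False and exits, so runs never overlap.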
> Also I noted that when the script runs it says that it updates files even
> tho I haven't altered them, is this normal? What's also suspicious is
> that every time I run omindex it takes almost the exact amount of time to run
> even tho no files have changed. Is there an easy way to only "scan" what's
> changed?
Currently the file modification times are stored, but they aren't used
to skip reindexing files which haven't changed. There's a patch around
for that, but it's not been merged yet. It's more of an issue if you're
running expensive external filters on files.
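To illustrate the general idea of using modification times (this is not
the patch mentioned above, nor how omindex works internally), a script
could remember the mtime it last saw for each file and only report the
files where that has changed. The changed_files name and the last_seen
dictionary are invented for this sketch.

```python
import os


def changed_files(root, last_seen):
    """Return paths under root whose mtime differs from last_seen.

    last_seen maps path -> mtime in nanoseconds and is updated in
    place, so a second call with no changes returns an empty list.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            if last_seen.get(path) != mtime:
                changed.append(path)
                last_seen[path] = mtime
    return changed
```

New files count as changed because they have no entry in last_seen yet.
Deleted files aren't detected by this sketch.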
> I guess at the end of the day I'm trying to keep an index that's able to
> update files as they are added or best effort.
A more efficient approach than regular polling (on systems which
support it at least) is to use FAM or similar to notify you when
files/directories of interest change:
http://en.wikipedia.org/wiki/File_alteration_monitor
Omega doesn't support that currently, though it would be nice to have as
an option.
> Also is the database searchable while indexing is occurring, my tests say yes,
> just wanna double check
Yes.
Cheers,
Olly