On Mon, Nov 26, 2007 at 09:43:08PM +0100, Thomas Viehmann wrote:
> Does a WritableDatabase with several DBs added have defined/stable
> semantics as to where documents are stored upon replace_document?
WritableDatabase only supports a single sub-database at present.
Nothing actually prevents you calling add_database() to add more, but
the results are entirely undefined.
It would be nice to make this work in a sensible way though. Then
you could split indexing load with a WritableDatabase which round-robins
updates across several remote servers.
> I would want to be able to re-index parts of the archive and to replace
> messages.
> Or is it preferable to have some sort of external partition of what is
> in which DB?
Currently that is what you need to do. If you commonly want to search
over subsets of the data, this approach is probably better anyway.
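As a rough illustration of such an external partition, here is a minimal
Python sketch. It is not Xapian code: the PartitionedIndex class is
hypothetical, and plain dicts stand in for the separately opened writable
databases. The point is only that a stable function of the document's
unique ID decides which database a message lives in, so a later replace
goes to the same place.

```python
import zlib


class PartitionedIndex:
    """Route each document to exactly one sub-database by its unique ID.

    Hypothetical helper: each "shard" is a dict standing in for a
    separately opened writable database.
    """

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, unique_id):
        # crc32 is stable across runs, unlike Python's built-in hash() for str,
        # so the same ID always maps to the same shard.
        return zlib.crc32(unique_id.encode("utf-8")) % len(self.shards)

    def replace_document(self, unique_id, document):
        # Adding and replacing both go through the same routing step.
        self.shards[self.shard_for(unique_id)][unique_id] = document


index = PartitionedIndex([{}, {}, {}])
index.replace_document("msg-0001@example.org", "first version")
index.replace_document("msg-0001@example.org", "re-indexed version")
```

If you want to search over subsets, you would more likely partition on
something meaningful (list name, year) than on a hash, but the routing
idea is the same.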
> Also, is there a way (short of patching but configuration rather than
> passing parameters) to make omega search through multiple databases?
Currently this is only supported by passing multiple DB parameters, or a
single DB parameter with a list of database names separated by "/".
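For example, the single-parameter form could be built like this in
Python. This is only a sketch: the database names are made up, and the
helper function is hypothetical. DB and P (the query text) are the CGI
parameter names omega reads.

```python
from urllib.parse import urlencode


def omega_query_string(databases, query):
    # Join the database names with "/" into one DB parameter.
    # urlencode percent-encodes the "/" as %2F; it is decoded again
    # when the CGI parameters are parsed.
    return urlencode({"DB": "/".join(databases), "P": query})


# Hypothetical database names, one per year of the archive.
qs = omega_query_string(["list-2006", "list-2007"], "replace_document")
```

The resulting string is what goes after the "?" in the URL of your omega
CGI.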
> Finally, is there a simple good way of searching a database with
> documents stemmed in different languages? The two naive ideas I could
> come up with are to split the index into databases by language, or to
> search with something like
> OR_{lang in languages} (query_stemmed_for_lang AND LANG=lang)...
This sort of multi-language search is a problem I've seen come up a
number of times over the years I've been involved in search, and I've
yet to see a totally satisfactory solution.
You can determine the language of a document pretty reliably (e.g. look
at the textcat library), but a query string is often too short to make
a reliable determination. Some queries are ambiguous as they make sense
in multiple languages.
If you can, I think it's best to sidestep these problems and set up your
UI so that the user actually specifies (explicitly or implicitly) what
language their query is in. Then search a database of documents in just
that language (since you can identify these reliably enough).
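A minimal Python sketch of that selection step, under some assumptions:
the per-language database names are hypothetical, "explicitly" is taken
to mean a language chosen in the search form, and "implicitly" is taken
to mean the browser's Accept-Language header. The header parsing is
deliberately simplistic (it trusts the order given and ignores q-values).

```python
# Hypothetical mapping from language code to a per-language database.
DATABASES = {"en": "archive-en", "de": "archive-de", "fr": "archive-fr"}


def pick_language(explicit=None, accept_language="", default="en"):
    # An explicit choice from the UI wins if we have a database for it.
    if explicit in DATABASES:
        return explicit
    # Otherwise fall back to the first supported language in the header.
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()  # drop any ";q=..." part
        primary = tag.split("-")[0]               # "fr-CH" -> "fr"
        if primary in DATABASES:
            return primary
    return default


def database_for(explicit=None, accept_language=""):
    return DATABASES[pick_language(explicit, accept_language)]
```

The chosen language would then also decide which stemmer to apply to the
query before searching that database.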
Cheers,
Olly