On Thu, Feb 01, 2007 at 08:35:14PM +0100, Emmanuel Eckard wrote:
> The programme was tested on the SMART collections (the ones you find
> at ftp://ftp.cs.cornell.edu/pub/smart/ , converted to the TREC format),
> with the default BM25 weight. The results were reasonably on par with
> other tools like Lemur (the competition from
> http://www.lemurproject.org/) and ad hoc tools,
Interesting. I don't have any background in Information Retrieval, but did you
also compare Terrier (http://ir.dcs.gla.ac.uk/terrier/)? If so, what were the
results?

When I informally looked at free/open information retrieval systems (out of
curiosity, mostly, as this is an area in which I have a longstanding interest
from a user's perspective) I thought that Xapian, Lemur/Indri and Terrier
were the most interesting projects due to their having roots in information
retrieval research. Indri (part of Lemur, http://www.lemurproject.org/), like
Xapian, supports incremental indexing for rapid updates. On the other hand,
after reading the details of Indri's query language, it wasn't clear to me how
to use the various operators to specify an effective search; knowledge of the
underlying theory would appear to be necessary, or at least helpful. In
contrast, Xapian's parser provides familiar boolean and proximity queries.
This isn't to say that Indri's query language is awkward, just that it demands
a different approach to query construction and could benefit from more
tutorial-style documentation.

Unlike Xapian, Indri stores the full text of the document independently of the
index, and also supports the creation of document/passage summaries.