thr3ads.net - Xapian discuss - [Xapian-discuss] TREC parser and comparison [Feb 2007]


Emmanuel Eckard

2007-Feb-01 19:35 UTC

[Xapian-discuss] TREC parser and comparison

Hello,

   Some time ago I asked whether there was an indexer for TREC-format
collections that could produce output for TrecEval (yes, I am doing my
thesis). Today I decided to spend a few hours toying with Xapian, and I
came up with something very crude.
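For readers who have not seen a TREC collection: documents are typically wrapped in `<DOC>` ... `</DOC>` with an identifier in `<DOCNO>`, and trec_eval expects a run file with one line per retrieved document (query id, the literal `Q0`, docno, rank, score, run tag). The sketch below covers only those two ends of such a tool. It assumes a simplified layout with the text in `<TEXT>` sections and omits the Xapian indexing and search in between. The names `parse_trec` and `run_lines` and the `xapian` run tag are hypothetical; none of them come from Emmanuel's code.

```python
import re

# Assumed, simplified TREC layout: <DOC> ... <DOCNO>id</DOCNO> ... <TEXT>...</TEXT> ... </DOC>
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.S)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.S)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.S)


def parse_trec(raw):
    """Hypothetical helper: yield (docno, text) pairs from a TREC-format string."""
    for m in DOC_RE.finditer(raw):
        body = m.group(1)
        docno = DOCNO_RE.search(body)
        if docno is None:
            continue  # skip malformed documents that lack an identifier
        # Join all <TEXT> sections, since some collections split a document's text.
        text = " ".join(t.strip() for t in TEXT_RE.findall(body))
        yield docno.group(1), text


def run_lines(qid, ranked, run_tag="xapian"):
    """Hypothetical helper: format ranked (docno, score) pairs as trec_eval
    run-file lines: query id, iteration (conventionally Q0), docno, rank,
    score, run tag."""
    return [f"{qid} Q0 {docno} {rank} {score:.4f} {run_tag}"
            for rank, (docno, score) in enumerate(ranked, start=1)]


sample = ("<DOC>\n<DOCNO> MED-1 </DOCNO>\n<TEXT>\nThe crystalline lens\n</TEXT>\n</DOC>\n"
          "<DOC><DOCNO>MED-2</DOCNO><TEXT>blood</TEXT></DOC>")
for line in run_lines("1", [("MED-1", 12.5), ("MED-2", 3.0)]):
    print(line)
```

In a real tool, the (docno, text) pairs would be fed to the indexer, and the ranked list passed to `run_lines` would come from the search results for each topic.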

   The programme was tested on the SMART collections (the ones you find
at ftp://ftp.cs.cornell.edu/pub/smart/, converted to the TREC format),
with the default BM25 weighting. The results were roughly on par with
other tools such as Lemur (the competition, from
http://www.lemurproject.org/) and ad hoc tools, except for MED, which
gave what looked like noise (there might be an indexing bug with this
one), and TIME, where the results were exceptionally good. This might
indicate that Xapian behaves better with "easy" texts: the other
collections are more or less difficult technical texts, whereas TIME is
a collection of news articles. Of course, this reflects only the
weighting scheme, not Xapian as such.
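Because the comparison above hinges on BM25, here is a textbook Okapi BM25 scorer for reference. It is a sketch only. Xapian's BM25Weight has its own parameters (k1, k2, k3, b, min_normlen) and its own defaults. The values k1 = 1.2 and b = 0.75 and the smoothed IDF used here are common textbook choices and are assumptions; they are not what Xapian computes.

```python
import math


def bm25_term_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one query term to one document.
    k1 and b are assumed textbook defaults, not Xapian's."""
    # Smoothed Robertson/Sparck Jones IDF; the +1 inside the log keeps it positive.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalisation: documents longer than average are penalised, scaled by b.
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    # Term-frequency saturation: repeated occurrences add less and less weight.
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, **params):
    """Sum the BM25 weights of the query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[t], df[t], doc_len, avg_doc_len, n_docs, **params)
        for t in query_terms
        if doc_tf.get(t, 0) > 0
    )
```

Both k1 (term-frequency saturation) and b (document-length normalisation) are tunable. This is one reason why results on a given collection say as much about the weighting scheme's parameters as about the engine.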

   If this is of any interest, the code is at the disposal of
whoever is brave enough to read it.

   Cheers!
     -- Emmanuel

Jason White

2007-Feb-03 10:57 UTC


[Xapian-discuss] Re: TREC parser and comparison

On Thu, Feb 01, 2007 at 08:35:14PM +0100, Emmanuel Eckard wrote:
>    The programme was tested on the SMART collections (the ones you find
> at ftp://ftp.cs.cornell.edu/pub/smart/ , converted to the TREC format),
> with the default BM25 weight. The results were reasonably on par with
> other tools like Lemur (the competition from
> http://www.lemurproject.org/) and ad hoc tools,
Interesting. I don't have any background in Information Retrieval, but did
you also compare Terrier (http://ir.dcs.gla.ac.uk/terrier/)? If so, what
were the results?

When I informally looked at free/open information retrieval systems (out of
curiosity, mostly, as this is an area in which I have a longstanding interest
from a user's perspective), I thought that Xapian, Lemur/Indri and Terrier
were the most interesting projects, since they have roots in information
retrieval research. Indri (part of Lemur, http://www.lemurproject.org/), like
Xapian, supports incremental indexing for rapid updates. On the other hand,
after reading the details of its query language, it wasn't clear to me how to
use the various operators to specify an effective search; knowledge of the
underlying theory would appear to be necessary, or at least helpful. In
contrast, Xapian's query parser provides familiar boolean and proximity
queries. This isn't to say that Indri's query language is awkward, just that
it demands a different approach to query construction and could benefit from
more tutorial-style documentation.

Unlike Xapian, Indri stores the full text of the document independently of the
index, and also supports the creation of document/passage summaries.

Valli Varanasi

2010-Dec-29 20:43 UTC


[Xapian-discuss] TREC parser and comparison

Emmanuel Eckard <emmanuel.eckard <at> epfl.ch> writes:
>    If this can be of some interest, the code is at the disposal of
> whoever is brave enough to read it.
> 
>    Cheers !
>      -- Emmanuel
> 

Do you still have this code? I was actually looking for code that takes TREC
input and produces TREC-compatible output. If you could point me to it, that
would really speed up my work.

Thanks,
Valli
