thr3ads.net - Xapian discuss - help improving relevance of snippets displayed by Omega [Sep 2020]


Michael Decerbo

2020-Sep-19 00:33 UTC

help improving relevance of snippets displayed by Omega

Thanks Olly!

But expanding the sample seems like the wrong solution. Is there a way to
instead pass a hit or hits from the document to snippet generation?

Michael

Olly Betts

2020-Sep-19 05:31 UTC


On Fri, Sep 18, 2020 at 08:33:49PM -0400, Michael Decerbo wrote:
> But expanding the sample seems like the wrong solution. Is there a way to
> instead pass a hit or hits from the document to snippet generation?

I'm not sure what you have in mind, but the only way I can see that
working is if it read all the positional data for all the terms in
the document, and then sorted it to essentially reconstruct the
document text.  However (a) that gives you the text without
capitalisation and without punctuation which doesn't look very good
and (b) it tends to be rather slow because the positional data is
primarily ordered by document for efficient searching, so there's
poor locality of reference for this use (and large documents would
make that worse).

The "xapian-pos" debug tool effectively does this text reconstruction
to help visualise the positional data, so you can see what the
reconstructed text would look like using that - e.g.:

Gap of 1 unused positions
1       Sbath
2       Ssomerset
3       bath
4       somerset
5       coordinates
6       51
7       23
8       n
9       2
10      22
11      w
12      51.38
13      n
14      2.36
15      w
16      51.38
17      2.36
18      bath
19      ?b???
20      or
21      ?b??
22      latin
23      aquae
24      sulis
25      welsh
26      caerfaddon
27      is
28      a
29      city
...
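
[Editorial sketch] The reconstruction described above can be illustrated
without Xapian at all. The following is a minimal sketch, assuming the
positional data for one document has already been read into a plain
dictionary; the name reconstruct_text and the sample postings are
hypothetical and are not part of the Xapian API or of xapian-pos.

```python
from typing import Dict, List


def reconstruct_text(positions_by_term: Dict[str, List[int]]) -> str:
    """Rebuild a rough word sequence from a term -> positions mapping."""
    # Flatten to (position, term) pairs, then sort by position so the
    # terms come out in document order.
    pairs = sorted(
        (pos, term)
        for term, positions in positions_by_term.items()
        for pos in positions
    )
    return " ".join(term for _, term in pairs)


# Hypothetical positional data, loosely mirroring the listing above.
postings = {
    "bath": [3, 18],
    "somerset": [4],
    "coordinates": [5],
    "is": [27],
    "a": [28],
    "city": [29],
}

print(reconstruct_text(postings))
# prints: bath somerset coordinates bath is a city
```

The result shows the drawback mentioned above: the terms come back
lowercased and without punctuation, so the text reads poorly as a snippet.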

I've tried this approach on a project, but it didn't work out.  Storing
a larger sample is definitely what I'd recommend (or if you have the
text stored in another system, you could pass that to the
MSet::snippet() method, but there isn't a way to do that with omega
unless you modify the code).
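
[Editorial sketch] To make the idea of generating a snippet at query time
from text held elsewhere more concrete, here is a rough stand-in written in
plain Python. It is not Xapian's algorithm and does not call
MSet::snippet(); the function simple_snippet and its parameters are
hypothetical. It assumes exact, case-insensitive word matches (no stemming)
and simply centres a fixed-width window on the first match.

```python
import re
from typing import Iterable


def simple_snippet(text: str, query_terms: Iterable[str], width: int = 60,
                   hi_start: str = "<b>", hi_end: str = "</b>") -> str:
    """Return a window of `text` around the first query-term match."""
    terms = [re.escape(t) for t in query_terms if t]
    if not terms:
        return text[:width]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        # No match: fall back to the start of the text.
        return text[:width]

    # Centre the window on the first match.
    centre = (first.start() + first.end()) // 2
    start = max(0, centre - width // 2)
    end = min(len(text), start + width)

    # Highlight only matches that lie entirely inside the window.
    pieces = []
    cursor = start
    for m in pattern.finditer(text):
        if m.start() < start or m.end() > end:
            continue
        pieces.append(text[cursor:m.start()])
        pieces.append(hi_start + m.group(0) + hi_end)
        cursor = m.end()
    pieces.append(text[cursor:end])

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix


full_text = "x " * 100 + "The Roman baths are in Bath today." + " y" * 100
print(simple_snippet(full_text, ["bath"], width=40))
```

Because the window is chosen after the query is known, the excerpt contains
the query term whenever the stored text does, which a fixed sample chosen
at indexing time cannot guarantee.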

Cheers,
    Olly

Michael Decerbo

2020-Sep-20 02:56 UTC


Olly,

Thanks again very much for helping me improve my understanding of Xapian
and Omega. Thanks especially for pointing out that my idea of trying to
generate a snippet from stemmed text lacking capitalization and punctuation
would probably not produce a user-friendly result.

But I'm still doubtful that expanding the sample size could be the right
way to obtain excerpts from the document that are relevant to the query.
Suppose that the sample size were even as big as 10% of the average
document size, queries contained only a single term, and a typical query
term appeared on average only once per document. In that case, it seems to
me that nine out of ten samples would not contain the single query term, so
that nine times out of ten the snippet generated from the sample would not
contain the query term. Is my thinking accurate about this, or am I again
missing something?
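
[Editorial sketch] The arithmetic in the paragraph above can be checked with
a few lines of Python. This is a minimal sketch under the same simplifying
assumptions: the sample covers a fixed fraction of the document, and each
occurrence of the query term falls at an independent, uniformly random
place in the document. The function name hit_probability is hypothetical.

```python
def hit_probability(sample_fraction: float, occurrences: int = 1) -> float:
    """Chance that at least one occurrence of a term lands in the sample."""
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    if occurrences < 0:
        raise ValueError("occurrences must be non-negative")
    # Each occurrence misses the sample with probability (1 - fraction);
    # the term is absent from the sample only if every occurrence misses.
    return 1.0 - (1.0 - sample_fraction) ** occurrences


print(f"{hit_probability(0.10, 1):.0%}")  # prints: 10%
print(f"{hit_probability(0.10, 5):.0%}")  # prints: 41%
```

With a 10% sample and a single occurrence, the snippet source contains the
term about one time in ten, which matches the estimate above. Under these
assumptions, even five occurrences leave the term out of the sample more
often than not.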

In general, I'm wondering how best to use Xapian so that, at query time, my
application can display an excerpt that is relevant to the query, rather than a
sample chosen at indexing time, without regard to the query, that may or may
not contain the query term(s). For example, TheyWorkForYou.com is listed on
xapian.org as a site using Xapian, and when I enter a single-term query on
that site the document excerpts provided as part of the search results
invariably include highlighted words, possibly stemmed, responsive to the
query. That's the effect I would like to achieve.

If you can think of any sample code that I should refer to, or even if you
could just suggest the broad outlines of a solution, I would be very
grateful.

Thanks again!


Michael
