Michael Decerbo
2020-Sep-18 20:51 UTC
help improving relevance of snippets displayed by Omega
Hi,

Thanks for creating Xapian and Omega. I have been amazed by how easy they make it to get a basic full-text search engine up and running.

I'm wondering if you can help me better understand one aspect of the results I am getting from the default query template. Usually the snippet that's displayed in response to a query doesn't contain the word that the user searched for, even when that word appears verbatim in the document. I should note that my documents are all fairly long.

I guess that this is because the template shows a snippet generated from the returned document's "sample" field:

    <small>$snippet{$field{sample}}</small><br>

and perhaps the text in that "sample" field is fairly short and generated when the document is indexed-- is that correct?

If so, is there a way to generate a snippet that is more responsive to the query terms entered by the user than what the default query template provides? Even something as crude as passing stemmed substrings to my application so that it can do a simple linear search for them in each displayed document?

Or should I be indexing separate sections of my long documents, so that the "sample" is more likely to be relevant? If so, how should I capture the parent-child relationship between the individual sections and the original document that they came from?

Or am I totally misunderstanding things? It wouldn't be the first time...

Thanks very much for any insight!

Michael
On Fri, Sep 18, 2020 at 04:51:44PM -0400, Michael Decerbo wrote:
> I guess that this is because the template shows a snippet generated from
> the returned document's "sample" field:
>
> <small>$snippet{$field{sample}}</small><br>
>
> and perhaps the text in that "sample" field is fairly short and generated
> when the document is indexed-- is that correct?

Yes. You can specify how many bytes to store with omindex's --sample-size=SIZE option, with the default being 512. It'll actually stop at the previous word boundary before the limit.

> If so, is there a way to generate a snippet that is more responsive to the
> query terms entered by the user than what the default query template
> provides? Even something as crude as passing stemmed substrings to my
> application so that it can do a simple linear search for them in each
> displayed document?

You could use $terms to get the query terms matching the current hit and do that, though $snippet also supports phrase queries and wildcards which would be hard to replicate from just the individual terms.

Cheers,
Olly
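
The crude linear search mentioned in the question could look roughly like the following. This is only a minimal Python sketch and not part of Omega: the function name crude_snippet, the width parameter and the "word starts with stem" matching rule are all assumptions made for illustration. It also assumes the application has already obtained plain lowercase stems for the hit (for example by post-processing whatever $terms returns), and, as noted in the reply, it makes no attempt to handle phrase queries or wildcards.

```python
import re


def crude_snippet(text, stems, width=200):
    """Return about `width` characters of `text` around the first word
    that starts with one of `stems` (hypothetical helper, not an Omega API).
    """
    # Normalise the stems once; ignore empty strings.
    stems = [s.lower() for s in stems if s]
    # Linear scan over the words of the document, in order.
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        if any(word.startswith(s) for s in stems):
            # Centre the window roughly on the start of the matching word.
            start = max(0, match.start() - width // 2)
            end = min(len(text), start + width)
            return text[start:end].strip()
    # No stem found: fall back to the beginning of the document.
    return text[:width].strip()
```

Calling crude_snippet(full_text, ["search"]) would return a window around the first word such as "searching" or "searched". Prefix matching is a rough stand-in for real stemming and can both miss and over-match words, so it is probably only worth using when the stored sample cannot be made long enough.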