sorabji at sorabji.com
2017-Apr-18 16:00 UTC
About search result excerpts with HTML tags showing
Hi, folks. New to Xapian. I just built a couple of indexes. Search results seem good but I can't figure out why the excerpts are showing HTML tags. These tags are not present in the original HTML documents. Is there a built-in way to either get rid of these tags or have them render as actual HTML tags?

A couple of examples here, you'll see the STRONG tags wrapped around the search terms:

http://etudemagazine.com/cgi-bin/omega?P=beethoven&DEFAULTOP=and&DB=etudemagazine&FMT=query&xP=ZBchaminad%09ZFchaminad%09ZSchaminad%09Zchaminad&xDB=etudemagazine&xFILTERS=.~~
Shortened URL: http://bit.ly/2oRKJ8u

http://sorabji.com/cgi-bin/omega?P=sorabji&DEFAULTOP=and&DB=sorabji_bbs&FMT=query&xP=ZBnurseri%09ZFnurseri%09ZSnurseri%09Znurseri&xDB=sorabji_bbs&xFILTERS=.~~
Shortened URL: http://bit.ly/2ok5Qxj

Thanks!

-mt

--
http://sorabji.com/
http://www.payphone-project.com/
tel: (212) 203-2970
On Tue, Apr 18, 2017 at 12:00:21PM -0400, sorabji at sorabji.com wrote:
> Hi, folks. New to Xapian. I just built a couple of indexes. Search results
> seem good but I can't figure out why the excerpts are showing HTML tags.
> These tags are not present in the original HTML documents. Is there a
> built-in way to either get rid of these tags or have them render as actual
> HTML tags?

There's a bug in the version of the query template:

$highlight{$snippet{$field{sample}},$terms}

$highlight{TEXT,TERMS} escapes for HTML and highlights TERMS.

$snippet{TEXT} selects a dynamic snippet, escapes for HTML and highlights query terms in the text.

So we really don't want to do both - replace this with either:

$snippet{$field{sample}}

or:

$highlight{$field{sample},$terms}

(The reason it's like that is the original snippet generation didn't do HTML escaping or highlighting, but that means we have to parse the text twice so was changed during the development series.)

Using $snippet{$field{sample}} is probably the better choice (and what the default template ought to use I think) - if the stored sample is small then the snippet generation will short-cut, and if you're storing larger samples then you want to select a smaller snippet from them.

Thanks for reporting this - I'll get a fix in before 1.4.4.

Cheers,
Olly
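
To illustrate why nesting the two commands makes the tags show up as text, here is a minimal Python sketch of the double escaping. It is a toy model, not Omega's implementation: the function names `snippet` and `highlight`, the helper `escape_and_mark`, and the `<b>` tag used for the outer highlight are all assumptions made for illustration, and no snippet selection is done.

```python
import html
import re


def escape_and_mark(text, terms, open_tag, close_tag):
    """HTML-escape text and wrap each occurrence of a term in the given tags."""
    if not terms:
        return html.escape(text)
    pattern = re.compile(
        "(" + "|".join(re.escape(t) for t in terms) + ")", re.IGNORECASE
    )
    # With one capture group, re.split puts the matches at odd indexes.
    pieces = pattern.split(text)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2:
            out.append(open_tag + html.escape(piece) + close_tag)
        else:
            out.append(html.escape(piece))
    return "".join(out)


def snippet(text, terms):
    # Hypothetical stand-in for $snippet{TEXT}: escape, then mark with <strong>.
    return escape_and_mark(text, terms, "<strong>", "</strong>")


def highlight(text, terms):
    # Hypothetical stand-in for $highlight{TEXT,TERMS}: escape, then mark.
    # The <b> tag is an assumption chosen to tell the two passes apart.
    return escape_and_mark(text, terms, "<b>", "</b>")


sample = "Beethoven & Chaminade"
terms = ["beethoven"]

once = snippet(sample, terms)
twice = highlight(once, terms)

print(once)   # markup a browser renders as bold text
print(twice)  # the inner <strong> tags are now escaped and show up literally
```

In this model the second pass turns the `<strong>` produced by the first pass into `&lt;strong&gt;`, which a browser displays as literal text. That matches the symptom reported above, and using only one of the two commands avoids it.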