Hello,

Are there any advancements in snippeting (besides what is mentioned in the wiki)? I think extracting snippets is clearly an IR task, and I hope Xapian will provide at least some helpers to do that.

I have a set of documents with up to 5M of extracted text each, 1M on average (the source PDFs are even bigger, but I pre-extract the text into a sort of text cache, because pdftotext is very slow). Parsing ~1M documents on the fly for each of the 10 documents to show is probably too CPU- and disk-intensive (10M of disk I/O and parsing just to show a single page of search results does not seem like a task for Perl). But I can live with the sizes. The harder part is correctly locating snippets for a user-entered query.

Maybe Xapian can provide such functionality (locating the best matching snippets in the text for the query string)? I think the absence of snippeting is currently a major weakness that needs to be solved.

Best regards,
Don.
On Thu, Dec 17, 2009 at 11:29:52AM +0300, Do. wrote:
> Is there advancements in snippeting? (Besides what mentioned in the wiki.) I
> think extracting snippets is clearly IR task. And I hope Xapian will provide
> at least helpers to do that.

I agree that it is a feature which would fit well in Xapian, but nobody has yet implemented it. I don't know of anybody currently working on it (and since nobody else has responded to your post, I guess nobody is).

There's a ticket in trac as well as the FAQ entry. The FAQ entry had some rough edges (e.g. the sample thread it linked to wasn't about snippets at all) so I've overhauled it, and linked to the ticket as part of that:

http://trac.xapian.org/wiki/FAQ/Snippets

Cheers,
Olly
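For illustration only, here is a minimal Python sketch of the "locate the best matching snippet in the text for the query" idea raised above. It is not Xapian functionality and not code from the FAQ; the function name `best_snippet` and the `window` parameter are made up for this example. It assumes the query has already been split into plain terms, matches them exactly (no stemming), and simply picks the run of `window` words that contains the most query-term hits.

```python
import re


def best_snippet(text, query_terms, window=30):
    """Return the run of `window` words that contains the most query-term hits.

    Hypothetical helper for illustration; exact (case-insensitive) term
    matching only, no stemming and no phrase handling.
    """
    words = text.split()
    if not words:
        return ""
    terms = {t.lower() for t in query_terms}
    # 1 where the word (with punctuation stripped) is a query term, else 0.
    hits = [1 if re.sub(r"\W+", "", w).lower() in terms else 0 for w in words]
    window = min(window, len(words))

    # Slide the window one word at a time, updating the hit count incrementally.
    score = sum(hits[:window])
    best_score, best_start = score, 0
    for start in range(1, len(words) - window + 1):
        score += hits[start + window - 1] - hits[start - 1]
        if score > best_score:
            best_score, best_start = score, start
    return " ".join(words[best_start:best_start + window])
```

The scan is linear in the length of the text, so the cost for a ~1M document is dominated by reading and splitting it. A real implementation would probably also want to weight rarer terms more heavily and return several non-overlapping windows.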
Hi,

The problem I am facing with Search::Tools is that Xapian treats the query string as terms. E.g. if the query is "universities", Xapian may have a corresponding term such as "universit", and while searching it uses this term rather than "universities". Now if one of the pages has "university" in its text, I want to highlight that as well. But since the actual query is "universities", Search::Tools doesn't highlight this word, even though xapian::search shows this page in the results.

In my overall architecture I can supply this term list (i.e. universit) to the Perl script, but is there any way to make Search::Tools match on the half-word but highlight the full word?

Thanks,
Shripad Bodas
Shripad Bodas wrote on 1/14/10 3:40 PM:
> Hi,
>
> The problem I am facing using Search::Tools is xapian treats query
> string in terms eg. if the query is "universities" xapian may have a
> corresponding term for this query as "universit" and while searching
> it uses this term rather than "universities". Now if one of the page
> has "university" in its text I want to highlight that as well. But
> since the actual query is "universities" Search::Tools doesnt
> highlight this word though xapian::search shows this page in the
> result.
> Now in my overall architecture I can supply this term list (i.e.
> universit) to the perl script but is there any way that Search::Tool
> will match half-word but highlight full-word?
>

Search::Tools::QueryParser->new() takes a stemmer argument, documented here:

http://search.cpan.org/dist/Search-Tools/lib/Search/Tools/QueryParser.pm#stemmer

--
Peter Karman . http://peknet.com/ . peter at peknet.com
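To make the "match the half-word, highlight the full word" idea concrete, here is a rough Python sketch. It does not show how Search::Tools implements its stemmer option; the function name `highlight_stems` and the tag parameters are assumptions made for this example. It takes the stemmed terms (e.g. "universit") and treats each one as a prefix, wrapping any whole word that starts with it.

```python
import re


def highlight_stems(text, stems, open_tag="<b>", close_tag="</b>"):
    """Wrap every whole word that begins with one of `stems` in highlight tags.

    Hypothetical helper for illustration: each stem is treated as a plain
    prefix, which is only an approximation of real stemming.
    """
    if not stems:
        return text
    # Longest stems first so the alternation tries the most specific one first.
    ordered = sorted(stems, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # \b anchors at a word start; \w* extends the match to the end of the word.
    pattern = re.compile(r"\b(?:" + alternation + r")\w*", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)
```

With the stem list `["universit"]`, both "universities" and "University" get wrapped. The prefix approach can over-match (a short stem also matches unrelated words that share the prefix) and misses words whose stem is not a literal prefix, so running the same stemmer over each word of the text and comparing stems is the more faithful option.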
Hello!

I'm new to the list, so please bear with me. This is a follow-on to Shripad Bodas's question about snippets/highlighting.

I too have tried the Perl Search::Tools module, and it works well. The only problem is that it's slow: e.g. when displaying more than (say) 50 results, the time taken to render the page (snipping and highlighting with Search::Tools) can actually exceed the time taken to perform the search itself.

Since our search frontend is coded in PHP, it makes sense to use PHP exclusively (calling a Perl routine works, but you pay a double penalty: the one mentioned above, plus the usual costs associated with calling Perl).

Is anyone aware of PHP code I can use to create excerpts/snippets and keyword highlighting (with stemming support, of course)? Any pointers would be appreciated.

Thanks!
Graham