ostmann at websuche.de
2011-Sep-23 15:36 UTC
[Xapian-discuss] understanding stemming and synonyms
I am working with version 1.2.7 and want to use stemming and synonyms. I use the Perl bindings and have run into some problems.

First of all: the Perl bindings don't allow a third argument to the QueryParser's parse_query! So I cannot set a default prefix (which is perhaps the solution to my problem, but more on that later).

I have a simple test case: 3 documents, each containing only one word:

bike
fahrrad (German for "bike", singular)
fahrraeder (German for "bike", plural, umlaut replaced)

I have built the database with one synonym: Zfahrrad = Zbik

When I inserted the documents, I printed the term list:

INSERT DOKUMENT: bike
DOCUMENT: Document(Xapian::Document::Internal(data=`bike', terms[2]))
TERM: Zbik
TERM: bike

INSERT DOKUMENT: fahrrad
DOCUMENT: Document(Xapian::Document::Internal(data=`fahrrad', terms[2]))
TERM: Zfahrrad
TERM: fahrrad

INSERT DOKUMENT: fahrraeder
DOCUMENT: Document(Xapian::Document::Internal(data=`fahrraeder', terms[2]))
TERM: Zfahrrad
TERM: fahrraeder

That looks fine, but when I now use the query_parser with a stemmer (german2 & STEM_ALL) and parse_query (FLAG_AUTO_SYNONYMS), I get these queries:

ENTER QUERY: bike
[QUERY: Xapian::Query(bik:(pos=1))]
[RESULTS: 0]

ENTER QUERY: fahrrad
[QUERY: Xapian::Query((fahrrad:(pos=1) SYNONYM Zbik:(pos=1)))]
[RESULTS: 2]

ENTER QUERY: fahrraeder
[QUERY: Xapian::Query((fahrrad:(pos=1) SYNONYM Zbik:(pos=1)))]
[RESULTS: 2]

I think there is a Z missing before the first term: it is searching for the stemmed form of bike (bik/Zbik), but it doesn't prefix that query term. No search ever finds bike and fahrraeder ...

After fighting with this, I want to implement spelling correction too, but my first tests with automatic spelling correction (feeding the spelling data while indexing) were really bad. Perhaps it is better to add only a complete dictionary to the database and not use the index itself?
On 23 Sep 2011, at 16:36, ostmann at websuche.de wrote:

> That looks fine, but when I now use the query_parser with a stemmer (german2 & STEM_ALL) and parse_query (FLAG_AUTO_SYNONYMS), I get these queries.

Try STEM_SOME.

I've poked around a little, and I think we're lacking a clear introduction to the QueryParser, since IIRC this question comes up semi-frequently. I've added a note to MissingDocument; if I'm in error and there is something, feel free to delete it.

J

--
James Aylett
talktorex.co.uk - xapian.org - devfort.com
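For readers following the thread, the mismatch can be sketched with a toy model in plain Python. This is not Xapian and does not call its API. It assumes only what the thread shows plus the documented difference between the two strategies: STEM_ALL stems query terms without adding a "Z" prefix, while STEM_SOME prefixes stemmed terms with "Z". The stem table and the synonym are copied from the term lists in the original message; the names STEMS, SYNONYMS, INDEX, index_terms, query_terms and search are hypothetical helpers invented for this illustration.

```python
# Toy model only: it mimics the term strings seen in the thread, not Xapian itself.

# Stems as they appear in the posted term lists (german2 stemmer output).
STEMS = {"bike": "bik", "fahrrad": "fahrrad", "fahrraeder": "fahrrad"}

# The single synonym from the message, keyed on the Z-prefixed stem.
SYNONYMS = {"Zfahrrad": ["Zbik"]}


def index_terms(word):
    """Terms stored for a one-word document: the raw word plus its Z-prefixed stem."""
    return {word, "Z" + STEMS[word]}


# One document per word, identified by the word itself.
INDEX = {word: index_terms(word) for word in STEMS}


def query_terms(word, strategy):
    """Terms the model generates for a lowercase one-word query."""
    stem = STEMS[word]
    if strategy == "STEM_ALL":
        term = stem            # stemmed, but no Z prefix
    elif strategy == "STEM_SOME":
        term = "Z" + stem      # stemmed and Z-prefixed
    else:
        raise ValueError("unknown strategy: " + strategy)
    # Synonym lookup on the Z-prefixed stem, as the posted queries suggest.
    return {term} | set(SYNONYMS.get("Z" + stem, []))


def search(word, strategy):
    """Documents sharing at least one term with the query (an OR-like match)."""
    wanted = query_terms(word, strategy)
    return sorted(doc for doc, terms in INDEX.items() if wanted & terms)


for strategy in ("STEM_ALL", "STEM_SOME"):
    for word in ("bike", "fahrrad", "fahrraeder"):
        print(strategy, word, sorted(query_terms(word, strategy)), search(word, strategy))
```

Under STEM_ALL the model generates the same terms and result counts as in the original message (0, 2 and 2), because the unprefixed stem "bik" matches neither indexed term, "bike" or "Zbik". Under STEM_SOME the query terms line up with the Z-prefixed terms written at index time. Whether a real database behaves exactly the same way should be checked against the actual QueryParser output.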
[Back on list]

On 26 Sep 2011, at 09:26, Websuche :: Felix Antonius Wilhelm Ostmann wrote:

>>> That looks fine, but when I now use the query_parser with a stemmer (german2 & STEM_ALL) and parse_query (FLAG_AUTO_SYNONYMS), I get these queries.
>>
>> Try STEM_SOME.
>>
>> I've poked around a little, and I think we're lacking a clear introduction to the QueryParser, since IIRC this question comes up semi-frequently. I've added a note to MissingDocument; if I'm in error and there is something, feel free to delete it.
>
> http://xapian.org/docs/sourcedoc/html/classXapian_1_1QueryParser.html#389713b3969cac6cd98da5fb970f2f8e
>
> And it is well documented ... my bad! I think I was misled by a
> bad howto website for Xapian :-/

It's documented, but I think my concerns stand. (You have to think to realise it's generally the right choice, and I think that, from the point of view of getting started, thinking is a bad requirement :-)

There are unfortunately a bunch of howtos for Xapian floating round the internet that are now out of date :-(

J

--
James Aylett
talktorex.co.uk - xapian.org - devfort.com