Wojewsky, Sascha, Heinze
2007-Sep-12 11:00 UTC
[Xapian-discuss] php and get_termpos (stemming problem?)
Hi,

I haven't found a solution on the mailing list. I have a problem with the "get_termpos" function in PHP. I used this code:

    ...
    $i = $mset->begin();
    while (!$i->equals($mset->end())) {
        $terms = $enquire->get_matching_terms_begin($i->get_docid());
        $pos = $database->positionlist_begin($i->get_docid(), $terms->get_term());
        if (!$pos->equals($database->positionlist_end($i->get_docid(), $terms->get_term()))) {
            $pos = $pos->get_termpos();
        }
    }
    ...

But the if condition was always false.

The term (searchstring) from "$terms->get_term()" starts with the stemming "Z". If I call $database->positionlist_end($i->get_docid(), 'searchstring') without the leading "Z", I get a result.

I'm using Xapian 1.02. Do you have any solution?

Thanks.
Sascha Wojewsky

Managing directors: Dirk Schoning | Sven Hohmann
Commercial register: Amtsgericht Luneburg | HRB 100051
Construction products and information online at HeinzeBauOffice

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
On Wed, Sep 12, 2007 at 12:01:23PM +0200, Wojewsky, Sascha, Heinze wrote:
> if (!$pos->equals($database->positionlist_end($i->get_docid(),
> $terms->get_term()))) {
[...]
> But the if condition was always false.
>
> The term (searchstring) from "$terms->get_term()" starts with the
> stemming-"Z".

See: http://www.xapian.org/docs/termgenerator.html#stemming

In particular:

    Now we index all terms lowercased with positional information, and
    also stemmed with a 'Z' prefix (unless they start with a digit), but
    without positional information.

So your "if" condition is always false because there's no positional information stored for 'Z'-prefixed terms. This is done because it saves a lot of disk space, but we can still provide phrase searching etc (by using the unstemmed forms).

> If I've called $database->positionlist_end($i->get_docid(),
> 'searchstring') without the leading "Z", I've got a result.

You will, provided that the stemmed form is also an unstemmed word in the document.

> Any unauthorized copying, disclosure or distribution of the material
> in this e-mail is strictly forbidden.

Please don't post to mailing lists with such disclaimers. Email sent to this (and most other) mailing lists will be copied, disclosed, and distributed very widely - that's the very purpose of a mailing list. If you don't want that, don't send mail to it.

Cheers,
Olly
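The point about unstemmed forms can be sketched in code. Since positions are stored only for the unstemmed terms, one way to get from a 'Z'-prefixed matching term to something with a position list is to find the unstemmed terms of the document whose stem equals the matching term without its 'Z'. The helper name unstemmedFormsFor is hypothetical, not part of Xapian, and the stemmer below is a toy stand-in. In real use the term list would presumably come from the document's term list and the stemmer would be the one used at index time; both are assumptions here.

```php
<?php
/**
 * Hypothetical helper (not part of Xapian): given the terms indexed for one
 * document, return the unstemmed terms whose stem equals a 'Z'-prefixed
 * matching term. Only those unstemmed terms carry positional information.
 *
 * @param string[] $docTerms  all terms indexed for the document
 * @param string   $matchTerm term reported for the match, e.g. "Zsearch"
 * @param callable $stem      maps a word to its stem (assumed to behave like
 *                            the stemmer used at index time)
 * @return string[]
 */
function unstemmedFormsFor(array $docTerms, string $matchTerm, callable $stem): array
{
    // A term without the 'Z' prefix is already an unstemmed form.
    if ($matchTerm === '' || $matchTerm[0] !== 'Z') {
        return in_array($matchTerm, $docTerms, true) ? [$matchTerm] : [];
    }

    $wanted = substr($matchTerm, 1); // the stem without its 'Z' prefix
    $forms = [];
    foreach ($docTerms as $term) {
        // Assumption: unstemmed words are indexed lowercased, so a term that
        // starts with a capital letter is treated as prefixed and skipped.
        if ($term === '' || preg_match('/^[A-Z]/', $term)) {
            continue;
        }
        if ($stem($term) === $wanted) {
            $forms[] = $term;
        }
    }
    return $forms;
}

// Toy stemmer for demonstration only: strips a trailing "ing" or "s".
$toyStem = function (string $word): string {
    return preg_replace('/(ing|s)$/', '', $word);
};

$docTerms = ['Zengine', 'Zsearch', 'engines', 'search', 'searching'];
print_r(unstemmedFormsFor($docTerms, 'Zsearch', $toyStem));
```

Each returned form could then be passed to $database->positionlist_begin($i->get_docid(), $form) in place of the 'Z'-prefixed term, as in the code from the original question.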
Wojewsky, Sascha, Heinze
2007-Sep-12 13:17 UTC
[Xapian-discuss] php and get_termpos (stemming problem?)
> So your "if" condition is always false because there's no positional
> information stored for 'Z'-prefixed terms. This is done because it
> saves a lot of disk space, but we can still provide phrase searching
> etc (by using the unstemmed forms).

> > If I've called $database->positionlist_end($i->get_docid(),
> > 'searchstring') without the leading "Z", I've got a result.

> You will, provided that the stemmed form is also an unstemmed word in
> the document.

Do you have any solution for getting the position of the searched term in the document, or for getting a snippet of the document around the term?

> > Any unauthorized copying, disclosure or distribution of the material
> > in this e-mail is strictly forbidden.

> Please don't post to mailing lists with such disclaimers. Email sent
> to this (and most other) mailing lists will be copied, disclosed, and
> distributed very widely - that's the very purpose of a mailing list.
> If you don't want that, don't send mail to it.

I cannot send emails without this footer, because our email service generates it...

Thanks
Sascha
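For the snippet part of the question, a minimal sketch in plain PHP, assuming the document text is available to the script and the position of the word is already known. The function snippetAround is hypothetical and not part of Xapian. It takes a 0-based word index over whitespace-separated words; mapping a Xapian term position onto such an index depends on how the text was tokenised at index time and is not shown here.

```php
<?php
/**
 * Hypothetical helper: return the words around a given word index,
 * with "..." markers where the text was cut.
 *
 * @param string $text      full document text
 * @param int    $wordIndex 0-based index of the word of interest
 * @param int    $radius    number of words to keep on each side
 */
function snippetAround(string $text, int $wordIndex, int $radius = 3): string
{
    // Split on runs of whitespace; this is an assumption about tokenisation.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);
    if ($wordIndex < 0 || $wordIndex >= $count) {
        return '';
    }

    // Clamp the window to the bounds of the text.
    $start = max(0, $wordIndex - $radius);
    $length = min($count, $wordIndex + $radius + 1) - $start;
    $snippet = implode(' ', array_slice($words, $start, $length));

    // Mark truncation at either end.
    $prefix = $start > 0 ? '... ' : '';
    $suffix = ($start + $length) < $count ? ' ...' : '';
    return $prefix . $snippet . $suffix;
}

$text = 'Xapian is an open source search engine library released under the GPL';
echo snippetAround($text, 5, 2), "\n";
```

The window is clamped rather than padded, so a term near the start or end of the document simply gets a shorter snippet.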