Wojewsky, Sascha, Heinze
2007-Sep-12 11:00 UTC
[Xapian-discuss] php and get_termpos (stemming problem?)
Hi,
I haven't found a solution on the mailing list...
I have a problem with the get_termpos function in PHP.
I've used this code:
...
$i = $mset->begin();
while (!$i->equals($mset->end())) {
    $docid = $i->get_docid();
    // Only the first matching term of each document is looked at here.
    $terms = $enquire->get_matching_terms_begin($docid);
    $term = $terms->get_term();
    $pos = $database->positionlist_begin($docid, $term);
    if (!$pos->equals($database->positionlist_end($docid, $term))) {
        $termpos = $pos->get_termpos();
    }
    $i->next(); // advance to the next match, otherwise the loop never ends
}
...
But the if condition is always false.
The term (the search string) returned by $terms->get_term() starts with the
stemming prefix "Z".
If I call $database->positionlist_end($i->get_docid(),
'searchstring') without the leading "Z", I get a result.
I'm using Xapian 1.0.2.
Is there any solution?
Thanks.
Sascha Wojewsky
Managing directors: Dirk Schoning | Sven Hohmann
Commercial register: Amtsgericht Lüneburg (local court) | HRB 100051
Building products and information
online in HeinzeBauOffice
This e-mail may contain confidential and/or privileged information. If you are
not the intended recipient or have received this e-mail in error please notify
the sender immediately and destroy this e-mail. Any unauthorized copying,
disclosure or distribution of the material in this e-mail is strictly forbidden.
On Wed, Sep 12, 2007 at 12:01:23PM +0200, Wojewsky, Sascha, Heinze wrote:
> if (!$pos->equals($database->positionlist_end($i->get_docid(),
> $terms->get_term()))) {
[...]
> But the if condition was always false.
>
> The term (searchstring) from "$terms->get_term()" starts with the
> stemming-"Z".

See: http://www.xapian.org/docs/termgenerator.html#stemming

In particular:

    Now we index all terms lowercased with positional information, and
    also stemmed with a 'Z' prefix (unless they start with a digit), but
    without positional information.

So your "if" condition is always false because there's no positional
information stored for 'Z'-prefixed terms. This is done because it
saves a lot of disk space, but we can still provide phrase searching
etc. (by using the unstemmed forms).

> If I've called $database->positionlist_end($i->get_docid(),
> 'seachstring') without the leading "Z", I've got a result.

You will, provided that the stemmed form is also an unstemmed word in
the document.

> Any unauthorized copying, disclosure or distribution of the material
> in this e-mail is strictly forbidden.

Please don't post to mailing lists with such disclaimers. Email sent
to this (and most other) mailing lists will be copied, disclosed, and
distributed very widely - that's the very purpose of a mailing list.
If you don't want that, don't send mail to it.

Cheers,
    Olly
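Since positions are stored only for the lowercased, unstemmed terms, one possible workaround is to scan a document's unprefixed terms, stem each one, and collect the positions of those whose stem equals the 'Z'-prefixed matching term (minus its "Z"). The sketch below is plain PHP so it runs without Xapian. It assumes the document's terms and their positions have already been read into an array (in real code presumably via $database->termlist_begin($docid) and $database->positionlist_begin($docid, $term)), and that the stemmer used at index time is passed in as a callable. The helper positions_for_stemmed_term() and the toy stemmer are hypothetical illustrations, not part of Xapian.

```php
<?php
// Sketch only: positions_for_stemmed_term() is a hypothetical helper, not
// part of the Xapian API. $termPos is assumed to map one document's terms
// to their positions, already read from the database.
function positions_for_stemmed_term($zterm, array $termPos, callable $stem)
{
    $zterm = (string) $zterm;

    // A term without the 'Z' prefix was indexed with its own positions.
    if ($zterm === '' || $zterm[0] !== 'Z') {
        return isset($termPos[$zterm]) ? array($zterm => $termPos[$zterm]) : array();
    }

    $target = substr($zterm, 1); // the stem, without the 'Z' prefix
    $result = array();
    foreach ($termPos as $term => $positions) {
        $term = (string) $term; // numeric terms become integer array keys
        // Skip prefixed terms (prefixes are upper case), including the 'Z'
        // ones, which carry no positional information anyway.
        if ($term === '' || ($term[0] >= 'A' && $term[0] <= 'Z')) {
            continue;
        }
        // Keep every unstemmed word whose stem matches the query stem.
        if ($stem($term) === $target) {
            $result[$term] = $positions;
        }
    }
    return $result;
}

// Usage with a toy stemmer (assumption: a real application would use the
// same Xapian stemmer and language it indexed with).
$toyStem = function ($word) {
    return preg_replace('/(ing|es|s)$/', '', $word);
};
$termPos = array(
    'searching' => array(3),
    'search'    => array(7, 12),
    'searches'  => array(20),
    'Zsearch'   => array(),
    'other'     => array(1),
);
$hits = positions_for_stemmed_term('Zsearch', $termPos, $toyStem);
echo implode(',', array_keys($hits)), "\n"; // searching,search,searches
```

This matches Olly's caveat from the other direction: instead of hoping the stem itself occurs as a word, it finds every word that stems to it, at the cost of stemming each term in the document.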
Wojewsky, Sascha, Heinze
2007-Sep-12 13:17 UTC
[Xapian-discuss] php and get_termpos (stemming problem?)
> So your "if" condition is always false because there's no positional
> information stored for 'Z'-prefixed terms. This is done because it
> saves a lot of disk space, but we can still provide phrase searching
> etc (by using the unstemmed forms).

> > If I've called $database->positionlist_end($i->get_docid(),
> > 'seachstring') without the leading "Z", I've got a result.

> You will, provided that the stemmed form is also an unstemmed word in
> the document.

Is there any way to get the position of the searched term in the
document, or to get a snippet of the document around the term?

> > Any unauthorized copying, disclosure or distribution of the material
> > in this e-mail is strictly forbidden.

> Please don't post to mailing lists with such disclaimers. Email sent
> to this (and most other) mailing lists will be copied, disclosed, and
> distributed very widely - that's the very purpose of a mailing list.
> If you don't want that, don't send mail to it.

I can't send email without this footer, because our email service
adds it automatically...

Thanks,
Sascha
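For the snippet question, one possible approach (not taken from the thread) is to invert a document's position lists into a position-to-word map and read off the words in a window around the hit. The sketch assumes the document's unstemmed, positional terms are already in a term-to-positions array; snippet_around() is a hypothetical helper. Because it rebuilds text from the index, words come back lowercased as indexed and punctuation is lost; if several terms share a position, only the last one seen is kept. If the original text is stored with the document, building the snippet from that text instead may give nicer results.

```php
<?php
// Sketch only: snippet_around() is a hypothetical helper. It rebuilds the
// words near position $pos from one document's positional terms
// ($termPos: term => list of positions, assumed already read from the
// database). Words come back lowercased as indexed; punctuation is lost.
function snippet_around($pos, array $termPos, $radius = 3)
{
    // Invert term => positions into position => term.
    $byPos = array();
    foreach ($termPos as $term => $positions) {
        $term = (string) $term; // numeric terms become integer array keys
        if ($term === '' || ($term[0] >= 'A' && $term[0] <= 'Z')) {
            continue; // skip prefixed terms
        }
        foreach ($positions as $p) {
            $byPos[$p] = $term;
        }
    }

    // Collect whatever words fall inside [pos - radius, pos + radius].
    $words = array();
    for ($p = $pos - $radius; $p <= $pos + $radius; $p++) {
        if (isset($byPos[$p])) {
            $words[] = $byPos[$p];
        }
    }
    return implode(' ', $words);
}

// Usage: position 4 holds "fox"; take two words either side.
$termPos = array(
    'the'   => array(1, 5),
    'quick' => array(2),
    'brown' => array(3),
    'fox'   => array(4),
    'lazy'  => array(6),
    'dog'   => array(7),
);
echo snippet_around(4, $termPos, 2), "\n"; // quick brown fox the lazy
```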