Graham Kann
2010-Jan-16 10:04 UTC
[Xapian-discuss] PHP XapianTermIterator/XapianPositionIterator usage
Hello again,

Thanks to Peter for the previous response.

I've been digging around trying to find sample usage of XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a test case in PHP to perform snippet extraction (with a possible view to coding a PECL extension in C). I found a C++ sample, but that wasn't much help.

I must be dense this morning though, since I can't get my head around how to use XapianTermIterator to step through the terms, and how to iterate through all the positions of each term with XapianPositionIterator.

Can someone provide a basic PHP example of how to:

//pseudo code
$position_iterator = new XapianPositionIterator();
$term_iterator = new XapianTermIterator();

foreach $term ($position_iterator)
    foreach $position ($term_iterator($term))
        ...

Any assistance would be appreciated.

Thanks
Menard, Daniel
2010-Jan-18 10:53 UTC
[Xapian-discuss] RE : PHP XapianTermIterator/XapianPositionIterator usage
> I've been digging around trying to find sample usage of
> XapianTermIterator/XapianPositionIterator in PHP.
> [...]
> Can someone provide a basic PHP example of how to:
>
> //pseudo code
> $position_iterator = new XapianPositionIterator();
> $term_iterator = new XapianTermIterator();
>
> foreach $term ($position_iterator)
> foreach $position ($term_iterator($term))
> ...

Hello,

Currently, Xapian does not support native PHP iterators, but using those supplied by Xapian is easy. Documentation is here: http://xapian.org/docs/bindings/php/

But I think you're missing a level: you can iterate over all terms in the database (termlist), you can get all document IDs for a particular term (postings), and then get all positions of this term in each of those documents, but AFAIK you can't directly get all positions for a term.

Perhaps the following PHP code, which works for me, will be useful:

<?php
require_once '/path/to/your/xapian.php'; // adjust the path

$database = new XapianDatabase('/path/to/your/database'); // adjust the path
dumpTerms($database, '10:test', 100);

function dumpTerms(XapianDatabase $database, $start = false, $max = 10)
{
    echo "<pre>\n"; // just in case this script is run from the web

    // Get "all terms" iterators from the database
    $terms    = $database->allterms_begin(); // XapianTermIterator
    $endTerms = $database->allterms_end();   // XapianTermIterator

    // Skip iterator to $start or the first term after
    if (false !== $start) $terms->skip_to($start);

    // First loop: dump terms
    $nb = 0;
    while (! $terms->equals($endTerms)) {
        // No more than $max terms
        if ($nb >= $max) break;

        // Get some info about the current term
        $term = $terms->get_term();
        printf(
            "term=%s, freq=%d\n",
            $term,
            $terms->get_termfreq() // # of docs containing this term
        );

        // Second loop: dump IDs of documents containing this term
        $docs    = $database->postlist_begin($term); // PostingIterator
        $endDocs = $database->postlist_end($term);   // PostingIterator
        while (! $docs->equals($endDocs)) {
            printf(
                '- doc ID=%d, doc length=%d, wdf=%d, positions=',
                $docs->get_docid(),
                $docs->get_doclength(), // total number of terms in this doc
                $docs->get_wdf()        // # of occurrences of this term in this doc
            );

            // Third loop: dump positions for this particular (term + document)
            $positions    = $docs->positionlist_begin(); // PositionIterator
            $endPositions = $docs->positionlist_end();   // PositionIterator
            while (! $positions->equals($endPositions)) {
                printf('%d, ', $positions->get_termpos());

                // Next pos
                $positions->next();
            }
            echo "\n";

            // Next doc
            $docs->next();
        }

        // Next term
        ++$nb;
        $terms->next();
    }
}
?>

Cheers,

Daniel
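For the snippet use case mentioned in the original question, it may be more convenient to start from a single document rather than from the whole database. The following is only a minimal sketch and has not been run against a real index. It assumes that the PHP bindings expose termlist_begin()/termlist_end() and positionlist_begin()/positionlist_end() on XapianDatabase with the same arguments as the C++ API (a document ID, plus a term for the position list). buildPositionMap() is a hypothetical helper name, not part of Xapian.

```php
<?php
// Sketch only: build a "position => terms" map for one document, so that the
// terms can be read back in document order (raw material for a snippet).
// ASSUMPTION: $database behaves like XapianDatabase and provides
// termlist_begin($docid), termlist_end($docid),
// positionlist_begin($docid, $term) and positionlist_end($docid, $term).
// buildPositionMap() is a hypothetical helper, not a Xapian function.
function buildPositionMap($database, $docid)
{
    $map = array();

    // Outer loop: every term that indexes this document
    $terms    = $database->termlist_begin($docid);
    $endTerms = $database->termlist_end($docid);

    while (! $terms->equals($endTerms)) {
        $term = $terms->get_term();

        // Inner loop: every position of this term in this document
        $positions    = $database->positionlist_begin($docid, $term);
        $endPositions = $database->positionlist_end($docid, $term);

        while (! $positions->equals($endPositions)) {
            // Several terms (e.g. stemmed and unstemmed forms) may share
            // one position, so keep a list of terms per position
            $map[$positions->get_termpos()][] = $term;
            $positions->next();
        }

        $terms->next();
    }

    ksort($map); // sort by position
    return $map;
}
```

With a real database this would be called as buildPositionMap($database, $docid), using a document ID taken from get_docid() in the posting loop above. Terms indexed without positional information would simply not appear in the result.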