thr3ads.net - Xapian discuss - [Xapian-discuss] Perl example: parse terms, search , get total, get result, parse result [Mar 2006]

Kevin SoftDev

2006-Mar-08 17:44 UTC

[Xapian-discuss] Perl example: parse terms, search , get total, get result, parse result

Hi,

Thank you all for emailing answers to my question. I put together a
simple Perl script so we do not keep asking the same thing over and over.
As you can see, I had to parse the document data to find the title, body
and url. If anyone knows of an undocumented way to retrieve a specific
document attribute (title, body, url) directly, let me know.

 #------------------------ begin of the script ------------------------#

  my $db     = Search::Xapian::Database->new( '/europa' );
  my $qp     = Search::Xapian::QueryParser->new();
  my $enq   = $db->enquire($qp->parse_query($terms));
  my $total  = $db->get_termfreq($terms);


  printf "Searching for: '%s' ", $terms;
  print "Total matches found" . $total;


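  #--- display only range of documents for pagination ----#
  #--- note: the second argument to matches() is the maximum number of
  #--- results to return, not an end index
  my @matches = $enq->matches($start, $end);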

  my($doc,$html,$body,$title,$url);


  foreach my $match ( @matches )
  {
    $doc  = $match->get_document();
    $html = $doc->get_data();

    $html    =~ m/body=(.*)/;   $body  = $1;
    $html    =~ m/title=(.*)/;     $title = $1;
    $html    =~ m/url=(.*)/;      $url   = $1;

    # literal percent signs in a printf format must be written as %%
    printf "<table border=0 width=95%%><tr><td><font size=2 face=Verdana>Relevance: %s%%&nbsp;&nbsp;",
        $match->get_percent();
    print "<a href=\"$url\" target=_blank><b>$title</b><BR><i>$url</i></a><BR>$body";
    print "</font></td></tr></table><P>";
  }
 #------------------------ end of the script ------------------------#

Olly Betts

2006-Mar-09 00:38 UTC

On Wed, Mar 08, 2006 at 09:43:53AM -0800, Kevin SoftDev wrote:
>   my $total  = $db->get_termfreq($terms);

This looks up the frequency of a single term, so it'll be fine for a one
term query, but will return zero for anything more complicated (unless
you happen to have terms with spaces, etc in).

As I explained just now, you want MSet::get_matches_estimated().
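
A minimal sketch of that approach, assuming $enq is the Enquire object built
as in the script above and that $start and $size are the paging offset and
page size. The helper name page_with_total is made up for illustration; the
only Search::Xapian calls it relies on are get_mset() and
get_matches_estimated().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Xapian): fetch one page of
# results together with the estimated total for the whole query.
# $enq is expected to be a Search::Xapian::Enquire object.
sub page_with_total {
    my ($enq, $start, $size) = @_;

    # get_mset() takes the offset of the first match and the maximum
    # number of matches to return.
    my $mset = $enq->get_mset($start, $size);

    # Estimated number of matching documents for the whole query,
    # not just for this page.
    my $total = $mset->get_matches_estimated();

    return ($total, $mset);
}
```

With the script above this would be called as
my ($total, $mset) = page_with_total($enq, $start, 10); and $total then
replaces the get_termfreq() result in the "Total matches found" line.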

>     $html = $doc->get_data();
>
>     $html    =~ m/body=(.*)/;   $body  = $1;

That's kind of risky - you only want to match body at the start of a
line, but this doesn't specify that, so it'll match wrongly if there's
an earlier line containing "body=" anywhere in it.  I suggest:

      my ($body) = $html =~ m/^body=(.*)/m;
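
To show the difference, here is a small self-contained sketch. The sample
document data is made up, and it assumes the one-field-per-line name=value
layout that the script relies on; the last match, which collects every field
into a hash, is an extra suggestion in the same spirit rather than something
from the thread.

```perl
use strict;
use warnings;

# Made-up sample of document data in the name=value layout the script
# assumes; the url line happens to contain "body=" as well.
my $data = "url=http://example.com/view?body=1\n"
         . "title=Hello\n"
         . "body=The real text\n";

# Unanchored: stops at the first "body=" anywhere in the data.
my ($loose) = $data =~ m/body=(.*)/;

# Anchored with ^ and /m: only matches "body=" at the start of a line.
my ($body) = $data =~ m/^body=(.*)/m;

# Same idea for every field at once: one hash entry per name=value line.
my %field = $data =~ m/^(\w+)=(.*)$/mg;

print "loose: $loose\n";
print "body:  $body\n";
print "title: $field{title}\n";
```

Capturing in list context also avoids reading a stale $1 left over from an
earlier successful match when the pattern fails.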

>     print "<a href=\"$url\" target=_blank><b>$title</b><BR><i>$url</i></a><BR>$body";

You really want to be escaping values put into HTML output, unless
you've carefully sanitised them at indexing time.  Otherwise you're
opening yourself to cross-site scripting type exploits.
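
A minimal sketch of such escaping in plain Perl follows. The function name
escape_html and the sample values are made up for illustration; the CPAN
module HTML::Entities provides encode_entities() if you would rather not
hand-roll this.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that are special in HTML
# text and attribute values with their entity equivalents.
sub escape_html {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;      # must come first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Made-up values standing in for the fields parsed out of the document data.
my ($url, $title, $body) =
    ('http://example.com/?a=1&b=2', '<b>Hi</b>', 'Tom & "Jerry"');

printf "<a href=\"%s\" target=_blank><b>%s</b><BR><i>%s</i></a><BR>%s\n",
    escape_html($url), escape_html($title), escape_html($url), escape_html($body);
```

Escaping at output time means the stored document data can stay as it is, at
the cost of one function call per value printed.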

Cheers,
    Olly
