Displaying 20 results from an estimated 1000 matches similar to: "Problem with stop words by indexing"
2010 May 27
1
Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote:
> On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote:
> > I try to remove stop words during the index process and I have no stemming.
> > I have tried with a simple example but it does not work at all.
> >
> > I have my writableDatabase and my termGenerator (indexer) and they
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem(german2);
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi,
I'm using SimpleStopper with TermGenerator in a Python indexing
script, in an attempt to keep my index size down (currently 30K per
doc, and I have 200 million docs to index, which I think implies
6TB.) However, unprefixed (positional?) terms are not affected by
the stopper, though Z-prefixed terms are.
I assume this is intentional for phrase queries, but I need to reduce
my
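The behaviour described in the excerpt above can be illustrated with a toy model. This is only a sketch of what the poster reports, not Xapian's actual TermGenerator: the stop list, `toy_stem` and `toy_index` are hypothetical names invented for this illustration, and the "stemmer" just strips a trailing "s".

```python
# Toy model only: this is NOT Xapian's TermGenerator, just a sketch of the
# behaviour reported in the post (stop words kept as positional terms,
# but skipped when generating Z-prefixed stemmed terms).
STOPWORDS = {"the", "a", "of", "and"}  # hypothetical stop list


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def toy_index(text, stopwords=STOPWORDS):
    """Return (positional_terms, stemmed_terms) for the given text."""
    positional = []  # (term, position) pairs; every word is kept
    stemmed = set()  # Z-prefixed stems; stop words are skipped
    for pos, word in enumerate(text.lower().split(), start=1):
        positional.append((word, pos))
        if word not in stopwords:
            stemmed.add("Z" + toy_stem(word))
    return positional, stemmed


positional, stemmed = toy_index("the size of the documents")
print(positional)
print(sorted(stemmed))
```

In this model the stop list only shrinks the stemmed set, so the positional data still grows with every word, which matches the index-size concern raised in the post.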
2012 Jun 04
1
Search not finding queries with stop words.
I have a search in perl that looks a bit like:
my $qp = new Search::Xapian::QueryParser();
$qp->set_stemmer(new Search::Xapian::Stem("english"));
$qp->set_stemming_strategy(STEM_SOME);
$qp->set_default_op($defaultop);
...
my $par = $qp->parse_query($query);
my $enq = $xDatabase->enquire( $par );
and in the db create script:
my $stopper =
2010 Sep 01
8
FIXMEs in Search::Xapian
Carrying on this conversation:
http://lists.tartarus.org/pipermail/xapian-discuss/2007-March/003513.html
void
TermGenerator::set_stopper(stopper)
Stopper * stopper
CODE:
// FIXME: no corresponding SvREFCNT_dec(), but a leak seems better than
// a SEGV!
SvREFCNT_inc(ST(1));
THIS->set_stopper(stopper);
It would be good to fix these FIXMEs.
A class-level HASH could be
2007 Dec 29
3
Term-Flags
Hi,
Is it necessary to set the flag below on the TermGenerator
if I want the "Did you mean ..." spelling corrections?
Xapian::TermGenerator::flags::FLAG_SPELLING
Thank you very much
Markus
2008 Mar 27
2
Proper noun stemming
Hi All
I was wondering if anyone had a solution for the following problem.
I use QueryParser to stem my documents before adding them to a
database. During the stemming process I would like to find a way of
keeping proper nouns that span two or more words together as a phrase.
For example "New York" or "Gordon Brown" or "Prime Minister" get split
up. I see
2007 Nov 14
1
Problem indexing text with spelling enabled in Perl
Hi All,
I'm using the TermGenerator::index_text() on version 1.0.4 with the
FLAG_SPELLING turned on, because the new spelling suggestion stuff
seems awesome, but I'm getting a segv.
(gdb) bt
#0 0xb7ae153c in Xapian::WritableDatabase::add_spelling
(this=0xa553988, word=@0xbff97724, freqinc=1) at ./include/xapian/
base.h:154
#1 0xb7becf47 in
2007 Dec 17
1
Crashes with spelling enabled and perl.
Hi Guys,
Here's a simple test case that causes a segfault with the perl
bindings patched to enable spelling correction:
use strict;
use warnings;
use Search::Xapian;
my $db = Search::Xapian::WritableDatabase->new("test.db",
Search::Xapian::DB_CREATE_OR_OPEN);
if (!defined($db)) {
die("Failed to open xapian_database: $!");
}
my $indexer =
2010 Oct 24
1
Cannot index with dynamic spelling data (Perl/Search::Xapian)
This is my test case, what am I doing wrong? It seems that the API is used
incorrectly, but I cannot find the problem...
--- 8< ---
#!/usr/bin/perl
use Search::Xapian qw(:all);
use strict;
my $xa = new Search::Xapian::WritableDatabase ("/tmp/xapian",
DB_CREATE_OR_OVERWRITE);
my $indexer = Search::Xapian::TermGenerator->new();
2014 Jan 27
4
Perl Search::Xapian
Hi,
Trying to learn Search::Xapian and be better at perl at the same time,
I'm stuck, at the DB_CREATE_OR_OPEN error. Perl says this:
~/dev/sandbox/Xapian-perl$ ./Index1-Xap.pl 100-objects-v1.csv db
"db" is not exported by the Search::Xapian module
Can't continue after import errors at ./Index1-Xap.pl line 7.
BEGIN failed--compilation aborted at ./Index1-Xap.pl line 7.
What I
2015 Jul 26
1
Get term from document by position
mple (see attachment).
>
> Attachments get stripped out by the mailing list, so I've made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>.
>
> Actually, when I run it I get 0 matches, which would explain why you're just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to
2015 Jun 10
1
make check xapian-bindings-1.2.21 & Search-Xapian-1.2.21.0
Eric Lindblad
http://www.ericlindblad.blogspot.com
- - -
Slackware-14.0
bash-4.2# make check
Making check in perl
make[1]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
make check-am
make[2]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
make check-TESTS
make[3]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
./t/01use.t .. ok
All tests
2007 Jun 11
3
Xapian 1.0.1 released
I've now uploaded Xapian 1.0.1, which you can download from the usual
place:
http://www.xapian.org/download.php
This release mainly comprises bug fixes and performance improvements.
The "simple" examples (for both C++ and the bindings) have also been
overhauled and now use the QueryParser and TermGenerator classes, which
makes for simpler examples and should better reflect
2014 Feb 27
2
Summer of Code help
I think there is a development in the bug #616.
The exception obtained is:
Exception in thread "main" java.lang.IllegalArgumentException: No enum
class org.xapian.TermGenerator$flags with value 0
at org.xapian.TermGenerator$flags.swigToEnum(TermGenerator.java:143)
at org.xapian.TermGenerator.setFlags(TermGenerator.java:71)
at org.xapian.examples.SimpleIndex.main(SimpleIndex.java:54)
2012 Nov 26
1
Word missing after stemmed with Norwegian in Search::Xapian::TermGenerator
Hi all Xapian-devel,
Gist: https://gist.github.com/10d2222d8bffe8d7631d
I'm using Search::Xapian::TermGenerator to convert Norwegian sentences to a vsm
(vector space model). But when I test generating the vsm
from 'Truet med å stevne misfornøyd PC-kunde - PC-leverandøren Asus likte
svært dårlig kundens misfornøyde leserbrev.', it doesn't return 'asus'
in the vsm.
2018 Nov 30
1
Xapian Benchmark results
Hi,
I am currently trying to benchmark a multithreaded Xapian implementation,
written in C++, on a Chameleon bare-metal instance. My workload is a 3 GB
Wikipedia XML dump consisting of ~286 files of different sizes. My results
show that indexing with Xapian is an order of magnitude faster than
my Lucene and Lucene++ implementations. This is a result that I did
not expect. Just want to
2011 Sep 14
1
Integrated Chinese tokenizer SCWS in xapian-core
Xapian is an excellent open source search engine library, but it has no native support for Chinese word segmentation in QueryParser and TermGenerator.
Therefore, I modified a small amount of source code to integrate the SCWS tokenizer, which is also open source and was developed by me.
Anyone can obtain the patch from the URL below. After patching, Xapian::QueryParser::parse_query and
2007 May 04
1
Last minute feature for 1.0.0
I'd like to draw people's attention to bug report #143 that I've just
submitted. This is a proposal (and patch) to add the ability to store
arbitrary metadata associated with a database (rather than with an
individual document in the database). The rationale for this feature is
explained more fully in the bug report, but briefly I've come across
several situations where I