Displaying 12 results from an estimated 12 matches for "simplestopper".
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi,
I'm using SimpleStopper with TermGenerator in a Python indexing
script, in an attempt to keep my index size down (currently 30K per
doc, and I have 200 million docs to index, which I think implies
6TB.) However, unprefixed (positional?) terms are not affected by
the stopper, though Z-prefixed terms are.
I assume...
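The 6TB figure in the post above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `estimated_index_bytes` is made up for illustration, and since the post does not say whether "30K" means 30,000 or 30 × 1024 bytes, both readings are worth trying.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Xapian): estimated total index size in
// bytes, assuming every document costs the same number of bytes.
std::uint64_t estimated_index_bytes(std::uint64_t bytes_per_doc,
                                    std::uint64_t doc_count) {
    return bytes_per_doc * doc_count;
}

// Convert a byte count to decimal terabytes (1 TB = 10^12 bytes).
double to_terabytes(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1e12;
}
```

With 200 million documents, `to_terabytes(estimated_index_bytes(30000, 200000000))` gives 6 TB, and reading "30K" as 30 × 1024 bytes gives roughly 6.1 TB, so the poster's estimate holds either way.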
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem('german2');
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
$self->{'TermGenerator'}->set_stopper( $self->{'Stopper'} );
I thought that Xapian would now exclude the stopw...
2010 Apr 05
1
Problem with stop words by indexing
...process and I have no stemming.
I have tried with a simple example but it does not work at all.
I have my writableDatabase and my termGenerator (indexer) and they work
well both together: I can index texts and search through the database
correctly.
But if I add (before indexing my texts):
Xapian::SimpleStopper stopper;
stopper.add("testword");
indexer.set_stopper(&stopper);
... the result is exactly the same as before. I have checked with delve
and "testword" is indexed.
Am I using SimpleStopper the right way?
Regards
Emmanuel
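For readers unsure what a stopper is supposed to do, the idea can be sketched without Xapian at all. The `ToyStopper` class and `filter_terms` function below are stand-ins invented for illustration, not Xapian::SimpleStopper or TermGenerator, and the sketch says nothing about why "testword" was still indexed in the post above. It only shows the predicate pattern: a set of words, a call operator that reports whether a term is a stopword, and a filter that drops the matching terms.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a stopword predicate (NOT Xapian::SimpleStopper).
// Matching here is exact and case-sensitive.
class ToyStopper {
public:
    void add(const std::string& word) { words_.insert(word); }

    // Returns true if `term` is a stopword.
    bool operator()(const std::string& term) const {
        return words_.count(term) != 0;
    }

private:
    std::set<std::string> words_;
};

// Keep only the terms that the stopper does not reject, preserving order.
std::vector<std::string> filter_terms(const std::vector<std::string>& terms,
                                      const ToyStopper& stopper) {
    std::vector<std::string> kept;
    for (const std::string& term : terms) {
        if (!stopper(term)) kept.push_back(term);
    }
    return kept;
}
```

Calling the predicate directly on a known word is a cheap first check when a stopword still shows up in an index: if the predicate itself returns true, the question becomes where in the indexing pipeline it is consulted.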
2007 Jun 11
3
Xapian 1.0.1 released
I've now uploaded Xapian 1.0.1, which you can download from the usual
place:
http://www.xapian.org/download.php
This release mainly comprises bug fixes and performance improvements.
The "simple" examples (for both C++ and the bindings) have also been
overhauled and now use the QueryParser and TermGenerator classes, which
makes for simpler examples and should better reflect
2017 Jun 14
2
KMeans Clusterer - Going forward
...of terms). Getting the
useful terms within a document in its document vector can improve its
accuracy, because there are fewer noise terms. Two important things to be
done in this direction are:
1) Stemming
This is easier because xapian already provides stemmed terms.
2) Stopword removal
Use either Xapian::SimpleStopper or create a subclass of Xapian::Stopper to
determine whether a term that is fed to it is a stopword or not. But for
determining which terms are stopwords, I was wondering whether we'd be
using the stopword list within xapian/languages/stopwords or will we have
to create one within the cluster d...
2009 Apr 23
1
Expanding the search in PHP
I tried using the simpleexpand.php from
http://xapian.org/docs/bindings/php/examples/simpleexpand.php5
I get different results between PHP and the Omega expand (see below),
I'd like to have the same functionality in PHP.
Could anyone suggest how to do it? Is there an example I could use?
Thanks,
Frank
And got the following results from PHP:
Zdefin: weight = 46.963883268652
Zconfigur:
2010 May 27
1
Problem with stop words by indexing
...t it does not
> work at all.
>
> > I have my writableDatabase and my termGenerator (indexer) and they work
> > well both together: I can index texts and search through the database
> > correctly.
> >
> > But if I add (before indexing my texts):
> > Xapian::SimpleStopper stopper;
> > stopper.add("testword");
> > indexer.set_stopper(&stopper);
> >
> > ... the result is exactly the same as before. I have checked with delve
> > and "testword" is indexed.
>
> http://article.gmane.org/gmane.comp.search.xapian...
2012 Jun 04
1
Search not finding queries with stop words.
...stemmer(new Search::Xapian::Stem("english"));
$qp->set_stemming_strategy(STEM_SOME);
$qp->set_default_op($defaultop);
...
my $par = $qp->parse_query($query);
my $enq = $xDatabase->enquire( $par );
and in the db create script:
my $stopper = Search::Xapian::SimpleStopper->new();
foreach my $word (@ar) {
$stopper->add($word);
}
...
my $doc = Search::Xapian::Document->new();
my $indexer = Search::Xapian::TermGenerator->new();
my $stemmer = Search::Xapian::Stem->new('english');
$doc->...
2010 Jul 26
2
related documents
Hi All,
I would like to take a doc in the xapian DB and find all related
documents by relevance e.g. so when you view one document it says
"Related entries X Y Z".
I'm aware of the "Morelikethis" Lucene plugin that is supposed to do
something like this, by generating a query from a document based on term
frequency.
Has anyone developed a tool to generate a query from a
2008 Mar 27
2
Proper noun stemming
Hi All
I was wondering if anyone had a solution for the following problem.
I use QueryParser to stem my documents before adding them to a
database. During the stemming process I would like to find a way of
keeping proper nouns that span two or more words together as a phrase.
For example "New York" or "Gordon Brown" or "Prime Minister" get split
up. I see
2006 May 10
1
Documentation for the PHP OO wrapper
...e
to resolve :
- some methods are documented but do not exist in the wrapper (e.g.
empty(), max_size(), swap(), create()...)
- some forms of methods use argument types which are not in the wrapper
and should not be documented (e.g. get_mset with a MatchDecider,
get_eset with an ExpandDecider, SimpleStopper::__construct() with
iterators...)
- some methods exist in the wrapper but are not documented. This is
everything which is specific to the bindings :
http://svn.xapian.org/trunk/xapian-bindings/php/docs/bindings.html?view=co
and some which are not documented like (Simple)Stopper::apply().
- some m...