Displaying 20 results from an estimated 300 matches similar to: "how can i use stopwords?"
2010 Apr 05
1
Problem with stop words by indexing
Hi,
I try to remove stop words during the index process and I have no stemming.
I have tried with a simple example but it does not work at all.
I have my writableDatabase and my termGenerator (indexer) and they work
well together: I can index texts and search through the database
correctly.
But if I add (before indexing my texts):
Xapian::SimpleStopper stopper;
2010 May 27
1
Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote:
> On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote:
> > I try to remove stop words during the index process and I have no stemming.
> > I have tried with a simple example but it does not work at all.
> >
> > I have my writableDatabase and my termGenerator (indexer) and they
2008 Mar 27
2
Proper noun stemming
Hi All
I was wondering if anyone had a solution for the following problem.
I use QueryParser to stem my documents before adding them to a
database. During the stemming process I would like to find a way of
keeping proper nouns that span two or more words together as a phrase.
For example "New York" or "Gordon Brown" or "Prime Minister" get split
up. I see
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi,
I'm using SimpleStopper with TermGenerator in a Python indexing
script, in an attempt to keep my index size down (currently 30K per
doc, and I have 200 million docs to index, which I think implies
6TB.) However, unprefixed (positional?) terms are not affected by
the stopper, though Z-prefixed terms are.
I assume this is intentional for phrase queries, but I need to reduce
my
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step going forward is to start with forming document vectors that
are reduced and more useful. This helps considerably in saving run time (since
the time for distance calculation depends on the number of terms). Getting the
useful terms within a
2012 Jun 04
1
Search not finding queries with stop words.
I have a search in Perl that looks a bit like:
my $qp = new Search::Xapian::QueryParser();
$qp->set_stemmer(new Search::Xapian::Stem("english"));
$qp->set_stemming_strategy(STEM_SOME);
$qp->set_default_op($defaultop);
...
my $par = $qp->parse_query($query);
my $enq = $xDatabase->enquire( $par );
and in the db create script:
my $stopper =
2010 Sep 01
8
FIXMEs in Search::Xapian
Carrying on this conversation:
http://lists.tartarus.org/pipermail/xapian-discuss/2007-March/003513.html
void
TermGenerator::set_stopper(stopper)
Stopper * stopper
CODE:
// FIXME: no corresponding SvREFCNT_dec(), but a leak seems better than
// a SEGV!
SvREFCNT_inc(ST(1));
THIS->set_stopper(stopper);
It would be good to fix these FIXMEs.
A class-level HASH could be
2011 Sep 23
2
understanding stemming and synonyms
I am working with version 1.2.7 and want to use stemming and synonyms.
I use the Perl bindings and have run into some problems.
First of all: the Perl bindings don't allow the QueryParser a third
argument when calling parse_query! So I cannot set a default prefix
(which perhaps is the solution to my problem, but more on that later).
I have a simple test case:
3 documents, every document only has one word:
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my
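One way to start on the frequency idea, as a minimal sketch: rank terms by how many documents they occur in, then review the top of the list by hand. This uses plain Python rather than the Xapian API; the function name stopword_candidates, the tokenizing regex, and the sample documents are illustrative assumptions, not something from the thread.

```python
import re
from collections import Counter


def stopword_candidates(docs, top_n=10):
    """Return the top_n (term, document_frequency) pairs.

    Hypothetical helper: a term's score is the number of documents
    that contain it at least once.
    """
    doc_freq = Counter()
    for text in docs:
        # Use a set so each term counts once per document.
        terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        doc_freq.update(terms)
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "the dog ate the bone", "the bird"]
    for term, freq in stopword_candidates(sample, top_n=3):
        print(term, freq)
```

A high document frequency only marks a term as a candidate; whether it is safe to drop still depends on the kind of documents being indexed.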
2008 Mar 12
1
xapian and autocomplete?
Hi,
does anybody know how to implement an autocomplete search input like (for example) Google's?
Thanks.
Greetings.
Sascha
2011 Oct 14
1
stemming an irregular forms?
Dear All,
I could not find the irregular forms table in xapian.
Please, could you tell me how to define/add words to the irregular forms table in xapian?
Thank you a lot.
Sascha
2007 Jun 11
3
Xapian 1.0.1 released
I've now uploaded Xapian 1.0.1, which you can download from the usual
place:
http://www.xapian.org/download.php
This release mainly comprises bug fixes and performance improvements.
The "simple" examples (for both C++ and the bindings) have also been
overhauled and now use the QueryParser and TermGenerator classes, which
makes for simpler examples and should better reflect
2010 Nov 15
4
Stopword addition and stemming
Hi,
Two questions which I'm unsure about:
Stemming: I've turned on stemming, etc, but how can I confirm that
it's being used in searches? What should I look/search for?
Stopwords: I'm trying out Xapian on a regional dataset (e.g. searching
data from a *.co.us TLD). I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), since it
2007 Dec 29
3
Term-Flags
Hi,
Is it necessary to set the flag below on the TermGenerator
if I want the "Did you mean ..." spelling corrections?
Xapian::TermGenerator::flags::FLAG_SPELLING
Thank you very much
Markus
2014 Jan 27
4
Perl Search::Xapian
Hi,
Trying to learn Search::Xapian and get better at Perl at the same time,
I'm stuck at the DB_CREATE_OR_OPEN error. Perl says this:
~/dev/sandbox/Xapian-perl$ ./Index1-Xap.pl 100-objects-v1.csv db
"db" is not exported by the Search::Xapian module
Can't continue after import errors at ./Index1-Xap.pl line 7.
BEGIN failed--compilation aborted at ./Index1-Xap.pl line 7.
What I
2010 May 11
3
indexing words with alternative spellings
Some languages (e.g. German and Danish) have special letters that are
often written using two-letter combinations when the appropriate
keyboard or medium is not available:
ä = ae
ü = ue
ö = oe
æ = ae
ø = oe
å = aa
ß = ss
(there are undoubtedly far more examples than those)
As a user of an index, I would like to be able to search for
e.g. "schaefer" and get matches on both
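One possible approach, sketched here under assumptions rather than taken from the thread, is to fold the special letters to their two-letter forms in both the indexed terms and the query terms, so that either spelling produces the same term. The mapping table and the name fold_special_letters are hypothetical, and the snippet is plain Python, not a Xapian API.

```python
# Hypothetical folding table covering the German and Danish letters
# discussed in this post; real data may need more entries.
FOLD = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "æ": "ae", "ø": "oe", "å": "aa",
}


def fold_special_letters(word):
    """Lowercase a word and replace special letters with two-letter forms."""
    return "".join(FOLD.get(ch, ch) for ch in word.lower())


if __name__ == "__main__":
    # Both spellings fold to the same term, so a search for either
    # one can match documents containing the other.
    print(fold_special_letters("Schäfer"))
    print(fold_special_letters("schaefer"))
```

The folding is lossy: words that legitimately contain "ae" or "oe" are treated the same as the special-letter forms, so it can over-match.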
2010 Jun 09
1
TermGenerator incorrectly tokenizes German text which contains special characters
Dear Xapian users,
I am trying to index some German text with Xapian using the xapian_php bindings. I
run Apache 2.2 on Windows using PHP 5.2.13 with the pre-built Xapian
bindings from Flax:
Xapian Support: enabled
Xapian Compiled Version: @PACKAGE_VERSION@
Xapian Linked Version: 1.2.0
The problem is that after indexing text which contains special characters
like ä, ö, ü and ß, using
2017 Mar 15
2
xapian core missing link to math on MSYS2
Dear All,
I've tried to build xapian-core 1.4.3 on MSYS2. It fails with the attached
error (undefined reference to `exp10'). I think it might be missing an
explicit link to 'm'. I'm not able to fix this myself as I do not
know autotools sufficiently well, but I hope you might be able to help.
Cheers,
Mario Emmenlauer
--
BioDataAnalysis GmbH, Mario Emmenlauer Tel.
2009 Apr 23
1
Expanding the search in PHP
I tried using the simpleexpand.php from
http://xapian.org/docs/bindings/php/examples/simpleexpand.php5
I get different results between PHP and the Omega expand (see below),
I'd like to have the same functionality in PHP.
Could anyone suggest how to do it? Is there an example I could use?
Thanks,
Frank
And got the following results from PHP:
Zdefin: weight = 46.963883268652
Zconfigur: