similar to: how can i use stopwords?

Displaying 20 results from an estimated 300 matches similar to: "how can i use stopwords?"

2010 Apr 05
1
Problem with stop words by indexing
Hi, I am trying to remove stop words during the indexing process, and I have no stemming. I have tried a simple example but it does not work at all. I have my writableDatabase and my termGenerator (indexer) and they work well together: I can index texts and search through the database correctly. But if I add (before indexing my texts): Xapian::SimpleStopper stopper;
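For readers new to the idea, here is a minimal plain-Python model of index-time stopping: words on a stop list are dropped before terms are generated. It is not Xapian's tokenizer; the word list and the `index_terms` helper are hypothetical. In Xapian itself the equivalent setup is to `add()` words to a `SimpleStopper` and pass it to `TermGenerator::set_stopper()`, as the poster does.

```python
import re

# Hypothetical stop list; build your own from your data.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}


def index_terms(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens and drop stopwords.

    A toy model of index-time stopping, not Xapian's TermGenerator.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(index_terms("The problem of stop words in the index"))
    # prints ['problem', 'stop', 'words', 'index']
```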
2010 May 27
1
Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote: > On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote: > > I try to remove stop words during the index process > and I have no stemming. > I have tried with a simple example but it does not > work at all. > > > I have my writableDatabase and my termGenerator > (indexer) and they
2008 Mar 27
2
Proper noun stemming
Hi all, I was wondering if anyone had a solution to the following problem. I use QueryParser to stem my documents before adding them to a database. During the stemming process I would like to find a way of keeping proper nouns that span two or more words together as a phrase. For example, "New York", "Gordon Brown" or "Prime Minister" get split up. I see
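One crude heuristic for the multi-word proper nouns mentioned here is to treat runs of two or more capitalised words as a unit. The sketch below only detects such runs; how you then index them (for example as an extra term with a prefix of your choosing) is up to you, and the `capitalised_runs` helper is hypothetical. It also picks up sentence-initial words ("The Prime Minister" matches as a whole), so treat it as a starting point. Note that when positional information is indexed, a quoted phrase query such as "new york" can already match the words together without any of this.

```python
import re

# Two or more consecutive words, each a capital letter followed by lowercase letters.
_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def capitalised_runs(text):
    """Return runs of capitalised words, e.g. 'New York', as candidate phrases."""
    return _RUN.findall(text)


if __name__ == "__main__":
    print(capitalised_runs("I flew from New York to see Gordon Brown, the Prime Minister."))
    # prints ['New York', 'Gordon Brown', 'Prime Minister']
```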
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi, I'm using SimpleStopper with TermGenerator in a Python indexing script, in an attempt to keep my index size down (currently 30K per doc, and I have 200 million docs to index, which I think implies 6TB.) However, unprefixed (positional?) terms are not affected by the stopper, though Z-prefixed terms are. I assume this is intentional for phrase queries, but I need to reduce my
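The behaviour described here can be modelled as follows: a stopword is dropped from the stemmed, `Z`-prefixed terms but still indexed as an unprefixed term, so phrase queries containing it can still match. The sketch contrasts that with dropping stopwords everywhere. It is a plain-Python model with a deliberately naive, hypothetical `toy_stem`, not Xapian code. As far as I know, later releases (1.4.x) added `TermGenerator::set_stopper_strategy()` with `STOP_ALL`, `STOP_STEMMED` (the default) and `STOP_NONE`, which is the setting to look for if you want the second behaviour; check the API documentation for the version you use.

```python
import re

STOPWORDS = {"the", "and", "of"}


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a plural "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def generate_terms(text, stop_all=False):
    """Return (unprefixed_terms, stemmed_terms) for a piece of text.

    stop_all=False models stopping only the stemmed terms: stopwords still
    appear as unprefixed terms, which keeps phrase queries working.
    stop_all=True drops stopwords from both lists, which saves more space.
    """
    unprefixed, stemmed = [], []
    for word in re.findall(r"\w+", text.lower()):
        if word in STOPWORDS:
            if not stop_all:
                unprefixed.append(word)
            continue
        unprefixed.append(word)
        stemmed.append("Z" + toy_stem(word))
    return unprefixed, stemmed


if __name__ == "__main__":
    print(generate_terms("the cats and the dogs"))
    # prints (['the', 'cats', 'and', 'the', 'dogs'], ['Zcat', 'Zdog'])
    print(generate_terms("the cats and the dogs", stop_all=True))
    # prints (['cats', 'dogs'], ['Zcat', 'Zdog'])
```

The trade-off of the second mode is that a phrase query containing a stopword can no longer match that word exactly.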
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues in the current code over the next week, based on reviews from mentors. The next step is to start forming document vectors that are reduced and more useful. This significantly reduces run time (since the time for a distance calculation depends on the number of terms). Getting the useful terms within a
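One common way to get the reduced document vectors described here (not necessarily what this project settled on) is to weight terms by TF-IDF, keep only the top `k` per document, and compute distances on the resulting sparse vectors. A minimal sketch, with hypothetical `reduced_vectors` and `cosine` helpers and naive whitespace tokenisation:

```python
import math
from collections import Counter


def reduced_vectors(docs, k):
    """TF-IDF vectors keeping at most k positive-weight terms per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        # Highest weights first; ties broken alphabetically for determinism.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        # Terms found in every document get weight 0 and are dropped.
        vectors.append({t: w for t, w in top if w > 0})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A k-means distance can then be taken as `1 - cosine(a, b)`; because the dot product only iterates over the terms of `a`, its cost is bounded by `k` rather than by the full vocabulary.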
2012 Jun 04
1
Search not finding queries with stop words.
I have a search in perl that looks a bit like: my $qp = new Search::Xapian::QueryParser(); $qp->set_stemmer(new Search::Xapian::Stem("english")); $qp->set_stemming_strategy(STEM_SOME); $qp->set_default_op($defaultop); ... my $par = $qp->parse_query($query); my $enq = $xDatabase->enquire( $par ); and in the db create script: my $stopper =
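The snippet is cut off before the stopper setup, so this is only a guess at the cause: if stopwords are left out of the index but the query still requires them (for example with an AND default operator), documents can fail to match. The toy model below shows that mismatch and how stopping the same words on the query side avoids it. In Xapian the query-side counterpart is, roughly, `QueryParser::set_stopper()`; the `matches` helper here is hypothetical.

```python
STOPWORDS = {"the", "of", "and"}


def matches(doc_terms, query, stopwords=frozenset()):
    """AND-match: every non-stopword query word must be in the document's terms."""
    required = [w for w in query.lower().split() if w not in stopwords]
    return all(w in doc_terms for w in required)


if __name__ == "__main__":
    # Terms indexed with stopwords removed.
    indexed = {"history", "rome"}
    print(matches(indexed, "the history of Rome"))             # prints False
    print(matches(indexed, "the history of Rome", STOPWORDS))  # prints True
```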
2010 Sep 01
8
FIXMEs in Search::Xapian
Carrying on this conversation: http://lists.tartarus.org/pipermail/xapian-discuss/2007-March/003513.html void TermGenerator::set_stopper(stopper) Stopper * stopper CODE: // FIXME: no corresponding SvREFCNT_dec(), but a leak seems better than // a SEGV! SvREFCNT_inc(ST(1)); THIS->set_stopper(stopper); It would be good to fix these FIXMEs. A class-level HASH could be
2011 Sep 23
2
understanding stemming and synonyms
I am working with version 1.2.7 and want to use stemming and synonyms. I use the Perl bindings and have some problems. First of all: the Perl bindings don't allow a third argument to QueryParser's parse_query! So I cannot set a default prefix (which may be the solution to my problem, but more on that later). I have a simple test case: 3 documents, every document only has one word:
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
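The approach suggested at the end of this message, listing the most frequent terms and reviewing them by hand, might look like this in Python. It counts document frequency (how many documents contain a term) rather than raw occurrences, so one long repetitive document cannot dominate; the `candidate_stopwords` helper and the sample data are hypothetical.

```python
import re
from collections import Counter


def candidate_stopwords(docs, top_n=20):
    """Return (term, fraction_of_docs) for the top_n terms by document frequency."""
    df = Counter()
    for doc in docs:
        # set(): count each term at most once per document.
        df.update(set(re.findall(r"\w+", doc.lower())))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, count / len(docs)) for term, count in ranked[:top_n]]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog ran", "the cat ran away"]
    for term, frac in candidate_stopwords(docs, top_n=3):
        print(f"{term}\t{frac:.2f}")
```

Terms close to 1.0 are the obvious candidates, but as the poster says it depends on the collection: a very common word in a specialist corpus may still be worth searching for.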
2008 Mar 12
1
xapian and autocomplete?
Hi, does anybody know how to build an autocomplete search input like Google's? Thanks. Greetings. Sascha
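Autocomplete boils down to finding indexed terms that start with what the user has typed so far. Xapian can help on both sides: `QueryParser::FLAG_PARTIAL` treats the last word of a query as a prefix, and `Database::allterms_begin()` accepts a prefix so only matching terms are iterated. The language-neutral core is a prefix lookup over a sorted term list, sketched here with a hypothetical `complete` helper and made-up terms:

```python
import bisect


def complete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`.

    bisect finds the first term >= prefix; all matches follow it contiguously
    because the list is sorted.
    """
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while (i < len(sorted_terms) and len(matches) < limit
           and sorted_terms[i].startswith(prefix)):
        matches.append(sorted_terms[i])
        i += 1
    return matches


if __name__ == "__main__":
    terms = sorted(["apple", "application", "apply", "banana", "band", "bandana"])
    print(complete(terms, "app"))  # prints ['apple', 'application', 'apply']
```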
2011 Oct 14
1
stemming and irregular forms?
Dear All, I could not find the irregular forms table in xapian. Please, could you tell me how to define/add words to the irregular forms table in xapian? Thank you a lot. Sascha
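As far as I know, Xapian's stemmers are generated from Snowball algorithms and don't expose a user-editable table of irregular forms. One workaround is to map irregular forms yourself before stemming, at both index and query time so the two sides agree; another is Xapian's synonym support. The mapping and the naive `toy_stem` below are hypothetical; I believe a `xapian.Stem` object in the Python bindings is callable and could be passed as `stem` instead.

```python
# Hypothetical exception table: irregular form -> form handed to the stemmer.
IRREGULAR = {"went": "go", "gone": "go", "mice": "mouse", "geese": "goose"}


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a plural "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def normalise(word, stem=toy_stem):
    """Look the word up in the exception table first, then stem the result."""
    word = word.lower()
    return stem(IRREGULAR.get(word, word))


if __name__ == "__main__":
    print([normalise(w) for w in ["Mice", "mouse", "went", "cats"]])
    # prints ['mouse', 'mouse', 'go', 'cat']
```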
2007 Jun 11
3
Xapian 1.0.1 released
I've now uploaded Xapian 1.0.1, which you can download from the usual place: http://www.xapian.org/download.php This release mainly comprises bug fixes and performance improvements. The "simple" examples (for both C++ and the bindings) have also been overhauled and now use the QueryParser and TermGenerator classes, which makes for simpler examples and should better reflect
2010 Nov 15
4
Stopword addition and stemming
Hi, two questions which I'm unsure about: Stemming: I've turned on stemming, etc., but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out Xapian on a regional dataset (e.g. searching data from a *.co.us TLD). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it
2007 Dec 29
3
Term-Flags
Hi, is it necessary to set the flag below on the TermGenerator if I want "Did you mean ..." spelling corrections? Xapian::TermGenerator::flags::FLAG_SPELLING Thank you very much, Markus
2014 Jan 27
4
Perl Search::Xapian
Hi, trying to learn Search::Xapian and get better at Perl at the same time, I'm stuck at the DB_CREATE_OR_OPEN error. Perl says this: ~/dev/sandbox/Xapian-perl$ ./Index1-Xap.pl 100-objects-v1.csv db "db" is not exported by the Search::Xapian module Can't continue after import errors at ./Index1-Xap.pl line 7. BEGIN failed--compilation aborted at ./Index1-Xap.pl line 7. What I
2010 May 11
3
indexing words with alternative spellings
Some languages (e.g. German and Danish) have special letters that are often written using two-letter combinations when the appropriate keyboard or medium is not available: ä = ae, ü = ue, ö = oe, æ = ae, ø = oe, å = aa, ß = ss (there are undoubtedly far more examples than those). As a user of an index, I would like to be able to search for e.g. "schaefer" and get matches on both
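One way to get "schaefer" and "schäfer" to match each other is to fold the special letters into their two-letter spellings, index both the original and the folded form, and apply the same folding to query terms. A minimal sketch covering the usual German and Danish letters; the table and helper names are hypothetical, and real text would need a fuller table:

```python
# Hypothetical folding table for common German and Danish letters.
FOLD = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",  # German
    "æ": "ae", "ø": "oe", "å": "aa",             # Danish
})


def fold(word):
    """Lowercase and replace special letters with their two-letter spellings."""
    return word.lower().translate(FOLD)


def index_forms(word):
    """Terms to index for one word: the lowercased original and its folded form."""
    lowered = word.lower()
    return sorted({lowered, fold(lowered)})


if __name__ == "__main__":
    print(index_forms("Schäfer"))   # prints ['schaefer', 'schäfer']
    print(index_forms("Schaefer"))  # prints ['schaefer']
    print(fold("Ærø"))              # prints aeroe
```

Because a query for either spelling folds to "schaefer", both documents match; keeping the original form in the index as well still allows exact-spelling searches.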
2010 Jun 09
1
TermGenerator incorrectly tokenizes German text which contains special characters
Dear Xapian users, I am trying to index some German text with Xapian using the xapian_php bindings. I run Apache 2.2 on Windows using PHP 5.2.13 with the prebuilt Xapian bindings from Flax: Xapian Support enabled Xapian Compiled Version @PACKAGE_VERSION@ Xapian Linked Version 1.2.0 The problem is that after indexing text which contains special characters like ä, ö, ü and ß, using
2017 Mar 15
2
xapian core missing link to math on MSYS2
Dear All, I've tried to build xapian-core 1.4.3 on MSYS2. It fails with the attached error (undefined reference to `exp10'). I think it might be missing an explicit link to 'm'. I'm not able to fix this myself as I do not know autotools well enough, but I hope you might be able to help. Cheers, Mario Emmenlauer -- BioDataAnalysis GmbH, Mario Emmenlauer Tel.
2009 Apr 23
1
Expanding the search in PHP
I tried using the simpleexpand.php from http://xapian.org/docs/bindings/php/examples/simpleexpand.php5 I get different results between PHP and the Omega expand (see below), I'd like to have the same functionality in PHP. Could anyone suggest how to do it? Is there an example I could use? Thanks, Frank And got the following results from PHP: Zdefin: weight = 46.963883268652 Zconfigur: