similar to: TermGenerator and SimpleStopper

Displaying 20 results from an estimated 300 matches similar to: "TermGenerator and SimpleStopper"

2010 May 27
1
Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote: > On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote: > > I try to remove stop words during the index process > and I have no stemming. > I have tried with a simple example but it does not > work at all. > > > I have my writableDatabase and my termGenerator > (indexer) and they
2008 Mar 12
1
how can i use stopwords?
Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem('german2'); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2010 Apr 05
1
Problem with stop words by indexing
Hi, I try to remove stop words during the index process and I have no stemming. I have tried with a simple example but it does not work at all. I have my writableDatabase and my termGenerator (indexer) and they work well together: I can index texts and search through the database correctly. But if I add (before indexing my texts): Xapian::SimpleStopper stopper;
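For readers following the stop-word threads above, here is a minimal C++ sketch of what a stopper does conceptually: lowercase each whitespace-separated word and skip it if it appears in a stop list. The function name index_terms and the example stop list are illustrative assumptions. This is not Xapian's TermGenerator code, and it ignores punctuation, Unicode case folding and positional information. In Xapian itself a stopper is attached with TermGenerator::set_stopper(). How stopped words are treated during indexing has differed between releases (in some releases only the stemmed forms are suppressed), so check the API documentation for the version in use.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Xapian API): split text on whitespace,
// lowercase each token (ASCII only), and drop tokens in the stop set.
std::vector<std::string> index_terms(const std::string& text,
                                     const std::set<std::string>& stop_words) {
    std::vector<std::string> terms;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        // Keep the token only if it is not a stop word.
        if (stop_words.count(word) == 0) {
            terms.push_back(word);
        }
    }
    return terms;
}
```

With a stop set of {"the", "and", "of"}, the text "The Cat and the Hat" yields only the terms "cat" and "hat".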
2007 Nov 14
1
Problem indexing text with spelling enabled in Perl
Hi All, I'm using TermGenerator::index_text() in version 1.0.4 with FLAG_SPELLING turned on, because the new spelling suggestion feature seems awesome, but I'm getting a segv. (gdb) bt #0 0xb7ae153c in Xapian::WritableDatabase::add_spelling (this=0xa553988, word=@0xbff97724, freqinc=1) at ./include/xapian/base.h:154 #1 0xb7becf47 in
2024 Mar 06
1
Never exporting .__global__ and .__suppressForeign__?
Hello, (Dear Richard, I hope you don't mind being Cc:'d on this thread in R-devel. This is one of the ways we can prevent similar problems from happening in the future.) Sometimes, package authors who use both exportPattern('.') and utils::globalVariables(...) get confusing WARNINGs about undocumented exports: https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010531.html I
2008 Mar 27
2
Proper noun stemming
Hi All, I was wondering if anyone had a solution for the following problem. I use QueryParser to stem my documents before adding them to a database. During the stemming process I would like to find a way of keeping proper nouns that span two or more words together as a phrase. For example "New York" or "Gordon Brown" or "Prime Minister" get split up. I see
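The proper-noun question above could be approached with a crude heuristic, sketched here under stated assumptions: treat a run of two or more consecutive capitalised words as a single phrase token. The function name group_capitalised is hypothetical, and this is not a QueryParser or TermGenerator feature. A robust solution would likely use a list of known names or a named-entity recogniser instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical heuristic: a run of two or more consecutive capitalised
// words becomes one phrase token; every other word stays a single token.
std::vector<std::string> group_capitalised(const std::string& text) {
    std::vector<std::string> out;
    std::vector<std::string> run;  // current run of capitalised words
    auto flush = [&]() {
        if (run.size() >= 2) {
            std::string phrase = run[0];
            for (std::size_t i = 1; i < run.size(); ++i) phrase += " " + run[i];
            out.push_back(phrase);
        } else if (run.size() == 1) {
            out.push_back(run[0]);
        }
        run.clear();
    };
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        if (std::isupper(static_cast<unsigned char>(word[0]))) {
            run.push_back(word);  // extend the current capitalised run
        } else {
            flush();              // close any pending run first
            out.push_back(word);
        }
    }
    flush();
    return out;
}
```

The obvious weakness is capitalisation that has nothing to do with names: in "Yesterday Gordon Brown said", the sentence-initial "Yesterday" is absorbed into the phrase.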
2002 Mar 28
2
Patches for rsync.mbox
I found that my mail client reported invalid messages in the just-downloaded rsync.mbox. Further examination showed that they are due to instances of unprefixed words "From" at beginning of line in the message body. Once I fixed them, all messages are visible. It looks like the rsync.mbox file may have been prepared by simple concatenation of messages, without the filtering of
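The "From" problem described above is usually avoided by escaping message bodies when they are appended to an mbox file. Below is a minimal sketch of the mboxrd convention. The original message does not say which mbox variant the archive used, so this choice is an assumption. Any body line matching ">*From " (zero or more '>' characters followed by "From ") gets one extra '>', which a reader can strip again later. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// True if the line matches the mboxrd pattern ^>*From (note the space).
bool needs_quoting(const std::string& line) {
    std::size_t i = 0;
    while (i < line.size() && line[i] == '>') ++i;  // skip leading '>'
    return line.compare(i, 5, "From ") == 0;
}

// Escape a message body before appending it to an mbox file: each
// matching line gets one extra leading '>'. Every output line ends in '\n'.
std::string escape_body(const std::string& body) {
    std::istringstream in(body);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (needs_quoting(line)) out += '>';
        out += line;
        out += '\n';
    }
    return out;
}
```

The older mboxo variant only quotes lines that begin exactly with "From ". That makes the escaping irreversible, because an original ">From " line and an escaped "From " line look identical afterwards.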
2012 Nov 26
1
Word missing after stemmed with Norwegian in Search::Xapian::TermGenerator
Hi all Xapian-devel, Gist: https://gist.github.com/10d2222d8bffe8d7631d I'm using Xapian-TermGenerator to extract Norwegian sentences to vsm (vector space model) using TermGenerator. But when I test generating vsm from 'Truet med å stevne misfornøyd PC-kunde - PC-leverandøren Asus likte svært dårlig kundens misfornøyde leserbrev.' It doesn't return 'asus' result in vsm.
2007 Jun 15
1
TermGenerator in PHP4
(xapian 1.0.1) Should TermGenerator in the PHP4 bindings be called XapianTermGenerator? Thanks, Tim.
2010 Jun 09
1
TermGenerator incorrectly tokenizes German text which contains special characters
Dear Xapian users, I am trying to index some German text with Xapian using the xapian_php bindings. I run Apache 2.2 on Windows using PHP 5.2.13 with the prebuilt xapian bindings from Flax: Xapian Support enabled Xapian Compiled Version @PACKAGE_VERSION@ Xapian Linked Version 1.2.0 The problem is that after indexing text which contains special characters like ?, ?, ? and ?, using
2007 Jun 11
3
Xapian 1.0.1 released
I've now uploaded Xapian 1.0.1, which you can download from the usual place: http://www.xapian.org/download.php This release mainly comprises bug fixes and performance improvements. The "simple" examples (for both C++ and the bindings) have also been overhauled and now use the QueryParser and TermGenerator classes, which makes for simpler examples and should better reflect
2012 Mar 09
0
.conflicts.OK no longer working regardless of export(.conflicts.OK) due to "stoplist"
Hi, in (at least) R v2.14.2 and R v2.15.0 alpha, '.conflicts.OK' is not exported and hence not seen by library(). DETAILS: In R-devel thread '[Rd] Suggestion: Not having to export .conflicts.OK in name spaces' on Mar 17-22, 2010 [https://stat.ethz.ch/pipermail/r-devel/2010-March/057017.html] it was discussed that one had to export '.conflicts.OK' in the namespace,
2007 Dec 17
1
Crashes with spelling enabled and perl.
Hi Guys, Here's a simple test case that causes a segfault with the perl bindings patched to enable spelling correction: use strict; use warnings; use Search::Xapian; my $db = Search::Xapian::WritableDatabase->new("test.db", Search::Xapian::DB_CREATE_OR_OPEN); if (!defined($db)) { die("Failed to open xapian_database: $!"); } my $indexer =
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step is to start forming document vectors that are reduced and more useful. This mainly helps save run time (since the time for distance calculation depends on the number of terms). Getting the useful terms within a
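The mail does not say how the "useful terms" will be chosen. The following sketch therefore assumes one common option: keep only the k highest-weighted terms (for example by tf-idf) of each sparse document vector. It also shows why pruning saves time. The squared Euclidean distance below visits each stored term, so its cost grows with the number of terms kept. TermVector, prune_top_k and squared_distance are hypothetical names, not part of Xapian's clustering API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sparse document vector: term -> weight (e.g. tf-idf). Assumed layout.
using TermVector = std::map<std::string, double>;

// Keep only the k highest-weighted terms (tie order is unspecified).
TermVector prune_top_k(const TermVector& doc, std::size_t k) {
    std::vector<std::pair<std::string, double>> entries(doc.begin(), doc.end());
    std::sort(entries.begin(), entries.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    if (entries.size() > k) entries.resize(k);
    return TermVector(entries.begin(), entries.end());
}

// Squared Euclidean distance between two sparse vectors. Each stored
// term of both vectors is visited once, so fewer terms means less work.
double squared_distance(const TermVector& a, const TermVector& b) {
    double sum = 0.0;
    for (const auto& [term, w] : a) {
        auto it = b.find(term);
        double d = w - (it == b.end() ? 0.0 : it->second);
        sum += d * d;
    }
    for (const auto& [term, w] : b) {
        if (a.find(term) == a.end()) sum += w * w;  // term only in b
    }
    return sum;
}
```

Pruning trades some accuracy for speed: terms dropped from a vector are treated as having weight zero in every distance computation.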
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian)Text
Hi, I bumped into a serious issue while trying to analyse some texts in Bulgarian language (with the tm package). I import a tab-separated csv file, which holds a total of 22 variables, most of which are text cells (not factors), using the read.delim function: data<-read.delim("bigcompanies_ascii.csv", header=TRUE, quote="'",
2012 Jun 04
1
Search not finding queries with stop words.
I have a search in perl that looks a bit like: my $qp = new Search::Xapian::QueryParser(); $qp->set_stemmer(new Search::Xapian::Stem("english")); $qp->set_stemming_strategy(STEM_SOME); $qp->set_default_op($defaultop); ... my $par = $qp->parse_query($query); my $enq = $xDatabase->enquire( $par ); and in the db create script: my $stopper =
2007 Jan 19
9
Double-quoted query with "and" fails.
Hi, We're using Ferret 0.9.4 and we've observed the following behavior. Searching for 'fieldname: foo and bar' works fine while 'fieldname: "foo and bar"' doesn't return any results. Is there a way to make ferret recognize the 'and' inside the query as a search term and not an operator? (I hope I got the
2007 Jun 13
2
winbind idmap customization
I would like to have winbind map all of my AD users to their full user@REALM form on the Linux domain members. I'd like lookups to be properly canonical. Is this possible? 'getent passwd user' should return: user@REALM.NET:*:1786588783:1786588745:Mr Man:/home/whatever:/bin/bash I'm finding my options are to either have the local names be plain, unprefixed, or prefixed, but