thr3ads.net - similar to: "Proper noun stemming"

Displaying 20 results from an estimated 7000 matches similar to: "Proper noun stemming"

2009 Mar 26

ideas on picking stopwords

I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my

Proposed changes to omindex

2006 Aug 11

Proposed changes to omindex

Proposed changes to omindex Currently Available Items ========================= 1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing. 2) Add the document?s last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0) a. Currently I store the timestamp

KMeans Clusterer - Going forward

2017 Jun 14

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a

TermGenerator and SimpleStopper

2007 Jun 28

TermGenerator and SimpleStopper

Hi, I'm using SimpleStopper with TermGenerator in a Python indexing script, in an attempt to keep my index size down (currently 30K per doc, and I have 200 million docs to index, which I think implies 6TB.) However, unprefixed (positional?) terms are not affected by the stopper, though Z-prefixed terms are. I assume this is intentional for phrase queries, but I need to reduce my

Does OP_NEAR works with stemming?

2011 May 27

Does OP_NEAR works with stemming?

Hi All, I used the OP_NEAR operator for queryparser, and when I searched for "apple store" from my own collection, the query is parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are showed. I'm not sure whether

how can i use stopwords?

2008 Mar 12

how can i use stopwords?

Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );

Get term from document by position

2015 Jul 26

Get term from document by position

> Snippet highlighting is something that was worked on for a GSoC project a > few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. > It?s not available in the 1.2 series, but as I understand it should work out of the > box in 1.3.3. I tried it, this approach returns snippet that have nothing to do with the search string. Moreover, it takes too

Stopword addition and stemming

2010 Nov 15

Stopword addition and stemming

Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

Getting non-stemmed terms from IndexReader

2007 Mar 04

Getting non-stemmed terms from IndexReader

I need to get a set of terms being indexed using Ferret. I used IndexReader.terms and it returns a list of TermEnum nicely. The only problem is that my analyzer includes a stemming filter. So now, the terms I''m getting back are all stemmed. Is there anyway to get the original unstemmed terms back from the index somehow? Thanks. -- Posted via http://www.ruby-forum.com/.

QueryParser prefixing terms when stemming?

2007 May 30

QueryParser prefixing terms when stemming?

I'm new to Xapian and we just recently upgraded to version 1.0.0.0. However, something seems to have changed during the upgrade and I need help figuring out how my code should be written. In version 0.9.9.1 of Search::Xapian, the following code results in this output "Xapian::Query(pet:(pos=1))". my $qp = new Search::Xapian::QueryParser; $qp->set_stemmer(new

Problem with stop words by indexing

2010 Apr 05

Problem with stop words by indexing

Hi, I try to remove stop words during the index process and I have no stemming. I have tried with a simple example but it does not work at all. I have my writableDatabase and my termGenerator (indexer) and they work well both together: I can index texts and search trough the database correctly. But if I add (before indexing my texts): Xapian::SimpleStopper stopper;

Get term from document by position

2015 Jul 26

Get term from document by position

mple (see attachment). > > Attachments get stripped out by the mailing list, so I?ve made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>. > > Actually, when I run it I get 0 matches, which would explain why you?re just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to

Empty results OMEGA with XAPIAN 1.0.1

2007 Jun 12

Empty results OMEGA with XAPIAN 1.0.1

Hi, I configured XAPIAN 1.0.1 and OMEGA 1.0.1. on my development machine (first removed the old ones). I recreated my databases (both quartz and flint) and tried to run original queries against the databases created by the new versions. I'm getting empty result sets from OMEGA. If I use the delve tool I actually see that the records are created fine. No log files are written as far as I

Problem with stop words by indexing

2010 May 27

Problem with stop words by indexing

Le jeu 15/04/10 02:36, "Olly Betts" olly at survex.com a ?crit: > On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote: > > I try to remove stop words during the index process > and I have no stemming. > I have tried with a simple example but it does not > work at all. > > > I have my writableDatabase and my termGenerator > (indexer) and they

How to beat Google aka Xapian & Natural Language Processing.

2007 Oct 01

How to beat Google aka Xapian & Natural Language Processing.

Xapians! If tomorrow Xapian search engine would achieved the same performance and result in searches as Google we would not be able to beat Google, because we would create only a copy of the searches that already exists from Google search engine. However there is a way to beat anyone, and there is a way to beat Google successfully as well just do not give up. Some see it as implementing Ajax, or

Ask for advice on exact requirements to fix #699 mixed CJK numbers

2019 Mar 07

Ask for advice on exact requirements to fix #699 mixed CJK numbers

I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch of Chinese for this ticket. Current code works well with special test cases and all tests in xapian-core could still pass. But I'm confused with exact requirements of the question, for how much we could pay with performance on enabling more cases, and if there are better methods to

TermGenerator incorrectly tokenizes German text which contains special characters

2010 Jun 09

TermGenerator incorrectly tokenizes German text which contains special characters

Dear Xapian users, I try to index some German text with Xapian using the xapian_php bindings. I run Apache 2.2 on Windows using PHP 5.2.13 with the pre build xapian bindings from Flax: Xapian Support enabled Xapian Compiled Version @PACKAGE_VERSION@ Xapian Linked Version 1.2.0 The problem is that after indexing text which contains special characters like ?, ?, ? and ?, using

Term-Flags

2007 Dec 29

Term-Flags

Hi, Is it necessary to set the down below flag to the TermGenerator, if I want the "Did you mean ..." spelling corrections? Xapian::TermGenerator::flags::FLAG_SPELLING Thank you very much Markus

Cannot index with dynamic spelling data (Perl/Search::Xapian)

2010 Oct 24

Cannot index with dynamic spelling data (Perl/Search::Xapian)

This is my test case, what am I doing wrong? It seems that the API is used incorrectly, but I cannot find the problem... --- 8< --- #!/usr/bin/perl use Search::Xapian qw(:all); use strict; my $xa = new Search::Xapian::WritableDatabase ("/tmp/xapian", DB_CREATE_OR_OVERWRITE); my $indexer = Search::Xapian::TermGenerator->new();

Searching using prefixes

2011 Jul 27

Searching using prefixes

Hi guys I'm trying to figure out how I can use probabilistic searching on a given field within a document; I've written to the list about this before, but haven't quite figured out what's required and, following a little research, I think I understand what I need to do but I'd like a clarification on this. o We have a database of a number of documents, with fields: title,

similar to: Proper noun stemming