similar to: Proper noun stemming

Displaying 20 results from an estimated 7000 matches similar to: "Proper noun stemming"

2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex Currently Available Items ========================= 1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing. 2) Add the document?s last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0) a. Currently I store the timestamp
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi, I'm using SimpleStopper with TermGenerator in a Python indexing script, in an attempt to keep my index size down (currently 30K per doc, and I have 200 million docs to index, which I think implies 6TB.) However, unprefixed (positional?) terms are not affected by the stopper, though Z-prefixed terms are. I assume this is intentional for phrase queries, but I need to reduce my
2011 May 27
1
Does OP_NEAR works with stemming?
Hi All, I used the OP_NEAR operator for queryparser, and when I searched for "apple store" from my own collection, the query is parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are showed. I'm not sure whether
2008 Mar 12
1
how can i use stopwords?
Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a > few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. > It?s not available in the 1.2 series, but as I understand it should work out of the > box in 1.3.3. I tried it, this approach returns snippet that have nothing to do with the search string. Moreover, it takes too
2010 Nov 15
4
Stopword addition and stemming
Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it
2007 Mar 04
5
Getting non-stemmed terms from IndexReader
I need to get a set of terms being indexed using Ferret. I used IndexReader.terms and it returns a list of TermEnum nicely. The only problem is that my analyzer includes a stemming filter. So now, the terms I''m getting back are all stemmed. Is there anyway to get the original unstemmed terms back from the index somehow? Thanks. -- Posted via http://www.ruby-forum.com/.
2007 May 30
1
QueryParser prefixing terms when stemming?
I'm new to Xapian and we just recently upgraded to version 1.0.0.0. However, something seems to have changed during the upgrade and I need help figuring out how my code should be written. In version 0.9.9.1 of Search::Xapian, the following code results in this output "Xapian::Query(pet:(pos=1))". my $qp = new Search::Xapian::QueryParser; $qp->set_stemmer(new
2010 Apr 05
1
Problem with stop words by indexing
Hi, I try to remove stop words during the index process and I have no stemming. I have tried with a simple example but it does not work at all. I have my writableDatabase and my termGenerator (indexer) and they work well both together: I can index texts and search trough the database correctly. But if I add (before indexing my texts): Xapian::SimpleStopper stopper;
2015 Jul 26
1
Get term from document by position
mple (see attachment). > > Attachments get stripped out by the mailing list, so I?ve made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>. > > Actually, when I run it I get 0 matches, which would explain why you?re just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to
2007 Jun 12
1
Empty results OMEGA with XAPIAN 1.0.1
Hi, I configured XAPIAN 1.0.1 and OMEGA 1.0.1. on my development machine (first removed the old ones). I recreated my databases (both quartz and flint) and tried to run original queries against the databases created by the new versions. I'm getting empty result sets from OMEGA. If I use the delve tool I actually see that the records are created fine. No log files are written as far as I
2010 May 27
1
Problem with stop words by indexing
Le jeu 15/04/10 02:36, "Olly Betts" olly at survex.com a ?crit: > On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote: > > I try to remove stop words during the index process > and I have no stemming. > I have tried with a simple example but it does not > work at all. > > > I have my writableDatabase and my termGenerator > (indexer) and they
2007 Oct 01
3
How to beat Google aka Xapian & Natural Language Processing.
Xapians! If tomorrow Xapian search engine would achieved the same performance and result in searches as Google we would not be able to beat Google, because we would create only a copy of the searches that already exists from Google search engine. However there is a way to beat anyone, and there is a way to beat Google successfully as well just do not give up. Some see it as implementing Ajax, or
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch of Chinese for this ticket. Current code works well with special test cases and all tests in xapian-core could still pass. But I'm confused with exact requirements of the question, for how much we could pay with performance on enabling more cases, and if there are better methods to
2010 Jun 09
1
TermGenerator incorrectly tokenizes German text which contains special characters
Dear Xapian users, I try to index some German text with Xapian using the xapian_php bindings. I run Apache 2.2 on Windows using PHP 5.2.13 with the pre build xapian bindings from Flax: Xapian Support enabled Xapian Compiled Version @PACKAGE_VERSION@ Xapian Linked Version 1.2.0 The problem is that after indexing text which contains special characters like ?, ?, ? and ?, using
2007 Dec 29
3
Term-Flags
Hi, Is it necessary to set the down below flag to the TermGenerator, if I want the "Did you mean ..." spelling corrections? Xapian::TermGenerator::flags::FLAG_SPELLING Thank you very much Markus
2010 Oct 24
1
Cannot index with dynamic spelling data (Perl/Search::Xapian)
This is my test case, what am I doing wrong? It seems that the API is used incorrectly, but I cannot find the problem... --- 8< --- #!/usr/bin/perl use Search::Xapian qw(:all); use strict; my $xa = new Search::Xapian::WritableDatabase ("/tmp/xapian", DB_CREATE_OR_OVERWRITE); my $indexer = Search::Xapian::TermGenerator->new();
2011 Jul 27
3
Searching using prefixes
Hi guys I'm trying to figure out how I can use probabilistic searching on a given field within a document; I've written to the list about this before, but haven't quite figured out what's required and, following a little research, I think I understand what I need to do but I'd like a clarification on this. o We have a database of a number of documents, with fields: title,