similar to: Getting non-stemmed terms from IndexReader

Displaying 20 results from an estimated 2000 matches similar to: "Getting non-stemmed terms from IndexReader"

2007 Apr 09
5
IndexReader#terms for all fields?
Is it possible to query the index for a TermEnum for all fields in the index instead of just ? Thanks, John
2007 Apr 28
6
Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in, besides iterating through every document with TermDocEnum? -- Best regards, Stian Grytøyr
2008 Jan 06
3
Did you mean ...? with act_as_ferret
Hello, does anybody know how to implement a "Did you mean ...?" like Google with act_as_ferret? I think this is a possible way: 1. Generate a keyword-list (this is my difficulty. I don't know how to build such a list from the index) with no stop-words from the first index. e.g. (car, ship, plant, house) 2. Build a second index from this word-list where we store the word in
2007 Dec 05
2
Term frequency doesn't decrement after document is deleted.
Hey all, The frequency count returned by my ferret reader doesn't decrement after I remove a document with those terms. Using the example from http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html the frequency increments after a document is added but stays the same after a document is deleted. index.reader.terms(:tags).each do |term, freq| "#{term} appears
2008 Mar 27
2
Proper noun stemming
Hi All, I was wondering if anyone had a solution for the following problem. I use QueryParser to stem my documents before adding them to a database. During the stemming process I would like to find a way of keeping proper nouns that span two or more words together as a phrase. For example "New York" or "Gordon Brown" or "Prime Minister" get split up. I see
2006 Sep 09
3
Per field analyzer
Is there a way to add a per-field analyzer? I can't seem to find a way to do that. Thanks -- Kent --- http://www.datanoise.com
2007 Apr 03
2
How can I count frequency of terms in a document?
Hi, there. I need some help. Is there a way to count frequencies of terms in a document on Ferret? I know that Ferret has IndexReader#terms_docs_for method which counts all documents. I need to count frequencies of terms in a specific document. Some way?? -- Posted via http://www.ruby-forum.com/.
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a > few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. > It's not available in the 1.2 series, but as I understand it should work out of the > box in 1.3.3. I tried it, this approach returns snippets that have nothing to do with the search string. Moreover, it takes too
2006 Jun 14
3
In memory IndexReader bug?
Hi All, Hope all is going well. I'm having trouble with the following code creating an in memory index reader - it seems to be attempting to read from a file regardless. Here's the simple code: require 'rubygems' require 'ferret' a = Ferret::Index::Index.new r = Ferret::Index::IndexReader.new(nil) Running the code on my OS X machine
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a
2011 May 27
1
Does OP_NEAR works with stemming?
Hi All, I used the OP_NEAR operator for queryparser, and when I searched for "apple store" from my own collection, the query is parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex Currently Available Items ========================= 1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing. 2) Add the document's last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0) a. Currently I store the timestamp
2007 Apr 14
3
Error on optimize leads to corrupt index?
The following exception occurred while trying to optimize a large index: vendor/gems/rdig-0.3.4/lib/rdig/index.rb:46:in `optimize': End-of-File Error occured at <except.c>:93 in xraise (EOFError) Error occured in store.c:216 - is_refill current pos = 0, file length = 0 Now, I get the following error any time I try to create a new index on the directory that I was trying
2007 Feb 20
10
ferret webpage down
The ferret webpage at http://ferret.davebalmain.com/ has been down for a number of days. Any idea what's going on? or how to notify the webmaster? -- Posted via http://www.ruby-forum.com/.
2007 Jul 26
1
doubts in ferret
I am using ferret to build a search application for my site. I used a stemming analyzer to build the index. When I searched "market" I get hits, but on searching "marketing" I get no hits, while there are fields containing the word marketing. I am using the stemming analyzer even while searching. Is the problem with the analyzer? Or am I missing out something
2007 Mar 28
6
trouble with PerFieldAnalyzer
I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14). Script: require 'rubygems' require 'ferret' require 'pp' include Ferret::Analysis include Ferret::Index class TestAnalyzer def token_stream field, input pp field pp input LetterTokenizer.new(input) end end pfa =
2006 Feb 17
1
IndexReader NotImplemented
Hi there, Sorry if this has come up before, but I couldn't see it obviously addressed anywhere. There are a few methods in IndexReader that raise NotImplementedErrors. I'm specifically interested in get_term_vector, but there are a number of others. Is there anything specific holding these back, or would patches to implement them be accepted? Thanks, -- Alex
2007 Apr 08
10
Ferret and non latin characters support
I've successfully installed ferret and acts_as_ferret and have no problem with utf-8 for accented characters. It returns correct results for e.g. français. My problem is with non latin characters (Persian indeed). I have tested different locales with no success both on Debian and Mac. Any idea? (ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6) -- Posted via http://www.ruby-forum.com/.
2006 Feb 28
2
Most Popular Searches
Hi, I have an index where each document contains an untokenized 'url' field. I would like to query the index for the most popular urls. In SQL I would do this via a Group By clause. Is there anything in Ferret that will do something similar? I found this discussion that proposed a solution involving TermEnums:
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
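
The frequency-based idea in the last post can be sketched in plain Ruby. This is only an illustration under stated assumptions: `stopword_candidates`, `top_n`, `min_doc_ratio`, and the sample texts are all hypothetical names and data, the tokenizer is a simple regex, and nothing here calls the Ferret or Xapian APIs. It ranks terms by document frequency (how many documents contain the term) and keeps only those that appear in at least a given share of the documents.

```ruby
# Hypothetical helper (not part of Ferret or Xapian): rank terms by the
# number of documents they occur in, as a rough list of stopword candidates.
def stopword_candidates(docs, top_n: 10, min_doc_ratio: 0.5)
  doc_freq = Hash.new(0)

  docs.each do |text|
    # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
    # uniq ensures each term is counted once per document.
    text.downcase.scan(/[a-z']+/).uniq.each { |term| doc_freq[term] += 1 }
  end

  # Keep terms that occur in at least min_doc_ratio of all documents.
  threshold = docs.size * min_doc_ratio

  doc_freq
    .select { |_term, df| df >= threshold }
    .sort_by { |term, df| [-df, term] } # most frequent first, ties by term
    .first(top_n)
end

# Made-up sample data for illustration only.
docs = [
  "the cat sat on the mat",
  "the dog chased the cat",
  "a bird sang in the tree"
]

stopword_candidates(docs, top_n: 3).each do |term, df|
  puts "#{term}: #{df}/#{docs.size} documents"
end
```

A ranking like this only proposes candidates. As the post notes, which words are safe to blacklist depends on the kind of documents being indexed, so the list would still need a manual review.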