Displaying 20 results from an estimated 2000 matches similar to: "Getting non-stemmed terms from IndexReader"
2007 Apr 09
5
IndexReader#terms for all fields?
Is it possible to query the index for a TermEnum for all fields in
the index instead of just ?
Thanks,
John
2007 Apr 28
6
Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in,
besides iterating through every document with TermDocEnum?
--
Best regards,
Stian Grytøyr
2008 Jan 06
3
Did you mean ...? with acts_as_ferret
Hello,
does anybody know how to implement a "Did you mean ...?" like Google
with acts_as_ferret?
I think this is a possible way:
1. Generate a keyword-list (this is my difficulty. I don't know how to
build such a list from the index) with no stop-words from the first
index.
e. g. (car, ship, plant, house)
2. Build a second index from this word-list where we store the word in
2007 Dec 05
2
Term frequency doesn't decrement after document is deleted.
Hey all,
The frequency count returned by my ferret reader doesn't decrement
after I remove a document with those terms. Using the example from
http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html
the frequency increments after a document is added but stays the same
after a document is deleted.
index.reader.terms(:tags).each do |term, freq|
"#{term} appears
2008 Mar 27
2
Proper noun stemming
Hi All
I was wondering if anyone had a solution for the following problem.
I use QueryParser to stem my documents before adding them to a
database. During the stemming process I would like to find a way of
keeping proper nouns that span two or more words together as a phrase.
For example "New York" or "Gordon Brown" or "Prime Minister" get split
up. I see
2006 Sep 09
3
Per field analyzer
Is there a way to add per-field analyzer? I can't seem to find a way to do that.
Thanks
--
Kent
---
http://www.datanoise.com
2007 Apr 03
2
How can I count frequency of terms in a document?
Hi, there.
I need some help.
Is there a way to count frequencies of terms in a document on Ferret?
I know that Ferret has IndexReader#terms_docs_for method which counts
all documents.
I need to count frequencies of terms in a specific document.
Some way??
--
Posted via http://www.ruby-forum.com/.
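As a rough illustration of what counting term frequencies in a single document means, here is a minimal plain-Ruby sketch. It does not use any Ferret API: `term_frequencies` is a hypothetical helper, and the lowercase-and-split tokenization is an assumption standing in for whatever analyzer the index actually uses.

```ruby
# Hypothetical helper: counts how often each term occurs in one document's text.
# Plain Ruby only; no Ferret calls are made here.
def term_frequencies(text)
  counts = Hash.new(0)
  # Assumed tokenization: lowercase, then take runs of ASCII letters and digits.
  text.downcase.scan(/[a-z0-9]+/) { |term| counts[term] += 1 }
  counts
end

freqs = term_frequencies("The ferret index stores the terms of the document")

# Print terms by descending count, ties broken alphabetically.
freqs.sort_by { |term, count| [-count, term] }.each do |term, count|
  puts "#{term}: #{count}"
end
```

If the index was built with a stemming or stop-word analyzer, the counts from a sketch like this will not match the indexed terms, so the same analysis would have to be applied to the text first.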
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a
> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
> It's not available in the 1.2 series, but as I understand it should work out of the
> box in 1.3.3.
I tried it, but this approach returns snippets that have nothing to do with the search string. Moreover, it takes too
2006 Jun 14
3
In memory IndexReader bug?
Hi All,
Hope all is going well.
I'm having trouble with the following code creating an in memory index
reader - it seems to be attempting to read from a file regardless.
Here's the simple code:
require 'rubygems'
require 'ferret'
a = Ferret::Index::Index.new
r = Ferret::Index::IndexReader.new(nil)
Running the code on my OS X machine
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step going forward is to start with forming document vectors that
are reduced and more useful. This majorly helps in saving run time (since
time for distance calculation depends on number of terms). Getting the
useful terms within a
2011 May 27
1
Does OP_NEAR work with stemming?
Hi All,
I used the OP_NEAR operator for queryparser, and when I searched for "apple store" from my own collection, the query is parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex
Currently Available Items
=========================
1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during
indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental
indexing based on the timestamp and also sorting by date in omega (SORT=0)
a. Currently I store the timestamp
2007 Apr 14
3
Error on optimize leads to corrupt index?
The following exception occurred while trying optimize a large index:
vendor/gems/rdig-0.3.4/lib/rdig/index.rb:46:in `optimize': End-of-File Error occured at <except.c>:93 in xraise (EOFError)
Error occured in store.c:216 - is_refill
current pos = 0, file length = 0
Now, I get the following error any time I try to create a new index
on the directory that I was trying
2007 Feb 20
10
ferret webpage down
The ferret webpage at http://ferret.davebalmain.com/ has been down for a
number of days. Any idea what's going on? or how to notify the
webmaster?
--
Posted via http://www.ruby-forum.com/.
2007 Jul 26
1
doubts in ferret
I am using ferret to build a search application for my site. I used a stemming
analyzer to build the index. When I searched "market" I get hits, but on
searching "marketing" I get no hits, while there are fields containing the
word marketing. I am using the stemming analyzer even while searching. Is the
problem with the analyzer? Or am I missing out something
2007 Mar 28
6
trouble with PerFieldAnalyzer
I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14).
Script:
require 'rubygems'
require 'ferret'
require 'pp'
include Ferret::Analysis
include Ferret::Index
class TestAnalyzer
  def token_stream field, input
    pp field
    pp input
    LetterTokenizer.new(input)
  end
end
pfa =
2006 Feb 17
1
IndexReader NotImplemented
Hi there,
Sorry if this has come up before, but I couldn't see it obviously
addressed anywhere. There are a few methods in IndexReader that raise
NotImplementedErrors. I'm specifically interested in get_term_vector,
but there are a number of others. Is there anything specific holding
these back, or would patches to implement them be accepted?
Thanks,
--
Alex
2007 Apr 08
10
Ferret and non latin characters support
I've successfully installed ferret and acts_as_ferret and have no
problem with utf-8 for accented characters. It returns correct results
for e.g. français. My problem is with non latin characters (Persian
indeed). I have tested different locales with no success both on Debian
and Mac. Any idea?
(ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6)
--
Posted via http://www.ruby-forum.com/.
2006 Feb 28
2
Most Popular Searches
Hi,
I have an index where each document contains an untokenized 'url'
field. I would like to query the index for the most popular urls. In
SQL I would do this via a Group By clause. Is there anything in
Ferret that will do something similar?
I found this discussion that proposed a solution involving TermEnums:
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my