Displaying 20 results from an estimated 29 matches for "standardtokenizer".
2007 Aug 03
0
StandardTokenizer Doesn't Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html
I ought to be able to construct a StandardTokenizer like this:
t = StandardTokenizer.new( true) # true to downcase tokens
and then...
2006 Sep 05
15
ferret finds 'tests' but not 'test'
Hello all,
Quick question (possibly!) - I've got a few records indexed and doing a
search for 'test' results in no hits even though I know the word 'tests'
exists in the indexed field. Doing a search for 'tests' produces a
result. I would have thought that 'test' would match 'tests' but no such
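A minimal plain-Ruby sketch (no Ferret required) of the usual explanation: without stemming, 'test' and 'tests' are simply different terms, and stemming only helps if the same reduction is applied both when indexing and when querying. `toy_stem`, `analyze` and `matches?` are hypothetical helpers, not Ferret APIs, and `toy_stem` is a deliberately crude stand-in for a real stemmer like the one behind StemFilter.

```ruby
# Hypothetical crude stemmer: drop a trailing "s" (but not "ss") from
# words longer than three letters. Real stemmers handle far more cases.
def toy_stem(word)
  if word.length > 3 && word.end_with?("s") && !word.end_with?("ss")
    word[0..-2]
  else
    word
  end
end

# Hypothetical analyzer: lowercase, split into alphanumeric runs, optionally stem.
def analyze(text, stem:)
  tokens = text.downcase.scan(/[[:alnum:]]+/)
  stem ? tokens.map { |t| toy_stem(t) } : tokens
end

# A document matches if every analyzed query term is among its indexed terms.
def matches?(doc_text, query, stem_index:, stem_query:)
  doc_terms = analyze(doc_text, stem: stem_index)
  analyze(query, stem: stem_query).all? { |term| doc_terms.include?(term) }
end

doc = "Some tests were run"
p matches?(doc, "test", stem_index: false, stem_query: false) # false: "test" != "tests"
p matches?(doc, "test", stem_index: true, stem_query: true)   # true: both sides give "test"
p matches?(doc, "tests", stem_index: true, stem_query: false) # false: query left unstemmed
```

The third case is the trap: stemming only at index time breaks plural queries instead of fixing singular ones.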
2006 Sep 15
1
Custom analyzer not invoked?
...--------------------
require 'ferret'
include Ferret
class MyAnalyzer < Analysis::Analyzer
  def token_stream(field, str)
    # Display results of analysis
    puts 'Analyzing: field:%s str:%s' % [field, str]
    t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
    while (n = t.next)
      puts n.to_s
    end
    # Return a fresh stream, since the one above has been consumed.
    # "return" must share a line with its value, or the method returns nil.
    return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
  end
end
puts '== Adding document to index...'
index = Index::Index.new(:analyzer => MyAnalyzer.new(...
2007 Sep 07
5
Custom Analyser: where to put it?
...put my custom Analyser class,
like:
class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = FULL_GERMAN_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
                                  @stop_words), 'de')
  end
end
Any clue?
Thanks a lot
Guillaume.
--
Posted via http://www.ruby-forum.com/.
2007 Mar 06
1
case-sensitivity of analyzer
Is there anything about this analyzer that says "case-sensitive" to you?
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end
Just wondering how I can force my index to be case-insensitive.
Thanks,
-Adam
2007 Mar 01
4
Need help creating my own Filter in Ruby
...list to reach more people.
I'm using these filters together in my analyzer (with acts_as_ferret
+ Ferret 0.11.1).
HyphenFilter.new(
  StopFilter.new(
    LowerCaseFilter.new(
      MappingFilter.new(
        StandardTokenizer.new(str),
        mapping)),
    FULL_FRENCH_STOP_WORDS + FULL_ENGLISH_STOP_WORDS)
)
The mapping filter maps pretty much all the French accents to the
letter without the accent. So far so good.
Only thing missing for what I want to do: I need to be able to make
the w...
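A MappingFilter like the one described above is a token filter, so it rewrites each token as it passes through. A plain-Ruby sketch of the same accent-folding idea, assuming lowercase input and single-character mappings only. `ACCENT_MAPPING`, `CHAR_MAP`, `ACCENT_PATTERN` and `fold_accents` are hypothetical names; the table just borrows the "array of characters => replacement" shape of the MAPPING hashes quoted elsewhere in these threads.

```ruby
# Hypothetical accent-folding table, shaped like the MAPPING hashes in these
# threads: an array of source characters => the replacement string.
ACCENT_MAPPING = {
  %w[à á â ã ä] => "a",
  %w[è é ê ë]   => "e",
  %w[ì í î ï]   => "i",
  %w[ò ó ô õ ö] => "o",
  %w[ù ú û ü]   => "u",
  %w[ç]         => "c"
}.freeze

# Flatten into a single character => replacement lookup table.
CHAR_MAP = {}
ACCENT_MAPPING.each do |chars, replacement|
  chars.each { |c| CHAR_MAP[c] = replacement }
end
ACCENT_PATTERN = Regexp.union(CHAR_MAP.keys)

# Replace every mapped character in one token; other characters pass through.
def fold_accents(token)
  token.gsub(ACCENT_PATTERN, CHAR_MAP)
end

p fold_accents("café")     # "cafe"
p fold_accents("français") # "francais"
```

Uppercase accented letters are not in this table, so either extend it or make sure the text is lowercased before folding.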
2007 Jan 11
5
stop words in query
...I'm using AAF and the following custom analyzer:
class StemmedAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
                                  @stop_words))
  end
end
However when my search term includes a stop word I never get any results
back. Once I remove the stop word I get the normal results back. Do I
need to do a search of my query for stop words and remove them myself?
Or is there something I'm doing wrong wit...
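Analyzers like the one above nest each filter around the previous stream. A plain-Ruby sketch of that pattern shows two things relevant to the stop-word question: text run through the chain loses its stop words, so a query analyzed the same way loses them too, and the order of the filters matters. The `Toy*` classes, `TOY_STOP_WORDS` and `toy_analyze` are hypothetical stand-ins, not Ferret classes, and the sketch does not model how acts_as_ferret builds its queries.

```ruby
# Hypothetical stand-ins for a tokenizer and two filters. Each filter wraps an
# upstream stream, the same nesting as
# StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), ...)).
class ToyTokenizer
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yield runs of letters/digits as tokens.
  def each(&block)
    @text.scan(/[[:alnum:]]+/).each(&block)
  end
end

class ToyLowerCaseFilter
  include Enumerable

  def initialize(input)
    @input = input
  end

  def each
    @input.each { |tok| yield tok.downcase }
  end
end

class ToyStopFilter
  include Enumerable

  def initialize(input, stop_words)
    @input = input
    @stop_words = stop_words
  end

  # Drop tokens found in the stop list (exact, case-sensitive match).
  def each
    @input.each { |tok| yield tok unless @stop_words.include?(tok) }
  end
end

TOY_STOP_WORDS = %w[the a an of].freeze

# Same chain for documents and queries: lowercase first, then stop words.
def toy_analyze(text)
  ToyStopFilter.new(ToyLowerCaseFilter.new(ToyTokenizer.new(text)), TOY_STOP_WORDS).to_a
end

p toy_analyze("The Art of Computing") # ["art", "computing"]
p toy_analyze("the art")              # ["art"]: the stop word is dropped from the query too

# Reversed order: the stop list is lowercase, so "The" slips through.
p ToyLowerCaseFilter.new(ToyStopFilter.new(ToyTokenizer.new("The art"), TOY_STOP_WORDS)).to_a
# ["the", "art"]
```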
2007 Jan 21
2
A few questions: Tweaking StemFilter, indexes, ...
...to figure out after
messing around with ferret and going through the documentation.
StemFilter ------
I am trying to improve the quality of my searches in context of the
content of my application. I have created an analyzer using the
following:
StemFilter.new StopFilter.new(
LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words )
This has been pretty good so far; however, I really would like to get
a search for "plumber" to match "plumbing" at maybe a lower score than it
would match "plumbers". The thing is that plumber(s) is filtered to
"plumber" and plumbing...
2007 Nov 09
2
Problem with stemming and AAF
...ms'
require 'ferret'
class StemmedAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
                                  @stop_words))
  end
end
And added the call to the analyzer in my model file:
acts_as_ferret( :fields => { :name => { :boost => 1,
:store => :yes },
:product_number => { :boost => 2 },
:de...
2007 May 23
6
Accented characters
...?','?'] => 'y',
    ['?','?','?'] => 'z'
  }
  def token_stream(field, string)
    return MappingFilter.new(StandardTokenizer.new(string), MAPPING)
  end
end
And inserted this code at the end of environment.rb.
In my model:
acts_as_ferret({ :fields => [ 'name' ] }, :analyzer =>
PortugueseAnalyzer.new)
But this did not work...
Can someone tell me what I did wrong?
Thanks
Marcello...
2007 Nov 13
8
acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
...=> {:store => :yes}} },
{:analyzer => PlainAsciiAnalyzer.new}
)
end
ANALYZER
lib: plain_ascii_analyzer.rb
class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer
  include ::Ferret::Analysis
  def token_stream(field, str)
    StopFilter.new(
      StandardTokenizer.new(str),
      ["fax", "gsm"]
    )
    # raise   <<<----- is never executed when uncommented!!
  end
end
In the console, I rebuild the index and search for a stop word, and I get
results when I should not:
>> reload!; AccessPointKind2.r...
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work
quite nicely. However, my fields are very short (music titles) and I
don't think many users will be typing in apostrophes when they are
looking for something. Right now, for a simple document such as "what
i've done" I'd like it to be indexed as "what ive done" instead. Right
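One way to get "what i've done" indexed as "what ive done" is a filter that deletes apostrophes inside each token, applied at both index and query time so that a typed "ive" and a typed "i've" end up as the same term. A plain-Ruby sketch, assuming the tokenizer keeps the apostrophe inside the token (this sketch's own `toy_tokens` does; whether Ferret's StandardTokenizer does is not established here). `toy_tokens` and `strip_apostrophes` are hypothetical helpers, not Ferret APIs.

```ruby
# Straight and curly apostrophes.
APOSTROPHES = /['\u2019]/

# Hypothetical tokenizer: lowercase, keep apostrophes inside tokens.
def toy_tokens(text)
  text.downcase.scan(/[[:alnum:]'\u2019]+/)
end

# Hypothetical filter: remove apostrophes from each token, dropping any
# token that was nothing but apostrophes.
def strip_apostrophes(tokens)
  tokens.map { |t| t.gsub(APOSTROPHES, "") }.reject(&:empty?)
end

p strip_apostrophes(toy_tokens("What I've Done")) # ["what", "ive", "done"]
p strip_apostrophes(toy_tokens("what ive done"))  # ["what", "ive", "done"]
```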
2006 Dec 08
4
Using custom stem analyzer giving mongrel errors
...'rubygems'
require 'ferret'
include Ferret
module Ferret::Analysis
  class FerretAnalyzer
    def initialize(stop_words = FULL_ENGLISH_STOP_WORDS)
      @stop_words = stop_words
    end
    def token_stream(field, text)
      StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)),
                                    @stop_words))
    end
  end
end
and I'm simply setting the :analyzer option in AAF.
However, I get odd behavior. The first search that I do will go through
and display the proper results, but any subsequent request starts to
produce odd behavior. For example when you are redi...
2009 Apr 09
4
Weird analyzer issue with the word 'fly'
...nalyzer.new,
:fields => {:name => { :boost => 2.0 },
...
}})
And this analyzer is defined in a module thus:
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end
Now, here's a search without using the analyzer:
>> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size
=> 14
And with the analyzer:
>> TeachingObject.find_with_ferret("flea fly", :per_page => 2000,...
2006 Dec 06
10
Stem Analyzer
Hi all,
I am trying to implement a search that will use the Stem Analyzer. I
added the Stem Analyzer from the examples shown in another post
http://ruby-forum.com/topic/80178#147014
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end
The problem with the Stem analyzer is that when I search for a term such
as 'engineering', it only matches whole words that fit the stem so the
only results I get back are documents where 'engin' is a whole word
(i.e. I don'...
2007 Sep 20
5
Ferret DRB, UTF-8, Mongrel
...?','?','?'] => 'y',
    ['?','?','?'] => 'z'
  }
  def token_stream(field, str)
    MappingFilter.new(StandardTokenizer.new(str), CHARACTER_MAPPINGS)
  end
end
I think Ferret is working fine... because when I run some tests, the
mapping filter correctly pulls out the accented characters... exactly as
it should.
However, when something is persisted via the model (acts_as_ferret and
DRB server), I get unexpected be...
2007 Jan 17
1
Tokenizers?
Hi everyone. First a quick word - I am relatively new to Ruby and Ruby
on Rails, but I love learning about it and using it. Currently I am
working on extending Boxroom (a file repository RoR app) for the CARE
Indonesia intranet, where I work as an intern. I am using ferret, and
it's working great.
I noticed that if a file contains something like this
"applications/entries", this
2006 Dec 06
1
AAF - Stem Analyzer
I'm not on AAF. Can someone else help Raymond with an example?
On 12/6/06, Raymond O'connor <nappin713 at yahoo.com> wrote:
>
> Matt Schnitz wrote:
> > You also need to stem-analyze the incoming query.
> >
> > I had this same problem. :^>
> >
> >
> > Schnitz
>
> Do you have an example of how to do this? I'm using
2006 Oct 19
2
How to deal with accentuated chars in 0.10.8?
I'm starting to use Ferret and acts_as_ferret.
I need to use something like EuropeanAnalyzer
(http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
For example, if the user searches for "gonzalez", they should find documents that
contain the term "gonzález".
The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
2006 Oct 20
0
Ferret 0.10.13 released
...', '?'] => 'e',
    ['?', '?', '?'] => 'u',
    ['?'] => 'c'
  }
  def token_stream(field, string)
    return MappingFilter.new(StandardTokenizer.new(string), MAPPING)
  end
end
Happy Ferreting and check the Ferret homepage[1] if you are able to contribute.
Cheers,
Dave
[1] http://ferret.davebalmain.com/trac/