thr3ads.net - search: "standardtokenizer"

Displaying 20 results from an estimated 29 matches for "standardtokenizer".

StandardTokenizer Doesn't Support token_stream method

2007 Aug 03

According to the Analyzer doc and the StandardTokenizer doc: http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new( true) # true to downcase tokens and then...

ferret finds 'tests' but not 'test'

2006 Sep 05

Hello all, Quick question (possibly!) - I've got a few records indexed and doing a search for 'test' reports no hits even though I know the word 'tests' exists in the indexed field. Doing a search for 'tests' produces a result. I would have thought that 'test' would match 'tests' but no such

Custom analyzer not invoked?

2006 Sep 15

...-------------------- require 'ferret' include Ferret class MyAnalyzer < Analysis::Analyzer def token_stream(field, str) # Display results of analysis puts 'Analyzing: field:%s str:%s' % [field, str] t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str)) while true n = t.next() break if n == nil puts n.to_s end return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str)) end end puts '== Adding document to index...' index = Index::Index.new(:analyzer => MyAnalyzer.new(...

Custom Analyser .. where to put it ??

2007 Sep 07

...put my custom Analyser class like : class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = FULL_GERMAN_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words), 'de') end end Any clue ? Thanks a lot Guillaume. -- Posted via http://www.ruby-forum.com/.

case-sensitivity of analyzer

2007 Mar 06

Is there anything about this analyzer that says "case-sensitive" to you? module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Just wondering how I can force my index to be case-insensitive. Thanks, -Adam -- Posted via http://www.ruby-forum.com/.
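
The analyzer quoted above chains only StandardTokenizer and StemFilter, with no lowercasing step, whereas several other results on this page wrap the tokenizer in LowerCaseFilter first. The sketch below is a dependency-free illustration of why that step matters. `tokenize` and `lowercase` are hypothetical stand-ins written for this example, not Ferret classes.

```ruby
# Toy stand-ins for a tokenizer and a lowercase filter (hypothetical helpers,
# not part of Ferret). They only illustrate the effect of a lowercasing step.
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

def lowercase(tokens)
  tokens.map(&:downcase)
end

indexed_without = tokenize("Ferret Search Library")
indexed_with    = lowercase(tokenize("Ferret Search Library"))
query           = lowercase(tokenize("ferret"))

# Without lowercasing at index time, the lowercased query term has no match.
p(indexed_without & query)
# With the same filter applied on both sides, the term matches.
p(indexed_with & query)
```

The same filter has to run on both the indexed text and the query; lowercasing only one side reproduces the mismatch.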

Need help creating my own Filter in Ruby

2007 Mar 01

...list to reach more people. I'm using these filters together in my analyzer (with acts_as_ferret + Ferret 0.11.1). HyphenFilter.new( StopFilter.new( LowerCaseFilter.new( MappingFilter.new( StandardTokenizer.new(str), mapping)), FULL_FRENCH_STOP_WORDS + FULL_ENGLISH_STOP_WORDS) ) The mapping filter maps pretty much all the French accents to the letter without the accent. So far so good. Only thing missing for what I want to do: I need to be able to make the w...

stop words in query

2007 Jan 11

...I'm using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when my search term includes a stop word I never get any results back. Once I remove the stop word I get the normal results back. Do I need to do a search of my query for stop words and remove them myself? Or is there something I'm doing wrong wit...

A few questions: Tweaking StemFilter, indexes, ...

2007 Jan 21

...to figure out after messing around with ferret and going through the documentation. StemFilter ------ I am trying to improve the quality of my searches in context of the content of my application. I have created an analyzer using the following: StemFilter.new StopFilter.new( LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words ) This has been pretty good so far, however, I really would like to get a search for "plumber" match "plumbing" at maybe a lower score than it would match "plumbers". The thing is that plumber(s) is filtered to "plumber" and plumbing...

Problem with stemming and AAF

2007 Nov 09

...ms' require 'ferret' class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end end And added the call to the analyzer in my model file: acts_as_ferret( :fields => { :name => { :boost => 1, :store => :yes }, :product_number => { :boost => 2 }, :de...

Accented characters

2007 May 23

...?','?'] => 'y', ['?','?','?'] => 'z' } def token_stream(field, string) return MappingFilter.new(StandardTokenizer.new(string), MAPPING) end end And inserted this code at the end of environment.rb. In my model: acts_as_ferret({ :fields => [ 'name' ] }, :analyzer => PortugueseAnalyzer.new) But this did not work.... Can someone tell me what I did wrong ???? Thanks Marcello...

acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)

2007 Nov 13

...> {:store => :yes}} } , {:analyzer => PlainAsciiAnalyzer.new} ) end ANALYZER lib : plain_ascii_analyzer.rb class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer include ::Ferret::Analysis def token_stream(field, str) StopFilter.new( StandardTokenizer.new(str) , ["fax", "gsm"] ) # raise <<<----- is never executed when uncommented !! end end In the console, I rebuild the index + search for a stop word => I get results, when I should not : >> reload!; AccessPointKind2.r...

Ignore apostrophes in words

2007 Jun 25

Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don't think many users will be typing in apostrophes when they are looking for something. Right now, for a simple document such as "what i've done" I'd like it to be indexed as "what ive done" instead. Right
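
The normalisation asked for here ("what i've done" indexed as "what ive done") can be sketched in plain Ruby. `strip_apostrophes` is a hypothetical helper written for this example, not a Ferret class or filter; it assumes the text is cleaned before it reaches the tokenizer.

```ruby
# Hypothetical pre-tokenization step (plain Ruby, not a Ferret API):
# drop ASCII and typographic apostrophes so "i've" becomes "ive".
def strip_apostrophes(text)
  text.delete("'\u2019")
end

puts strip_apostrophes("what i've done")
```

For searches to match, the same step would have to be applied to the query string as well as to the indexed titles.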

Using custom stem analyzer giving mongrel errors

2006 Dec 08

...'rubygems' require 'ferret' include Ferret module Ferret::Analysis class FerretAnalyzer def initialize(stop_words = FULL_ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, text) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words)) end end end and I'm simply setting the :analyzer option in AAF. However, I get odd behavior. The first search that I do will go through and display the proper results, but any subsequent request starts to produce odd behavior. For example when you are redi...

Weird analyzer issue with the word 'fly'

2009 Apr 09

...nalyzer.new, :fields => {:name => { :boost => 2.0 }, ... }}) And this analyzer is defined in a module thus: module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Now, here's a search without using the analyzer: >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size => 14 And with the analyzer: >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000,...

Stem Analyzer

2006 Dec 06

Hi all, I am trying to implement a search that will use the Stem Analyzer. I added the Stem Analyzer from the examples shown in another post http://ruby-forum.com/topic/80178#147014 module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end The problem with the Stem analyzer is that when I search for a term such as 'engineering', it only matches whole words that fit the stem so the only results I get back are documents where 'engin' is a whole word (i.e. I don'...

Ferret DRB, UTF-8, Mongrel

2007 Sep 20

...?','?','?'] => 'y', ['?','?','?'] => 'z' } def token_stream(field, str) MappingFilter.new(StandardTokenizer.new(str), CHARACTER_MAPPINGS) end end I think Ferret is working fine... because when I run some tests, the mapping filter correctly pulls out the accented characters... exactly as it should. However, when something is persisted via the model (acts_as_ferret and DRB server), I get unexpected be...

Tokenizers?

2007 Jan 17

Hi everyone. First a quick word - I am relatively new to Ruby and Ruby on Rails, but I love learning about it and using it. Currently I am working on extending Boxroom (file repository RoR app) for the CARE Indonesia intranet, where I work as an intern. I am using ferret, and it's working great. I noticed that if a file contains something like this "applications/entries", this

AAF - Stem Analyzer

2006 Dec 06

I'm not on AAF. Can someone else help Raymond with an example? On 12/6/06, Raymond O'connor <nappin713 at yahoo.com> wrote: > > Matt Schnitz wrote: > > You also need to stem-analyze the incoming query. > > > > I had this same problem. :^> > > > > > > Schnitz > > Do you have an example of how to do this? I'm using
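
The advice quoted above, that the incoming query must be stem-analyzed as well, can be illustrated without Ferret. In this sketch `toy_stem` is a deliberately crude suffix stripper and `analyze` a hypothetical pipeline; neither is Ferret's StemFilter or a real stemming algorithm, and the sample text is made up.

```ruby
# Deliberately crude suffix stripper: a hypothetical stand-in for a real
# stemmer, good enough only to show index/query symmetry.
def toy_stem(word)
  word.downcase.sub(/(ing|s)\z/, "")
end

# Hypothetical analysis pipeline: split into words, then stem each one.
def analyze(text)
  text.scan(/[[:alnum:]]+/).map { |w| toy_stem(w) }
end

index_terms = analyze("Engineering tests")

# A raw, unanalyzed query term is not among the stemmed index terms.
p(index_terms.include?("tests"))
# The same query run through the same pipeline is fully covered.
p((analyze("tests") - index_terms).empty?)
```

Whatever stemmer is actually used, the point is the symmetry: the index and the query have to go through the same analysis chain.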

How to deal with accentuated chars in 0.10.8?

2006 Oct 19

I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", the search should find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
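
The accent folding described here (a search for "gonzalez" finding "gonzález") can be sketched in plain Ruby. `fold_accents` and the two character tables are assumptions made for this example; this is neither the EuropeanAnalyzer nor Ferret's MappingFilter, and the table covers only a handful of lowercase Latin characters.

```ruby
# Hypothetical accent-folding helper (plain Ruby, not a Ferret filter).
# The two strings are position-aligned: each accented character maps to
# the plain character at the same index.
ACCENTED = "áàâäéèêëíìîïóòôöúùûüñç"
PLAIN    = "aaaaeeeeiiiioooouuuunc"

def fold_accents(text)
  text.tr(ACCENTED, PLAIN)
end

puts fold_accents("gonzález")
```

Applying the same folding to both indexed text and queries is what lets the unaccented query match the accented term.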

Ferret 0.10.13 released

2006 Oct 20

...', '?'] => 'e', ['?', '?', '?'] => 'u', ['?'] => 'c' } def token_stream(field, string) return MappingFilter.new(StandardTokenizer.new(string), MAPPING) end end Happy Ferreting and check the Ferret homepage[1] if you are able to contribute. Cheers, Dave [1] http://ferret.davebalmain.com/trac/