thr3ads.net - search: "token

Displaying 20 results from an estimated 45 matches for "token_stream".

trouble with PerFieldAnalyzer

2007 Mar 28

I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14). Script: require 'rubygems' require 'ferret' require 'pp' include Ferret::Analysis include Ferret::Index class TestAnalyzer def token_stream field, input pp field pp input LetterTokenizer.new(input) end end pfa = PerFieldAnalyzer.new(StandardAnalyzer.new()) pfa[:test] = TestAnalyzer.new index = Index.new(:analyzer => pfa) index << {:test => 'foo'} index.search_each('bar')...
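The excerpt is cut off before the actual symptom. What can be shown is the dispatch rule PerFieldAnalyzer is meant to apply: use the analyzer registered for a field, otherwise fall back to the default. Below is a minimal plain-Ruby sketch of that rule. PerFieldDispatcher, WhitespaceLowerAnalyzer and LetterOnlyAnalyzer are hypothetical stand-ins, not Ferret classes. They return arrays of term strings instead of token streams so the sketch runs without the ferret gem.

```ruby
# Hypothetical stand-ins, NOT Ferret classes: each "analyzer" responds to
# token_stream(field, input) and returns an Array of term strings.
class WhitespaceLowerAnalyzer
  def token_stream(_field, input)
    input.downcase.split
  end
end

class LetterOnlyAnalyzer
  def token_stream(_field, input)
    input.scan(/[[:alpha:]]+/)
  end
end

# Per-field dispatch: the analyzer registered for a field wins,
# every other field falls back to the default analyzer.
class PerFieldDispatcher
  def initialize(default_analyzer)
    @default = default_analyzer
    @by_field = {}
  end

  def []=(field, analyzer)
    @by_field[field] = analyzer
  end

  def token_stream(field, input)
    (@by_field[field] || @default).token_stream(field, input)
  end
end

pfa = PerFieldDispatcher.new(WhitespaceLowerAnalyzer.new)
pfa[:test] = LetterOnlyAnalyzer.new
p pfa.token_stream(:test, "Foo-Bar 42")   # => ["Foo", "Bar"]
p pfa.token_stream(:title, "Foo-Bar 42")  # => ["foo-bar", "42"]
```

In this sketch the lookup is an exact Hash key match. A field passed as the String "test" would therefore miss the Symbol key :test and silently use the default. Whether Ferret normalizes field names this way is not shown here.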

StandardTokenizer Doesn't Support token_stream method

2007 Aug 03

...ret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new(true) # true to downcase tokens and then later: stream = token_stream(ignored_field_name, some_string) To create a new TokenStream from some_string. This approach would be valuable for my application since I am analyzing many short strings -- so I'm thinking that building my 5-deep analyzer chain for each small string will be a nice savings. Unfortunately...

Creating my own analyzer

2006 Apr 20

I created this analyzer: class DescriptionAnalyzer < Ferret::Analysis::Analyzer def token_stream(field, string) if field == "code" return CodeTokenStream.new(string) else return Ferret::Analysis::Analyzer.new.token_stream(field,string) end end end and created an IndexWriter with it: Ferret::Index::IndexWriter.new(get_index_path,...

stop words in query

2007 Jan 11

Hello all, Quick question, I'm using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when my search term includes a stop word I never get any results back. Once I remove the stop word I get the normal results back. Do I need to do a search of my query for s...
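The thread is cut off before an answer. One plausible cause is assumed here rather than confirmed: the indexed text went through StopFilter but the query did not. In that case a required stop word such as "the" is looked up as a term that was never indexed, and the AND of all query terms comes back empty. The tiny inverted index, stop list and helpers below are hypothetical plain Ruby, not Ferret or acts_as_ferret code.

```ruby
require "set"

STOP_WORDS = %w[the a an of to].to_set.freeze # illustrative stop list

# Index-time analysis: lowercase, split into letter runs, drop stop words.
def index_terms(text)
  text.downcase.scan(/[a-z]+/).reject { |w| STOP_WORDS.include?(w) }
end

# Tiny inverted index: term => Set of document ids.
DOCS = { 1 => "The history of Ruby", 2 => "Ruby gems" }.freeze
INDEX = Hash.new { |h, k| h[k] = Set.new }
DOCS.each { |id, text| index_terms(text).each { |t| INDEX[t] << id } }

# AND semantics: a document must contain every query term.
def search_all(terms)
  terms.map { |t| INDEX.fetch(t, Set.new) }.reduce(:&)
end

raw      = "the history".split         # query terms left unanalyzed
analyzed = index_terms("the history")  # same analysis as the index

p search_all(raw).to_a      # => []  ("the" was never indexed)
p search_all(analyzed).to_a # => [1]
```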

How to make custom TokenFilter?

2007 Apr 08

In the O'Reilly Ferret Short Cut, I found a very useful example. It explains how to make a custom Tokenizer, but the book doesn't explain how to make a custom Filter (especially how to implement the #text=() method). I'm a newbie and I don't understand how to create my own custom Filter. Are there any good source code examples? -- Posted via
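A custom filter can follow the same shape as the MetaphoneFilter quoted further down this page: wrap an input stream, pull a token in #next, change its text, and return the token. In Ferret itself the class would subclass Ferret::Analysis::TokenStream, as that example does. The sketch below uses hypothetical plain-Ruby stand-ins (Token, SimpleLetterTokenizer, ReverseFilter) so it runs without Ferret. It also assumes that a filter's #text= should simply forward new input to the stream it wraps. That assumption is worth checking against the Ferret TokenStream documentation.

```ruby
# Stand-in token, not Ferret's Token: Struct generates a writable #text=.
Token = Struct.new(:text, :start_offset, :end_offset)

# Hypothetical tokenizer stand-in: one Token per run of letters.
class SimpleLetterTokenizer
  def initialize(str)
    self.text = str
  end

  # Reset the stream so it tokenizes new input from the beginning.
  def text=(str)
    @tokens = []
    str.scan(/[[:alpha:]]+/) do
      m = Regexp.last_match
      @tokens << Token.new(m[0], m.begin(0), m.end(0))
    end
  end

  # Next Token, or nil once the input is exhausted.
  def next
    @tokens.shift
  end
end

# Custom filter: wrap a stream, edit each token's text, return the token.
class ReverseFilter
  def initialize(input)
    @input = input
  end

  # Assumption: a filter keeps no text of its own, so it forwards new
  # input to the stream it wraps.
  def text=(str)
    @input.text = str
  end

  def next
    token = @input.next
    return nil if token.nil?
    token.text = token.text.reverse
    token
  end
end

# Collect the text of every token left in a stream.
def texts(stream)
  out = []
  while (t = stream.next)
    out << t.text
  end
  out
end

f = ReverseFilter.new(SimpleLetterTokenizer.new("ruby ferret"))
p texts(f) # => ["ybur", "terref"]
f.text = "token stream"
p texts(f) # => ["nekot", "maerts"]
```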

Extending/Modifying QueryParser

2007 Jul 07

...and SynonymTokenFilter: class SynonymAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(synonym_engine, stop_words = FULL_ENGLISH_STOP_WORDS, lower = true) @synonym_engine = synonym_engine @lower = lower @stop_words = stop_words end def token_stream(field, str) ts = StandardTokenizer.new(str) ts = LowerCaseFilter.new(ts) if @lower ts = StopFilter.new(ts, @stop_words) ts = SynonymTokenFilter.new(ts, @synonym_engine) end end class SynonymTokenFilter < Ferret::Analysis::TokenStream include Ferret::Analysis def in...
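The SynonymTokenFilter definition is cut off here. A common approach in Lucene-style analyzers is to emit each synonym right after the original token with a position increment of 0, so that all of them occupy the same position. The sketch below shows that approach as an assumption, not the poster's code. Token, ArrayStream and SynonymFilter are hypothetical stand-ins, and a plain Hash plays the role of the synonym engine.

```ruby
# Stand-in token with a position increment: 1 moves to the next position,
# 0 stacks the token on the same position as the previous one.
Token = Struct.new(:text, :pos_inc)

# Hypothetical tokenizer stand-in: hands out pre-split words.
class ArrayStream
  def initialize(words)
    @tokens = words.map { |w| Token.new(w, 1) }
  end

  def next
    @tokens.shift
  end
end

# Hypothetical synonym "engine": a plain Hash from term to synonyms.
SYNONYMS = { "quick" => %w[fast speedy] }.freeze

class SynonymFilter
  def initialize(input, synonyms)
    @input = input
    @synonyms = synonyms
    @pending = [] # synonyms still to emit for the previous token
  end

  def next
    return @pending.shift unless @pending.empty?
    token = @input.next
    return nil if token.nil?
    @pending = @synonyms.fetch(token.text, []).map { |s| Token.new(s, 0) }
    token
  end
end

stream = SynonymFilter.new(ArrayStream.new(%w[the quick fox]), SYNONYMS)
out = []
while (t = stream.next)
  out << [t.text, t.pos_inc]
end
p out
# => [["the", 1], ["quick", 1], ["fast", 0], ["speedy", 0], ["fox", 1]]
```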

ferret finds 'tests' but not 'test'

2006 Sep 05

Hello all, Quick question (possibly!) - I've got a few records indexed and doing a search for 'test' results in no hits even though I know the word 'tests' exists in the indexed field. Doing a search for 'tests' produces a result. I would have thought that 'test' would match 'tests' but no such
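Without stemming, 'test' and 'tests' are two different terms. A query for one cannot match a document indexed with the other. A stemming filter applied at both index and query time maps both to the same term. The toy_stem function below is a deliberately crude stand-in, not the Porter or Snowball algorithm a real stem filter would use.

```ruby
# Toy stemmer for illustration only, NOT Porter or Snowball:
# drops a plural "s" unless the word ends in "ss" or is very short.
def toy_stem(word)
  if word.length > 3 && word.end_with?("s") && !word.end_with?("ss")
    word[0..-2]
  else
    word
  end
end

# Analysis applied identically to documents and queries.
def analyze(text, stem:)
  words = text.downcase.scan(/[a-z]+/)
  stem ? words.map { |w| toy_stem(w) } : words
end

doc = "All tests passed"
[false, true].each do |stem|
  terms = analyze(doc, stem: stem)
  query = analyze("test", stem: stem).first
  puts "stem=#{stem}: #{terms.inspect} includes #{query.inspect}? #{terms.include?(query)}"
end
# stem=false: ["all", "tests", "passed"] includes "test"? false
# stem=true: ["all", "test", "passed"] includes "test"? true
```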

Weird analyzer issue with the word 'fly'

2009 Apr 09

...:analyzer => Ferret::Analysis::StemmingAnalyzer.new, :fields => {:name => { :boost => 2.0 }, ... }}) And this analyzer is defined in a module thus: module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Now, here's a search without using the analyzer: >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size => 14 And with the analyzer: >> TeachingObject.find_with...

Custom Analyser .. where to put it ??

2007 Sep 07

...net/acts_as_ferret/wiki/AdvancedUsage My problem is that I have no idea where to put my custom Analyser class, such as: class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = FULL_GERMAN_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words), 'de') end end Any clue? Thanks a lot, Guillaume. -- Posted via http://www.ruby-forum.com/.

Trouble with custom Analyzer

2006 Oct 23

Hi! I wanted to build my own custom Analyzer like so: class Analyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, string) StopFilter.new(LetterTokenizer.new(string, true), @stop_words) end end As one can easily spot, I essentially want a LetterAnalyzer with stop word filtering. However, using that analyzer (for indexing) results in a segmentation fault. /opt/local/lib/ruby/gems/...

acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)

2007 Nov 13

...". test 1 : model.rebuild_index + model.find_by_contents("fax") # fax is a stop word. => I get a result when I should not. (note : I delete the index directory => I can see the index is recreated, index/develop ). test 2 : insert a 'raise' in the token_stream() method => it's never thrown. test 3 : use the standard analyzer, to exclude the 2 stop words => same wrong result. class AccessPointKind2 < ActiveRecord::Base set_table_name "access_point_kinds2" acts_as_ferret( {:remote => true, :fi...

svn problems

2006 Sep 23

I can consistently segfault the 0.10.4 gem, so I'm trying to get the Subversion version working in the hope of tracking the problem down. I have a fresh SVN checkout but: a) the version (in ferret.rb) claims to be 0.9.6; and b) Ferret::Index::FieldInfos and a couple other classes are missing at run time. It looks like this is because they're not exported in the C

Problem with stemming and AAF

2007 Nov 09

...ed_analyzer.rb file in the lib directory, as follows: require 'rubygems' require 'ferret' class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end end And added the call to the analyzer in my model file: acts_as_ferret( :fields => { :name => { :boost => 1, :store => :yes },...

[Ferret] Serious memory leak on Joyent / TextDrive / Solaris

2007 Apr 13

There is a serious memory-leak bug in Ferret. I'm seeing this error on a TextDrive Container (aka Joyent Accelerators) running OpenSolaris with Ferret 0.11.4. It happens while searching for some terms with accented or special characters. This makes Ferret allocate lots of memory (usually reaching 3+ GB) and fail if another query like this is executed. Any ideas on that, could this be locale

Metaphone analysis

2006 Nov 25

...m. It's a fairly simple class, but it does require the 'Text' gem to be installed. require 'ferret' require 'text' module Curtis module Analysis # TODO write tests! class MetaphoneFilter < Ferret::Analysis::TokenStream def initialize(token_stream, version = :double) @input = token_stream @version = version end def next t = @input.next return nil if t.nil? t.text = @version.eql?(:double) ? Text::Metaphone.double_metaphone(t.text) : Text::Metaphone.metaphone(t.text) end end end end Second I created a...
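One detail stands out in the quoted #next. A Ruby method returns the value of its last expression, and an assignment such as t.text = ... evaluates to the assigned String. As written, the filter therefore hands back a String rather than the token. Separately, the text gem's double_metaphone appears to return a pair of codes (primary and secondary) rather than one string, which is worth checking in that gem's documentation. The corrected return shape is sketched below with a hypothetical stand-in encoder (crude_key), so the sketch runs without the text gem. Token and ArrayStream are hypothetical stand-ins too.

```ruby
Token = Struct.new(:text)

# Hypothetical tokenizer stand-in: hands out pre-split words.
class ArrayStream
  def initialize(words)
    @tokens = words.map { |w| Token.new(w) }
  end

  def next
    @tokens.shift
  end
end

# Same shape as the quoted MetaphoneFilter, with the return value fixed.
class PhoneticFilter
  # encoder: any callable mapping a word to a phonetic key (assumption:
  # with the text gem this could wrap Text::Metaphone.metaphone).
  def initialize(input, encoder)
    @input = input
    @encoder = encoder
  end

  def next
    t = @input.next
    return nil if t.nil?
    t.text = @encoder.call(t.text)
    t # return the Token; the assignment above evaluates to the String
  end
end

# Hypothetical crude encoder: first letter plus the remaining consonants.
crude_key = ->(w) { w[0].upcase + w[1..-1].upcase.delete("AEIOUY") }

f = PhoneticFilter.new(ArrayStream.new(%w[smith smyth]), crude_key)
first = f.next
p first.class # => Token
p first.text  # => "SMTH"
p f.next.text # => "SMTH"
```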

case-sensitivity of analyzer

2007 Mar 06

Is there anything about this analyzer that says "case-sensitive" to you? module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Just wondering how I can force my index to be case-insensitive. Thanks, -Adam -- Posted via http://www.ruby-forum.com/.

Stemming, stop words, acts_as_ferret

2006 Nov 13

...image" needs to hit "thermal imaging." 2. Stop words. Searches for "failing to instruct the jury" should come up with hits on a search for "fail to instruct." 3. Case-insensitive. What I tried was: class StemmedAnalyzer < Ferret::Analysis::Analyzer def token_stream(field, reader) return Ferret::Analysis::PorterStemFilter.new(Ferret::Analysis::LowerCaseTokenizer.new(reader)) end end class Summary < ActiveRecord::Base acts_as_ferret(:analyzer => StemmedAnalyzer.new) But this doesn't appear to give me either stemming or stopwords. It d...
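The quoted analyzer wraps LowerCaseTokenizer in PorterStemFilter and contains no StopFilter, so it would not remove stop words on its own. The other analyzers on this page use the order tokenize, lowercase, drop stop words, then stem. That order is sketched below with plain-Ruby lambdas. The stop list and the suffix-stripping toy_stem are illustrative assumptions, not Ferret's ENGLISH_STOP_WORDS or its Porter stemmer.

```ruby
STOP_WORDS = %w[a an and the to of].freeze # tiny illustrative list

tokenize  = ->(s)  { s.scan(/[[:alpha:]]+/) }
lowercase = ->(ws) { ws.map(&:downcase) }
drop_stop = ->(ws) { ws.reject { |w| STOP_WORDS.include?(w) } }
# Toy stemmer, NOT Porter: strips a trailing "ing", "ed" or "s".
toy_stem  = ->(ws) { ws.map { |w| w.sub(/(ing|ed|s)\z/, "") } }

# Innermost step first, mirroring StemFilter(StopFilter(LowerCaseFilter(tokenizer))):
# lowercasing runs before the stop-word lookup and stemming, so "The" is
# dropped and "Failing" is stemmed.
analyze = ->(s) { toy_stem.(drop_stop.(lowercase.(tokenize.(s)))) }

p analyze.("Failing to instruct the Jury") # => ["fail", "instruct", "jury"]
p analyze.("fail to instruct")             # => ["fail", "instruct"]
```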

How to deal with accentuated chars in 0.10.8?

2006 Oct 19

I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", they should find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
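Accent folding of this kind can be done per token. Decompose each string to Unicode NFD, which splits "á" into "a" plus a combining accent, and then drop the combining marks. The fold_accents helper below is a hypothetical sketch, not EuropeanAnalyzer's code. It relies on String#unicode_normalize, which needs Ruby 2.2 or later and is much newer than the Ferret 0.10.8 era. Applied at both index and query time, it turns "gonzález" and "gonzalez" into the same term. Letters such as "ø" or "ß" have no canonical decomposition and pass through unchanged.

```ruby
# Hypothetical helper: canonical decomposition, then strip nonspacing marks.
def fold_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Applied to every term, as an accent-folding filter would do.
terms = %w[Gonzalez González gonzález].map { |w| fold_accents(w.downcase) }
p terms.uniq # => ["gonzalez"]
```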

Using StemFilter with PhraseQuery

2008 May 12

...I'm doing wrong or is the above description what I should expect? To get the response that I'm expecting I could parse the phrase and build up a query to be used by QueryParser but I'd like a more succinct solution for now. I use a StemFilter in my analyzer as follows: def token_stream(field, str) ... ts = LowerCaseFilter.new(ts) if @lower ts = StopFilter.new(ts, @stop_words) ts = StemFilter.new(ts) ... end My use of PhraseQuery is as follows: def generate_query(phrase) phrase = phrase.downcase phrase_parts = phrase.split(' ...
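Suppose the PhraseQuery terms come straight from phrase.downcase and split, as the excerpt suggests. Those terms are never stemmed, while the indexed terms went through StemFilter. A phrase like "walking shoes" may then be looked up as terms that do not exist in the index. Under that assumption, one alternative is to derive the phrase terms by running the phrase through the same analysis chain used at index time, as sketched below. STOP_WORDS, toy_stem (not a real stemmer), analyze and phrase_match? are hypothetical helpers. phrase_match? only checks whether the terms appear as a contiguous run.

```ruby
STOP_WORDS = %w[the a an of].freeze # illustrative stop list

# Toy stemmer, NOT Porter/Snowball: strips a trailing "ing", "ed" or "s".
def toy_stem(word)
  word.sub(/(ing|ed|s)\z/, "")
end

# Stand-in for the index-time chain: lowercase -> stop words -> stem.
def analyze(text)
  text.downcase.scan(/[a-z]+/)
      .reject { |w| STOP_WORDS.include?(w) }
      .map { |w| toy_stem(w) }
end

# True when the phrase terms appear as a contiguous run of indexed terms.
def phrase_match?(indexed_terms, phrase_terms)
  return false if phrase_terms.empty?
  indexed_terms.each_cons(phrase_terms.size).include?(phrase_terms)
end

indexed = analyze("The comfortable walking shoes") # => ["comfortable", "walk", "shoe"]
naive   = "walking shoes".downcase.split(" ")      # => ["walking", "shoes"]
p phrase_match?(indexed, naive)                    # => false
p phrase_match?(indexed, analyze("walking shoes")) # => true
```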

Custom analyzer not invoked?

2006 Sep 15

Hello, I'm trying to define my own analyzer by doing something like: #----------------------------------------------------- require 'ferret' include Ferret class MyAnalyzer < Analysis::Analyzer def token_stream(field, str) # Display results of analysis puts 'Analyzing: field:%s str:%s' % [field, str] t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str)) while true n = t.next() break if n == nil puts n.to_s end return Analys...
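As quoted, the loop calls t.next until it returns nil before anything is returned, which drains that stream. Whatever token_stream returns afterwards has to be a freshly built chain, or the indexer receives no tokens. A pass-through filter that logs each token as the consumer pulls it avoids draining anything. In Ferret such a wrapper would presumably need to subclass Ferret::Analysis::TokenStream, as the MetaphoneFilter quoted above does. TracingFilter, Token and ArrayStream below are hypothetical plain-Ruby stand-ins.

```ruby
Token = Struct.new(:text)

# Hypothetical tokenizer stand-in: hands out pre-split words.
class ArrayStream
  def initialize(words)
    @tokens = words.map { |w| Token.new(w) }
  end

  def next
    @tokens.shift
  end
end

# Pass-through filter: records and prints each token as the consumer
# pulls it, so inspecting the analysis does not drain the stream.
class TracingFilter
  attr_reader :log

  def initialize(input, io = $stdout)
    @input = input
    @io = io
    @log = []
  end

  def next
    token = @input.next
    unless token.nil?
      @log << token.text
      @io.puts "token: #{token.text}"
    end
    token
  end
end

traced = TracingFilter.new(ArrayStream.new(%w[hello ferret]))
consumed = []
while (t = traced.next) # the indexer would be the consumer here
  consumed << t.text
end
p consumed # => ["hello", "ferret"]
```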
