similar to: How to make custom TokenFilter?

Displaying 20 results from an estimated 1000 matches similar to: "How to make custom TokenFilter?"

2006 Oct 19
2
How to deal with accentuated chars in 0.10.8?
I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", they should find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
2007 Apr 13
5
[Ferret] Serious memory leak on Joyent / TextDrive / Solaris
There is a serious memory leak bug in Ferret. I'm having this error on a TextDrive Container (a.k.a. Joyent Accelerators) running OpenSolaris with Ferret 0.11.4. It happens while searching for some terms with accented or special characters. This makes Ferret allocate lots of memory (usually reaching 3+ GB) and fail if another query like this is executed. Any ideas on that? Could this be locale
2006 Apr 19
2
How to do case-sensitive searches
Forgive me if this topic has already been discussed on the list. I googled but couldn't find much. I'd like to search through text for US state abbreviations that are written in capitals. What is the best way to do this? I read somewhere that tokenized fields are stored in the index in lowercase, so I am concerned that I will lose precision. What is the best way to store a
2006 Jun 01
8
Windows progress
Hi there, What's the current status of the Windows port? I may be in a position to lend a hand over the next couple of weeks - where should I start looking? And what's the best way to get SVN HEAD? This happens: $ svn checkout svn://www.davebalmain.com/ferret/trunk ferret svn: Can't connect to host 'www.davebalmain.com': Connection refused --
2007 May 18
1
roll my own TokenFilter subclass
Hi all, I'd like to write my own TokenStream Filter (in Lucene this would be a subclass of a TokenFilter, which Ferret seems to lack) but I'm not sure how to go about it. Specifically, it's not clear how I'd create a non-trivial TokenStream to pass out to any filters that wrapped mine. Can anyone point me towards a code example? Thanks. -- Richard Jones
2007 Mar 28
6
trouble with PerFieldAnalyzer
I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14). Script: require 'rubygems' require 'ferret' require 'pp' include Ferret::Analysis include Ferret::Index class TestAnalyzer def token_stream field, input pp field pp input LetterTokenizer.new(input) end end pfa =
2007 Apr 06
3
Count frequency of term in a specific document?
Is there any way to count the frequency of a specific term in one document? I can't find any method... Can you? -- Posted via http://www.ruby-forum.com/.
2007 Jan 17
1
Tokenizers?
Hi everyone. First a quick word - I am relatively new to Ruby and Ruby on Rails, but I love learning about it and using it. Currently I am working on extending Boxroom (file repository RoR app) for the CARE Indonesia intranet, where I work as an intern. I am using Ferret, and it's working great. I noticed that if a file contains something like this "applications/entries", this
2006 Sep 23
8
svn problems
I can consistently segfault the 0.10.4 gem, so I'm trying to get the subversion version working with hopes towards tracking the problem down. I have a fresh SVN checkout but: a) the version (in ferret.rb) claims to be 0.9.6; and b) Ferret::Index::FieldInfos and a couple other classes are missing at run time. It looks like this is because they're not exported in the C
2007 Apr 08
10
Ferret and non latin characters support
I've successfully installed ferret and acts_as_ferret and have no problem with UTF-8 for accented characters. It returns correct results for e.g. français. My problem is with non-Latin characters (Persian, in fact). I have tested different locales with no success on both Debian and Mac. Any idea? (ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6) -- Posted via http://www.ruby-forum.com/.
2007 Mar 23
5
Any chance to get 0.11.3 on windows soon ?
Hi, I'm working on a Ferret-based application which indexes content in all European languages. Thus, I have to deal with those funny European characters. After googling a bit, I decided to move on with a custom European analyzer based on MappingFilter, as suggested in the Ferret rdoc. Everything works fine with Ferret 0.11.3 on Mac OS X. But this application needs to run on both
2006 Oct 23
2
Trouble with custom Analyzer
Hi! I wanted to build my own custom Analyzer like so: class Analyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, string) StopFilter.new(LetterTokenizer.new(string, true), @stop_words) end end As one can easily spot, I essentially want
2007 Jan 11
5
stop words in query
Hello all, Quick question, I'm using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when
2006 Jun 04
5
Manipulating form inputs?
I have created a scaffold Admin/Radicals for doing CRUD. However, I'm not sure exactly where the scaffold uses the save() method. For a new entry, it creates form "radical" referencing method create(). The code for create() is as follows: def create @radical = Radical.new(params[:radical]) if @radical.save flash[:notice] = 'Radical was
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don't think many users will be typing in apostrophes when they are looking for something. Right now, for a simple document such as "what i've done" I'd like it to be indexed as "what ive done" instead. Right
2006 Apr 25
3
Migrating to 0.9.1
After migrating to 0.9.1, I got: usr/local/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing': uninitialized constant TokenFilter (NameError) Here is a snapshot of my code: ... require 'ferret' class MyFilter < Analysis::TokenFilter ... It works fine on my dev machine, but not on a production server (shared host). Any
2007 May 03
2
Custom analyzer weirdness with 0.11.3
Hi- I was previously using 0.11.4, and I wrote my own analyzer. Everything worked fine. When I took the system to production, 0.11.4 started failing when updating the index, complaining that files were missing. The failure always happened on the same model document, and was completely reproducible. This failure looked a lot like the one described at http://www.ruby-forum.com/topic/104145. I
2005 Dec 02
43
ANN: acts_as_ferret
Hi all, This week I have worked with Rails and Ferret to test Ferret's (and Lucene's) capabilities. I decided to make a mixin for ActiveRecord as it seemed the simplest possible solution and I ended up making this into a plugin. For more info on Ferret see: http://ferret.davebalmain.com/trac/ The plugin is functional but could easily be refined. Anyway I want to share it with you. Regard it as a
2007 Aug 03
0
StandardTokenizer Doesn't Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc: http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new( true) # true to downcase tokens and then later: stream = token_stream(