thr3ads.net - similar to: "How to make custom TokenFilter?"

Displaying 20 results from an estimated 1000 matches similar to: "How to make custom TokenFilter?"

How to deal with accentuated chars in 0.10.8?

2006 Oct 19

I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", they should find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
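The accent-insensitive matching asked for here can be illustrated without Ferret at all. The following is a minimal sketch in plain modern Ruby (it relies on `String#unicode_normalize`, which did not exist in the Ruby versions of 2006); `fold_accents` is a hypothetical helper name, not part of Ferret or of the EuropeanAnalyzer linked above. The idea is to apply the same folding to both indexed text and query text.

```ruby
# Hypothetical helper, not a Ferret API: strips diacritics so that
# "gonzález" and "gonzalez" reduce to the same term.
def fold_accents(text)
  # NFD splits each accented letter into a base letter plus combining marks;
  # \p{Mn} matches those combining marks, which are then removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

indexed_term = fold_accents("gonzález")
query_term   = fold_accents("gonzalez")
puts indexed_term == query_term
```

Letters that do not decompose into a base letter plus a mark (for example "ø" or "ß") pass through unchanged, so a real analyzer would probably need an explicit mapping table for those.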

[Ferret] Serious memory leak on Joyent / TextDrive / Solaris

2007 Apr 13

There is a serious memory leak bug in Ferret. I'm getting this error on a TextDrive Container (a.k.a. Joyent Accelerators) running OpenSolaris with Ferret 0.11.4. It happens while searching for some terms with accented or special characters. This makes Ferret allocate lots of memory (usually reaching 3+ GB) and fail if another query like this is executed. Any ideas on that, could this be locale

How to do case-sensitive searches

2006 Apr 19

Forgive me if this topic has already been discussed on the list. I googled but couldn't find much. I'd like to search through text for US state abbreviations that are written in capitals. What is the best way to do this? I read somewhere that tokenized fields are stored in the index in lowercase, so I am concerned that I will lose precision. What is the best way to store a

Windows progress

2006 Jun 01

Hi there, What's the current status of the Windows port? I may be in a position to lend a hand over the next couple of weeks - where should I start looking? And what's the best way to get SVN HEAD? This happens: $ svn checkout svn://www.davebalmain.com/ferret/trunk ferret svn: Can't connect to host 'www.davebalmain.com': Connection refused --

roll my own TokenFilter subclass

2007 May 18

Hi all, I'd like to write my own TokenStream Filter (in Lucene this would be a subclass of a TokenFilter, which Ferret seems to lack) but I'm not sure how to go about it. Specifically, it's not clear how I'd create a non-trivial TokenStream to pass out to any filters that wrapped mine. Can anyone point me towards a code example? Thanks. -- Richard Jones
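As a rough illustration of the question, here is a plain-Ruby sketch of a "non-trivial" filter: one that emits more tokens than it receives and can still be wrapped by other filters. Every class here (`SketchToken`, `WhitespaceStream`, `SlashSplitFilter`, `DowncaseFilter`) is a hypothetical stand-in, not a Ferret class; the pull-style `next` protocol that returns `nil` at the end of the stream is modeled loosely on Lucene-style token streams, and whether it matches Ferret's own TokenStream interface exactly is an assumption.

```ruby
# Hypothetical token type: text plus character offsets into the input.
SketchToken = Struct.new(:text, :start_offset, :end_offset)

# Source stream: one token per whitespace-separated word.
class WhitespaceStream
  def initialize(text)
    @tokens = []
    text.scan(/\S+/) do |word|
      start = Regexp.last_match.begin(0)
      @tokens << SketchToken.new(word, start, start + word.length)
    end
    @pos = 0
  end

  # Returns the next token, or nil when the stream is exhausted.
  def next
    token = @tokens[@pos]
    @pos += 1 if token
    token
  end
end

# Non-trivial filter: splits "a/b" into "a" and "b", buffering the extra
# tokens so each call to #next still hands out exactly one token.
class SlashSplitFilter
  def initialize(input)
    @input = input
    @pending = []
  end

  def next
    while @pending.empty?
      token = @input.next
      return nil if token.nil?
      offset = token.start_offset
      token.text.split("/", -1).each do |part|
        @pending << SketchToken.new(part, offset, offset + part.length) unless part.empty?
        offset += part.length + 1 # skip past the part and its "/"
      end
    end
    @pending.shift
  end
end

# Trivial one-in, one-out filter; it can wrap any stream, including the one above.
class DowncaseFilter
  def initialize(input)
    @input = input
  end

  def next
    token = @input.next
    token.text = token.text.downcase if token
    token
  end
end

# Pulls every token from a stream and collects it into an array.
def drain(stream)
  tokens = []
  while (token = stream.next)
    tokens << token
  end
  tokens
end

stream = DowncaseFilter.new(SlashSplitFilter.new(WhitespaceStream.new("Applications/Entries report")))
drain(stream).each { |t| puts "#{t.text} [#{t.start_offset}, #{t.end_offset})" }
```

The buffering queue is the design point: because the outer filter only ever calls `next`, it does not need to know that the inner filter produced two tokens from one.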

trouble with PerFieldAnalyzer

2007 Mar 28

I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14). Script: require 'rubygems' require 'ferret' require 'pp' include Ferret::Analysis include Ferret::Index class TestAnalyzer def token_stream field, input pp field pp input LetterTokenizer.new(input) end end pfa =

Count frequency of term in a specific document?

2007 Apr 06

Is there any way to count the frequency of a specific term in one document? I can't find any method... Can you? -- Posted via http://www.ruby-forum.com/.

Tokenizers?

2007 Jan 17

Hi everyone. First a quick word - I am relatively new to Ruby and Ruby on Rails, but I love learning about it and using it. Currently I am working on extending Boxroom (file repository RoR app) for the CARE Indonesia intranet, where I work as an intern. I am using ferret, and it's working great. I noticed that if a file contains something like this "applications/entries", this

svn problems

2006 Sep 23

I can consistently segfault the 0.10.4 gem, so I'm trying to get the subversion version working with hopes towards tracking the problem down. I have a fresh SVN checkout but: a) the version (in ferret.rb) claims to be 0.9.6; and b) Ferret::Index::FieldInfos and a couple other classes are missing at run time. It looks like this is because they're not exported in the C

Ferret and non latin characters support

2007 Apr 08

I've successfully installed ferret and acts_as_ferret and have no problem with utf-8 for accented characters. It returns correct results for e.g. français. My problem is with non-Latin characters (Persian, in fact). I have tested different locales with no success on both Debian and Mac. Any idea? (ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6) -- Posted via http://www.ruby-forum.com/.

Any chance to get 0.11.3 on windows soon ?

2007 Mar 23

Hi, I'm working on a Ferret-based application which indexes content in all European languages. Thus, I have to deal with those funny European characters. After googling a bit, I decided to move on with a custom European analyzer based on MappingFilter, as suggested in the Ferret rdoc. Everything works fine with Ferret 0.11.3 on Mac OS X. But this application needs to run on both

Trouble with custom Analyzer

2006 Oct 23

Hi! I wanted to build my own custom Analyzer like so: class Analyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, string) StopFilter.new(LetterTokenizer.new(string, true), @stop_words) end end As one can easily spot, I essentially want

stop words in query

2007 Jan 11

Hello all, Quick question, I'm using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when

Manipulating form inputs?

2006 Jun 04

I have created a scaffold Admin/Radicals for doing CRUD. However, I'm not sure exactly where the scaffold uses the save() method. For a new entry, it creates form "radical" referencing method create(). The code for create() is as follows: def create @radical = Radical.new(params[:radical]) if @radical.save flash[:notice] = 'Radical was

Ignore apostrophes in words

2007 Jun 25

Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don't think many users will be typing in apostrophes when they are looking for something. Right now, for a simple document such as "what i've done" I'd like it to be indexed as "what ive done" instead. Right
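One way to picture the requested behaviour, independent of Ferret or acts_as_ferret, is a small preprocessing step applied to text before it is indexed and to queries before they are searched. This is only a sketch; `strip_apostrophes` is a hypothetical helper name, and treating the typographic apostrophe (U+2019) the same as the ASCII one is an added assumption.

```ruby
# Hypothetical helper, not a Ferret or acts_as_ferret API: removes ASCII and
# typographic apostrophes so "i've" and "ive" produce the same term.
def strip_apostrophes(text)
  text.delete("'\u2019")
end

puts strip_apostrophes("what i've done")
```

Applying the same step on the query side keeps a search for "i've" working as well, since both forms collapse to "ive".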

Migrating to 0.9.1

2006 Apr 25

After migrating to 0.9.1, I got: usr/local/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing': uninitialized constant TokenFilter (NameError) Here is a snapshot of my code: ... require 'ferret' class MyFilter < Analysis::TokenFilter ... It works fine on my dev machine, but not on a production server (shared host). Any

Custom analyzer weirdness with 0.11.3

2007 May 03

Hi- I was previously using 0.11.4, and I wrote my own analyzer. Everything worked fine. When I took the system to production, 0.11.4 started failing when updating the index, complaining that files were missing. The failure always happened on the same model document, and was completely reproducible. This failure looked a lot like the one described at http://www.ruby-forum.com/topic/104145. I

ANN: acts_as_ferret

2005 Dec 02

Hi all, This week I have worked with Rails and Ferret to test Ferret's (and Lucene's) capabilities. I decided to make a mixin for ActiveRecord as it seemed the simplest possible solution and I ended up making this into a plugin. For more info on Ferret see: http://ferret.davebalmain.com/trac/ The plugin is functional but could easily be refined. Anyway I want to share it with you. Regard it as a

StandardTokenizer Doesn't Support token_stream method

2007 Aug 03

According to the Analyzer doc and the StandardTokenizer doc: http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new( true) # true to downcase tokens and then later: stream = token_stream(
