Displaying 20 results from an estimated 1000 matches similar to: "How to make custom TokenFilter?"
2006 Oct 19
2
How to deal with accentuated chars in 0.10.8?
I'm starting to use Ferret and acts_as_ferret.
I need to use something like EuropeanAnalyzer
(http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
For example, if the user searches for "gonzalez", it should find documents
that contain the term "gonzález".
The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
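As a rough illustration of the folding step such an analyzer needs, here is a minimal sketch in plain Ruby. It does not use Ferret and is not the EuropeanAnalyzer's own code; `fold_accents` is a hypothetical helper name, and the sketch assumes a modern Ruby (2.2 or later) where `String#unicode_normalize` is available.

```ruby
# Hypothetical helper: fold accented Latin characters to their base letters.
def fold_accents(text)
  # NFD decomposes "á" into "a" plus a combining acute accent;
  # \p{Mn} matches the combining marks so they can be removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts fold_accents("gonzález")
puts fold_accents("français")
```

The same folding would have to be applied both when indexing and when parsing queries for "gonzalez" to match "gonzález". Letters that have no decomposition, such as "ø" or "ß", pass through unchanged and would need an explicit mapping.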
2007 Apr 13
5
[Ferret] Serious memory leak on Joyent / TextDrive / Solaris
There is a serious memory leak bug in Ferret. I'm getting this error on a
TextDrive Container (aka Joyent Accelerators) running OpenSolaris with Ferret
0.11.4.
It happens while searching for some terms with accented or special
characters.
This makes Ferret allocate lots of memory (usually reaching 3+ GB)
and fail if another query like this is executed.
Any ideas on that, could this be locale
2006 Apr 19
2
How to do case-sensitive searches
Forgive me if this topic has already been discussed on the list. I
googled but couldn't find much. I'd like to search through text for
US state abbreviations that are written in capitals. What is the best
way to do this? I read somewhere that tokenized fields are stored in
the index in lowercase, so I am concerned that I will lose precision.
What is the best way to store a
2006 Jun 01
8
Windows progress
Hi there,
What's the current status of the Windows port? I may be in a position
to lend a hand over the next couple of weeks - where should I start
looking? And what's the best way to get SVN HEAD? This happens:
$ svn checkout svn://www.davebalmain.com/ferret/trunk ferret
svn: Can't connect to host 'www.davebalmain.com': Connection refused
--
2007 May 18
1
roll my own TokenFilter subclass
Hi all,
I'd like to write my own TokenStream filter (in Lucene this would be a
subclass of TokenFilter, which Ferret seems to lack) but I'm not
sure how to go about it. Specifically, it's not clear how I'd create
a non-trivial TokenStream to pass out to any filters that wrap
mine.
Can anyone point me towards a code example? Thanks.
--
Richard Jones
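One possible shape for such a filter, as a minimal sketch: an ordinary Ruby object that wraps another stream and exposes the same `next` and `text=` methods, so filters can be stacked. The sketch does not load Ferret; `SimpleToken`, `WhitespaceStream`, and `MinLengthFilter` are hypothetical stand-ins, and whether a given Ferret version accepts a plain Ruby object like this in place of its built-in filters is an assumption to verify.

```ruby
# Hypothetical stand-in for a token: text plus character offsets.
SimpleToken = Struct.new(:text, :start_offset, :end_offset)

# Hypothetical source stream: splits input on whitespace.
class WhitespaceStream
  def initialize(input)
    self.text = input
  end

  # Reset the stream to read from a new string.
  def text=(input)
    @tokens = []
    input.scan(/\S+/) do |word|
      offset = Regexp.last_match.begin(0)
      @tokens << SimpleToken.new(word, offset, offset + word.length)
    end
    @pos = 0
  end

  # Return the next token, or nil when the stream is exhausted.
  def next
    token = @tokens[@pos]
    @pos += 1 if token
    token
  end
end

# A filter wraps any object that responds to `next` and `text=`.
# This one drops tokens shorter than min_length.
class MinLengthFilter
  def initialize(stream, min_length)
    @stream = stream
    @min_length = min_length
  end

  # Pass resets through to the wrapped stream.
  def text=(input)
    @stream.text = input
  end

  # Pull tokens from the wrapped stream until one is long enough.
  def next
    while (token = @stream.next)
      return token if token.text.length >= @min_length
    end
    nil
  end
end

stream = MinLengthFilter.new(WhitespaceStream.new("roll my own filter"), 3)
while (token = stream.next)
  puts token.text
end
```

Because the filter only relies on `next` and `text=`, any other filter written the same way can wrap it in turn.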
2007 Mar 28
6
trouble with PerFieldAnalyzer
I'm having trouble with PerFieldAnalyzer (ferret version 0.10.14).
Script:
require 'rubygems'
require 'ferret'
require 'pp'
include Ferret::Analysis
include Ferret::Index
class TestAnalyzer
  def token_stream field, input
    pp field
    pp input
    LetterTokenizer.new(input)
  end
end
pfa =
2007 Apr 06
3
Count frequency of term in a specific document?
Is there any way to count the frequency of a specific term in one
document?
I can't find any method for this. Can you?
--
Posted via http://www.ruby-forum.com/.
2007 Jan 17
1
Tokenizers?
Hi everyone. First a quick word - I am relatively new to Ruby and Ruby
on Rails, but I love learning about it and using it. Currently I am
working on extending Boxroom (file repository RoR app) for the CARE
Indonesia intranet, where I work as an intern. I am using ferret, and
it's working great.
I noticed that if a file contains something like this
"applications/entries", this
2006 Sep 23
8
svn problems
I can consistently segfault the 0.10.4 gem, so I'm trying to get the
subversion version working with hopes towards tracking the problem down.
I have a fresh SVN checkout but:
a) the version (in ferret.rb) claims to be 0.9.6; and
b) Ferret::Index::FieldInfos and a couple other classes are missing at
run time. It looks like this is because they're not exported in the C
2007 Apr 08
10
Ferret and non latin characters support
I've successfully installed ferret and acts_as_ferret and have no
problem with utf-8 for accented characters. It returns correct results
for e.g. français. My problem is with non-Latin characters (Persian
in this case). I have tested different locales with no success on both
Debian and Mac. Any idea?
(ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6)
2007 Mar 23
5
Any chance to get 0.11.3 on windows soon ?
Hi,
I'm working on a Ferret-based application which indexes content in all
European languages. Thus, I have to deal with those funny European
characters.
After googling a bit, I decided to move on with a custom European
analyzer based on MappingFilter, as suggested in the Ferret rdoc.
Everything works fine with Ferret 0.11.3 on Mac OS X.
But this application needs to run on both
2006 Oct 23
2
Trouble with custom Analyzer
Hi!
I wanted to build my own custom Analyzer like so:
class Analyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, string)
    StopFilter.new(LetterTokenizer.new(string, true), @stop_words)
  end
end
As one can easily spot, I essentially want
2007 Jan 11
5
stop words in query
Hello all,
Quick question, I'm using AAF and the following custom analyzer:
class StemmedAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
                                  @stop_words))
  end
end
However when
2006 Jun 04
5
Manipulating form inputs?
I have created a scaffold Admin/Radicals for doing CRUD. However, I'm
not sure exactly where the scaffold uses the save() method. For a new
entry, it creates form "radical" referencing method create(). The code
for create() is as follows:
def create
  @radical = Radical.new(params[:radical])
  if @radical.save
    flash[:notice] = 'Radical was
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work
quite nicely. However, my fields are very short (titles of music) and I
don't think many users will be typing in apostrophes when they are
looking for something. Right now, for a simple document such as "what
i've done" I'd like it to be indexed as "what ive done" instead. Right
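A minimal sketch of the normalization described here, in plain Ruby and independent of Ferret: `strip_apostrophes` is a hypothetical helper, and where it would be hooked in (before indexing, inside a custom analyzer, or both) is left open as an assumption.

```ruby
# Hypothetical helper: remove straight and curly apostrophes from text.
def strip_apostrophes(text)
  text.delete("'’")
end

puts strip_apostrophes("what i've done")
```

For searches to match, the same helper would need to run on the query string as well as on the indexed titles.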
2006 Apr 25
3
Migrating to 0.9.1
After migrating to 0.9.1, I got:
usr/local/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in
`const_missing': uninitialized constant TokenFilter (NameError)
Here is a snapshot of my code:
...
require 'ferret'
class MyFilter < Analysis::TokenFilter
...
It works fine on my dev machine, but not on a production server (shared
host).
Any
2007 May 03
2
Custom analyzer weirdness with 0.11.3
Hi-
I was previously using 0.11.4, and I wrote my own analyzer. Everything
worked fine.
When I took the system to production, 0.11.4 started failing when updating
the index, complaining that files were missing. The failure always
happened on the same model document, and was completely reproducible.
This failure looked a lot like the one described at
http://www.ruby-forum.com/topic/104145.
I
2005 Dec 02
43
ANN: acts_as_ferret
Hi all
This week I have worked with Rails and Ferret to test Ferret's (and Lucene's)
capabilities. I decided to make a mixin for ActiveRecord as it seemed the
simplest possible solution and I ended up making this into a plugin.
For more info on Ferret see:
http://ferret.davebalmain.com/trac/
The plugin is functional but could easily be refined. Anyway I want to share it
with you. Regard it as a
2007 Aug 03
0
StandardTokenizer Doesn't Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html
I ought to be able to construct a StandardTokenizer like this:
t = StandardTokenizer.new( true) # true to downcase tokens
and then later:
stream = token_stream(