Displaying 20 results from an estimated 25 matches for "stop_word".
2007 Sep 07
5
Custom Analyser .. where to put it ??
...y french stop words... I'm
reading the tutorial at:
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
My problem is that I have no idea where to put my custom Analyser class,
like:
class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(stop_words = FULL_GERMAN_STOP_WORDS)
@stop_words = stop_words
end
def token_stream(field, str)
StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
@stop_words), 'de')
end
end
Any clue ?
Thanks a lot
Guillaume.
--
Posted via http://www.ruby-forum....
2006 Oct 23
2
Trouble with custom Analyzer
Hi!
I wanted to build my own custom Analyzer like so:
class Analyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(stop_words = ENGLISH_STOP_WORDS)
@stop_words = stop_words
end
def token_stream(field, string)
StopFilter.new(LetterTokenizer.new(string, true), @stop_words)
end
end
As one can easily spot, I essentially want a LetterAnalyzer with stop
word filtering. However, using that a...
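For readers without Ferret at hand, the pipeline described above (split on letters, lowercase, drop stop words) can be sketched in plain Ruby. This is only an illustration of the idea, not Ferret's implementation: `letter_tokens`, `remove_stop_words`, `analyze_letters` and the short `STOP_WORDS_SAMPLE` list are hypothetical names made up for this sketch.

```ruby
# Hypothetical sample list for illustration; it is not Ferret's ENGLISH_STOP_WORDS.
STOP_WORDS_SAMPLE = %w[a an and the of].freeze

# Split a string into runs of letters, optionally lowercasing each token.
def letter_tokens(string, lower = true)
  tokens = string.scan(/[[:alpha:]]+/)
  lower ? tokens.map(&:downcase) : tokens
end

# Drop every token that appears in the stop word list.
def remove_stop_words(tokens, stop_words = STOP_WORDS_SAMPLE)
  tokens.reject { |token| stop_words.include?(token) }
end

# Tokenizer followed by stop word filter, mirroring the nesting in the post.
def analyze_letters(string, stop_words = STOP_WORDS_SAMPLE)
  remove_stop_words(letter_tokens(string), stop_words)
end
```

Passing an empty array as `stop_words` leaves every token in place, which is a quick way to check whether the stop list is responsible for a missing hit.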
2007 Aug 20
2
can't stop stop_words
I have looked at the documentation and done some searching, but I can't
seem to stop the STOP_WORDS from cutting out common words. I am using
acts_as_ferret and I have added the following to my code:
STOP_WORDS = []
acts_as_ferret({ :fields => { :name => { :boost
=> 10 },
:project_client_company_id => { :boost
=> 0 }...
2007 Jan 11
5
stop words in query
Hello all,
Quick question, I'm using AAF and the following custom analyzer:
class StemmedAnalyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(stop_words = ENGLISH_STOP_WORDS)
@stop_words = stop_words
end
def token_stream(field, str)
StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
@stop_words))
end
end
However when my search term includes a stop word I never get any results
back. Once I remove the stop wo...
2007 Nov 09
2
Problem with stemming and AAF
...plement stemming, which seemed straightforward
enough. I created the stemmed_analyzer.rb file in the lib directory,
as follows:
require 'rubygems'
require 'ferret'
class StemmedAnalyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(stop_words = ENGLISH_STOP_WORDS)
@stop_words = stop_words
end
def token_stream(field, str)
StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
@stop_words))
end
end
And added the call to the analyzer in my model file:
acts_as_ferret( :fields => { :name...
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work
quite nicely. However, my fields are very short (titles of music) and I
don't think many users will be typing in apostrophes when they are
looking for something. Right now, for a simple document such as "what
i've done" I'd like it to be indexed as "what ive done" instead. Right
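One way to picture the normalization asked for here is a small plain-Ruby helper that removes apostrophes before the text is tokenized. This is a sketch under assumptions, not a Ferret or acts_as_ferret API: `strip_apostrophes` is a hypothetical name, and the same step would have to be applied to incoming queries as well so that both sides agree.

```ruby
# Remove straight (') and curly (U+2019) apostrophes so that
# "i've" is indexed as "ive". Hypothetical helper, not part of Ferret.
def strip_apostrophes(text)
  text.delete("'\u2019")
end
```

For example, `strip_apostrophes("what i've done")` gives `"what ive done"`.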
2006 Dec 08
4
Using custom stem analyzer giving mongrel errors
I'm using the custom stem analyzer:
require 'rubygems'
require 'ferret'
include Ferret
module Ferret::Analysis
class FerretAnalyzer
def initialize(stop_words = FULL_ENGLISH_STOP_WORDS)
@stop_words = stop_words
end
def token_stream(field, text)
StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)),
@stop_words))
end
end
end
and I'm simply setting the :analyzer option in AAF.
However, I get odd beha...
2006 Apr 13
3
QueryParser doesn't use StandardAnalyzer correctly?
I am having a bit of a problem with my search queries being parsed
correctly it seems, and I wonder if anyone else has experienced this.
I have written an index using StandardAnalyzer for analysis. I want to
search that index by passing my user query through a QueryParser
instance which is also using a StandardAnalyzer. However the resultant
query does not seem to be a valid term query and
2006 Dec 06
1
AAF - Stem Analyzer
I'm not on AAF. Can someone else help Raymond with an example?
On 12/6/06, Raymond O'connor <nappin713 at yahoo.com> wrote:
>
> Matt Schnitz wrote:
> > You also need to stem-analyze the incoming query.
> >
> > I had this same problem. :^>
> >
> >
> > Schnitz
>
> Do you have an example of how to do this? I'm using
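The advice quoted in this thread (analyze the incoming query the same way as the indexed text) can be illustrated with a toy inverted index in plain Ruby. Everything here is an assumption made for the sketch: `toy_stem` is a crude suffix stripper and not the stemmer Ferret uses, and `analyze_stemmed`, `build_index` and `search` are hypothetical names, not AAF or Ferret calls.

```ruby
# Crude suffix stripper for illustration only (NOT a real stemming algorithm).
def toy_stem(word)
  word.sub(/(ing|s)\z/, "")
end

# Lowercase, split into words, and stem each word.
def analyze_stemmed(text)
  text.downcase.scan(/[a-z]+/).map { |word| toy_stem(word) }
end

# Build an inverted index: stemmed term => list of document ids.
def build_index(docs)
  index = Hash.new { |hash, key| hash[key] = [] }
  docs.each do |id, text|
    analyze_stemmed(text).uniq.each { |term| index[term] << id }
  end
  index
end

# Run the query through the SAME analysis, then intersect the posting lists.
def search(index, query)
  terms = analyze_stemmed(query)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, []) }.reduce(:&)
end
```

Because the index only ever contains stemmed terms, a raw lookup of an unstemmed word such as "eating" finds nothing, while `search` succeeds because it stems the query first.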
2007 Nov 13
8
acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Hi all,
I cannot make aaf (rev. 220) use my custom analyzer, despite following the
indications @
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
To pinpoint the problem, I created a model + a simple analyzer with 2 stop
words : "fax" and "gsm".
test 1 : model.rebuild_index + model.find_by_contents("fax") # fax is a
stop word.
=> I get a
2005 Nov 17
1
indexing source code
Hi again,
I'm using ferret to index source code - DamageControl will allow users
to search for text in source code.
Currently I'm using the default index with no custom analyzer (I'm
using the StandardAnalyzer). Do you have any recommendations about how
to write an analyzer that will index source code in a more 'optimal'
way? I.e. disregard common
2007 Jul 07
2
Extending/Modifying QueryParser
...t => {
:or_default => false,
:analyzer => SynonymAnalyzer.new(WordnetSynonymEngine.new, [])
}
)
I created a SynonymAnalyzer and SynonymTokenFilter:
class SynonymAnalyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(synonym_engine, stop_words =
FULL_ENGLISH_STOP_WORDS, lower = true)
@synonym_engine = synonym_engine
@lower = lower
@stop_words = stop_words
end
def token_stream(field, str)
ts = StandardTokenizer.new(str)
ts = LowerCaseFilter.new(ts) if @lower
ts = StopFilter.new(ts, @stop_words)...
2007 Sep 27
5
QueryParser.parse question
Hi there,
I am stumped as to why QueryParser's parse method behaves differently
between query 'a' and 'b'.
See http://pastie.caboo.se/private/4rlwrecyyow3yl6qtf4tq
Could someone please help me understand why that is the case?
P.S. I also found that 'i' produces the same behaviour as 'a'.
Cheers,
Andy
2020 Apr 28
3
Stopwords: Topic modelling with LDA
...would you include these words, which appear in all or almost all of
my topics, as stopwords? Is there a way to refine the analysis further
so that the topics differ more?
This is the code I am using:
Reviews_dtm <-text_df12star %>%
unnest_tokens(word, text) %>%
anti_join(stop_words)%>%
count(Brand, word) %>%
cast_dtm(Brand, word, n)
Reviews_lda <- LDA(Reviews12_dtm, k = 15, control = list(seed = 2016))
Regards
Miriam
2006 Nov 25
5
Metaphone analysis
...er so that
words like "eat" and "eating" both equal to "eat".
require 'ferret'
# TODO write tests
module Curtis
module Analysis
class MetaphoneAnalyzer < Ferret::Analysis::Analyzer
include Ferret::Analysis
def initialize(version = :double, stop_words = ENGLISH_STOP_WORDS)
@stop_words = stop_words
@version = version
end
def token_stream(field, str)
MetaphoneFilter.new(StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
@stop_words)), @version)
end
end
end
end
I saved both of these files, '...
2020 Nov 04
1
Removing numbers from text
...I am analysing text in R and cannot find how to remove numbers and
symbols such as ",", "%", etc. from the text.
I am running this code; text_data is where the text is, in the variable
"text".
tidy_data <- text_data%>%
unnest_tokens(word, text)%>%
anti_join(stop_words)
How could that be added to this code?
Many thanks
2008 May 12
1
Using StemFilter with PhraseQuery
...ting I could parse
the phrase and build up a query to be used by QueryParser but I'd like a
more succinct solution for now.
I use a StemFilter in my analyzer as follows:
def token_stream(field, str)
...
ts = LowerCaseFilter.new(ts) if @lower
ts = StopFilter.new(ts, @stop_words)
ts = StemFilter.new(ts)
...
end
My use of PhraseQuery is as follows:
def generate_query(phrase)
phrase = phrase.downcase
phrase_parts = phrase.split(' ')
query = Ferret::Search::PhraseQuery.new(:content, 2)
phrase_parts.each do |part|
#...
2006 Oct 24
2
Problem with stop words
...re 'ferret'
index = Ferret::I.new(:or_default => false)
index << 'you'
puts index.search('you')
returns no hits.
I assumed from the docs that StandardAnalyzer was using stop words
as defined by:
Ferret::Analysis::ENGLISH_STOP_WORDS
but when I print that to the console I get:
["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
"in",
"into", "is", "it",...
2020 Apr 29
2
[Possible SPAM] Re: Stopwords: Topic modelling with LDA
...or almost all of them as stopwords? Is there a way to refine the
>> analysis further so that the topics differ more?
>>
>> This is the code I am using:
>>
>> Reviews_dtm <-text_df12star %>%
>> unnest_tokens(word, text) %>%
>> anti_join(stop_words)%>%
>> count(Brand, word) %>%
>> cast_dtm(Brand, word, n)
>>
>>
>> Reviews_lda <- LDA(Reviews12_dtm, k = 15, control = list(seed = 2016))
>>
>> Regards
>>
>> Miriam
>>
>> ______________________________________________...
2007 Jan 21
2
A few questions: Tweaking StemFilter, indexes, ...
...ound with ferret and going through the documentation.
StemFilter ------
I am trying to improve the quality of my searches in context of the
content of my application. I have created an analyzer using the
following:
StemFilter.new StopFilter.new(
LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words )
This has been pretty good so far, however, I really would like to get
a search for "plumber" match "plumbing" at maybe a lower score than it
would match "plumbers". The thing is that plumber(s) is filtered to
"plumber" and plumbing is filtered to plumb, s...