search for: stop_words

Displaying 20 results from an estimated 25 matches for "stop_words".

2007 Sep 07
5
Custom Analyser .. where to put it ??
...y french stop words... i m reading the tutorial at : http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage My problem is that i ve no idea where to put my custom Analyser class like : class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = FULL_GERMAN_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words), ''de'') end end Any clue ? Thanks a lot Guillaume. -- Posted via http://www.ruby-forum.c...
2006 Oct 23
2
Trouble with custom Analyzer
Hi! I wanted to build my own custom Analyzer like so: class Analyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, string) StopFilter.new(LetterTokenizer.new(string, true), @stop_words) end end As one can easily spot, I essentially want a LetterAnalyzer with stop word filtering. However, using that an...
2007 Aug 20
2
can''t stop stop_words
I have looked at the documentation and done some searching, but I can''t seem to stop the STOP_WORDS from cutting out common words. I am using acts_as_ferret and I have add the following to my code: STOP_WORDS = [] acts_as_ferret({ :fields => { :name => { :boost => 10 }, :project_client_company_id => { :boost => 0 }...
2007 Jan 11
5
stop words in query
Hello all, Quick question, I''m using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when my search term includes a stop word I never get any results back. Once I remove the stop wor...
2007 Nov 09
2
Problem with stemming and AAF
...plement stemming, which seemed straightforward enough. I created the stemmed_analyzer.rb file in the lib directory, as follows: require ''rubygems'' require ''ferret'' class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end end And added the call to the analyzer in my model file: acts_as_ferret( :fields => { :name...
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don''t think may users will be typing in apostrophes when they are looking for something. Right now, for a simple document such as "what i''ve done" I''d like it to be indexed as "what ive done" instead. Right
2006 Dec 08
4
Using custom stem analyzer giving mongrel errors
I''m using the custom stem analyzer: require ''rubygems'' require ''ferret'' include Ferret module Ferret::Analysis class FerretAnalyzer def initialize(stop_words = FULL_ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, text) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words)) end end end and I''m simply setting the :analyzer option in AAF. However, I get odd behav...
2006 Apr 13
3
QueryParser doesn''t use StandardAnalyzer correctly?
I am having a bit of a problem with my search queries being parsed correctly it seems, and I wonder if anyone else has experienced this. I have written an index using StandardAnalyzer for analysis. I want to search that index by passing my user query through a QueryParser instance which is also using a StandardAnalyzer. However the resultant query does not seem to be a valid term query and
2006 Dec 06
1
AAF - Stem Analyzer
I''m not on AAF. Can someone else help Raymond with an example? On 12/6/06, Raymond O''connor <nappin713 at yahoo.com> wrote: > > Matt Schnitz wrote: > > You also need to stem-analyze the incoming query. > > > > I had this same problem. :^> > > > > > > Schnitz > > Do you have an example of how to do this? I''m using
2007 Nov 13
8
acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Hi all, I cannot make aaf (rev. 220) use my custom analyzer, despite following the indications @ http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage To pinpoint the problem, I created a model + a simple analyzer with 2 stop words : "fax" and "gsm". test 1 : model.rebuild_index + model.find_by_contents("fax") # fax is a stop word. => I get a
2005 Nov 17
1
indexing source code
Hi again, I''m using ferret to index source code - DamageControl will allow users to search for text in source code. Currently I''m using the default index with no custom analyzer (I''m using the StandardAnalyzer). Do you have any recommendations about how to write an analyzer that will index source code in a more ''optimal'' way? I.e. disregard common
2007 Jul 07
2
Extending/Modifying QueryParser
...t => { :or_default => false, :analyzer => SynonymAnalyzer.new(WordnetSynonymEngine.new, []) } ) I created a SynonymAnalyzer and SynonymTokenFilter: class SynonymAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(synonym_engine, stop_words = FULL_ENGLISH_STOP_WORDS, lower = true) @synonym_engine = synonym_engine @lower = lower @stop_words = stop_words end def token_stream(field, str) ts = StandardTokenizer.new(str) ts = LowerCaseFilter.new(ts) if @lower ts = StopFilter.new(ts, @stop_words)...
2007 Sep 27
5
QueryParser.parse question
Hi there, I am stomped as to why QueryParser''s parse method behaves differently between query ''a'' and ''b''. See http://pastie.caboo.se/private/4rlwrecyyow3yl6qtf4tq Could someone please help me understand why that is the case. p.s. I also found ''i'' produce the same behavour as ''a'' Cheers, Andy
2020 Apr 28
3
Stopwords: Topic modelling con LDA
...ncluiríais estas palabras que me aparecen en todos los topics o casi todos como stopwords? ¿Hay alguna forma de refinar más el análisis y que haya más diferencias entre topics? Este es el código que estoy usando: Reviews_dtm <-text_df12star %>% unnest_tokens(word, text) %>% anti_join(stop_words)%>% count(Brand, word) %>% cast_dtm(Brand, word, n) Reviews_lda <- LDA(Reviews12_dtm, k = 15, control = list(seed = 2016)) Un saludo Miriam
2006 Nov 25
5
Metaphone analysis
...er so that words like "eat" and "eating" both equal to "eat". require ''ferret'' # TODO write tests module Curtis module Analysis class MetaphoneAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(version = :double, stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words @version = version end def token_stream(field, str) MetaphoneFilter.new(StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)), @version) end end end end I saved both of these files, '...
2020 Nov 04
1
Eliminar números de texto
...Estoy analizando texto en R y no encuentro cómo eliminar los números y símbolos del texto como ",", "%", etc. Estoy pasando este código, text_data es donde está el texto en la variable "text". tidy_data <- text_data%>% unnest_tokens(word, text)%>% anti_join(stop_words) ¿Cómo podría añadirse a ese código? Muchas gracias
2008 May 12
1
Using StemFilter with PhraseQuery
...ting I could parse the phrase and build up a query to be used by QueryParser but I''d like a more succinct solution for now. I use a StemFilter in my analyzer as follows: def token_stream(field, str) ... ts = LowerCaseFilter.new(ts) if @lower ts = StopFilter.new(ts, @stop_words) ts = StemFilter.new(ts) ... end My use of PhraseQuery is as follows: def generate_query(phrase) phrase = phrase.downcase phrase_parts = phrase.split('' '') query = Ferret::Search::PhraseQuery.new(:content, 2) phrase_parts.each do |part| # p...
2006 Oct 24
2
Problem with stop words
...re ''ferret'' index = Ferret::I.new(:or_default => false) index << ''you'' puts index.search(''you'') returns no hits. I assumed from the docs that StandardAnalyzer was using stop words as defined by: Ferret::Analysis::ENGLISH_STOP_WORDS but when I print that to the console I get: ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", &...
2020 Apr 29
2
[Posible SPAM] Re: Stopwords: Topic modelling con LDA
...o casi todos como stopwords? Hay alguna forma de refinar más el >> análisis y que haya más diferencias entre topics? >> >> Este es el código que estoy usando: >> >> Reviews_dtm <-text_df12star %>% >> unnest_tokens(word, text) %>% >> anti_join(stop_words)%>% >> count(Brand, word) %>% >> cast_dtm(Brand, word, n) >> >> >> Reviews_lda <- LDA(Reviews12_dtm, k = 15, control = list(seed = 2016)) >> >> Un saludo >> >> Miriam >> >> _______________________________________________...
2007 Jan 21
2
A few questions: Tweaking StemFilter, indexes, ...
...ound with ferret and going through the documentation. StemFilter ------ I am trying to improve the quality of my searches in context of the content of my application. I have created an analyzer using the following: StemFilter.new StopFilter.new( LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words ) This has been pretty good so far, however, I really would like to get a search for "plumber" match "plumbing" at maybe a lower score than it would match "plumbers". The thing is that plumber(s) is filtered to "plumber" and plumbing is filtered to plumb, so...