thr3ads.net - search: "stop_words"

Displaying 20 results from an estimated 25 matches for "stop_words".

2007 Sep 07

Custom Analyser: where to put it?

...y French stop words... I'm reading the tutorial at http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage. My problem is that I have no idea where to put my custom Analyser class, which looks like this:

    class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(stop_words = FULL_GERMAN_STOP_WORDS)
        @stop_words = stop_words
      end

      def token_stream(field, str)
        StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words), 'de')
      end
    end

Any clue? Thanks a lot, Guillaume.
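
The nesting order in token_stream matters. Tokens are lowercased before they reach the stop filter, so a capitalised "Der" still matches "der" in the stop list. The sketch below models that chain in plain Ruby. It assumes simplified stand-ins for StandardTokenizer, LowerCaseFilter and StopFilter. The helper names and the five-word GERMAN_STOPS list are hypothetical, not Ferret API, and stemming is left out:

```ruby
# Tiny hypothetical stop list (not Ferret's FULL_GERMAN_STOP_WORDS)
GERMAN_STOPS = %w[der die das und ist].freeze

# crude stand-in for StandardTokenizer: runs of letters
def tokenize(str)
  str.scan(/[[:alpha:]]+/)
end

# stand-in for LowerCaseFilter
def lowercase(tokens)
  tokens.map(&:downcase)
end

# stand-in for StopFilter
def stop_filter(tokens, stop_words)
  tokens.reject { |t| stop_words.include?(t) }
end

# same nesting as the analyzer: tokenize -> lowercase -> stop filter
def analyze(str, stop_words = GERMAN_STOPS)
  stop_filter(lowercase(tokenize(str)), stop_words)
end

analyze("Der Hund und die Katze")            # => ["hund", "katze"]
stop_filter(tokenize("Der Hund"), GERMAN_STOPS) # => ["Der", "Hund"] (no lowercasing, "Der" survives)
```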

Trouble with custom Analyzer

2006 Oct 23

Hi! I wanted to build my own custom Analyzer like so:

    class Analyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(stop_words = ENGLISH_STOP_WORDS)
        @stop_words = stop_words
      end

      def token_stream(field, string)
        StopFilter.new(LetterTokenizer.new(string, true), @stop_words)
      end
    end

As one can easily spot, I essentially want a LetterAnalyzer with stop word filtering. However, using that an...
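
Here is a rough pure-Ruby approximation of the behaviour being asked for. It assumes LetterTokenizer splits on every non-letter character and that its second argument (true in the post) requests lowercasing. letter_tokens, letter_analyze and STOPS are hypothetical names, not Ferret API:

```ruby
# Hypothetical stop list for illustration
STOPS = %w[a an the of].freeze

# split on any run of non-letters (digits and punctuation break tokens),
# optionally lowercasing the result
def letter_tokens(str, lower = true)
  tokens = str.scan(/[[:alpha:]]+/)
  lower ? tokens.map(&:downcase) : tokens
end

# letter tokenizer followed by stop word filtering
def letter_analyze(str, stop_words = STOPS)
  letter_tokens(str).reject { |t| stop_words.include?(t) }
end

letter_analyze("The R2-D2 of Star Wars") # => ["r", "d", "star", "wars"]
```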

can't stop stop_words

2007 Aug 20

I have looked at the documentation and done some searching, but I can't seem to stop the STOP_WORDS from cutting out common words. I am using acts_as_ferret and I have added the following to my code:

    STOP_WORDS = []
    acts_as_ferret({ :fields => { :name => { :boost => 10 },
                                  :project_client_company_id => { :boost => 0 }...
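
One plausible explanation, which is an assumption since the post is truncated: the analyzer never consults a top-level STOP_WORDS constant, because its default argument points at its own stop list. The hypothetical Analysis::Analyzer below is not Ferret or acts_as_ferret code, but it shows the effect. With acts_as_ferret, the equivalent would presumably be to build an analyzer with an empty stop list and hand it to the :analyzer option:

```ruby
# Hypothetical analyzer (not Ferret): its default stop list is its own constant
module Analysis
  DEFAULT_STOP_WORDS = %w[a and the].freeze

  class Analyzer
    def initialize(stop_words = DEFAULT_STOP_WORDS)
      @stop_words = stop_words
    end

    def analyze(str)
      str.downcase.split.reject { |t| @stop_words.include?(t) }
    end
  end
end

STOP_WORDS = [] # defined at top level, but never read by Analysis::Analyzer

Analysis::Analyzer.new.analyze("The Cat and the Hat")             # => ["cat", "hat"]
Analysis::Analyzer.new(STOP_WORDS).analyze("The Cat and the Hat") # => ["the", "cat", "and", "the", "hat"]
```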

stop words in query

2007 Jan 11

Hello all, quick question: I'm using AAF and the following custom analyzer:

    class StemmedAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(stop_words = ENGLISH_STOP_WORDS)
        @stop_words = stop_words
      end

      def token_stream(field, str)
        StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words))
      end
    end

However, when my search term includes a stop word I never get any results back. Once I remove the stop wor...
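
A stop word can wipe out every hit when terms are ANDed together. If the word was dropped at index time but survives on the query side, no document can contain it. The toy model below illustrates this with a hypothetical in-memory index, not Ferret's. Whether AAF analyzes the query differently in this case is an assumption:

```ruby
# Hypothetical stop list for the toy index
STOPS = %w[the of].freeze

# index-time analysis: lowercase, split, drop stop words
def index_terms(text)
  text.downcase.split.reject { |t| STOPS.include?(t) }
end

# term => list of document ids
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each_with_index do |text, id|
    index_terms(text).uniq.each { |t| index[t] << id }
  end
  index
end

# AND semantics: a document must contain every query term
def search_all(index, terms)
  terms.map { |t| index.fetch(t, []) }.reduce(:&) || []
end

docs  = ["The Lord of the Rings", "Lord Jim"]
index = build_index(docs)
search_all(index, "lord of the rings".split)       # => [] ("of", "the" were never indexed)
search_all(index, index_terms("lord of the rings")) # => [0]
```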

Problem with stemming and AAF

2007 Nov 09

...plement stemming, which seemed straightforward enough. I created the stemmed_analyzer.rb file in the lib directory, as follows:

    require 'rubygems'
    require 'ferret'

    class StemmedAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(stop_words = ENGLISH_STOP_WORDS)
        @stop_words = stop_words
      end

      def token_stream(field, str)
        StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words))
      end
    end

And added the call to the analyzer in my model file:

    acts_as_ferret( :fields => { :name...

Ignore apostrophes in words

2007 Jun 25

Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don't think many users will type apostrophes when they are looking for something. Right now, for a simple document such as "what i've done", I'd like it to be indexed as "what ive done" instead. Right...
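
Below is a minimal sketch of the normalisation being described. It is plain Ruby rather than a Ferret tokenizer, and how to hook it into Ferret is left open here. The idea is to delete apostrophes before splitting and to apply the same function to queries, so that "i've" and "ive" meet. The helper names are hypothetical:

```ruby
# remove straight (') and typographic (U+2019) apostrophes
def normalize_apostrophes(str)
  str.delete("'\u2019")
end

# apply to both documents and queries so they tokenize the same way
def tokens(str)
  normalize_apostrophes(str).downcase.scan(/[[:alnum:]]+/)
end

tokens("what i've done")     # => ["what", "ive", "done"]
tokens("Don\u2019t Stop")    # => ["dont", "stop"]
```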

Using custom stem analyzer giving mongrel errors

2006 Dec 08

I'm using the custom stem analyzer:

    require 'rubygems'
    require 'ferret'
    include Ferret

    module Ferret::Analysis
      class FerretAnalyzer
        def initialize(stop_words = FULL_ENGLISH_STOP_WORDS)
          @stop_words = stop_words
        end

        def token_stream(field, text)
          StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words))
        end
      end
    end

and I'm simply setting the :analyzer option in AAF. However, I get odd behav...

QueryParser doesn't use StandardAnalyzer correctly?

2006 Apr 13

I am having a bit of a problem: it seems my search queries are not being parsed correctly, and I wonder if anyone else has experienced this. I have written an index using StandardAnalyzer for analysis. I want to search that index by passing my user query through a QueryParser instance that also uses a StandardAnalyzer. However, the resulting query does not seem to be a valid term query and...

AAF - Stem Analyzer

2006 Dec 06

I'm not on AAF. Can someone else help Raymond with an example?

On 12/6/06, Raymond O'connor <nappin713 at yahoo.com> wrote:
> Matt Schnitz wrote:
> > You also need to stem-analyze the incoming query.
> >
> > I had this same problem. :^>
> >
> > Schnitz
>
> Do you have an example of how to do this? I'm using...
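
Matt's point can be shown with a toy example (not AAF code). If documents are indexed through a stemming analyzer, a query term that skips the same analyzer will not match the stored stems. toy_stem is a deliberately naive suffix stripper standing in for StemFilter; it is not the Porter algorithm:

```ruby
# Naive suffix stripper standing in for StemFilter (not Porter)
def toy_stem(word)
  word.sub(/(ing|ers|er|s)\z/, "")
end

# the same analyzer must run at index time and at query time
def analyze(text)
  text.downcase.split.map { |w| toy_stem(w) }
end

indexed_terms = analyze("Plumbers fixing pipes")   # => ["plumb", "fix", "pipe"]

raw_query = "pipes"
indexed_terms.include?(raw_query)                  # => false (query not analyzed)
indexed_terms.include?(analyze(raw_query).first)   # => true
```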

acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)

2007 Nov 13

Hi all, I cannot make aaf (rev. 220) use my custom analyzer, despite following the instructions at http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage. To pinpoint the problem, I created a model and a simple analyzer with 2 stop words: "fax" and "gsm". Test 1: model.rebuild_index + model.find_by_contents("fax") # fax is a stop word. => I get a...

indexing source code

2005 Nov 17

Hi again, I'm using ferret to index source code: DamageControl will allow users to search for text in source code. Currently I'm using the default index with no custom analyzer (i.e. the StandardAnalyzer). Do you have any recommendations on how to write an analyzer that will index source code in a more 'optimal' way? I.e. disregard common...
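
One possible direction, sketched as plain Ruby rather than a Ferret analyzer, works in three steps:

- Pull out identifiers.
- Split them on snake_case and camelCase boundaries while also keeping the whole identifier.
- Treat language keywords as stop words.

The regexes and the KEYWORDS list are assumptions for illustration:

```ruby
# Hypothetical keyword "stop words" for Ruby-like source
KEYWORDS = %w[def end class if else return].freeze

def code_tokens(src)
  src.scan(/[A-Za-z_][A-Za-z0-9_]*/).flat_map do |ident|
    # split on underscores, then on camelCase / acronym boundaries
    parts = ident.split("_").flat_map do |seg|
      seg.scan(/[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+/)
    end
    # keep the full identifier as well as its parts
    ([ident] + parts).map(&:downcase).uniq
  end.reject { |t| KEYWORDS.include?(t) }
end

code_tokens("def parseHTTPRequest(raw_input)")
# => ["parsehttprequest", "parse", "http", "request", "raw_input", "raw", "input"]
```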

Extending/Modifying QueryParser

2007 Jul 07

    ...t => { :or_default => false, :analyzer => SynonymAnalyzer.new(WordnetSynonymEngine.new, []) } )

I created a SynonymAnalyzer and SynonymTokenFilter:

    class SynonymAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(synonym_engine, stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
        @synonym_engine = synonym_engine
        @lower = lower
        @stop_words = stop_words
      end

      def token_stream(field, str)
        ts = StandardTokenizer.new(str)
        ts = LowerCaseFilter.new(ts) if @lower
        ts = StopFilter.new(ts, @stop_words)...
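
For comparison, here is a hypothetical synonym-expansion step in plain Ruby (not the poster's SynonymTokenFilter). Each token is emitted together with its synonyms at the same position, which is the usual approach in Lucene-style analyzers (a position increment of 0). SYNONYMS is a made-up table standing in for WordnetSynonymEngine:

```ruby
Token = Struct.new(:text, :position)

# Made-up synonym table standing in for a WordNet-based engine
SYNONYMS = { "quick" => %w[fast speedy], "jump" => %w[leap] }.freeze

# emit each word plus its synonyms, all sharing the word's position
def expand_synonyms(words, synonyms = SYNONYMS)
  words.each_with_index.flat_map do |word, pos|
    [word, *synonyms.fetch(word, [])].map { |w| Token.new(w, pos) }
  end
end

expand_synonyms(%w[quick fox]).map { |t| [t.text, t.position] }
# => [["quick", 0], ["fast", 0], ["speedy", 0], ["fox", 1]]
```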

QueryParser.parse question

2007 Sep 27

Hi there, I am stumped as to why QueryParser's parse method behaves differently for the queries 'a' and 'b'. See http://pastie.caboo.se/private/4rlwrecyyow3yl6qtf4tq. Could someone please help me understand why that is the case? P.S. I also found that 'i' produces the same behaviour as 'a'. Cheers, Andy
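
A likely explanation, though the pastie is not reproduced here, is that 'a' (and possibly 'i') is on the analyzer's stop list. The query would then analyze to no terms at all, while 'b' survives. Here is a toy illustration with a hypothetical STOPS list, not Ferret's QueryParser:

```ruby
# Hypothetical stop list including single-letter words
STOPS = %w[a an and i the].freeze

# analyze a query string the way the index analyzer would
def query_terms(query)
  query.downcase.scan(/\w+/).reject { |t| STOPS.include?(t) }
end

query_terms("a") # => [] (nothing left to search for)
query_terms("b") # => ["b"]
query_terms("i") # => []
```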

Stopwords: Topic modelling with LDA

2020 Apr 28

...would you include these words, which appear in all or nearly all of the topics, as stopwords? Is there any way to refine the analysis further so that the topics differ more from each other? This is the code I am using:

    Reviews_dtm <- text_df12star %>%
      unnest_tokens(word, text) %>%
      anti_join(stop_words) %>%
      count(Brand, word) %>%
      cast_dtm(Brand, word, n)

    Reviews_lda <- LDA(Reviews_dtm, k = 15, control = list(seed = 2016))

Regards, Miriam
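
One common heuristic for this question is to treat words that occur in every brand's reviews as candidate domain stop words and remove them before counting. It is sketched here in Ruby rather than R, so the tidytext pipeline is not reproduced. The review data is invented for illustration:

```ruby
# Invented example data: brand => review text
reviews = {
  "BrandA" => "great phone battery lasts great",
  "BrandB" => "phone screen great",
  "BrandC" => "great phone camera"
}

tokens = reviews.transform_values { |text| text.downcase.split }

# candidate stop words: present in every brand's reviews
common = tokens.values.map(&:uniq).reduce(:&)   # => ["great", "phone"]

# per-brand word counts after removing the common words
counts = tokens.transform_values do |words|
  words.reject { |w| common.include?(w) }
       .each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
end
# counts == {"BrandA"=>{"battery"=>1, "lasts"=>1}, "BrandB"=>{"screen"=>1}, "BrandC"=>{"camera"=>1}}
```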

Metaphone analysis

2006 Nov 25

...er so that words like "eat" and "eating" both reduce to "eat".

    require 'ferret'

    # TODO write tests
    module Curtis
      module Analysis
        class MetaphoneAnalyzer < Ferret::Analysis::Analyzer
          include Ferret::Analysis

          def initialize(version = :double, stop_words = ENGLISH_STOP_WORDS)
            @stop_words = stop_words
            @version = version
          end

          def token_stream(field, str)
            MetaphoneFilter.new(StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)), @version)
          end
        end
      end
    end

I saved both of these files, '...
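
Phonetic filters map similar-sounding words to a shared key. The "eat"/"eating" case in the post is really handled by the StemFilter. Metaphone is too long to reproduce here, so this is a sketch of American Soundex instead. Soundex is a simpler and different phonetic algorithm, written here in plain Ruby; it is not Ferret's MetaphoneFilter:

```ruby
# American Soundex letter codes
SOUNDEX_CODES = {}
{ "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3",
  "l" => "4", "mn" => "5", "r" => "6" }.each do |letters, digit|
  letters.each_char { |c| SOUNDEX_CODES[c] = digit }
end
SOUNDEX_CODES.freeze

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?

  key = letters[0].upcase
  last = SOUNDEX_CODES[letters[0]]
  letters[1..-1].each_char do |c|
    code = SOUNDEX_CODES[c]
    if code
      key << code unless code == last # collapse adjacent duplicate codes
      last = code
    elsif c != "h" && c != "w"
      last = nil # vowels and y separate duplicates; h and w do not
    end
  end
  (key + "000")[0, 4] # pad or truncate to four characters
end

soundex("Robert")   # => "R163"
soundex("Rupert")   # => "R163"
soundex("Ashcraft") # => "A261"
```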

Removing numbers from text

2020 Nov 04

...I'm analysing text in R and can't find how to remove numbers and symbols such as ",", "%", etc. from the text. I'm running this code, where text_data holds the text in the variable "text":

    tidy_data <- text_data %>%
      unnest_tokens(word, text) %>%
      anti_join(stop_words)

How could that be added to this code? Many thanks
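
The same idea can be sketched in Ruby rather than R (this is not a tidytext answer). After tokenizing, strip every non-letter character and drop tokens left empty, which removes numbers and symbols such as "," or "%". Note that this also shortens mixed tokens like "mp3" to "mp". STOPS is a hypothetical list:

```ruby
# Hypothetical stop list for illustration
STOPS = %w[to the].freeze

def clean_tokens(text)
  text.downcase
      .split(/\s+/)
      .map { |t| t.gsub(/[^[:alpha:]]/, "") }     # strip digits and punctuation
      .reject { |t| t.empty? || STOPS.include?(t) } # drop emptied tokens and stop words
end

clean_tokens("Sales rose 20%, to 1,500 units") # => ["sales", "rose", "units"]
```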

Using StemFilter with PhraseQuery

2008 May 12

...ting I could parse the phrase and build up a query to be used by QueryParser, but I'd like a more succinct solution for now. I use a StemFilter in my analyzer as follows:

    def token_stream(field, str)
      ...
      ts = LowerCaseFilter.new(ts) if @lower
      ts = StopFilter.new(ts, @stop_words)
      ts = StemFilter.new(ts)
      ...
    end

My use of PhraseQuery is as follows:

    def generate_query(phrase)
      phrase = phrase.downcase
      phrase_parts = phrase.split(' ')
      query = Ferret::Search::PhraseQuery.new(:content, 2)
      phrase_parts.each do |part|
        # p...
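
A likely culprit, assuming the truncated loop adds each raw part to the PhraseQuery, is that the indexed terms were stemmed while the phrase parts were only downcased. Below is a toy positional match in plain Ruby (not Ferret's PhraseQuery), with a naive toy_stem standing in for StemFilter:

```ruby
# Naive stand-in for StemFilter (not Porter)
def toy_stem(word)
  word.sub(/(ing|s)\z/, "")
end

# terms in position order, stemmed as at index time
def analyze_terms(text)
  text.downcase.split.map { |w| toy_stem(w) }
end

# true if phrase_terms appear consecutively in doc_terms
def phrase_match?(doc_terms, phrase_terms)
  doc_terms.each_cons(phrase_terms.size).include?(phrase_terms)
end

doc = analyze_terms("Running shoes for trail running")
phrase_match?(doc, "running shoes".split)                         # => false
phrase_match?(doc, "running shoes".split.map { |w| toy_stem(w) }) # => true
```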

Problem with stop words

2006 Oct 24

...re 'ferret'

    index = Ferret::I.new(:or_default => false)
    index << 'you'
    puts index.search('you')

returns no hits. I assumed from the docs that StandardAnalyzer was using the stop words defined by Ferret::Analysis::ENGLISH_STOP_WORDS, but when I print that to the console I get: ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", ...

[Possible SPAM] Re: Stopwords: Topic modelling with LDA

2020 Apr 29

>> ...or nearly all of them, as stopwords? Is there any way to refine the
>> analysis further so that the topics differ more from each other?
>>
>> This is the code I am using:
>>
>> Reviews_dtm <- text_df12star %>%
>>   unnest_tokens(word, text) %>%
>>   anti_join(stop_words) %>%
>>   count(Brand, word) %>%
>>   cast_dtm(Brand, word, n)
>>
>> Reviews_lda <- LDA(Reviews_dtm, k = 15, control = list(seed = 2016))
>>
>> Regards,
>>
>> Miriam

A few questions: Tweaking StemFilter, indexes, ...

2007 Jan 21

...ound with ferret and going through the documentation.

StemFilter
----------
I am trying to improve the quality of my searches in the context of my application's content. I have created an analyzer using the following:

    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words))

This has worked pretty well so far; however, I would really like a search for "plumber" to match "plumbing", perhaps at a lower score than it would match "plumbers". The thing is that plumber(s) is filtered to "plumber" and plumbing is filtered to "plumb", so...
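
One way this could be approached is to index the text in several differently analyzed forms and weight them; this is an assumption, not something the thread confirms. The exact form would score highest, the stem next, and a looser key such as a 5-character prefix lowest. In Ferret terms that might mean separate fields with different boosts. The sketch below just models the scoring in plain Ruby, with the two stems quoted in the post hard-coded and hypothetical weights:

```ruby
# Stems as quoted in the post; other words stem to themselves here
STEMS = { "plumbers" => "plumber", "plumbing" => "plumb" }.freeze

def stem(word)
  STEMS.fetch(word, word)
end

# Hypothetical weights: exact form 3, same stem 2, same 5-char prefix 1
def score(query_word, doc_word)
  s = 0
  s += 3 if query_word == doc_word
  s += 2 if stem(query_word) == stem(doc_word)
  s += 1 if query_word[0, 5] == doc_word[0, 5]
  s
end

%w[plumber plumbers plumbing].map { |w| [w, score("plumber", w)] }
# => [["plumber", 6], ["plumbers", 3], ["plumbing", 1]]
```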
