I am seeing trouble with searches for 'you' not returning anything. It appears that 'you' is a stop word to the standard analyzer:

    require 'rubygems'
    require 'ferret'

    index = Ferret::I.new(:or_default => false)
    index << 'you'
    puts index.search('you')

returns no hits. I assumed from the docs that StandardAnalyzer was using stop words as defined by:

    Ferret::Analysis::ENGLISH_STOP_WORDS

but when I print that to the console I get:

    ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
     "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
     "t", "that", "the", "their", "then", "there", "these", "they", "this",
     "to", "was", "will", "with"]

I don't see 'you' in there. Supplying my own stop words seems to fix the problem:

    STOP_WORDS = ["a", "the", "and", "or"]
    index = Ferret::I.new(:or_default => false,
                          :analyzer => Ferret::Analysis::StandardAnalyzer.new(STOP_WORDS))
    index << 'you'
    puts index.search('you')

this returns a hit.

I am running the latest Windows build, but I've seen the same behavior on Linux with the latest builds. I am happy with my solution, but it seems odd that 'you' should be standard stop word.

-- 
Posted via http://www.ruby-forum.com/.

On 24.10.2006, at 23:28, Scott Persinger wrote:

> I am seeing trouble with searches for 'you' not returning anything. It
> appears that 'you' is a stop word to the standard analyzer:

> I assumed from the docs that StandardAnalyzer was using stop words
> as defined by:
>
> Ferret::Analysis::ENGLISH_STOP_WORDS
>
> I don't see 'you' in there.

StandardAnalyzer actually uses Ferret::Analysis::FULL_ENGLISH_STOP_WORDS by default. (Note the 'FULL_')

> Supplying my own stop words seems to fix the problem:

Standard stop words are just a one-size-fits-all reasonable default. For maximum control you should always supply your own list of stop words.

> I am running the latest Windows build, but I've seen the same behavior
> on Linux with the latest builds. I am happy with my solution, but it
> seems odd that 'you' should be standard stop word.

Depends on how you look at it. 'You' is definitely not the least adequate candidate for a stop word. Then again, it's not included in Ferret::Analysis::ENGLISH_STOP_WORDS.

Cheers,
Andy

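To illustrate why a stop word in the analyzer's list leads to zero hits, here is a minimal pure-Ruby sketch of a toy index that drops stop words at both index time and query time. It is not Ferret's implementation: `ToyIndex`, `LUCENE_STYLE_STOP_WORDS` and `FULLER_STOP_WORDS` are hypothetical names, and the "fuller" list is simply the list printed in the original post plus 'you', standing in for a larger default list.

```ruby
require 'set'

# The list printed in the original post (Ferret::Analysis::ENGLISH_STOP_WORDS).
LUCENE_STYLE_STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s such
  t that the their then there these they this to was will with
].freeze

# Assumption: a stand-in for a fuller default list; only 'you' matters here.
FULLER_STOP_WORDS = (LUCENE_STYLE_STOP_WORDS + %w[you]).freeze

# Toy inverted index (hypothetical, for illustration only).
class ToyIndex
  def initialize(stop_words)
    @stop_words = stop_words.to_set
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Add a document; stop words are never written to the postings.
  def <<(text)
    doc_id = @doc_count
    @doc_count += 1
    analyze(text).uniq.each { |term| @postings[term] << doc_id }
    self
  end

  # Return ids of documents containing every remaining query term.
  # A query made up only of stop words has no terms left, so it matches nothing.
  def search(query)
    terms = analyze(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end

  private

  # Lowercase, split into word tokens, drop stop words.
  def analyze(text)
    text.downcase.scan(/[a-z0-9]+/).reject { |term| @stop_words.include?(term) }
  end
end

default_like = ToyIndex.new(FULLER_STOP_WORDS)
default_like << 'you'
puts default_like.search('you').inspect   # => []

custom = ToyIndex.new(%w[a the and or])
custom << 'you'
puts custom.search('you').inspect         # => [0]
```

With the fuller list, 'you' is removed before it ever reaches the index, so the query has nothing to match; with the short custom list it is indexed like any other term.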
On 10/24/06, Andreas Korth <andreas.korth at gmx.net> wrote:
>
> On 24.10.2006, at 23:28, Scott Persinger wrote:
>
> > I am seeing trouble with searches for 'you' not returning anything. It
> > appears that 'you' is a stop word to the standard analyzer:
>
> > I assumed from the docs that StandardAnalyzer was using stop words
> > as defined by:
> >
> > Ferret::Analysis::ENGLISH_STOP_WORDS
> >
> > I don't see 'you' in there.
>
> StandardAnalyzer actually uses
> Ferret::Analysis::FULL_ENGLISH_STOP_WORDS by default. (Note the 'FULL_')

My apologies. This had been fixed in the documentation a while ago. I just haven't updated the docs on the Ferret homepage for a while.

> > Supplying my own stop words seems to fix the problem:
>
> Standard stop words are just a one-size-fits-all reasonable default.
> For maximum control you should always supply your own list of stop
> words.

> > I am running the latest Windows build, but I've seen the same behavior
> > on Linux with the latest builds. I am happy with my solution, but it
> > seems odd that 'you' should be standard stop word.
>
> Depends on how you look at it. 'You' is definitely not the least
> adequate candidate for a stop word. Then again, it's not included in
> Ferret::Analysis::ENGLISH_STOP_WORDS.
>
> Cheers,
> Andy

Thanks Andy. Actually the reason for the two English stop-word lists is that they come from two different sources. ENGLISH_STOP_WORDS is the list taken from Lucene. FULL_ENGLISH_STOP_WORDS is taken from Martin Porter's website[1]. I hope that clears things up a little. You are quite right in saying you should probably use your own list of stop words for best results.

Cheers,
Dave

[1] http://snowball.tartarus.org/
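
Since there are two lists from two sources, a quick way to see which one is swallowing a term is to check each list by name. The helper below is a hypothetical sketch in plain Ruby (`lists_containing` is not part of Ferret), and the sample data is an assumption: the first list is the one printed in the original post, the second is only a short illustrative excerpt of a fuller list, not its real contents.

```ruby
# Return the names of all stop-word lists that contain the given word.
# named_lists is a Hash of { 'list name' => array of lowercase words }.
def lists_containing(word, named_lists)
  term = word.downcase
  named_lists.select { |_name, words| words.include?(term) }.keys
end

# Sample data (assumptions, see above).
sample_lists = {
  'ENGLISH_STOP_WORDS' => %w[
    a an and are as at be but by for if in into is it no not of on or s such
    t that the their then there these they this to was will with
  ],
  'FULL_ENGLISH_STOP_WORDS (excerpt)' => %w[i me my we you your he she it the a an and]
}

puts lists_containing('you', sample_lists).inspect
puts lists_containing('the', sample_lists).inspect
```

With Ferret loaded, the same helper could be called with the real constants, for example `lists_containing('you', 'ENGLISH_STOP_WORDS' => Ferret::Analysis::ENGLISH_STOP_WORDS, 'FULL_ENGLISH_STOP_WORDS' => Ferret::Analysis::FULL_ENGLISH_STOP_WORDS)`.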