Hi,

Can Ferret be configured to change the minimum word length of what it indexes? Right now it seems to drop words 3 characters or less, but I'd like to include words going down to 2 characters. How would I do that?

Francis
Sorry, false alarm, I was not indexing some of my records.

On Jul 26, 2006, at 11:48 AM, Francis Hwang wrote:
> Hi,
>
> Can Ferret be configured to change the minimum word length of what it
> indexes? Right now it seems to drop words 3 characters or less, but
> I'd like to include words going down to 2 characters. How would I do
> that?
>
> Francis
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
Hello,

I am actually experiencing the same problem (I am using Ferret 0.9.5). When I search for terms that are under 4 characters, Ferret doesn't return any results. Is there a way to index all words (even single-character words) easily? Since I am using acts_as_ferret for my project, is there a way to also specify that within the acts_as_ferret options?

Thank you,
Maxime Curioni

--
Posted via http://www.ruby-forum.com/.
Hi Maxime,

Ferret already indexes all words no matter what their length (unless you add a custom filter). Could you give an example of the problem? I.e. what words are you trying to search for?

Cheers,
Dave

On 8/16/06, Maxime Curioni <mxcurioni at yahoo.com> wrote:
> Hello,
> I am actually experiencing the same problem (I am using Ferret 0.9.5).
> When I search for terms that are under 4 characters, Ferret doesn't
> return any result. Is there a way to index all words (even
> single-character words) easily ? Since I am using acts_as-ferret for my
> project, is there a way to also specify that within acts_as_ferret
> options ?
>
> Thank you,
> Maxime CUrioni
Hello Dave,

Sorry for responding so late. I am actually using Ferret via the acts_as_ferret Rails plugin.

I have a problem with small words, especially when I search for them between quotes. For example, I have indexed the following sentence:

"e-commerce growth strategy for a major business to leverage key intangible assets"

When I search for the sentence '"for a"' (not just 'for AND a' but the sentence "for a"), I don't get any results. Is there a way to force Ferret to return results _strictly_ containing certain words (i.e. exact results, not approximate results)?

I am also experiencing problems with words containing special characters (especially words separated with dashes). Is there a way to send a raw query to Ferret without having to escape the special characters?

Thank you for your help,
Maxime Curioni
On 9/8/06, Maxime Curioni <mxcurioni at yahoo.com> wrote:
> Hello Dave,
> Sorry for responding so late. I am actually using Ferret via the
> acts_as_ferret Rails plugin.
>
> I have a problem with small words, especially when I search for them
> between quotes. For example, I have indexed the following sentence:
> "e-commerce growth strategy for a major business to leverage key
> intangible assets"
>
> When I search for the sentence '"for a"' (not just 'for AND a' but the
> sentence "for a"), I don't get any results.

Hi Maxime,

It's not the length of the words that is the problem. If you did a search for "cat" it would find it. The problem is that the default analyzer which you are using removes common stop-words like "and", "the", "a" and "for". You can create a StandardAnalyzer that doesn't remove stopwords like this:

    require 'rubygems'
    require 'ferret'

    include Ferret::Index
    include Ferret::Analysis

    # An empty stop-word list means no words are filtered out.
    index = Index.new(:analyzer => StandardAnalyzer.new([]))

> Is there a way to force Ferret to return results _strictly_
> containing certain words (i.e. exact results, not approximate
> results) ?

I'm not sure what you mean here. Can you give me an example where Ferret returns approximate results?

> I am also experiencing problems with words containing special characters
> (especially words separated with dashes). Is there a way to send a raw
> query to Ferret without having to escape the special characters ?

Words separated by dashes are treated as single words by the current StandardAnalyzer, but that will change in version 0.10.3. Here is an example:

    require 'rubygems'
    require 'ferret'

    # Empty stop-word list, so "for" and "a" are indexed too.
    index = Ferret::I.new(:analyzer => Ferret::Analysis::StandardAnalyzer.new([]))

    index << "e-commerce growth strategy for a major business to leverage key intangible assets"

    puts index.search("e-commerce")
    puts index.search("commerce")   # no results before 0.10.3
    puts index.search("for a")

Currently the search for "commerce" won't return any results. In version 0.10.3 both "e-commerce" and "commerce" (and "e", for that matter) will find the document.

> Thank you for your help,
> Maxime Curioni
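To make the hyphen behaviour described above concrete, here is a small plain-Ruby sketch. It does not call Ferret and is not Ferret's tokenizer: `tokenize_whole` and `tokenize_with_parts` are hypothetical helpers that only mimic the two behaviours David describes, namely keeping "e-commerce" as a single term versus also indexing its parts.

```ruby
# Toy illustration only: plain Ruby, not Ferret's actual tokenizer.

# Mimics the "current" behaviour described above: a hyphenated word
# stays a single term.
def tokenize_whole(text)
  text.downcase.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/)
end

# Mimics the behaviour announced for 0.10.3: the hyphenated word is kept
# and its parts are added as terms too.
def tokenize_with_parts(text)
  tokenize_whole(text).flat_map do |term|
    term.include?("-") ? [term, *term.split("-")] : [term]
  end
end

sentence = "e-commerce growth strategy"

p tokenize_whole(sentence)
p tokenize_with_parts(sentence)
```

With the first helper a search term "commerce" has nothing to match, because only "e-commerce" was produced as a term. With the second, "e-commerce", "e" and "commerce" are all present.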
On Fri, Sep 08, 2006 at 03:24:27PM +0900, David Balmain wrote:
> Hi Maxime,
> It's not the length of the words that is the problem. If you did a
> search for "cat" it would find it. The problem is that the default
> analyzer which you are using removes common stop-words like "and",
> "the", "a" and "for". You can create a StandardAnalyzer that doesn't
> remove stopwords like this;
>
> include Ferret::Index
> include Ferret::Analysis
>
> index = Index.new(:analyzer => StandardAnalyzer.new([]))

or, with aaf:

    acts_as_ferret :analyzer => StandardAnalyzer.new([])

Jens

--
webit! Gesellschaft für neue Medien mbH
www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer
kraemer at webit.de
Schnorrstraße 76, D-01069 Dresden
Tel +49 351 46766 0
Fax +49 351 46766 66
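The effect of an empty stop-word list on a phrase search can be sketched in plain Ruby. This is only a toy model under stated assumptions, not Ferret's analyzer or index: `sample_stop_words`, `analyze` and `phrase_match?` are hypothetical names, and the stop-word list is a small made-up sample.

```ruby
# Toy illustration only: plain Ruby, not Ferret's analyzer or index.

# Hypothetical, deliberately tiny stop-word list for the demo.
def sample_stop_words
  %w[a an and as for the to]
end

# Lowercase the text, split it into words, and drop the stop words.
def analyze(text, stop_words)
  text.downcase.scan(/[a-z0-9-]+/).reject { |term| stop_words.include?(term) }
end

# True when the phrase terms occur next to each other in the document.
def phrase_match?(doc_terms, phrase_terms)
  return false if phrase_terms.empty?
  doc_terms.each_cons(phrase_terms.size).any? { |window| window == phrase_terms }
end

text = "e-commerce growth strategy for a major business"

# With stop words, "for" and "a" never reach the index or the query.
filtered = analyze(text, sample_stop_words)
p phrase_match?(filtered, analyze("for a", sample_stop_words))

# With an empty list, as in StandardAnalyzer.new([]), both terms survive.
unfiltered = analyze(text, [])
p phrase_match?(unfiltered, analyze("for a", []))
```

In this model the phrase "for a" can only match once both the indexed text and the query are analyzed without stop words, which is why the same analyzer has to be used on both sides.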
Hello David and Jens,

I cannot thank you enough for your prompt answers. I had quickly browsed through both the Ferret and aaf APIs but, being short on schedule, I did not really have time to dive into the technology. I have successfully used aaf and Ferret out of the box for my product, thanks to your work and the Rails environment. I am realizing now that if I had read the documentation (especially about Ferret analyzers), I could have saved some of your time... so thanks a lot!

I now understand about Ferret parsing the query for common words. I will use the basic analyzer that you provided me with.

Regarding the "approximate results", after what you have told me, it makes more sense: the record "Defining an e-commerce growth strategy for a major business" would be matched by both '"Defining an"' and '"Defining as"'. I thought that Ferret would match 'approximate results', considering that those queries were somehow close enough to return the previous record as a valid result for both of them. I understand that "an" and "as" are considered common words and Ferret removes them, therefore giving the results of the '"Defining"' query.

I understand that the feature I am looking for (matching words separated with dashes) will be available in the next released version:
- What can I do, in the meantime, to match those words? Do I need to write an ad hoc analyzer? Could you tell me the list of the "special characters"?
- When do you estimate that 0.10.3 will be released? Having to deliver my product soon, I was wondering if that version would make it into my work.

Thank you again for your time and your help.

Regards,
Maxime Curioni
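The "approximate results" effect described above can be reproduced with a few lines of plain Ruby. This is a toy sketch, not Ferret's query parser: `sample_stop_words` and `surviving_terms` are hypothetical names, and the stop-word list is a made-up sample that happens to contain "an" and "as".

```ruby
# Toy illustration only: plain Ruby, not Ferret's query parser.

# Hypothetical sample of common English stop words.
def sample_stop_words
  %w[a an and as for the to]
end

# Reduce a phrase query to the terms that survive stop-word removal.
def surviving_terms(phrase, stop_words = sample_stop_words)
  phrase.downcase.scan(/[a-z0-9]+/).reject { |term| stop_words.include?(term) }
end

p surviving_terms("Defining an")
p surviving_terms("Defining as")
p surviving_terms("Defining an", [])
```

The first two calls reduce to the same single term, so both queries behave like a search for "Defining". With an empty stop-word list the second word is kept and the two queries become different again.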
> or, with aaf:
> acts_as_ferret :analyzer => StandardAnalyzer.new([])

I've tried this with aaf, and it still uses stopwords. Anyone else have this problem? I'm running 10.10 and aaf, plugin (as current as today...not sure what v.).

I've tried:

    acts_as_ferret :fields => [:name], :analyzer => Ferret::Analysis::StandardAnalyzer.new([])

    acts_as_ferret :fields => [:name], :analyzer => StandardAnalyzer.new([])

and even different analyzers. All of them still seem to use the stopwords. Anyone have an idea?
Brad Adams wrote:
> I've tried this with aaf, and it still uses stopwords. Anyone else have
> this problem?

I've got it to work...after countless tries with different syntax, and analyzers. It worked only when I passed 'nil':

    acts_as_ferret( { :fields => [:name] },
                    { :analyzer => Ferret::Analysis::StandardAnalyzer.new([nil]) } )

Hope that'll help anyone else that comes across this.
Brad Adams wrote:
> I've got it to work...after countless tries with different syntax, and
> analyzers.
> It worked only when I passed 'nil'.
> acts_as_ferret( { :fields => [:name] }, { :analyzer =>
> Ferret::Analysis::StandardAnalyzer.new([nil]) } )
>
> Hope that'll help anyone else that comes across this.

Thanks everyone for posting this. I have a question.

> acts_as_ferret( { :fields => [:name] }, { :analyzer =>
> Ferret::Analysis::StandardAnalyzer.new([nil]) } )

works by allowing stopwords in my searches, but what if I want to allow stopword searching in only ONE field? This is what I have:

    acts_as_ferret({ :fields => { :name        => { :boost => 10, :store => :yes },
                                  :description => {},
                                  :title       => { :boost => 3 } } },
                   { :analyzer => Ferret::Analysis::StandardAnalyzer.new([nil]) })

I want to allow stopword searching for :title, and remove stopwords for :name and :description. Is there a way to do it?

I'm new to Ferret, and can't really figure out how to use StopFilter in QueryParser.

    qp = QueryParser.new(:fields => [:name, :description], :analyzer => StopFilter.new())

Thanks a lot!
On Mon, Apr 02, 2007 at 03:37:09PM +0200, David wrote:
[..]
> I want to allow stopword searching for :title, and remove stopwords for
> :name and :description. Is there a way to do it?

Have a look at PerFieldAnalyzer, it allows you to specify separate Analyzers for fields.

Jens
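The idea behind a per-field analyzer can be sketched in plain Ruby. This is a toy stand-in written for illustration, not Ferret's PerFieldAnalyzer: the class `ToyPerFieldAnalyzer`, its methods and the stop-word list are all hypothetical. It only shows the dispatch, where a field-specific analyzer is used when one is registered and a default is used otherwise.

```ruby
# Toy illustration only: plain Ruby, not Ferret's PerFieldAnalyzer.

# Hypothetical stand-in: picks an analysis step by field name and falls
# back to a default for fields that were not registered.
class ToyPerFieldAnalyzer
  def initialize(default)
    @default = default
    @by_field = {}
  end

  # Register a field-specific analyzer (any object responding to #call).
  def []=(field, analyzer)
    @by_field[field.to_sym] = analyzer
  end

  # Analyze text with the analyzer registered for the field, if any.
  def analyze(field, text)
    @by_field.fetch(field.to_sym, @default).call(text)
  end
end

stop_words = %w[a an and as for the to] # hypothetical sample list

drop_stop_words = ->(text) { text.downcase.split.reject { |t| stop_words.include?(t) } }
keep_everything = ->(text) { text.downcase.split }

analyzer = ToyPerFieldAnalyzer.new(drop_stop_words) # default, used for :name and :description
analyzer[:title] = keep_everything                  # stop words stay searchable in :title

p analyzer.analyze(:title, "A Tale for the Time Being")
p analyzer.analyze(:name, "A Tale for the Time Being")
```

Only :title is registered explicitly, so :name and :description fall through to the default and lose their stop words, which matches the setup asked about above.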
Thanks Jens, exactly what I needed.
Jens Kraemer wrote:
> On Mon, Apr 02, 2007 at 03:37:09PM +0200, David wrote:
> [..]
>> I want to allow stopword searching for :title, and remove
>> stopwords for :name and :description. Is there a way to do it?
>
> Have a look at PerFieldAnalyzer, it allows you to specify separate
> Analyzers for fields.

Hey.. that's what we do over at omdb.org:

    @analyzer = PerFieldAnalyzer.new(OmdbDefaultAnalyzer.new)
    @analyzer[:aliases]  = OmdbContentAnalyzer.new(Locale.base_language)
    @analyzer[:keywords] = OmdbContentAnalyzer.new(Locale.base_language)
    LOCALES.each_key do |key|
      language = Language.pick(key)
      @analyzer["content_#{key}".to_sym]  = OmdbContentAnalyzer.new(language)
      @analyzer["keywords_#{key}".to_sym] = OmdbContentAnalyzer.new(language)
    end

Where a ContentAnalyzer is a

    MappingFilter > StemFilter > StopFilter > LowerCaseFilter

and a DefaultAnalyzer is simply a

    MappingFilter > HyphenFilter > LowerCaseFilter

:-)

Ben
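A filter chain like the ones Ben lists can be pictured as a series of small steps, each consuming the previous step's output. The sketch below is plain Ruby with lambdas, not Ferret's filter classes: `build_chain`, the lambdas and the stop-word list are hypothetical, the "stemmer" is a crude stand-in, and the order in which Ben's notation applies the filters is an assumption here.

```ruby
# Toy illustration only: plain Ruby lambdas, not Ferret's filter classes.

stop_words = %w[a an and as for the to] # hypothetical sample list

lowercase  = ->(terms) { terms.map(&:downcase) }
drop_stops = ->(terms) { terms.reject { |t| stop_words.include?(t) } }
# Crude stand-in for stemming: strip a trailing "s" from longer words.
naive_stem = ->(terms) { terms.map { |t| t.length > 3 ? t.sub(/s\z/, "") : t } }

# Build an analyzer by running each filter over the previous filter's output.
def build_chain(*filters)
  ->(text) { filters.reduce(text.split) { |terms, filter| filter.call(terms) } }
end

content_analyzer = build_chain(lowercase, drop_stops, naive_stem)
default_analyzer = build_chain(lowercase)

p content_analyzer.call("The Assets and the Strategies")
p default_analyzer.call("The Assets and the Strategies")
```

Order matters in such a chain: lowercasing runs before the stop-word step here so that "The" is recognised as the stop word "the".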