Chris Roos
2006-Feb-09 11:13 UTC
[Ferret-talk] Finding related items (like latent semantic indexing)
I've been trying to use Classifier::LSI to provide a means of finding 'related items', where each item is a one-line description of a product.

Although the Classifier works great on small samples, it completely baulks at my current dataset of 3000 items.

I started looking at Ferret this morning, following a post on the Ruby mailing list. I'd guess that the Fuzzy Query would be the thing I need, although it doesn't appear to be as comprehensive as the LSI stuff in Classifier (I realise they are doing different things).

I'm really just after any thoughts anyone might have.

Thanks in advance,
Chris
David Balmain
2006-Feb-09 12:42 UTC
[Ferret-talk] Finding related items (like latent semantic indexing)
Hi Chris,

I plan on adding a "More Like This" function to Ferret but I'm really swamped (doing other stuff on Ferret) at the moment. If you want to have a go at implementing it yourself you could have a look at the way it's done in Lucene. It's not too much work, but it could take you a while to get your head around the Ferret internals, and the current Ferret codebase is soon to be obsolete. Sorry I can't be of more help.

Cheers,
Dave
David Balmain
2006-Feb-09 15:09 UTC
[Ferret-talk] Finding related items (like latent semantic indexing)
Hi Chris,

I just noticed that you are indexing one-line product descriptions. What I'd suggest doing (I believe this is how the Lucene MoreLikeThis query works) is just taking the description of your start product and using that as the query. So if the description is:

"apple ipod nano 4Gb black"

then your query will be:

"description:(apple ipod nano 4Gb black)"

Hope that helps,
Dave
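
For anyone wanting to try Dave's suggestion, here is a minimal sketch in plain Ruby of turning a description into that query string. The helper name `related_query` and the `SPECIAL_CHARS` list are illustrative assumptions, not part of Ferret; the list is modelled on characters that Lucene-style query parsers usually treat as syntax, so check it against the parser in your Ferret version. Passing the resulting string to the index is left out because that call depends on the Ferret release you are running.

```ruby
# Characters that commonly carry special meaning in Lucene-style query
# parsers. Assumption: Ferret's parser treats a similar set as special.
SPECIAL_CHARS = /[\\+\-!(){}\[\]^"~*?:|&<>=]/

# Build a "field:(terms)" query string from a free-text description.
# Special characters are replaced with spaces so that product text such
# as "4Gb (black)" cannot be misread as query syntax.
def related_query(description, field = "description")
  terms = description.gsub(SPECIAL_CHARS, " ").split
  return nil if terms.empty?   # nothing left to search for
  "#{field}:(#{terms.join(' ')})"
end

puts related_query("apple ipod nano 4Gb black")
```

Note that the start product contains every term in its own query, so it will normally appear among the results and should be skipped when listing related items.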