On Wed, Mar 23, 2011 at 07:42:08PM -0400, Prasad Prabhu wrote:
> This is my list set of ideas and overview of my analysis I have done on some
> other ideas I felt should be discussed. Please provide me some comments and
> suggestions to make it better before the application process starts.
> Here is the link: Idea Log <http://goo.gl/GjCcA>
I think trying to actually parse queries as sentences isn't likely to
work well. People usually search for a few words without proper
grammar, or for a sentence fragment. So for context sensitivity,
I think a statistical approach is more likely to work (e.g. tracking
how likely one word is to appear near another, and then comparing
that likelihood across the words within edit distance X of the word
we are considering for correction).
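
To make that concrete, here is a rough sketch in Python of the sort of
thing I mean. It is only an illustration under some assumptions: the
function names, the window of 2 words, the toy documents and the
distance limit of 2 are all made up for the example, and none of it is
existing Xapian code.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance using the usual dynamic-programming rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def build_cooccurrence(documents, window=2):
    """Count how often each ordered pair of words occurs within `window`
    positions of each other. Returns (vocabulary, pair counts)."""
    counts = Counter()
    vocab = set()
    for doc in documents:
        words = doc.lower().split()
        vocab.update(words)
        for i, w in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, words[j])] += 1
    return vocab, counts


def suggest(query, position, vocab, counts, max_distance=2):
    """Suggest a replacement for the query word at `position`, picking
    the nearby vocabulary word seen most often with the other query
    words. Falls back to the original word if nothing scores better."""
    words = query.lower().split()
    target = words[position]
    context = words[:position] + words[position + 1:]

    def score(candidate):
        return sum(counts[(candidate, c)] for c in context)

    candidates = [w for w in vocab
                  if edit_distance(w, target) <= max_distance]
    if not candidates:
        return target
    # Prefer higher co-occurrence, then smaller edit distance; the word
    # itself is the last key so that ties resolve deterministically.
    best = max(candidates,
               key=lambda w: (score(w), -edit_distance(w, target), w))
    return best if score(best) > score(target) else target


# Hypothetical toy corpus, purely for illustration.
docs = ["peace treaty signed", "a piece of cake",
        "peace talks resume", "piece of pie"]
vocab, counts = build_cooccurrence(docs)
print(suggest("peice of cake", 0, vocab, counts))
print(suggest("peice treaty", 0, vocab, counts))
```

The point is that the same misspelling can get a different correction
depending on the words around it, without any attempt to parse the
query. A real implementation would want to get the counts from the
index rather than rescanning documents, and to normalise them somehow
so that very common words don't dominate.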
I'm not clear how stemming helps here - perhaps you could elaborate
on how it would be used?
And soundex is really a non-starter. It's only intended to be used
on surnames common in the USA, and it's not even much good for those.
Metaphone (and Double Metaphone) are better alternatives.
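
To show the sort of problem I mean, here is a small sketch of the
standard American Soundex rules in Python. It's written from the usual
description of the algorithm purely for illustration (the function
name and the example words are my own choice).

```python
# Map each consonant to its Soundex digit; vowels, y, h and w have no code.
CODES = {}
for digit, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for `word` ("" if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()          # the first letter is kept verbatim
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue                  # h and w don't separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code            # new consonant sound
        prev = code                   # a vowel resets prev to ""
    return (result + "000")[:4]       # pad or truncate to 4 characters


for name in ["Robert", "Rupert", "Knight", "Night"]:
    print(name, soundex(name))
```

"Robert" and "Rupert" both come out as R163, while "Knight" and
"Night" come out as K523 and N230 even though they sound the same,
because the first letter is always kept as-is. So it both merges
things it shouldn't and misses things it should catch.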
Cheers,
Olly