thr3ads.net - Xapian devel - [Xapian-devel] GSoC 2011: Improve Spelling Correction [Mar 2011]

Nikita Smetanin

2011-Mar-20 14:53 UTC

[Xapian-devel] GSoC 2011: Improve Spelling Correction

Hello, I am Nikita Smetanin (ntz), a Russian student. I'm interested in
fuzzy search algorithms (also known as similarity search and spelling
correction), and I have written some articles and open-source
implementations of related algorithms. I also have good experience in
enterprise software development (Java/C++/C# and related technologies)
and in small projects.

I want to work on your project "Improve spelling correction", but I
want to suggest some additions to that project:

- One or several phonetic matching algorithms to improve name and
surname search.
- A faster alternative to the trigram algorithm for correction candidate search.
- A more sophisticated word distance metric to improve result set relevance.
- Some work on improving stemming quality.
- Language detection for automatic selection of language-specific algorithms.
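
For readers unfamiliar with the trigram approach mentioned in the list
above, here is a minimal Python sketch of the general technique: index
the vocabulary by character trigrams, collect every word sharing at
least one trigram with the misspelling, then rank those candidates by
plain Levenshtein distance. This is an illustration only and not
Xapian's actual implementation; the function names, the "$$" padding
and the max_distance cut-off are all assumptions made for the example.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of character trigrams of a word."""
    # Hypothetical padding scheme so that word boundaries also form trigrams.
    padded = "$$" + word + "$$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(vocabulary):
    """Map each trigram to the set of vocabulary words containing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(word, index, max_distance=2):
    """Return candidate corrections, closest first."""
    candidates = set()
    for gram in trigrams(word):
        candidates |= index.get(gram, set())
    scored = sorted((edit_distance(word, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance]


index = build_index(["spelling", "spilling", "correction", "search"])
print(suggest("speling", index))
```

The candidate step is where a faster alternative would plug in, and
edit_distance is where a more elaborate metric (for example, one that
treats transpositions as a single edit) could be substituted.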

I'd be happy to take part in this project during the Google Summer of
Code 2011 program and to implement most of these ideas.

Olly Betts

2011-Mar-21 15:05 UTC

[Xapian-devel] GSoC 2011: Improve Spelling Correction

On Sun, Mar 20, 2011 at 07:53:56PM +0500, Nikita Smetanin wrote:
> Hello, I am Nikita Smetanin (ntz), russian student. I'm interested in
> fuzzy search algorithms (also known as similarity search and spelling
> correction), I have some articles and open-source implementations of
> related algorithms. I also have good experience in enterprise software
> development (Java/C++/C# and related stuff) and in small projects.
> 
> I want to work on your project "Improve spelling correction", but I
> want to suggest some additions to that project:
That's cool - I actually added a new sentence to the ideas page earlier
to make this clearer (http://trac.xapian.org/wiki/GSoCProjectIdeas):

    Note that these are ideas - some are more fully formed than others, but
    don't be afraid to take them and extend or adapt them in your proposal
    to produce something you're more interested in working on.
> - One or several phonetic matching algorithms to improve name and
> surname search.
How would you apply these?  Just as something which could be applied to
a field known to contain a name (e.g. author) or something more complex?
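
To make the simpler of those two options concrete, here is a rough
Python sketch of phonetic matching restricted to a name field. The
thread does not name a specific algorithm, so American Soundex is used
purely as a well-known stand-in; it is tuned for English surnames and
would probably not suit Russian names. The same_sounding helper and
the idea of comparing author values by their codes are assumptions for
illustration, not an existing Xapian feature.

```python
# Build the Soundex letter-to-digit table.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code of an ASCII name."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets the duplicate check
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def same_sounding(a, b):
    """Hypothetical helper: do two author names share a Soundex code?"""
    return soundex(a) != "" and soundex(a) == soundex(b)


print(soundex("Robert"), soundex("Rupert"))  # both are R163
print(same_sounding("Smith", "Smyth"))
```

In this scheme the code would be computed only for values of a field
known to hold a name, such as author, and compared against the code of
the name in the query.
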
> - Alternative faster (than trigram) algorithm for correction candidate search.
> - More complicated word distance metric to improve result set relevance.
> - Something about improving stemming quality.
> - Language detection for automatic language-specific algorithms selection.
> 
> I'll be happy to participate in this project during Google Summer of
> Code 2011 program and implement most of these ideas.
Cool - I know you've discussed a lot of this on IRC already, but feel
free to ask/discuss further.

And if you get a chance to translate any of your papers into English,
I'd be interested to read them.

Cheers,
    Olly
