Displaying 2 results from an estimated 2 matches for "within_document_frequency".
2014 Mar 09
[GSoC 2014] clustering of search results
Hello guys,
I am looking forward to participating in GSoC 2014. I have a decent knowledge
of C++ and parsers.
I was looking at the ideas page, where I found many interesting projects, of
which "clustering of search results" interests me the most. I would like someone
to point me in the right direction in understanding the project so that I can
think about its implementation.
According to me, in this
2014 Mar 10
[GSoC 2014] clustering of search results
...ering, but it isn't completely equivalent.
>
> As an example, one way you could generate clusters is to think of each
> document as a point in a multi-dimensional space, where each dimension
> represents a different term with the distance in that direction being
> something like (within_document_frequency / document_length). In this
> space, the distance between two identical documents is 0, and documents
> which are more different will tend to be further apart (one word
> changed is a small distance; no words in common is a long way apart).
>
> Clustering is then splitting the docum...
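
The geometric picture in the quoted reply can be sketched in a few lines. This is only an illustration under stated assumptions: tokenization is a plain whitespace split, the distance is Euclidean (the reply does not name a metric), and the helper names term_vector and distance are hypothetical, not part of any library API.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Map each term to within_document_frequency / document_length.

    Assumption: terms are produced by lowercasing and splitting on whitespace.
    """
    terms = text.lower().split()
    if not terms:
        return {}
    counts = Counter(terms)
    length = len(terms)
    return {term: wdf / length for term, wdf in counts.items()}


def distance(vec_a, vec_b):
    """Euclidean distance between two sparse term vectors (assumed metric)."""
    all_terms = set(vec_a) | set(vec_b)
    return sqrt(sum((vec_a.get(t, 0.0) - vec_b.get(t, 0.0)) ** 2 for t in all_terms))


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox jumps"               # identical to doc_a
doc_c = "the quick brown fox sleeps"              # one word changed
doc_d = "clustering search results with xapian"   # no words in common

d_same = distance(term_vector(doc_a), term_vector(doc_b))
d_one_word = distance(term_vector(doc_a), term_vector(doc_c))
d_disjoint = distance(term_vector(doc_a), term_vector(doc_d))

print(d_same, d_one_word, d_disjoint)
```

With these toy documents, the identical pair is at distance 0, the pair differing by one word is close together, and the pair with no shared words is furthest apart, which matches the ordering described in the reply.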