search for: within_document_frequency

Displaying 2 results from an estimated 2 matches for "within_document_frequency".

2014 Mar 09
2
[GSoC 2014] clustering of search results
Hello guys, I am looking forward to participating in GSoC 2014. I have a decent knowledge of C++ and parsers. While looking through the ideas page I found many interesting projects, of which "clustering of search results" interests me the most. I would like someone to point me in the right direction in understanding the project so that I can think about its implementation. As I understand it, in this
2014 Mar 10
2
[GSoC 2014] clustering of search results
...ering, but it isn't completely equivalent.
>
> As an example, one way you could generate clusters is to think of each
> document as a point in a multi-dimensional space, where each dimension
> represents a different term with the distance in that direction being
> something like (within_document_frequency / document_length). In this
> space, the distance between two identical documents is 0, and documents
> which are more different will tend to be further apart (one word
> changed is a small distance; no words in common is a long way apart).
>
> Clustering is then splitting the docum...
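The vector-space idea in the quoted reply can be sketched in a few lines. This is only an illustration under stated assumptions, not Xapian's API or the project's actual design: documents are tokenised by naive lowercase whitespace splitting, each coordinate is within-document frequency divided by document length (the reply only says "something like" this), and distance is plain Euclidean distance. The function names `doc_vector` and `distance` are hypothetical.

```python
import math
from collections import Counter


def doc_vector(text):
    """Map a document to a sparse vector {term: wdf / document_length}.

    Assumption: naive lowercase whitespace tokenisation stands in for a
    real indexer's term generation.
    """
    terms = text.lower().split()
    length = len(terms)
    if length == 0:
        return {}
    counts = Counter(terms)  # within-document frequency of each term
    return {term: freq / length for term, freq in counts.items()}


def distance(a, b):
    """Euclidean distance between two sparse term vectors (assumed metric)."""
    all_terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in all_terms))


d1 = doc_vector("the cat sat on the mat")
d2 = doc_vector("the cat sat on the mat")
d3 = doc_vector("the cat sat on the rug")
d4 = doc_vector("quantum chromodynamics lecture notes")

print(f"{distance(d1, d2):.3f}")  # 0.000 (identical documents)
print(f"{distance(d1, d3):.3f}")  # 0.236 (one word changed)
print(f"{distance(d1, d4):.3f}")  # 0.687 (no words in common)
```

Dividing by document length keeps long documents from sitting far from everything simply because they contain more words; a clustering algorithm (for example k-means, which the quoted text does not name) could then group documents by these distances.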