Hi,
On Mon, Jun 23, 2014 at 9:11 AM, ashwin ittoo <ashwin.ittoo1 at gmail.com>
wrote:
> Hello
> I have been using R for some text pre-processing. I have 2 questions
> concerning the tm package:
> 1) The function removeSparseTerms takes as parameters a matrix and a
> sparse factor. Can anyone please tell me how the sparse factor is calculated?
> I have tried playing around with different values and then inspecting the
> matrix, but I still could not grasp the maths behind the sparse factor.
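The help page describes it as a percentage, but since sparse must lie
strictly between 0 and 1 it is really a proportion. A term's sparsity is
the fraction of documents in which it does not occur, and terms whose
sparsity is at least sparse are removed. You can always look at the
source yourself if you want to know for certain.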
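To make the arithmetic concrete, here is a minimal sketch in plain Python of the rule described in the tm help page. It is not tm's code. It assumes a document-term matrix stored as a list of rows (one row per document, one column per term), and the function names term_sparsity and remove_sparse_terms are hypothetical. In R you would simply call removeSparseTerms(dtm, 0.5).

```python
def term_sparsity(dtm):
    """Fraction of documents (rows) in which each term (column) has a zero count."""
    n_docs = len(dtm)
    n_terms = len(dtm[0])
    return [sum(1 for row in dtm if row[j] == 0) / n_docs for j in range(n_terms)]


def remove_sparse_terms(dtm, terms, sparse):
    """Keep only terms whose sparsity is strictly below `sparse`.

    Hypothetical re-statement of the help-page rule: a term that is empty
    in at least a `sparse` proportion of documents is dropped.
    """
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    keep = [s < sparse for s in term_sparsity(dtm)]
    kept_terms = [t for t, k in zip(terms, keep) if k]
    kept_rows = [[x for x, k in zip(row, keep) if k] for row in dtm]
    return kept_rows, kept_terms


# Toy matrix: 4 documents x 3 terms.
dtm = [[1, 0, 0],
       [2, 0, 1],
       [0, 3, 0],
       [1, 0, 0]]
terms = ["data", "test", "model"]
print(term_sparsity(dtm))                          # [0.25, 0.75, 0.75]
print(remove_sparse_terms(dtm, terms, 0.5)[1])     # ['data']
```

With sparse = 0.8, all three terms survive, because each one is empty in only 75% of the documents.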
>
> 2) Similarly, the function findAssocs() takes as parameters a matrix, a
> term and an association threshold, e.g. findAssocs(mat, "test", .5) will
> return all the tokens in the matrix mat (created from a corpus) that have
> an association strength of at least 0.5 with the term "test". Can anyone
> please tell me what association metric is being used, e.g. chi-squared,
> mutual information, ...? The documentation, help.search("findAssocs"),
> does not say anything. I read on a web page (which I cannot retrieve now)
> that findAssocs is a *generic* function, but this is still very vague.
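The help says correlation, and the vignette "Introduction to the tm
Package" confirms that. The score is the correlation between one term's
counts and another term's counts across the documents. The fact that
findAssocs is generic only means that it dispatches on the class of its
first argument; it says nothing about the metric. Again, you could check
the source, or you could contact the package maintainer, which is the
appropriate thing to do for questions of this sort.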
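For illustration, here is a minimal plain-Python sketch of a correlation-based association score. It is not tm's implementation. It assumes Pearson correlation (the default method of R's cor()), an "at least corlimit" cutoff, and that terms with a constant count column are skipped. The names pearson and find_assocs are hypothetical. The matrix has one row per document and one column per term. In R you would call findAssocs(dtm, "test", 0.5).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either column is constant
    return sxy / sqrt(sxx * syy)


def find_assocs(dtm, terms, term, corlimit):
    """Terms whose correlation with `term` is at least `corlimit` (assumed cutoff)."""
    j = terms.index(term)
    target = [row[j] for row in dtm]
    result = {}
    for k, other in enumerate(terms):
        if k == j:
            continue
        r = pearson(target, [row[k] for row in dtm])
        if r == r and r >= corlimit:  # r == r is False for NaN
            result[other] = round(r, 2)
    # Strongest associations first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


# Toy matrix: 4 documents x 3 terms.
dtm = [[2, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 0, 2]]
terms = ["test", "exam", "model"]
print(find_assocs(dtm, terms, "test", 0.5))  # {'exam': 0.9}
```

Here "model" correlates negatively with "test" (about -0.82), so it falls below the 0.5 threshold.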
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org