I think there is a problem with the weightTfIdf function in R's tm package. The manual says that the idf is calculated as

    idf(term) = log(|D| / number of documents that contain the term)

When a dictionary is passed in the control list, as below,

    dtm <- DocumentTermMatrix(myCorpus, control = list(dictionary = myDict, weighting = function(x) weightTfIdf(x, normalize = FALSE)))

a dictionary term may not occur in any document. In that case the denominator of the idf becomes 0, which leads to NaN.

How can I raise this issue, and how can the code be fixed so that it uses

    idf(term) = log(|D| / (number of documents that contain the term + 1))

Any help on this would be appreciated!

Shivani
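
P.S. To make the problem concrete, here is a minimal base-R sketch on a toy matrix. It does not call tm at all: the matrix, the term names, and the choice of the natural log are assumptions made for illustration, not tm's actual implementation. It only shows how a term with zero document frequency produces NaN under the formula from the manual, and how adding 1 to the denominator avoids it.

```r
# Hypothetical toy term-frequency matrix: rows are documents, columns are terms.
# "gamma" stands for a dictionary term that occurs in no document.
tf <- matrix(c(2, 0, 0,
               1, 1, 0,
               0, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("doc", 1:3), c("alpha", "beta", "gamma")))

n_docs   <- nrow(tf)
doc_freq <- colSums(tf > 0)   # number of documents that contain each term

# idf as described in the manual: 3 / 0 is Inf for "gamma",
# and the term frequency 0 multiplied by Inf is NaN.
idf_plain   <- log(n_docs / doc_freq)
tfidf_plain <- sweep(tf, 2, idf_plain, `*`)

# Proposed variant: add 1 to the document frequency before dividing.
idf_smooth   <- log(n_docs / (doc_freq + 1))
tfidf_smooth <- sweep(tf, 2, idf_smooth, `*`)

print(tfidf_plain)
print(tfidf_smooth)
```

One side effect of the +1 variant, at least in this sketch: a term that occurs in every document gets a slightly negative idf, because |D| / (|D| + 1) is less than 1. Replacing non-finite idf values with 0 would be another option.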