Hi all,

I am currently using the package "topicmodels" to find the topics of a given text. The dataset contains 8523 documents. I would like to see which documents belong to which topic. Here is my code:

######################## get the DocumentTermMatrix ########################
library("tm")
library("slam")
library("topicmodels")

tdm <- DocumentTermMatrix(corpus, control)
length(tdm$dimnames$Terms)
dim(tdm)
## the dimension of tdm is "[1] 8513 21135"

## mean tf-idf per term, used to drop uninformative terms
term_tfidf <- tapply(tdm$v / row_sums(tdm)[tdm$i], tdm$j, mean) *
  log2(nDocs(tdm) / col_sums(tdm > 0))
summary(term_tfidf)
summary(col_sums(tdm))

tdm <- tdm[, term_tfidf >= 0.15]
## drop documents that have no terms left after the filter
tdm2 <- tdm[row_sums(tdm) > 0, ]
dim(tdm2)
## now the dim of tdm2 is 8513 10091

######################## topic modeling analysis ########################
k <- 30
lda <- LDA(tdm2, k = k, control = list(alpha = 0.1))

## cell values are the posterior topic distribution for each document
gammaDF <- as.data.frame(lda@gamma)
names(gammaDF) <- c(1:k)
# inspect...
gammaDF

toptopics <- as.data.frame(cbind(
  document = row.names(gammaDF),
  topic = apply(gammaDF, 1, function(x) names(gammaDF)[which(x == max(x))])))
sapply(toptopics, class)
toptopics <- unlist(toptopics)
write.csv(toptopics, "topicdistribution.csv")

Some of the documents (in this case, 10 documents) were excluded because their rows contain only zero entries. As a result, the row numbers of the result no longer line up with the original documents, and I cannot match the original document IDs with the topics.

My question is: how can I keep the original document IDs and match them with the topics?

ZHANG Lun
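One possible way to approach the matching, sketched in base R only so that it runs without the corpus: row i of the gamma matrix corresponds to row i of the filtered matrix, so the row names that survive the filter can serve as document IDs. Everything below is a toy stand-in and an assumption, not the data from the post: `counts` plays the role of tdm, `counts2` the role of tdm2, `gamma` is a made-up posterior matrix in place of lda@gamma, and the names doc1 to doc5 are hypothetical IDs.

```r
# Toy stand-in for the document-term matrix: 5 documents, 3 terms.
# The row names play the role of the original document IDs (hypothetical).
counts <- matrix(c(2, 0, 1,
                   0, 0, 0,   # empty document, dropped by the filter
                   0, 3, 0,
                   1, 1, 4,
                   0, 0, 0),  # empty document, dropped by the filter
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:5), c("t1", "t2", "t3")))

# Same filtering idea as in the post: keep only non-empty documents.
keep <- rowSums(counts) > 0
counts2 <- counts[keep, , drop = FALSE]

# Made-up posterior matrix standing in for lda@gamma:
# one row per kept document, k = 2 topics.
gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.4, 0.6), nrow = 3, byrow = TRUE)

# Attach the surviving row names instead of the row numbers of gamma.
# max.col() picks one topic per row; ties go to the first maximum.
toptopics <- data.frame(document = rownames(counts2),
                        topic = max.col(gamma, ties.method = "first"),
                        stringsAsFactors = FALSE)

# IDs of the documents that were excluded by the filter.
dropped <- rownames(counts)[!keep]

print(toptopics)
print(dropped)
```

With the real objects, the analogous step would be to use rownames(tdm2) in place of row.names(gammaDF), assuming the corpus carries meaningful document IDs; the fitted model also stores its document names in lda@documents. The `unlist(toptopics)` call would need to be dropped to keep the two columns side by side in the CSV.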