Hi:

I am trying to cluster the rows of a text file with kmeans. I load the data as follows:

    file1 <- read.csv("somefile.csv")

and the file can be viewed as having the following lines of words:

    > file1
    1 word1 word3 word4 word1
    2 word1 word4 word3 word1
    3 word4 word2 word4 word3
    4 word4 word2 word1 word3
    5 word2 word2 word4 word2

    file_as_matrix <- as.matrix(file1);

Now, I want to apply some clustering algorithm such as kmeans to cluster the rows in the file to get the following output:

    Cluster1
    word1 word3 word4 word1
    word1 word4 word3 word1

    Cluster2
    word4 word2 word4 word3
    word4 word2 word1 word3
    word2 word2 word4 word2

But as kmeans takes a numeric matrix as input, it cannot be used directly to cluster the rows in this case. Is there any simple way to cluster the rows of such a text file? Example code would be really useful.

Thanks and regards:
debb
Alekseiy Beloshitskiy
2012-Mar-26 12:11 UTC
[R] how to cluster rows of words in a text file
Hello,

I didn't quite understand what you need, but you could have a look here:

www.slideshare.net/whitish/textmining-with-r

The R code fragments are in the appendices of the presentation.

Hope this helps,
-Alex

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of mail me [mailme842 at googlemail.com]
Sent: 23 March 2012 20:03
To: r-help
Subject: [R] how to cluster rows of words in a text file
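For readers looking for the example code the question asks for, here is a minimal sketch of one possible approach, not something taken from the thread or the linked slides. It assumes each row can be represented by how often each word occurs in it (a bag-of-words count matrix), which gives kmeans the numeric input it needs. The toy data frame stands in for read.csv("somefile.csv"); the column names V1..V4 and the names vocab, counts and fit are made up for illustration.

```r
# Toy data copied from the question. In practice this would be something like
# file1 <- read.csv("somefile.csv", stringsAsFactors = FALSE)
# (column names V1..V4 are an assumption for this sketch).
file1 <- data.frame(
  V1 = c("word1", "word1", "word4", "word4", "word2"),
  V2 = c("word3", "word4", "word2", "word2", "word2"),
  V3 = c("word4", "word3", "word4", "word1", "word4"),
  V4 = c("word1", "word1", "word3", "word3", "word2"),
  stringsAsFactors = FALSE
)
file_as_matrix <- as.matrix(file1)

# Vocabulary: every distinct word that occurs anywhere in the file.
vocab <- sort(unique(as.vector(file_as_matrix)))

# One row per input row, one column per word; each cell counts how often
# that word occurs in that row. Word order within a row is ignored.
counts <- t(apply(file_as_matrix, 1, function(row) {
  tabulate(match(row, vocab), nbins = length(vocab))
}))
colnames(counts) <- vocab
storage.mode(counts) <- "double"  # kmeans works on a numeric matrix

# k = 2 is taken from the desired output in the question; nstart reruns the
# algorithm from several random starts and keeps the best result.
set.seed(1)
fit <- kmeans(counts, centers = 2, nstart = 10)

# Print the original rows grouped by their assigned cluster.
rows_as_text <- apply(file_as_matrix, 1, paste, collapse = " ")
print(split(rows_as_text, fit$cluster))
```

Note that kmeans minimises the within-cluster sum of squares on the raw counts, so the grouping it returns need not match the one sketched in the question. If a different notion of similarity is wanted, the same count matrix could be passed to dist() and hclust() instead.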