Hello,

I want to do a cluster analysis with my data. The problem is that the variables don't consist of single values; each entry is a pair of values. It looks like this:

Variable1:  Variable2:  Variable3:  . . .
(1,2)       (1,5)       (4,2)
(7,8)       (3,88)      (6,5)
(4,7)       (12,4)      (4,4)
.           .           .
.           .           .
.           .           .

Is it possible to perform a cluster analysis with this kind of data in R? I don't even know how to get this data into a matrix or a data frame or anything like that.

It would be really nice if somebody could help me.

Best regards and happy Easter,
Claudia
You can create a distance matrix for each variable, square them, sum them, and take the square root. As for getting the data into a data frame, the simplest approach is to enter the three variables into six columns, like the following:

data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5    4    2
[2,]    7    8    3   88    6    5
[3,]    4    7   12    4    4    4

Then use dist() on each pair of columns: 1:2, 3:4, 5:6, ... e.g. for the 3 rows of data you provided:

    # dist() of nrow(data) zeros gives an all-zero "dist" object
    # with one entry per pair of rows
    dm <- dist(rep(0, nrow(data)))
    # add the squared distance contributed by each (x, y) column pair
    for (i in seq(1, ncol(data), 2)) {
      dm <- dm + dist(data[, i:(i+1)])^2
    }
    dm <- sqrt(dm)
    dm

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of paladini
Sent: Wednesday, April 04, 2012 6:32 AM
To: r-help at r-project.org
Subject: [R] cluster analysis with pairwise data

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
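A note on the recipe in this reply: summing the squared Euclidean distances of each column pair and taking the square root gives the same result as the plain Euclidean distance over all six columns, so dist(data) alone should give the same numbers. The per-pair loop becomes useful when a different metric or a weight is wanted for each pair. The sketch below checks the equivalence on the three example rows and passes the result to hclust(); the names pair_starts, sq, hc and groups, the "average" linkage and k = 2 are illustrative assumptions, not part of the original reply.

```r
# The three example rows, one (x, y) pair per two columns.
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# Squared Euclidean distance for each column pair (1:2, 3:4, 5:6).
pair_starts <- seq(1, ncol(data), by = 2)
sq <- lapply(pair_starts, function(i) dist(data[, i:(i + 1)])^2)

# Combine: square root of the sum of the squared per-pair distances.
dm <- sqrt(Reduce(`+`, sq))

# Same values as the Euclidean distance over all six columns.
print(all.equal(as.vector(dm), as.vector(dist(data))))

# Hierarchical clustering on the combined distances (linkage is an assumption).
hc <- hclust(dm, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

With only three rows the clustering is trivial, but the same calls apply unchanged to a matrix with more rows and more column pairs.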
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like that.

Hi.

The data as they are may be read into R as character data. The exact way depends on the format of the data in the file. The result may look like the following.

    Var1 <- c("(1,2)", "(7,8)", "(4,7)")
    Var2 <- c("(1,5)", "(3,88)", "(12,4)")
    Var3 <- c("(4,2)", "(6,5)", "(4,4)")
    DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

If you want to use a distance between pairs that depends on the numbers (and not only on whether two pairs are equal or different), then the data should be transformed to a numeric format. For example, as follows:

    trans <- function(x)
    {
        # drop the parentheses, then split each "a,b" string at the comma
        y <- strsplit(gsub("[()]", "", x), ",")
        # one row per entry, two numeric columns
        unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
    }

    DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
    DF

      Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
    1      1      2      1      5      4      2
    2      7      8      3     88      6      5
    3      4      7     12      4      4      4

Then, see library(help=cluster).

Hope this helps.

Petr Savicky.
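As a possible next step after the pointer to library(help=cluster), here is a minimal sketch that feeds a numeric data frame of this shape to pam() from the cluster package (a recommended package that ships with standard R installations). The choice of pam(), k = 2 and the Euclidean metric are assumptions made for illustration; the reply does not name a particular method, and the column names below are typed in by hand rather than produced by trans().

```r
library(cluster)

# Numeric data frame with two columns per original variable,
# matching the transformed layout shown in the reply.
DF <- data.frame(Var1.1 = c(1, 7, 4),  Var1.2 = c(2, 8, 7),
                 Var2.1 = c(1, 3, 12), Var2.2 = c(5, 88, 4),
                 Var3.1 = c(4, 6, 4),  Var3.2 = c(2, 5, 4))

# Partitioning around medoids; k = 2 is an arbitrary choice for this toy data.
fit <- pam(DF, k = 2, metric = "euclidean")

# Cluster membership for each row.
cl <- fit$clustering
print(cl)
```

Because the second row contains the outlying value 88, it should end up in a cluster of its own. On real data, the columns may need scaling first if the variables are measured on very different ranges.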