Hello,

I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; instead, the entries are pairs
of values. It looks like this:

Variable 1:   Variable 2:   Variable 3:   . . .
(1,2)         (1,5)         (4,2)
(7,8)         (3,88)        (6,5)
(4,7)         (12,4)        (4,4)
. . .         . . .         . . .

Is it possible to perform a cluster analysis with this kind of data in R?
I don't even know how to get this data into a matrix or a data frame or
anything like that.

It would be really nice if somebody could help me.

Best regards and happy Easter,
Claudia

You can create distance matrices for each Variable, square them, sum them,
and take the square root. As for getting the data into a data frame, the
simplest would be to enter the three variables into six columns like the
following:
> data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5    4    2
[2,]    7    8    3   88    6    5
[3,]    4    7   12    4    4    4
Then use dist() on each pair of columns:
1:2, 3:4, 5:6 . . .
e.g. for the 3 rows of data you provided
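# start from an all-zero dist object with one entry per pair of rows
dm <- dist(rep(0, nrow(data)))
# add the squared distances for each pair of columns: 1:2, 3:4, 5:6
for(i in seq(1, ncol(data), 2)) {
  dm <- dm + dist(data[, i:(i+1)])^2
}
dm <- sqrt(dm)
dm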
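
With plain Euclidean distance, the sum of the squared per-pair distances is the squared Euclidean distance over all six columns, so the loop should give the same result as a single call to dist() on the whole matrix. The sketch below checks this on the three example rows. The names d_all and same are illustrative only. The per-pair loop is still useful if you want a different metric or weight for each pair.

```r
# The three example rows, entered as six numeric columns
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# Per-pair approach: sum the squared distances over column pairs
dm <- dist(rep(0, nrow(data)))
for (i in seq(1, ncol(data), 2)) {
  dm <- dm + dist(data[, i:(i + 1)])^2
}
dm <- sqrt(dm)

# Direct approach: Euclidean distance over all six columns at once
d_all <- dist(data)

# Compare the two results element by element
same <- isTRUE(all.equal(as.vector(dm), as.vector(d_all)))
same
```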
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of paladini
Sent: Wednesday, April 04, 2012 6:32 AM
To: r-help at r-project.org
Subject: [R] cluster analysis with pairwise data
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> [...]
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like that.

Hi.

The data as they are may be read into R as character data. The exact way
depends on the format of the data in the file. The result may look like
the following.

  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

If you want the distance between pairs to depend on the numbers
themselves (and not only on whether two pairs are equal or different),
then the data should be transformed to a numeric format. For example,
as follows

  trans <- function(x)
  {
      # drop the parentheses, split on the comma, convert to numbers
      y <- strsplit(gsub("[()]", "", x), ",")
      unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
  }

  DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
  DF

    Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
  1      1      2      1      5      4      2
  2      7      8      3     88      6      5
  3      4      7     12      4      4      4

Then, see library(help=cluster).

Hope this helps.

Petr Savicky.
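
As a possible next step, here is a minimal sketch of clustering the numeric data frame with base R. It assumes Euclidean distance and average linkage, and it cuts the tree into two groups; none of these choices come from the thread, and k = 2 is arbitrary for three rows. The names d, hc and groups are illustrative.

```r
# Pairs stored as character strings, one vector per variable
Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")

# Strip the parentheses, split on the comma, and return a
# two-column numeric matrix with one row per pair
trans <- function(x) {
  y <- strsplit(gsub("[()]", "", x), ",")
  unname(t(vapply(y, FUN = as.numeric, FUN.VALUE = c(0, 0))))
}

DF <- data.frame(Var1 = trans(Var1), Var2 = trans(Var2), Var3 = trans(Var3))

d <- dist(DF)                        # Euclidean distances between rows
hc <- hclust(d, method = "average")  # agglomerative hierarchical clustering
groups <- cutree(hc, k = 2)          # assumed number of groups
groups
```

The dist object can also be handed to functions in the cluster package, such as pam() or agnes(). Because a single large value (here 88) can dominate a Euclidean distance, it may be worth standardizing the columns, for example with scale(), before calling dist().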