thr3ads.net - R help - [R] cluster analysis with pairwise data [Apr 2012]

paladini

2012-Apr-04 11:32 UTC

[R] cluster analysis with pairwise data

Hello,
I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; the entries are pairs of
values.
It looks like this:


Variable 1:    Variable 2:     Variable 3:  .    .    .
(1,2)          (1,5)           (4,2)
(7,8)          (3,88)          (6,5)
(4,7)          (12,4)          (4,4)
.               .              .
.               .              .
.               .              .
Is it possible to perform a cluster analysis with this kind of data in
R?
I don't even know how to get this data into a matrix or a data frame or
anything like that.

It would be really nice if somebody could help me.

Best regards and happy Easter

Claudia

David L Carlson

2012-Apr-04 15:59 UTC

[R] cluster analysis with pairwise data

You can create distance matrices for each Variable, square them, sum them,
and take the square root. As for getting the data into a data frame, the
simplest would be to enter the three variables into six columns like the
following:

data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5    4    2
[2,]    7    8    3   88    6    5
[3,]    4    7   12    4    4    4

Then use dist() on each pair of columns:

1:2, 3:4, 5:6 . . .

e.g. for the 3 rows of data you provided

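# Start from an all-zero "dist" object with one entry per pair of rows
dm <- dist(rep(0, nrow(data)))
# Add the squared Euclidean distances from each pair of columns
for(i in seq(1, ncol(data), 2)) {
  dm <- dm + dist(data[,i:(i+1)])^2
}
dm <- sqrt(dm)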
dm
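
As a cross-check, here is a minimal sketch of the same computation on the three example rows. It relies on the fact that summing squared Euclidean distances over the column pairs and taking the square root gives the ordinary Euclidean distance over all six columns, so the result should agree with dist(data) up to floating-point tolerance. The names pair_starts, dm_pairs, dm_all and hc, and the choice of average linkage, are assumptions made for illustration only.

```r
# Example matrix from the thread: three (x, y) pairs per row, six columns.
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# Squared Euclidean distance for each pair of columns, summed, then rooted.
pair_starts <- seq(1, ncol(data), by = 2)
sq <- Reduce(`+`, lapply(pair_starts,
                         function(i) dist(data[, i:(i + 1)])^2))
dm_pairs <- sqrt(sq)

# Plain Euclidean distance over all six columns at once.
dm_all <- dist(data)

# Compare the numeric values only (the "call" attributes differ).
all.equal(as.vector(dm_pairs), as.vector(dm_all))

# The combined distances can be passed to a clustering routine;
# average linkage is an arbitrary choice here.
hc <- hclust(dm_pairs, method = "average")
```

Building the distances pair by pair becomes necessary only if a different metric or a weight is wanted for individual variables; otherwise dist(data) on the six-column matrix is the shorter route.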

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352




Petr Savicky

2012-Apr-04 16:12 UTC

[R] cluster analysis with pairwise data

Hi.

The data as they are may be read into R as character data. The
exact way depends on the format of the data in the file. The
result may look like the following.

  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
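
For instance, if the file happened to be whitespace-separated with a header line (an assumption; the actual format was not stated), a sketch along the following lines might produce such a data frame. The inline text stands in for a hypothetical file, and DF_raw is just an illustrative name.

```r
# Hypothetical file contents, supplied inline through the `text` argument.
txt <- "Var1 Var2 Var3
(1,2) (1,5) (4,2)
(7,8) (3,88) (6,5)
(4,7) (12,4) (4,4)"

# Each "(a,b)" token contains no whitespace, so it is read as one field.
DF_raw <- read.table(text = txt, header = TRUE,
                     colClasses = "character")
str(DF_raw)
```

With a real file, the text argument would be replaced by the file name, and sep would need adjusting if the columns are delimited differently.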

If you want to use a distance between pairs that depends on the
numbers (and not only on whether two pairs are equal or different),
then the data should be transformed to a numeric format, for example
as follows.

  trans <- function(x)
  {
      y <- strsplit(gsub("[()]", "", x), ",")
      unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
  }

  DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
  DF

    Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
  1      1      2      1      5      4      2
  2      7      8      3     88      6      5
  3      4      7     12      4      4      4

Then, see library(help=cluster).
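
As a rough sketch of what that could look like, the following applies two functions from the cluster package, agnes() and pam(), to the numeric example data. The data frame is rebuilt here so the snippet runs on its own; the choice of two clusters and of average linkage is an assumption for illustration, not a recommendation.

```r
library(cluster)

# Numeric form of the three example rows, one column per coordinate.
DF <- data.frame(Var1.1 = c(1, 7, 4),  Var1.2 = c(2, 8, 7),
                 Var2.1 = c(1, 3, 12), Var2.2 = c(5, 88, 4),
                 Var3.1 = c(4, 6, 4),  Var3.2 = c(2, 5, 4))

# Agglomerative hierarchical clustering on Euclidean distances.
ag <- agnes(DF, metric = "euclidean", method = "average")
groups <- cutree(as.hclust(ag), k = 2)
groups

# Partitioning around medoids with the same (arbitrary) number of clusters.
pm <- pam(DF, k = 2)
pm$clustering
```

Both functions also accept stand = TRUE to standardize the columns first, which may matter when one coordinate (such as the 88 above) is on a much larger scale than the others.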

Hope this helps.

Petr Savicky.
