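Do not use HTML in r-help emails. Look below at what happened to your data.
The error message is telling you that t(data) is not numeric. A single
character or factor column in a data frame is enough to make t(data) a
character matrix. Try
> str(data)
That will tell you what kind of data you have.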
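
Since I cannot see your real data, here is a minimal sketch of what I suspect is going on. The data frame below is made up (the column name "case" and all the values are my assumptions, not your data); it only shows how one non-numeric column produces a non-numeric t(data), and one way to fix that using base R alone. The last step uses dist() with method = "binary", which as I understand it is a Jaccard-type distance for 0/1 data; check that against vegdist() yourself.

```r
# Hypothetical stand-in for the poster's data: the case labels ended up
# in an ordinary column instead of in the row names.
data <- data.frame(
  case      = c("case1", "case2", "case3", "case4"),
  variable1 = c(0, 0, 0, 1),
  variable2 = c(0, 1, 0, 0),
  variable3 = c(1, 1, 1, 1),
  variable4 = c(0, NA, 0, 1),
  stringsAsFactors = FALSE
)

str(data)             # shows which columns are chr and which are num
is.numeric(t(data))   # FALSE: the chr column turns t(data) into a character matrix

# Move the labels into the row names and keep only the 0/1 columns.
num <- as.matrix(data[, -1])
rownames(num) <- data$case
is.numeric(t(num))    # TRUE

# Distances between variables (the rows of t(num)) with base R.
d <- dist(t(num), method = "binary")
d
```

If str() on your own data shows factor or chr columns, find out why they were read that way (labels in a column, stray text, a non-standard missing value code) before converting anything.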
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of marco milella
> Sent: Thursday, December 06, 2012 12:08 PM
> To: r-help at r-project.org
> Subject: [R] clustering of binary data
>
> Good morning,
> I am analyzing a dataset composed of 364 subjects and 13 binary
> variables (0,1 = absence,presence).
> I am testing possible association (co-presence) of my variables. To do
> this, I was trying cluster analysis.
>
> My main interest is to check the significance of the obtained clusters.
>
> First, I tried the pvclust() function, using method.hclust="ward"
> and method.dist="binary". Altogether it works (clusters and
> significance obtained). However, I'm not convinced by the distance
> matrix. The associations between variables differ from the results
> obtained in PAST using Ward on a Jaccard matrix (which should be OK
> for binary data). Moreover, when I try to obtain a Jaccard matrix in
> R from my data, using the vegan package
>
> mydistance<-vegdist(t(data),method="jaccard")
>
> I receive the following error message:
>
> Error in rowSums(x, na.rm = TRUE) : 'x' must be numeric
>
>
> Below is a subset of my dataset:
>
> variable1 variable2 variable3 variable4 variable5 variable6
> variable7
> variable8 variable9 variable10 variable11 variable12 variable13 case1
> 0 0 0
> 0 0 1 0 0 1 1 0 0 0 case2 0 0 0 0 0 1 0 NA NA 1 0 0 0 case3 0 0 0 0 0
> 1 0
> 0 1 1 0 0 0 case4 1 0 0 0 0 1 0 1 0 1 0 0 0 case5 0 0 0 0 0 1 0 0 1 1
> 0 0
> 0 case6 0 1 0 0 0 1 0 1 0 1 0 0 0 case7 0 1 0 0 0 1 0 0 1 1 0 0 0
> case8 0
> 0 0 0 0 1 0 1 0 1 0 0 0 case9 0 0 0 0 0 1 0 1 0 1 0 0 0 case10 0 0 0
> 0 0 1
> 0 0 1 1 0 0 0 case11 1 0 0 1 0 1 1 1 0 1 0 0 0 case12 0 0 0 1 1 0 1 1
> 0 1
> 0 0 0 .....
>
> So, my questions are the following: Is the Jaccard index a good
> strategy for my kind of data? Is the binary distance used in pvclust
> theoretically more correct? Is there any alternative to pvclust for
> testing the significance of my clusters?
>
> Thanks in advance
> Marco
>
> [[alternative HTML version deleted]]