BD
2010-Sep-22 20:41 UTC
[R] How to Ignore NaN values in Rows when using hclust function in making Heatmap??
I am making heatmaps for a dataset (~300*600 matrix) with the following R script (I am not familiar with R, and this is the first time I am using it).

  library("gplots")
  library("Cairo")
  mydata <- read.csv(file="data.csv", header=TRUE, sep=",")
  rownames(mydata)=mydata$Name
  mydata <- mydata[,2:297]
  mydatamatrix <- data.matrix(mydata)
  mydatascale <- t(scale(t(mydatamatrix)))
  hr <- hclust(as.dist(1-cor(t(mydatascale), method="pearson")), method="complete")
  hc <- hclust(as.dist(1-cor(mydatascale, method="spearman")), method="complete")
  myclhr <- cutree(hr, h=max(hr$height)/2); mycolhr <- sample(rainbow(256));
  myclhc <- cutree(hc, h=max(hc$height)/2); mycolhc <- sample(rainbow(256));
  mycolhr <- mycolhr[as.vector(myclhr)]; mycolhc <- mycolhc[as.vector(myclhc)];
  jpeg("scaleRow.jpg", height=6+2/3, width=6+2/3, units="in", res=1200)
  heatmap.2(mydatamatrix, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc),
            dendrogram="both", scale="row", col=rev(heat.colors(10)),
            cexRow=0.08, cexCol=0.08, trace="none", density.info="none",
            symkey= FALSE, key=TRUE, keysize=1.5, margin=c(5,8),
            RowSideColor=mycolhr, ColSideColor=mycolhc)
  dev.off()

My question: the dataset has a good number of rows (~17) whose value is zero in every column. When I run these two command lines,

  hr <- hclust(as.dist(1-cor(t(mydatascale), method="pearson")), method="complete")
  hc <- hclust(as.dist(1-cor(mydatascale, method="spearman")), method="complete")

I get the error message:

  error in hclust (as.dist(1-cor(t(mydatascale), method="pearson")), : NA/NaN/Inf in foreign function call (arg 11).

It seems to be a problem when NaN values exist in all columns for a given row, because the script runs fine when I delete those rows. I don't know how to work around this error. I have to include these rows and also cluster them in the heatmap. Is there a way to do this? Please help me!

In addition, my dataset has duplicate row names. I want to keep it that way, but every time I run the script I get a warning message about not having unique row names. Is there a way to ignore this message and keep my original row names instead of renaming them?

Thanks!!

--
View this message in context: http://r.789695.n4.nabble.com/How-to-Ignore-NaN-values-in-Rows-when-using-hclust-function-in-making-Heatmap-tp2551032p2551032.html
Sent from the R help mailing list archive at Nabble.com.
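
The failure described in this post can be reproduced on toy data, which may help show where the NaN values come from. The sketch below is not the poster's data or a confirmed fix: the matrix `m` and all variable names are made up for illustration. It assumes that an all-zero row should sit at distance 1 from every other row (that is, its undefined correlation is treated as 0), which is an arbitrary choice. For the column clustering it uses the `use = "pairwise.complete.obs"` argument of `cor()` so that the NaN rows are skipped instead of turning every column correlation into NA.

```r
# Toy matrix (hypothetical): row "b" is all zeros and the name "a" is repeated.
m <- matrix(c(1, 2, 3, 4,
              0, 0, 0, 0,
              4, 3, 1, 2,
              2, 5, 1, 7),
            nrow = 4, byrow = TRUE)
rownames(m) <- c("a", "b", "a", "c")  # a matrix accepts duplicate row names
colnames(m) <- paste0("s", 1:4)

# Row-wise scaling divides each row by its standard deviation.
# A constant row has sd 0, so every entry of that row becomes NaN.
scaled <- t(scale(t(m)))

# Row correlations: every pair that involves row "b" is NA.
row_cor <- cor(t(scaled), method = "pearson")

# Assumption for this sketch: an undefined correlation counts as 0,
# so the corresponding distance is 1.
row_dist <- 1 - row_cor
row_dist[is.na(row_dist)] <- 1
diag(row_dist) <- 0
hr <- hclust(as.dist(row_dist), method = "complete")

# Column correlations: each column contains the NaN from row "b", so the
# default use = "everything" would make the whole matrix NA. Pairwise
# deletion computes each correlation from the remaining rows only.
col_cor <- cor(scaled, method = "spearman", use = "pairwise.complete.obs")
hc <- hclust(as.dist(1 - col_cor), method = "complete")
```

On the row-name question: data frames require unique row names, but matrices do not, as the `rownames(m)` assignment above shows. One option, under that assumption, is to leave the data frame's row names alone and assign the `Name` column as row names only after converting to a matrix.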