Tim Richter-Heitmann
2014-Sep-04 13:23 UTC
[R] gplot heatmaps: clustering according to rowsidecolors + key.xtickfun
Hi there, I have two questions about heatmap.2 in gplot. My input is a simple square matrix with numeric values between 75 and 100 (it is a similarity matrix based on bacterial DNA sequences). 1. I can sort my input matrix into categories with rowsidecolors (in this case, very conveniently by bacterial taxa). I do a clustering and reordering of my matrix by Rowv=TRUE (and Colv="Rowv"). The question is now, can i combine the two features that the clustering/reordering is done only for submatrices defined by the vectors given in rowsidecolors (so, in this case, that the original ordering by bacterial taxa is preserved)? That would be very amazing. 2. I have set my own coloring rules with: mypal <- c("grey","blue", "green","yellow","orange","red") col_breaks = c(seq(0,74.9), seq(75.0,78.4), seq(78.5,81.9), seq(82.0,86.4), seq(86.5, 94.5), seq(94.5,100.0)) Is it possible to pass this sequential ordering to key.xtickfun? May i ask for an example code? Thank you very much! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Stra?e (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069
Gregory R. Warnes
2014-Sep-17 21:12 UTC
[R] gplot heatmaps: clustering according to rowsidecolors + key.xtickfun
Hello Tim, Sorry about the slow response, I just found this message. On Sep 4, 2014, at 9:23 AM, Tim Richter-Heitmann <trichter at uni-bremen.de> wrote:> Hi there, > > I have two questions about heatmap.2 in gplot. > My input is a simple square matrix with numeric values between 75 and 100 (it is a similarity matrix based on bacterial DNA sequences). > > 1. I can sort my input matrix into categories with rowsidecolors (in this case, very conveniently by bacterial taxa). I do a clustering and reordering of my matrix by Rowv=TRUE (and Colv="Rowv"). > The question is now, can i combine the two features that the clustering/reordering is done only for submatrices defined by the vectors given in rowsidecolors (so, in this case, that the original ordering by bacterial taxa is preserved)? > > That would be very amazing. >Hmm. To get the individual species clustered within taxa would require doing the hierarchical clustering first separately, then combining the dendrograms. This should do the trick: set.seed(1234567) ## Dummy Distances x <- matrix( rnorm(400, mean=87.5, sd=12.5/4), ncol=20) ## Dummy Taxa taxa <- sample(letters[1:4], 20, replace=T) taxa <- as.factor(taxa) # sort the data by taxa ord <- order(taxa) x <- x[ord, ord] taxa <- taxa[ord] rownames(x) <- 1:nrow(x) #### # stats:::merge.dendrogram is broken. This is the corrected version. # See R BUG 15648 # (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15648) for # details #### merge.dendrogram <- function(x, y, ..., height) { stopifnot(inherits(x,"dendrogram"), inherits(y,"dendrogram")) ### FIX inx.add <- function(inx, add) { if(is.leaf(inx)) { inx <- inx + add } return(inx) } y <- dendrapply(y, inx.add, add=max(unlist(x))) ### FIX r <- list(x,y) if(length(xtr <- list(...))) { if(!all(is.d <- vapply(xtr, inherits, NA, what="dendrogram"))) { xpr <- substitute(c(...)) nms <- sapply(xpr[-1][!is.d], deparse, nlines = 1L) ## do not simplify: xgettext needs this form msg <- ngettext(length(nms), "extra argument %s is not of class \"%s\"", "extra arguments %s are not of class \"%s\"s") stop(sprintf(msg, paste(nms, collapse=", "), "dendrogram"), domain = NA) } ## <GRW> for(i in 1:length(xtr)) { add <- max(c(unlist(r), unlist(xtr))) print(add) xtr[[i]] <- dendrapply(xtr[[i]], inx.add, add=add) } ## </GRW> r <- c(r, xtr) } attr(r, "members") <- sum(vapply(r, attr, 0L, which="members")) h.max <- max(vapply(r, attr, 0., which="height")) if(missing(height) || is.null(height)) height <- 1.1 * h.max else if(height < h.max) { msg <- gettextf("'height' must be at least %g, the maximal height of its components", h.max) stop(msg, domain = NA) } attr(r, "height") <- height class(r) <- "dendrogram" midcache.dendrogram(r, quiet=TRUE) } ## Compute dendrograms within each taxum, then merge into a combined dendrogram dendList <- list() for( taxon in levels(taxa) ) { items <- which(taxon==taxa) submatrix <- x[ items, items] dend <- as.dendrogram(hclust(dist(submatrix))) dendList[[taxon]] <- dend } names(dendList) <- NULL dends <- do.call("merge.dendrogram", dendList) ## Now generate the heatmap heatmap.2(x, Rowv=dends, Colv=dends, symm=TRUE, RowSideColors=c("red","blue","green","black")[as.numeric(taxa)], ColSideColors=c("red","blue","green","black")[as.numeric(taxa)], trace="none" )> 2. I have set my own coloring rules with: > > mypal <- c("grey","blue", "green","yellow","orange","red") > col_breaks = c(seq(0,74.9), seq(75.0,78.4), seq(78.5,81.9), seq(82.0,86.4), seq(86.5, 94.5), seq(94.5,100.0)) > > Is it possible to pass this sequential ordering to key.xtickfun? May i ask for an example code?Use the ?breaks? and ?col? arguements: ## Custom color key mypal <- c("grey","blue", "green","yellow","orange","red") col_breaks <- c(0,75.0,78.5,82.0,86.5,94.5,100.0) heatmap.2(x, Rowv=dends, Colv=dends, symm=TRUE, RowSideColors=c("red","blue","green","black")[as.numeric(taxa)], ColSideColors=c("red","blue","green","black")[as.numeric(taxa)], trace="none", breaks=col_breaks, col=mypal ) -Greg [[alternative HTML version deleted]]