On 11/02/2013 10:35 AM, Zhao Jin wrote:
> Dear all,
>
> I am trying to make a series of waffle plot-like figures for my data to
> visualize the ratios of amino acid residues at each position. For each one
> of 37 positions, there may be one to four different amino acid residues. So
> the data consist of the positions, what residues are there, and the ratios
> of residues. The ratios of residues at a position add up to 100, or close
> to 100 (more on this soon)*. I am hoping to make a *square* waffle
> plot-like figure for each position, and fill the 10 X 10 grids with colors
> representing each amino acid residue and areas for grids of a certain color
> corresponding to the ratio of that residue. Then I could line up all the
> plots in one row from position 1 to position 37.
> *: if the sum of the ratios is less than 100 at a position, that's because
> of an unknown residue which I did not include in the table.
>
> I am attaching the dput output for my data here:
> structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
> 8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
> 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
> 26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
> 36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
> 7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
> 14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
> 15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
> 12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
> "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
> "Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
> 100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
> 1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
> 98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
> 100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names = c("position",
> "residue", "ratio"), class = "data.frame", row.names = c("1",
> "2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
> "17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
> "32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
> "45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
> "58", "59", "60", "61", "62", "63", "64", "65"))
>
> Inspired by a Stack Exchange post, I am using this script to make the plots:
> library(ggplot2)
> col4=c('#E66101','#FDB863','#B2ABD2','#5E3C99')
> dflist=list()
> for (i in 1:37){
> residue_num=length(which(df$position==i))
> dflist[[i]]=df[df$position==i,2:3]
> waffle=expand.grid(y=1:residue_num,x=seq_len(ceiling(sum(dflist[[i]]$ratio)/residue_num)))
> residuevec=rep(dflist[[i]]$residue,dflist[[i]]$ratio)
> waffle$residue=c(as.vector(residuevec),rep(NA,nrow(waffle)-length(residuevec)))
> png(paste('plot',i,'.png',sep=''))
> print(ggplot(waffle, aes(x = x, y = y, fill = residue)) + geom_tile(color = "white") +
> scale_fill_manual("residue",values = col4) + coord_equal() +
> theme(panel.grid.minor=element_blank(),panel.grid.major=element_blank()) +
> theme(axis.ticks=element_blank()) +
> theme(axis.text.x=element_blank(),axis.text.y=element_blank()) +
> theme(axis.title.x=element_blank(),axis.title.y=element_blank())
> )
> dev.off()}
>
> With my scripts, I could make a waffle plot, but not a *square* 10 X 10
> waffle plot. Also, the grid size differs for positions with different
> numbers of residues. I am suspecting that I didn't use coord_equal()
> correctly.
>
> So I wonder how I can make the plots like I described above in ggplot2 or
> with some other packages. Also, is there a way to assign a color to
> different residues, say, purple for alanine, blue for glycine, etc, and
> incorporate that information in the for loop?
>
Hi Zhao,
By beginning with a 10x10 matrix of NA values and then replacing some of
them with a color, I think you can do what you want. First you need a
function to fill one corner of your matrix with values, leaving the rest
uncolored (i.e. NA):
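fill.corner<-function(x,nrow,ncol) {
 xlen<-length(x)
 # the values must fit into the nrow by ncol matrix (exactly full is fine)
 if(nrow*ncol >= xlen) {
  newmat<-matrix(NA,nrow=nrow,ncol=ncol)
  # side of the smallest square that holds all of the values
  xside<-1
  while(xside*xside < xlen) xside<-xside+1
  row<-1
  col<-1
  # fill column by column, xside cells per column
  for(xindex in seq_len(xlen)) {
   newmat[row,col]<-x[xindex]
   if(row == xside) {
    col<-col+1
    row<-1
   }
   else row<-row+1
  }
  return(newmat)
 }
 cat("Too many values in x for",nrow,"by",ncol,"\n")
 # return NULL so that the caller can skip this plot
 return(NULL)
}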
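To see what the corner fill does, here is a minimal sketch of a vectorized variant. The name fill_corner_vec is hypothetical (it is not part of plotrix), and the sketch assumes a square grid such as 10 by 10. It places the values column by column inside the smallest square that holds them, which is the same layout as the loop version above.

```r
# Hypothetical vectorized variant of the corner fill (illustrative name).
# Assumes a square grid, e.g. nrow = ncol = 10.
fill_corner_vec <- function(x, nrow, ncol) {
  xlen <- length(x)
  if (xlen > nrow * ncol) {
    warning("Too many values in x for ", nrow, " by ", ncol)
    return(NULL)
  }
  newmat <- matrix(NA, nrow = nrow, ncol = ncol)
  if (xlen == 0) return(newmat)
  # side of the smallest square that holds all of the values
  xside <- ceiling(sqrt(xlen))
  # zero-based position of each value
  idx <- seq_len(xlen) - 1
  # column-major fill restricted to the first xside rows
  newmat[cbind(idx %% xside + 1, idx %/% xside + 1)] <- x
  newmat
}

# five cells in a 4 by 4 grid: three go in column 1, two in column 2
m <- fill_corner_vec(rep("red", 5), 4, 4)
print(m)
```

Cells that stay NA are simply left uncolored when the matrix is passed on as cell colors.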
Then you have to split your data frame (called zjdat in the code below) into
37 smaller data frames, one per position, and create matrices with the values
and colors to display on your 37 waffle plots:
library(plotrix)
# get an "alphabet" of colors, one for each of the 18 residue levels
alphacol<-rainbow(18)
# the actual values in the plotted matrix don't matter
fakemat<-matrix(1:100,nrow=10)
# pick off the positions one by one
for(pos in 1:37) {
 posdf<-zjdat[zjdat$position == pos,]
 # repeat the color of each residue as many times as its ratio
 rescol<-rep(alphacol[as.numeric(posdf$residue)],posdf$ratio)
 # fill.corner returns NULL if the colors do not fit, so skip that plot
 if(!is.null(resmat<-fill.corner(rescol,10,10)))
  color2D.matplot(fakemat,border="lightgray",cellcolors=resmat,
   yrev=FALSE,main=c(pos,length(resmat)))
}
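On the question of assigning a fixed color to each residue, one possible approach is a named vector of colors indexed by the residue letter. This is only a sketch: the name rescolors and the particular colors (purple for alanine, blue for glycine) are illustrative assumptions, and the small data frame stands in for position 9 of your data.

```r
# the 18 residue levels from the dput output
residues <- c("A", "C", "D", "E", "G", "H", "I", "K", "L",
              "M", "N", "P", "Q", "R", "S", "T", "V", "Y")
# hypothetical palette: start from rainbow() and override selected residues
rescolors <- setNames(rainbow(length(residues)), residues)
rescolors["A"] <- "purple"  # alanine
rescolors["G"] <- "blue"    # glycine

# stand-in for position 9 of the data: I = 10, L = 87, P = 3
posdf <- data.frame(residue = factor(c("I", "L", "P"), levels = residues),
                    ratio = c(10L, 87L, 3L))
# look the colors up by residue letter, then repeat each by its ratio
rescol <- rep(unname(rescolors[as.character(posdf$residue)]), posdf$ratio)
print(table(rescol))
```

Because the lookup is by name rather than by factor level number, a residue keeps the same color at every position, and the resulting vector can be handed to the corner-filling function in place of the rainbow colors.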
That might get you started. In fact, I might even write a waffle plot
function for plotrix.
Jim