Hello,

I have a question about how to plot a series of data. The following is my data matrix n:

    > n
                25p    5p  2.5p 0.5p
    16B-E06.g 45379  4383  5123   45
    16B-E06.g 45138  4028  6249   52
    16B-E06.g 48457  4267  5470   54
    16B-E06.g 47740  4676  6769   48
    37B-B02.g 42860  6152 19276   72
    35B-A02.g 48325 12863 38274  143
    35B-A02.g 48410 12806 39013  175
    35B-A02.g 48417  9057 40923  176
    35B-A02.g 51403 13865 43338  161
    45B-C12.g 50939  3656  5783   43
    45B-C12.g 52356  5524  6041   55
    45B-C12.g 49338  5141  5266   41
    45B-C12.g 51567  3915  5677   43
    35A-G04.g 40365  5513  6971   32
    35B-D01.g 54217 12607 13067   93
    35B-D01.g 55283 11441 14964  101
    35B-D01.g 55041  9626 14928   94
    35B-D01.g 54058  9465 14912   88
    35B-A04.g 42745 12080 34271  105
    35B-A04.g 41055 12423 34874  126

colnames(n) holds the concentrations, rownames(n) holds the gene IDs, and the rest is intensity. I want to plot the data this way:

x-axis: colnames(n), in the order 0.5p, 2.5p, 5p, 25p.
y-axis: intensity.

Inside the plot are the points of intensity over the 4 concentrations; points from different genes have a different color or shape. A regression line for each gene crosses the different concentrations, and at the end of each line is the gene ID.

Thanks,

Tiandao
If I've correctly interpreted what you want, you first need to get the x values:

    x <- colnames(n)
    x <- as.numeric(substr(x, 1, nchar(x) - 1))

Then it seems fairly easy to use matplot to plot the values, with the color changing from one row of n to the next (matplot recycles its default colors):

    dim(x) <- c(length(x), 1)
    matplot(x, t(n), pch = 1)

But it does not look as though a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene).

To do the regressions, plot the lines, and label them with the gene ID, see

    ?lm
    ?predict
    ?abline
    ?text

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
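For anyone following the pointers in the reply above (?lm, ?predict, ?text), here is a rough base-graphics sketch of one way the pieces could fit together. It rests on assumptions that are not in the thread: one straight line per gene fitted on a log10-log10 scale, with that gene's replicate rows pooled. The helper fit_gene and the variables ids, gene_idx and ends are made up for illustration. It draws each line with lines() on predicted endpoints rather than abline(), so that the line stops where its label starts.

```r
# Intensity matrix from the original post, entered column by column.
n <- cbind(
  "25p"  = c(45379, 45138, 48457, 47740, 42860, 48325, 48410, 48417, 51403,
             50939, 52356, 49338, 51567, 40365, 54217, 55283, 55041, 54058,
             42745, 41055),
  "5p"   = c(4383, 4028, 4267, 4676, 6152, 12863, 12806, 9057, 13865, 3656,
             5524, 5141, 3915, 5513, 12607, 11441, 9626, 9465, 12080, 12423),
  "2.5p" = c(5123, 6249, 5470, 6769, 19276, 38274, 39013, 40923, 43338, 5783,
             6041, 5266, 5677, 6971, 13067, 14964, 14928, 14912, 34271, 34874),
  "0.5p" = c(45, 52, 54, 48, 72, 143, 175, 176, 161, 43, 55, 41, 43, 32, 93,
             101, 94, 88, 105, 126)
)
rownames(n) <- c(rep("16B-E06.g", 4), "37B-B02.g", rep("35B-A02.g", 4),
                 rep("45B-C12.g", 4), "35A-G04.g", rep("35B-D01.g", 4),
                 rep("35B-A04.g", 2))

# Numeric concentrations from the column names, e.g. "2.5p" -> 2.5.
x <- as.numeric(sub("p$", "", colnames(n)))

ids <- unique(rownames(n))
gene_idx <- match(rownames(n), ids)  # colour/symbol index for each row of n

# ASSUMPTION: one log10-log10 line per gene, pooling its replicate rows.
fit_gene <- function(id) {
  rows <- n[rownames(n) == id, , drop = FALSE]
  d <- data.frame(conc = rep(x, times = nrow(rows)),
                  intensity = as.vector(t(rows)))
  lm(log10(intensity) ~ log10(conc), data = d)
}
fits <- lapply(ids, fit_gene)
names(fits) <- ids

# Points: one column of t(n) per row of n, coloured and shaped by gene.
matplot(log10(x), log10(t(n)), type = "p", pch = gene_idx, col = gene_idx,
        xlim = c(log10(min(x)), log10(max(x)) + 0.6),  # room for the labels
        xlab = "log10(concentration)", ylab = "log10(intensity)")

# Lines and labels: draw each fit over the observed range, label the right end.
ends <- data.frame(conc = range(x))
for (i in seq_along(ids)) {
  y_hat <- predict(fits[[i]], newdata = ends)
  lines(log10(ends$conc), y_hat, col = i)
  text(log10(max(x)), y_hat[2], labels = ids[i], pos = 4, col = i, cex = 0.7)
}
```

Labels will overlap wherever the fitted lines end close together, so the text positions may need adjusting by hand. Whether a log-log straight line is an appropriate model is exactly the decision the reply leaves to the poster.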
On 10/2/07, Tiandao Li <Tiandao.Li at usm.edu> wrote:
> Inside of plot is the points of intensity over 4 concentrations, points
> from different genes have different color or shape. A regression line of
> each gene crosses different concentrations, and at the end of line is gene
> IDs.

I might do it something like this:

    df <- structure(list(
      gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L,
                         2L, 5L, 5L, 5L, 5L, 4L, 4L),
        .Label = c("16B-E06.g", "35A-G04.g", "35B-A02.g", "35B-A04.g",
                   "35B-D01.g", "37B-B02.g", "45B-C12.g"),
        class = "factor"),
      X25p = c(45379L, 45138L, 48457L, 47740L, 42860L, 48325L, 48410L,
               48417L, 51403L, 50939L, 52356L, 49338L, 51567L, 40365L,
               54217L, 55283L, 55041L, 54058L, 42745L, 41055L),
      X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L, 9057L,
              13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L,
              9626L, 9465L, 12080L, 12423L),
      X2.5p = c(5123L, 6249L, 5470L, 6769L, 19276L, 38274L, 39013L, 40923L,
                43338L, 5783L, 6041L, 5266L, 5677L, 6971L, 13067L, 14964L,
                14928L, 14912L, 34271L, 34874L),
      X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L, 161L, 43L, 55L,
                41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L)),
      .Names = c("gene", "X25p", "X5p", "X2.5p", "X0.5p"),
      class = "data.frame", row.names = c(NA, -20L))

    library(reshape)
    library(ggplot2)

    # Reshape to long format: one row per gene/concentration measurement
    dfm <- melt(df, id = 1)
    names(dfm) <- c("gene", "conc", "intensity")
    # "X2.5p" -> 2.5
    dfm$conc <- as.numeric(gsub("[Xp]", "", as.character(dfm$conc)))

    qplot(conc, intensity, data = dfm, colour = gene, log = "xy") +
      geom_smooth(method = lm)

Note that I've converted the concentrations to numeric values and plotted them on a log scale. If you want to treat concentration as a factor, then you'll need the following code:

    dfm$conc <- factor(dfm$conc)
    qplot(conc, intensity, data = dfm, colour = gene, group = gene, log = "y") +
      geom_smooth(method = lm, xseq = levels(dfm$conc))

But in that case, fitting a linear model seems a bit dubious.

Note that you can also use this format of data with lattice:

    library(lattice)
    xyplot(intensity ~ conc, data = dfm, type = c("p", "r"), group = gene,
           auto.key = TRUE)

Hadley

-- 
http://had.co.nz/
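To make the reshaping step above more concrete, here is a small base-R sketch of roughly what the melt() call and the gsub() conversion do, for readers without the reshape package. It is not Hadley's code: the names wide, long and conc_cols are made up for illustration, and the data are only three rows taken from the posted matrix (the first row of three of the genes).

```r
# Three rows from the posted data, in the wide layout used by df.
wide <- data.frame(
  gene  = c("16B-E06.g", "37B-B02.g", "35B-A02.g"),
  X25p  = c(45379, 42860, 48325),
  X5p   = c(4383, 6152, 12863),
  X2.5p = c(5123, 19276, 38274),
  X0.5p = c(45, 72, 143)
)

conc_cols <- setdiff(names(wide), "gene")

# Stack the concentration columns: one row per (gene, concentration) pair,
# which is the long layout that the plotting calls expect.
long <- data.frame(
  gene      = rep(wide$gene, times = length(conc_cols)),
  conc      = rep(conc_cols, each = nrow(wide)),
  intensity = unlist(wide[conc_cols], use.names = FALSE)
)

# Strip the "X" prefix and the "p" suffix, then convert to numbers.
long$conc <- as.numeric(gsub("[Xp]", "", long$conc))

print(long[order(long$gene, long$conc), ])
```

With conc stored as a number the x-axis can be put on a log scale; turning it back into a factor, as in the second qplot call, gives four evenly spaced categories instead.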