Hi,

I have been trying unsuccessfully to plot data using different colors based on a variable within a subset of an imported file. The file I am reading is about 20000 lines long and has a column (in the example called FILE) that contains approximately 100 unique entries. I would like to plot a subset of the data from the file and key the color from the FILE column.

This is what my file looks like:

CHR  SNP         BP        NMISS  BETA      SE     R2         T         P       REGION  FILE     RANDOM
1    rs17035189  10519610  135    0.3518    1.928  0.0002501  0.1824    0.8555  TCTX    4730341  0.284627081
6    rs3763311   32484154  109    -2.05     1.624  0.01467    -1.262    0.2096  TCTX    670603   0.083147673
6    rs3892710   32790839  106    0.5695    4.743  0.0001386  0.1201    0.9047  TCTX    7150403  0.549192815
6    rs3864300   32379785  102    9.208     6.416  0.02018    1.435     0.1544  TCTX    7210017  0.837265988
6    rs6912002   32873245  13     -1.295    5.043  0.005963   -0.2569   0.802   TCTX    2710441  0.170566699
5    rs4024109   35955374  9      26.19     31.01  0.09245    0.8444    0.4263  TCTX    2650653  0.298573497
6    rs3129719   32769757  16     10.35     7.44   0.1215     1.391     0.1859  TCTX    2900504  0.378538235
6    rs476885    32402690  109    -0.09378  1.552  3.411e-05  -0.06041  0.9519  TCTX    670603   0.017970964
10   rs12570766  5602540   139    0.6182    6.66   6.289e-05  0.09283   0.9262  TCTX    4560767  0.004973939
etc

And this is the code that I have:

assoc_data <- read.table("master.out", header = TRUE)
par(fig = c(0, 10, 0, 10) / 10, mar = c(10, 8, 2, 8), xpd = NA, cex.axis = 2)
attach(assoc_data)
curr_assoc <- assoc_data[CHR == 1 & BP > 500000 & BP < 1000000, ]  # these criteria change based on input from another file

# count the number of transcripts
transcripts <- length(unique(curr_assoc$FILE))

# generate that number of unique "FILE" entries in my subset of data
my_colors <- rainbow(transcripts)

plot(curr_assoc$BP, log10(curr_assoc$P) * -1, pch = 20,
     col = c(my_colors)[curr_assoc$FILE], ylim = c(-15, 15), xaxs = "i", xlab = NA,
     cex = 0.7, cex.lab = 2)
detach(assoc_data)

The problem is that when I plot this I only see (for example) 2 colors instead of the expected 10.

I believe that the problem I am having is that the FILE column is being recoded when I read the table (as a factor?) and that only factors within the range of my colors are being plotted (so if I have 10 colors but there are 100 unique entries in my FILE column, and the variables recoded 2, 7, 12, 34, 60, 64, 65, 70 and 71 are used, only 2 and 7 will be plotted).

Many thanks for any suggestions/pointers. I have dug around in the help archives for a couple of hours but no joy.

-----------------------
Andrew Singleton
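The behaviour described in the question can be reproduced on a small made-up data frame. This is only a sketch, assuming FILE really does end up stored as a factor; the data and the names `toy`, `sub`, `bad_cols` and `good_cols` are hypothetical and not from the original post. It shows that a row subset keeps all of the original factor levels, and that re-creating the factor on the subset renumbers the codes so they fit the colour vector.

```r
# Hypothetical toy data: 6 FILE labels overall, stored as a factor.
toy <- data.frame(
  BP   = c(10, 20, 30, 40, 50, 60),
  P    = c(0.5, 0.01, 0.2, 0.03, 0.9, 0.001),
  FILE = factor(c("f01", "f02", "f03", "f04", "f05", "f06"))
)

# Subsetting rows keeps all 6 levels, even though only 3 remain in use.
sub <- toy[toy$BP > 30, ]
n_used <- length(unique(sub$FILE))
my_colors <- rainbow(n_used)

# Indexing by a factor uses its integer codes (4, 5, 6 here), which lie
# beyond the end of my_colors, so the looked-up colours are NA.
bad_cols <- my_colors[sub$FILE]

# Re-creating the factor on the subset drops unused levels, so the codes
# run from 1 to n_used and every point gets a colour.
good_cols <- my_colors[factor(sub$FILE)]

plot(sub$BP, -log10(sub$P), pch = 20, col = good_cols)
```

If this is the cause, the equivalent change in the posted code would be to index with `factor(curr_assoc$FILE)` instead of `curr_assoc$FILE`.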
Dear Andrew,

Have a look at ggplot2

library(ggplot2)
ggplot(curr_assoc, aes(x = BP, y = P, colour = FILE)) + geom_point() + scale_y_log10()

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Andrew Singleton
Sent: Monday 2 February 2009 15:56
To: r-help at r-project.org
Subject: [R] Defining plot colors based on a variable
On Mon, Feb 2, 2009 at 8:56 AM, Andrew Singleton <singleta at mail.nih.gov> wrote:
> plot(curr_assoc$BP, log10(curr_assoc$P)*-1, pch=20,
> col=c(my_colors)[curr_assoc$FILE], ylim=c(-15, 15),xaxs="i", xlab=NA,
> cex=0.7, cex.lab=2)
> detach(assoc_data)

You might find it easier to use ggplot2:

install.packages("ggplot2")
library(ggplot2)
qplot(BP, P, data = curr_assoc, colour = FILE, log = "y")

To ensure that you always have the same colours, you can set the limits for the colour scale (in an analogous way to setting the limits for the x axis):

qplot(BP, P, data = curr_assoc, colour = FILE, log = "y") +
  scale_colour_hue(limits = c(2, 7, 12, 34, 60, 64, 65, 70, 71))

Hadley

--
http://had.co.nz/
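The idea of keeping the same colour for the same FILE value across different subsets can also be sketched in base graphics, without ggplot2. This is an illustration under assumptions, not something proposed in the thread: the toy data and the names `toy`, `file_colors` and `cols_for` are hypothetical. The colour vector is built once from the full table and named by FILE value, and each subset looks its colours up by name rather than by position.

```r
# Hypothetical toy data standing in for the full table.
toy <- data.frame(
  CHR  = c(1, 1, 1, 6, 6, 6),
  BP   = c(10, 20, 30, 40, 50, 60),
  P    = c(0.5, 0.01, 0.2, 0.03, 0.9, 0.001),
  FILE = c(4730341, 670603, 7150403, 670603, 2710441, 4730341)
)

# One colour per FILE value, built once from the full table and named
# by the FILE value converted to a character string.
all_files <- sort(unique(toy$FILE))
file_colors <- setNames(rainbow(length(all_files)), as.character(all_files))

# Look colours up by name, so the mapping does not depend on which rows
# happen to be in the current subset.
cols_for <- function(d) unname(file_colors[as.character(d$FILE)])

chr1 <- toy[toy$CHR == 1, ]
chr6 <- toy[toy$CHR == 6, ]
plot(chr1$BP, -log10(chr1$P), pch = 20, col = cols_for(chr1))
```

Calling `cols_for(chr6)` for a second plot would give FILE 670603 and FILE 4730341 the same colours they have in the first one.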