thr3ads.net - R help - [R] Defining plot colors based on a variable [Feb 2009]


Andrew Singleton

2009-Feb-02 14:56 UTC

[R] Defining plot colors based on a variable

Hi, I have been trying unsuccessfully to plot data using different colors
based on a variable within a subset of an imported file. The file I am
reading is about 20000 lines long and has a column (called FILE in the
example) that contains approximately 100 unique entries. I would like to plot
a subset of the data from the file and key the color from the FILE column.
This is what my file looks like:
 
CHR  SNP         BP        NMISS  BETA      SE     R2         T         P       REGION  FILE     RANDOM
1    rs17035189  10519610  135    0.3518    1.928  0.0002501  0.1824    0.8555  TCTX    4730341  0.284627081
6    rs3763311   32484154  109    -2.05     1.624  0.01467    -1.262    0.2096  TCTX    670603   0.083147673
6    rs3892710   32790839  106    0.5695    4.743  0.0001386  0.1201    0.9047  TCTX    7150403  0.549192815
6    rs3864300   32379785  102    9.208     6.416  0.02018    1.435     0.1544  TCTX    7210017  0.837265988
6    rs6912002   32873245  13     -1.295    5.043  0.005963   -0.2569   0.802   TCTX    2710441  0.170566699
5    rs4024109   35955374  9      26.19     31.01  0.09245    0.8444    0.4263  TCTX    2650653  0.298573497
6    rs3129719   32769757  16     10.35     7.44   0.1215     1.391     0.1859  TCTX    2900504  0.378538235
6    rs476885    32402690  109    -0.09378  1.552  3.411e-05  -0.06041  0.9519  TCTX    670603   0.017970964
10   rs12570766  5602540   139    0.6182    6.66   6.289e-05  0.09283   0.9262  TCTX    4560767  0.004973939
etc


And this is the code that I have:

assoc_data <- read.table("master.out", header = TRUE)
par(fig = c(0, 10, 0, 10) / 10, mar = c(10, 8, 2, 8), xpd = NA, cex.axis = 2)
attach(assoc_data)
# these criteria change based on input from another file
curr_assoc <- assoc_data[CHR == 1 & BP > 500000 & BP < 1000000, ]

# count the number of transcripts
transcripts <- length(unique(curr_assoc$FILE))

# generate that number of unique "FILE" entries in my subset of data
my_colors <- rainbow(transcripts)

plot(curr_assoc$BP, log10(curr_assoc$P) * -1, pch = 20,
     col = c(my_colors)[curr_assoc$FILE], ylim = c(-15, 15), xaxs = "i",
     xlab = NA, cex = 0.7, cex.lab = 2)
detach(assoc_data)


The problem is that when I plot this I only see (for example) 2 colors
instead of the expected 10. I believe that the problem I am having is that
the FILE column is being recoded when I read the table (as a factor?) and
that only factors within the range of my colors are being plotted (so if I
have 10 colors but there are 100 unique entries in my FILE column, and the
variables recoded 2, 7, 12, 34, 60, 64, 65, 70 and 71 are used, only 2 and 7
will be plotted). 
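
The suspected indexing problem can be reproduced on a small made-up table.
The sketch below is only an illustration under stated assumptions: the data
values are invented (not taken from master.out) and FILE is assumed to be
stored as a factor. Subsetting keeps the level codes of the full table, so
some codes can exceed the length of my_colors, and those points get NA (no
color). Wrapping the column in factor() renumbers the codes for the subset,
which is one possible fix; the same call should also work if FILE was read
in as plain integers.

```r
# Toy data (hypothetical values, NOT taken from master.out)
assoc_data <- data.frame(
  BP   = c(510000, 620000, 730000, 840000, 950000, 2000000),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.0020, 0.5000),
  FILE = factor(c("670603", "2650653", "4730341",
                  "670603", "7210017", "2900504"))
)

curr_assoc <- assoc_data[assoc_data$BP > 500000 & assoc_data$BP < 1000000, ]

# The subset still carries every level of the full table
n_present <- length(unique(curr_assoc$FILE))  # values actually in the subset
n_levels  <- nlevels(curr_assoc$FILE)         # levels inherited from full data

my_colors <- rainbow(n_present)

# Indexing by the original factor uses the old codes; any code larger than
# length(my_colors) yields NA, so that point is not drawn in color
bad_cols <- my_colors[curr_assoc$FILE]

# Re-creating the factor on the subset renumbers the codes 1..n_present
good_cols <- my_colors[factor(curr_assoc$FILE)]

plot(curr_assoc$BP, -log10(curr_assoc$P), pch = 20, col = good_cols)
```

droplevels(curr_assoc) applied right after subsetting would be another way
to get the same renumbering.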

Many thanks for any suggestions/pointers. I have dug around in the help
archives for a couple of hours but no joy.
-----------------------
Andrew Singleton



ONKELINX, Thierry

2009-Feb-02 15:08 UTC

Dear Andrew,

Have a look at ggplot2

library(ggplot2)
ggplot(curr_assoc, aes(x = BP, y = P, colour = FILE)) + geom_point() +
scale_y_log10()
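
A possible variant of the call above, sketched on made-up data (the values
are hypothetical, and ggplot2 is assumed to be installed): plotting
-log10(P) directly matches the y values used in the base-graphics code,
and factor(FILE) asks for one discrete color per FILE value even if the
column was read in as numbers.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Toy data (hypothetical values, NOT taken from master.out)
curr_assoc <- data.frame(
  BP   = c(510000, 620000, 730000, 840000),
  P    = c(0.8555, 0.2096, 0.0020, 0.1544),
  FILE = c(4730341, 670603, 670603, 7210017)
)

# factor(FILE) gives a discrete color scale; -log10(P) mirrors the
# log10(P) * -1 transformation from the original plot() call
p <- ggplot(curr_assoc,
            aes(x = BP, y = -log10(P), colour = factor(FILE))) +
  geom_point() +
  labs(colour = "FILE")

print(p)
```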

HTH,

Thierry 


----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology
and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more than
asking him to perform a post-mortem examination: he may be able to say what the
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure
that a reasonable answer can be extracted from a given body of data.
~ John Tukey


hadley wickham

2009-Feb-02 15:10 UTC

You might find it easier to use ggplot2:

install.packages("ggplot2")
library(ggplot2)

qplot(BP, P, data = curr_assoc, colour = FILE, log="y")

To ensure that you always have the same colours, you can set the
limits for the colour scale (in analogous way to setting the limits
for the x axis):

qplot(BP, P, data = curr_assoc, colour = FILE, log="y") +
scale_colour_hue(limits = c(2, 7, 12, 34, 60, 64, 65, 70, 71))
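
The same idea (a fixed mapping from FILE value to color, whatever subset is
plotted) can also be sketched in base graphics. This is only an
illustration under assumptions: the data are made up, and file_colors and
plot_subset are hypothetical helper names. The palette is built once from
every FILE value in the full table and then looked up by name, so a given
FILE keeps its color from one subset to the next.

```r
# Toy data (hypothetical values, NOT taken from master.out)
assoc_data <- data.frame(
  CHR  = c(1, 1, 6, 6, 6),
  BP   = c(510000, 620000, 32484154, 32790839, 32379785),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.0020),
  FILE = c(4730341, 670603, 670603, 7150403, 7210017)
)

# One color per FILE value in the FULL table, named by that value
all_files   <- sort(unique(as.character(assoc_data$FILE)))
file_colors <- setNames(rainbow(length(all_files)), all_files)

# Hypothetical helper: plot a subset, looking colors up by name
plot_subset <- function(d) {
  cols <- file_colors[as.character(d$FILE)]
  plot(d$BP, -log10(d$P), pch = 20, col = cols)
  invisible(cols)
}

cols_chr1 <- plot_subset(assoc_data[assoc_data$CHR == 1, ])
cols_chr6 <- plot_subset(assoc_data[assoc_data$CHR == 6, ])
```

With roughly 100 FILE values, neighboring rainbow() colors will be hard to
tell apart, so a legend or a smaller set of highlighted values may be
needed in practice.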

Hadley

-- 
http://had.co.nz/
