Hi:
See inline, please.
On Thu, Jan 20, 2011 at 4:22 PM, Nathan Miller
<natemiller77@gmail.com> wrote:
> Hello,
>
> I am trying to generate a set of boxplots using ggplot with the following
> data with 4 columns (Day, Site, VO2, Cruise)
>
> AllCorbulaMR
> Day Site VO2 Cruise
> 1 1 1 148.43632670 1
> 2 1 1 61.73864969 1
> 3 1 1 92.64536096 1
> 4 1 1 73.35434957 1
> 5 1 1 69.85568810 1
> 6 1 1 98.71116866 1
> 7 1 1 67.57880107 1
> 8 1 1 80.57959160 1
> 9 1 1 53.38577137 1
> 10 1 2 81.08429594 1
> 11 1 2 79.73019687 1
> 12 1 2 67.93991806 1
> 13 1 2 50.69929558 1
> 14 1 2 42.02457680 1
> 15 1 2 64.10049924 1
> 16 1 2 80.02264095 1
> 17 1 2 67.14828804 1
> 18 1 2 93.33363743 1
> 19 1 3 53.86021985 1
> 20 1 3 50.53366868 1
> 21 1 3 52.12437086 1
> 22 1 3 43.44618922 1
> 23 1 3 64.64322840 1
> 24 1 3 55.03761768 1
> 25 1 3 67.79501374 1
> 26 1 4 12.70806068 1
> 27 1 4 114.56401960 1
> 28 1 4 34.34450695 1
> 29 1 4 76.70849935 1
> 30 1 4 68.99752863 1
> 31 1 4 71.23080332 1
> 32 1 1 0.08222308 2
> 33 1 1 NA 2
> 34 1 1 0.03743258 2
> 35 1 1 0.04496363 2
> 36 1 1 0.07184903 2
> 37 1 1 0.05637676 2
> 38 1 1 0.05163886 2
> 39 1 1 0.03022606 2
> 40 1 1 0.04150667 2
> 41 2 1 0.04982530 2
> 42 2 1 0.05248479 2
> 43 2 1 0.03839707 2
> 44 2 1 0.04283591 2
> 45 2 1 0.03285247 2
> 46 2 1 0.03965853 2
> 47 2 1 NA 2
> 48 2 1 0.03637822 2
> 49 2 1 0.03686663 2
> 50 1 2 0.04086229 2
> 51 1 2 NA 2
> 52 1 2 0.01891389 2
> 53 1 2 0.03365864 2
> 54 1 2 0.04179611 2
> 55 1 2 0.04675111 2
> 56 1 2 0.04734616 2
> 57 1 2 0.04046907 2
> 58 1 2 0.03395499 2
> 59 2 2 0.02104620 2
> 60 2 2 NA 2
> 61 2 2 NA 2
> 62 2 2 NA 2
> 63 2 2 0.01882796 2
> 64 2 2 NA 2
> 65 2 2 NA 2
> 66 2 2 0.02328894 2
> 67 2 2 0.02635327 2
> 68 1 3 0.06030056 2
> 69 1 3 0.04728888 2
> 70 1 3 0.04307900 2
> 71 1 3 0.05144241 2
> 72 1 3 0.03223973 2
> 73 1 3 0.05145292 2
> 74 1 3 0.02718536 2
> 75 1 3 0.02830348 2
> 76 1 3 0.05859836 2
> 77 2 3 0.04521778 2
> 78 2 3 0.03242385 2
> 79 2 3 0.03412688 2
> 80 2 3 0.04407171 2
> 81 2 3 0.04517834 2
> 82 2 3 NA 2
> 83 2 3 0.02745407 2
> 84 2 3 0.03118602 2
> 85 2 3 0.04420074 2
> 86 1 4 0.05352334 2
> 87 1 4 0.05378120 2
> 88 1 4 0.04394838 2
> 89 1 4 0.02597939 2
> 90 1 4 0.05476946 2
> 91 1 4 0.04371743 2
> 92 2 4 0.04022729 2
> 93 2 4 0.04078509 2
> 94 2 4 0.04911994 2
> 95 2 4 0.04068468 2
> 96 2 4 NA 2
> 97 2 4 NA 2
> 98 1 1 0.08892223 3
> 99 1 1 0.08617873 3
> 100 1 1 0.06108950 3
> 101 1 1 0.19047922 3
> 102 1 1 0.09865930 3
> 103 1 1 0.12103549 3
> 104 1 1 0.06788404 3
> 105 1 1 0.12629497 3
> 106 1 1 0.10947173 3
> 107 1 1 0.11381467 3
> 108 2 1 0.07809781 3
> 109 2 1 0.04397586 3
> 110 2 1 0.06317635 3
> 111 2 1 0.02020365 3
> 112 2 1 0.09525985 3
> 113 2 1 0.04732347 3
> 114 2 1 0.03043341 3
> 115 2 1 0.04419395 3
> 116 2 1 NA 3
> 117 2 1 NA 3
> 118 1 2 0.03380003 3
> 119 1 2 0.02600926 3
> 120 1 2 0.03980552 3
> 121 1 2 0.03659985 3
> 122 1 2 0.04867881 3
> 123 1 2 0.03694679 3
> 124 1 2 0.03372825 3
> 125 1 2 0.03644750 3
> 126 1 2 0.01497611 3
> 127 1 2 0.02697976 3
> 128 2 2 0.03136923 3
> 129 2 2 0.03602215 3
> 130 2 2 0.04000660 3
> 131 2 2 0.03673098 3
> 132 2 2 0.03090854 3
> 133 2 2 0.04877643 3
> 134 2 2 0.02468537 3
> 135 2 2 NA 3
> 136 2 2 NA 3
> 137 2 2 NA 3
> 138 1 3 0.04809866 3
> 139 1 3 0.02380070 3
> 140 1 3 0.03672271 3
> 141 1 3 0.03232115 3
> 142 1 3 0.02950701 3
> 143 1 3 0.05068163 3
> 144 1 3 0.03004234 3
> 145 1 3 0.03090461 3
> 146 1 3 0.04539888 3
> 147 1 3 0.03261571 3
> 148 2 3 0.02069708 3
> 149 2 3 0.02117658 3
> 150 2 3 0.03244907 3
> 151 2 3 0.03404048 3
> 152 2 3 0.04597381 3
> 153 2 3 0.04034278 3
> 154 2 3 0.02167128 3
> 155 2 3 0.02245179 3
> 156 2 3 NA 3
> 157 2 3 NA 3
> 158 1 4 0.03935839 3
> 159 1 4 0.02932294 3
> 160 1 4 0.04928409 3
> 161 1 4 0.05075344 3
> 162 1 4 0.04353663 3
> 163 1 4 0.03376173 3
> 164 1 4 0.03640901 3
> 165 1 4 0.03616992 3
> 166 1 4 0.04999144 3
> 167 1 4 NA 3
> 168 2 4 0.02864131 3
> 169 2 4 0.02269317 3
> 170 2 4 0.04231203 3
> 171 2 4 0.03117968 3
> 172 2 4 0.05936813 3
> 173 2 4 0.04453866 3
> 174 2 4 0.04596834 3
> 175 2 4 0.03316604 3
> 176 2 4 0.03557714 3
> 177 2 4 0.03546922 3
> 178 1 1 25.85293219 4
> 179 1 1 114.35190440 4
> 180 1 1 41.24654852 4
> 181 1 1 30.41020474 4
> 182 1 1 36.48809050 4
> 183 1 1 54.41756532 4
> 184 1 1 19.80251528 4
> 185 1 1 69.26706620 4
> 186 1 1 24.58416429 4
> 187 1 1 34.07847862 4
> 188 2 1 21.89169319 4
> 189 2 1 59.63530232 4
> 190 2 1 18.52010709 4
> 191 2 1 41.19883744 4
> 192 2 1 48.19844111 4
> 193 2 1 15.39495989 4
> 194 2 1 29.50623201 4
> 195 2 1 51.30763254 4
> 196 2 1 7.47325378 4
> 197 2 1 24.31193857 4
> 198 1 2 19.29834204 4
> 199 1 2 36.58998998 4
> 200 1 2 28.68983063 4
> 201 1 2 67.06563511 4
> 202 1 2 30.00310234 4
> 203 1 2 28.28411410 4
> 204 1 2 34.81315669 4
> 205 1 2 45.49758389 4
> 206 1 2 30.96530199 4
> 207 1 2 35.12478034 4
> 208 2 2 20.10730199 4
> 209 2 2 20.51722925 4
> 210 2 2 0.74851863 4
> 211 2 2 16.93243539 4
> 212 2 2 8.88325120 4
> 213 2 2 49.09739859 4
> 214 2 2 6.19244047 4
> 215 2 2 27.52476529 4
> 216 2 2 14.38173305 4
> 217 2 2 5.80964158 4
> 218 1 3 29.04103986 4
> 219 1 3 0.92325019 4
> 220 1 3 29.13337158 4
> 221 1 3 34.74658994 4
> 222 1 3 49.14354727 4
> 223 1 3 22.07507554 4
> 224 1 3 34.33727406 4
> 225 1 3 36.87618286 4
> 226 1 3 31.18096886 4
> 227 1 3 36.55777870 4
> 228 2 3 25.58876028 4
> 229 2 3 0.98494566 4
> 230 2 3 24.97630908 4
> 231 2 3 11.95195819 4
> 232 2 3 21.70297861 4
> 233 2 3 13.65235649 4
> 234 2 3 10.50488380 4
> 235 2 3 2.87966024 4
> 236 2 3 2.73826233 4
> 237 2 3 0.11720351 4
> 238 1 4 4.55652677 4
> 239 1 4 8.70352253 4
> 240 1 4 41.11933137 4
> 241 1 4 28.85120068 4
> 242 1 4 32.32787505 4
> 243 1 4 23.07206283 4
> 244 1 4 21.14382198 4
> 245 1 4 13.81868580 4
> 246 1 4 58.54591722 4
> 247 1 4 43.80195253 4
> 248 2 4 46.78384255 4
> 249 2 4 0.61816630 4
> 250 2 4 13.55381496 4
> 251 2 4 4.48444064 4
> 252 2 4 14.19295226 4
> 253 2 4 11.94361059 4
> 254 2 4 35.62088407 4
> 255 2 4 16.11595544 4
> 256 2 4 34.80256181 4
> 257 2 4 19.66686930 4
>
> I would like to make a boxplot with Site on the x-axis and VO2 on the y-axis
> and with the fill colour specified by the Cruise. I have the following code
>
> p=ggplot(AllCorbulaMR,aes(factor(Site),VO2))
>
> p+geom_boxplot(aes(fill=factor(Cruise)))+
>   scale_fill_manual('Cruise', values=(colours()[c(375,577,573,439)]),
>                     breaks=c(1,2,3,4),
>                     labels=c('Cruise 1', 'Cruise 2', 'Cruise 3', 'Cruise 4'))+
>   xlab('Sampling Site')+
>   ylab(expression("Metabolic Rate "(mu*moles*~O[2]*~mg^-1*~hr^-1)))+
>   opts(title="Corbula Cruises")+
>   ylim(c(0,0.2))
>
>
> The issue I have at the moment is that only Cruises 2 and 3 are plotted
> correctly. The other two are not shown or the boxes are plotted as single
> horizontal lines. I also get a warning that 129 rows containing missing data
> were removed (probably explaining the missing boxplots). I have a few
> missing data points (specified NA), but not 129. Can anyone see an error in
> the code I have that would explain why some rows of data are being
> disregarded?
>
(1) Cruises 1 and 4 have much larger VO2 values than do Cruises 2 and 3. Your
VO2 data range from about 0.01 to 150, which suggests it might be a prime
candidate for a log transformation.
(2) You've told ggplot() to restrict the y-range from 0 to 0.2; that's why
129 rows of data are removed (the count includes the rows that were already
NA). You can see all the observations whose VO2 value is less than 0.2, but
everything larger than that is suppressed because of your choice of y-range.
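As a rough check on where the 129 comes from, the sketch below counts the values that fall outside a y-range plus the values that are already NA. The helper name count_dropped and the toy vector are made up for illustration; with your data frame loaded, the call would be count_dropped(AllCorbulaMR$VO2, 0, 0.2). By my count of the data as posted, 110 values exceed 0.2 and 19 are NA, which adds up to the 129 in the warning.

```r
# Count the values that a y-range of [lo, hi] would discard,
# plus the values that were already NA.
# count_dropped is a hypothetical helper, not part of ggplot2.
count_dropped <- function(v, lo, hi) {
  out_of_range <- sum(v < lo | v > hi, na.rm = TRUE)  # outside the limits
  already_na   <- sum(is.na(v))                       # missing to begin with
  c(out_of_range = out_of_range,
    already_na   = already_na,
    total        = out_of_range + already_na)
}

# Toy vector (made up for illustration, not the posted data)
toy <- c(148.4, 61.7, 0.082, NA, 0.037, 25.9, 0.117, NA)
res <- count_dropped(toy, 0, 0.2)
print(res)
```

If the total matches the number in the warning, the "missing" rows are just the ones outside the limits plus the NAs.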
The 'complete' plot of the data (after some reorganization of your code with
spacing) produces flat lines for Cruises 2 and 3, since the full range of VO2
values is from 0 to 150. (I'm pretty sure you did that and then restricted
the y-range later because of this.) I reran your code with metabolic rate
plotted on the log10 scale and got what looked like an acceptable plot to
me, although I'd look carefully at the data on the upper end of the scale.
Here's the modified code (with spacing, since it affects the position of
labels):
p = ggplot(AllCorbulaMR, aes(factor(Site), VO2))
p + geom_boxplot(aes(fill = factor(Cruise))) +
    scale_fill_manual('Cruise', values = colours()[c(375, 577, 573, 439)],
                      breaks = c(1, 2, 3, 4),
                      labels = c('Cruise 1', 'Cruise 2',
                                 'Cruise 3', 'Cruise 4')) +
    xlab('Sampling Site') +
    # log10 y-axis in place of ylim(c(0, 0.2)): values above 0.2 are kept
    scale_y_continuous(trans = 'log10') +
    ylab(expression("Metabolic Rate "(mu*moles*~O[2]*~mg^-1*~hr^-1))) +
    # opts() matches the ggplot2 of the time; later releases use ggtitle()
    opts(title = "Corbula Cruises")
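To put numbers on why the unrestricted linear plot flattens Cruises 2 and 3, here is a small base R sketch. It uses only the approximate extremes mentioned above (about 0.01 and 150), not the full data set, so treat the figures as ballpark.

```r
# Approximate extremes of VO2 (rounded values, assumed for illustration)
vo2_range <- c(0.01, 150)

ratio   <- vo2_range[2] / vo2_range[1]  # largest value relative to smallest
decades <- diff(log10(vo2_range))       # width of the range in powers of ten

# Share of a linear 0-150 axis occupied by values below 0.2
share_small <- 0.2 / vo2_range[2]

print(c(ratio = ratio, decades = decades, share_small = share_small))
```

On a linear axis the small-valued cruises sit in a sliver of the plotting area, whereas on a log10 axis each power of ten gets the same amount of room.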
You can adjust the labeling/scaling of the y-axis using some of the options
available in scale_continuous() if desired. There are quite a few
transformation options; you may well find a better one than what I chose
here.
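A sketch of a few such options follows, on a small made-up data frame (toy is hypothetical and only loosely imitates two of the cruises). scale_y_log10(), scale_y_sqrt() and coord_cartesian() are standard ggplot2 functions; the choice of breaks is my own assumption. The coord_cartesian() variant is an additional idea rather than something from the code above: it zooms the view to a y-range without discarding the rows outside it.

```r
library(ggplot2)

# Made-up data: Cruise 1 has large VO2 values, Cruise 2 has small ones
toy <- data.frame(
  Site   = rep(c(1, 2), each = 8),
  Cruise = rep(rep(c(1, 2), each = 4), times = 2),
  VO2    = c(61.7, 92.6, 73.4, 69.9, 0.082, 0.037, 0.045, 0.072,
             81.1, 79.7, 67.9, 50.7, 0.041, 0.019, 0.034, 0.042)
)

p0 <- ggplot(toy, aes(factor(Site), VO2)) +
  geom_boxplot(aes(fill = factor(Cruise)))

# log10 axis with tick positions chosen by hand
p_log <- p0 + scale_y_log10(breaks = c(0.01, 0.1, 1, 10, 100))

# square-root axis, a milder transformation than the log
p_sqrt <- p0 + scale_y_sqrt()

# zoom to 0-0.2; the box statistics are still computed from all rows
p_zoom <- p0 + coord_cartesian(ylim = c(0, 0.2))

# print(p_log), print(p_sqrt) or print(p_zoom) draws the respective plot
```

Unlike ylim(), the zoomed version keeps every box in the plot object; the large-valued boxes are merely outside the visible window.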
There is also a list devoted to ggplot2, to which you can subscribe at
http://had.co.nz/ggplot2, where questions like this might be better placed.
The on-line help system for ggplot2, with myriad examples, is found towards
the bottom of that page.
HTH,
Dennis
> Thank you,
> Nate
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>