Kiyoshi Sasaki
2010-Jul-15 02:26 UTC
[R] Histogram with two groups on the same graph (not on separate panels)
I have been trying to produce a histogram that shows two groups (male and female
snakes) on the same graph, either superimposed or with the frequency bars for
each group side by side. I found a couple of functions for superimposed
histograms written by other people.
Below is the code I used for my data, which contain a column svl (body size;
snout-vent length) and a column sex (male or female). The structure of my data
is shown at the bottom of this message.
My question: when I ran the code below (modified from
http://onertipaday.blogspot.com/2007/04/how-to-superimpose-histograms.html), I
got the error message “Error in hist.default(X[[1L]], ...) : 'x' must be
numeric”. Could anyone help me figure out the problem, please? Also, do you know
of alternative code or a function that produces a histogram containing two
groups, with the frequency bars side by side, as shown in
http://home.medewerker.uva.nl/g.dutilh/bestanden/multiple%20group%20histogram.png?
gb <- read.csv(file = "D:\\data 12.24.06\\AllMamushiCorrected5.8.10_7.12.10.csv",
               header = TRUE, strip.white = TRUE, na.strings = "")
attach(gb)
superhist2pdf <- function(x, filename = "super_histograms.pdf",
                          dev = "pdf", title = "Superimposed Histograms",
                          nbreaks = "Sturges") {
  # Pool all groups into one vector, with a parallel grouping factor
  junk <- NULL
  grouping <- NULL
  for (i in 1:length(x)) {
    junk <- c(junk, x[[i]])
    grouping <- c(grouping, rep(i, length(x[[i]])))
  }
  grouping <- factor(grouping)
  n.gr <- length(table(grouping))
  xr <- range(junk)
  # One histogram object per group, computed without plotting
  histL <- tapply(junk, grouping, hist, breaks = nbreaks, plot = FALSE)
  maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
  if (dev == "pdf") pdf(filename, version = "1.4")
  # Semi-transparent fills on pdf/png devices, hatching otherwise
  if ((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
    cols <- hcl(h = seq(30, by = 360 / n.gr, length = n.gr), l = 65, alpha = 0.5)
  } else {
    h.den <- c(10, 15, 20)
    h.ang <- c(45, 15, -30)
  }
  if (TC) {
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), col = cols[1],
         xlab = "x", main = title)
  } else {
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), density = h.den[1],
         angle = h.ang[1], xlab = "x")
  }
  if (!transparent.cols) {
    for (j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j])
  } else {
    for (j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j])
  }
  invisible()
  if (dev == "pdf") {
    dev.off()
  }
}
female <- subset(gb, sex=="f", select=svl)
male <- subset(gb, sex=="m", select=svl)
l1 = list(female, male)
superhist2pdf(l1, nbreaks="Sturges")
Error in hist.default(X[[1L]], ...) : 'x' must be numeric
FYI: the objects ‘female’ and ‘male’ look like this:
> female
svl
1 51.5
2 52.5
3 52.5
4 58.5
<edited>
277 NA
278 NA
279 55.4
280 57.5
> male
svl
5 41.8
14 49.5
17 49.0
20 53.0
<edited>
231 47.6
235 NA
238 50.3
241 50.5
243 62.8
244 59.0
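
One possible cause, offered as a sketch rather than a confirmed diagnosis: subset(..., select = svl) returns a one-column data frame, not a numeric vector (the "svl" header in the printouts above is consistent with this), so c(junk, x[[i]]) inside superhist2pdf would build a list and hist() would reject it. The example below uses a made-up data frame, gb_toy, with hypothetical values in place of the real data; the names gb_toy, female_df, female_vec, male_vec and l1_toy are introduced here purely for illustration.

```r
# Toy data standing in for gb (hypothetical values, not the real dataset)
gb_toy <- data.frame(
  svl = c(51.5, 52.5, NA, 58.5, 41.8, 49.5, 49.0, 53.0),
  sex = factor(c("f", "f", "f", "f", "m", "m", "m", "m"))
)

# subset(..., select = svl) keeps the data-frame wrapper around the column
female_df <- subset(gb_toy, sex == "f", select = svl)
is.data.frame(female_df)        # TRUE
is.numeric(c(NULL, female_df))  # FALSE: c() on a data frame yields a list

# Extracting the column with $ gives a plain numeric vector. Rows with a
# missing svl are dropped here because range() returns NA when NAs are present.
female_vec <- subset(gb_toy, sex == "f" & !is.na(svl))$svl
male_vec   <- subset(gb_toy, sex == "m" & !is.na(svl))$svl

l1_toy <- list(female_vec, male_vec)
```

A list of plain numeric vectors such as l1_toy should, in principle, get past the hist() call in superhist2pdf; whether the rest of the function behaves as intended on the real data is not verified here.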
FYI: the structure of my dataset, gb, looks like this:
> str(gb)
'data.frame': 308 obs. of 43 variables:
$ id : Factor w/ 290 levels "(023.541.040) G7",..: 241 244 243 245 278 193 194 195 196 197 ...
$ studysite : Factor w/ 29 levels "Assabu","Astushinai",..: 20 19 19 19 19 29 29 29 29 29 ...
$ studysitecode: int NA NA NA NA NA 7 7 7 7 7 ...
$ subsite : int NA NA NA NA NA 18 18 18 18 18 ...
$ sitecond : int NA NA NA NA NA 1 1 1 1 1 ...
$ Habitat : Factor w/ 6 levels "Beach","On road",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Baskingspots : int 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "f","m": 1 1 1 1 2 1 1 1 1 1 ...
$ sexcode : int NA NA NA 0 1 0 0 0 0 0 ...
$ svl : num 51.5 52.5 52.5 58.5 41.8 57.6 59 55.6 62 58.5 ...
$ tl : num 8 8 8 10 8.2 9.2 9 8.5 9.5 8.8 ...
$ bm : num 142 128 148.3 192.3 70.5 ...
$ defensiveness: int NA NA NA NA NA 1 1 1 1 2 ...
$ ftr : int NA NA NA NA NA 5 10 60 18 30 ...
$ latency : num NA NA NA NA NA NA NA NA NA NA ...
$ dvc : int NA NA NA NA NA 25 25 38 28 28 ...
$ RepCnd : Factor w/ 4 levels "Male","Nonpregnant",..: 4 4 4 2 1 2 2 2 2 2 ...
$ repcnd : int 1 1 1 0 2 0 0 0 0 0 ...
$ repstatus : int 1 1 1 1 NA 0 0 0 0 0 ...
$ LS.w_egg : int NA NA NA NA NA NA NA NA NA NA ...
$ estimatedLS : int NA NA NA NA NA NA NA NA NA NA ...
$ lit.size : int NA NA NA NA NA NA NA NA NA NA ...
$ lit.mass : num NA NA NA NA NA NA NA NA NA NA ...
$ post.BM : num NA NA NA NA NA NA NA NA NA NA ...
$ rcm : num NA NA NA NA NA NA NA NA NA NA ...
$ mean.nSVL : num NA NA NA NA NA NA NA NA NA NA ...
$ mean.nBM : num NA NA NA NA NA NA NA NA NA NA ...
$ ta : num 23 20 20 20 21 20 20 20 20 20 ...
$ tm : num 24 29 29 30 20 20.8 20.8 20.8 20.8 20.8 ...
$ tb : num 23 29 30 NA 23 30 29.6 30.2 29 27 ...
$ partdate : int NA NA NA NA NA NA NA NA NA NA ...
$ time : int 1255 1330 1340 1355 1300 1600 1600 1600 1600 1600 ...
$ julian : int NA NA NA NA NA 1999186 1999186 1999186 1999186 1999186 ...
$ JulianDate : int NA NA NA NA NA 186 186 186 186 186 ...
$ year : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
$ calenderdate : Factor w/ 106 levels "10/11/2002","10/12/2001",..: 18 20 20 20 22 49 49 49 49 49 ...
$ logSVL : num NA NA NA NA NA 1.76 1.77 1.75 1.79 1.77 ...
$ logBM : num NA NA NA NA NA 2.26 2.23 2.26 2.28 2.23 ...
$ bc : num NA NA NA NA NA ...
$ Rsvl : num NA NA NA NA NA ...
$ Rftr : num NA NA NA NA NA ...
$ Rdefense : num NA NA NA NA NA 97 97 97 97 212 ...
$ Rdvc : num NA NA NA NA NA ...
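
For the side-by-side layout asked about above, one possible approach (a sketch under assumptions, not a tested solution for this dataset) is to bin both sexes with the same break points and pass the counts to barplot() with beside = TRUE. The vectors svl_toy and sex_toy below are simulated stand-ins for gb$svl and gb$sex, and brks and counts are names introduced only for this example.

```r
# Simulated stand-ins for gb$svl and gb$sex (hypothetical values)
set.seed(1)
svl_toy <- c(rnorm(50, mean = 55, sd = 4), rnorm(50, mean = 50, sd = 4))
sex_toy <- factor(rep(c("f", "m"), each = 50))

# Shared bin edges so both groups are counted over the same intervals
brks <- pretty(range(svl_toy, na.rm = TRUE), n = 10)

# One vector of bin counts per sex; sapply() returns a bins-by-groups matrix
counts <- sapply(split(svl_toy, sex_toy), function(v) {
  hist(v, breaks = brks, plot = FALSE)$counts
})

# barplot() draws one cluster of bars per column, so put the bins in columns
counts <- t(counts)
colnames(counts) <- head(brks, -1)  # label each bin by its lower edge

barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "svl (lower bin edge)", ylab = "Frequency")
```

Using common breaks matters here: if each group were binned separately, bars drawn next to each other could refer to different intervals.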
Thank you so much for your time and help!
Sincerely,
Kiyoshi Sasaki
