Kiyoshi Sasaki
2010-Jul-15 02:26 UTC
[R] Histogram with two groups on the same graph (not on separate panels)
I have been trying to produce a histogram that shows two groups (male and female snakes) on the same graph, either superimposed or with the frequency bars for each group side by side. I found a couple of functions for superimposed histograms written by other people. Below is the code I used for my data, which contains a column svl (body size; snout-vent length) and a column sex (male or female). The structure of my data is shown at the bottom of this message.

My questions:

1. When I ran the code below (modified from http://onertipaday.blogspot.com/2007/04/how-to-superimpose-histograms.html), I got the error message “Error in hist.default(X[[1L]], ...) : 'x' must be numeric”. Could anyone help me figure out the problem, please?

2. Do you know of alternative code or a function that produces a histogram containing two groups, with the frequency bars appearing side by side, as shown in http://home.medewerker.uva.nl/g.dutilh/bestanden/multiple%20group%20histogram.png ?

gb <- read.csv(file = "D:\\data 12.24.06\\AllMamushiCorrected5.8.10_7.12.10.csv", header=TRUE, strip.white=TRUE, na.strings="")
attach(gb)

superhist2pdf <- function(x, filename = "super_histograms.pdf", dev = "pdf", title = "Superimposed Histograms", nbreaks ="Sturges") {
  junk = NULL
  grouping = NULL
  for(i in 1:length(x)) {
    junk = c(junk,x[[i]])
    grouping <- c(grouping, rep(i,length(x[[i]])))
  }
  grouping <- factor(grouping)
  n.gr <- length(table(grouping))
  xr <- range(junk)
  histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)
  maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
  if(dev == "pdf") { pdf(filename, version = "1.4") } else{}
  if((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
    cols <- hcl(h = seq(30, by=360 / n.gr, length = n.gr), l = 65, alpha = 0.5)
  } else {
    h.den <- c(10, 15, 20)
    h.ang <- c(45, 15, -30)
  }
  if(TC) {
    plot(histL[[1]], xlim = xr, ylim= c(0, maxC), col = cols[1], xlab = "x", main = title)
  } else {
    plot(histL[[1]], xlim = xr, ylim= c(0, maxC), density = h.den[1], angle = h.ang[1], xlab = "x")
  }
  if(!transparent.cols) {
    for(j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j])
  } else {
    for(j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j])
  }
  invisible()
  if( dev == "pdf") { dev.off() }
}

female <- subset(gb, sex=="f", select=svl)
male <- subset(gb, sex=="m", select=svl)
l1 = list(female, male)
superhist2pdf(l1, nbreaks="Sturges")

Error in hist.default(X[[1L]], ...) : 'x' must be numeric

FYI: The objects ‘female’ and ‘male’ look like:

> female
     svl
1   51.5
2   52.5
3   52.5
4   58.5
<edited>
277   NA
278   NA
279 55.4
280 57.5

> male
     svl
5   41.8
14  49.5
17  49.0
20  53.0
<edited>
231 47.6
235   NA
238 50.3
241 50.5
243 62.8
244 59.0

FYI: The structure of my dataset, gb, looks like:

> str(gb)
'data.frame': 308 obs. of 43 variables:
 $ id           : Factor w/ 290 levels "(023.541.040) G7",..: 241 244 243 245 278 193 194 195 196 197 ...
 $ studysite    : Factor w/ 29 levels "Assabu","Astushinai",..: 20 19 19 19 19 29 29 29 29 29 ...
 $ studysitecode: int NA NA NA NA NA 7 7 7 7 7 ...
 $ subsite      : int NA NA NA NA NA 18 18 18 18 18 ...
 $ sitecond     : int NA NA NA NA NA 1 1 1 1 1 ...
 $ Habitat      : Factor w/ 6 levels "Beach","On road",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Baskingspots : int 1 1 1 1 1 1 1 1 1 1 ...
 $ sex          : Factor w/ 2 levels "f","m": 1 1 1 1 2 1 1 1 1 1 ...
 $ sexcode      : int NA NA NA 0 1 0 0 0 0 0 ...
 $ svl          : num 51.5 52.5 52.5 58.5 41.8 57.6 59 55.6 62 58.5 ...
 $ tl           : num 8 8 8 10 8.2 9.2 9 8.5 9.5 8.8 ...
 $ bm           : num 142 128 148.3 192.3 70.5 ...
 $ defensiveness: int NA NA NA NA NA 1 1 1 1 2 ...
 $ ftr          : int NA NA NA NA NA 5 10 60 18 30 ...
 $ latency      : num NA NA NA NA NA NA NA NA NA NA ...
 $ dvc          : int NA NA NA NA NA 25 25 38 28 28 ...
 $ RepCnd       : Factor w/ 4 levels "Male","Nonpregnant",..: 4 4 4 2 1 2 2 2 2 2 ...
 $ repcnd       : int 1 1 1 0 2 0 0 0 0 0 ...
 $ repstatus    : int 1 1 1 1 NA 0 0 0 0 0 ...
 $ LS.w_egg     : int NA NA NA NA NA NA NA NA NA NA ...
 $ estimatedLS  : int NA NA NA NA NA NA NA NA NA NA ...
 $ lit.size     : int NA NA NA NA NA NA NA NA NA NA ...
 $ lit.mass     : num NA NA NA NA NA NA NA NA NA NA ...
 $ post.BM      : num NA NA NA NA NA NA NA NA NA NA ...
 $ rcm          : num NA NA NA NA NA NA NA NA NA NA ...
 $ mean.nSVL    : num NA NA NA NA NA NA NA NA NA NA ...
 $ mean.nBM     : num NA NA NA NA NA NA NA NA NA NA ...
 $ ta           : num 23 20 20 20 21 20 20 20 20 20 ...
 $ tm           : num 24 29 29 30 20 20.8 20.8 20.8 20.8 20.8 ...
 $ tb           : num 23 29 30 NA 23 30 29.6 30.2 29 27 ...
 $ partdate     : int NA NA NA NA NA NA NA NA NA NA ...
 $ time         : int 1255 1330 1340 1355 1300 1600 1600 1600 1600 1600 ...
 $ julian       : int NA NA NA NA NA 1999186 1999186 1999186 1999186 1999186 ...
 $ JulianDate   : int NA NA NA NA NA 186 186 186 186 186 ...
 $ year         : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
 $ calenderdate : Factor w/ 106 levels "10/11/2002","10/12/2001",..: 18 20 20 20 22 49 49 49 49 49 ...
 $ logSVL       : num NA NA NA NA NA 1.76 1.77 1.75 1.79 1.77 ...
 $ logBM        : num NA NA NA NA NA 2.26 2.23 2.26 2.28 2.23 ...
 $ bc           : num NA NA NA NA NA ...
 $ Rsvl         : num NA NA NA NA NA ...
 $ Rftr         : num NA NA NA NA NA ...
 $ Rdefense     : num NA NA NA NA NA 97 97 97 97 212 ...
 $ Rdvc         : num NA NA NA NA NA ...

Thank you so much for your time and help!

Sincerely,
Kiyoshi Sasaki
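A note on the error above, offered as a sketch rather than a definitive diagnosis: subset(gb, sex=="f", select=svl) returns a one-column data frame, not a numeric vector, so the list passed to superhist2pdf most likely ends up handing hist() a non-numeric object. The example below uses made-up data (gb_demo is a hypothetical stand-in for gb, with only svl and sex columns), extracts plain numeric vectors, and draws side-by-side frequency bars with base R's hist() and barplot(). The number of bins and the axis labels are arbitrary choices for illustration.

```r
# Hypothetical stand-in for the poster's data: a numeric svl column and a sex factor.
set.seed(1)
gb_demo <- data.frame(
  svl = c(rnorm(40, mean = 55, sd = 4), rnorm(30, mean = 50, sd = 5)),
  sex = factor(rep(c("f", "m"), times = c(40, 30)))
)
gb_demo$svl[c(3, 50)] <- NA  # mimic the missing values seen in the real data

# subset(..., select = svl) yields a data frame, which hist() does not accept.
female_df <- subset(gb_demo, sex == "f", select = svl)
is.numeric(female_df)  # not numeric, unlike the vectors extracted below

# Extracting the column directly gives plain numeric vectors.
female <- gb_demo$svl[gb_demo$sex == "f"]
male   <- gb_demo$svl[gb_demo$sex == "m"]

# Use the same breaks for both groups so that the bins line up.
brks <- pretty(range(gb_demo$svl, na.rm = TRUE), n = 10)
h_f  <- hist(female, breaks = brks, plot = FALSE)  # hist() drops NA values
h_m  <- hist(male,   breaks = brks, plot = FALSE)

# One row of counts per group; beside = TRUE places the bars side by side.
counts <- rbind(female = h_f$counts, male = h_m$counts)
barplot(counts, beside = TRUE, names.arg = head(brks, -1),
        xlab = "svl (lower edge of bin)", ylab = "Frequency",
        legend.text = TRUE)
```

With the real data, the same idea would be to pass list(gb$svl[gb$sex == "f"], gb$svl[gb$sex == "m"]) to superhist2pdf instead of the two data frames. Since svl contains NA values, range(junk) inside that function would probably also need na.rm = TRUE.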