jlhoward.2015 at gmail.com
2013-Nov-12 21:18 UTC
[R] geom_abline does not seem to respect groups in facet_grid [ggplot2]
Just trying to understand how geom_abline works with facets in ggplot.

By way of example, I have a dataset of student test scores. These are in a data table dt with 4 columns:

student: unique student ID
cohort: grouping factor for students (A, B, ..., H)
subject: subject of the test (English, Math, Science)
score: the test score for that student in that subject

The goal is to compare cohorts.

## Code to generate dt
library(data.table)
## cohorts: list of cohorts with number of students in each
cohorts <- data.table(name=toupper(letters[1:8]),
                      size=as.numeric(c(8,25,16,30,10,27,13,32)))
## base: assign students to cohorts
base <- data.table(student=c(1:sum(cohorts$size)),
                   cohort=rep(cohorts$name,cohorts$size))
## scores for each subject
english <- data.table(base,subject="English", score=rnorm(nrow(base), mean=45, sd=50))
math    <- data.table(base,subject="Math",    score=rnorm(nrow(base), mean=55, sd=25))
science <- data.table(base,subject="Science", score=rnorm(nrow(base), mean=70, sd=25))
## combine
dt <- rbind(english,math,science)
## clip scores to (0,100)
dt$score <- (dt$score>=0) * dt$score
dt$score <- (dt$score<=100)*dt$score + (dt$score>100)*100

The following displays mean score by cohort with 95% CL, facetted by subject, and includes a (blue, dashed) reference line (using geom_abline).

library(ggplot2)
library(Hmisc)
ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
ggp <- ggp + geom_abline(aes(slope=0,intercept=mean(score)),color="blue",linetype="dashed")
ggp <- ggp + facet_grid(subject~.)
ggp

The problem is that the reference line (from geom_abline) is the same in all facets (= the grand average score for all students and all subjects). So stat_summary seems to respect the grouping implied in facet_grid (e.g., by subject), but abline does not. *Why*?
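To make the two quantities concrete, here is a minimal sketch in base R. It uses a small made-up data frame, toy, as a stand-in for dt; the values are hypothetical and are not the simulated scores above. It only contrasts the single grand mean (which is what the reference line in the plot above appears to show in every facet) with the per-subject means I expected.

```r
# Toy stand-in for dt: hypothetical scores, four students per subject
toy <- data.frame(
  subject = rep(c("English", "Math", "Science"), each = 4),
  score   = c(40, 50, 45, 45,  55, 60, 50, 55,  70, 75, 65, 70)
)

# One number for the whole data set: the value the reference line
# appears to take in every facet
grand_mean <- mean(toy$score)

# One number per subject: the value I expected in each facet
subject_means <- aggregate(score ~ subject, data = toy, FUN = mean)

print(grand_mean)
print(subject_means)
```

With these toy numbers the grand mean differs from all three subject means, so a single horizontal line cannot match any of the per-subject values.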
NB: I realize this problem can be solved by creating a table of group means and using that as the data source in geom_abline (below), but *why is this necessary*?

means <- dt[,list(mean.score=mean(score)),by="subject"]
ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
ggp <- ggp + geom_abline(data=means, aes(slope=0,intercept=mean.score),color="blue",linetype="dashed")
ggp <- ggp + facet_grid(subject~.)
ggp