Johann Hibschman
2009-Oct-08 14:20 UTC
[R] Plotting fit marginals, multiple plots on same x-axis
I'm trying to plot the "marginals" of a fit: the aggregated value of the actual and predicted vs. a cut/bucketed dimension. (The data set is huge, so just plotting the raw points would be unintelligible.) I'd also like to plot the number of points in each bucket (or, rather, the sum of the weights in each bucket), so I can mentally discount crazy behavior at low weights.

To do this, I want a divided plot, with the same x-axis. The top plot, larger, would show the predicted and actual line. The bottom plot, smaller, the count/weight data. (Alternative suggestions for how to view this are welcome.)

If I use ggplot2, I can get a plot that mostly looks like what I want, but I can't get one facet to be larger than the other.

For time series, using plot.zoo with a heights option gives the effect I'm looking for, but this isn't a time series.

Using layout, I end up duplicating the x-axis and wasting a lot of space; also, as far as I can tell, nothing actually guarantees that two plots aligned with layout would be on the same x-coordinate axis, so if the y-axis label of one ends up larger than the other, the curves won't line up.

Now, I'm sure that I can eventually hack up a layout-based solution so that it works, by appropriate margin/axis/etc settings, but I thought I'd ask if there's a better, elegant way. Also, I'm no master of R graphics, so it would take a long time for me to figure out what to do, so I'd want a bit of confirmation that that's the right way to go.

So, any suggestions?

To give a concrete example, here's something based on the mtcars dataset that more-or-less shows what I want, aside from the complication that my dataset is much larger:

## Make some sample data.
mtc <- within(mtcars, mpg.pred <- predict(lm(mpg~wt)))
hp.cut <- 25*mtc$hp%/%25
mtc.agg <- merge(aggregate(mtc[,c("mpg","mpg.pred")], list(hp.cut=hp.cut), mean),
                 aggregate(list(count=rep(1,nrow(mtc))), list(hp.cut=hp.cut), sum))
## Is there an easier way to do this aggregation?

## Basic plot with layout.
## Not that pretty, wastes a lot of space by duplicating axes.
layout(1:2, heights=c(2, 1))
plot(mpg ~ hp.cut, data=mtc.agg, type='b')
lines(mpg.pred ~ hp.cut, data=mtc.agg, type='b', col='red')
legend("topright", legend=c("actual", "predicted"),
       col=c("black", "red"), lty=1, pch=1)
plot(count ~ hp.cut, data=mtc.agg, type='l')

## Try to use ggplot2 for prettier plots.
## Very pretty, but the "secondary" variable of count gets equal billing
## with the "main" variables of mpg and mpg.pred.
library(ggplot2)
library(reshape2)  ## for melt(), in case ggplot2 does not attach it
mtc.melt <- melt(mtc.agg[,c("hp.cut","mpg","mpg.pred","count")], id.vars=1)
mtc.melt$mpg.f <- factor(ifelse(mtc.melt$variable=="count", "Count", "MPG"),
                         levels=c("MPG", "Count"))
qplot(hp.cut, value, data=mtc.melt, geom=c("line","point"), colour=variable) +
  facet_grid(mpg.f ~ ., scales="free")

Thanks,
Johann
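
For reference, the "appropriate margin/axis settings" route mentioned above might look roughly like the following. This is only a sketch under stated assumptions, not a confirmed answer: it assumes that base graphics margins are fixed in lines of text by par(mar), so that giving both panels the same left/right margins and the same xlim is enough to line up the x-coordinates. The margin values and the shared xlim variable are illustrative choices, not something taken from the original example.

```r
## Sketch only: two stacked base-graphics panels sharing one x-axis.
## Rebuild the sample data so the block is self-contained.
mtc <- within(mtcars, mpg.pred <- predict(lm(mpg ~ wt)))
hp.cut <- 25 * (mtc$hp %/% 25)
mtc.agg <- merge(
  aggregate(mtc[, c("mpg", "mpg.pred")], list(hp.cut = hp.cut), mean),
  aggregate(list(count = rep(1, nrow(mtc))), list(hp.cut = hp.cut), sum))

## Assumption: forcing identical x-limits on both panels keeps them aligned.
xlim <- range(mtc.agg$hp.cut)

layout(matrix(1:2, ncol = 1), heights = c(2, 1))
## Zero top/bottom inner margins so the panels touch; identical left/right
## margins for both panels. The outer margin holds the single x-axis label.
op <- par(oma = c(4, 0, 1, 0), mar = c(0, 4, 0, 1))

## Top panel (larger): actual and predicted, x-axis suppressed.
plot(mpg ~ hp.cut, data = mtc.agg, type = "b", xlim = xlim,
     ylim = range(mtc.agg$mpg, mtc.agg$mpg.pred),
     xaxt = "n", ylab = "mpg")
lines(mpg.pred ~ hp.cut, data = mtc.agg, type = "b", col = "red")
legend("topright", legend = c("actual", "predicted"),
       col = c("black", "red"), lty = 1, pch = 1)

## Bottom panel (smaller): counts, drawn with the only visible x-axis.
plot(count ~ hp.cut, data = mtc.agg, type = "l", xlim = xlim,
     ylab = "count")
mtext("hp.cut", side = 1, outer = TRUE, line = 2.5)

## Restore the previous margins and return to a single-panel layout.
par(op)
layout(1)
```

Whether this counts as "elegant" is debatable; it mainly shows that the duplicated axis and the wasted space can be removed with mar, oma and xaxt alone.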