Michael Friendly
2011-Oct-21 15:22 UTC
[R] lattice::xyplot/ggplot2: plotting weighted data frames with lmline and smooth
In the HistData package, I have a data frame, PearsonLee, containing
observations on heights of parent and child, in weighted form:

library(HistData)
> str(PearsonLee)
'data.frame':   746 obs. of  6 variables:
 $ child    : num  59.5 59.5 59.5 60.5 60.5 61.5 61.5 61.5 61.5 61.5 ...
 $ parent   : num  62.5 63.5 64.5 62.5 66.5 59.5 60.5 62.5 63.5 64.5 ...
 $ frequency: num  0.5 0.5 1 0.5 1 0.25 0.25 0.5 1 0.25 ...
 $ gp       : Factor w/ 4 levels "fd","fs","md",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ par      : Factor w/ 2 levels "Father","Mother": 1 1 1 1 1 1 1 1 1 1 ...
 $ chl      : Factor w/ 2 levels "Daughter","Son": 2 2 2 2 2 2 2 2 2 2 ...

I want to make a 2x2 set of plots of child ~ parent | par+chl, with
regression lines and loess smooths, that incorporate weights=frequency.
The "frequencies" are not integers, so I can't simply expand the data frame.

I'd also like to use different colors for the regression and smoothed lines.
Here's what I've tried using xyplot, all unsuccessful. I suppose I could
also use ggplot2, if I could do what I want.

xyplot(child ~ parent|par+chl, data=PearsonLee, weights=frequency,
       type=c("p", "r", "smooth"))
xyplot(child ~ parent|par+chl, data=PearsonLee, type=c("p", "r", "smooth"))

panel.lmline and panel.smooth don't have a weights= argument, though lm()
and loess() do.

# Try to control line colors: unsuccessfully -- only one value of col.line is used
xyplot(child ~ parent|par+chl, data=PearsonLee, type=c("p", "r", "smooth"),
       col.line=c("red", "blue"))

## try to use panel functions ... unsuccessfully
xyplot(child ~ parent|par+chl, data=PearsonLee, type="p",
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           panel.lmline(x, y, col="blue", ...)
           panel.smooth(x, y, col="red", ...)
           }
)

The following, using base graphics, illustrates the difference between the
weighted and unweighted lines, for the total data frame:

with(PearsonLee,
    {
    lim <- c(55,80)
    xv <- seq(55,80, .5)
    sunflowerplot(parent,child, number=frequency, xlim=lim, ylim=lim,
                  seg.col="gray", size=.1)
    # unweighted
    abline(lm(child ~ parent), col="green", lwd=2)
    lines(xv, predict(loess(child ~ parent), data.frame(parent=xv)),
          col="green", lwd=2)
    # weighted
    abline(lm(child ~ parent, weights=frequency), col="blue", lwd=2)
    lines(xv, predict(loess(child ~ parent, weights=frequency),
          data.frame(parent=xv)), col="blue", lwd=2)
    })

thanks,
-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
Dennis Murphy
2011-Oct-21 16:15 UTC
[R] lattice::xyplot/ggplot2: plotting weighted data frames with lmline and smooth
Hi Michael:

Here's one way to get it from ggplot2. To avoid possible overplotting, I
jittered the points horizontally by +/- 0.2. I also reduced the point size
from the default 2 and increased the line thickness to 1.5 for both fitted
curves. In ggplot2, the term faceting is synonymous with conditioning (by
groups).

library('HistData')
library('ggplot2')

# Weight both fits by the 'frequency' column via the 'weight' aesthetic
ggplot(PearsonLee, aes(x = parent, y = child)) +
    geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
    geom_smooth(method = lm, aes(weight = frequency),
                colour = 'green', se = FALSE, size = 1.5) +
    geom_smooth(aes(weight = frequency),
                colour = 'red', se = FALSE, size = 1.5) +
    facet_grid(chl ~ par)

# If you prefer a legend, here's one take, pulling the legend inside
# to the upper left corner. This requires a bit more 'trickery', but
# the tricks are found in the ggplot2 book.
# (theme() and element_rect() need ggplot2 >= 0.9.2; older versions
# used opts() and theme_rect().)

ggplot(PearsonLee, aes(x = parent, y = child)) +
    geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
    geom_smooth(method = lm, aes(weight = frequency,
                colour = 'Linear'), se = FALSE, size = 1.5) +
    geom_smooth(aes(weight = frequency,
                colour = 'Loess'), se = FALSE, size = 1.5) +
    facet_grid(chl ~ par) +
    scale_colour_manual(breaks = c('Linear', 'Loess'),
                        values = c('green', 'red')) +
    theme(legend.position = c(0.14, 0.885),
          legend.background = element_rect(fill = 'white'))

HTH,
Dennis
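The original question also asked for a lattice version, which the thread does not answer. A minimal sketch of one possible approach follows, under stated assumptions: the full weight vector is passed to xyplot() as an extra argument (here called wts, a name chosen for illustration), and a custom panel function picks out each panel's rows through lattice's subscripts argument. The names wfit and panel.wfit are hypothetical helpers, not part of lattice or HistData.

```r
library(lattice)

# Hypothetical helper: weighted least-squares coefficients and a weighted
# loess curve evaluated on a grid spanning the observed x range.
wfit <- function(x, y, wt, n = 50) {
  xv <- seq(min(x), max(x), length.out = n)
  list(coef = coef(lm(y ~ x, weights = wt)),
       xv   = xv,
       yv   = predict(loess(y ~ x, weights = wt), data.frame(x = xv)))
}

# Hypothetical panel function: points, weighted regression line, and
# weighted loess smooth, each line in its own colour.
# 'wts' is the weight vector for the whole data frame; 'subscripts'
# (supplied by lattice) gives the row indices belonging to this panel.
panel.wfit <- function(x, y, wts, subscripts,
                       col.lm = "blue", col.smooth = "red", lwd = 2, ...) {
  panel.xyplot(x, y, ...)
  f <- wfit(x, y, wts[subscripts])
  panel.abline(a = f$coef[1], b = f$coef[2], col = col.lm, lwd = lwd)
  panel.lines(f$xv, f$yv, col = col.smooth, lwd = lwd)
}

# Usage with the data from the question (runs only if HistData is installed)
if (requireNamespace("HistData", quietly = TRUE)) {
  data(PearsonLee, package = "HistData")
  p <- xyplot(child ~ parent | par + chl, data = PearsonLee,
              wts = PearsonLee$frequency, panel = panel.wfit)
  print(p)
}
```

Keeping the fitting in a separate helper makes it easy to check the numbers outside of any plot. Because weighted least-squares coefficients do not change when all weights are multiplied by a constant, only the relative sizes of the frequencies matter for the regression line.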