Michael Friendly
2011-Oct-21 15:22 UTC
[R] lattice::xyplot/ggplot2: plotting weighted data frames with lmline and smooth
In the HistData package, I have a data frame, PearsonLee, containing
observations on heights of parent and child, in weighted form:
library(HistData)
> str(PearsonLee)
'data.frame': 746 obs. of 6 variables:
$ child : num 59.5 59.5 59.5 60.5 60.5 61.5 61.5 61.5 61.5 61.5 ...
$ parent : num 62.5 63.5 64.5 62.5 66.5 59.5 60.5 62.5 63.5 64.5 ...
$ frequency: num 0.5 0.5 1 0.5 1 0.25 0.25 0.5 1 0.25 ...
$ gp : Factor w/ 4 levels "fd","fs","md",..: 2 2 2 2 2 2 2 2 2 2 ...
$ par : Factor w/ 2 levels "Father","Mother": 1 1 1 1 1 1 1 1 1 1 ...
$ chl : Factor w/ 2 levels "Daughter","Son": 2 2 2 2 2 2 2 2 2 2 ...
I want to make a 2x2 set of plots of child ~ parent | par+chl, with
regression lines and loess smooths that incorporate weights=frequency.
The "frequencies" are not integers, so I can't simply expand the
data frame.
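As a minimal sketch of why expanding the data frame would be equivalent when the counts are integers, the made-up data below (not taken from PearsonLee; the names d, n, fit_w, fit_long and fit_half are purely illustrative) compares a weighted lm() fit with a fit to physically repeated rows. The coefficients should agree, and they should also be unchanged when all weights are rescaled by a constant; the standard errors, by contrast, need not match, since the residual degrees of freedom differ.

```r
# Made-up illustrative data: an integer count n attached to each (x, y) cell.
d <- data.frame(x = c(1, 2, 3, 4),
                y = c(1.1, 1.9, 3.2, 3.8),
                n = c(1, 3, 2, 4))

# Weighted fit: each row contributes in proportion to n.
fit_w <- lm(y ~ x, data = d, weights = n)

# Expanded fit: repeat each row n times (only possible for integer counts).
d_long <- d[rep(seq_len(nrow(d)), times = d$n), ]
fit_long <- lm(y ~ x, data = d_long)

# Rescaled weights: halving every weight gives non-integer weights,
# which cannot be expanded but can still be passed to lm().
fit_half <- lm(y ~ x, data = d, weights = n / 2)

# Compare the estimated intercept and slope across the three fits.
all.equal(coef(fit_w), coef(fit_long))
all.equal(coef(fit_w), coef(fit_half))
```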
I'd also like to use different colors for the regression and smoothed lines.
Here's what I've tried using xyplot, all unsuccessful. I suppose I could
also use ggplot2, if I could do what I want.
library(lattice)
xyplot(child ~ parent|par+chl, data=PearsonLee, weights=frequency,
       type=c("p", "r", "smooth"))
xyplot(child ~ parent|par+chl, data=PearsonLee, type=c("p", "r", "smooth"))
panel.lmline and panel.smooth don't have a weights= argument, though
lm() and loess() do.
# Try to control line colors: unsuccessfully -- only one value of
# col.line is used
xyplot(child ~ parent|par+chl, data=PearsonLee, type=c("p", "r", "smooth"),
       col.line=c("red", "blue"))
## try to use panel functions ... unsuccessfully
xyplot(child ~ parent|par+chl, data=PearsonLee, type="p",
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           panel.lmline(x, y, col="blue", ...)
           panel.smooth(x, y, col="red", ...)
       }
)
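One possible way around the missing weights= argument, offered as a sketch rather than a definitive answer, is to fit lm() and loess() inside a custom panel function and draw the results with panel.abline() and panel.lines(). It assumes that xyplot() forwards unrecognized arguments to the panel function and supplies a subscripts vector when the panel function declares one; the argument name wts and the object names p, fit, lo and xs are my own choices, not part of lattice. Note also that lattice's own smoother is panel.loess(); panel.smooth() belongs to base graphics.

```r
library(lattice)
library(HistData)

p <- xyplot(child ~ parent | par + chl, data = PearsonLee,
            wts = PearsonLee$frequency,   # hypothetical extra argument, passed on to panel
            panel = function(x, y, subscripts, wts, ...) {
              w <- wts[subscripts]        # weights for the rows shown in this panel
              panel.xyplot(x, y, ...)

              # Weighted regression line, drawn in blue.
              fit <- lm(y ~ x, weights = w)
              panel.abline(reg = fit, col = "blue", lwd = 2)

              # Weighted loess smooth, evaluated on a grid and drawn in red.
              lo <- loess(y ~ x, weights = w)
              xs <- seq(min(x), max(x), length.out = 50)
              panel.lines(xs, predict(lo, data.frame(x = xs)),
                          col = "red", lwd = 2)
            })
print(p)
```

Because each line is drawn by its own call, the two colors are set independently instead of through a single col.line value.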
The following, using base graphics, illustrates the difference between
the weighted and unweighted lines for the total data frame:
with(PearsonLee,
    {
    lim <- c(55,80)
    xv <- seq(55,80, .5)
    sunflowerplot(parent,child, number=frequency, xlim=lim, ylim=lim,
                  seg.col="gray", size=.1)
    # unweighted
    abline(lm(child ~ parent), col="green", lwd=2)
    lines(xv, predict(loess(child ~ parent), data.frame(parent=xv)),
          col="green", lwd=2)
    # weighted
    abline(lm(child ~ parent, weights=frequency), col="blue", lwd=2)
    lines(xv, predict(loess(child ~ parent, weights=frequency),
                      data.frame(parent=xv)), col="blue", lwd=2)
    })
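The same comparison can be made numerically for each of the four parent/child combinations rather than for the pooled data. The sketch below is only illustrative (the names slopes and d are my own); it splits PearsonLee by par and chl and collects the unweighted and weighted slopes from lm().

```r
library(HistData)

# One data frame per panel of the intended 2x2 display (par by chl).
groups <- split(PearsonLee, list(PearsonLee$par, PearsonLee$chl))

# For each group, the slope of child on parent without and with weights.
slopes <- sapply(groups, function(d) {
  c(unweighted = coef(lm(child ~ parent, data = d))[["parent"]],
    weighted   = coef(lm(child ~ parent, data = d,
                         weights = frequency))[["parent"]])
})
round(slopes, 3)
```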
thanks,
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
Dennis Murphy
2011-Oct-21 16:15 UTC
[R] lattice::xyplot/ggplot2: plotting weighted data frames with lmline and smooth
Hi Michael:
Here's one way to get it from ggplot2. To avoid possible overplotting,
I jittered the points horizontally by +/- 0.2. I also reduced the point
size from the default 2 and increased the line thickness to 1.5 for
both fitted curves. In ggplot2, the term faceting is synonymous with
conditioning (by groups).
library('HistData')
library('ggplot2')
ggplot(PearsonLee, aes(x = parent, y = child)) +
  geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
  geom_smooth(method = lm, aes(weight = frequency),
              colour = 'green', se = FALSE, size = 1.5) +
  geom_smooth(method = 'loess', aes(weight = frequency),
              colour = 'red', se = FALSE, size = 1.5) +
  facet_grid(chl ~ par)
# If you prefer a legend, here's one take, pulling the legend inside
# to the upper left corner. This requires a bit more 'trickery', but
# the tricks are found in the ggplot2 book.
ggplot(PearsonLee, aes(x = parent, y = child)) +
  geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
  geom_smooth(method = lm, aes(weight = frequency,
              colour = 'Linear'), se = FALSE, size = 1.5) +
  geom_smooth(method = 'loess', aes(weight = frequency,
              colour = 'Loess'), se = FALSE, size = 1.5) +
  facet_grid(chl ~ par) +
  scale_colour_manual(breaks = c('Linear', 'Loess'),
                      values = c('green', 'red')) +
  theme(legend.position = c(0.14, 0.885),
        legend.background = element_rect(fill = 'white'))
HTH,
Dennis