Dear Solomon,
On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
<solomon.messing at gmail.com> wrote:
> Dear Soren and R users:
>
> I am trying to use the summaryBy function with weights. Is this possible?
> An example that illustrates what I am trying to do follows:
>
> library(doBy)
> ## make up some data
> response = rnorm(100)
> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
> weights = runif(100, 0, 1)
> mydata = data.frame(response,group,weights)
>
> ## run summaryBy without weights:
> summaryBy(response~group, data = mydata, FUN = mean)
>
> ## attempt to run summaryBy with weights, throws error
> summaryBy(response~group, data = mydata, FUN = weighted.mean, w=weights )
>
> ## throws the error:
> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
> #   arguments must have same length
>
> My guess is that summaryBy is not giving weighted.mean() each group of
> weights, but instead is passing all of the weights in the data set each time it
> calls weighted.mean().
Yes, exactly. summaryBy() has no way of knowing that the weights should
also be broken down by group, because they are not in the formula.
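To make the mismatch concrete, here is a minimal base R sketch. The names x, w and g are made up for illustration and stand in for your response, weights and group. Each group has 20 responses, but the full weights vector has 100 entries, so weighted.mean() refuses the call:

```r
set.seed(1)
x <- rnorm(100)             # stand-in for the response
w <- runif(100)             # stand-in for the weights
g <- rep(1:5, each = 20)    # stand-in for the grouping variable

## one group's responses paired with ALL the weights: lengths 20 vs 100
res <- tryCatch(weighted.mean(x[g == 1], w),
                error = function(e) conditionMessage(e))
res  # the error message, captured as a character string

## subsetting the weights the same way as the responses works
ok <- weighted.mean(x[g == 1], w[g == 1])
ok
```

The second call is what each group would need: the weights subset by the same index as the responses.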
> Do you know if there is some way to get summaryBy to pass weights to
> weighted.mean() only for each group?
Ideally there would be a way to pass more than one variable to a
function (e.g., response and weights) or just an entire object
(mydata) broken down by group. Then you would just make a wrapper
function to pass the right values to the x and w arguments of
weighted.mean. Instead here is a somewhat hacked version:
library(doBy)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
group = rep(1:5, each = 20), weights = runif(100, 0, 1))
## manually compute weighted mean
tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
tmp ## weighted means
## here's the 'problem', if you will: even with +, the variables are
## passed one at a time
summaryBy(response + weights ~ group, data = mydata, FUN = str)
summaryBy(mydata ~ group, data = mydata, FUN = str)
## here is an option using by():
xy <- by(mydata, mydata$group,
         function(z) weighted.mean(z$response, z$weights))
xy
## if you don't like the formatting....
data.frame(group = names(c(xy)), weighted.mean = c(xy))
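If you need this often, the "wrapper function" idea mentioned above can be sketched in base R with split() and vapply(). Note that weighted_mean_by() is a hypothetical helper I am making up here, not something doBy provides, and the output column name is just my choice:

```r
## hypothetical helper (not part of doBy): split the data frame by group,
## then hand each piece's value and weight columns to weighted.mean()
weighted_mean_by <- function(data, value, weight, group) {
  pieces <- split(data, data[[group]])
  wm <- vapply(pieces,
               function(d) weighted.mean(d[[value]], d[[weight]]),
               numeric(1))
  out <- data.frame(names(wm), unname(wm), stringsAsFactors = FALSE)
  names(out) <- c(group, paste0(value, ".weighted.mean"))
  out
}

set.seed(42)
mydata <- data.frame(response = rnorm(100),
                     group = rep(1:5, each = 20),
                     weights = runif(100, 0, 1))
res <- weighted_mean_by(mydata, "response", "weights", "group")
res

## cross-check against the manual sum(x * w) / sum(w) calculation
manual <- with(mydata, tapply(response * weights, group, sum) /
                       tapply(weights, group, sum))
all.equal(res[[2]], as.vector(manual))
```

Because each piece is a full data frame, the same pattern extends to any function that needs more than one column per group.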
HTH,
Josh
>
> I suspect this functionality would be a tremendous benefit to R users who
> regularly work with weighted data, such as myself.
>
> Thanks,
>
> Solomon Messing
> www.stanford.edu/~messing
>
> PS I know this basic example can be done using lapply(split(...)) approach
> referenced here:
>
> http://www.mail-archive.com/r-help at stat.math.ethz.ch/msg12349.html
>
> but for more complex tasks the lapply approach will mean writing a lot of
> extra code to run everything and then to get things formatted as nicely as
> summaryBy() was designed to do.
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/