StellathePug
2011-Sep-21 18:15 UTC
[R] Weighted Average on More than One Variable in Data Frame
Dear R Users,

I have looked for a solution to the following problem and I have not been able to find it on the archive, through Google or in the R documentation.

I have a data frame, say df, which has 4 variables, one of which I would like to use as a grouping variable (g), another one that I would like to use for my weights (w). The other two variables (x1 and x2) are the ones for which I would like to compute the weighted average by group.

df <- data.frame(x1 = c(15, 12, 3, 10, 10, 14, 12),
                 x2 = c(10, 11, 16, 9, 7, 17, 18),
                 g  = c( 1,  1,  1,  2,  2,  3,  3),
                 w  = c( 2,  3,  1,  5,  5,  2,  5))

wx1 <- sapply(split(df, df$g), function(x){weighted.mean(x$x1, x$w)})
wx2 <- sapply(split(df, df$g), function(x){weighted.mean(x$x2, x$w)})

The above code works, the result is:

> wx1
       1        2        3
11.50000 10.00000 12.57143
> wx2
       1        2        3
11.50000  8.00000 17.71429

But is there not a more elegant way of acting on x1 and x2 simultaneously? Something along the lines of

wdf <- sapply(split(df, df$g), function(x){weighted.mean(df, x$w)})

which is wrong since df has two columns, while w only has one. I suppose one could write a loop, but that strikes me as being highly inefficient.

Thank you very much for your help!
Rita

--
View this message in context: http://r.789695.n4.nabble.com/Weighted-Average-on-More-than-One-Variable-in-Data-Frame-tp3830922p3830922.html
Sent from the R help mailing list archive at Nabble.com.
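As a quick sanity check on the numbers in the post, the weighted mean of a group is sum(w * x) / sum(w). The sketch below recomputes the group 1 value of x1 by hand and compares it with weighted.mean(). The names g1, by_hand and built_in are illustrative only and do not appear in the original message.

```r
df <- data.frame(x1 = c(15, 12, 3, 10, 10, 14, 12),
                 x2 = c(10, 11, 16, 9, 7, 17, 18),
                 g  = c( 1,  1,  1,  2,  2,  3,  3),
                 w  = c( 2,  3,  1,  5,  5,  2,  5))

# Rows belonging to group 1 (g1 is an illustrative name)
g1 <- df[df$g == 1, ]

# Weighted mean from its definition: (2*15 + 3*12 + 1*3) / (2 + 3 + 1)
by_hand <- sum(g1$w * g1$x1) / sum(g1$w)

# Same quantity via the built-in function
built_in <- weighted.mean(g1$x1, g1$w)
```

Both by_hand and built_in come out to 11.5, which matches the first entry of wx1.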
Jean V Adams
2011-Sep-21 19:38 UTC
[R] Weighted Average on More than One Variable in Data Frame
Try this:

sapply(split(df, df$g), function(x) apply(x[, 1:2], 2, weighted.mean, x$w))

Jean
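For readers who want to see what the one-liner returns, here is a minimal, self-contained sketch of the same split-then-apply idea. It is a variation rather than Jean's exact code: it selects the columns by name instead of by position and swaps the inner apply() for sapply(). The names vars, wm and wdf are assumptions introduced for illustration.

```r
df <- data.frame(x1 = c(15, 12, 3, 10, 10, 14, 12),
                 x2 = c(10, 11, 16, 9, 7, 17, 18),
                 g  = c( 1,  1,  1,  2,  2,  3,  3),
                 w  = c( 2,  3,  1,  5,  5,  2,  5))

# Columns to average; selecting by name avoids relying on column positions
vars <- c("x1", "x2")

# For each group, compute one weighted mean per variable,
# using that group's own weights
wm <- sapply(split(df, df$g), function(d) {
  sapply(d[vars], weighted.mean, w = d$w)
})

# wm is a matrix with one row per variable and one column per group;
# transpose it so that each group becomes a row of a data frame
wdf <- as.data.frame(t(wm))
```

In wdf the row names hold the group labels and the columns x1 and x2 hold the weighted means, so the values can be compared directly with wx1 and wx2 from the original question.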
StellathePug
2011-Sep-21 22:19 UTC
[R] Weighted Average on More than One Variable in Data Frame
Thanks Jean, that worked perfectly!