Striessnig, Erich
2018-Mar-22  22:34 UTC
[R] Calculate weighted proportions for several factors at once
Hi,
I have a grouped data set and would like to calculate weighted proportions for a
large number of factor variables within each group. Rather than using
dplyr::count() on each of these factors individually, I would like to do it
for all factors at once. Does anyone know how this would work? Here is a
reproducible example:
############################################################
# reproducible example
library(dplyr)
df1 <- data.frame(wt=runif(90),  # non-negative weights; rnorm() can produce negative ones
                  group=paste0('reg', 1:5),
                  var1=rep(c('male','female'), times=45),
                  var2=rep(c('low','med','high'), each=30)) %>% as_tibble()
# instead of doing this separately for each factor ...
df2 <- df1 %>%
  group_by(group) %>%
  dplyr::count(var1, wt=wt) %>%
  mutate(prop1=n/sum(n))
df3 <- df1 %>%
  group_by(group) %>%
  dplyr::count(var2, wt=wt) %>%
  mutate(prop2=n/sum(n)) %>%
  left_join(df2, by='group')
# I would like to do something like the following (which, of course, does not work):
my_fun <- function(x,wt){
  freq1 <- dplyr::count(x, wt=wt)
  prop1 <- freq1 / sum(freq1)
  return(prop1)
}
df1 %>%
  group_by(group) %>%
  summarise_all(.funs=my_fun(.), .vars=c('var1', 'var2'))
############################################################
Best regards,
Erich
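One tidyverse way to treat all factors at once is to stack the factor columns into long format with tidyr::pivot_longer(). You can then call count() a single time with wt = wt and normalise within each group and variable. This is offered as a sketch, not as an answer given in the thread. Note that pivot_longer() needs tidyr >= 1.0.0, which is newer than this 2018 thread. The toy data are rebuilt with non-negative runif() weights, and `factor_vars` is a hypothetical vector naming the factor columns.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # only to make the toy data reproducible
df1 <- tibble(
  wt    = runif(90),                                  # non-negative weights (assumed)
  group = rep(paste0('reg', 1:5), length.out = 90),
  var1  = rep(c('male', 'female'), times = 45),
  var2  = rep(c('low', 'med', 'high'), each = 30)
)

factor_vars <- c('var1', 'var2')  # hypothetical: list every factor column here

props <- df1 %>%
  # one row per original row and factor: variable = column name, level = its value
  pivot_longer(all_of(factor_vars), names_to = 'variable', values_to = 'level') %>%
  # weighted count of each level, per group and per variable
  count(group, variable, level, wt = wt) %>%
  # proportions within each group x variable combination
  group_by(group, variable) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

props
```

The result is in long format, with one row per group, variable and level. If a wide table is preferred, tidyr::pivot_wider() can spread it back out.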
David Winsemius
2018-Mar-23  20:40 UTC
[R] Calculate weighted proportions for several factors at once
> On Mar 22, 2018, at 3:34 PM, Striessnig, Erich <Erich.Striessnig at oeaw.ac.at> wrote:
>
> I have a grouped data set and would like to calculate weighted proportions for a large number of factor variables within each group member. [...]

You might find useful functions in the 'freqweights' package. Its description suggests that it was designed to fit into the tidyverse paradigm. I think the survey package might also be useful, but it is not particularly designed for use with tibbles and `%>%`. Might work. Might not.
HTH; David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
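The survey package route mentioned in the reply can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes `wt` are sampling (probability) weights and that there is no clustering or stratification (`ids = ~1`). The toy data are rebuilt with non-negative weights and explicit factors. svymean() on a factor returns the weighted proportion of each level. svyby() repeats that within each group and, by default, also reports standard errors.

```r
library(survey)

set.seed(1)  # only to make the toy data reproducible
df1 <- data.frame(
  wt    = runif(90),                                  # positive weights (assumed)
  group = rep(paste0('reg', 1:5), length.out = 90),
  var1  = factor(rep(c('male', 'female'), times = 45)),
  var2  = factor(rep(c('low', 'med', 'high'), each = 30))
)

# Assumption: simple weighted design, no clusters (ids = ~1) and no strata
des <- svydesign(ids = ~1, weights = ~wt, data = df1)

# Weighted proportion of every level of var1 and var2, separately by group;
# adding more factors is just a longer formula (~var1 + var2 + var3 ...)
res <- svyby(~var1 + var2, ~group, des, svymean)
res
```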