Andrew Miles
2009-Nov-14 22:31 UTC
[R] Weighted descriptives by levels of another variables
I've noticed that R has a number of very useful functions for obtaining descriptive statistics on groups of variables, including summary {stats}, describe {Hmisc}, and describe {psych}, but none that I have found is able to provide weighted descriptives of subsets of a data set (e.g. descriptives of age for both males and females, where accurate results require the use of sampling weights).

Does anybody know of a function that does this?

What I've looked at already:

I have looked at describe.by {psych}, which gives descriptives by levels of another variable (e.g. mean ages of males and females) but does not accept sampling weights.

I have also looked at describe {Hmisc}, which allows for weights but has no functionality for subdivision.

I tried using by() with describe {Hmisc}:

by(cbind(my, variables, here), division.variable, describe, weights=weight.variable)

but found that this returns an error message stating that the variables to be described and the weights variable are not the same length:

Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing, :
  length of weights must equal length of x
In addition: Warning message:
In present & !is.na(weights) :
  longer object length is not a multiple of shorter object length

This happens because by() passes only a subset of the variables to be described to describe(), while the weights argument is passed along unchanged, so describe() receives the original (i.e. not subsetted) weight vector. Here is an example using the ChickWeight data set that comes in the "datasets" package.
data(ChickWeight)
attach(ChickWeight)
library(Hmisc)
# this gives descriptive data on the variables "Time" and "Chick" by levels of "Diet"
by(cbind(Time, Chick), Diet, describe)
# trying to add weights, however, does not work for the reasons described above
wgt <- rnorm(length(Chick), 12, 1)
by(cbind(Time, Chick), Diet, describe, weights = wgt)

Again, my question is: does anybody know of a function that combines the ability to provide weighted descriptives with the ability to subdivide by the levels of some other variable?

Andrew Miles
Department of Sociology
Duke University
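A minimal base-R sketch of one workaround, assuming weighted means are enough: split the row indices by Diet instead of the data, so each group's variables and weights are indexed with the same rows and their lengths always match. The seed and the use of `weighted.mean()` are illustrative choices; `weight` stands in for `Chick` because `Chick` is a factor identifying each bird, so its mean is not meaningful.

```r
data(ChickWeight)
set.seed(1)                                   # illustrative, for reproducibility
wgt <- rnorm(nrow(ChickWeight), 12, 1)        # sampling weights as in the post

# One vector of row indices per Diet level
idx <- split(seq_len(nrow(ChickWeight)), ChickWeight$Diet)

# Weighted means per Diet; x[i] and wgt[i] always have the same length
wmeans <- sapply(idx, function(i) {
  c(Time   = weighted.mean(ChickWeight$Time[i],   wgt[i]),
    weight = weighted.mean(ChickWeight$weight[i], wgt[i]))
})
wmeans   # rows = variables, columns = Diet levels
```

The same `wgt[i]` indexing is what a weights argument needs inside any per-group function, so it should carry over to other weighted statistics as well.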
David Winsemius
2009-Nov-15 00:43 UTC
[R] Weighted descriptives by levels of another variables
Have you reviewed the survey package functions?

-- 
David

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
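A sketch of the survey-package route on the ChickWeight example, assuming a simple design with no clustering or strata (`ids = ~1`) and hypothetical positive weights. Note that `svyby()` takes one-sided formulas both for the variables and for the grouping factor.

```r
library(survey)

data(ChickWeight)
cw <- as.data.frame(ChickWeight)       # drop the groupedData classes
set.seed(1)
cw$wgt <- runif(nrow(cw), 0.5, 1.5)    # hypothetical positive sampling weights

# Weights-only design: no clusters (ids = ~1), no strata
des <- svydesign(ids = ~1, weights = ~wgt, data = cw)

# Weighted means (with standard errors) of Time and weight by Diet
by_diet <- svyby(~Time + weight, ~Diet, des, svymean)
by_diet
```

Other survey estimators such as `svyquantile()` or `svyvar()` can be passed as the `FUN` argument in the same way.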
David Freedman
2009-Nov-15 05:08 UTC
[R] Weighted descriptives by levels of another variables
In addition to using the survey package (and the svyby function), I've found that many of the 'weighted' functions, such as wtd.mean, work well with the plyr package. For example:

library(Hmisc)   # wtd.mean(), cut2()
library(plyr)    # ddply()
# mydata, obese, sampwt and age come from my own data
wtdmean <- function(df) wtd.mean(df$obese, df$sampwt)
ddply(mydata, ~cut2(age, c(2, 6, 12, 16)), 'wtdmean')

hth,
david freedman

--
View this message in context: http://old.nabble.com/Weighted-descriptives-by-levels-of-another-variables-tp26354665p26355885.html
Sent from the R help mailing list archive at Nabble.com.
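A self-contained sketch of the same ddply() pattern on ChickWeight. It swaps Hmisc's `wtd.mean()` for base `weighted.mean()` to avoid the extra dependency, and the weights are simulated as in the original post. The point is that the function receives each group's rows including the weight column, so data and weights are subset together.

```r
library(plyr)

data(ChickWeight)
cw <- as.data.frame(ChickWeight)
set.seed(1)
cw$wgt <- rnorm(nrow(cw), 12, 1)   # weights generated as in the original post

# d holds one Diet level's rows, weight column included
wtd_by_diet <- ddply(cw, ~Diet, function(d) {
  data.frame(n      = nrow(d),
             Time   = weighted.mean(d$Time,   d$wgt),
             weight = weighted.mean(d$weight, d$wgt))
})
wtd_by_diet
```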
Andrew Miles
2009-Nov-16 15:43 UTC
[R] Weighted descriptives by levels of another variables
Thanks! Using the plyr package and the approach you outlined seems to work well for relatively simple functions (like wtd.mean), but so far I haven't had much success using it with more complex descriptive functions like describe {Hmisc}. I'll take a look later, though, and see if I can figure out why. At any rate, ddply() looks like it will simplify writing a function that allows for weighting data and subdividing it, but still gives comprehensive summary statistics (i.e. not just the mean or quantiles, but all in one). I'll post it to the list once I have the time to write it up.

I also took a stab at using the svyby function in the survey package, but received the following error message when I ran:

> svyby(cbind(educ, age), female, svynlsy, svymean)
Error in `[.survey.design2`(design, byfactor %in% byfactor[i], ) :
  (subscript) logical subscript too long
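The "all in one" function described above could be sketched in base R as follows. The names `wtd_summary()` and `wtd_summary_by()` are hypothetical. The variance uses divisor `sum(w)`, and each quantile is the smallest value whose cumulative weight share reaches `p`; these are only two of several conventions, and other packages may choose differently.

```r
# Hypothetical helper: several weighted statistics for one numeric vector
wtd_summary <- function(x, w, probs = c(0.25, 0.5, 0.75)) {
  ok <- !is.na(x) & !is.na(w)          # drop pairs with a missing value
  x <- x[ok]
  w <- w[ok]
  m <- sum(w * x) / sum(w)             # weighted mean
  v <- sum(w * (x - m)^2) / sum(w)     # weighted variance, divisor sum(w)
  o <- order(x)
  cum_w <- cumsum(w[o]) / sum(w)       # cumulative weight share, sorted by x
  q <- sapply(probs, function(p) x[o][which(cum_w >= p)[1]])
  names(q) <- paste0("q", probs * 100)
  c(n = length(x), mean = m, sd = sqrt(v), min = min(x), q, max = max(x))
}

# Apply it within each level of g, subsetting x and w with the same indices
wtd_summary_by <- function(x, w, g) {
  idx <- split(seq_along(x), g)
  t(sapply(idx, function(i) wtd_summary(x[i], w[i])))
}

data(ChickWeight)
set.seed(1)
wgt <- rnorm(nrow(ChickWeight), 12, 1)  # weights generated as in the original post
res <- wtd_summary_by(ChickWeight$weight, wgt, ChickWeight$Diet)
round(res, 2)                           # one row per Diet level
```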