Hi all,

I'd like to use the Hmisc::summarize function, but it uses a function (FUN)
of a single vector argument to create the statistical summaries.

Consider an easy case: I'd like to compute the correlation between two
variables in my dataframe, grouped according to other variables in the same
dataframe.

For example, consider the following dataframe D:

    V1 V2 V3
    A   1 -1
    A   1  1
    A  -1 -1
    B   1  1
    B   1  1

I'd like to use

    Hmisc::summarize(X=D, by=llist(myvar=D$V1), FUN=corr.V2.V3)

where corr.V2.V3 is defined as follows:

    corr.V2.V3 = function(x) {
      d = cbind(x$V2, x$V3)

      out = c(cor(d))
      names(out) = c("CORR")
      return(out)
    }

I was not able to use Hmisc::summarize in this case because FUN should be a
function of a matrix argument. Any idea?

Thanks in advance,
Arnaud
Hi Arnaud,

I'm not sure how to do this with Hmisc::summarize, but it's pretty easy with
plyr::ddply:

    D <- read.table(textConnection("V1 V2 V3
    A 1 -1
    A 1 1
    A -1 -1
    B 1 1
    B 1 1"), header=TRUE)
    closeAllConnections()

    corr.V2.V3 = function(x) {
      out = cor(x$V2, x$V3)
      names(out) = "CORR"
      return(out)
    }

    library(plyr)
    ddply(D, .(V1), corr.V2.V3)

-Ista

On Fri, Apr 16, 2010 at 9:21 AM, arnaud chozo <arnaud.chozo@gmail.com> wrote:
> I'd like to compute the correlation between two variables in my dataframe,
> grouped according to other variables in the same dataframe.
> [...]

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
> corr.V2.V3 = function(x) {
>   out = cor(x$V2, x$V3)
>   names(out) = "CORR"
>   return(out)
> }

A little more concisely:

    corr.V2.V3 = function(x) {
      c(CORR = cor(x$V2, x$V3))
    }

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
arnaud chozo wrote:
> I'd like to compute the correlation between two variables in my dataframe,
> grouped according to other variables in the same dataframe.
> [...]
> I was not able to use Hmisc::summarize in this case because FUN should be a
> function of a matrix argument. Any idea?

See the Hmisc mApply or summary.formula functions, or use tapply with a
vector of possible subscripts (1:n) as the first argument; then you can use
the selected subscripts to address multiple variables.

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
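The tapply suggestion above can be sketched roughly as follows, using base R only. This is one possible reading of the advice, not code posted in the thread: the data frame D is rebuilt from the example in the original question, and the result name `res` is only illustrative. The row indices are grouped by V1, and each group's indices are then used to pick the matching values of both V2 and V3.

```r
# Example data from the original question.
D <- data.frame(
  V1 = c("A", "A", "A", "B", "B"),
  V2 = c(1, 1, -1, 1, 1),
  V3 = c(-1, 1, -1, 1, 1)
)

# Pass the row subscripts 1:n as the first argument to tapply, grouped by V1.
# Inside the function, i holds the row numbers of one group, so it can index
# several columns at once.
# suppressWarnings() is used because V2 and V3 are constant within group B,
# so cor() returns NA there and warns that the standard deviation is zero.
res <- suppressWarnings(
  tapply(seq_len(nrow(D)), D$V1, function(i) cor(D$V2[i], D$V3[i]))
)

print(res)
```

With this data, group A gives a correlation of 0.5, while group B gives NA because both variables are constant within it.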