Hi all,
I'd like to use the Hmisc::summarize function, but it uses a function (FUN)
of a single vector argument to create the statistical summaries.
Consider an easy case: I'd like to compute the correlation between two
variables in my dataframe, grouped according to other variables in the same
dataframe.
For example, consider the following dataframe D:
V1 V2 V3
A 1 -1
A 1 1
A -1 -1
B 1 1
B 1 1
I'd like to use Hmisc::summarize(X=D, by=llist(myvar=D$V1), FUN=corr.V2.V3)
where corr.V2.V3 is defined as follows:
corr.V2.V3 = function(x) {
  d = cbind(x$V2, x$V3)
  out = c(cor(d))
  names(out) = c("CORR")
  return(out)
}
I was not able to use Hmisc::summarize in this case because FUN should be a
function of a matrix argument. Any idea?
Thanks in advance,
Arnaud
Hi Arnaud,
I'm not sure how to do this with Hmisc::summarize, but it's pretty easy
with plyr::ddply:
D <- read.table(text = "V1 V2 V3
A 1 -1
A 1 1
A -1 -1
B 1 1
B 1 1", header = TRUE)
corr.V2.V3 = function(x) {
  out = cor(x$V2, x$V3)
  names(out) = "CORR"
  return(out)
}
library(plyr)
ddply(D, .(V1), corr.V2.V3)
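
For readers without plyr, the same per-group computation can be sketched in base R with split() and sapply(). This is only an illustration, not part of the original reply; the name corr_split is made up for the example, and D is rebuilt with data.frame() so the snippet is self-contained. Note that group B is constant in both V2 and V3, so its correlation is undefined and cor() returns NA.

```r
# Example data frame from the original post
D <- data.frame(
  V1 = c("A", "A", "A", "B", "B"),
  V2 = c(1, 1, -1, 1, 1),
  V3 = c(-1, 1, -1, 1, 1)
)

# Split the rows by V1, then compute the V2/V3 correlation within each piece.
# Group B has zero variance in both columns, so cor() gives NA there
# (the accompanying warning is suppressed here).
corr_split <- sapply(
  split(D, D$V1),
  function(g) suppressWarnings(cor(g$V2, g$V3))
)
corr_split
```

The result is a named numeric vector with one element per level of V1.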
-Ista
On Fri, Apr 16, 2010 at 9:21 AM, arnaud chozo <arnaud.chozo@gmail.com> wrote:
> I'd like to use the Hmisc::summarize function, but it uses a function (FUN)
> of a single vector argument to create the statistical summaries.
> [...]
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
> corr.V2.V3 = function(x) {
>   out = cor(x$V2, x$V3)
>   names(out) = "CORR"
>   return(out)
> }

A little more concisely:

corr.V2.V3 = function(x) {
  c(CORR = cor(x$V2, x$V3))
}

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
arnaud chozo wrote:
> I'd like to use the Hmisc::summarize function, but it uses a function (FUN)
> of a single vector argument to create the statistical summaries.
> [...]
> I was not able to use Hmisc::summarize in this case because FUN should be a
> function of a matrix argument. Any idea?

See the Hmisc mApply or summary.formula functions, or call tapply with a
vector of possible subscripts (1:n) as the first argument; then you can use
the subscripts selected for each group to address multiple variables.

Frank

--
Frank E Harrell Jr   Professor and Chairman   School of Medicine
Department of Biostatistics   Vanderbilt University
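
The tapply suggestion above can be sketched as follows. This is a minimal illustration of the subscript trick using only base R, not code from the thread; the name corr_by_group is hypothetical, and D is the example data frame from the original post, rebuilt here so the snippet runs on its own.

```r
# Example data frame from the original post
D <- data.frame(
  V1 = c("A", "A", "A", "B", "B"),
  V2 = c(1, 1, -1, 1, 1),
  V3 = c(-1, 1, -1, 1, 1)
)

# Pass the row subscripts 1:n as the first argument of tapply(). FUN then
# receives the subscripts that fall in one group and can use them to index
# as many columns of D as it needs (here V2 and V3).
corr_by_group <- tapply(
  seq_len(nrow(D)), D$V1,
  function(i) suppressWarnings(cor(D$V2[i], D$V3[i]))
)
corr_by_group
```

Because FUN only ever sees a plain integer vector, this works around the single-vector restriction without any extra packages. Group B comes back as NA because both variables are constant within it.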