Hi all,

I have a data set (df, n=10 for the sake of simplicity here) with two continuous variables (age and weight) and a grouping variable (group, with two levels). I want to run correlations for each group separately (similar to "split file" in SPSS).

I've been experimenting with different functions. I was able to do this correctly using the ddply function, but the output is difficult to read when I use cor.test to get the p values, df, and Pearson r (see below). I also tried the by function. With by, the output shows the two groups separately, but it seems to calculate the same r for both groups.

Here is my code for both ddply and by, with the output. Is there a way to display the output better with ddply, or to run the correlations correctly for each group using by?

Thanks in advance,

1. with "ddply"

r <- ddply(df, .(group), summarise, "corr" = cor.test(age, weight, method = "pearson"))

Output:
   Group                                 corr
1      1                                  Inf
2      1                                    3
3      1                                    0
4      1                                    1
5      1                                    0
6      1                            two.sided
7      1 Pearson's product-moment correlation
8      1                       age and weight
9      1                                 1, 1
10     2                             9.722211
11     2                                    3
12     2                          0.002311412
13     2                            0.9844986
14     2                                    0
15     2                            two.sided
16     2 Pearson's product-moment correlation
17     2                       age and weight
18     2                 0.7779640, 0.9990233

2. with "by"

r <- by(df, group, FUN = function(x) cor.test(age, weight, method = "pearson"))

Output:
Group: 1

        Pearson's product-moment correlation

data:  age and weight
t = 6.4475, df = 8, p-value = 0.0001988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6757758 0.9802100
sample estimates:
      cor
0.9157592

------------------------------------------------------------
Group: 2

        Pearson's product-moment correlation

data:  age and weight
t = 6.4475, df = 8, p-value = 0.0001988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6757758 0.9802100
sample estimates:
      cor
0.9157592

a) This is not reproducible (missing data). Please read the Posting Guide
mentioned at the bottom of every message.
b) You are abusing ddply. Read the help for ddply... the function you give it
needs to return a data frame. You may want dlply instead, or you need to copy
the atomic values from the result of the cor.test function into a one-row data
frame.
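
Point (b) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `df` below is made up (the poster's data were not supplied), and `cor_row` is a hypothetical helper name. The helper copies the atomic values out of the object returned by `cor.test` into a one-row data frame, so each group contributes exactly one row. The sketch uses base R only; with plyr loaded, the same helper could presumably be passed as `ddply(df, .(group), cor_row)`.

```r
# Made-up example data (NOT the original poster's data): 10 rows, two groups.
df <- data.frame(
  group  = rep(1:2, each = 5),
  age    = c(21, 25, 30, 34, 40, 22, 27, 31, 36, 45),
  weight = c(60, 66, 64, 75, 80, 55, 63, 62, 70, 82)
)

# Hypothetical helper: run cor.test on one group's rows and return a
# one-row data frame holding the atomic pieces of the result.
cor_row <- function(d) {
  ct <- cor.test(d$age, d$weight, method = "pearson")
  data.frame(
    group     = d$group[1],
    r         = unname(ct$estimate),   # Pearson correlation
    t         = unname(ct$statistic),  # t statistic
    df        = unname(ct$parameter),  # degrees of freedom
    p.value   = ct$p.value,
    conf.low  = ct$conf.int[1],        # 95% confidence interval bounds
    conf.high = ct$conf.int[2]
  )
}

# Base R split-apply-combine: one row per group.
res <- do.call(rbind, lapply(split(df, df$group), cor_row))
print(res)
```

Each row of `res` then reads like one line of a summary table, which is easier to scan than the stacked output in the question.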
-- 
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On May 29, 2012, at 6:32 PM, jacaranda tree wrote:

> I want to run correlations for each group separately (kind of similar
> to "split file" in SPSS). [...]

I would have imagined something along the lines of

    lapply(split(df, df$group), function(x) cor.test(x[["age"]], x[["weight"]]))

... but without an example it's only a guess.

-- 
David
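
A likely reason the `by` call in the question printed identical results for both groups: the anonymous function never uses its argument `x`, so `age` and `weight` are looked up outside the function (for example from an attached data frame or the workspace; that part is an assumption, since the data were not posted) and every group sees all 10 rows. The reported df = 8 is consistent with that. Below is a minimal sketch with made-up data, in which both the `split`/`lapply` form and the `by` form refer to the columns through `x`:

```r
# Made-up example data (NOT the original poster's data).
df <- data.frame(
  group  = rep(1:2, each = 5),
  age    = c(21, 25, 30, 34, 40, 22, 27, 31, 36, 45),
  weight = c(60, 66, 64, 75, 80, 55, 63, 62, 70, 82)
)

# split() cuts df into one data frame per group; lapply() runs cor.test on each.
by_split <- lapply(split(df, df$group), function(x) {
  cor.test(x[["age"]], x[["weight"]], method = "pearson")
})

# The same idea with by(): the function must use its argument x,
# which holds only the rows of the current group.
by_group <- by(df, df$group, function(x) {
  cor.test(x$age, x$weight, method = "pearson")
})

# Pull just the correlation coefficient out of each test result.
r_split <- sapply(by_split, function(ct) unname(ct$estimate))
r_by    <- sapply(by_group, function(ct) unname(ct$estimate))
print(r_split)
print(r_by)
```

Both approaches should give one test per group, each with df = n_group - 2 rather than the df of the full data set.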