thr3ads.net - R help - [R] Correlate [Aug 2022]

Bert Gunter

2022-Aug-22 16:06 UTC

[R] Correlate

... But of course the p-values are essentially meaningless without
some sort of multiplicity adjustment.
(search on "multiplicity adjustment" for details). :-(

-- Bert


On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
>
> A somewhat clunky solution:
> for(i in colnames(dat)){
>   print(cor.test(dat[,i], dat$x1, method = "pearson", use = "complete.obs")$estimate)
>   print(cor.test(dat[,i], dat$x1, method = "pearson", use = "complete.obs")$p.value)
> }
>
> Rather than printing you could set up an array or list to save the results.
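
One way to save the results instead of printing them is sketched below. It
is only an illustration, not the code from the thread: the names `others`
and `res` are made up here, and `na.strings = "."` is an assumption about
how the "." entries should be read. As far as I know, `use` is not an
argument of cor.test() (it is absorbed by `...`); cor.test() drops incomplete
pairs on its own, so its estimates can differ from
cor(..., use = "complete.obs"), which deletes a row if any column is missing.

```r
# Sample data from the original post; "." marks a missing value.
dat <- read.table(text = "x1 x2 x3 x4
 1.68 -0.96 -1.25  0.61
-0.06  0.41  0.06 -0.96
    .  0.08  1.14  1.42
 0.80 -0.67  0.53 -0.68
 0.23 -0.97 -1.18 -0.78
-1.03  1.11 -0.61     .
 2.15     .  0.02  0.66
 0.35 -0.37 -0.26  0.39
-0.66  0.89     . -1.49
 0.11  1.52  0.73 -1.03", header = TRUE, na.strings = ".")

# Every column except x1, so x1 is not correlated with itself.
others <- setdiff(colnames(dat), "x1")

# One row per variable: pairs used, Pearson r, and the unadjusted p-value.
res <- do.call(rbind, lapply(others, function(v) {
  ct <- cor.test(dat[[v]], dat$x1, method = "pearson")
  data.frame(variable = v,
             n        = sum(complete.cases(dat[[v]], dat$x1)),
             r        = unname(ct$estimate),
             p.value  = ct$p.value)
}))
print(res)
```

With ~250 columns the same code applies unchanged; the p-values in `res`
are unadjusted.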
>
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Val
> Sent: Monday, August 22, 2022 11:09 AM
> To: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
> Subject: [R] Correlate
>
> Hi all,
>
> I have a data set with ~250 variables (columns). I want to calculate the
> correlation of one variable with each of the other variables, and I also
> want the p-value for each correlation. Please see the sample data and my
> attempt below. I have got the correlations but am unable to get the p-values.
>
> dat <- read.table(text="x1 x2 x3 x4
>            1.68 -0.96 -1.25  0.61
>           -0.06  0.41  0.06 -0.96
>               .    0.08  1.14  1.42
>            0.80 -0.67  0.53 -0.68
>            0.23 -0.97 -1.18 -0.78
>           -1.03  1.11 -0.61    .
>            2.15     .    0.02  0.66
>            0.35 -0.37 -0.26  0.39
>           -0.66  0.89   .    -1.49
>            0.11  1.52  0.73  -1.03",header=TRUE)
>
> # change all to numeric
>     dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))
>
>     data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1, method = "pearson", use = "complete.obs")
>
> Result
>               [,1]
> x2 -0.5845835
> x3 -0.4664220
> x4  0.7202837
>
> How do I get the p-values?
>
> Thank you,
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ebert,Timothy Aaron

2022-Aug-22 17:23 UTC

[R] Correlate

I (maybe) agree, but I would go further than that. The test carries
assumptions that have not been checked here. It is not clear that the
relationships are all linear. Regardless of whether an outcome is
"significant", each relationship needs to be explored in more detail than
a correlation test provides.
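
A small made-up example of why linearity matters: below, y is completely
determined by x, yet the Pearson correlation is essentially zero. The data
are invented for illustration and have nothing to do with the poster's
variables.

```r
x <- seq(-3, 3, length.out = 61)  # grid symmetric around zero
y <- x^2                          # y depends on x exactly, but not linearly

r  <- cor(x, y)       # Pearson r is zero up to floating-point error
ct <- cor.test(x, y)  # the test finds no linear association
print(r)
print(ct$p.value)

# plot(x, y) shows the curve at once; the correlation test does not.
```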

Multiplicity adjustment as in
https://www.sciencedirect.com/science/article/pii/S0197245600001069
is not an issue that I can see in these data from the information provided,
at least not in the same sense as used in the link.

My first guess at the meaning of "multiplicity adjustment" was closer
to the experimentwise error rate in a multiple comparison procedure:
https://dictionary.apa.org/experiment-wise-error-rate
Essentially, the type 1 error rate is inflated the more tests you do, and if
you perform enough tests you will find significant outcomes by chance alone.
There is great significance in the Redskins rule:
https://en.wikipedia.org/wiki/Redskins_Rule.
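
The inflation can be seen in a rough simulation. Everything below is
hypothetical: the sample size of 50, the seed, and the 250 pure-noise
predictors are chosen only to mirror the number of tests discussed in this
thread. The closing formula assumes the tests are independent.

```r
set.seed(1)     # arbitrary seed, only so the run is repeatable
n     <- 50     # hypothetical number of observations
m     <- 250    # number of tests, as in the thread
alpha <- 0.05

y     <- rnorm(n)
noise <- matrix(rnorm(n * m), nrow = n)  # m predictors unrelated to y

# One unadjusted p-value per noise column.
p <- apply(noise, 2, function(x) cor.test(x, y)$p.value)

sum(p < alpha)  # "significant" results by chance; about alpha * m expected

# Probability of at least one false positive among m independent tests.
fwer <- 1 - (1 - alpha)^m
print(fwer)
```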

A simple solution is to apply a Bonferroni correction, where alpha is divided
by the number of comparisons. If there are 250, then 0.05/250 = 0.0002.
Another approach is to try to discuss the outcomes in a way that makes sense.
What is the connection between a football team's last home game and the
election result that would enable me to take another team and apply their
last home game result to the outcome of a different election?
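
In R the Bonferroni arithmetic can be done with p.adjust() from the stats
package. The three p-values below are invented for illustration; they are
not results from the poster's data.

```r
alpha <- 0.05
m     <- 250     # number of comparisons
alpha / m        # per-test threshold: 0.0002

# Hypothetical unadjusted p-values for three of the 250 tests.
p_raw <- c(x2 = 0.00001, x3 = 0.0004, x4 = 0.03)

# Bonferroni multiplies each p-value by m and caps the result at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = m)
print(p_bonf)

# Comparing adjusted p-values with alpha gives the same decisions as
# comparing the raw p-values with alpha / m.
print(p_bonf < alpha)
```

p.adjust() also offers less conservative methods such as "holm" and "BH".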

Another complication is if variables x2 through x250 are themselves correlated.
Not enough information was provided in the problem to know if this is an issue,
but 250 orthogonal variables in a real dataset would be a bit unusual
considering the experimentwise error rate previously mentioned.

Large datasets can be very messy.

Tim

