R help - Aug 2022 - Correlate

Thank you John again. I have got my result  in the form as shown below
      Variable   Corr     Pvalue   Nobs
            x1      1.000       0     165425


On Fri, Aug 26, 2022 at 3:59 PM John Fox <jfox at mcmaster.ca>
wrote:>
> Dear Val,
>
> On 2022-08-26 4:06 p.m., Val wrote:
> > Thank you John for your help and advice.\
>
> You're welcome, and it occurred to me that if you want the number of
> non-missing cases for each individual variable, you could add
>
>         diag(N) <- colSums(!is.na(dat))
>
> to the code I sent earlier.
>
> Best,
>   John
>
> >
> > On Fri, Aug 26, 2022 at 11:04 AM John Fox <jfox at mcmaster.ca>
wrote:
> >>
> >> Dear Val,
> >>
> >> On 2022-08-26 10:41 a.m., Val wrote:
> >>> Hi John and Timothy
> >>>
> >>> Thank you for your suggestion and help. Using the sample data,
I did
> >>> carry out a test run and found a difference in the correlation
result.
> >>>
> >>> Option 1.
> >>> data_cor <- cor(dat[ , colnames(dat) != "x1"],  #
Calculate correlations
> >>>                       dat$x1, method = "pearson",
use = "complete.obs")
> >>> resulted
> >>>                    [,1]
> >>>       x2 -0.5845835
> >>>       x3 -0.4664220
> >>>       x4  0.7202837
> >>>
> >>> Option 2.
> >>>    for(i in colnames(dat)){
> >>>         print(cor.test(dat[,i], dat$x1, method =
"pearson", use > >>> "complete.obs")$estimate)
> >>>       }
> >>>              [,1]
> >>> x2  -0.7362030
> >>> x3  -0.04935132
> >>> x4   0.85766290
> >>>
> >>> This was crosschecked  using Excel and other softwares and all
matches
> >>> with option 2.
> >>> One of the factors that contributed for this difference  is
loss of
> >>> information when we are using na.rm(). This is because that if
x2 has
> >>> missing value but x3 and x4 don?t have then  na.rm()  removed 
entire
> >>> row information including x3 and x4.
> >>
> >> Yes, I already explained that in my previous message.
> >>
> >> As well, cor() is capable of computing pairwise-complete
correlations --
> >> see ?cor.
> >>
> >> There's not an obvious right answer here, however. Using
> >> pairwise-complete correlations can produce inconsistent (i.e.,
> >> non-positive semi-definite) correlation matrices because
correlations
> >> are computed on different subsets of the data.
> >>
> >> There are much better ways to deal with missing data.
> >>
> >>>
> >>> My question is there  a way to extract the number of rows (N) 
used in
> >>> the correlation analysis?.
> >>
> >> I'm sure that there are many ways, but here is one that is
very
> >> simple-minded and should be reasonably efficient for ~250
variables:
> >>
> >>   > (nc <- ncol(dat))
> >> [1] 4
> >>
> >>   > R <- N <- matrix(NA, nc, nc)
> >>   > diag(R) <- 1
> >>   > for (i in 1:(nc - 1)){
> >> +   for (j in (i + 1):nc){
> >> +     R[i, j] <- R[j, i] <-cor(dat[, i], dat[, j],
use="complete.obs")
> >> +     N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)]))
> >> +   }
> >> + }
> >>
> >>   > round(R, 3)
> >>          [,1]   [,2]   [,3]   [,4]
> >> [1,]  1.000 -0.736 -0.049  0.858
> >> [2,] -0.736  1.000  0.458 -0.428
> >> [3,] -0.049  0.458  1.000  0.092
> >> [4,]  0.858 -0.428  0.092  1.000
> >>
> >>   > N
> >>        [,1] [,2] [,3] [,4]
> >> [1,]   NA    8    8    8
> >> [2,]    8   NA    8    8
> >> [3,]    8    8   NA    8
> >> [4,]    8    8    8   NA
> >>
> >>   > round(cor(dat, use="pairwise.complete.obs"), 3) #
check
> >>          x1     x2     x3     x4
> >> x1  1.000 -0.736 -0.049  0.858
> >> x2 -0.736  1.000  0.458 -0.428
> >> x3 -0.049  0.458  1.000  0.092
> >> x4  0.858 -0.428  0.092  1.000
> >>
> >> More generally, I think that it's a good idea to learn a
little bit
> >> about R programming if you intend to use R in your work.
You'll then be
> >> able to solve problems like this yourself.
> >>
> >> I hope this helps,
> >>    John
> >>
> >>> Thank you,
> >>>
> >>> On Mon, Aug 22, 2022 at 1:00 PM John Fox <jfox at
mcmaster.ca> wrote:
> >>>>
> >>>> Dear Val,
> >>>>
> >>>> On 2022-08-22 1:33 p.m., Val wrote:
> >>>>> For the time being  I am assuming the relationship
across  variables
> >>>>> is linear.  I want get the values first  and detailed
examining  of
> >>>>> the relationship will follow later.
> >>>>
> >>>> This seems backwards to me, but I'll refrain from
commenting further on
> >>>> whether what you want to do makes sense and instead
address how to do it
> >>>> (not, BTW, because I disagree with Bert's and
Tim's remarks).
> >>>>
> >>>> Please see below:
> >>>>
> >>>>>
> >>>>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron
<tebert at ufl.edu> wrote:
> >>>>>>
> >>>>>> I (maybe) agree, but I would go further than that.
There are assumptions associated with the test that are missing. It is not clear
that the relationships are all linear. Regardless of a "significant
outcome" all of the relationships need to be explored in more detail than
what is provided in the correlation test.
> >>>>>>
> >>>>>> Multiplicity adjustment as in :
sciencedirect.com/science/article/pii/S0197245600001069 is not an
issue that I can see in these data from the information provided. At least not
in the same sense as used in the link.
> >>>>>>
> >>>>>> My first guess at the meaning of
"multiplicity adjustment" was closer to the experimentwise error rate
in a multiple comparison procedure.
dictionary.apa.org/experiment-wise-error-rateEssentially, the type 1
error rate is inflated the more test you do and if you perform enough tests you
find significant outcomes by chance alone. There is great significance in the
Redskins rule: en.wikipedia.org/wiki/Redskins_Rule.
> >>>>>>
> >>>>>> A simple solution is to apply a Bonferroni
correction where alpha is divided by the number of comparisons. If there are
250, then 0.05/250 = 0.0002. Another approach is to try to discuss the outcomes
in a way that makes sense. What is the connection between a football team's
last home game an the election result that would enable me to take another team
and apply their last home game result to the outcome of a different election?
> >>>>>>
> >>>>>> Another complication is if variables x2 through
x250 are themselves correlated. Not enough information was provided in the
problem to know if this is an issue, but 250 orthogonal variables in a real
dataset would be a bit unusual considering the experimentwise error rate
previously mentioned.
> >>>>>>
> >>>>>> Large datasets can be very messy.
> >>>>>>
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bert Gunter <bgunter.4567 at
gmail.com>
> >>>>>> Sent: Monday, August 22, 2022 12:07 PM
> >>>>>> To: Ebert,Timothy Aaron <tebert at ufl.edu>
> >>>>>> Cc: Val <valkremk at gmail.com>; r-help at
R-project.org (r-help at r-project.org) <r-help at r-project.org>
> >>>>>> Subject: Re: [R] Correlate
> >>>>>>
> >>>>>> [External Email]
> >>>>>>
> >>>>>> ... But of course the p-values are essentially
meaningless without some sort of multiplicity adjustment.
> >>>>>> (search on "multiplicity adjustment" for
details). :-(
> >>>>>>
> >>>>>> -- Bert
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy
Aaron <tebert at ufl.edu> wrote:
> >>>>>>>
> >>>>>>> A somewhat clunky solution:
> >>>>>>> for(i in colnames(dat)){
> >>>>>>>      print(cor.test(dat[,i], dat$x1, method =
"pearson", use = "complete.obs")$estimate)
> >>>>>>>      print(cor.test(dat[,i], dat$x1, method =
"pearson", use > >>>>>>>
"complete.obs")$p.value) }
> >>>>
> >>>> Because of missing data, this computes the correlations on
different
> >>>> subsets of the data. A simple solution is to filter the
data for NAs:
> >>>>
> >>>> D <- na.omit(dat)
> >>>>
> >>>> More comments below:
> >>>>
> >>>>>>>
> >>>>>>> Rather than printing you could set up an array
or list to save the results.
> >>>>>>>
> >>>>>>>
> >>>>>>> Tim
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: R-help <r-help-bounces at
r-project.org> On Behalf Of Val
> >>>>>>> Sent: Monday, August 22, 2022 11:09 AM
> >>>>>>> To: r-help at R-project.org (r-help at
r-project.org) <r-help at r-project.org>
> >>>>>>> Subject: [R] Correlate
> >>>>>>>
> >>>>>>> [External Email]
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I have a data set with  ~250 
variables(columns).  I want to calculate
> >>>>>>> the correlation of  one variable with the rest
of the other variables
> >>>>>>> and also want  the p-values  for each
correlation.  Please see the
> >>>>>>> sample data and my attempt.  I  have got the
correlation but unable to
> >>>>>>> get the p-values
> >>>>>>>
> >>>>>>> dat <- read.table(text="x1 x2 x3 x4
> >>>>>>>               1.68 -0.96 -1.25  0.61
> >>>>>>>              -0.06  0.41  0.06 -0.96
> >>>>>>>                  .    0.08  1.14  1.42
> >>>>>>>               0.80 -0.67  0.53 -0.68
> >>>>>>>               0.23 -0.97 -1.18 -0.78
> >>>>>>>              -1.03  1.11 -0.61    .
> >>>>>>>               2.15     .    0.02  0.66
> >>>>>>>               0.35 -0.37 -0.26  0.39
> >>>>>>>              -0.66  0.89   .    -1.49
> >>>>>>>               0.11  1.52  0.73 
-1.03",header=TRUE)
> >>>>>>>
> >>>>>>> #change all to numeric
> >>>>>>>        dat[] <- lapply(dat, function(x)
as.numeric(as.character(x)))
> >>>>
> >>>> This data manipulation is unnecessary. Just specify the
argument
> >>>> na.strings="." to read.table().
> >>>>
> >>>>>>>
> >>>>>>>        data_cor <- cor(dat[ , colnames(dat)
!= "x1"],  dat$x1, method > >>>>>>>
"pearson", use = "complete.obs")
> >>>>>>>
> >>>>>>> Result
> >>>>>>>                  [,1]
> >>>>>>> x2 -0.5845835
> >>>>>>> x3 -0.4664220
> >>>>>>> x4  0.7202837
> >>>>>>>
> >>>>>>> How do I get the p-values ?
> >>>>
> >>>> Taking a somewhat different approach from cor.test(), you
can apply
> >>>> Fisher's z-transformation (recall that D is the data
filtered for NAs):
> >>>>
> >>>>    > 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) -
3), lower.tail=FALSE)
> >>>>            [,1]
> >>>> x2 0.2462807
> >>>> x3 0.3812854
> >>>> x4 0.1156939
> >>>>
> >>>> I hope this helps,
> >>>>     John
> >>>>
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help at r-project.org mailing list -- To
UNSUBSCRIBE and more, see
> >>>>>>>
nam10.safelinks.protection.outlook.com/?url=https://stat
> >>>>>>>
.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=05%7C01%7Ctebert%40ufl
> >>>>>>>
.edu%7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e
> >>>>>>>
1b84%7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w
> >>>>>>>
LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> >>>>>>>
&amp;sdata=3iAfMs1QzQARKF3lqUI8s43PX4IIkgEuQ9PUDyUtpqY%3D&amp;reserved
> >>>>>>> =0 PLEASE do read the posting guide
> >>>>>>>
nam10.safelinks.protection.outlook.com/?url=http://www.r
> >>>>>>>
-project.org%2Fposting-guide.html&amp;data=05%7C01%7Ctebert%40ufl.edu%
> >>>>>>>
7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e1b84%
> >>>>>>>
7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM
> >>>>>>>
DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;
> >>>>>>>
sdata=v3IEonnPgg1xTKUzLK4rJc3cfMFxw5p%2FW6puha5CFz0%3D&amp;reserved=0
> >>>>>>> and provide commented, minimal,
self-contained, reproducible code.
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help at r-project.org mailing list -- To
UNSUBSCRIBE and more, see
> >>>>>>>
nam10.safelinks.protection.outlook.com/?url=https://stat
> >>>>>>>
.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=05%7C01%7Ctebert%40ufl
> >>>>>>>
.edu%7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e
> >>>>>>>
1b84%7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w
> >>>>>>>
LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> >>>>>>>
&amp;sdata=3iAfMs1QzQARKF3lqUI8s43PX4IIkgEuQ9PUDyUtpqY%3D&amp;reserved
> >>>>>>> =0 PLEASE do read the posting guide
> >>>>>>>
nam10.safelinks.protection.outlook.com/?url=http://www.r
> >>>>>>>
-project.org%2Fposting-guide.html&amp;data=05%7C01%7Ctebert%40ufl.edu%
> >>>>>>>
7C871d5009dd3c455f398f08da84585e4a%7C0d4da0f84a314d76ace60a62331e1b84%
> >>>>>>>
7C0%7C0%7C637967812337328788%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM
> >>>>>>>
DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;
> >>>>>>>
sdata=v3IEonnPgg1xTKUzLK4rJc3cfMFxw5p%2FW6puha5CFz0%3D&amp;reserved=0
> >>>>>>> and provide commented, minimal,
self-contained, reproducible code.
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
> >>>>> stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained,
reproducible code.
> >>>> --
> >>>> John Fox, Professor Emeritus
> >>>> McMaster University
> >>>> Hamilton, Ontario, Canada
> >>>> web: socialsciences.mcmaster.ca/jfox
> >>>>
> >> --
> >> John Fox, Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> web: socialsciences.mcmaster.ca/jfox
> >>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: socialsciences.mcmaster.ca/jfox
>