Hi,

I can't seem to figure out why this gives me different answers. Probably something obvious, but I thought they would be the same. This is a minimal example from the help page of cor():

> ## swM := "swiss" with 3 "missing"s :
> swM <- swiss
> colnames(swM) <- abbreviate(colnames(swiss), min=6)
> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
> cor(swM, use = "na.or.complete")
           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
> # why isn't this the same?
> cor(swM[,c(1:2)], use = "na.or.complete")
          Frtlty    Agrclt
Frtlty 1.0000000 0.3920289
Agrclt 0.3920289 1.0000000

Jeff Newmiller
2014-Apr-14 01:35 UTC
[R] correlation with missing values.. different answers
Please post in plain text per the Posting Guide.
Read ?cor, particularly the part about "complete.cases". Your two
cases have different effective input rows.
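
The "different effective input rows" can be checked directly by counting complete cases for each input. This is only a minimal sketch, assuming swM is built exactly as in the original post; the names full_ok and sub_ok are introduced here for illustration and are not from the thread.

```r
# Rebuild swM as in the original post (taken from the ?cor examples)
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA   # create 3 "missing"

# TRUE for rows with no NA in any of the six columns
full_ok <- complete.cases(swM)
# TRUE for rows with no NA in the first two columns only
sub_ok <- complete.cases(swM[, 1:2])

sum(full_ok)      # rows available to cor(swM, use = "na.or.complete")
sum(sub_ok)       # rows available to cor(swM[, 1:2], use = "na.or.complete")
which(!full_ok)   # rows dropped for the full data: 1, 7, 25
which(!sub_ok)    # row dropped for the two-column subset: 1
```

Rows 7 and 25 are missing only in columns 3 and 5, so they are discarded in the six-column call but kept in the two-column call.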
On April 13, 2014 6:08:51 PM PDT, Paul Tanger <paul.tanger at colostate.edu> wrote:
>Hi,
>I can't seem to figure out why this gives me different answers.
>Probably something obvious, but I thought they would be the same.
>This is a minimal example from the help page of cor():
>
>> ## swM := "swiss" with 3 "missing"s :
>> swM <- swiss
>> colnames(swM) <- abbreviate(colnames(swiss), min=6)
>> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
>> cor(swM, use = "na.or.complete")
>           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
>Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
>Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
>Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
>Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
>Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
>Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
>> # why isn't this the same?
>> cor(swM[,c(1:2)], use = "na.or.complete")
>          Frtlty    Agrclt
>Frtlty 1.0000000 0.3920289
>Agrclt 0.3920289 1.0000000
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
Hi,

I think in this case, when you use "na.or.complete", every row that contains an NA in any column of the input is removed before the correlations are computed. So cor(swM, ...) drops rows 1, 7 and 25, while cor(swM[,1:2], ...) drops only row 1. Dropping just row 1 reproduces your second result:

cor(swM[-1,1:2])
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

cor(swM[-1,])[1:2,1:2]
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

Maybe you can try "pairwise.complete.obs":

cor(swM, use = "pairwise.complete.obs")
#           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
#Frtlty  1.0000000  0.39202893 -0.6531492 -0.66378886  0.4723129  0.41655603
#Agrclt  0.3920289  1.00000000 -0.7150561 -0.65221506  0.4152007 -0.03648427
#Exmntn -0.6531492 -0.71505612  1.0000000  0.69921153 -0.6003402 -0.11433546
#Eductn -0.6637889 -0.65221506  0.6992115  1.00000000 -0.1791334 -0.09932185
#Cathlc  0.4723129  0.41520069 -0.6003402 -0.17913339  1.0000000  0.18503786
#Infn.M  0.4165560 -0.03648427 -0.1143355 -0.09932185  0.1850379  1.00000000

cor(swM[,1:2],use="pairwise.complete.obs")
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

A.K.
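
As a cross-check, the first result from the original post can be reproduced by dropping every incomplete row by hand before subsetting the columns. This is a hedged sketch rather than anything posted in the thread; the names keep, r_full, r_sub and r_pair are hypothetical, and swM is assumed to be built as in the original post.

```r
# Rebuild swM as in the original post (taken from the ?cor examples)
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA   # create 3 "missing"

# Keep only rows that are complete across all six columns
keep <- complete.cases(swM)

# Same rows as cor(swM, use = "na.or.complete"), restricted to columns 1:2
r_full <- cor(swM[keep, 1:2])
# Casewise deletion applied to the two-column subset only
r_sub <- cor(swM[, 1:2], use = "na.or.complete")
# Pairwise deletion on the full data, restricted to columns 1:2
r_pair <- cor(swM, use = "pairwise.complete.obs")[1:2, 1:2]

# The manual deletion matches the six-column call from the original post
all.equal(r_full, cor(swM, use = "na.or.complete")[1:2, 1:2])
# For this pair of columns, pairwise deletion matches the two-column call
all.equal(r_sub, r_pair)

r_full["Frtlty", "Agrclt"]   # about 0.378, as in the first output of the post
r_sub["Frtlty", "Agrclt"]    # about 0.392, as in the second output of the post
```

With pairwise deletion each coefficient uses every row where both of its own columns are observed, so the result for a given pair does not depend on which other columns are passed to cor().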