Hi,

I can't seem to figure out why this gives me different answers. It's probably something obvious, but I thought they would be the same. This is a minimal example from the help page of cor():

> ## swM := "swiss" with 3 "missing"s :
> swM <- swiss
> colnames(swM) <- abbreviate(colnames(swiss), min=6)
> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
> cor(swM, use = "na.or.complete")
           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
> # why isn't this the same?
> cor(swM[,c(1:2)], use = "na.or.complete")
          Frtlty    Agrclt
Frtlty 1.0000000 0.3920289
Agrclt 0.3920289 1.0000000
Jeff Newmiller
2014-Apr-14 01:35 UTC
[R] correlation with missing values.. different answers
Please post in plain text per the Posting Guide. Read ?cor, particularly the part about "complete.cases". Your two cases have different effective input rows.

---------------------------------------------------------------------------
Jeff Newmiller                        DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------

On April 13, 2014 6:08:51 PM PDT, Paul Tanger <paul.tanger at colostate.edu> wrote:
>Hi,
>I can't seem to figure out why this gives me different answers.
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
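The point about complete.cases can be checked directly. The R sketch below rebuilds swM exactly as in the ?cor example and counts which rows use = "na.or.complete" keeps in each call; the variable names keep_all, keep_two, r_all and r_two are illustrative only and do not come from the thread.

```r
# Rebuild the example data from ?cor: the built-in swiss data (47 rows) with 3 NAs
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), min = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA

# "na.or.complete" keeps only rows with no NA in any of the columns passed to cor()
keep_all <- complete.cases(swM)          # all six columns
keep_two <- complete.cases(swM[, 1:2])   # Frtlty and Agrclt only

sum(keep_all)  # 44: rows 1, 7 and 25 are dropped
sum(keep_two)  # 46: only row 1 (the NA in Agrclt) is dropped

# Correlating those explicit subsets reproduces the two calls from the question
r_all <- cor(swM[keep_all, ])[1:2, 1:2]
r_two <- cor(swM[keep_two, 1:2])
r_all
r_two
```

So the Frtlty/Agrclt entry differs simply because the six-column call computes it on 44 rows and the two-column call on 46.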
Hi,

I think in this case, when you use "na.or.complete", any row with an NA in any of the supplied columns is removed before the correlations are computed. For the full dataset that drops rows 1, 7 and 25; for swM[,1:2] it drops only row 1, which is why the two Frtlty/Agrclt values differ:

cor(swM[-1,1:2])
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

cor(swM[-1,])[1:2,1:2]
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

Maybe you can try "pairwise.complete.obs", which drops missing values separately for each pair of columns:

cor(swM, use = "pairwise.complete.obs")
#           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
#Frtlty  1.0000000  0.39202893 -0.6531492 -0.66378886  0.4723129  0.41655603
#Agrclt  0.3920289  1.00000000 -0.7150561 -0.65221506  0.4152007 -0.03648427
#Exmntn -0.6531492 -0.71505612  1.0000000  0.69921153 -0.6003402 -0.11433546
#Eductn -0.6637889 -0.65221506  0.6992115  1.00000000 -0.1791334 -0.09932185
#Cathlc  0.4723129  0.41520069 -0.6003402 -0.17913339  1.0000000  0.18503786
#Infn.M  0.4165560 -0.03648427 -0.1143355 -0.09932185  0.1850379  1.00000000

cor(swM[,1:2], use = "pairwise.complete.obs")
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

A.K.

On Sunday, April 13, 2014 9:11 PM, Paul Tanger <paul.tanger at colostate.edu> wrote:
>Hi,
>I can't seem to figure out why this gives me different answers.
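To make the "pairwise.complete.obs" behaviour concrete, here is a minimal sketch that recomputes every entry using only the rows where that particular pair of columns is non-missing, then compares the result with cor()'s own output. pair_cor() is a hypothetical helper written for this illustration; everything else uses base R (swiss, complete.cases, cor).

```r
# Same example data as in ?cor
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), min = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA

# Hypothetical helper: correlation of columns i and j, using only the rows
# where both of those two columns are non-missing
pair_cor <- function(data, i, j) {
  ok <- complete.cases(data[, c(i, j)])
  cor(data[ok, i], data[ok, j])
}

# Fill a matrix entry by entry with the pairwise-complete correlations
p <- ncol(swM)
manual <- matrix(NA_real_, p, p, dimnames = list(names(swM), names(swM)))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    manual[i, j] <- pair_cor(swM, i, j)
  }
}

builtin <- cor(swM, use = "pairwise.complete.obs")
all.equal(manual, builtin)  # compare the hand-built matrix with cor()'s result
```

One trade-off to keep in mind: because each entry may be based on a different subset of rows, a pairwise-complete correlation matrix is not guaranteed to be positive semi-definite, which may matter if it is passed on to further multivariate methods.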