Dear R forum,

I have a data frame

dat = data.frame(
ABC = c(25.28000732,48.33857234,19.8013245,10.68361461),
DEF = c(14.02722251,10.57985168,11.81890316,21.40171514),
GHI = c(1,1,1,1),
JKL = c(45.96423231,44.52986236,16.56514176,32.14545122),
MNO = c(45.38438063,15.54338206,18.78444777,24.29486984))

> dat
       ABC      DEF GHI      JKL      MNO
1 25.28001 14.02722   1 45.96423 45.38438
2 48.33857 10.57985   1 44.52986 15.54338
3 19.80132 11.81890   1 16.56514 18.78445
4 10.68361 21.40172   1 32.14545 24.29487

When I try to find the correlation I get the following (which is expected, as one of my columns shows no variation):

dat_cor = cor(dat)

Warning message:
In cor(dat) : the standard deviation is zero

> dat_cor
           ABC         DEF GHI         JKL        MNO
ABC  1.0000000 -0.75600764  NA  0.55245223 -0.2735585
DEF -0.7560076  1.00000000  NA -0.06479082  0.2020781
GHI         NA          NA   1          NA         NA
JKL  0.5524522 -0.06479082  NA  1.00000000  0.4564568
MNO -0.2735585  0.20207810  NA  0.45645683  1.0000000

In reality I am dealing with about 300 variables and don't know which variables don't vary.

My query is: how do I remove the columns and rows with NA's?

So for example, I need the correlation matrix for ABC, DEF, JKL and MNO only.

Kindly guide.

Thanking in advance.

Regards

Katherine

[[alternative HTML version deleted]]
Hi,

Try:

dat1 <- dat[sapply(dat, function(x) length(unique(x))) > 1]
cor(dat1)
#           ABC         DEF         JKL        MNO
#ABC  1.0000000 -0.75600764  0.55245223 -0.2735585
#DEF -0.7560076  1.00000000 -0.06479082  0.2020781
#JKL  0.5524522 -0.06479082  1.00000000  0.4564568
#MNO -0.2735585  0.20207810  0.45645683  1.0000000

A.K.

From: Katherine Gobin <katherine_gobin at yahoo.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, June 14, 2013 10:03 AM
Subject: [R] Removing "NA" from matrix
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
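
A note on the literal question, which asks how to remove the NA rows and columns from the matrix that cor() has already returned. The following is only a sketch, not something proposed in the thread. It assumes the NA entries come solely from constant columns, not from missing values in the data, and the names off_diag, keep and dat_cor_clean are illustrative.

```r
dat <- data.frame(
  ABC = c(25.28000732, 48.33857234, 19.8013245, 10.68361461),
  DEF = c(14.02722251, 10.57985168, 11.81890316, 21.40171514),
  GHI = c(1, 1, 1, 1),
  JKL = c(45.96423231, 44.52986236, 16.56514176, 32.14545122),
  MNO = c(45.38438063, 15.54338206, 18.78444777, 24.29486984)
)

# cor() warns that a standard deviation is zero; silence it deliberately here.
dat_cor <- suppressWarnings(cor(dat))

# Ignore the diagonal, then keep a variable only if it has at least one
# non-NA correlation with some other variable.
off_diag <- dat_cor
diag(off_diag) <- NA
keep <- rowSums(!is.na(off_diag)) > 0

# Subset rows and columns together; drop = FALSE keeps the result a matrix.
dat_cor_clean <- dat_cor[keep, keep, drop = FALSE]
dat_cor_clean
```

With about 300 variables, filtering the columns before calling cor(), as in the reply above, avoids computing correlations that are thrown away afterwards. Filtering the matrix is mainly useful when it has already been computed.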
On Jun 14, 2013, at 7:03 AM, Katherine Gobin wrote:

> Dear R forum,
>
> I have a data frame
>
> dat = data.frame(
> ABC = c(25.28000732,48.33857234,19.8013245,10.68361461),
> DEF = c(14.02722251,10.57985168,11.81890316,21.40171514),
> GHI = c(1,1,1,1),
> JKL = c(45.96423231,44.52986236,16.56514176,32.14545122),
> MNO = c(45.38438063,15.54338206,18.78444777,24.29486984))
>
> When I try to find the correlation I get (which is obvious as my one column shows no variation)

Perhaps:

dat_cor = cor(dat[ , sapply(dat, function(col) sd(col) != 0 ) ] )

> Warning message:
> In cor(dat) : the standard deviation is zero
>
> In reality I am dealing with about 300 variables and don't know which variables don't vary.
>
> My query is how do I remove the columns and rows with NA's.
>
> So for example, I need the correlation matrix for ABC, DEF, JKL and MNO only.
>
> [[alternative HTML version deleted]]

Please post in plain text.

David Winsemius
Alameda, CA, USA
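
A slightly more defensive variant of the sd() filter above, offered only as a sketch. It assumes every column is numeric. The helper name drop_constant_cols is hypothetical and not from the thread. It differs from the one-liner in two ways: na.rm = TRUE keeps a missing value from turning the test into NA, and drop = FALSE keeps the result a data frame even when only one column survives.

```r
# Hypothetical helper (illustrative name): keep only columns that vary.
drop_constant_cols <- function(df) {
  varies <- vapply(df, function(col) {
    s <- sd(col, na.rm = TRUE)
    # sd() is NA when fewer than two non-missing values remain; treat
    # such columns as non-varying.
    !is.na(s) && s > 0
  }, logical(1))
  df[, varies, drop = FALSE]
}

dat <- data.frame(
  ABC = c(25.28000732, 48.33857234, 19.8013245, 10.68361461),
  DEF = c(14.02722251, 10.57985168, 11.81890316, 21.40171514),
  GHI = c(1, 1, 1, 1),
  JKL = c(45.96423231, 44.52986236, 16.56514176, 32.14545122),
  MNO = c(45.38438063, 15.54338206, 18.78444777, 24.29486984)
)

dat_var <- drop_constant_cols(dat)
dat_cor <- cor(dat_var)
dat_cor
```

If the data do contain missing values, cor() itself would also need a suitable use= argument (for example use = "pairwise.complete.obs"); that choice depends on the analysis and is not addressed in the thread.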