thr3ads.net - R help - [R] Subsetting a data frame by dropping correlated variables [Apr 2011]


Rita Carreira

2011-Apr-19 19:10 UTC

[R] Subsetting a data frame by dropping correlated variables

Hello R Users!
I have a data frame that has many variables, some with missing observations, and
some that are correlated with each other. I would like to subset the data by
dropping one variable from each correlated pair and keeping the other in the
data frame. Alternatively, I could also drop both of the variables that are
correlated with each other. Worry not! I am not deleting data, I am just
finding a subset of the data that I can use to impute some missing
observations.
I have tried the following statement 
dfQuc <- dfQ[ , sapply(dfQ, function(x) cor(dfQ, use =
"pairwise.complete.obs", method ="pearson")<0.8)]
but it gives me the following error:
Error in `[.data.frame`(dfQ, , sapply(dfQ, function(x) cor(dfQ, use =
"pairwise.complete.obs",  :
  undefined columns selected
Since I have several dozen data frames, it is impractical for me to manually
inspect the correlation matrices and select which variables to drop, so I am
trying to have R make the selection for me. Does anyone have any idea how to
accomplish this?
Thank you very much!
Rita
=====================================
"If you think education is expensive, try ignorance." --Derek Bok


 		 	   		  

Juliet Hannah

2011-Apr-28 02:33 UTC


The 'findCorrelation' function in the caret package may be helpful.
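
A minimal sketch of how this could look, assuming a small made-up data frame
(the columns x1, x2 and x3 below are hypothetical stand-ins for dfQ). The
original statement probably fails because cor(dfQ, ...) returns a whole
correlation matrix on every sapply() iteration, so the index ends up as a large
logical matrix rather than one TRUE/FALSE per column. Computing the matrix once
and then deciding which columns to drop avoids that. The base R rule used here
(drop the later column of each highly correlated pair, judged on the absolute
correlation) is just one simple choice and not necessarily what
findCorrelation() does internally; findCorrelation() takes the correlation
matrix and a cutoff and returns the column indices it suggests removing. It is
only called here if caret is installed.

```r
# Hypothetical example data: x2 is nearly a copy of x1; x3 is independent.
set.seed(1)
n   <- 200
x1  <- rnorm(n)
dfQ <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(n, sd = 0.1),
                  x3 = rnorm(n))
dfQ$x3[c(5, 17)] <- NA  # a few missing observations, as in the question

# cor() on a data frame returns the full matrix, so compute it once.
cm <- cor(dfQ, use = "pairwise.complete.obs", method = "pearson")

# Base R rule (an assumption of this sketch): for each pair whose absolute
# correlation exceeds the cutoff, flag the later column for removal.
high <- abs(cm) > 0.8
high[lower.tri(high, diag = TRUE)] <- FALSE  # consider each pair only once
drop_base <- which(colSums(high, na.rm = TRUE) > 0)

# setdiff() keeps all columns when nothing is flagged
# (dfQ[, -integer(0)] would return zero columns instead).
dfQuc <- dfQ[, setdiff(seq_along(dfQ), drop_base), drop = FALSE]

# The same idea with caret, if the package is available.
if (requireNamespace("caret", quietly = TRUE)) {
  drop_caret  <- caret::findCorrelation(cm, cutoff = 0.8)
  dfQuc_caret <- dfQ[, setdiff(seq_along(dfQ), drop_caret), drop = FALSE]
}
```

To cover several dozen data frames, these steps could be wrapped in a function
and applied with lapply() over a list of data frames.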


