Dear All,
I am new to R and have a question that might be easy.
I have a large data set with more than 250 variables, and I am reducing the
number of variables with the redun function, as in the example below:
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
outredun <- redun(~., data=data2, r2=.8,)
outredun
#outredun1 <- capture.output(redun(~., data=data2, r2=.8,))
#outredun1
#x25 <- outredun1[25]
#mydata12 <- data1[myvars] # myvars: the names of the variables to retain
which gives me, for this example, 'Rendundant variables: x6 x4 x3' and
'Predicted from variables: x1 x2 x5' as output in the console.
I want to subset my original data either by keeping the 'Predicted from
variables' or by dropping the 'Rendundant variables'. I tried the
capture.output function, as in the commented code above, but it
gives me a string like "x1 x2 x5 ", which needs to be converted to
"x1", "x2", "x5"
as input to subset data.
My data has more than 250 variables, and both the data and the number of
variables change every time. How can this be achieved?
Thanks in advance for the help.
Regards,
-Ajit
--
View this message in context:
http://r.789695.n4.nabble.com/Subsetting-data-by-eliminating-redundant-variables-tp3918199p3918199.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2011-Oct-19 14:16 UTC
[R] Subsetting data by eliminating redundant variables
Assuming you are talking about redun() from the Hmisc package, it's
much easier than you are making it:
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- data.frame(x1,x2,x3,x4,x5,x6)
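library(Hmisc)
V <- redun(~., data = data1, r2 = 0.8)
V$In   # names of the variables that were kept ("Predicted from variables")
V$Out  # names of the variables flagged as redundant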
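To get from those name vectors to the subset the question asks for, they can index the data frame directly. This is a minimal sketch, assuming V$In and V$Out hold plain column names of data1. keep_vars and drop_vars are hypothetical stand-ins for them, filled with the names from the example output, so the snippet runs without Hmisc:

```r
# Rebuild the example data (set.seed only makes the run repeatable).
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'), n, replace = TRUE))
x6 <- 1*(x5 == 'a' | x5 == 'c')
data1 <- data.frame(x1, x2, x3, x4, x5, x6)

# Hypothetical stand-ins: with Hmisc loaded, one would instead write
#   keep_vars <- V$In
#   drop_vars <- V$Out
keep_vars <- c("x1", "x2", "x5")
drop_vars <- c("x6", "x4", "x3")

# Option 1: keep only the retained variables.
kept <- data1[, keep_vars, drop = FALSE]

# Option 2: drop the redundant variables.
reduced <- data1[, !(names(data1) %in% drop_vars), drop = FALSE]
```

Both options give the same columns here. If an entry of keep_vars is not a column name of data1, the indexing fails with an "undefined columns" error, so a check such as all(keep_vars %in% names(data1)) may be worth running first. Building the data with data.frame() rather than cbind() also matters: cbind() would replace the factor x5 with its integer codes.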
Michael
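For completeness, the capture.output() route from the question can also be made to work, although reading the names from the redun object avoids parsing printed text altogether. This is a minimal base-R sketch, assuming the captured line looks like the string quoted in the question; captured is a hypothetical stand-in for that line:

```r
# Stand-in for one line of captured console output (assumed format).
captured <- "x1 x2 x5 "

# Trim leading/trailing whitespace, then split on runs of whitespace
# to get a character vector of variable names.
vars <- strsplit(trimws(captured), "[[:space:]]+")[[1]]
vars
```

The resulting character vector can be used to index a data frame by column name. This approach depends on the exact print layout, so it is more fragile than using the returned object.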
On Wed, Oct 19, 2011 at 6:49 AM, aajit75 <aajit75 at yahoo.co.in> wrote:
> [...]