Dear All,

I am new to R and have a question that might be easy.

I have a large data set with more than 250 variables, and I am reducing the number of variables with the redun function, as in the example below:

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
outredun <- redun(~., data=data2, r2=.8)
outredun
#outredun1 <- capture.output(redun(~., data=data2, r2=.8))
#outredun1
#x25 <- outredun1[25]
#mydata12 <- data1[myvars] #myvars I need to pass to retain variables

For this example, the console output lists "Rendundant variables: x6 x4 x3" and "Predicted from variables: x1 x2 x5".

I want to subset my original data either by keeping the 'Predicted from variables' or by dropping the 'Rendundant variables'. I tried the capture.output function, as in the commented code above, but it gives me a single string like "x1 x2 x5 ", which would need to be converted to "x1", "x2", "x5" before it can be used to subset the data.

My data has more than 250 variables, and both the data and the number of variables change every time. How can this be achieved?

Thanks in advance for the help.

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Subsetting-data-by-eliminating-redundant-variables-tp3918199p3918199.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2011-Oct-19 14:16 UTC
[R] Subsetting data by eliminating redundant variables
Assuming you are talking about redun() from the Hmisc package, it's much easier than you are making it:

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- data.frame(x1,x2,x3,x4,x5,x6)

library(Hmisc)
V <- redun(~., data = data1, r2 = 0.8)

V$In    # names of the variables that were retained
V$Out   # names of the variables flagged as redundant

Michael

On Wed, Oct 19, 2011 at 6:49 AM, aajit75 <aajit75 at yahoo.co.in> wrote:
> Dear All,
>
> I am new to R, I have one question which might be easy.
> [...]
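To finish the step the question asks about, the name vectors in V$In and V$Out can be used directly as column selectors. The following is only a minimal sketch built on the reply's example; the set.seed() call and the object names kept, kept2 and vars are assumptions added for illustration and do not appear in the thread. The last line shows one way to split a captured string such as "x1 x2 x5 " into separate names, in case only the printed output is available.

```r
library(Hmisc)  # provides redun()

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'), n, replace = TRUE))
x6 <- 1*(x5 == 'a' | x5 == 'c')
data1 <- data.frame(x1, x2, x3, x4, x5, x6)

V <- redun(~ ., data = data1, r2 = 0.8)

# Keep only the variables redun() retained
kept <- data1[, V$In, drop = FALSE]

# Or drop the variables it flagged as redundant
kept2 <- data1[, !(names(data1) %in% V$Out), drop = FALSE]

# Fallback for a captured string: trim it, then split on whitespace
vars <- strsplit(trimws("x1 x2 x5 "), "\\s+")[[1]]
```

Because the selection is driven by the names stored in V, nothing has to be typed by hand when the data or the number of variables changes. drop = FALSE keeps the result a data frame even if only one variable survives.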