Martin Batholdy
2011-Nov-09 15:36 UTC
[R] algorithm that iteratively drops columns of a data-frame
Dear R-Users,

I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step based on a statistical criterion, so that the 'column-space' gets smaller and smaller with each iteration (like when you do stepwise regression).

The problem is that in every round I use a new subset of my data.frame. However, as soon as I "generate" this subset by indexing the data.frame, I of course get different column numbers (compared to my original data.frame).

How can I solve this?

I prepared a small example to make my problem easier to understand:

Here I generate a data.frame containing 6 vectors with different means. The loop should exclude the vector with the smallest mean in each round. At the end I want to have a vector ('drop') which contains the column numbers that I can apply to the original data.frame to get a subset with the highest means.

But this is not working: every time I generate a subset ('data[,-drop]'), the column numbers of the subset differ from the column numbers of the original data.frame. So, in the end, I can't use my drop-vector on my original data.frame, since the dimension of the testing data.frame changes in every loop round.

How can I deal with this kind of problem?

Any suggestions are highly appreciated!
(Of course, for the example code there are much easier methods to achieve the goal of finding the columns with the smallest means. It is a pretty generic example.)

Here is the sample code:

x1 <- rnorm(200, 5, 2)
x2 <- rnorm(200, 6, 2)
x3 <- rnorm(200, 1, 2)
x4 <- rnorm(200, 12, 2)
x5 <- rnorm(200, 8, 2)
x6 <- rnorm(200, 9, 2)

data <- data.frame(x1, x2, x3, x4, x5, x6)

col_means <- colMeans(data)
drop <- match(min(col_means), col_means)

for(i in 1:4) {
    col_means <- colMeans(data[,-drop])
    drop <- c(drop, match(min(col_means), col_means))
}
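The index shift described in the question can be reproduced on a tiny deterministic data frame. This is only an illustrative sketch: the values and the helper names (`first`, `reduced`, `second`, `pos_in_reduced`, `pos_in_original`) are made up here and are not from the original post.

```r
# Hypothetical fixed data so the column means are known exactly:
# the means are x1 = 5, x2 = 1, x3 = 3, x4 = 12
data <- data.frame(x1 = c(5, 5), x2 = c(1, 1), x3 = c(3, 3), x4 = c(12, 12))

# Round 1: x2 has the smallest mean; which.min() keeps the column name
first <- which.min(colMeans(data))

# The reduced data frame has columns x1, x3, x4
reduced <- data[, -first]

# Round 2: x3 is now the smallest, but its position is relative to 'reduced'
second <- which.min(colMeans(reduced))

pos_in_reduced  <- unname(second)                     # position within 'reduced'
pos_in_original <- match(names(second), names(data))  # position within 'data'
```

Here `x3` sits at position 2 of `reduced` but at position 3 of `data`, which is why positions collected from successive subsets cannot be applied directly to the original data frame, while the column name stays valid in both.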
R. Michael Weylandt
2011-Nov-09 15:47 UTC
[R] algorithm that iteratively drops columns of a data-frame
Perhaps attach placeholder names to your columns and use those rather than indices?

Michael
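A minimal sketch of how the name-based suggestion could be applied to the example from the question. It assumes the goal is five drops in total (the initial drop plus the four loop rounds of the original code); `set.seed(1)`, `drop_names`, `keep`, `drop_idx` and `result` are names introduced here for illustration only.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

data <- data.frame(
  x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2), x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2), x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2)
)

# Collect the NAMES of the dropped columns instead of their positions
drop_names <- character(0)

for (i in 1:5) {
  # Columns still in play, always selected from the original data frame
  keep <- setdiff(names(data), drop_names)
  col_means <- colMeans(data[, keep, drop = FALSE])
  # which.min() returns a named index; the name is valid for 'data' as well
  drop_names <- c(drop_names, names(which.min(col_means)))
}

# Translate the names back into column numbers of the original data frame
drop_idx <- match(drop_names, names(data))
result <- data[, -drop_idx, drop = FALSE]
```

Because every subset is taken from `data` by name, the bookkeeping never depends on positions inside a shrinking data frame. The `drop = FALSE` argument keeps the result a data frame even when a single column remains.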
Jeff Newmiller
2011-Nov-09 16:27 UTC
[R] algorithm that iteratively drops columns of a data-frame
Try data[,!names(data) %in% names(col_means)]
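One possible reading of this suggestion, sketched on made-up data: if `col_means` was computed on the columns that are still in play, its names identify those columns, and `%in%` yields a logical mask against the original column names. The data values and the names `kept` and `dropped` are assumptions for illustration.

```r
# Hypothetical fixed data (not from the original post)
data <- data.frame(x1 = c(5, 5), x2 = c(1, 1), x3 = c(3, 3), x4 = c(12, 12))

# Suppose the algorithm has already reduced the data to x1 and x4
col_means <- colMeans(data[, c("x1", "x4")])

# Logical mask over the ORIGINAL columns: TRUE where the column survived
kept <- data[, names(data) %in% names(col_means), drop = FALSE]

# Negated mask: the columns that were dropped along the way.
# '!' binds more loosely than %in%, so this is !(names(data) %in% ...)
dropped <- data[, !names(data) %in% names(col_means), drop = FALSE]
```

A logical mask has the same length as `names(data)`, so it can be applied to the original data frame regardless of how many rounds of dropping have taken place.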