thr3ads.net - R help - [R] algorithm that iteratively drops columns of a data-frame [Nov 2011]

Martin Batholdy

2011-Nov-09 15:36 UTC

[R] algorithm that iteratively drops columns of a data-frame

Dear R-Users,


I have a problem with an algorithm that iteratively goes over a data.frame and
excludes n columns at each step based on a statistical criterion, so that the
'column-space' gets smaller and smaller with each iteration (like when you do
stepwise regression).

The problem is that in every round I use a new subset of my data.frame.

However, as soon as I generate this subset by indexing the data.frame, the
column numbers of the subset of course differ from those of my original
data.frame.

How can I solve this?



I prepared a small example to make my problem easier to understand:


Here I generate a data.frame containing 6 vectors with different means.

The loop now should exclude the vector with the smallest mean in each round.

At the end I want to have a vector ('drop') which contains the column
numbers that I can apply on the original data.frame to get a subset with the
highest means.

But this does not work: every time I generate a subset ('data[,-drop]'), its
column numbers differ from the column numbers of the original data.frame.

So, in the end, I can't use my drop-vector on my original data.frame, since
the dimension of the testing data.frame changes in every round of the loop.


How can I deal with this kind of problem?

Any suggestions are highly appreciated! 
(Of course, for the example code there are much easier methods to achieve the
goal of finding the columns with the smallest means; it is a pretty generic
example.)


here is the sample code:


x1 <- rnorm(200, 5, 2)
x2 <- rnorm(200, 6, 2)
x3 <- rnorm(200, 1, 2)
x4 <- rnorm(200, 12, 2)
x5 <- rnorm(200, 8, 2)
x6 <- rnorm(200, 9, 2)


data <- data.frame(x1, x2, x3, x4, x5, x6)

col_means <- colMeans(data)
drop <- match(min(col_means), col_means)


for(i in 1:4) {

	col_means <- colMeans(data[,-drop])
	drop <- c(drop, match(min(col_means), col_means))

}
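
To make the index shift described above concrete, here is a small deterministic
trace of the same loop. It is only a sketch: the named vector `m` is a made-up
stand-in for `colMeans(data)`, with fixed values that mirror the means used in
the rnorm() calls, so the result does not depend on random draws.

```r
# Hypothetical fixed "column means" standing in for colMeans(data)
m <- c(x1 = 5, x2 = 6, x3 = 1, x4 = 12, x5 = 8, x6 = 9)

# Same logic as the loop above, applied to the vector of means
drop <- match(min(m), m)            # 3: x3 is column 3 of the full set
for (i in 1:4) {
  sub <- m[-drop]                   # positions are renumbered from 1 here
  drop <- c(drop, match(min(sub), sub))
}

drop
# 3 1 1 1 1: after x3 and x1 are removed, the smallest remaining mean (x2)
# sits at position 1 of the subset, so the loop keeps recording 1 instead
# of x2's original column number, 2.
```

Every position after the first is relative to the current subset, which is why
the vector cannot be applied to the original data.frame.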

R. Michael Weylandt

2011-Nov-09 15:47 UTC

[R] algorithm that iteratively drops columns of a data-frame

Perhaps attach placeholder names to your columns and use those rather
than indices?
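
A minimal sketch of what this could look like, assuming the columns keep the
names x1 to x6 from the example. The names `dropped`, `remaining` and `kept`,
the set.seed() call, and the use of which.min() are illustrative choices, not
part of the original code.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
data <- data.frame(
  x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2), x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2), x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2)
)

dropped <- character(0)   # column NAMES, which stay valid for any subset
for (i in 1:5) {
  remaining <- setdiff(names(data), dropped)
  # drop = FALSE keeps a data.frame even when one column is left
  col_means <- colMeans(data[, remaining, drop = FALSE])
  dropped <- c(dropped, names(which.min(col_means)))
}

# The names apply directly to the original data.frame
kept <- data[, setdiff(names(data), dropped), drop = FALSE]
names(kept)
```

Because a name identifies the same column in the full data.frame and in every
subset, no translation back to the original column numbers is needed.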

Michael


Jeff Newmiller

2011-Nov-09 16:27 UTC

[R] algorithm that iteratively drops columns of a data-frame

Try

data[,!names(data) %in% names(col_means)]
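
A small sketch of how this name-based test behaves, using toy data rather than
the example from the question (the data.frame below and the names `in_play`,
`excluded`, `still_in` and `orig_idx` are hypothetical). Note that with the `!`
the expression selects the columns whose names are NOT in col_means, i.e. the
ones already excluded; without the `!` it selects the ones still in play.

```r
# Toy data, made up for illustration
data <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9, x4 = 10:12)

# Suppose only x2 and x4 are still in play at this point of the loop
col_means <- colMeans(data[, c("x2", "x4")])

in_play  <- names(data) %in% names(col_means)  # FALSE TRUE FALSE TRUE
excluded <- data[, !in_play, drop = FALSE]     # columns x1 and x3
still_in <- data[, in_play, drop = FALSE]      # columns x2 and x4

# If original column numbers are needed, match() recovers them from names
orig_idx <- match(names(col_means), names(data))  # 2 4
```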

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
