thr3ads.net - R help - [R] Remove columns from dataframe based on their statistics [May 2012]

Johannes Radinger

2012-May-31 13:27 UTC

[R] Remove columns from dataframe based on their statistics

Hi,

I have a data frame and want to remove the columns that
contain the same value in every row (i.e. the variance of
the column is 0). Is there an easier way than calculating
the statistics for each column and then removing those
columns by hand?

A <- runif(100)
B <- rep(1,100)
C <- rep(2.42,100)
D <- runif(100)
df <- data.frame(A, B, C, D)
# I want to conditionally remove columns B and C, as they show no variation

/Johannes

J Toll

2012-May-31 13:52 UTC



You could try something like:

for (i in seq(ncol(df), 1)) {
  if (length(unique(df[, i])) == 1) {
    df[, i] <- NULL
  }
}

or for just numeric values:

for (i in seq(ncol(df), 1)) {
  if (all(mean(df[, i]) == df[, i])) {
    df[, i] <- NULL
  }
}

HTH,

James
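The same test can also be written without a loop. The following is only a sketch, not part of James's reply: the names `varies` and `df_kept` and the `set.seed()` call are added here for illustration. It relies on `length(unique(x))`, so it works for any column type, but note that `unique()` counts `NA` as a value of its own.

```r
# Example data from the question (seed added here only for reproducibility)
set.seed(1)
df <- data.frame(A = runif(100), B = rep(1, 100),
                 C = rep(2.42, 100), D = runif(100))

# TRUE for every column that holds more than one distinct value
varies <- vapply(df, function(x) length(unique(x)) > 1, logical(1))

# drop = FALSE keeps the result a data frame even if only one column survives
df_kept <- df[, varies, drop = FALSE]
names(df_kept)
```

Because the original `df` is left untouched, the removed columns can still be inspected afterwards with `names(df)[!varies]`.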

Jorge I Velez

2012-May-31 13:58 UTC


Hi Johannes,

Try

df[, !apply(df, 2, function(x) sd(x, na.rm = TRUE) < 1e-10)]

HTH,
Jorge.-
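One caveat with the `apply()` approach: `apply()` first converts the data frame to a matrix, so a single character column turns every value into a string and `sd()` is then no longer operating on numbers. The sketch below is one possible way around that, under the assumption that non-numeric columns should simply be kept. The helper `near_constant()`, the extra column `G`, and the use of the range instead of `sd()` are illustrative choices, not something proposed in the thread; the `1e-10` threshold is taken from Jorge's line.

```r
set.seed(1)
df2 <- data.frame(A = runif(100), B = rep(1, 100), C = rep(2.42, 100),
                  G = rep("x", 100), stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when a numeric column's range is below tol
near_constant <- function(x, tol = 1e-10) {
  if (!is.numeric(x)) return(FALSE)    # leave non-numeric columns alone
  x <- x[!is.na(x)]                    # ignore missing values
  if (length(x) == 0) return(FALSE)    # all-NA column: keep it (a design choice)
  diff(range(x)) < tol
}

# vapply() visits each column separately, so no matrix coercion takes place
drop_me <- vapply(df2, near_constant, logical(1))
df2_kept <- df2[, !drop_me, drop = FALSE]
names(df2_kept)
```

Here `B` and `C` are dropped, while the numeric column `A` and the character column `G` remain.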



arun

2012-May-31 16:07 UTC


Hi,

I tweaked James's code a little to produce the same result:

for (i in seq(ncol(df), 1)) {
  if (sd(df[, i]) == 0) {
    df[, i] <- NULL
  }
}
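Two things may be worth guarding against in the `sd()` loop. If a column contains an `NA`, `sd()` returns `NA` and `if (NA)` stops with an error; and an exact `== 0` comparison can be fragile for values such as 2.42 that have no exact binary representation. The following is a hedged sketch of a more defensive loop, assuming that missing values should be ignored and borrowing the `1e-10` tolerance from Jorge's reply. The data frame `df3` and the injected `NA` are made up for the example.

```r
set.seed(1)
df3 <- data.frame(A = runif(100), B = rep(1, 100), C = rep(2.42, 100))
df3$B[5] <- NA   # one missing value in an otherwise constant column

# Walk backwards so deleting a column does not shift the ones still to visit
for (i in rev(seq_len(ncol(df3)))) {
  s <- sd(df3[[i]], na.rm = TRUE)
  # isTRUE() turns an NA result (e.g. from an all-NA column) into FALSE
  if (isTRUE(s < 1e-10)) {
    df3[[i]] <- NULL
  }
}
names(df3)
```

`rev(seq_len(ncol(df3)))` is used instead of `seq(ncol(df3), 1)` so that the loop body is skipped entirely for a data frame with no columns.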





