Johannes Radinger
2012-May-31 13:27 UTC
[R] Remove columns from dataframe based on their statistics
Hi,

I have a dataframe and want to remove the columns that hold the same value in every row (i.e. the variation of that column is 0). Is there an easier way than calculating the statistics and then removing the columns by hand?

A <- runif(100)
B <- rep(1,100)
C <- rep(2.42,100)
D <- runif(100)
df <- data.frame(A,B,C,D)
# I want to conditionally remove columns B and C, as they show no variation

/Johannes
On Thu, May 31, 2012 at 8:27 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
> Hi,
>
> I have a dataframe and want to remove columns from it
> that are populated with a similar value (for the total
> column) (the variation of that column is 0). Is there an
> easier way than to calculate the statistics and then
> remove them by hand?
>
> A <- runif(100)
> B <- rep(1,100)
> C <- rep(2.42,100)
> D <- runif(100)
> df <- data.frame(A,B,C,D) # if want to conditionally remove column B and C as they show no variations

You could try something like:

for (i in seq(ncol(df), 1))
  if (length(unique(df[, i])) == 1) {
    df[, i] <- NULL
  }

or for just numeric values:

for (i in seq(ncol(df), 1))
  if (all(mean(df[, i]) == df[, i])) {
    df[, i] <- NULL
  }

HTH,

James
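A loop-free variant of the first suggestion above can be sketched as follows. This is only a minimal sketch: the helper name drop_constant_cols is hypothetical and does not come from the thread, and it assumes that NA counts as a distinct value (so a column such as c(1, NA) is treated as varying).

```r
# Hypothetical helper (not from the thread): keep only the columns
# that contain more than one distinct value.
drop_constant_cols <- function(data) {
  # vapply() visits each column and returns one TRUE/FALSE per column
  varies <- vapply(data, function(x) length(unique(x)) > 1, logical(1))
  # drop = FALSE keeps a data frame even if only one column survives
  data[, varies, drop = FALSE]
}

# Same example data as in the question
df <- data.frame(A = runif(100), B = rep(1, 100),
                 C = rep(2.42, 100), D = runif(100))
names(drop_constant_cols(df))
```

Because the test relies on unique() rather than on arithmetic, it should also work for character and factor columns, and it returns a new data frame instead of modifying df in place.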
Jorge I Velez
2012-May-31 13:58 UTC
[R] Remove columns from dataframe based on their statistics
Hi Johannes,

Try

df[, !apply(df, 2, function(x) sd(x, na.rm = TRUE) < 1e-10)]

HTH,
Jorge.-
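One caveat with the apply() call above: apply() first converts the data frame to a matrix, so a single character or factor column turns every column into character and the sd() test can no longer be computed meaningfully. A possible way around this, assuming mixed column types, is sketched below. The helper name drop_flat_numeric and the example data df2 are hypothetical; the tolerance default simply reuses the 1e-10 from the reply above.

```r
# Hypothetical helper (not from the thread): drop numeric columns whose
# standard deviation is below a tolerance; leave other columns alone.
drop_flat_numeric <- function(data, tol = 1e-10) {
  flat <- vapply(data, function(x) {
    # isTRUE() treats an NA result (e.g. an all-NA column) as "not flat"
    is.numeric(x) && isTRUE(sd(x, na.rm = TRUE) < tol)
  }, logical(1))
  # drop = FALSE keeps a data frame even if only one column survives
  data[, !flat, drop = FALSE]
}

# Example with a non-numeric column mixed in
df2 <- data.frame(A = c(0.1, 0.5, 0.9), B = rep(1, 3),
                  G = c("x", "y", "x"))
names(drop_flat_numeric(df2))
```

Here vapply() works column by column, so each column keeps its own type and only numeric columns are ever passed to sd().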
Hi,

I tweaked James's code a little to produce the same result:

for (i in seq(ncol(df), 1))
  if (sd(df[, i]) == 0) {
    df[, i] <- NULL
  }

----- Original Message -----
From: J Toll <jctoll at gmail.com>
To: Johannes Radinger <JRadinger at gmx.at>
Cc: R-help at r-project.org
Sent: Thursday, May 31, 2012 9:52 AM
Subject: Re: [R] Remove columns from dataframe based on their statistics

You could try something like:

for (i in seq(ncol(df), 1))
  if (length(unique(df[, i])) == 1) {
    df[, i] <- NULL
  }

or for just numeric values:

for (i in seq(ncol(df), 1))
  if (all(mean(df[, i]) == df[, i])) {
    df[, i] <- NULL
  }

HTH,

James