Hi all,

I have a large data set and want to build a 'blind' model right away, without examining the data first. It turns out that many fields in the data are constant or consist entirely of missing values, which prevents the model from being built.

Can someone point me in the right direction on how to automatically purge my data file of these useless fields?

Thanks in advance,

pdb

train <- read.csv("TrainingData.csv")
library(gbm)
i.gbm<-gbm(TargetVariable ~ . ,data=train,distribution="bernoulli.....

1: In gbm.fit(x, y, offset = offset, distribution = distribution, ... :
  variable 5: var1 has no variation.

--
View this message in context: http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284831.html
Sent from the R help mailing list archive at Nabble.com.
You can remove the rows where the target is NA with:

train <- subset(train, !is.na(TargetVariable))

I am not sure what you mean by constant values. You could use 'table' to determine which values appear most often and then remove them:

x <- table(train$TargetVariable)
# someCountAboveWhichToDelete is a threshold you choose yourself
train <- subset(train, !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))

But you probably need to look at your data and determine which values belong to the set that you need to delete.

On Sat, Jul 10, 2010 at 6:28 PM, pdb <philb at philbrierley.com> wrote:
> I have a large data set and want to immediately build a 'blind' model
> without first examining the data. Now it appears in the data there are a lot
> of fields that are constant or all missing values - which prevents the model
> from being built.
>
> Can someone point me the right direction as to how I can automatically purge
> my data file of these useless fields.

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
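As a rough illustration of "looking at the data" before deleting anything, the sketch below counts missing and distinct values per column. The data frame and its column names (var1, var2, var3) are made up for the example and are not taken from the original TrainingData.csv; it also shows that the subset() call drops rows, not columns.

```r
# Made-up stand-in for the training data; all column names are hypothetical.
train <- data.frame(TargetVariable = c(0, 1, NA, 1, 0),
                    var1 = 7,                         # constant column
                    var2 = c(1.5, 2.0, NA, 3.5, 1.0), # ordinary column
                    var3 = NA)                        # all-missing column

# Per-column counts of missing values and of distinct non-missing values.
col_summary <- data.frame(
  n_missing  = vapply(train, function(x) sum(is.na(x)), integer(1)),
  n_distinct = vapply(train, function(x) length(unique(x[!is.na(x)])), integer(1))
)
print(col_summary)

# Dropping rows with a missing target removes observations, not fields.
train_rows <- subset(train, !is.na(TargetVariable))
```

Columns with n_distinct of 0 or 1 are the ones that carry no variation; the row filter leaves the number of columns unchanged.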
Try this. It removes constant columns (such as column b below), all-NA columns (such as column a below), and columns that are constant apart from NAs (such as column d below). In this example only column c should survive:

# test data
DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1))

# standard deviation of each column, ignoring NAs
# (sapply() is used because current R versions reject sd() on a data frame)
sd. <- sapply(DF, sd, na.rm = TRUE)

# keep only columns whose standard deviation is defined and positive
DF[!is.na(sd.) & sd. > 0]
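The standard-deviation test above assumes numeric columns. For a file read with read.csv, which may also contain character or factor fields, a possible variant is to count distinct non-NA values instead. This is only a sketch: has_variation is a hypothetical helper name, and columns e and f are added to the test data purely for illustration.

```r
# Test data from above plus two made-up non-numeric columns:
# 'e' is a constant character column, 'f' is a factor that varies.
DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1),
                 e = "x", f = factor(c("u", "v", "u", "v", "u")),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if a column has at least two distinct non-NA values.
has_variation <- function(x) length(unique(x[!is.na(x)])) > 1

# One logical flag per column, then keep only the flagged columns.
keep <- vapply(DF, has_variation, logical(1))
DF2 <- DF[, keep, drop = FALSE]
names(DF2)
```

Here drop = FALSE keeps the result a data frame even if only one column survives, so it can be passed straight on to a modelling call.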
Awesome! It made sense once I realised SD = standard deviation!

pdb
What was the question and answer here?

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of pdb
Sent: Sunday, July 11, 2010 5:23 AM
To: r-help at r-project.org
Subject: Re: [R] eliminating constant variables

Awesome! It made sense once I realised SD = standard deviation!

pdb