Hi all,
I have a large data set and want to immediately build a 'blind' model
without first examining the data. However, the data turn out to contain a lot
of fields that are constant or entirely missing, which prevents the model
from being built.
Can someone point me in the right direction as to how I can automatically purge
my data file of these useless fields?
Thanks in advance,
pdb
train <- read.csv("TrainingData.csv")
library(gbm)
i.gbm<-gbm(TargetVariable ~ . ,data=train,distribution="bernoulli.....
1: In gbm.fit(x, y, offset = offset, distribution = distribution, ... :
variable 5: var1 has no variation.
--
View this message in context:
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284831.html
Sent from the R help mailing list archive at Nabble.com.
You can remove the rows where TargetVariable is NA with:

train <- subset(train, !is.na(TargetVariable))

I am not sure what you mean by constant values. You could use 'table' to determine which values appear most often and then remove them:

x <- table(train$TargetVariable)
train <- subset(train, !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))

But you probably need to look at your data and determine which numbers are in the set that you need to delete.

On Sat, Jul 10, 2010 at 6:28 PM, pdb <philb at philbrierley.com> wrote:
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
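Note that this approach works on rows, not columns: both subset() calls keep or drop whole observations based on the values in TargetVariable. A minimal sketch on made-up toy data (the data frame and the threshold value of 2 are hypothetical stand-ins for the real `train` and `someCountAboveWhichToDelete`):

```r
# Hypothetical toy data standing in for the real training set;
# var1 is a constant column like the one gbm complained about
train <- data.frame(TargetVariable = c(0, 1, NA, 1, 1, 0, NA),
                    var1 = 7)

# Drop rows whose target is missing (no column is touched)
train <- subset(train, !is.na(TargetVariable))
nrow(train)  # 5

# Count how often each target value occurs ...
x <- table(train$TargetVariable)
# ... and drop rows whose value occurs more often than a chosen count
someCountAboveWhichToDelete <- 2  # hypothetical threshold
train <- subset(train,
                !(TargetVariable %in% names(x)[x > someCountAboveWhichToDelete]))
```

Only the two rows with TargetVariable == 0 remain, and the constant column var1 is still there, so this does not address the "has no variation" warning by itself.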
On Sat, Jul 10, 2010 at 6:28 PM, pdb <philb at philbrierley.com> wrote:
> Can someone point me in the right direction as to how I can automatically purge
> my data file of these useless fields.

Try this. It will remove constant columns (such as column b below), all-NA columns (such as column a below) and columns that are constant apart from NAs (such as column d below). In this example only column c should survive:

# test data
DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1))
# sd() no longer accepts a whole data frame in current R,
# so compute the standard deviation of each column separately
sd. <- sapply(DF, sd, na.rm = TRUE)
# all-NA columns give NA and constant columns give 0; keep the rest
DF[!is.na(sd.) & sd. > 0]
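The standard-deviation test only makes sense for numeric (or logical) columns, so a data set that also has character or factor fields needs a different check. A minimal sketch of one type-agnostic alternative, which swaps the standard deviation for a count of distinct non-missing values (the function name drop_constant_columns is hypothetical, not from any package):

```r
# Keep only columns with at least two distinct non-NA values.
# This drops all-NA columns, constant columns, and columns that are
# constant apart from NAs, whatever the column type.
drop_constant_columns <- function(df) {
  keep <- vapply(df,
                 function(col) length(unique(col[!is.na(col)])) > 1,
                 logical(1))
  df[keep]
}

# Same test data as above, plus two factor columns:
# e is constant, f varies apart from one NA
DF <- data.frame(a = NA, b = 1, c = 1:5, d = c(NA, NA, 1, 1, 1),
                 e = "x", f = c("u", "v", "u", NA, "v"),
                 stringsAsFactors = TRUE)
names(drop_constant_columns(DF))  # "c" "f"
```

Applied to the original problem this would be train <- drop_constant_columns(train) before calling gbm(). Note that the target column is checked too, so a target with only one observed class would also be dropped.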
Awesome!
It made sense once I realised SD = standard deviation!
pdb
--
View this message in context:
http://r.789695.n4.nabble.com/eliminating-constant-variables-tp2284831p2284915.html
Sent from the R help mailing list archive at Nabble.com.
What was the question and answer here?
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of pdb
Sent: Sunday, July 11, 2010 5:23 AM
To: r-help at r-project.org
Subject: Re: [R] eliminating constant variables
Importance: Low
Awesome!
It made sense once I realised SD = standard deviation!
pdb
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.