thr3ads.net - similar to: "how to evaluate the significance of attributes in tree growing"

how to evaluate the significance of attributes in tree growing

2005 Jan 27

FWIW, I wrote a little function a while ago to extract variable importance as defined in the CART book. It's rather limited: it only works for regression problems, and you need to set maxsurrogate=0 and maxcompete=0. It may (or may not) help you: varimp.rpart <- function(x) { dev <- x$frame[, c("var", "dev")] dev <- dev[dev$var != "<leaf>",

a problem in random forest

2005 Oct 11

Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data looks like this: > dim(trn3) [1] 7361 209 > dim(val3) [1] 7427 209 > mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[, 1:208], ytest=val3[,209], importance=T) my test data has 7427 observations but after prediction, > dim(mg.rf2$votes)

Random Forest

2007 Apr 23

Hi, I am trying to print out my confusion matrix after having created my random forest. I have put in this command: fit<-randomForest(MMS_ENABLED_HANDSET~.,data=dat,ntree=500,mtry=14, na.action=na.omit,confusion=TRUE) but I can't get it to give me the confusion matrix. Does anyone know how this works? Thanks! Ruben

memory problems when combining randomForests [Broadcast]

2006 Jul 27

You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html Andy From: Eleni Rapsomaniki > > Dear all, > > I am trying to train a randomForest using all my control data > (12,000 cases, ~ 20 explanatory variables, 2 classes). > Because

memory problem in handling large dataset

2005 Oct 27

Dear Listers: I have a question on handling large dataset. I searched R-Search and I hope I can get more information as to my specific case. First, my dataset has 1.7 billion observations and 350 variables, among which, 300 are float and 50 are integers. My system has 8 G memory, 64bit CPU, linux box. (currently, we don't plan to buy more memory). > R.version _ platform

margins defined in randomForest and supclust

2009 Jul 22

Hi there, How to solve the conflicts as to the same object between two packages, for example, like margins in both randomForest and supclust? When both libraries are installed, supclust will complain "margins" defined in randomForest. I can only solve it by re-starting R, which is very inconvenient, any clever way? Thanks, Weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc.

Collapsing solution to the question discussed above: Re: multi-class classification using rpart

2005 Jan 25

You could break your 3 class problem into several (2 or 3) 2 class problems, and then use Andy's suggestion (see the CART book). There are several ways to break the problem into 2 class problems, and several ways to combine the resulting classifiers. Tom Dietterich, Jerry Friedman, Trevor Hastie and Rob Tibshirani, among others, have articles on the question, in places like Annals of

creating a list of lists

2007 Jan 07

Hello, I'm trying to create a series of randomForest objects, basically in a loop like this: forests <- list(); for (level in 1:10) { # do some other things here # create a random forest forest <- randomForest( x = x.level, y = z.level, ntree = trees ); forests <- c(forests, forest); } But instead of creating a list of 10 forests, this creates a list
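The behaviour described in this question can be reproduced with base R alone. The sketch below assumes only that a fitted model is a list-based object, which is why c() splices its components into the outer list. make_model() and the "toy_model" class are hypothetical stand-ins, not part of randomForest.

```r
# Stand-in for a fitted model: any list-based object behaves the same way.
# make_model() is a hypothetical helper, not part of randomForest.
make_model <- function(level) {
  structure(list(level = level, ntree = 50), class = "toy_model")
}

# Appending with c() splices each object's components into the outer list.
flat <- list()
for (level in 1:3) {
  flat <- c(flat, make_model(level))
}

# Preallocating and assigning with [[ ]] keeps each model as one element.
models <- vector("list", 3)
for (level in 1:3) {
  models[[level]] <- make_model(level)
}

length(flat)    # 6: two components per model, spliced together
length(models)  # 3: one element per model
```

Wrapping the object before appending, as in c(flat, list(make_model(level))), would also keep one element per model, at the cost of growing the list on every iteration.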

cluster in R

2006 Oct 17

Hi, is there a good summary of clustering methods in R? It seems there are many packages involving it. I also have two questions on clustering: 1. Is there a way to evaluate the effectiveness (or separation) of a clustering other than by visualization? 2. Is there a search method (like genetic search) that can help find the subset of attributes giving the best separation? Thanks,

randomForest

2005 Jul 07

> From: Weiwei Shi > > it works. > thanks, > > but: (just curious) > why i tried previously and i got > > > is.vector(sample.size) > [1] TRUE Because a list is also a vector: > a <- c(list(1), list(2)) > a [[1]] [1] 1 [[2]] [1] 2 > is.vector(a) [1] TRUE > is.numeric(a) [1] FALSE Actually, the way I initialize a list of known length is by
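The point made in this reply, that is.vector() returns TRUE for a list, can be checked with a few base R predicates. This is only an illustrative sketch; the variable names are arbitrary, and it does not attempt to reconstruct the part of the reply that is cut off.

```r
a <- c(list(1), list(2))  # a list with two components
n <- c(1, 2)              # an atomic numeric vector

is.vector(a)                    # TRUE: lists count as vectors
is.list(a)                      # TRUE
is.atomic(a)                    # FALSE
is.vector(n, mode = "numeric")  # TRUE
is.vector(a, mode = "numeric")  # FALSE: restricting the mode excludes lists
```

When the intent is "a plain numeric vector", is.numeric() or is.vector(x, mode = "numeric") is a stricter check than is.vector(x) alone.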

intersect more than two sets

2007 Apr 24

Hi, I searched the archives and did not find a good solution to this. Assume I have 10 sets and I want the character elements common to all of them. How could I do that? -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc.
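One common base R approach to this kind of question, sketched here with made-up example sets, is to fold intersect() over a list of character vectors with Reduce(). The three sets below are invented for illustration; the same call works for 10 or more.

```r
# Hypothetical example data: each element is one "set" of character values.
sets <- list(
  c("a", "b", "c", "d"),
  c("b", "c", "d", "e"),
  c("c", "d", "f")
)

# Reduce() applies intersect() pairwise from left to right:
# intersect(intersect(sets[[1]], sets[[2]]), sets[[3]])
common <- Reduce(intersect, sets)
common  # "c" "d"
```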

randomForest and missing data

2007 Jan 04

Does anyone know a reason why, in principle, a call to randomForest cannot accept a data frame with missing predictor values? If each individual tree is built using CART, then it seems like this should be possible. (I understand that one may impute missing values using rfImpute or some other method, but I would like to avoid doing that.) If this functionality were available, then when the trees

gbm

2005 Jan 12

Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. thanks, Ed

Re-evaluating the tree in the random forest

2005 Sep 08

Dear mailinglist members, I was wondering if there was a way to re-evaluate the instances of a tree (in the forest) again after I have manually changed a splitpoint (or split variable) of a decision node. Here's an illustration: library("randomForest") forest.rf <- randomForest(formula = Species ~ ., data = iris, do.trace = TRUE, ntree = 3, mtry = 2, norm.votes = FALSE) # I am

have to point it out again: a distribution question

2005 Apr 28

Stock returns and other financial data have often been found to be heavy-tailed. Even Cauchy distributions (without even a first absolute moment) have been entertained as models. Your qq function subtracts numbers on the scale of a normal (0,1) distribution from the input data. When the input data are scaled so that they are insignificant compared to 1, say, then you get essentially the

how to reverse a list

2007 Apr 11

Hi, there: I am wondering if there is a quick way to "reverse" a list like this: t0 <- list(a=1, b=1, c=2, d=1) that is, to turn t0 into t1: > t1 $`1` [1] "a" "b" "d" $`2` [1] "c" Thanks. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc.
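The transformation asked for here can be sketched in base R by splitting the names of the list by its values. This assumes every component of t0 holds a single value, as in the example from the question.

```r
t0 <- list(a = 1, b = 1, c = 2, d = 1)

# unlist() gives a named vector of values; split() groups the names by value.
t1 <- split(names(t0), unlist(t0))

t1[["1"]]  # "a" "b" "d"
t1[["2"]]  # "c"
```

If a component held more than one value, unlist() would produce more values than names(t0) has entries and the grouping would no longer line up, so that case needs separate handling.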

RandomForest question

2005 Jul 21

Hello, I'm trying to find out the optimal number of splits (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with each up to 4 levels but also some numeric variables) and 575 cases. I've seen that although there are only 32 explanatory variables the best classification performance is reached when

tapply

2005 Jun 20

hi, i have another question on tapply: i have a dataset z like this: 5540 389100307391 2600 5541 389100307391 2600 5542 389100307391 2600 5543 389100307391 2600 5544 389100307391 2600 5546 381300302513 NA 5547 387000307470 NA 5548 387000307470 NA 5549 387000307470 NA 5550 387000307470 NA 5551 387000307470 NA 5552 387000307470

get NA from outlier{randomForest}

2009 Aug 05

Hi I have a data frame like this: V1 V2 V3 V4 Min. :0.01146 Min. :0.0006714 Min. :0.004912 Min. : 0 1st Qu.:0.03938 1st Qu.:0.0072805 1st Qu.:0.052719 1st Qu.:1150 Median :0.04224 Median :0.0077581 Median :0.056388 Median :1150 Mean :0.04010 Mean :0.0074669 Mean :0.052602 Mean :1173 3rd

problems in loading MASS

2007 Apr 12

Hi, there: After I upgraded my R to 2.4.1, it is my first time of trying to use MASS and found the following error message: > install.packages("MASS") --- Please select a CRAN mirror for use in this session --- trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/universal/contrib/2.4/VR_7.2-33.tgz' Content type 'application/x-gzip' length 995260 bytes opened URL
