Displaying 20 results from an estimated 20000 matches similar to: "how to evaluate the significance of attributes in tree growing"
2005 Jan 27
0
how to evaluate the significance of attributes in tree growing
FWIW, I wrote a little function a while ago to extract variable importance as
defined in the CART book. It's rather limited: it only works for regression
problems, and you need to set maxsurrogate=0 and maxcompete=0. It may (or
may not) help you:
varimp.rpart <- function(x) {
dev <- x$frame[, c("var", "dev")]
dev <- dev[dev$var != "<leaf>",
2005 Oct 11
1
a problem in random forest
Hi, there:
I spent some time on this but I think I really cannot figure it out, maybe I
missed something here:
my data looks like this:
> dim(trn3)
[1] 7361 209
> dim(val3)
[1] 7427 209
> mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[,
1:208], ytest=val3[,209], importance=T)
my test data has 7427 observations but after prediction,
> dim(mg.rf2$votes)
2007 Apr 23
6
Random Forest
Hi,
I am trying to print out my confusion matrix after having created my random
forest.
I have put in this command:
fit<-randomForest(MMS_ENABLED_HANDSET~.,data=dat,ntree=500,mtry=14,
na.action=na.omit,confusion=TRUE)
but I can't get it to give me the confusion matrix, anyone know how this
works?
Thanks!
Ruben
2006 Jul 27
2
memory problems when combining randomForests [Broadcast]
You need to give us more details, like how you call randomForest, versions
of the package and R itself, etc. Also, see if this helps you:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html
Andy
From: Eleni Rapsomaniki
>
> Dear all,
>
> I am trying to train a randomForest using all my control data
> (12,000 cases, ~ 20 explanatory variables, 2 classes).
> Because
2005 Oct 27
3
memory problem in handling large dataset
Dear Listers:
I have a question about handling a large dataset. I searched R-Search and
hope to get more information on my specific case.
First, my dataset has 1.7 billion observations and 350 variables,
of which 300 are floats and 50 are integers.
My system is a Linux box with 8 GB of memory and a 64-bit CPU (currently, we
don't plan to buy more memory).
> R.version
_
platform
2009 Jul 22
1
margins defined in randomForest and supclust
Hi there,
How can I resolve a conflict when two packages define the same object, for
example margins in both randomForest and supclust?
When both libraries are installed, supclust complains that "margins" is
defined in randomForest.
I can only solve it by restarting R, which is very inconvenient. Is there a
cleverer way?
Thanks,
Weiwei
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
2005 Jan 25
0
Collapsing solution to the question discussed above: Re: multi-class classification using rpart
You could break your 3 class problem into several (2 or 3) 2 class problems,
and then use Andy's suggestion (see the CART book). There are several ways
to break the problem into 2 class problems, and several ways to combine the
resulting classifiers. Tom Dietterich, Jerry Friedman, Trevor Hastie and Rob
Tibshirani, among others, have articles on the question, in places like
Annals of
2007 Jan 07
2
creating a list of lists
Hello,
I'm trying to create a series of randomForest objects, basically in a
loop like this:
forests <- list();
for (level in 1:10) {
# do some other things here
# create a random forest
forest <- randomForest(
x = x.level,
y = z.level,
ntree = trees
);
forests <- c(forests, forest);
}
But instead of creating a list of 10 forests, this creates a list
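A likely cause of the behaviour described above is that `c()` concatenates the components of a list-like object instead of appending the object as a single element. The sketch below shows one common workaround: preallocate the list and assign with `[[`. It uses `lm` fits on the built-in `cars` data as a stand-in for `randomForest` objects (an assumption made only so that it runs without extra packages); `fit_one` and `n_levels` are hypothetical names.

```r
# Stand-in for randomForest(): any fitted-model object that is internally a list.
fit_one <- function(level) lm(dist ~ speed, data = cars[seq_len(10 * level), ])

n_levels <- 3
flat <- list()
fits <- vector("list", n_levels)   # preallocate one slot per model

for (level in seq_len(n_levels)) {
  fit <- fit_one(level)
  flat <- c(flat, fit)             # splices the model's components into the list
  fits[[level]] <- fit             # stores the whole model as one element
}

length(flat)       # larger than n_levels, because the components were spliced in
length(fits)       # equal to n_levels
class(fits[[1]])   # the stored element is still a complete model object
```

Wrapping the object, as in `c(forests, list(forest))`, should also keep each model intact, although preallocating avoids growing the list on every iteration.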
2006 Oct 17
4
cluster in R
hi,
is there some good summary on clustering methods in R? It seems there
are many packages involving it.
And I have two questions on clustering here:
1. Is there a way of evaluating the effectiveness (or separation) of
clustering (rather than by visualization)?
2. Is there a search method (like genetic search) which can help find
the best subset of attributes which gives the best separation?
Thanks,
2005 Jul 07
2
randomForest
> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1
[[2]]
[1] 2
> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, the way I initialize a list of known length is by
2007 Apr 24
5
intersect more than two sets
Hi,
I searched the archives and did not find a good solution to that.
Assume I have 10 sets and I want the character elements common to all of them.
How could I do that?
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
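One way to approach the question above, sketched under the assumption that the sets are character vectors held in a list (the data and the names `sets` and `common` are hypothetical): `Reduce()` applies the two-argument `intersect()` across the list pairwise, carrying the running result forward.

```r
# Hypothetical example data: three character sets stored in a list.
sets <- list(
  c("a", "b", "c", "d"),
  c("b", "c", "d", "e"),
  c("c", "d", "f")
)

# Fold intersect() over the list: ((set1 n set2) n set3) ...
common <- Reduce(intersect, sets)
common
```

The same call works unchanged for 10 sets, as long as they are collected in one list.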
2007 Jan 04
3
randomForest and missing data
Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? If each
individual tree is built using CART, then it seems like this
should be possible. (I understand that one may impute missing values
using rfImpute or some other method, but I would like to avoid doing
that.)
If this functionality were available, then when the trees
2005 Jan 12
4
gbm
Hi, there:
I am wondering if I can find some detailed explanation
on gbm or explanation on examples of gbm.
thanks,
Ed
2005 Sep 08
2
Re-evaluating the tree in the random forest
Dear mailinglist members,
I was wondering if there was a way to re-evaluate the
instances of a tree (in the forest) again after I have
manually changed a splitpoint (or split variable) of a
decision node. Here's an illustration:
library("randomForest")
forest.rf <- randomForest(formula = Species ~ ., data
= iris, do.trace = TRUE, ntree = 3, mtry = 2,
norm.votes = FALSE)
# I am
2005 Apr 28
3
have to point it out again: a distribution question
Stock returns and other financial data have often been found to be heavy-tailed.
Even Cauchy distributions (without even a first absolute moment) have been
entertained as models.
Your qq function subtracts numbers on the scale of a normal (0,1)
distribution from the input data. When the input data are scaled so that
they are insignificant compared to 1, say, then you get essentially the
2007 Apr 11
5
how to reverse a list
Hi, there:
I am wondering if there is a quick way to "reverse" a list like this:
t0 <- list(a=1, b=1, c=2, d=1)
reverse t0 to t1
> t1
$`1`
[1] "a" "b" "d"
$`2`
[1] "c"
thanks.
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
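A minimal sketch of one possible answer to the question above, assuming every element of `t0` is a single atomic value: `split()` groups the element names by their values.

```r
t0 <- list(a = 1, b = 1, c = 2, d = 1)

# unlist() turns the list into a named vector of values;
# split() then groups the names by those values.
t1 <- split(names(t0), unlist(t0))
t1
```

If some elements held more than one value, `unlist()` would change the names and lengths, so this shortcut would no longer apply directly.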
2005 Jul 21
4
RandomForest question
Hello,
I'm trying to find the optimal number of variables tried at each split (the mtry parameter) for a randomForest classification. The classification is binary, with 575 cases and 32 explanatory variables (mostly factors with up to 4 levels each, plus some numeric variables).
I've seen that although there are only 32 explanatory variables the best classification performance is reached when
2005 Jun 20
6
tapply
hi,
i have another question on tapply:
i have a dataset z like this:
5540 389100307391 2600
5541 389100307391 2600
5542 389100307391 2600
5543 389100307391 2600
5544 389100307391 2600
5546 381300302513 NA
5547 387000307470 NA
5548 387000307470 NA
5549 387000307470 NA
5550 387000307470 NA
5551 387000307470 NA
5552 387000307470
2009 Aug 05
0
get NA from outlier{randomForest}
Hi
I have a data frame like this:
V1 V2 V3 V4
Min. :0.01146 Min. :0.0006714 Min. :0.004912 Min. : 0
1st Qu.:0.03938 1st Qu.:0.0072805 1st Qu.:0.052719 1st Qu.:1150
Median :0.04224 Median :0.0077581 Median :0.056388 Median :1150
Mean :0.04010 Mean :0.0074669 Mean :0.052602 Mean :1173
3rd
2007 Apr 12
2
problems in loading MASS
Hi, there:
After I upgraded R to 2.4.1, I tried to use MASS for the first time and
got the following error message:
> install.packages("MASS")
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/universal/contrib/2.4/VR_7.2-33.tgz'
Content type 'application/x-gzip' length 995260 bytes
opened URL