Displaying 20 results from an estimated 10000 matches similar to: "how to use the randomForest and rpart function?"
2005 Sep 08
2
Re-evaluating the tree in the random forest
Dear mailing list members,
I was wondering if there was a way to re-evaluate the
instances of a tree (in the forest) again after I have
manually changed a splitpoint (or split variable) of a
decision node. Here's an illustration:
library("randomForest")
forest.rf <- randomForest(formula = Species ~ ., data = iris,
                          do.trace = TRUE, ntree = 3, mtry = 2,
                          norm.votes = FALSE)
# I am
2006 Feb 28
3
does svm have a CV to obtain the best "cost" parameter?
Hi all,
I am using the "svm" command in the e1071 package.
Does it have an automatic way of setting the "cost" parameter?
I changed a few values for the "cost" parameter but I hope there is a
systematic way of obtaining the best "cost" value.
I noticed that there is a "cross" (Cross validation) parameter in the "svm"
function.
But I
2003 Apr 12
5
rpart vs. randomForest
Greetings. I'm trying to determine whether to use rpart or randomForest
for a classification tree. Has anybody tested efficacy formally? I've
run both and the confusion matrix for rf beats rpart. I've been looking at
the rf help page and am unable to figure out how to extract the tree.
But more than that I'm looking for a more comprehensive user's guide
for randomForest including
2002 Jun 12
3
help debugging segfaults
(Sorry for the cross-post--- I wasn't sure which list is more
appropriate...)
Hi everyone,
I've run into segfaults when using my randomForest package on a large dataset
(e.g., 100 x 15200) and a large number of trees (e.g., ntree=7000 and
mtry=3000). I'm wondering if anyone can give me some hints on where to look
for the problem.
The randomForest package mainly consists of two things:
2004 Apr 15
7
all(logical(0)) and any(logical(0))
Dear R-help,
I was bitten by the behavior of all() when given logical(0): It is TRUE!
(And any(logical(0)) is FALSE.) Wouldn't it be better to return logical(0)
in both cases?
The problem surfaced because some un-named individual called randomForest(x,
y, xtest, ytest,...), and gave y as a two-level factor, but ytest as just a
numeric vector. I thought I checked for that in my code by testing
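The behaviour described in this post can be checked in base R. The sketch below is only an illustration under stated assumptions: `same_kind` and `safe_all` are hypothetical helpers, not the check used inside randomForest, and they show why a test built on all() may also need to guard against zero-length input.

```r
# all() over an empty logical vector is vacuously TRUE; any() is FALSE.
all(logical(0))
any(logical(0))

# Hypothetical helper (not from randomForest): TRUE only when both
# responses are factors, or both are non-factors.
same_kind <- function(y, ytest) {
  is.factor(y) == is.factor(ytest)
}

# Hypothetical helper: all() that treats zero-length input as a failure.
safe_all <- function(x) length(x) > 0 && all(x)

y <- factor(c("a", "b", "a"))   # two-level factor
ytest <- c(1, 0, 1)             # plain numeric vector

same_kind(y, ytest)             # FALSE: factor vs. numeric

# levels() of a numeric vector is NULL, so the comparison below has
# length zero; all() alone accepts it, safe_all() rejects it.
all(levels(ytest) == "a")
safe_all(levels(ytest) == "a")
```

Checking the type or length explicitly, rather than relying on all() over a comparison, is one way to avoid the silent pass described above.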
2008 Feb 25
1
Running randomForests on large datasets
Hi,
I am trying to run randomForests on a dataset of size 500000 x 650 and
R pops up a memory allocation error. Are there any better ways to deal
with large datasets in R? For example, Splus had something like a
bigData library.
Thank you,
Nagu
2004 Apr 05
3
Can't seem to finish a randomForest.... Just goes and goes!
When you have fairly large data, _do not use the formula interface_, as a
couple of copies of the data would be made. Try simply:
Myforest.rf <- randomForest(Mydata[, -46], Mydata[, 46],
                            ntree=100, mtry=7)
[Note that you don't need to set proximity (not proximities) or importance
to FALSE, as that's the default already.]
You might also want to use
2009 Apr 08
2
help with random forest package
Hello,
I am a PhD student in Bioinformatics and I am using the Random Forest
package in order to classify my data, but I have some questions.
Is there a function in order to visualize the trees, so as to get the rules?
Also, could you please provide me with the code of "randomForest" function,
as I would like to see how it works. I was wondering if I can get the
classification having
2005 Jul 21
4
RandomForest question
Hello,
I'm trying to find out the optimal number of variables tried at each split (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with up to 4 levels each, but also some numeric variables) and 575 cases.
I've seen that although there are only 32 explanatory variables the best classification performance is reached when
2005 Jul 07
2
randomForest
> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1
[[2]]
[1] 2
> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, the way I initialize a list of known length is by
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi
I followed all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try, or should I wait for the development of more advanced computers?
2008 Jun 18
2
randomForest outlier
I try to use ?randomForest to find variables that are the most important to
divide my dataset (continuous, categorical variables) in two given groups.
But when I plot the outliers:
plot(outlier(FemMalSex_NAavoid88.rf33, cls=FemMalSex_NAavoid88$Sex),
type="h",col=c("red","green")[as.numeric(FemMalSex_NAavoid88$Sex)])
it seems to me that all my values appear as
2005 Jan 25
3
multi-class classification using rpart
Hi,
I am trying to make a multi-class classification tree by using rpart.
I used the MASS package's data set fgl to test, and it works well.
However, when I used my small-sampled data as below, the program seems
to take forever. I am not sure if it is due to slowness or there is
something wrong with my code or data manipulation.
Please advise!
The data is described as the output from str()
2004 Jul 16
3
rpart and TREE, can be the same?
Hi, all,
I am wondering if it is possible to set parameters of 'rpart' and 'tree'
such that they will produce the exact same tree? Thanks.
Auston Wei
Statistical Analyst
Department of Biostatistics and Applied Mathematics
The University of Texas MD Anderson Cancer Center
Tel: 713-563-4281
Email: wwei@mdanderson.org
2002 Apr 02
2
random forests for R
Hi all,
There is now a package available on CRAN that provides an R interface to Leo
Breiman's random forest classifier.
Basically, random forest does the following:
1. Select ntree, the number of trees to grow, and mtry, a number no larger
than number of variables.
2. For i = 1 to ntree:
3. Draw a bootstrap sample from the data. Call those not in the bootstrap
sample the
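Step 3 above can be sketched in base R. This is only an illustration, not the package's internal code: the built-in iris data, the seed, and the names `in_boot` and `left_out` are assumptions chosen for the example.

```r
# Sketch of step 3 only, using the built-in iris data as a stand-in.
set.seed(1)                      # arbitrary seed, for reproducibility
n <- nrow(iris)

# Draw n row indices with replacement: the bootstrap sample.
in_boot <- sample(n, size = n, replace = TRUE)

# Rows never drawn are the ones "not in the bootstrap sample".
left_out <- setdiff(seq_len(n), in_boot)

length(in_boot)                  # always n
length(left_out) / n             # fraction of rows left out of this draw
```

Because sampling is with replacement, some rows appear more than once and others not at all; in expectation a bit over a third of the rows are left out of each draw.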
2004 Jul 06
3
Code density functions
Hello
I would like to see the algorithm that R uses to generate density functions
for several distributions (i.e. Normal,Weibull, etc). I tried:
>dnorm
function (x, mean = 0, sd = 1, log = FALSE)
.Internal(dnorm(x, mean, sd, log))
<environment: namespace:stats>
How can I see the code used for densities?
Thanks!
2004 Jul 26
5
installing problems repeated.tgz linux
Hi,
I tried several possibilities and looked in the archive,
but had no success installing J. Lindsey's useful "library
repeated" on my Linux (SuSE 9.0 with kernel 2.6.7, R 1.9.1).
P.S. On Windows, it works fine.
Many thanks for help
Christian
chris at linux:/space/downs> R CMD INSTALL - l /usr/lib/R/library repeated
WARNING: invalid package '-'
WARNING:
2004 Jan 09
2
debugging strange segfault
Dear R-devel,
Can anyone give me some hints on how to go about debugging a strange
segfault in my randomForest package? Here's the scoop:
A user reported a segfault when running predict() in the randomForest package.
I asked for the data and code. The combination runs fine under WinXPPro,
but does give a segfault on one of our Linux boxes running R (1.7.0 through
R-devel_2004-01-08) on