Displaying 20 results from an estimated 4000 matches similar to: "NA and NaN randomForest"
2010 Sep 21
5
removed data is still there!
I'm confused, hope someone can point out what is not obvious to me.
I thought I was creating a new data frame by 'deleting' rows from an
existing dataframe - I've tried 2 methods.
But this new data frame seems to remember values from its parent - even
though there are no occurrences.
Where does it get the values versicolor and virginica from and give them a
count of 0?
What
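A minimal sketch of the behaviour asked about above, using the built-in iris data as a stand-in for the poster's data frame (the name `sub` is made up for illustration). Subsetting rows keeps all levels of a factor, so the absent species still show up with a count of 0 until the unused levels are dropped:

```r
# Subsetting rows keeps every level of a factor, even ones no longer present.
sub <- iris[iris$Species == "setosa", ]
table(sub$Species)   # versicolor and virginica still appear, with count 0

# droplevels() discards the levels that no longer occur in the data.
sub$Species <- droplevels(sub$Species)
table(sub$Species)
levels(sub$Species)
```

Calling droplevels() on the whole data frame should have the same effect for every factor column at once.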
2010 Jan 15
1
randomForest maxnodes
Has anyone successfully used the maxnodes feature in randomForest? I tried
setting it, but when it is non-NULL I always get back a forest in which all
trees have size 1. I am using a continuous response (regression). Any help
would be appreciated.
Thanks.
2012 Aug 01
3
Neuralnet Error
I require some help in debugging this code
library(neuralnet)
ir<-read.table(file="iris_data.txt",header=TRUE,row.names=NULL)
ir1 <- data.frame(ir[1:100,2:6])
ir2 <- data.frame(ifelse(ir1$Species=="setosa",1,ifelse(ir1$Species=="versicolor",0,"")))
colnames(ir2)<-("Output")
ir3 <- data.frame(rbind(ir1[1:4],ir2))
2007 Jan 28
2
help with RandomForest classwt option
Hello there,
I am working on an extremely unbalanced two-class classification problem. I
want to use "classwt" together with "down sampling". Judging by rfNews()
in R, it looks like classwt is not working yet. Then I looked at the
software from Salford. I did not find the down sampling option. I am
wondering if you have any experience to deal with this problem. Do you
2008 Mar 09
1
sampsize in Random Forests
Hi all,
I have a dataset where each point is assigned to a class A, B, C, or
D. Each point is also assigned to a study site. Each study site is
coded with a number ranging between 1-100. This information is stored
in the vector studySites.
I want to run randomForests using stratified sampling, so I chose the option
strata = factor(studySites)
But I am not sure how to control the number of
2007 Apr 29
1
randomForest gives different results for formula call v. x, y methods. Why?
Just out of curiosity, I took the default "iris" example in the RF
helpfile...
but seeing the admonition against using the formula interface for large data
sets, I wanted to play around a bit to see how the various options affected
the output. Found something interesting I couldn't find documentation for...
Just like the example...
> set.seed(12) # to be sure I have
2005 Aug 26
2
problem with certain data sets when using randomForest
Hi,
Since I've had no replies on my previous post about my
problem I am posting it again in the hope that someone
notices it. The problem is that the randomForest
function doesn't accept datasets whose instances
contain only a subset of all the classes. So the
dataset with instances that either belong to class "a"
or "b" from the levels "a", "b" and
2005 Jul 21
4
RandomForest question
Hello,
I'm trying to find out the optimal number of splits (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with up to 4 levels each, but also some numeric variables) and 575 cases.
I've seen that although there are only 32 explanatory variables the best classification performance is reached when
2007 Oct 31
1
seg fault with randomForest ( ... , xtest )
Dear R-help,
what are the limits on xtest?
> NOT_A.rf <- randomForest (log10(Y[!A] ) ~ . , data = notA_desc ,
proximity=T ,xtest = A_desc)
*** caught segfault ***
address 0x9cdd000, cause 'memory not mapped'
Segmentation fault
I don't think that the matrices are large:
notA_desc is 651 obs of 27 variables
A_desc is 17 obs of 27 variables
thanks in advance,
Clayton
2008 Jul 20
1
confusion matrix in randomForest
I have a question on the output generated by randomForest in classification
mode, specifically, the confusion matrix. The confusion matrix lists the
various classes and how the forest classified each one, plus the
classification error. Are these numbers essentially averages over all the
trees in the forest? If so, is there a way I can get the standard deviation
values out of the randomForest,
2009 Feb 12
2
barplot() x axes are not updated after removal of categories from the dataframe
Hi all,
I'd be grateful for your help. I am a new user struggling with a barplot
issue.
I am plotting categories (X axis) and their mean count (Y axis) with
barplot().
The first call to barplot works fine.
I remove records from the dataframe using
final = final[!final$varname == "some value",]
I echo the dataframe and the records are no longer in the dataframe.
When I call plot again
2012 Dec 10
3
splitting dataset based on variable and re-combining
I have a dataset and I wish to use two different models to predict. Both models are SVM. The reason for two different models is based
on the sex of the observation. I wish to be able to make predictions and have the results be in the same order as my original dataset. To
illustrate I will use iris:
# Take Iris and create a dataframe of just two Species, setosa and versicolor, shuffle them
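One way the split-and-recombine step described above might look in base R. This is only a sketch under stated assumptions: lm() stands in for the poster's SVM models, Species stands in for the sex variable, and the names `d`, `idx` and `pred` are made up for illustration. Keeping the row positions of each group lets the predictions be written back in the original order:

```r
# Hypothetical stand-in data: two iris species, shuffled.
set.seed(1)
d <- iris[iris$Species %in% c("setosa", "versicolor"), ]
d <- d[sample(nrow(d)), ]
d$Species <- droplevels(d$Species)

# Row positions of each group, so predictions can be written back in place.
idx <- split(seq_len(nrow(d)), d$Species)

pred <- numeric(nrow(d))
for (g in names(idx)) {
  rows <- idx[[g]]
  # lm() is a stand-in for an SVM; any model with a predict() method fits here.
  fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = d[rows, ])
  pred[rows] <- predict(fit, newdata = d[rows, ])
}
d$pred <- pred   # same row order as d
```

Indexing by position avoids any re-sorting after prediction, because each value lands on the row it was computed from.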
2010 Sep 22
2
randomForest - partialPlot - Reg
Dear R Group
I had an observation that in some cases, when I use the randomForest model
to create partialPlot in R using the package "randomForest"
the y-axis displays values that are more than -1!
It is a classification problem that I was trying to address.
Any insights as to how the y axis can display value more than -1 for some
variables?
Am I missing something?
Thanks
Regards
2008 Apr 29
1
randomForest and ordered factors
Hello R-user!
I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and
statistics beginner)
I try to find the most important variables to divide my dataset as
given in a categorical variable.
code:
Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4,
importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000,
keep.forest=FALSE)
My dataset contains also ordered
2008 Sep 02
2
cluster a distance(analogue)-object using agnes(cluster)
I try to perform a clustering using an existing dissimilarity matrix that I
calculated using distance (analogue)
I tried two different things. One of them worked and one did not, and I don't
understand why.
Here is the code:
not working example
library(cluster)
library(analogue)
iris2<-as.data.frame(iris)
str(iris2)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work
too well. (It's what was in version 3.x of the original Fortran code by
Breiman and Cutler, not the one in the new Fortran code.) I'd advise
against using it.
"sampsize" and "strata" can be used in conjunction. If "strata" is not
specified, the class labels will be used.
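A minimal sketch of using the two arguments together, assuming the randomForest package is installed. The per-class draw of 20, the ntree value and the name `fit` are arbitrary choices for illustration, and iris stands in for real data:

```r
# Sketch only: runs when the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)
  # Draw 20 cases from each stratum for every tree. strata would default to
  # the class labels; it is passed explicitly here for clarity.
  fit <- randomForest::randomForest(
    x = iris[, 1:4],
    y = iris$Species,
    strata = iris$Species,
    sampsize = c(20, 20, 20),
    ntree = 100
  )
  print(fit$confusion)
}
```

For an unbalanced problem, setting each element of sampsize to roughly the size of the smallest class is one way to down-sample the larger classes.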
2009 Feb 26
1
Random Forest confusion matrix
Dear R users,
I have a question on the confusion matrix generated by function randomForest.
I used the entire data
set to generate the forest, for example:
> print(iris.rf)
Call:
randomForest(formula = Species ~ ., data = iris, importance = TRUE,
keep.forest = TRUE)
confusion
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
2005 Jun 01
3
x[x$a=="q",,drop=TRUE]
I'm trying to select a subset of a dataframe while
dropping some factors. While the dataset gets smaller
all Factor levels remain and I need to get rid of
them. Strangely enough, I am almost certain that the
same code on the same data worked OK earlier today -
and it is not the first time that I'm not able to
replicate earlier results with this command (I know, I
might just be going
2011 Sep 20
1
randomForest - NaN in %IncMSE
Hi
I am having a problem using varImpPlot in randomForest. I get the error
message "Error in plot.window(xlim = xlim, ylim = ylim, log = "") : need
finite 'xlim' values"
When I print $importance, several variables have NaN under %IncMSE. There
are no NaNs in the original data. Can someone help me figure out what is
happening here?
Thanks!
2005 Sep 08
2
Re-evaluating the tree in the random forest
Dear mailinglist members,
I was wondering if there was a way to re-evaluate the
instances of a tree (in the forest) again after I have
manually changed a splitpoint (or split variable) of a
decision node. Here's an illustration:
library("randomForest")
forest.rf <- randomForest(formula = Species ~ ., data
= iris, do.trace = TRUE, ntree = 3, mtry = 2,
norm.votes = FALSE)
# I am