thr3ads.net - similar to: "NA and NaN randomForest"

Displaying 20 results from an estimated 4000 matches similar to: "NA and NaN randomForest"

2010 Sep 21

removed data is still there!

I'm confused and hope someone can point out what is not obvious to me. I thought I was creating a new data frame by 'deleting' rows from an existing data frame, and I've tried 2 methods. But this new data frame seems to remember values from its parent, even though there are no occurrences. Where does it get the values versicolor and virginica from, and why does it give them a count of 0? What
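The behaviour described above can be reproduced with the built-in iris data. This is a minimal sketch, not the poster's code: subsetting rows leaves the factor's level set untouched, so unused levels still appear with a count of 0 until they are dropped explicitly. The object names `sub` and `sub_clean` are made up for illustration.

```r
# Subsetting a data frame keeps every factor level, even unused ones.
sub <- iris[iris$Species == "setosa", ]
table(sub$Species)   # versicolor and virginica are still listed, with count 0

# droplevels() removes the levels that no longer occur in the data.
sub_clean <- droplevels(sub)
levels(sub_clean$Species)
```

Calling `factor()` again on the single column should have the same effect if only one variable needs cleaning.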

2010 Jan 15

randomForest maxnodes

Has anyone successfully used the maxnodes feature in randomForest? I tried setting it, but when it is non-NULL I always get back a forest in which all trees have size 1. I am using a continuous response (regression). Any help would be appreciated. Thanks.

2012 Aug 01

Neuralnet Error

I require some help in debugging this code:
library(neuralnet)
ir <- read.table(file="iris_data.txt", header=TRUE, row.names=NULL)
ir1 <- data.frame(ir[1:100,2:6])
ir2 <- data.frame(ifelse(ir1$Species=="setosa",1,ifelse(ir1$Species=="versicolor",0,"")))
colnames(ir2) <- ("Output")
ir3 <- data.frame(rbind(ir1[1:4],ir2))

2007 Jan 28

help with RandomForest classwt option

Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". From checking rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford, but I did not find a down sampling option. I am wondering if you have any experience dealing with this problem. Do you

2008 Mar 09

sampsize in Random Forests

Hi all, I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site. Each study site is coded with a number ranging from 1 to 100. This information is stored in the vector studySites. I want to run randomForest using stratified sampling, so I chose the option strata = factor(studySites). But I am not sure how to control the number of

2007 Apr 29

randomForest gives different results for formula call v. x, y methods. Why?

Just out of curiosity, I took the default "iris" example in the RF helpfile... but seeing the admonition against using the formula interface for large data sets, I wanted to play around a bit to see how the various options affected the output. Found something interesting I couldn't find documentation for... Just like the example... > set.seed(12) # to be sure I have

2005 Aug 26

problem with certain data sets when using randomForest

Hi, Since I've had no replies to my previous post about my problem, I am posting it again in the hope that someone notices it. The problem is that the randomForest function doesn't accept datasets whose instances cover only a subset of all the classes. So the dataset with instances that either belong to class "a" or "b" from the levels "a", "b" and

2005 Jul 21

RandomForest question

Hello, I'm trying to find the optimal number of variables tried at each split (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with up to 4 levels each, but also some numeric variables) and 575 cases. I've seen that although there are only 32 explanatory variables the best classification performance is reached when

2007 Oct 31

seg fault with randomForest ( ... , xtest )

Dear R-help, what are the limits on xtest?
> NOT_A.rf <- randomForest (log10(Y[!A] ) ~ . , data = notA_desc , proximity=T ,xtest = A_desc)
*** caught segfault ***
address 0x9cdd000, cause 'memory not mapped'
Segmentation fault
I don't think that the matrices are large: notA_desc is 651 obs of 27 variables and A_desc is 17 obs of 27 variables. Thanks in advance, Clayton

2008 Jul 20

confusion matrix in randomForest

I have a question on the output generated by randomForest in classification mode, specifically, the confusion matrix. The confusion matrix lists the various classes and how the forest classified each one, plus the classification error. Are these numbers essentially averages over all the trees in the forest? If so, is there a way I can get the standard deviation values out of the randomForest,

2009 Feb 12

barplot() x axes are not updated after removal of categories from the dataframe

Hi all, I'd be grateful for your help. I am a new user struggling with a barplot issue. I am plotting categories (X axis) and their mean count (Y axis) with barplot(). The first call to barplot works fine. I remove records from the dataframe using final = final[!final$varname == "some value",]. I echo the dataframe and the records are no longer in the dataframe. When I call plot again

2012 Dec 10

splitting dataset based on variable and re-combining

I have a dataset and I wish to use two different models to predict. Both models are SVM. The reason for two different models is based on the sex of the observation. I wish to be able to make predictions and have the results be in the same order as my original dataset. To illustrate I will use iris: # Take Iris and create a dataframe of just two Species, setosa and versicolor, shuffle them
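One possible way to get per-group predictions back in the original row order is base R's `split()` and `unsplit()`. The sketch below rests on assumptions not in the post: `lm()` stands in for the SVM models, the grouping variable is Species rather than sex, and the names `two`, `parts` and `preds` are hypothetical.

```r
# Keep two species, drop the unused level, and shuffle the rows.
two <- droplevels(iris[iris$Species != "virginica", ])
set.seed(1)  # arbitrary seed, only for reproducibility
two <- two[sample(nrow(two)), ]

# Fit one model per group and predict within that group.
parts <- split(two, two$Species)
preds <- lapply(parts, function(d) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = d)  # stand-in for an SVM
  unname(predict(fit, newdata = d))
})

# unsplit() puts each group's predictions back at the rows they came from.
two$pred <- unsplit(preds, two$Species)
head(two[, c("Species", "Sepal.Length", "pred")])
```

Because `unsplit()` reverses `split()` using the same grouping factor, no explicit row index needs to be carried around.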

2010 Sep 22

randomForest - partialPlot - Reg

Dear R Group, I observed that in some cases, when I use a randomForest model to create a partialPlot in R with the package "randomForest", the y-axis displays values that are more than -1! It is a classification problem that I was trying to address. Any insights as to how the y-axis can display values more than -1 for some variables? Am I missing something? Thanks Regards

2008 Apr 29

randomForest and ordered factors

Hello R-user! I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and statistics beginner.) I am trying to find the most important variables for dividing my dataset according to a categorical variable. Code: Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4, importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000, keep.forest=FALSE) My dataset also contains ordered

2008 Sep 02

cluster a distance(analogue)-object using agnes(cluster)

I am trying to perform a clustering using an existing dissimilarity matrix that I calculated using distance (analogue). I tried two different things. One of them worked and one did not, and I don't understand why. Here is the code: not working example library(cluster) library(analogue) iris2<-as.data.frame(iris) str(iris2) 'data.frame': 150 obs. of 5 variables: $ Sepal.Length: num 5.1 4.9 4.7

2005 Oct 27

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be used in conjunction. If "strata" is not specified, the class labels will be used.
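As a rough illustration of using "sampsize" and "strata" together, the sketch below draws a fixed number of cases per class for each tree. It assumes the randomForest package is installed; the iris data, the per-class size of 20, the seed and `ntree = 100` are arbitrary choices for the example, not values from the post.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility

# Draw 20 cases from each class for every tree. strata is given explicitly
# here for clarity; per the note above, the class labels are used by default.
rf <- randomForest(x = iris[, 1:4], y = iris$Species,
                   strata = iris$Species,
                   sampsize = c(20, 20, 20),
                   ntree = 100)
print(rf)
```

For an unbalanced problem, setting every element of `sampsize` to roughly the size of the smallest class is one common way to down-sample the majority class.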

2009 Feb 26

Random Forest confusion matrix

Dear R users, I have a question on the confusion matrix generated by function randomForest. I used the entire data set to generate the forest, for example:
> print(iris.rf)
Call: randomForest(formula = Species ~ ., data = iris, importance = TRUE, keep.forest = TRUE)
confusion
           setosa versicolor virginica class.error
setosa         50          0         0        0.00

2005 Jun 01

x[x$a=="q",,drop=TRUE]

I'm trying to select a subset of a dataframe while dropping some factors. While the dataset gets smaller all Factor levels remain and I need to get rid of them. Strangely enough, I am almost certain that the same code on the same data worked OK earlier today - and it is not the first time that I'm not able to replicate earlier results with this command (I know, I might just be going

2011 Sep 20

randomForest - NaN in %IncMSE

Hi, I am having a problem using varImpPlot in randomForest. I get the error message "Error in plot.window(xlim = xlim, ylim = ylim, log = "") : need finite 'xlim' values". When I print $importance, several variables have NaN under %IncMSE. There are no NaNs in the original data. Can someone help me figure out what is happening here? Thanks!

2005 Sep 08

Re-evaluating the tree in the random forest

Dear mailinglist members, I was wondering if there was a way to re-evaluate the instances of a tree (in the forest) again after I have manually changed a splitpoint (or split variable) of a decision node. Here's an illustration: library("randomForest") forest.rf <- randomForest(formula = Species ~ ., data = iris, do.trace = TRUE, ntree = 3, mtry = 2, norm.votes = FALSE) # I am
