search for: classwt

Displaying 17 results from an estimated 17 matches for "classwt".

2008 May 21
1
How to use classwt parameter option in RandomForest
...actor variables using random forests in R. The variable Y acts like an ordinal variable, but I recoded it as a factor variable. I ran a simulation and got an OOB estimate of error rate of 60%. I validated against some external datasets and got about 59% misclassification error. I would like to tinker with the classwt option in the function randomForest to see if I can get better performance from the model. My confusion arises from how to define these weights. If I say classwt = c(3,6,9,1,2,3), how exactly do the levels get weighted? If this is a 6X6 matrix, I can put a number in each cell to adjust the weights. How...
2007 Jan 28
2
help with RandomForest classwt option
Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". By checking rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford. I did not find the down sampling option. I am wondering if you have any experience dealing with this problem. Do you know any method o...
2004 Jan 20
1
random forest question
Hi, here are three results of random forest (version 4.0-1). The results seem to be more or less the same which is strange because I changed the classwt. I hoped that for example classwt=c(0.45,0.1,0.45) would result in fewer cases classified as class 2. Did I understand something wrong? Christian x1rf <- randomForest(x=as.data.frame(mfilters[cvtrain,]), y=as.factor(traingroups), xtest=as.data.frame(m...
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
...data and was wondering if, in a "0" v "1" classification forest, some combo of these options might yield better predictions when the proportion of one class is low (less than 10% in a sample of 2,000 observations). Not sure how to specify these terms... from the docs, we have: classwt: Priors of the classes. Need not add up to one. Ignored for regression. So is this something like "... classwt=c(.90,.10)" ? I didn't see the syntax demonstrated. Similar for "strata" and "sampsize" though there is a default for sampsize that makes sense... not su...
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be use...
2005 Oct 25
0
Examples of "classwt", "strata", and "sampsize" in randomForest?
...unbalanced data and was wondering if, in a "0" v "1" classification forest, these options might yield better predictions when the proportion of one class is low (less than 10% in a sample of 2,000 observations). Not sure how to specify these terms... from the docs, we have: classwt: Priors of the classes. Need not add up to one. Ignored for regression. So is this something like "... classwt=c(.90,.10)" ? I didn't see the syntax demonstrated. Similar for "strata" and "sampsize" though there is a default for sampsize that makes sense... not su...
2011 Sep 13
1
class weights with Random Forest
Hi All, I am looking for a reference that explains how the randomForest function in the randomForest package uses the classwt parameter. Here: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html Andy Liaw suggests not using classwt. And according to: http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html it has "not been implemented" as of 2007. However it improved classif...
2010 Mar 08
0
error when using svm routine: Error in if (any(co)) { : missing value where TRUE/FALSE needed
Hi, I met with this error message with the following data set. Do you know how to resolve it? Thanks. > data<-read.table("c://temp3//abc.csv", sep = ",", header=T) > classwt<-c( 0.5806452, 0.4193548) > y<-data[,1] > x<-data[,2:ncol(data)] > print(y) [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 [36] 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 > print(x) rs2289472 rs1551398 rs7927894 1 CT AA...
2006 Jan 25
1
imbalanced classes
...t package to compare two classes of data, the number of cases in each class relatively low, 28 in class 1 and 9 in class 2. I'd really like to use the R environment to analyze this data; however, I'm finding it difficult to put much trust in the results of my analysis. As you've stated, the classwt variables do not do much, and I've tried working with the cutoff and sampsize variables as well, with limited success in balancing error rates between the two classes. It was unclear to me how to use the cutoff parameter correctly. If you have any recommendations here, it would be appreciat...
2002 Sep 25
5
CART vs. Random Forest
According to Dr. Breiman, RF should be a more accurate method than a single tree. However, the performance of each method seems to depend on the proportion of the outcome variable in my case. My data set is a typical classification problem (predict bad guys). When I ran both of them with different proportions of the outcome variable (there's a criterion to measure the degree of bad behavior), I
2004 May 12
1
Random Forest with highly imbalanced data
Hi group, I am trying to do a RF with approx 250,000 cases. My objective is to determine the risk factors of a person being readmitted to hospital (response=1) or else (response=0). Only 10%, or 25,000 cases, were readmitted. I've heard about the down-sampling and class weight approaches and am wondering if R can do it. Even some reference to articles will help. From the statistical point
2007 Feb 05
0
random forest proximities
Good Day, I'm using the randomForest package to perform a classification. If I supply weights to the optional classwt argument, are proximity values computed as a weighted average? I understand that the forest will possibly change as a function of the particular weights I supply. Thanks in advance. Mike Michael Fugate Los Alamos National Laboratory Mail Stop MS-F600, Los Alamos, NM 87545 (505) 667-0398
2010 May 05
1
What is the default nPerm for regression in randomForest?
Could not find it in ?randomForest. Thank you for your help! -- Dimitri Liakhovitski Ninah.com Dimitri.Liakhovitski at ninah.com
2012 Oct 17
0
How to optimize or build a better random forest?
...sibsp pclass2 pclass3 sexmale "factor" "numeric" "integer" "factor" "factor" "factor" > sapply(split(train,train$survived),function(x) dim(x)[1]) 0 1 549 342 > rf <- randomForest(train[,-1], train[,1], ntree=10000,classwt=c(549/891,342/891),importance=TRUE,do.trace=FALSE) OOB estimate of error rate: 17.73% Confusion matrix: 0 1 class.error 0 500 49 0.08925319 1 109 233 0.31871345 [[alternative HTML version deleted]]
2012 Mar 03
0
Strategies to deal with unbalanced classification data in randomForest
...ate. This approach I've mostly drawn from here: ## http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm#balance ## This might not be appropriate, however, as of September it looks like Breiman's method wasn't used in R df.rf.weights<-randomForest(cls~var1+var2+var3, data=df,classwt=c(1, 600), importance=TRUE) ## Nevertheless, what I am concerned about is the effect an unbalanced data set has on my randomForest model ## For example: par(mfrow=c(1,3)) plot(df.rf) plot(df.rf.downsamp) plot(df.rf.weights) presents three very different scenarios and I am having trouble resolvin...
2003 Aug 20
2
RandomForest
Hello, When I plot or look at the error rate vector for a random forest (rf$err.rate), it looks like a descending function, except that the first few points of the vector have error rates lower (sometimes much lower) than the general level the error rates reach for a forest with that number of trees once they stop descending. Does it mean that there is a tree(s) (that is built the first in
2004 Nov 04
4
highly biased PCA data?
Hello, supposing that I have two or three clear categories for my data, let's say pet preference across fish, cat, dog. Let's say most people rate their preference as being mostly one of the categories. I want to do pca on the data to see three 'groups' of people, one group for fish, one for cat and one for dog. I would like to see the odd person who likes both or all three in the