similar to: imbalanced classes

Displaying 20 results from an estimated 300 matches similar to: "imbalanced classes"

2004 May 12
1
Random Forest with highly imbalanced data
Hi group, I am trying to fit an RF with approx 250,000 cases. My objective is to determine the risk factors of a person being readmitted to hospital (response=1) or else (response=0). Only 10%, or 25,000 cases, were readmitted. I've heard about the down-sampling and class-weight approaches and am wondering if R can do it. Even some references to articles will help. From the statistical point
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be used in conjunction. If "strata" is not specified, the class labels will be used.
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
Sorry for the repost, but I've really been looking, and can't find any syntax direction on this issue... Just browsing the documentation, and searching the list came up short... I have some unbalanced data and was wondering if, in a "0" v "1" classification forest, some combo of these options might yield better predictions when the proportion of one class is low (less
2011 Sep 13
1
class weights with Random Forest
Hi All, I am looking for a reference that explains how the randomForest function in the randomForest package uses the classwt parameter. Here: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html Andy Liaw suggests not using classwt. And according to: http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html it has "not been implemented" as of 2007.
2007 Jan 28
2
help with RandomForest classwt option
Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". Checking rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford. I did not find the down sampling option. I am wondering if you have any experience dealing with this problem. Do you
2005 Oct 25
0
Examples of "classwt", "strata", and "sampsize" in randomForest?
Just browsing the documentation, and searching the list came up short... I have some unbalanced data and was wondering if, in a "0" v "1" classification forest, these options might yield better predictions when the proportion of one class is low (less than 10% in a sample of 2,000 observations). Not sure how to specify these terms... from the docs, we have: classwt: Priors
2008 Mar 09
1
sampsize in Random Forests
Hi all, I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site. Each study site is coded with a number ranging between 1-100. This information is stored in the vector studySites. I want to run randomForests using stratified sampling, so I chose the option strata = factor(studySites) But I am not sure how to control the number of
2006 Nov 13
1
random forest regression
Dear all, I am doing a regression in randomForest, using the option "sampsize" to reduce the number of records used to produce the randomForest object. The manual says "For classification, if sampsize is a vector of the length the number of strata, then sampling is stratified by strata, and the elements of sampsize indicate the numbers to be drawn from the strata". I need my
2009 May 21
1
Need help on plotting Histograms
This is the command I made for a normal distribution, but when I try to plot the histograms, I don't know why the bars don't stick on the line... nsamples<-1000 sampsize<-15 Samples<-matrix(rnorm(nsamples*sampsize,0,1),nrow=nsamples) a<-apply(Samples,1,var) NC14<-a*14 x<-0:40 plot(x,dchisq(x,14),type='h') hist(NC14,freq=F,add=T)
2012 Mar 03
0
Strategies to deal with unbalanced classification data in randomForest
Hello all, I have become somewhat confused with the options available for dealing with a highly unbalanced data set (10000 in one class, 50 in the other). As a summary I am unsure: a) if I am performing the two class weighting methods properly, b) if the data are too unbalanced and that this type of analysis is appropriate and c) if there is any interaction between the weighting for class imbalances
2012 Jul 21
1
GOstats: get genes for corresponding enriched GO term
Hi, I used GOstats to perform an enrichment test on a set of genes (20). There are 7 GO terms with a p-value less than the cutoff, which are therefore shown in the result table. How can I find out which gene in the input gene set belongs to which of these enriched GO terms? Thanks for any comments. best, Tim
2003 Nov 21
3
speeding up a pairwise correlation calculation
Hi, I have a data.frame with 294 columns and 211 rows. I am calculating correlations between all pairs of columns (excluding column 1) and based on these correlation values I delete one column from any pair that shows an R^2 greater than a cutoff value. (Rather than directly deleting the column, all I do is store the column number and do the deletion later.) The code I am using is: ndesc
2004 Jan 20
1
random forest question
Hi, here are three results of random forest (version 4.0-1). The results seem to be more or less the same which is strange because I changed the classwt. I hoped that for example classwt=c(0.45,0.1,0.45) would result in fewer cases classified as class 2. Did I understand something wrong? Christian x1rf <- randomForest(x=as.data.frame(mfilters[cvtrain,]),
2013 Feb 13
2
CARET: Any way to access other tuning parameters?
The documentation for caret::train shows a list of parameters that one can tune for each classification/regression method. For example, for the method randomForest one can tune mtry in the call to train. But the function call to train random forests in the original package has many other parameters, e.g. sampsize, maxnodes, etc. Is there **any** way to access these parameters using train
2006 Feb 06
1
Classification of Imbalanced Data
Hi, I'm looking to perform a classification analysis on an imbalanced data set using random Forest and I'd like to reproduce the weighted random forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random Forest to Learn Imbalanced Data"; can I use the R package randomForest to perform such analysis? What is the easiest way to accomplish this task? Thanks,
2009 Sep 24
3
pipe data from plot(). was: ROCR.plot methods, cross validation averaging
All, I'm trying again with a slightly more generic version of my first question. I can extract the plotted values from hist(), boxplot(), and even plot.randomForest(). Observe: # get some data dat <- rnorm(100) # grab histogram data hdat <- hist(dat) hdat #provides details of the hist output #grab boxplot data bdat <- boxplot(dat) bdat #provides details of the boxplot
2013 May 14
0
need help for Imbalanced classification problems!!!
Hi all, I am facing an imbalanced classification problem. That means I have a dataset in which the ratio of majority data to minority data is 100:1 (or more). In addition, there are many independent variables and this is a binary classification question. The model I built gives poor predictive power for the minority data, but for the majority data the predictions seem to be overfitting. Could you
2011 Nov 03
1
non-parametric sample size calculation
Hi, I am trying to estimate the sample size needed for the comparison of two groups on a certain measurement, given some previous data at hand. I find that the data collected does not follow a normal distribution, so I would like to use a non-parametric option for sample size calculation. I found the pwr package but I don't think it has this option and on the internet found that
2010 Jul 20
1
Random Forest - Strata
Hi all, I have struggled to get "strata" in randomForest to work for this. Can I get randomForest, for each of its trees, to use ALL samples from some strata to build the tree, while leaving some strata TOTALLY untouched as oob? E.g. in the example below, how can I tell RF to, - for tree 1 in the forest, use only Site A and B to build the tree, while using the WHOLE Site C data for the oob error
2002 Apr 16
0
lowpass recommendations?
A while ago someone asked about a low-pass filter for oggenc and was told to get AFsp and filter outside of Oggenc. Well, I got it, and am totally lost (It's way more complicated than SOX) so now can anyone briefly describe what type of filter I should set up (FIR, IIR, all-pole), why one is better than the other, and if you have filter coefficient files lying around (lowpass, 19 or 20 kHz