thr3ads.net - search: "sampsiz"

2008 Mar 09

1

sampsize in Random Forests

...1-100. This information is stored in the vector studySites. I want to run randomForest using stratified sampling, so I chose the option strata = factor(studySites), but I am not sure how to control the number of samples taken from each study site. I tried to use 10 points from each study site: mySampSize = rep(10, 100). So my function call looks like: RF = randomForest(myClass~., data=myData, mtry=5, importance=TRUE, strata = factor(studySites), sampsize=mySampSize). But randomForest gives me the following error: Error in randomForest.default(m, y, ...) : sampsize can not be larger than class freq...
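The error quoted in this post is consistent with at least one study site having fewer than 10 rows, since each element of sampsize is checked against the size of its stratum. A minimal sketch of one possible workaround, assuming simulated data in place of the poster's myData and studySites, caps the per-site draw at the number of rows each site actually has:

```r
# Simulated stand-in for the poster's site labels (assumption: up to 100
# sites with uneven numbers of rows).
set.seed(1)
studySites <- sample(1:100, size = 1500, replace = TRUE,
                     prob = runif(100, 0.1, 1))
strata <- factor(studySites)

# Rows available in each stratum, in the order of levels(strata).
siteCounts <- table(strata)

# Ask for 10 rows per site, but never more than the site actually has.
mySampSize <- pmin(as.integer(siteCounts), 10L)

# Sites that would be too small for a flat request of 10 rows each.
smallSites <- names(siteCounts)[siteCounts < 10]

# With the randomForest package installed, the call would then look like:
# RF <- randomForest(myClass ~ ., data = myData, mtry = 5, importance = TRUE,
#                    strata = strata, sampsize = mySampSize)
```

The vector has one entry per level of the strata factor, so a site with no rows at all does not get an entry.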

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27

1

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be used in conjunction. If "strata" is not specified, the class labels will be used. Take the iris data as an example: randomForest(Species ~ ., iris, sampsize=c(10, 30, 10)) says to randomly draw 10, 30 and 10 from the three species (with replacement)...
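To make the iris example concrete, here is a base-R sketch of what one such stratified draw could look like. It only mimics the sampling described in the post (10, 30 and 10 rows from the three species, with replacement); it is not randomForest's internal code, and the variable names are illustrative.

```r
# One stratified bootstrap draw from iris, done by hand in base R.
set.seed(42)
sampsize <- c(setosa = 10, versicolor = 30, virginica = 10)

# Row indices of iris, grouped by species.
rowsBySpecies <- split(seq_len(nrow(iris)), iris$Species)

# Draw the requested number of rows from each species, with replacement.
drawn <- unlist(lapply(names(sampsize), function(sp) {
  idx <- rowsBySpecies[[sp]]
  idx[sample.int(length(idx), sampsize[[sp]], replace = TRUE)]
}), use.names = FALSE)

# Class composition of the rows one tree would be grown on.
drawCounts <- table(iris$Species[drawn])
```

Each tree in the forest would see a fresh draw of this kind, so versicolor is deliberately over-represented relative to the other two species.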

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27

1

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

...tions). Not sure how to specify these terms... From the docs, we have: classwt: Priors of the classes. Need not add up to one. Ignored for regression. So is this something like "... classwt=c(.90,.10)"? I didn't see the syntax demonstrated. Similar for "strata" and "sampsize", though there is a default for sampsize that makes sense... Not sure how you would make "a vector of the length the number of strata", however... Pointers? -- David L. Van Brunt, Ph.D. mailto:dlvanbrunt@gmail.com

imbalanced classes

2006 Jan 25

1

imbalanced classes

...vely low, 28 in class 1 and 9 in class 2. I'd really like to use the R environment to analyze this data; however, I'm finding it difficult to put much trust in the results of my analysis. As you've stated, the classwt variables do not do much, and I've tried working with the cutoff and sampsize variables as well, with limited success in balancing error rates between the two classes. It was unclear to me how to use the cutoff parameter correctly. If you have any recommendations here, they would be appreciated. Additionally, with the sampsize variable, I have tried a few values, for examp...

Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 25

0

Examples of "classwt", "strata", and "sampsize" in randomForest?

...tions). Not sure how to specify these terms... From the docs, we have: classwt: Priors of the classes. Need not add up to one. Ignored for regression. So is this something like "... classwt=c(.90,.10)"? I didn't see the syntax demonstrated. Similar for "strata" and "sampsize", though there is a default for sampsize that makes sense... Not sure how you would make "a vector of the length the number of strata", however... Pointers? -- David L. Van Brunt, Ph.D. mailto:dlvanbrunt@gmail.com

random forest regression

2006 Nov 13

1

random forest regression

Dear all, I am doing a regression in randomForest, using the option "sampsize" to reduce the number of records used to produce the randomForest object. The manual says "For classification, if sampsize is a vector of the length the number of strata, then sampling is stratified by strata, and the elements of sampsize indicate the numbers to be drawn from the strata"...

Sample size calculations for one sided binomial exact test

2011 Nov 01

1

Sample size calculations for one sided binomial exact test

I'm trying to compute sample size requirements for a binomial exact test. We want to show that the proportion is at least 90%, assuming that it is 95%, with 80% power, so any asymptotic approximations are out of the question. I was planning on using binom.test to perform the simple test against a prespecified value, but cannot find any functions for computing sample size. Do any exist?
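One way to approach this question without asymptotics is a direct search over n using the exact binomial distribution. The sketch here is only an illustration: the function name exact_binom_n is hypothetical, and alpha = 0.05 is an assumption, since the post does not state a significance level.

```r
# Smallest n for a one-sided exact binomial test of H0: p <= p0 vs H1: p > p0.
# alpha = 0.05 is an assumption; the post gives only p0, p1 and the power.
exact_binom_n <- function(p0 = 0.90, p1 = 0.95, alpha = 0.05,
                          power = 0.80, n_max = 2000) {
  for (n in seq_len(n_max)) {
    # Reject H0 when X > crit; qbinom gives P(X > crit | p0) <= alpha.
    crit <- qbinom(1 - alpha, n, p0)
    size <- pbinom(crit, n, p0, lower.tail = FALSE)  # actual type I error
    pow  <- pbinom(crit, n, p1, lower.tail = FALSE)  # power under p1
    if (pow >= power) {
      return(list(n = n, reject_if_x_greater_than = crit,
                  size = size, power = pow))
    }
  }
  stop("no n <= n_max reaches the requested power")
}

res <- exact_binom_n()
```

Because the binomial is discrete, exact power is not monotone in n, so the first n that reaches the target does not guarantee that every larger n does as well. Checking a range of n around the returned value is a sensible precaution.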

Need help on plotting Histograms

2009 May 21

1

Need help on plotting Histograms

This is the code I wrote for a normal distribution, but when I try to plot the histogram, I don't know why the bars don't stick to the line... nsamples <- 1000; sampsize <- 15; Samples <- matrix(rnorm(nsamples*sampsize, 0, 1), nrow=nsamples); a <- apply(Samples, 1, var); NC14 <- a*14; x <- 0:40; plot(x, dchisq(x, 14), type='h'); hist(NC14, freq=F, add=T)
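A possible reworking of the code in this post, offered as a sketch rather than the list's actual answer: draw the histogram on the density scale first, then overlay the chi-square density as a smooth curve instead of vertical lines at integer x values. The seed, the number of breaks and the axis limits are arbitrary choices.

```r
set.seed(123)
nsamples <- 1000
sampsize <- 15

# Each row is one sample of size 15 from N(0, 1).
Samples <- matrix(rnorm(nsamples * sampsize, mean = 0, sd = 1), nrow = nsamples)

# (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 df.
NC14 <- apply(Samples, 1, var) * (sampsize - 1)

# Histogram on the density scale first, then the theoretical density on top.
hist(NC14, freq = FALSE, breaks = 30, xlim = c(0, 40),
     main = "14 * sample variance", xlab = "NC14")
curve(dchisq(x, df = sampsize - 1), from = 0, to = 40, add = TRUE, lwd = 2)
```

With both layers on the density scale, the bar heights and the curve are directly comparable.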

randomForest

2009 Mar 20

2

randomForest

Hi! I am working with random forests in R. Is there a way to sample a fixed number of rows from a dataset for use with different trees in a random forest? To be clearer, my data set contains 1500 rows, and I am growing 500 trees in Random Forest. Is it possible to sample only 500 rows of data from the data set and use them for different trees in the forest? I mean each tree of the forest should use

help with RandomForest classwt option

2007 Jan 28

2

help with RandomForest classwt option

Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". Judging by rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford, but I did not find a down-sampling option. I am wondering if you have any experience dealing with this problem. Do you

randomForest 4.3-0 released

2004 Jul 08

0

randomForest 4.3-0 released

...have been changed: partial.plot -> partialPlot var.imp.plot -> varImpPlot var.used -> varUsed * There is a new option `replace' in randomForest() (default to TRUE) indicating whether the sampling of cases is with or without replacement. * In randomForest(), the `sampsize' option now works for both classification and regression, and indicates the number of cases to be drawn to grow each tree. For classification, if sampsize is a vector of length the number of classes, then sampling is stratified by class. * With the formula interface for randomForest(),...

cor(X) with P-Value

2005 Jul 23

2

cor(X) with P-Value

Friends, I am new to R (and statistics) so am struggling a bit. Briefly... I am interested in getting the P-Value from cor(X) where X is a matrix. I have found cor.test. Verbosely... I have 4 vectors and can generate the correlation matrix... > cor(cbind(X1, X2, X3, X4)) X1 X2 X3 X4 X1 1.00000000 -0.06190365 -0.156972795 0.182547517 X2
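Since cor.test works on one pair of vectors at a time, one simple approach to this question is to loop over the column pairs and collect the p-values in a matrix shaped like the output of cor(). This is a minimal sketch: the helper name cor_pvalues is hypothetical, and the data are simulated placeholders rather than the poster's X1 to X4.

```r
# Pairwise correlation p-values for the columns of a numeric matrix,
# obtained by calling cor.test() on each pair of columns.
cor_pvalues <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  P <- matrix(NA_real_, p, p, dimnames = list(colnames(X), colnames(X)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      P[i, j] <- P[j, i] <- cor.test(X[, i], X[, j])$p.value
    }
  }
  P
}

# Simulated placeholder data standing in for the four vectors.
set.seed(7)
X <- cbind(X1 = rnorm(30), X2 = rnorm(30), X3 = rnorm(30), X4 = rnorm(30))
R <- cor(X)          # correlation matrix
P <- cor_pvalues(X)  # matching matrix of p-values (diagonal left as NA)
```

With several pairs tested at once, the p-values are unadjusted; p.adjust() can be applied to the upper triangle if a multiple-comparison correction is wanted.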

IMPORTANT!!!! PLEASE HELP ME

2012 Nov 24

6

IMPORTANT!!!! PLEASE HELP ME

Hi, I want to generate 10000 samples from a normal distribution, with replacement, where every sample has size 50. What should I do?

Problems with randomly generating samples

2009 May 13

2

Problems with randomly generating samples

Dear R users, Can anyone please tell me how to generate a large number of samples in R, given a certain distribution and size? For example, if I want to generate 1000 samples of size n=100 from a N(0,1) distribution, how should I proceed? (I don't want to run "rnorm(100,0,1)" in R 1000 times.) Thanks for your help, Debbie
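A common way to do what this post asks is to call rnorm() once and reshape the result into a matrix with one sample per row. The sketch uses illustrative variable names and an arbitrary seed.

```r
set.seed(2009)
n_samples <- 1000
n <- 100

# One call to rnorm(), reshaped so that each row is one sample of size n.
samples <- matrix(rnorm(n_samples * n, mean = 0, sd = 1), nrow = n_samples)

# Per-sample statistics computed row by row.
sample_means <- rowMeans(samples)
sample_sds <- apply(samples, 1, sd)

# Equivalent alternative: replicate() returns one sample per column.
samples_by_col <- replicate(n_samples, rnorm(n))
```

The matrix form is convenient when the same statistic is needed for every sample, while replicate() generalises easily to other distributions by swapping the expression that is repeated.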

pipe data from plot(). was: ROCR.plot methods, cross validation averaging

2009 Sep 24

3

pipe data from plot(). was: ROCR.plot methods, cross validation averaging

...ion 3. If my cross validation data happen to have a list entry whose length = 2, ROCR errors out. Please see the second part of my example. Any suggestions? #reproducible examples exemplifying my questions ##part one## library(ROCR) data(ROCR.xval) # set up data so it looks more like my real data sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25) testSet <- ROCR.xval # do the extraction for (i in 1:length(ROCR.xval[[1]])){ y <- sample(c(1:350),sampSize[i]) testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y] testSet$labels[[i]] <- ROCR.xval$labels[[i]][y] } # now...

CARET: Any way to access other tuning parameters?

2013 Feb 13

2

CARET: Any way to access other tuning parameters?

...on for caret::train shows a list of parameters that one can tune for each classification/regression method. For example, for the method randomForest one can tune mtry in the call to train. But the function call to train random forests in the original package has many other parameters, e.g. sampsize, maxnodes, etc. Is there **any** way to access these parameters using train in caret? (Is the function caret::createGrid limited to the list of parameters specified in the caret documentation? It's not super clear if the list of parameters is for all the caret APIs.) Thanks, James

clara - memory limit

2005 Aug 03

3

clara - memory limit

Dear all, I'm trying to estimate clusters from a very large dataset using clara but the program stops with a memory error. The (very simple) code and the error: mydata<-read.dbf(file="fnorsel_4px.dbf") my.clara.7k<-clara(mydata,k=7) >Error: cannot allocate vector of size 465108 Kb The dataset contains >3,000,000 rows and 15 columns. I'm using a windows computer

Random Forest with highly imbalanced data

2004 May 12

1

Random Forest with highly imbalanced data

Hi group, I am trying to do a RF with approx 250,000 cases. My objective is to determine the risk factors of a person being readmitted to hospital (response=1) or else (response=0). Only 10%, or 25,000 cases were readmitted. I've heard about down-sampling and class weight approach and am wondering if R can do it. Even some reference to articles will help. From the statistical point
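Down-sampling of the kind asked about here can be expressed through the sampsize option, by drawing the same number of rows from each class for every tree. The sketch assumes simulated data with hypothetical predictors x1 and x2 and a rare positive class, far smaller than the 250,000 cases in the post. The randomForest call only runs if that package is installed.

```r
# Simulated placeholder data: two hypothetical predictors and a rare
# positive class (the real data set is not available here).
set.seed(2004)
n <- 5000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$response <- factor(rbinom(n, 1, plogis(-2.6 + dat$x1)), levels = c(0, 1))

# Draw the same number of rows from each class for every tree:
# every class is capped at the size of the rarest class.
n_min <- min(table(dat$response))
balanced_sampsize <- rep(n_min, nlevels(dat$response))

if (requireNamespace("randomForest", quietly = TRUE)) {
  # With no strata argument, sampling is stratified by the class labels.
  rf <- randomForest::randomForest(response ~ x1 + x2, data = dat,
                                   ntree = 100,
                                   sampsize = balanced_sampsize)
  print(rf$confusion)
}
```

Smaller per-class draws than n_min are also possible and make each tree cheaper to grow, at the cost of using less of the minority class per tree.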

non-parametric sample size calculation

2011 Nov 03

1

non-parametric sample size calculation

...ous data at hand. I find that the data collected does not follow a normal distribution, so I would like to use a non-parametric option for sample size calculation. I found the pwr package but I don't think it has this option and on the internet found that http://www.epibiostat.ucsf.edu/biostat/sampsize.html says only PASS allows non-parametric sample size calculations (although the webpage is not updated). Any help would be greatly appreciated Thanks, Dave
