Displaying 20 results from an estimated 50 matches for "sampsiz".
2008 Mar 09
1
sampsize in Random Forests
...1-100. This information is stored
in the vector studySites.
I want to run randomForests using stratified sampling, so I chose the option
strata = factor(studySites)
But I am not sure how to control the number of samples taken from each
study site. I tried to use 10 points from each study site:
mySampSize = rep(10, 100)
So my function call looks like:
RF = randomForest(myClass~., data=myData, mtry=5, importance=TRUE,
strata = factor(studySites), sampsize=mySampSize)
But randomForest gives me the following error:
Error in randomForest.default(m, y, ...) :
sampsize can not be larger than class freq...
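The error above is consistent with at least one study site having fewer than 10 points. A minimal sketch of one workaround follows, assuming the check behind the message compares each element of sampsize with the frequency of its stratum. The simulated studySites vector is a hypothetical stand-in for the poster's data, and the final call is left commented out because myData is not available here.

```r
# Hypothetical stand-in for the poster's data: 100 study sites of uneven size.
set.seed(1)
studySites <- sample(1:100, size = 2000, replace = TRUE,
                     prob = runif(100, 0.1, 1))
strata <- factor(studySites)

# Cap the requested 10 points per site at each site's actual frequency,
# so that no element of sampsize exceeds the size of its stratum.
siteFreq   <- table(strata)
mySampSize <- pmin(10, as.integer(siteFreq))
names(mySampSize) <- names(siteFreq)

# Sites that could not supply the full 10 points.
names(mySampSize)[mySampSize < 10]

# The poster's call would then become (not run; myData is the poster's object):
# RF <- randomForest(myClass ~ ., data = myData, mtry = 5, importance = TRUE,
#                    strata = strata, sampsize = mySampSize)
```

Capping keeps every site in the sample; dropping very small sites is another option if they should not influence the trees.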
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work
too well. (It's what was in version 3.x of the original Fortran code by
Breiman and Cutler, not the one in the new Fortran code.) I'd advise
against using it.
"sampsize" and "strata" can be used in conjunction. If "strata" is not
specified, the class labels will be used. Take the iris data as an example:
randomForest(Species ~ ., iris, sampsize=c(10, 30, 10))
says to randomly draw 10, 30 and 10 from the three species (with
replacement)...
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
...tions).
Not sure how to specify these terms... from the docs, we have:
classwt: Priors of the classes. Need not add up to one. Ignored for
regression.
So is this something like "... classwt=c(.90,.10)" ? I didn't see the syntax
demonstrated. Similar for "strata" and "sampsize" though there is a default
for sampsize that makes sense... not sure how you would make "a vector of
the length the number of strata", however....
Pointers?
--
---------------------------------------
David L. Van Brunt, Ph.D.
mailto:dlvanbrunt@gmail.com
--
-----------------------...
2006 Jan 25
1
imbalanced classes
...vely low, 28 in class 1 and 9
in class 2. I'd really like to use R environment to analyze this data,
however I'm finding it difficult to put much trust in the results of
my analysis. As you've stated, the classwt variables do not do much,
and I've tried working with the cutoff and sampsize variables as
well, with limited success in balancing error rates between the two
classes.
It was unclear to me how to use the cutoff parameter correctly. If
you have any recommendations here, it would be appreciated.
Additionally with the sampsize variable, I have tried a few values,
for examp...
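As a rough illustration of what the cutoff argument does, the randomForest documentation describes the winning class as the one with the largest ratio of vote proportion to cutoff. The sketch below reproduces that rule in base R; the vote matrix and the helper predict_with_cutoff are hypothetical and are not part of the package.

```r
# Hypothetical vote fractions for three cases (rows) over two classes (columns).
votes <- rbind(c(0.80, 0.20),
               c(0.55, 0.45),
               c(0.20, 0.80))
colnames(votes) <- c("1", "2")

# Illustrative helper: pick, for each case, the class with the largest
# ratio of vote fraction to cutoff.
predict_with_cutoff <- function(votes, cutoff) {
  stopifnot(length(cutoff) == ncol(votes))
  ratio <- sweep(votes, 2, cutoff, "/")
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

# Default cutoff (1/k for k classes) is a plain majority vote.
majority <- predict_with_cutoff(votes, c(0.5, 0.5))

# Raising the cutoff for class 1 makes class 2 easier to predict.
shifted <- predict_with_cutoff(votes, c(0.7, 0.3))

print(majority)
print(shifted)
```

With the shifted cutoff, the second case moves from class 1 to class 2, which is the kind of trade-off between per-class error rates the poster was after.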
2005 Oct 25
0
Examples of "classwt", "strata", and "sampsize" in randomForest?
...tions).
Not sure how to specify these terms... from the docs, we have:
classwt: Priors of the classes. Need not add up to one. Ignored for
regression.
So is this something like "... classwt=c(.90,.10)" ? I didn't see the syntax
demonstrated. Similar for "strata" and "sampsize" though there is a default
for sampsize that makes sense... not sure how you would make "a vector of
the length the number of strata", however....
Pointers?
--
---------------------------------------
David L. Van Brunt, Ph.D.
mailto:dlvanbrunt@gmail.com
[[alternative HTML versio...
2006 Nov 13
1
random forest regression
Dear all,
I am doing a regression in randomForest, using the option "sampsize" to reduce
the number of records used to produce the randomForest object.
The manual says "For classification, if sampsize is a vector of the length
the number of strata, then sampling is stratified by strata, and the
elements of sampsize indicate the numbers to be drawn from the strata"...
2011 Nov 01
1
Sample size calculations for one sided binomial exact test
I'm trying to compute sample size requirements for a binomial exact test.
We want to show that the proportion is at least 90%, assuming that it is
95%, with 80% power, so any asymptotic approximations are out of the
question. I was planning on using binom.test to perform the simple test
against a prespecified value, but cannot find any functions for computing
sample size. Do any exist?
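One way to approach this in base R is a direct search over n using the exact binomial distribution. The sketch below is only an illustration: the one-sided level alpha = 0.05 is an assumption (the post does not state it), and exact_binom_design is a hypothetical helper name. It tests H0: p <= 0.90 against the alternative p = 0.95.

```r
# Illustrative helper: for a given n, find the exact one-sided rejection
# threshold and the resulting size and power.
exact_binom_design <- function(n, p0 = 0.90, p1 = 0.95, alpha = 0.05) {
  # Smallest count k with P(X >= k | p0) <= alpha; reject H0 when X >= k.
  k <- qbinom(1 - alpha, n, p0) + 1
  size  <- pbinom(k - 1, n, p0, lower.tail = FALSE)  # actual type I error
  power <- pbinom(k - 1, n, p1, lower.tail = FALSE)  # P(reject | p1)
  c(n = n, k = k, size = size, power = power)
}

# Search a grid of candidate sample sizes (the range is an assumption).
ns <- 10:1000
designs <- t(sapply(ns, exact_binom_design))

# Smallest n that reaches 80% power.
ok <- designs[designs[, "power"] >= 0.80, , drop = FALSE]
smallest <- ok[1, ]
print(smallest)
```

Because the binomial distribution is discrete, exact power is not monotone in n, so it may be worth inspecting the rows of designs just above the smallest qualifying n before settling on a value.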
2009 May 21
1
Need help on plotting Histograms
This is the command I wrote for a normal distribution, but when I try to plot
the histogram, I don't know why the bars don't follow the line...
nsamples<-1000
sampsize<-15
Samples<-matrix(rnorm(nsamples*sampsize,0,1),nrow=nsamples)
a<-apply(Samples,1,var)
NC14<-a*14
x<-0:40
plot(x,dchisq(x,14),type='h')
hist(NC14,freq=F,add=T)
--
View this message in context: http://www.nabble.com/Need-help-on-ploting-Histograms-tp23652178p23652178.html...
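A possible variant of the code above, offered as a sketch rather than the list's answer: draw the density-scaled histogram first and overlay the chi-square density as a smooth curve, instead of drawing spikes at the integers with type='h'. The seed and the number of breaks are arbitrary choices.

```r
set.seed(42)
nsamples <- 1000
sampsize <- 15

# Each row is one sample of size 15 from N(0, 1).
Samples <- matrix(rnorm(nsamples * sampsize, 0, 1), nrow = nsamples)

# (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 df.
NC14 <- apply(Samples, 1, var) * (sampsize - 1)

# Histogram on the density scale, then the theoretical density on top.
hist(NC14, freq = FALSE, breaks = 30,
     main = "Scaled sample variances", xlab = "(n - 1) * s^2")
curve(dchisq(x, df = sampsize - 1), from = 0, to = 40, add = TRUE)
```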
2009 Mar 20
2
randomForest
Hi!
I am dealing with random forest using R.
Is there a way to sample a fixed number of rows from a dataset for use with
different trees in a random forest?
To be more clear, my data set contains 1500 rows, and I am growing 500 trees
in the random forest.
Is it possible to sample only 500 rows of data from the data set and use them
for different trees in the forest? I mean each tree of the forest should use
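If the goal is for every tree to be grown on its own random draw of 500 rows, a scalar sampsize appears to cover it. The sketch below assumes the randomForest package is installed and uses a simulated 1500-row data set as a hypothetical stand-in for the poster's data; the call is skipped when the package is missing.

```r
# Hypothetical 1500-row data set with a two-level response.
set.seed(1)
n <- 1500
myData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
myData$y <- factor(ifelse(myData$x1 + rnorm(n) > 0, "a", "b"))

if (requireNamespace("randomForest", quietly = TRUE)) {
  # Each of the 500 trees is grown on 500 rows drawn without replacement.
  rf <- randomForest::randomForest(y ~ ., data = myData, ntree = 500,
                                   sampsize = 500, replace = FALSE)
  print(rf)
}
```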
2007 Jan 28
2
help with RandomForest classwt option
Hello there,
I am working on an extremely unbalanced two-class classification problem. I
want to use "classwt" together with "down sampling". By checking the rfNews()
in R, it looks that classwt is not working yet. Then I looked at the
software from Salford. I did not find the down sampling option. I am
wondering if you have any experience to deal with this problem. Do you
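Down-sampling of the majority class can be sketched with stratified sampsize alone, without classwt. The example below is an assumption-laden illustration: the data are simulated with roughly 10% positives, the column names are hypothetical, and the randomForest call only runs if the package is installed.

```r
# Hypothetical unbalanced two-class data (about 10% in class "1").
set.seed(1)
n <- 5000
dat <- data.frame(response = factor(rbinom(n, 1, 0.10), levels = c(0, 1)),
                  x1 = rnorm(n),
                  x2 = rexp(n, rate = 1 / 5))

# Draw the same number of cases from each class for every tree:
# the size of the smaller class.
classFreq <- table(dat$response)
balanced  <- rep(min(classFreq), nlevels(dat$response))

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(response ~ ., data = dat, ntree = 200,
                                   strata = dat$response,
                                   sampsize = balanced)
  print(rf$confusion)
}
```

Smaller values than the minority class size are also allowed and give each tree a different subset of the rare class.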
2004 Jul 08
0
randomForest 4.3-0 released
...have been changed:
partial.plot -> partialPlot
var.imp.plot -> varImpPlot
var.used -> varUsed
* There is a new option `replace' in randomForest() (default to TRUE)
indicating whether the sampling of cases is with or without
replacement.
* In randomForest(), the `sampsize' option now works for both
classification and regression, and indicates the number of cases to be
drawn to grow each tree. For classification, if sampsize is a vector of
length the number of classes, then sampling is stratified by class.
* With the formula interface for randomForest(),...
2005 Jul 23
2
cor(X) with P-Value
Friends
I am new to R (and statistics) so am struggling a bit.
Briefly...
I am interested in getting the P-Value from cor(X) where X is a matrix.
I have found cor.test.
Verbosely...
I have 4 vectors and can generate the correlation matrix...
> cor(cbind(X1, X2, X3, X4))
X1 X2 X3 X4
X1 1.00000000 -0.06190365 -0.156972795 0.182547517
X2
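One base-R way to get a matrix of p-values to sit alongside cor(X) is to call cor.test on every pair of columns. The sketch below uses simulated vectors as stand-ins for X1 to X4, and cor_pvalues is a hypothetical helper name, not an existing function.

```r
# Illustrative helper: p-value of cor.test for each pair of columns.
cor_pvalues <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  pv <- matrix(NA_real_, p, p,
               dimnames = list(colnames(X), colnames(X)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # Same p-value in both triangles; the diagonal stays NA.
      pv[i, j] <- pv[j, i] <- cor.test(X[, i], X[, j])$p.value
    }
  }
  pv
}

# Simulated stand-ins for the poster's four vectors.
set.seed(7)
X <- cbind(X1 = rnorm(30), X2 = rnorm(30), X3 = rnorm(30), X4 = rnorm(30))

cor(X)
pv <- cor_pvalues(X)
print(pv)
```

With several pairs tested at once, p.adjust can be applied to the upper triangle if a multiple-testing correction is wanted.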
2012 Nov 24
6
IMPORTANT!!!! PLEASE HELP ME
Hi,
I want to generate 10000 samples from a normal distribution, with replacement,
each of size 50. What should I do?
--
View this message in context: http://r.789695.n4.nabble.com/IMPORTANT-PLEASE-HELP-ME-tp4650676.html
Sent from the R help mailing list archive at Nabble.com.
2009 May 13
2
Problems with randomly generating samples
Dear R users,
Can anyone please tell me how to generate a large number of samples in R, given a certain distribution and size?
For example, if I want to generate 1000 samples of size n=100 from a N(0,1) distribution, how should I proceed?
(Since I don't want to run "rnorm(100,0,1)" in R 1000 times.)
Thanks for help
Debbie
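A common pattern for this is to draw all the values in one call and arrange them in a matrix with one sample per row. The sketch below follows the sizes in the question; the seed is arbitrary.

```r
set.seed(123)
nsamples <- 1000   # number of samples
n <- 100           # size of each sample

# One call to rnorm, reshaped so that each row is one N(0, 1) sample.
samples <- matrix(rnorm(nsamples * n, mean = 0, sd = 1),
                  nrow = nsamples, ncol = n)

# Per-sample statistics are then row-wise operations.
sampleMeans <- rowMeans(samples)
summary(sampleMeans)
```

replicate(nsamples, rnorm(n)) gives the same kind of result with one sample per column instead.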
2009 Sep 24
3
pipe data from plot(). was: ROCR.plot methods, cross validation averaging
...ion 3. If my cross validation data happen to have a list entry whose
length = 2, ROCR errors out. Please see the second part of my example.
Any suggestions?
#reproducible examples exemplifying my questions
##part one##
library(ROCR)
data(ROCR.xval)
# set up data so it looks more like my real data
sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
testSet <- ROCR.xval
# do the extraction
for (i in 1:length(ROCR.xval[[1]])){
y <- sample(c(1:350),sampSize[i])
testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
}
# now...
2013 Feb 13
2
CARET: Any way to access other tuning parameters?
...on for caret::train shows a list of parameters that one can
tune for each classification/regression method. For example, for
the method randomForest one can tune mtry in the call to train. But the
function call to train random forests in the original package has many
other parameters, e.g. sampsize, maxnodes, etc.
Is there **any** way to access these parameters using train in caret? (Is
the function caret::createGrid limited to the list of parameters specified
in the caret documentation? It's not super clear whether the list of parameters
is for all the caret APIs.)
Thanks,
James,
[[alter...
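For what it is worth, caret::train forwards extra arguments in `...` to the underlying model function, so non-tuned randomForest options can be fixed that way while mtry is tuned. The sketch below assumes both caret and randomForest are installed and uses iris purely for illustration; the specific values are arbitrary.

```r
# Fixed (non-tuned) stratified sample size: 10 cases per species per tree.
rfSampSize <- rep(10, nlevels(iris$Species))

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  fit <- caret::train(Species ~ ., data = iris, method = "rf",
                      tuneGrid  = data.frame(mtry = 2:3),
                      trControl = caret::trainControl(method = "cv",
                                                      number = 5),
                      # Passed through to randomForest() unchanged:
                      sampsize = rfSampSize, maxnodes = 8, ntree = 200)
  print(fit)
}
```

Arguments passed this way are held constant across the tuning grid; they are not searched over.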
2005 Aug 03
3
clara - memory limit
Dear all,
I'm trying to estimate clusters from a very large dataset using clara but the
program stops with a memory error. The (very simple) code and the error:
mydata<-read.dbf(file="fnorsel_4px.dbf")
my.clara.7k<-clara(mydata,k=7)
>Error: cannot allocate vector of size 465108 Kb
The dataset contains >3,000,000 rows and 15 columns. I'm using a windows
computer
2004 May 12
1
Random Forest with highly imbalanced data
Hi group,
I am trying to do a RF with approx 250,000
cases. My objective is to determine the risk factors
of a person being readmitted to hospital (response=1)
or else (response=0). Only 10%, or 25,000 cases were
readmitted. I've heard about down-sampling and class
weight approach and am wondering if R can do it. Even
some reference to articles will help.
From the statistical point
2011 Nov 03
1
non-parametric sample size calculation
...ous data at hand. I find that the data collected does not follow a normal distribution, so I would like to use a non-parametric option for sample size calculation.
I found the pwr package but I don't think it has this option and on the internet found that http://www.epibiostat.ucsf.edu/biostat/sampsize.html says only PASS allows non-parametric sample size calculations (although the webpage is not updated).
Any help would be greatly appreciated
Thanks,
Dave