Dear all,

I have a dataset of about 400 records with a variable that has two levels, 40 bad and 360 good, among other variables. How do I come up with 10 random samples, drawn with replacement, that have the same composition as the main sample, i.e. that maintain the 40 bad and 360 good? I recently discovered that the random samples I generate don't maintain the ratio. My code is:

    mysample <- final[sample(1:nrow(final), 400, replace=TRUE),]

This does not give me the ratio of 40 bad to 360 good. Can anyone give me some pointers, please?

Thanks,
Taby
If you want the proportion reproduced exactly, split the data into good and bad and sample from the two subsets individually. On average, however, random sampling from the entire data will reproduce the proportion of good and bad in the data.

hth,
Daniel
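The split-and-sample idea above could be sketched roughly as follows. This is only an illustration, not the poster's code: the data frame `final` is rebuilt here as a stand-in, and the column name `status` and the helper `stratified_boot` are assumptions introduced for the example.

```r
# Hypothetical stand-in for the poster's data frame `final`;
# the column name `status` is an assumption.
set.seed(1)
final <- data.frame(id = 1:400,
                    status = c(rep("good", 360), rep("bad", 40)))

# Hypothetical helper: resample row indices with replacement inside
# each level of `by`, so every level keeps exactly its original count.
stratified_boot <- function(d, by) {
  idx_by_level <- split(seq_len(nrow(d)), d[[by]])
  picked <- lapply(idx_by_level, function(i) {
    # Index through sample.int() so a level with a single row is safe
    i[sample.int(length(i), replace = TRUE)]
  })
  d[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Ten independent samples, each with 40 bad and 360 good rows
samples <- replicate(10, stratified_boot(final, "status"), simplify = FALSE)
table(samples[[1]]$status)
```

Within each level the rows are still drawn at random with replacement, so individual records can repeat or be missing; only the per-level counts are held fixed.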
Petr PIKAL
2011-Apr-08 09:11 UTC
[R] Odp: random sampling with levels and with replacement
Hi,

r-help-bounces at r-project.org wrote on 08.04.2011 09:31:44:

> mysample <- final[sample(1:nrow(final), 400,replace=TRUE),]
>
> does not give me the ratio of 40 bad and 360 good can anyone give me some
> pointers please?

If you draw 400 items with replacement, you will get the exact proportion of good and bad only by accident. In each draw your chance of getting a bad one is 40/400 (odds of 40 to 360 against good), but that does not mean that 400 random picks will contain exactly 40 bad items.

If you just want to shuffle your rows, sample without replacement:

    mysample <- final[sample(1:nrow(final), 400),]

In that case you get the same data but in random row order.

But if you sample with replacement, you will get the proportion of good and bad items on average. You can check it e.g. by

    x <- c(rep("g", 360), rep("b", 40))
    res <- rep(NA, 1000)
    for (i in 1:1000) {
      y <- table(sample(x, 400, replace = TRUE))
      res[i] <- y["b"] / y["g"]   # bad-to-good ratio in this sample
    }
    hist(res)                     # plot once, after the loop
    abline(v = 40/360, col = 2)   # bad-to-good ratio in the original data

Regards
Petr
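How rarely plain sampling with replacement hits the 40/360 split exactly can also be worked out without simulation: the number of bad rows in 400 independent draws is binomial with size 400 and probability 40/400. A small sketch under that assumption (the variable names are illustrative only):

```r
n     <- 400        # draws per sample
p_bad <- 40 / 400   # chance that a single draw is a bad record

# Probability that a sample contains exactly 40 bad rows
p_exact <- dbinom(40, size = n, prob = p_bad)

# Probability that the count of bad rows lands within 5 of 40
p_within5 <- sum(dbinom(35:45, size = n, prob = p_bad))

# Standard deviation of the count of bad rows: sqrt(400 * 0.1 * 0.9), about 6
sd_bad <- sqrt(n * p_bad * (1 - p_bad))

print(c(exact = p_exact, within5 = p_within5, sd = sd_bad))
```

The exact-match probability comes out well under 10%, which is consistent with the spread of the histogram produced by the simulation above.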
Hi,

I am not perfectly sure what you want to do, but here is what I would do to maintain the good/bad ratio in the sample (as Daniel posted, split the data and sample from the groups):

    # stringsAsFactors = TRUE so that summary() reports counts per level
    df <- data.frame(V1 = 1:400, V2 = c(rep("good", 360), rep("bad", 40)),
                     stringsAsFactors = TRUE)
    isGood <- which(df$V2 == "good")   # row indices of the good records
    isBad <- which(df$V2 == "bad")     # row indices of the bad records
    sampleGood <- df[sample(isGood, replace = TRUE),]
    sampleBad <- df[sample(isBad, replace = TRUE),]
    summary(rbind(sampleGood, sampleBad))

Please include a more specific example with test data (for "final" in this case) next time.

Best regards,
Andreas

--
Andreas Borg
Medizinische Informatik
Universitätsmedizin der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de