thr3ads.net - R help - [R] random sampling with levels and with replacement [Apr 2011]


taby gathoni

2011-Apr-08 07:31 UTC

[R] random sampling with levels and with replacement

Dear all,
I have a dataset of about 400 records with, among other variables, a variable
that has two levels: 40 bad and 360 good. How do I come up with 10 random
samples, drawn with replacement, that have the same composition as the main
sample, i.e. 40 bad and 360 good? I recently discovered that the random samples
I generate do not maintain this ratio. My code is:

mysample <- final[sample(1:nrow(final), 400,replace=TRUE),] 

This does not give me the ratio of 40 bad to 360 good. Can anyone give me some
pointers, please?



Thanks,
Taby





Daniel Malter

2011-Apr-08 08:08 UTC


[R] random sampling with levels and with replacement

If you want exactly the same proportions, split the data into good and bad and
sample from the two subsets separately.

On average, however, random sampling from the entire data will reproduce the
proportion of good and bad in the data.

hth,
Daniel




Petr PIKAL

2011-Apr-08 09:11 UTC


[R] Odp: random sampling with levels and with replacement

Hi

r-help-bounces at r-project.org wrote on 08.04.2011 09:31:44:
> Dear all,
> I have a dataset of about 400 records with, among other variables, a
> variable that has two levels: 40 bad and 360 good. How do I come up with
> 10 random samples, drawn with replacement, that have the same composition
> as the main sample, i.e. 40 bad and 360 good? I recently discovered that
> the random samples I generate do not maintain this ratio. My code is:
>
> mysample <- final[sample(1:nrow(final), 400,replace=TRUE),]
>
> This does not give me the ratio of 40 bad to 360 good. Can anyone give me
> some pointers, please?

If you draw 400 items with replacement, you will only occasionally get the
exact proportion of good and bad. In each draw your chance of getting a bad
one is 40/400 (odds of 40/360 against good), but that does not mean that 400
random picks will contain exactly 40 bad items.
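
To put a number on this: the count of bad records in 400 draws with
replacement follows a binomial distribution with n = 400 and p = 0.1. A small
sketch using base R's dbinom() and pbinom(); the variable names are
illustrative and not from the thread:

```r
n <- 400        # draws per sample
p <- 40 / 400   # probability that a single draw is "bad"

# probability of getting exactly 40 bad records in one sample
p_exact <- dbinom(40, size = n, prob = p)

# expected count and standard deviation of the number of bad records
expected <- n * p
sdev <- sqrt(n * p * (1 - p))

# probability that the count lands within about one sd of 40 (34 to 46)
p_within <- pbinom(46, n, p) - pbinom(33, n, p)

print(c(p_exact = p_exact, expected = expected, sd = sdev, p_within = p_within))
```

The expected count is 40 with a standard deviation of 6, and the probability
of hitting exactly 40 is under 10%, so an exact 40/360 split is the exception
rather than the rule.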

If you just want to shuffle your rows, use sampling without replacement.

mysample <- final[sample(1:nrow(final), 400),] 

In that case you get the same data but with random row order.
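
A quick way to convince yourself of this, as a sketch: the data frame below is
a made-up stand-in for "final" (its column names id and status are
assumptions), and the seed is there only to make the shuffle repeatable.

```r
# hypothetical stand-in for the poster's data frame `final`
final <- data.frame(id = 1:400, status = c(rep("good", 360), rep("bad", 40)))

set.seed(1)  # only to make the shuffle reproducible
shuffled <- final[sample(nrow(final)), ]

# same rows, different order: the counts per level are unchanged
table(shuffled$status)
```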

But if you sample with replacement, you will get the proportion of good and
bad items only on average. You can check this, e.g., by

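x <- c(rep("g", 360), rep("b", 40))
res <- rep(NA, 1000)
for (i in 1:1000) {
  # table() sorts the levels alphabetically: y[1] is "b", y[2] is "g"
  y <- table(sample(x, 400, replace = TRUE))
  res[i] <- y[1] / y[2]
}
# plot once, after the loop; the red line marks the bad/good ratio in the data
hist(res)
abline(v = 40/360, col = 2)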

Regards
Petr



Andreas Borg

2011-Apr-08 09:13 UTC


[R] random sampling with levels and with replacement

Hi,

I am not perfectly sure what you want to do, but here is what I would do 
to maintain good/bad ratio in the sample (as Daniel posted, split the 
data and sample from the groups):

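# stringsAsFactors = TRUE so that summary() reports counts per level
df <- data.frame(V1 = 1:400, V2 = c(rep("good", 360), rep("bad", 40)),
                 stringsAsFactors = TRUE)
# row indices of each group
isGood <- which(df$V2 == "good")
isBad <- which(df$V2 == "bad")
# resample within each group, so the 360/40 split is preserved
sampleGood <- df[sample(isGood, replace = TRUE), ]
sampleBad <- df[sample(isBad, replace = TRUE), ]
summary(rbind(sampleGood, sampleBad))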

Please include a more specific example with test data (for "final" in 
this case) next time.
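
The original question asked for 10 such samples. One way to extend the
split-and-sample idea is sketched below. The data frame is a made-up stand-in
for "final", and stratified_sample() is a hypothetical helper written for this
illustration, not a function from base R or any package.

```r
# hypothetical stand-in for the poster's data frame `final`
final <- data.frame(id = 1:400, status = c(rep("good", 360), rep("bad", 40)))

# draw one bootstrap sample within each level, so level counts are preserved
stratified_sample <- function(data, group) {
  idx_by_level <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(idx_by_level, function(idx) {
    # index via sample.int() to stay safe if a level has a single row
    idx[sample.int(length(idx), replace = TRUE)]
  })
  data[unlist(picked, use.names = FALSE), ]
}

set.seed(42)  # only for reproducibility
samples <- replicate(10, stratified_sample(final, "status"), simplify = FALSE)

# one column per sample, one row per level
sapply(samples, function(s) table(s$status))
```

Each element of samples is a data frame of 400 rows in which individual
records may repeat, while the counts per level match those of the full data.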

Best regards,

Andreas



-- 
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Phone +49 (0) 6131 175062
Email: borg at imbei.uni-mainz.de

