Hi,

I have an unbalanced data set in which I want to predict a binary variable Y. I want to undersample by keeping all of the 1's and taking only some of the 0's, so that I end up with 90% 0's and 10% 1's.

Can you help me do that?

Thank you
For a data set dat with variable 'case', something like this should work:

sam.rate <- 0.9   # fraction of the controls (case == 0) to keep
n.ctrl <- nrow(dat[dat$case == 0, ])
sam.ctrl <- dat[sample(row.names(dat[dat$case == 0, ]), round(n.ctrl * sam.rate), replace = FALSE), ]
rbind(dat[dat$case == 1, ], sam.ctrl)

Weidong Gu

On Mon, Oct 31, 2011 at 1:54 PM, loubna ibn majdoub hassani <loubn181 at gmail.com> wrote:
> Hi
> I have an unbalanced data set in which I want to predict a binary variable Y.
> I want to undersample by keeping all of the 1's and taking only some of
> the 0's, so that I end up with 90% 0's and 10% 1's.
> Can you help me do that?
> Thank you
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Oct 31, 2011, at 1:54 PM, loubna ibn majdoub hassani <loubn181 at gmail.com> wrote:
> Hi
> I have an unbalanced data set in which I want to predict a binary variable Y.
> I want to undersample by keeping all of the 1's and taking only some of
> the 0's, so that I end up with 90% 0's and 10% 1's.

You haven't given much detail, but something like this will take all of the 1's and 10% of the 0's:

dfrm[c(rownames(dfrm[dfrm$Y == 1, ]), sample(rownames(dfrm[dfrm$Y == 0, ]), round(0.10 * sum(dfrm$Y == 0)))), ]
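
Both suggestions in this thread keep a fixed fraction of the 0's, so the final mix depends on how many 1's the data happen to contain. If the goal is a result that is 90% 0's and 10% 1's, the number of 0's to draw can be computed from the number of 1's (nine 0's per 1). The following is only a minimal sketch under assumptions: the data frame name dfrm and column Y follow the reply above, the toy data are made up for illustration, and undersample() with its prop.ones argument is a hypothetical helper, not a function from any package.

```r
# Toy data, made up for illustration: 1000 rows, roughly 3% ones.
set.seed(1)
dfrm <- data.frame(x = rnorm(1000), Y = rbinom(1000, 1, 0.03))

# Hypothetical helper: keep every 1 and draw just enough 0's so that
# the 1's make up prop.ones of the returned rows.
undersample <- function(d, y = "Y", prop.ones = 0.10) {
  ones  <- which(d[[y]] == 1)
  zeros <- which(d[[y]] == 0)
  # number of 0's needed for the requested proportion of 1's
  n.zeros <- round(length(ones) * (1 - prop.ones) / prop.ones)
  if (n.zeros > length(zeros))
    stop("not enough 0's to reach the requested proportion")
  # sample positions within 'zeros' rather than calling sample(zeros, ...),
  # which behaves differently when 'zeros' has length one
  keep <- c(ones, zeros[sample.int(length(zeros), n.zeros)])
  d[sort(keep), ]
}

bal <- undersample(dfrm)
prop.table(table(bal$Y))   # class proportions after undersampling
```

The helper stops with an error rather than silently returning a different mix when there are too few 0's; whether that is the right behaviour depends on the analysis.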