Wadud, Zia
2007-Mar-02 20:11 UTC
[R] sampling random groups with all observations in the group
Hi,

I have a panel dataset with a large number of groups and a differing number of observations for each group. I want to randomly select, say, 20% of the groups or 200 groups, along with all observations from the selected groups (with the corresponding data).

I guess it is possible to generate a random sample from the group ids and then match that with the entire dataset to get the intended dataset, but it sounds cumbersome. Is there possibly an easier way to do this? I checked the package 'sampling' and the command 'sample', but they can't do exactly this.

I was wondering if someone on this list would be able to share his/her knowledge?

Thanks in advance,
Zia

**********************************************************
Zia Wadud
PhD Student
Centre for Transport Studies
Department of Civil and Environmental Engineering
Imperial College London
London SW7 2AZ
Tel +44 (0) 207 594 6055
Chuck Cleland
2007-Mar-02 21:26 UTC
[R] sampling random groups with all observations in the group
Wadud, Zia wrote:
> I want to randomly select say, 20% of the groups or 200 groups, but
> along with all observations from the selected groups (with the
> corresponding data).

How about something like this?

df <- data.frame(GROUP = rep(1:5, c(2,3,4,2,2)), Y = runif(13))

# Sample Two of the Five Groups
subset(df, GROUP %in% with(df, sample(unique(GROUP), 2)))

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
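The original question asked for a fraction of the groups (say 20%) rather than a fixed count. A minimal sketch of how the subset-by-sampled-ids idea might be wrapped for that case follows. The helper name `sample_groups` and its arguments are hypothetical and do not come from the thread, and rounding the fraction to at least one group is an assumption.

```r
# Hypothetical helper (not from the thread): keep every row that belongs to
# a random subset of groups, chosen either by count (n) or by fraction (frac).
sample_groups <- function(data, group_col, n = NULL, frac = NULL) {
  stopifnot(!is.null(n) || !is.null(frac))
  ids <- unique(data[[group_col]])
  if (is.null(n)) {
    # Assumption: round the fraction and always keep at least one group.
    n <- max(1, round(frac * length(ids)))
  }
  # Index into ids instead of calling sample(ids, n): when ids is a single
  # number k, sample(ids, n) would draw from 1:k rather than from ids.
  keep <- ids[sample.int(length(ids), n)]
  data[data[[group_col]] %in% keep, , drop = FALSE]
}

set.seed(1)
df <- data.frame(GROUP = rep(1:5, c(2, 3, 4, 2, 2)), Y = runif(13))

# 40% of the five groups, i.e. two groups, with all of their rows
picked <- sample_groups(df, "GROUP", frac = 0.4)
```

Calling `sample_groups(df, "GROUP", n = 200)` on a dataset with at least 200 groups would cover the fixed-count variant of the question.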
Greg Snow
2007-Mar-02 21:42 UTC
[R] sampling random groups with all observations in the group
One possibility is to use split to create a list with each of your groups as an element, sample from the list, then combine back into a data frame. For example:

> mydata <- data.frame(group=sample(LETTERS[1:5], 100, replace=TRUE),
+ x= 1:100, y= rnorm(100) )
> head(mydata)
  group x          y
1     B 1 -1.1709539
2     A 2  0.2438249
3     C 3 -1.9079472
4     E 4  0.6155387
5     E 5 -1.0671110
6     C 6  0.8109344
> mydata2 <- split(mydata, mydata$group)
> mysamp <- sample(5,2)
> mydata3 <- do.call('rbind',mydata2[mysamp])
> summary(mydata3)
 group        x               y
 A: 0   Min.   : 3.00   Min.   :-1.9079
 B: 0   1st Qu.:18.75   1st Qu.:-0.9798
 C:17   Median :46.50   Median :-0.4309
 D:19   Mean   :45.19   Mean   :-0.2333
 E: 0   3rd Qu.:68.25   3rd Qu.: 0.4351
        Max.   :97.00   Max.   : 3.0469
>

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
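The split-based example uses sample(5,2), which assumes the number of groups is known in advance, and its summary still lists the unsampled groups with zero counts. A small sketch of one way to avoid both, assuming the group column is a factor: sample from the names of the split list, then drop unused levels. The variable names `pieces`, `chosen` and `mysample` are illustrative only.

```r
set.seed(42)
mydata <- data.frame(
  group = factor(sample(LETTERS[1:5], 100, replace = TRUE)),
  x = 1:100,
  y = rnorm(100)
)

# One data frame per group, named by group level
pieces <- split(mydata, mydata$group)

# Sample group names, so the number of groups need not be hard-coded
chosen <- sample(names(pieces), 2)

# Stack the chosen groups back into a single data frame
mysample <- do.call(rbind, pieces[chosen])
rownames(mysample) <- NULL

# Remove the factor levels of groups that were not sampled
mysample$group <- droplevels(mysample$group)
```

After droplevels, summary(mysample) reports only the two sampled groups instead of showing the others with a count of zero.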