Sorry Jeff and David for not being clear!

The total sample size should be at least 40, but the selection should
be based on group ID. Different combinations of group IDs can give
at least 40.

If I select group G1 with a count of 25 and G2 with a count of 15,
then I get the minimum of 40. So G1 and G2 are selected.
    G1 25
    G2 15

In another scenario, if G2, G3 and G4 are selected, then the total
count is 58, which is greater than 40. So G2, G3 and G4 could
be selected.
    G2 15
    G3 12
    G4 31

So the restriction is to find group IDs that give a minimum of 40.
Once I reach a minimum of 40, I stop selecting groups and output
the data.

I hope this helps.

On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> This constraint was not clear in your original sample data set. Can you
> expand the data set to clarify how this requirement REALLY works?
>
> On February 11, 2019 3:00:15 PM PST, Val <valkremk at gmail.com> wrote:
> >Thank you David.
> >
> >However, this will not work for me. If a group ID is selected, then all
> >of its observations should be included.
> >
> >On Mon, Feb 11, 2019 at 4:51 PM David L Carlson <dcarlson at tamu.edu> wrote:
> >>
> >> First expand your data frame into a vector where G1 is repeated 25
> >> times, G2 is repeated 15 times, etc. Then draw random samples of 40
> >> from that vector:
> >>
> >> > grp <- rep(mydat$group, mydat$count)
> >> > grp.sam <- sample(grp, 40)
> >> > table(grp.sam)
> >> grp.sam
> >> G1 G2 G3 G4 G5
> >> 10  9  5 13  3
> >>
> >> ----------------------------------------
> >> David L Carlson
> >> Department of Anthropology
> >> Texas A&M University
> >> College Station, TX 77843-4352
> >>
> >> -----Original Message-----
> >> From: R-help <r-help-bounces at r-project.org> On Behalf Of Val
> >> Sent: Monday, February 11, 2019 4:36 PM
> >> To: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
> >> Subject: [R] Select
> >>
> >> Hi all,
> >>
> >> I have a data frame with two variables, group and its size.
> >> mydat <- read.table( text='group count
> >> G1 25
> >> G2 15
> >> G3 12
> >> G4 31
> >> G5 10' , header = TRUE, as.is = TRUE )
> >>
> >> I want to select group IDs randomly (without replacement) until the
> >> sum of count reaches 40.
> >> So, in the first case, the data frame could be
> >> G4 31
> >> G5 10
> >>
> >> In another case, it could be
> >> G5 10
> >> G2 15
> >> G3 12
> >>
> >> How do I impose the restriction that the sum of the count variable
> >> is at least 40?
> >>
> >> Thank you in advance
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

N <- 8 # however many times you want to do this
ans <- lapply( seq.int( N )
             , function( n ) {
                 # shuffle the row order of mydat
                 idx <- sample( nrow( mydat ) )
                 # keep rows up to the first one where the running total is at least 40
                 mydat[ idx[ seq.int( which( 40 <= cumsum( mydat[ idx, "count" ] ) )[ 1 ] ) ], ]
               }
             )

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
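
To make the indexing in the one-liner above easier to follow, here is a small step-by-step sketch of the same cumsum()/which() idea. It is only an illustration: the fixed row order in idx is an assumption chosen to reproduce Val's second scenario (in practice idx would come from sample()), the names running, stop_at and picked are made up for this example, and the comparison uses >= so that a total of exactly 40 stops the selection, matching the "at least 40" wording.

```r
# Same data as in the original post.
mydat <- read.table( text = 'group count
G1 25
G2 15
G3 12
G4 31
G5 10', header = TRUE, as.is = TRUE )

# Fixed row order for illustration only; normally idx <- sample( nrow( mydat ) ).
idx <- c( 2, 3, 4, 1, 5 )                     # G2, G3, G4, G1, G5

running <- cumsum( mydat[ idx, "count" ] )    # 15 27 58 83 93
stop_at <- which( running >= 40 )[ 1 ]        # 3: first position where the total is at least 40
picked  <- mydat[ idx[ seq_len( stop_at ) ], ]

picked$group                                  # "G2" "G3" "G4"
sum( picked$count )                           # 58
```

Every group before the stopping position is kept whole, which is the requirement that David's row-level sampling did not meet.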

Thank you very much Jeff, Goran and David for your help.
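
For anyone reusing the approach from this thread, the selection step can be wrapped in a function. The following is a minimal sketch, not code posted by anyone in the thread: select_groups and min_total are hypothetical names, and the up-front check is an added assumption to handle data whose total count can never reach the threshold (in which case which(...)[1] would be NA).

```r
# Hypothetical helper: draw whole groups in random order until the
# running total of `count` is at least `min_total`.
select_groups <- function( dat, min_total = 40 ) {
  if ( sum( dat$count ) < min_total ) {
    stop( "total count is below min_total; no selection can reach it" )
  }
  idx <- sample( nrow( dat ) )                 # random order, without replacement
  running <- cumsum( dat$count[ idx ] )        # running total in that order
  dat[ idx[ seq_len( which( running >= min_total )[ 1 ] ) ], ]
}

mydat <- read.table( text = 'group count
G1 25
G2 15
G3 12
G4 31
G5 10', header = TRUE, as.is = TRUE )

set.seed( 1 )  # only so that repeated runs give the same draws
draws <- replicate( 8, select_groups( mydat ), simplify = FALSE )

# Total count of each draw; each should be at least 40.
totals <- sapply( draws, function( d ) sum( d$count ) )
totals
```

Because the selection stops at the first position where the running total reaches min_total, dropping the last group of any draw always leaves a total below the threshold.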