Ahmed Attia
2018-Aug-27 22:54 UTC
[R] r-data partitioning considering two variables (character and numeric)
I would like to partition the following dataset (dataGenotype) based on two variables, Genotype and stand_ID. For example, for Genotype H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

The desired output is the following:

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code:

library(caret)
dataPartitioning <- createDataPartition(dataGenotype$stand_ID, 1, list = FALSE, p = 0.2)
train = dataGenotype[dataPartitioning,]
test = dataGenotype[-dataPartitioning,]

I also tried

createDataPartition(unique(dataGenotype$stand_ID), 1, list = FALSE, p = 0.2)

Neither produced the desired output: the data are partitioned within stand_ID. For example, one row of stand_ID 7 goes to training and two rows of stand_ID 7 go to testing. How can I partition the data by Genotype and stand_ID together?

Ahmed Attia
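To make the question reproducible, the posted table can be read straight into R with read.table(text = ...). The unit that must stay together is the stand, not the row: createDataPartition() samples row indices (stratified on y), which is why rows of one stand can land on both sides. This sketch assumes the whitespace-separated layout shown above and that Inventory_date is month/day/year; the as.Date() conversion is an addition and is only needed if the dates are used later.

```r
# Reproducible copy of the table posted in the question.
dataGenotype <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Genotype stand_ID Inventory_date stemC mheight
H13 7 5/18/2006 1940.1075 11.33995
H13 7 11/1/2008 10898.9597 23.20395
H13 7 4/14/2009 12830.1284 23.77395
H13 18 11/3/2005 2726.42 13.4432
H13 18 6/30/2008 12226.1554 24.091967
H13 18 4/14/2009 14141.68 25.0922
H13 21 5/18/2006 4981.7158 15.7173
H13 21 4/14/2009 20327.0667 27.9155
H15 9 3/31/2006 3570.06 14.7898
H15 9 11/1/2008 15138.8383 26.2088
H15 9 4/14/2009 17035.4688 26.8778
H15 20 1/18/2005 3016.881 14.1886
H15 20 10/4/2006 8330.4688 20.19425
H15 20 6/30/2008 13576.5 25.4774
H15 32 2/1/2006 3426.2525 14.31815
U21 3 1/9/2006 3660.416 15.09925
U21 3 6/30/2008 13236.29 24.27634
U21 3 4/14/2009 16124.192 25.79562
U21 67 11/4/2005 2812.8425 13.60485
U21 67 4/14/2009 13468.455 24.6203
")

# Dates are read as text; parse them as month/day/year.
dataGenotype$Inventory_date <- as.Date(dataGenotype$Inventory_date,
                                       format = "%m/%d/%Y")

# Rows per stand within each genotype: each stand is a block of rows
# that has to go to training or testing as a whole.
table(dataGenotype$Genotype, dataGenotype$stand_ID)
```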
Bert Gunter
2018-Aug-27 23:09 UTC
Just partition the unique stand_IDs and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst  ## logical vector
test <- dataGenotype[wh,]
train <- dataGenotype[!wh,]

There are a million variations on this theme, I'm sure.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
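A quick way to confirm that a split like this keeps every stand intact is to check that no stand_ID occurs in both sets. The sketch below reruns the idea on the two grouping columns of the posted data; set.seed() and the intersect() check are additions for illustration, not part of the reply. Note that this version draws from the pooled list of stand IDs, so it does not balance stands across genotypes.

```r
# Only the two grouping columns of the posted data are needed here.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67),
  stringsAsFactors = FALSE
)

set.seed(1)  # make the random draw reproducible
id  <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id) / 2))   # 4 of the 9 stands go to test
wh  <- dataGenotype$stand_ID %in% tst      # logical vector over rows
test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# No stand_ID should appear on both sides of the split.
shared <- intersect(unique(test$stand_ID), unique(train$stand_ID))
length(shared)   # 0

# Genotype is ignored, so its share of test stands is left to chance.
table(test$Genotype)
```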
MacQueen, Don
2018-Aug-27 23:10 UTC
You could start with split():

grp <- rep('', nrow(mydata))
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'
split(mydata, grp)

or perhaps

grp <- ifelse(mydata$stand_ID %in% c(7,9,67), 'A-training', 'B-testing')
split(mydata, grp)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
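Applied to the posted data, with the stands the question assigns to training, the ifelse() version yields exactly the two desired pieces. This sketch uses the two grouping columns only and names the data frame dataGenotype instead of the reply's placeholder mydata.

```r
# Only the two grouping columns of the posted data are needed here.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67),
  stringsAsFactors = FALSE
)

train_ids <- c(7, 9, 67)   # stands chosen for training in the question

# Every stand not listed in train_ids is sent to testing.
grp <- ifelse(dataGenotype$stand_ID %in% train_ids, "A-training", "B-testing")
parts <- split(dataGenotype, grp)
sapply(parts, nrow)   # 8 rows in A-training, 12 in B-testing
```

The difference between the two versions shows up only for a stand listed in neither vector: the rep('') version puts it in a third group named "", which makes an omission visible, whereas the ifelse() version silently sends it to testing.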
MacQueen, Don
2018-Aug-27 23:14 UTC
And yes, I ignored Genotype, but in the example data none of the stand_ID values are present in more than one Genotype, so it doesn't matter. If that's not true in general, then constructing the grp variable is a little more complex, but the principle is the same.
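Whether Genotype can be ignored is easy to check: count the genotypes each stand_ID appears under. If any count exceeds one, a key built from the Genotype/stand_ID pair keeps those stands distinct. This is a sketch on the two grouping columns of the posted data; the key format "Genotype_stand_ID" and the name train_keys are illustrative choices, not from the thread.

```r
# Only the two grouping columns of the posted data are needed here.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67),
  stringsAsFactors = FALSE
)

# Number of distinct genotypes under which each stand_ID appears.
n_geno <- tapply(dataGenotype$Genotype, dataGenotype$stand_ID,
                 function(g) length(unique(g)))
any(n_geno > 1)   # FALSE for the posted data

# In general, key the split on the Genotype/stand_ID pair instead of stand_ID.
key <- paste(dataGenotype$Genotype, dataGenotype$stand_ID, sep = "_")
train_keys <- c("H13_7", "H15_9", "U21_67")   # the question's training stands
grp <- ifelse(key %in% train_keys, "A-training", "B-testing")
parts <- split(dataGenotype, grp)
```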
Bert Gunter
2018-Aug-27 23:50 UTC
Sorry, my bad -- careless reading: you need to do the partitioning within Genotype. Something like:

by(dataGenotype, dataGenotype$Genotype, function(x){
   u <- unique(x$stand_ID)
   tst <- x$stand_ID %in% sample(u, floor(length(u)/2))
   list(test = x[tst,], train = x[!tst,])
})

This will give a list, each component of which splits one Genotype into test and train data frame subsets by stand_ID. These lists of data frames can then be recombined into a single test and a single train data frame by, e.g., an appropriate rbind() call.

HOWEVER, note that you will need to modify this function to decide what to do if/when there is only one ID in a Genotype, as Don MacQueen already pointed out.

Bert Gunter
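The recombination step mentioned in the reply can be done with do.call(rbind, ...) over the per-genotype pieces. In this sketch the draw indexes into u with sample.int() rather than calling sample(u, k): sample() treats a single number n as 1:n, which is harmless with floor() (a one-stand genotype draws size 0) but would pick non-existent IDs if k were later changed to, say, ceiling(). With floor(), a genotype with a single stand contributes nothing to the test set, so all of its rows land in train. set.seed() is an addition for reproducibility.

```r
# Only the two grouping columns of the posted data are needed here.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67),
  stringsAsFactors = FALSE
)

set.seed(42)  # reproducible draw

# One list(test, train) per genotype; half the stands (rounded down) go to test.
parts <- by(dataGenotype, dataGenotype$Genotype, function(x) {
  u   <- unique(x$stand_ID)
  tst <- x$stand_ID %in% u[sample.int(length(u), floor(length(u) / 2))]
  list(test = x[tst, ], train = x[!tst, ])
})

# Stack the per-genotype pieces into one test and one train data frame.
test  <- do.call(rbind, lapply(parts, `[[`, "test"))
train <- do.call(rbind, lapply(parts, `[[`, "train"))
```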
Ahmed Attia
2018-Aug-28 00:46 UTC
Thanks Bert, worked nicely. Yes, genotypes with only one ID will be eliminated before partitioning the data.

Best regards

Ahmed Attia
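Dropping genotypes that have only one stand before partitioning can be done with ave(), which returns the per-group count on every row. The extra genotype "X99" with stand 99 below is hypothetical, added only so the filter has something to remove; all three posted genotypes have at least two stands and are kept.

```r
# Only the two grouping columns of the posted data are needed here.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67),
  stringsAsFactors = FALSE
)

# Hypothetical single-stand genotype, for illustration only.
withSingle <- rbind(dataGenotype,
                    data.frame(Genotype = "X99", stand_ID = 99,
                               stringsAsFactors = FALSE))

# Number of distinct stands in each row's genotype.
n_stands <- ave(withSingle$stand_ID, withSingle$Genotype,
                FUN = function(s) length(unique(s)))
kept <- withSingle[n_stands >= 2, ]
unique(kept$Genotype)   # "H13" "H15" "U21"
```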