Ahmed Attia
2018-Aug-27 22:54 UTC
[R] r-data partitioning considering two variables (character and numeric)
I would like to partition the following dataset (dataGenotype) based on two variables, Genotype and stand_ID. For example, for Genotype H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

The desired output is the following:

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code:

library(caret)
dataPartitioning <- createDataPartition(dataGenotype$stand_ID,1,list=F,p=0.2)
train = dataGenotype[dataPartitioning,]
test = dataGenotype[-dataPartitioning,]

I also tried

createDataPartition(unique(dataGenotype$stand_ID),1,list=F,p=0.2)

Neither produced the desired output: the data are partitioned within the stand_ID. For example, one row of stand_ID 7 goes to training and two rows of stand_ID 7 go to testing. How can I partition the data by Genotype and stand_ID together?

Ahmed Attia
Bert Gunter
2018-Aug-27 23:09 UTC
[R] r-data partitioning considering two variables (character and numeric)
Just partition the unique stand_IDs and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst  ## logical vector
test <- dataGenotype[wh,]
train <- dataGenotype[!wh,]

There are a million variations on this theme, I'm sure.

-- Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
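
Editorial sketch: the reason the caret attempt splits a stand is that createDataPartition() returns row indices, so sampling happens at the row level. The suggestion above samples whole stand_IDs instead. The block below is a minimal, self-contained check of that idea; the toy data frame is a hand-copied subset of the posted rows, and the seed value is an arbitrary assumption used only to make the run repeatable.

```r
# Toy stand-in for dataGenotype (subset of the posted rows and columns)
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H13", "H15", "H15", "U21", "U21"),
  stand_ID = c(7, 7, 18, 21, 9, 20, 3, 67),
  stemC    = c(1940.1075, 10898.9597, 2726.42, 4981.7158,
               3570.06, 3016.881, 3660.416, 2812.8425),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only for repeatability

id  <- unique(dataGenotype$stand_ID)        # one entry per stand
tst <- sample(id, floor(length(id) / 2))    # stands assigned to the test set
wh  <- dataGenotype$stand_ID %in% tst       # logical vector over rows

test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# Sanity check: no stand appears on both sides of the split
no_overlap <- length(intersect(train$stand_ID, test$stand_ID)) == 0
```

Because whole IDs are sampled, every row of a given stand lands on the same side. Note that this version ignores Genotype, so a genotype may end up entirely in one set.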
MacQueen, Don
2018-Aug-27 23:10 UTC
[R] r-data partitioning considering two variables (character and numeric)
You could start with split()
grp <- rep('', nrow(mydata) )
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'
split(mydata, grp)
or perhaps
grp <- ifelse(mydata$stand_ID %in% c(7,9,67), 'A-training', 'B-testing')
split(mydata, grp)
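
Editorial sketch: split() returns a named list of data frames, one per value of grp, so the two subsets can be pulled out by name. The toy mydata below is an assumption (one row per stand, two columns only) made to keep the example short.

```r
# Toy data: one row per stand, for brevity
mydata <- data.frame(
  Genotype = c("H13", "H13", "H15", "H15", "U21", "U21"),
  stand_ID = c(7, 18, 9, 20, 3, 67),
  stringsAsFactors = FALSE
)

# Hand-picked training stands, as in the message above
grp   <- ifelse(mydata$stand_ID %in% c(7, 9, 67), "A-training", "B-testing")
parts <- split(mydata, grp)

# Extract each subset by its group label
train <- parts[["A-training"]]
test  <- parts[["B-testing"]]
```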
-Don
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
MacQueen, Don
2018-Aug-27 23:14 UTC
[R] r-data partitioning considering two variables (character and numeric)
And yes, I ignored Genotype, but for the example data none of the stand_ID
values are present in more than one Genotype, so it doesn't matter. If
that's not true in general, then constructing the grp variable is a little
more complex, but the principle is the same.
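
Editorial sketch of the "a little more complex" case: if the same stand_ID could occur under more than one Genotype, one option is to build a composite key from both columns and assign groups on that key. The data and the train_keys vector below are hypothetical, invented only to show a repeated ID.

```r
# Hypothetical data in which stand_ID 7 occurs under two genotypes
mydata <- data.frame(
  Genotype = c("H13", "H13", "H15", "H15", "U21"),
  stand_ID = c(7, 18, 7, 20, 3),
  stringsAsFactors = FALSE
)

# Composite key: one value per (Genotype, stand_ID) combination
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Hand-picked training stands, written as Genotype_standID (assumption)
train_keys <- c("H13_7", "H15_20", "U21_3")

grp   <- ifelse(key %in% train_keys, "A-training", "B-testing")
parts <- split(mydata, grp)
```

Here H13 stand 7 goes to training while H15 stand 7 goes to testing, which matching on stand_ID alone could not express.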
Bert Gunter
2018-Aug-27 23:50 UTC
[R] r-data partitioning considering two variables (character and numeric)
Sorry, my bad -- careless reading: you need to do the partitioning within
genotype.
Something like:
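by(dataGenotype, dataGenotype$Genotype, function(x){
  ## unique stand IDs within this Genotype
  u <- unique(x$stand_ID)
  ## TRUE for rows whose stand_ID was sampled into the test set
  tst <- x$stand_ID %in% sample(u, floor(length(u)/2))
  list(test = x[tst,], train = x[!tst,])
})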
This returns a list with one component per Genotype; each component is
itself a list holding the test and train data frame subsets, split by
stand_ID. These can then be recombined into a single test and a single
train data frame with, e.g., an appropriate rbind() call.
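
Editorial sketch of the recombination step: one way to do the rbind() is do.call(rbind, ...) over the per-genotype pieces. The toy data (one row per stand) and the seed are assumptions, and the column name stand_ID follows the original post.

```r
# Toy data: one row per stand, for brevity
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H15", "H15", "U21", "U21"),
  stand_ID = c(7, 18, 21, 9, 20, 3, 67),
  stringsAsFactors = FALSE
)

set.seed(42)  # arbitrary seed, only for repeatability

# Split stands into test/train separately within each Genotype
parts <- by(dataGenotype, dataGenotype$Genotype, function(x) {
  u   <- unique(x$stand_ID)
  tst <- x$stand_ID %in% sample(u, floor(length(u) / 2))
  list(test = x[tst, ], train = x[!tst, ])
})

# Stack the per-genotype pieces into one test and one train data frame
test  <- do.call(rbind, lapply(parts, function(p) p$test))
train <- do.call(rbind, lapply(parts, function(p) p$train))
```

With floor(length(u)/2), each genotype here contributes exactly one stand to the test set and the rest to training.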
HOWEVER, note that you will need to modify this function to decide what to
do if/when there is only one ID in a Genotype, as Don MacQueen already
pointed out.
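
Editorial sketch of one possible rule for the single-ID case: send a lone stand entirely to training and return an empty test subset. With floor(length(u)/2) that already happens implicitly; the hypothetical helper split_genotype() below only makes the decision explicit. It also indexes with sample.int() so that a length-one numeric u is never reinterpreted by sample() as the range 1:u if the sampling fraction is changed later.

```r
# Hypothetical helper: split one Genotype's rows by stand_ID
split_genotype <- function(x) {
  u <- unique(x$stand_ID)
  if (length(u) < 2) {
    # Assumption: a lone stand goes entirely to training
    return(list(test = x[0, ], train = x))
  }
  k   <- floor(length(u) / 2)
  tst <- x$stand_ID %in% u[sample.int(length(u), k)]
  list(test = x[tst, ], train = x[!tst, ])
}

# A genotype with a single stand (two inventory rows)
lone <- data.frame(Genotype = "H15", stand_ID = c(32, 32),
                   stringsAsFactors = FALSE)
res_lone <- split_genotype(lone)

# A genotype with two stands
pair <- data.frame(Genotype = "U21", stand_ID = c(3, 3, 67),
                   stringsAsFactors = FALSE)
res_pair <- split_genotype(pair)
```

The helper can be passed to by() in place of the anonymous function. Other rules (dropping such genotypes, or sending them to testing) are equally defensible.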
Ahmed Attia
2018-Aug-28 00:46 UTC
[R] r-data partitioning considering two variables (character and numeric)
Thanks Bert, worked nicely. Yes, genotypes with only one ID will be eliminated before partitioning the data.

Best regards
Ahmed Attia
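
Editorial sketch of the elimination step mentioned above: count the distinct stand_IDs per Genotype and keep only genotypes with at least two. The toy data and the genotype label "X01" are hypothetical.

```r
# Toy data; "X01" is a hypothetical genotype with a single stand
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "U21", "U21", "X01", "X01"),
  stand_ID = c(7, 18, 21, 3, 67, 40, 40),
  stringsAsFactors = FALSE
)

# Number of distinct stands per Genotype
n_ids <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                function(s) length(unique(s)))

# Keep only genotypes that can contribute to both train and test
keep     <- names(n_ids)[n_ids >= 2]
filtered <- dataGenotype[dataGenotype$Genotype %in% keep, ]
```

The filtered data frame can then be passed to the per-genotype partitioning code in place of dataGenotype.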