thr3ads.net - R help - [R] r-data partitioning considering two variables (character and numeric) [Aug 2018]

Ahmed Attia

2018-Aug-27 22:54 UTC

[R] r-data partitioning considering two variables (character and numeric)

I would like to partition the following dataset (dataGenotype) based
on two variables, Genotype and stand_ID, so that all rows of a given
stand_ID end up in the same subset. For example, for Genotype H13,
stand_ID 7 may go to training and stand_IDs 18 and 21 may go to testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

The desired output is the following:

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code:

library(caret)
dataPartitioning <- createDataPartition(dataGenotype$stand_ID, times = 1, p = 0.2, list = FALSE)
train <- dataGenotype[dataPartitioning, ]
test <- dataGenotype[-dataPartitioning, ]

I also tried:

createDataPartition(unique(dataGenotype$stand_ID), times = 1, p = 0.2, list = FALSE)

This did not produce the desired output: the data are partitioned within
each stand_ID. For example, one row of stand_ID 7 goes to training and
two rows of stand_ID 7 go to testing. How can I partition the data by
Genotype and stand_ID together?



Ahmed Attia

Bert Gunter

2018-Aug-27 23:09 UTC

[R] r-data partitioning considering two variables (character and numeric)

Just partition the unique stand_IDs and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst ## logical vector
test<- dataGenotype[wh,]
train <- dataGenotype[!wh,]

There are a million variations on this theme I'm sure.

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
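
A minimal, self-contained sketch of the stand-level split suggested above. The toy data frame only mimics the Genotype / stand_ID layout of the question (the measurement columns are left out), and the seed is an assumption added so the split is repeatable. The sketch draws positions with sample.int() rather than calling sample() on the IDs directly, a small variation that avoids sample()'s special handling of a single number.

```r
# Toy data with the same Genotype / stand_ID layout as in the question
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2))
)

set.seed(42)  # assumed seed, only for repeatability

id  <- unique(dataGenotype$stand_ID)
# Pick half of the stands (rounded down) by position
tst <- id[sample.int(length(id), floor(length(id) / 2))]
wh  <- dataGenotype$stand_ID %in% tst   # logical vector, one entry per row

test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# Stands shared by both subsets; this should be empty
shared <- intersect(test$stand_ID, train$stand_ID)
```

Because whole stands are sampled, every row of a stand lands in the same subset. Genotype is not considered here, so a genotype can end up entirely in one subset.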



MacQueen, Don

2018-Aug-27 23:10 UTC

[R] r-data partitioning considering two variables (character and numeric)

You could start with split():

## mydata stands for your data frame (dataGenotype in the question)
grp <- rep('', nrow(mydata))
grp[mydata$stand_ID %in% c(7, 9, 67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3, 18, 20, 21, 32)] <- 'B-testing'

split(mydata, grp)

or perhaps

grp <- ifelse(mydata$stand_ID %in% c(7, 9, 67), 'A-training', 'B-testing')
split(mydata, grp)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 


MacQueen, Don

2018-Aug-27 23:14 UTC

[R] r-data partitioning considering two variables (character and numeric)

And yes, I ignored Genotype, but for the example data none of the stand_ID
values are present in more than one Genotype, so it doesn't matter. If
that's not true in general, then constructing the grp variable is a little
more complex, but the principle is the same.
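
One way the grp variable might be built when a stand_ID can occur under more than one Genotype is to key on the Genotype/stand_ID pair instead of stand_ID alone. This is only a sketch: the toy data (in which stand 7 appears under two genotypes), the key format, and the choice of training pairs are all hypothetical.

```r
# Hypothetical data in which stand_ID 7 occurs under two genotypes
mydata <- data.frame(
  Genotype = c("H13", "H13", "H15", "H15", "U21"),
  stand_ID = c(7, 18, 7, 20, 3)
)

# Combined key, so that H13/7 and H15/7 are treated as different stands
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Hypothetical choice of Genotype/stand pairs for training
training_keys <- c("H13_7", "H15_20")

grp   <- ifelse(key %in% training_keys, "A-training", "B-testing")
parts <- split(mydata, grp)   # named list: "A-training" and "B-testing"
```

The principle is unchanged from the split() calls above; only the grouping vector is built from two columns.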


Bert Gunter

2018-Aug-27 23:50 UTC

[R] r-data partitioning considering two variables (character and numeric)

Sorry, my bad -- careless reading: you need to do the partitioning within
genotype.
Something like:

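by(dataGenotype, dataGenotype$Genotype, function(x){

   ## unique stand IDs within this Genotype
   u <- unique(x$stand_ID)

   ## TRUE for rows whose stand was sampled into the test set
   tst <- x$stand_ID %in% sample(u, floor(length(u)/2))

   list(test = x[tst,], train = x[!tst,])

   })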


This returns a list with one component per Genotype; each component is
itself a list holding the test and train data frame subsets for that
Genotype, split by stand_ID. These per-Genotype data frames can then be
recombined into a single test and a single train data frame with, e.g., an
appropriate rbind() call.


HOWEVER, note that you will need to modify this function to decide what to
do if/when there is only one ID in a Genotype, as Don MacQueen already
pointed out.
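
The within-Genotype split and the recombination step described above can be sketched as follows. This is not the exact code from the post: it uses split() with lapply() in place of by(), which yields a plain named list that is easy to pass to do.call(rbind, ...). The toy data frame, the seed, and the use of sample.int() are assumptions made for illustration.

```r
# Toy data with the same Genotype / stand_ID layout as in the question
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2))
)

set.seed(1)  # assumed seed, only for repeatability

# Split stands into test/train separately within each Genotype
parts <- lapply(split(dataGenotype, dataGenotype$Genotype), function(x) {
  u    <- unique(x$stand_ID)
  pick <- u[sample.int(length(u), floor(length(u) / 2))]
  tst  <- x$stand_ID %in% pick
  list(test = x[tst, ], train = x[!tst, ])
})

# Stack the per-Genotype pieces into one test and one train data frame
test  <- do.call(rbind, lapply(parts, `[[`, "test"))
train <- do.call(rbind, lapply(parts, `[[`, "train"))
```

With floor(length(u)/2), a Genotype that has only one stand contributes nothing to test, which is the case the caveat above refers to.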


Ahmed Attia

2018-Aug-28 00:46 UTC

[R] r-data partitioning considering two variables (character and numeric)

Thanks Bert, worked nicely. Yes, genotypes with only one ID will be
eliminated before partitioning the data.


Best regards

Ahmed Attia
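
Dropping genotypes that have only one stand before partitioning, as mentioned in the reply above, might look like the sketch below. The toy data and the names n_stands, keep, and filtered are hypothetical.

```r
# Hypothetical data: genotype "G1" has two stands, "G2" has only one
dataGenotype <- data.frame(
  Genotype = c("G1", "G1", "G1", "G2", "G2"),
  stand_ID = c(1, 1, 2, 5, 5)
)

# Number of distinct stands per Genotype
n_stands <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                   function(s) length(unique(s)))

# Keep only genotypes with at least two stands
keep     <- names(n_stands)[n_stands >= 2]
filtered <- dataGenotype[dataGenotype$Genotype %in% keep, ]
```

The filtered data frame can then be passed to the within-Genotype split in place of dataGenotype.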






