thr3ads.net - R help - [R] Data-frame selection [Oct 2015]


Cacique Samurai

2015-Oct-10 15:38 UTC

[R] Data-frame selection

Hello R-Helpers!

I have a data frame as below (dput at the end of this mail) and need to
select just the first sequence of occurrences of each "Group" within each
"ID".

For example, for ID "1" I have two separate sequences of T2 and
two separate sequences of T3:

> test[test$ID == 1, ]
   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34
16  1    T2 1.86
17  1    T2 3.03
18  1    T3 3.63
19  1    T3 2.78
20  1    T3 1.49

As output, I need just the first sequence of T2 and of T3 for this ID, like this:

   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34

For the other IDs I have just one occurrence, or one sequence of
occurrences, of each Group.

I tried to use a labeling variable, but cannot figure out how to do this
without many, many loops.

Thanks in advance,

Raoni

> dput(teste)
structure(list(ID = structure(c(3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2",
"3", "4"), class = "factor"), Group = structure(c(1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("T2",
"T3"), class = "factor"), Var = c(0.32, 1.59, 2.94, 3.23, 1.4,
1.62, 2.43, 2.53, 2.25, 1.66, 2.86, 0.53, 1.66, 3.24, 1.34, 1.86,
3.03, 3.63, 2.78, 1.49, 2, 2.39, 1.65, 2.05, 2.75, 2.23, 1.39,
2.66, 1.05, 2.52, 2.49, 2.97, 0.43, 1.36, 0.79, 1.71, 1.95, 2.73,
2.73, 2.39, 2.17, 2.34, 2.42, 1.75, 0.66, 1.64, 0.24, 2.11, 2.11,
1.18)), .Names = c("ID", "Group", "Var"), row.names = c(NA, 50L
), class = "data.frame")

Jeff Newmiller

2015-Oct-10 16:13 UTC

See ?aggregate in base R. Write a short function that returns the first
element of a vector and pass it to aggregate.

Or...

library(dplyr)
( test %>% group_by( ID, Group ) %>% summarise( Var=first( Var ) ) %>%
as.data.frame )
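
The base R suggestion above might be sketched as follows. This is only an illustration under assumptions: the small data frame is a hand-made stand-in for the poster's data, and the helper name first_elem is hypothetical. Note that it keeps a single value per ID/Group pair rather than a whole run of rows.

```r
# Toy stand-in for the poster's data (a few hand-picked rows, not the full set)
test <- data.frame(
  ID    = factor(c("1", "1", "1", "1", "2", "2")),
  Group = factor(c("T2", "T2", "T3", "T2", "T2", "T3")),
  Var   = c(2.94, 3.23, 1.66, 1.86, 2.00, 2.52)
)

# Hypothetical helper: return the first element of a vector
first_elem <- function(x) x[1]

# One row per ID/Group combination, holding the first Var value seen
first_vals <- aggregate(Var ~ ID + Group, data = test, FUN = first_elem)
print(first_vals)
```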
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.


Cacique Samurai

2015-Oct-10 20:27 UTC

Hello Jeff!

Thanks very much for your prompt reply, but this is not exactly what I
need. I need the first sequence of records. In the example that I sent, I
need the first seven lines of group "T2" in ID "1" (lines 3 to 9) and
the following six lines of group "T3" in ID "1" (lines 10 to 15). I have to
discard lines 16 to 20, which are repeated sequences of records of
those groups in the same ID.

For other IDs (I sent just a small piece of my data) I have many more
sequential lines of records of each group in each ID, and many
sequential records that should be discarded. In some cases, I have just
one record of a group in an ID.

As I said, I tried to use a labeling variable that marks
lines 3 to 9 as 1 (first sequence of T2 in ID 1), lines 10 to 15 as 1
(first sequence of T3 in ID 1), lines 16 and 17 as 2 (second sequence
of T2 in ID 1) and lines 18 to 20 as 2 (second sequence of T3 in ID
1), and so on. Then it would be easy to take just the first sequence for
each ID. But the code that I wrote was a long, long sequence of loops that
in the end did not work as I wanted.
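
The labeling idea described above can be sketched without explicit loops. This is a minimal illustration, not a tested solution for the full data: it rebuilds only rows 1 to 20 of the posted data by hand, assumes the row order defines the sequences, and the names run_id, key and first_run are made up for the example.

```r
# Rows 1 to 20 of the posted data, rebuilt by hand for illustration
test <- data.frame(
  ID    = c("3", "4", rep("1", 18)),
  Group = c("T2", "T3", rep("T2", 7), rep("T3", 6), "T2", "T2", "T3", "T3", "T3"),
  Var   = c(0.32, 1.59, 2.94, 3.23, 1.40, 1.62, 2.43, 2.53, 2.25,
            1.66, 2.86, 0.53, 1.66, 3.24, 1.34, 1.86, 3.03, 3.63, 2.78, 1.49)
)

n   <- nrow(test)
id  <- as.character(test$ID)
grp <- as.character(test$Group)

# A new run starts wherever ID or Group differs from the previous row
new_run <- c(TRUE, id[-1] != id[-n] | grp[-1] != grp[-n])
run_id  <- cumsum(new_run)

# For each ID/Group pair, look up the run that contains its first row
key       <- paste(id, grp, sep = "|")
first_run <- run_id[match(key, key)]

# Keep only the rows that belong to that first run
result <- test[run_id == first_run, ]
print(result)
```

For ID "1" this keeps rows 3 to 15 and drops rows 16 to 20, which matches the output requested in the original post.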

Once more, thanks in advance for your attention and help,

Raoni



-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.raoni at gmail.com
