Hello R-Helpers!

I have a data frame as below (dput at the end of this mail) and need to
select just the first sequence of occurrences of each "Group" in each
"ID".

For example, for ID "1" I have two sequential occurrences of T2 and
two sequential occurrences of T3:

> test[test$ID == 1, ]
   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34
16  1    T2 1.86
17  1    T2 3.03
18  1    T3 3.63
19  1    T3 2.78
20  1    T3 1.49

As output, I need just the first group of T2 and T3 for this ID, like:

   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34

For the other IDs I have just one occurrence, or one sequence of
occurrences, of each Group.

I tried to use a labeling variable, but cannot figure out how to do this
without many, many loops.

Thanks in advance,

Raoni

> dput(test)
structure(list(ID = structure(c(3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2",
"3", "4"), class = "factor"), Group = structure(c(1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("T2",
"T3"), class = "factor"), Var = c(0.32, 1.59, 2.94, 3.23, 1.4,
1.62, 2.43, 2.53, 2.25, 1.66, 2.86, 0.53, 1.66, 3.24, 1.34, 1.86,
3.03, 3.63, 2.78, 1.49, 2, 2.39, 1.65, 2.05, 2.75, 2.23, 1.39,
2.66, 1.05, 2.52, 2.49, 2.97, 0.43, 1.36, 0.79, 1.71, 1.95, 2.73,
2.73, 2.39, 2.17, 2.34, 2.42, 1.75, 0.66, 1.64, 0.24, 2.11, 2.11,
1.18)), .Names = c("ID", "Group", "Var"), row.names = c(NA, 50L
), class = "data.frame")
?aggregate
in base R. Make a short function that returns the first element of a vector and
give that to aggregate.
Or...
library(dplyr)
( test %>% group_by( ID, Group ) %>% summarise( Var=first( Var ) ) %>%
as.data.frame )
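
A minimal sketch of the base R route suggested above, assuming a small stand-in data frame shaped like `test` from the original post. The helper name `first_elem` and the result name `res` are made up for illustration. Note that this returns one value per ID/Group combination, not a whole run of rows.

```r
# Small stand-in data frame (hypothetical, shaped like `test` in the post)
test <- data.frame(
  ID    = factor(c("1", "1", "1", "1", "2", "2")),
  Group = factor(c("T2", "T2", "T3", "T2", "T2", "T3")),
  Var   = c(2.94, 3.23, 1.66, 1.86, 2.00, 2.52)
)

# Short function that returns the first element of a vector
first_elem <- function(x) x[1]

# One row per ID/Group combination, holding the first Var seen for it
res <- aggregate(Var ~ ID + Group, data = test, FUN = first_elem)
print(res)
```

Because `aggregate` collapses each ID/Group combination to a single row, it may only fit cases where one representative value per group is enough.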
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On October 10, 2015 8:38:00 AM PDT, Cacique Samurai <caciquesamurai at gmail.com> wrote:
> [...]
Hello Jeff!

Thanks very much for your prompt reply, but this is not exactly what I
need. I need the first sequence of records. In the example that I sent, I
need the first seven lines of group "T2" in ID "1" (lines 3 to 9) and the
following six lines of group "T3" in ID "1" (lines 10 to 15). I have to
discard lines 16 to 20, which are repeated sequential records of those
groups in the same ID.

For the other IDs (I sent just a small piece of my data) I have many more
sequential lines of records of each group in each ID, and many sequential
records that should be discarded. In some cases, I have just one record
of a group in an ID.

As I said, I tried to use a labeling variable that marks lines 3 to 9 as
1 (first sequence of T2 in ID 1), lines 10 to 15 as 1 (first sequence of
T3 in ID 1), lines 16 and 17 as 2 (second sequence of T2 in ID 1), lines
18 to 20 as 2 (second sequence of T3 in ID 1), and so on. Then it would
be easy to keep just the first sequence of each group in each ID. But the
code that I wrote was a long sequence of loops that in the end did not
work as I wanted.

Once more, thanks in advance for your attention and help,

Raoni

2015-10-10 13:13 GMT-03:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:
> [...]

--
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.raoni at gmail.com
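
The labeling variable described in this follow-up can be sketched without explicit loops using `rle()` and `ave()` from base R. This is only one possible approach, not something proposed in the thread. It assumes the rows of each ID are already in the intended order. The names `run_occurrence`, `Seq` and `first_runs` are hypothetical, and the data frame below rebuilds only the first 20 rows of the posted data.

```r
# First 20 rows of the posted data (IDs 3, 4 and 1)
test <- data.frame(
  ID    = factor(c("3", "4", rep("1", 18))),
  Group = factor(c("T2", "T3",
                   rep(c("T2", "T3", "T2", "T3"), times = c(7, 6, 2, 3)))),
  Var   = c(0.32, 1.59, 2.94, 3.23, 1.40, 1.62, 2.43, 2.53, 2.25,
            1.66, 2.86, 0.53, 1.66, 3.24, 1.34, 1.86, 3.03, 3.63, 2.78, 1.49)
)

# Hypothetical helper: given the group codes of one ID, label every row
# with the occurrence number of its run (1 = first run of that group,
# 2 = second run of that group, ...).
run_occurrence <- function(g) {
  r <- rle(as.character(g))                  # runs of consecutive equal values
  occ <- ave(seq_along(r$values), r$values,  # count runs per distinct value
             FUN = seq_along)
  rep(occ, times = r$lengths)                # expand back to one label per row
}

# Labeling variable, computed separately within each ID
test$Seq <- ave(as.integer(test$Group), test$ID, FUN = run_occurrence)

# Keep only the first run of each Group in each ID
first_runs <- test[test$Seq == 1, c("ID", "Group", "Var")]
print(first_runs)
```

With this input, rows 3 to 15 of ID "1" are kept and rows 16 to 20 are dropped, which matches the output requested in the original post.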