Dear all, I have a dataset (below) by ID and time sequence. I would like to sum every four observations by ID. I am confused how to combine the two conditions. Any help will be highly appreciated. Thank you! Best. Milu ## Dataset structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C"), Date = c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L ), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384, 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797, 0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326, 0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854, 0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383 )), class = "data.frame", row.names = c(NA, -24L)) [[alternative HTML version deleted]]
I'm not sure I understand the task, but if I do, assuming your data frame is assigned to a variable named df, I would do something like sumNs = function(x, n) { if (length(x) %%n !=0) stop("Length of 'x' must be a multiple of 'n'.") n1 = length(x)/n ind = rep(1:n1, each = n) tapply(x, ind, sum) } sums = tapply(df$Value, df$ID, sumNs, 4) Peter On Sun, Dec 19, 2021 at 10:32 AM Miluji Sb <milujisb at gmail.com> wrote:> > Dear all, > > I have a dataset (below) by ID and time sequence. I would like to sum every > four observations by ID. > > I am confused how to combine the two conditions. Any help will be highly > appreciated. Thank you! > > Best. > > Milu > > ## Dataset > structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", > "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", > "C", "C", "C"), Date = c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L, > 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, > 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L > ), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384, > 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797, > 0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326, > 0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854, > 0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383 > )), class = "data.frame", row.names = c(NA, -24L)) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Milu, Your data seems to be very consistent in that each value of ID has eight rows. You seem to want to just sum every four so that fits: ID Date Value 1 A 4140 0.000207232 2 A 4141 0.000240141 3 A 4142 0.000271414 4 A 4143 0.000258384 5 A 4144 0.000243640 6 A 4145 0.000271480 7 A 4146 0.000280585 8 A 4147 0.000289691 9 B 4140 0.000298797 10 B 4141 0.000307903 11 B 4142 0.000317008 12 B 4143 0.000326114 13 B 4144 0.000335220 14 B 4145 0.000344326 15 B 4146 0.000353431 16 B 4147 0.000362537 17 C 4140 0.000371643 18 C 4141 0.000380749 19 C 4142 0.000389854 20 C 4143 0.000398960 21 C 4144 0.000408066 22 C 4145 0.000417172 23 C 4146 0.000426277 24 C 4147 0.000435383 There are many ways to do what you want, some more general than others, but one trivial way is to add a column that contains 24 numbers ranging from 1 to 6 like this assuming mydf holds the above: Here is an example of such a vector: rep(1:(nrow(mydf)/4), each=4) [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 So you can add a column like:> mydf$fours <- rep(1:(nrow(mydf)/4), each=4) > mydfID Date Value fours 1 A 4140 0.000207232 1 2 A 4141 0.000240141 1 3 A 4142 0.000271414 1 4 A 4143 0.000258384 1 5 A 4144 0.000243640 2 6 A 4145 0.000271480 2 7 A 4146 0.000280585 2 8 A 4147 0.000289691 2 9 B 4140 0.000298797 3 10 B 4141 0.000307903 3 11 B 4142 0.000317008 3 12 B 4143 0.000326114 3 13 B 4144 0.000335220 4 14 B 4145 0.000344326 4 15 B 4146 0.000353431 4 16 B 4147 0.000362537 4 17 C 4140 0.000371643 5 18 C 4141 0.000380749 5 19 C 4142 0.000389854 5 20 C 4143 0.000398960 5 21 C 4144 0.000408066 6 22 C 4145 0.000417172 6 23 C 4146 0.000426277 6 24 C 4147 0.000435383 6 You now use grouping any way you want to apply a function and in this case you want a sum. I like to use the tidyverse functions so will show that as in: mydf %>% group_by(ID, fours) %>% summarize(sums=sum(Value), n=n()) I threw in the extra column in case your data sometimes does not have 4 at the end of a group or beginning of next. Here is the output: # A tibble: 6 x 4 # Groups: ID [3] ID fours sums n <chr> <int> <dbl> <int> 1 A 1 0.000977 4 2 A 2 0.00109 4 3 B 3 0.00125 4 4 B 4 0.00140 4 5 C 5 0.00154 4 6 C 6 0.00169 4 Of course there are all kinds of ways to do this in standard R, including trivial ones like looping over indices starting at 1 and taking four at a time and getting the Value data for mydf$Value[N] + mydf$Value[N+1] ... -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Miluji Sb Sent: Sunday, December 19, 2021 1:32 PM To: r-help mailing list <r-help at r-project.org> Subject: [R] Sum every n (4) observations by group Dear all, I have a dataset (below) by ID and time sequence. I would like to sum every four observations by ID. I am confused how to combine the two conditions. Any help will be highly appreciated. Thank you! Best. Milu ## Dataset structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C"), Date c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L ), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384, 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797, 0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326, 0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854, 0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383 )), class = "data.frame", row.names c(NA, -24L)) [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.