thr3ads.net - R help - [R] Sum every n (4) observations by group [Dec 2021]

If this information is useful, please help other people find it:
Share via:

Miluji Sb

2021-Dec-19 18:31 UTC

[R] Sum every n (4) observations by group

Dear all,

I have a dataset (below) by ID and time sequence. I would like to sum every
four observations by ID.

I am confused how to combine the two conditions. Any help will be highly
appreciated. Thank you!

Best.

Milu

## Dataset
structure(list(ID = c("A", "A", "A",
"A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"B", "B", "B", "C", "C",
"C", "C", "C",
"C", "C", "C"), Date = c(4140L, 4141L, 4142L,
4143L, 4144L, 4145L,
4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L,
4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L
), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326,
0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854,
0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383
)), class = "data.frame", row.names = c(NA, -24L))

	[[alternative HTML version deleted]]

Peter Langfelder

2021-Dec-19 18:49 UTC

head link

[R] Sum every n (4) observations by group

I'm not sure I understand the task, but if I do, assuming your data
frame is assigned to a variable named df, I would do something like

sumNs = function(x, n)
{
   if (length(x) %%n !=0) stop("Length of 'x' must be a multiple of
'n'.")
   n1 = length(x)/n
   ind = rep(1:n1, each = n)
   tapply(x, ind, sum)
}
sums = tapply(df$Value, df$ID, sumNs, 4)

Peter

On Sun, Dec 19, 2021 at 10:32 AM Miluji Sb <milujisb at gmail.com>
wrote:>
> Dear all,
>
> I have a dataset (below) by ID and time sequence. I would like to sum every
> four observations by ID.
>
> I am confused how to combine the two conditions. Any help will be highly
> appreciated. Thank you!
>
> Best.
>
> Milu
>
> ## Dataset
> structure(list(ID = c("A", "A", "A",
"A", "A", "A", "A", "A",
> "B", "B", "B", "B", "B",
"B", "B", "B", "C", "C",
"C", "C", "C",
> "C", "C", "C"), Date = c(4140L, 4141L, 4142L,
4143L, 4144L, 4145L,
> 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L,
> 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L
> ), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
> 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
> 0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326,
> 0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854,
> 0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383
> )), class = "data.frame", row.names = c(NA, -24L))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Avi Gross

2021-Dec-19 20:09 UTC

head link

[R] Sum every n (4) observations by group

Milu,

Your data seems to be very consistent in that each value of ID has eight
rows. You seem to want to just sum every four so that fits:

   ID Date       Value
1   A 4140 0.000207232
2   A 4141 0.000240141
3   A 4142 0.000271414
4   A 4143 0.000258384
5   A 4144 0.000243640
6   A 4145 0.000271480
7   A 4146 0.000280585
8   A 4147 0.000289691
9   B 4140 0.000298797
10  B 4141 0.000307903
11  B 4142 0.000317008
12  B 4143 0.000326114
13  B 4144 0.000335220
14  B 4145 0.000344326
15  B 4146 0.000353431
16  B 4147 0.000362537
17  C 4140 0.000371643
18  C 4141 0.000380749
19  C 4142 0.000389854
20  C 4143 0.000398960
21  C 4144 0.000408066
22  C 4145 0.000417172
23  C 4146 0.000426277
24  C 4147 0.000435383

There are many ways to do what you want, some more general than others, but
one trivial way is to add a column that contains 24 numbers ranging from 1
to 6 like this assuming mydf holds the above:

Here is an example of such a vector:

rep(1:(nrow(mydf)/4), each=4)
 [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6

So you can add a column like:
> mydf$fours <- rep(1:(nrow(mydf)/4), each=4)
> mydf   ID Date       Value fours
1   A 4140 0.000207232     1
2   A 4141 0.000240141     1
3   A 4142 0.000271414     1
4   A 4143 0.000258384     1
5   A 4144 0.000243640     2
6   A 4145 0.000271480     2
7   A 4146 0.000280585     2
8   A 4147 0.000289691     2
9   B 4140 0.000298797     3
10  B 4141 0.000307903     3
11  B 4142 0.000317008     3
12  B 4143 0.000326114     3
13  B 4144 0.000335220     4
14  B 4145 0.000344326     4
15  B 4146 0.000353431     4
16  B 4147 0.000362537     4
17  C 4140 0.000371643     5
18  C 4141 0.000380749     5
19  C 4142 0.000389854     5
20  C 4143 0.000398960     5
21  C 4144 0.000408066     6
22  C 4145 0.000417172     6
23  C 4146 0.000426277     6
24  C 4147 0.000435383     6

You now use grouping any way you want to apply a function and in this case
you want a sum. I like to use the tidyverse functions so will show that as
in:

mydf %>%
  group_by(ID, fours) %>%
  summarize(sums=sum(Value), n=n())

I threw in the extra column in case your data sometimes does not have 4 at
the end of a group or beginning of next. Here is the output:

# A tibble: 6 x 4
# Groups:   ID [3]
ID    fours     sums     n
<chr> <int>    <dbl> <int>
  1 A         1 0.000977     4
2 A         2 0.00109      4
3 B         3 0.00125      4
4 B         4 0.00140      4
5 C         5 0.00154      4
6 C         6 0.00169      4

Of course there are all kinds of ways to do this in standard R, including
trivial ones like looping over indices starting at 1 and taking four at a
time and getting the Value data for mydf$Value[N] + mydf$Value[N+1] ...



-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Miluji Sb
Sent: Sunday, December 19, 2021 1:32 PM
To: r-help mailing list <r-help at r-project.org>
Subject: [R] Sum every n (4) observations by group

Dear all,

I have a dataset (below) by ID and time sequence. I would like to sum every
four observations by ID.

I am confused how to combine the two conditions. Any help will be highly
appreciated. Thank you!

Best.

Milu

## Dataset
structure(list(ID = c("A", "A", "A",
"A", "A", "A", "A", "A",
"B", "B", "B",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C",
"C", "C", "C"), Date c(4140L, 4141L, 4142L, 4143L,
4144L, 4145L, 4146L, 4147L, 4140L, 4141L,
4142L, 4143L, 4144L, 4145L, 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L,
4145L, 4146L, 4147L ), Value = c(0.000207232, 0.000240141, 0.000271414,
0.000258384, 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326, 0.000353431,
0.000362537, 0.000371643, 0.000380749, 0.000389854, 0.00039896, 0.000408066,
0.000417172, 0.000426277, 0.000435383 )), class = "data.frame",
row.names c(NA, -24L))

	[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

R help - Dec 2021 - Sum every n (4) observations by group

[R] Sum every n (4) observations by group

[R] Sum every n (4) observations by group

[R] Sum every n (4) observations by group