thr3ads.net - R help - [R] understanding output of tapply/by cumsum [Dec 2010]

Gerrit Draisma

2010-Dec-07 11:39 UTC

[R] understanding output of tapply/by cumsum

Dear R-users,

I have a dataset with categories and numbers.
I would like to compute and add cumulative numbers
to the dataset.
I do not understand the structure of by(...) or
tapply(...) output enough to handle it.

Here is a small example:
--------------
d<-expand.grid(a=1:5,b=1:3,c=1:2)
d$n = 10 * d$a + d$b +0.1* d$c
Sn<-by(d$n,list(d$a,d$c),cumsum)
str(Sn)
---------
List of 10
  $ : num [1:3] 11.1 23.2 36.3
  $ : num [1:3] 21.1 43.2 66.3
  $ : num [1:3] 31.1 63.2 96.3
  $ : num [1:3]  41.1  83.2 126.3
  $ : num [1:3]  51.1 103.2 156.3
  $ : num [1:3] 11.2 23.4 36.6
  $ : num [1:3] 21.2 43.4 66.6
  $ : num [1:3] 31.2 63.4 96.6
  $ : num [1:3]  41.2  83.4 126.6
  $ : num [1:3]  51.2 103.4 156.6
  - attr(*, "dim")= int [1:2] 5 2
  - attr(*, "dimnames")=List of 2
   ..$ : chr [1:5] "1" "2" "3" "4" ...
   ..$ : chr [1:2] "1" "2"
  - attr(*, "call")= language by.default(data = d$n, INDICES = list(d$a, d$c), FUN = cumsum)
  - attr(*, "class")= chr "by"
---------
# these give list(s) of numeric vectors, not the numbers themselves
Sn[5,2]
Sn[cbind(d$a,d$c)]
# how to access the individual cumsum values?
# and assign them to d$Sn?
--------------

Thanks,
Gerrit.

---
Gerrit Draisma
Department of Public Health
Erasmus MC, University Medical Center Rotterdam
Room AE-235
P.O. Box 2040 3000 CA  Rotterdam The Netherlands
Phone: +31 10 7043787 Fax: +31 10 7038474
http://mgzlx4.erasmusmc.nl/pwp/?gdraisma

jim holtman

2010-Dec-07 12:43 UTC

Maybe 'ave' is what you were looking for:

> d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)
> d
   a b c    n   cum
1  1 1 1 11.1  11.1
2  2 1 1 21.1  21.1
3  3 1 1 31.1  31.1
4  4 1 1 41.1  41.1
5  5 1 1 51.1  51.1
6  1 2 1 12.1  23.2
7  2 2 1 22.1  43.2
8  3 2 1 32.1  63.2
9  4 2 1 42.1  83.2
10 5 2 1 52.1 103.2
11 1 3 1 13.1  36.3
12 2 3 1 23.1  66.3
13 3 3 1 33.1  96.3
14 4 3 1 43.1 126.3
15 5 3 1 53.1 156.3
16 1 1 2 11.2  11.2
17 2 1 2 21.2  21.2
18 3 1 2 31.2  31.2
19 4 1 2 41.2  41.2
20 5 1 2 51.2  51.2
21 1 2 2 12.2  23.4
22 2 2 2 22.2  43.4
23 3 2 2 32.2  63.4
24 4 2 2 42.2  83.4
25 5 2 2 52.2 103.4
26 1 3 2 13.2  36.6
27 2 3 2 23.2  66.6
28 3 3 2 33.2  96.6
29 4 3 2 43.2 126.6
30 5 3 2 53.2 156.6
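
A rough sketch of why this lines up with the rows of d: ave() applies FUN within each group and writes the results back in the original row order. Splitting, applying cumsum and unsplitting by hand should give the same vector. The names g, cum_manual and cum_ave are illustrative and do not come from the thread.

```r
# Rebuild the example data from the original post.
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# One grouping factor for every (a, c) combination (illustrative name).
g <- interaction(d$a, d$c)

# Split n by group, take the cumulative sum within each group,
# then put the pieces back in the original row order.
cum_manual <- unsplit(lapply(split(d$n, g), cumsum), g)

# The one-liner from this reply.
cum_ave <- ave(d$n, d$a, d$c, FUN = cumsum)

# Both approaches should agree row by row.
print(isTRUE(all.equal(cum_manual, cum_ave)))
```

The manual version is only meant to make the mechanics visible; the ave() call is shorter and is what the reply recommends.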


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

jim holtman

2010-Dec-07 12:45 UTC

You can also use 'split' to separate each group:

> split(d, list(d$a, d$c))
$`1.1`
   a b c    n  cum
1  1 1 1 11.1 11.1
6  1 2 1 12.1 23.2
11 1 3 1 13.1 36.3

$`2.1`
   a b c    n  cum
2  2 1 1 21.1 21.1
7  2 2 1 22.1 43.2
12 2 3 1 23.1 66.3

$`3.1`
   a b c    n  cum
3  3 1 1 31.1 31.1
8  3 2 1 32.1 63.2
13 3 3 1 33.1 96.3

$`4.1`
   a b c    n   cum
4  4 1 1 41.1  41.1
9  4 2 1 42.1  83.2
14 4 3 1 43.1 126.3

$`5.1`
   a b c    n   cum
5  5 1 1 51.1  51.1
10 5 2 1 52.1 103.2
15 5 3 1 53.1 156.3

$`1.2`
   a b c    n  cum
16 1 1 2 11.2 11.2
21 1 2 2 12.2 23.4
26 1 3 2 13.2 36.6

$`2.2`
   a b c    n  cum
17 2 1 2 21.2 21.2
22 2 2 2 22.2 43.4
27 2 3 2 23.2 66.6

$`3.2`
   a b c    n  cum
18 3 1 2 31.2 31.2
23 3 2 2 32.2 63.4
28 3 3 2 33.2 96.6

$`4.2`
   a b c    n   cum
19 4 1 2 41.2  41.2
24 4 2 2 42.2  83.4
29 4 3 2 43.2 126.6

$`5.2`
   a b c    n   cum
20 5 1 2 51.2  51.2
25 5 2 2 52.2 103.4
30 5 3 2 53.2 156.6
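
The names of the pieces combine the grouping values as a.c, so a single group can be pulled out by name. A minimal sketch, assuming the same d as in the original post; pieces and grp are illustrative names.

```r
# Rebuild the example data from the original post.
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# split() returns a plain named list of data frames, one per (a, c) group.
pieces <- split(d, list(d$a, d$c))
print(names(pieces))

# Pick one group by name and compute its cumulative sum directly.
grp <- pieces[["5.2"]]
print(cumsum(grp$n))
```

Because the result is an ordinary named list, the usual list tools (names(), [[ ]], lapply()) apply without any special handling.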
>



Gerrit Draisma

2010-Dec-08 15:56 UTC

Thanks, Jim,
'ave' does what I wanted.
It is simpler and probably more efficient
than unlisting Sn as I tried.

Still, I remain puzzled by the structure
of the by() or tapply() output and how
to access the individual cumsums.

Yes, the split command is useful for checking
the result.
Gerrit.
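
On the remaining puzzle, a sketch rather than a definitive answer: when FUN returns a vector instead of a single number, tapply() returns a list that carries a dim attribute (here a list arranged as a 5 x 2 array), and the str() output in the original post suggests by() holds the same structure plus a class attribute. Single brackets keep the list wrapper; double brackets extract the numeric vector. The example uses tapply() to stay on plain list behaviour. Tn, cell_list, cell_vec and last_val are illustrative names, and the loop assumes a and c take the values 1..5 and 1..2 as in the example.

```r
# Rebuild the example data from the original post.
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# List with dim c(5, 2): one numeric vector per (a, c) cell.
Tn <- tapply(d$n, list(d$a, d$c), cumsum)

cell_list <- Tn[5, 2]       # still a list, of length 1
cell_vec  <- Tn[[5, 2]]     # the numeric vector of cumulative sums
last_val  <- Tn[[5, 2]][3]  # one individual cumsum value

# Copy each cell back to the matching rows of d.
# Assumes a and c are the integers 1..5 and 1..2, so values equal positions.
d$Sn <- NA_real_
for (i in seq_len(nrow(Tn))) {
  for (j in seq_len(ncol(Tn))) {
    rows <- d$a == i & d$c == j
    d$Sn[rows] <- Tn[[i, j]]
  }
}

# Should match the ave() result from earlier in the thread.
print(isTRUE(all.equal(d$Sn, ave(d$n, d$a, d$c, FUN = cumsum))))
```

The loop is written for readability, not speed; for real data the ave() call is the more direct route.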


