thr3ads.net - R help - [R] Tapply for Group Specific Means and Proportions [Mar 2008]

Bret Collier

2008-Mar-03 22:27 UTC

[R] Tapply for Group Specific Means and Proportions

UseRs,

I am working on a dataset (see small example below) where individuals 
were followed on a specific date-time combo and multiple repeated 
measurements were taken (e.g., height in meters, behavior class as a 
2-letter code).  The number of observations varied among individuals 
(ranging from 1 observation per date-time combo to >50).

I am trying to summarize the data into 1 row per individual-date-time 
combination.  I used tapply to pull mean height (TreeHt) out for each 
date-time combo.  However, all my attempts to get the proportion of 
times a specific behavior category occurs within the same date-time 
combo have failed thus far; I have tried tapply, aggregate, table 
(because Behavior is a factor), etc.  Most likely I did not search 
the help archives with the right word combination.

If anyone can point me in the right direction toward streamlining my 
code to output the summaries along these general lines (column headers 
being the Behavior categories, 0.xx being the proportion per date-time) 
I would appreciate it:

Date-Time     MeanHt    PE    OS    SI   ...
28Mar96.0752  6.000000  0.xx  0.xx  0.xx ...
28Mar96.1014  7.000000  0.xx  0.xx  0.xx ...


TIA,
Bret (R 2.6.1 on I386-pc-mingw32)
Texas A&M

 > Final
    Sequence testdate testtime Behavior Substrate TreeHt
1         1  28Mar96     0752       PE        TW      6
2         2  28Mar96     0752       OS      <NA>      6
3         3  28Mar96     0752       PE        TW      6
4         4  28Mar96     0752       PE        TW      6
5         1  28Mar96     0924       PE        TW      8
6         2  28Mar96     0924       PE        BR      8
7         3  28Mar96     0924       PE        TW      7
8         4  28Mar96     0924       SI        TW      7
9         5  28Mar96     0924       PE        TW      7
10        6  28Mar96     0924       PE        TW      7
11        1  28Mar96     0954       HO        BR     10
12        2  28Mar96     0954       PE        BR     10
13        1  28Mar96     1014       PE        TW      7
14        2  28Mar96     1014       HO        TW      7
15        1  29Mar96     0835       PE        TW      4
16        2  29Mar96     0835       EA        BR      4
17        3  29Mar96     0835       MA        BR      4
18        4  29Mar96     0835       PE        TW      5
19        5  29Mar96     0835       PE        TW      5
20        6  29Mar96     0835       PE        TW     13
21        7  29Mar96     0835       PE        TW     13
22        8  29Mar96     0835       PE        TW     13
23        9  29Mar96     0835       PE        BR     13
24       10  29Mar96     0835       PE        TW     13
25       11  29Mar96     0835       HO        TW     12
26       12  29Mar96     0835       HO        TW     12
27       13  29Mar96     0835       HO        TW     12
28       14  29Mar96     0835       HO        TW     12
29       15  29Mar96     0835       PE        TW     13
30       16  29Mar96     0835       PE        TR     13
31       17  29Mar96     0835       FL      <NA>     NA
32       18  29Mar96     0835       PE        BR     12
33       19  29Mar96     0835       FL      <NA>     NA
34       20  29Mar96     0835       PE        TW     13
35       21  29Mar96     0835       PE        TW     13
36       22  29Mar96     0835       FL      <NA>     NA
37       23  29Mar96     0835       HO        TW      4
38       24  29Mar96     0835       PE        BR      5
39       25  29Mar96     0835       PE        BR      5
40       26  29Mar96     0835       PE        BR      5
41       27  29Mar96     0835       PE        TW      4
42       28  29Mar96     0835       PE        TW      5
43       29  29Mar96     0835       PE        TW      5
44       30  29Mar96     0835       PE        TW     13
45       31  29Mar96     0835       PE        TW      5
 > str(Final)
'data.frame':	45 obs. of  6 variables:
  $ Sequence : num  1 2 3 4 1 2 3 4 5 6 ...
  $ testdate : Factor w/ 2 levels "28Mar96","29Mar96": 1 1 1 1 1 1 1 1 1 1 ...
  $ testtime : Factor w/ 5 levels "0752","0835",..: 1 1 1 1 3 3 3 3 3 3 ...
  $ Behavior : Factor w/ 7 levels "EA","FL","HO",..: 6 5 6 6 6 6 6 7 6 6 ...
  $ Substrate: Factor w/ 3 levels "BR","TR","TW": 3 NA 3 3 3 1 3 3 3 3 ...
  $ TreeHt   : num  6 6 6 6 8 8 7 7 7 7 ...
 > test <- sort(tapply(Final$TreeHt, INDEX=interaction(Final$testdate, Final$testtime), FUN=mean, na.rm=TRUE))
 > data.frame(test)
                   test
28Mar96.0752  6.000000
28Mar96.1014  7.000000
28Mar96.0924  7.333333
29Mar96.0835  8.928571
28Mar96.0954 10.000000

jim holtman

2008-Mar-03 23:37 UTC


Here is how you can get the proportions from your data frame:

> prop.table(table(paste(x$testdate, x$testtime), x$Behavior), margin=1)
                       EA         FL         HO         MA         OS         PE         SI
  28Mar96 1014 0.00000000 0.00000000 0.50000000 0.00000000 0.00000000 0.50000000 0.00000000
  28Mar96 752  0.00000000 0.00000000 0.00000000 0.00000000 0.25000000 0.75000000 0.00000000
  28Mar96 924  0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.83333333 0.16666667
  28Mar96 954  0.00000000 0.00000000 0.50000000 0.00000000 0.00000000 0.50000000 0.00000000
  29Mar96 835  0.03225806 0.09677419 0.16129032 0.03225806 0.00000000 0.67741935 0.00000000
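
The group means from the original post and the proportions above can be put into one table with one row per date-time combination, which is the layout the question asked for. What follows is only a minimal sketch and not part of the original reply. It assumes the data frame is called Final as in the question (the reply works on a copy called x, whose row labels show 752 rather than 0752, presumably because the time was read as a number there). The small data frame is a hand-typed subset of the posted rows (1-4 and 11-14), and the names grp, mean_ht, props and summary_df are illustrative.

```r
# Hand-typed subset of the posted data (rows 1-4 and 11-14), for illustration only.
Final <- data.frame(
  testdate = factor(rep("28Mar96", 8)),
  testtime = factor(c("0752", "0752", "0752", "0752",
                      "0954", "0954", "1014", "1014")),
  Behavior = factor(c("PE", "OS", "PE", "PE", "HO", "PE", "PE", "HO")),
  TreeHt   = c(6, 6, 6, 6, 10, 10, 7, 7)
)

# One grouping factor per date-time combination;
# drop = TRUE omits combinations that never occur in the data.
grp <- interaction(Final$testdate, Final$testtime, drop = TRUE)

# Mean tree height per group, ignoring missing heights.
mean_ht <- tapply(Final$TreeHt, grp, mean, na.rm = TRUE)

# Proportion of each behavior within a group (each row sums to 1).
props <- prop.table(table(grp, Final$Behavior), margin = 1)

# Both results follow the level order of grp, so they line up row by row.
summary_df <- data.frame(
  MeanHt = as.vector(mean_ht),
  as.data.frame.matrix(props)
)
print(summary_df)
```

Because both tapply and table order their results by the levels of the same factor, no merge step is needed. Using interaction with drop = TRUE also avoids the NA means for date-time combinations that do not occur, so the result does not have to be passed through sort to remove them.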


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
