thr3ads.net - R help - [R] getting means by group within time point for data on multiple lines (long rather than wide file) [Sep 2015]

John Sorkin

2015-Sep-17 11:06 UTC

[R] getting means by group within time point for data on multiple lines (long rather than wide file)

I have a long (rather than wide) file, i.e. the data for each subject is on
multiple lines rather than one line. Each line has the following layout:
subject group time value
I have two groups and multiple subjects; each subject can be seen up to three times
at time 0, and at most once at times 4 and 8.
An example of the data follows:

1 control 0 100
1 control 0 NA
1 control 0 55
1 control 4 100
1 control 8 100

2 exp 0 99
2 exp 0 67
2 exp 0 66
2 exp 4 110
2 exp 8 200

I need to get means by group (control vs. exp) within time (0,4,8). The means
should include only those subjects who have at least one observation at each
time point (0, 4, 8). I also need to determine the number of subjects who
contribute data at each time-point by group. Any suggestions on how to get these
means would be appreciated. Sad to say I worked on this for four hours last
night without coming to any understanding how this can be done. UGG!

Thank you,
John



> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Duncan Murdoch

2015-Sep-17 11:36 UTC

Do it in two stages.  First, group the data by subject id, and delete
any subjects that don't have sufficient observations.  Then group by
treatment and time and take means.

The tapply() or by() functions will be useful for both of these steps.
For example,

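# x is the long data frame; subjectid and times are assumed column names
do.call(rbind,
  by(x, x$subjectid,
     function(sub)
       # keep a subject only if all 3 time points are present
       if (length(unique(sub$times)) == 3) sub
       else NULL))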

will remove subjects with other than 3 observed times.  (It doesn't take
NA into account; if you need to do that, you'll need to make that
function(sub) more complicated.  "sub" will be a dataframe containing
data for just one subject.)

The "do.call(rbind" puts the list output from by() back together as a
single dataframe.

Duncan Murdoch
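
A minimal sketch of the two-stage approach described above, extended so that NA values do not count as observations. The column names (subject, group, time, value), the toy data, and the helper names (required_times, obs, complete) are assumptions for illustration, not taken from the original data set. Note that the time-0 mean here averages all rows, so a subject with several time-0 values carries more weight than one with a single value.

```r
# Toy data in the layout from the question; the column names are assumed.
x <- data.frame(
  subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
  group   = c(rep("control", 5), rep("exp", 5), rep("exp", 2)),
  time    = c(0, 0, 0, 4, 8, 0, 0, 0, 4, 8, 0, 4),
  value   = c(100, NA, 55, 100, 100, 99, 67, 66, 110, 200, 80, NA)
)

# Stage 1: keep subjects with at least one non-missing value at each time point.
required_times <- c(0, 4, 8)
obs <- x[!is.na(x$value), ]                 # rows with NA values are dropped first
times_seen <- tapply(obs$time, obs$subject,
                     function(t) all(required_times %in% t))
complete_ids <- names(times_seen)[times_seen]
complete <- obs[as.character(obs$subject) %in% complete_ids, ]

# Stage 2: means by group (rows) within time point (columns).
means <- tapply(complete$value, list(complete$group, complete$time), mean)
print(means)
```

Subject 3 is dropped in this toy example because its only time-4 value is NA and it has no time-8 row.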

Ivan Calandra

2015-Sep-17 11:44 UTC

Hi John,

This will not be the complete answer, but it can probably help you in 
the right direction.

First, I would subset your data.frame to include only subjects with at least one
observation at each time point (and I'm not sure how to do that easily).

But then, the aggregate() function is what you need. Let's say your 
subset data.frame is called df:
aggregate(value~group+time, data=df, FUN=function(x) c(length(x),mean(x)))

By defining your own function in aggregate() you can compute both the
length(), i.e. the number of observations used in the computation (which
equals the number of subjects only when each subject has one row per time
point), and the mean() per group and per time-point.

HTH,
Ivan

--
Ivan Calandra, PhD
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calandra at univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra
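
Because a subject can have up to three rows at time 0, length(x) inside aggregate() counts rows rather than subjects. A minimal sketch of one way to report the number of distinct subjects alongside the mean follows; the data frame df, its column names, and the n_subjects column are assumptions for illustration, and df is taken to be already restricted to subjects seen at all three time points.

```r
# Hypothetical data already restricted to subjects seen at times 0, 4 and 8.
df <- data.frame(
  subject = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  group   = c(rep("control", 4), rep("exp", 5)),
  time    = c(0, 0, 4, 8, 0, 0, 0, 4, 8),
  value   = c(100, 55, 100, 100, 99, 67, 66, 110, 200)
)

# Mean of value per group and time point.
means <- aggregate(value ~ group + time, data = df, FUN = mean)

# Number of distinct subjects per group and time point.
counts <- aggregate(subject ~ group + time, data = df,
                    FUN = function(s) length(unique(s)))
names(counts)[names(counts) == "subject"] <- "n_subjects"

# Combine both summaries into one table keyed by group and time.
result <- merge(means, counts, by = c("group", "time"))
print(result)
```

Keeping the count and the mean in separate aggregate() calls avoids the matrix-valued column that aggregate() produces when FUN returns a vector.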
