John Sorkin
2015-Sep-17 11:06 UTC
[R] getting means by group within time point for data on multiple lines (long rather than wide file)
I have a long (rather than wide) file, i.e. the data for each subject are on multiple lines rather than one line. Each line has the following layout:

subject group time value

I have two groups and multiple subjects. Each subject can be seen up to three times at time 0, and at most once at times 4 and 8. An example of the data follows:

1 control 0 100
1 control 0 NA
1 control 0 55
1 control 4 100
1 control 8 100

2 exp 0 99
2 exp 0 67
2 exp 0 66
2 exp 4 110
2 exp 8 200

I need to get means by group (control vs. exp) within time (0, 4, 8). The means should include only those subjects who have at least one observation at each time point (0, 4, 8). I also need to determine the number of subjects who contribute data at each time point, by group.

Any suggestions on how to get the means would be appreciated. Sad to say, I worked on this for four hours last night without coming to any understanding of how this can be done. UGH!

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Duncan Murdoch
2015-Sep-17 11:36 UTC
[R] getting means by group within time point for data on multiple lines (long rather than wide file)
On 17/09/2015 7:06 AM, John Sorkin wrote:
> [...]
> I need to get means by group (control vs. exp) within time (0,4,8). The means should include only those subjects who have at least one observation at each time point (0, 4, 8). I also need to determine the number of subjects who contribute data at each time-point by group.
> [...]

Do it in two stages. First, group the data by subject id, and delete any subjects that don't have sufficient observations. Then group by treatment and time and take means. The tapply() or by() functions will be useful for both of these steps. For example,

    do.call(rbind, by(x, x$subjectid, function(sub)
        if (length(unique(sub$times)) == 3) sub else NULL))

will remove subjects with other than 3 observed times. (It doesn't take NA into account; if you need to do that, you'll need to make that function(sub) more complicated. "sub" will be a dataframe containing data for just one subject.) The "do.call(rbind" puts the list output from by() back together as a single dataframe.

Duncan Murdoch
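The two-stage approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (subject, group, time, value) are taken from the layout in the original post rather than from a real file, subject 3 is a hypothetical incomplete subject added to show the filter doing something, and the NA handling is one possible way to make function(sub) "more complicated".

```r
# Example data from the original post. Column names follow the stated
# layout and are an assumption. Subject 3 is HYPOTHETICAL (not in the
# original post): it has no observation at time 8.
x <- data.frame(
  subject = c(rep(1, 5), rep(2, 5), 3, 3),
  group   = c(rep("control", 5), rep("exp", 5), "exp", "exp"),
  time    = c(0, 0, 0, 4, 8,  0, 0, 0, 4, 8,  0, 4),
  value   = c(100, NA, 55, 100, 100,  99, 67, 66, 110, 200,  80, 90)
)

# Stage 1: keep a subject only if it has at least one non-missing value
# at every required time point.
required <- c(0, 4, 8)
pieces <- by(x, x$subject, function(sub) {
  observed <- unique(sub$time[!is.na(sub$value)])
  if (all(required %in% observed)) sub else NULL
})
complete <- do.call(rbind, pieces)  # rbind() skips the NULL entries

# Stage 2: means by group within time, ignoring missing values.
means <- tapply(complete$value,
                list(complete$group, complete$time),
                mean, na.rm = TRUE)

# Number of distinct subjects contributing a non-missing value per cell.
ok <- complete[!is.na(complete$value), ]
n_subjects <- tapply(ok$subject,
                     list(ok$group, ok$time),
                     function(s) length(unique(s)))

print(means)
print(n_subjects)
```

Note that stage 2 here averages over all non-missing rows, so a subject with three values at time 0 carries more weight than a subject with one. Whether that is wanted depends on the analysis.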
Ivan Calandra
2015-Sep-17 11:44 UTC
[R] getting means by group within time point for data on multiple lines (long rather than wide file)
Hi John,

This will not be the complete answer, but it can probably point you in the right direction.

First, I would subset your data.frame to include only subjects with at least one observation at each time point (and I'm not sure how to do that easily).

But then, the aggregate() function is what you need. Let's say your subset data.frame is called df:

    aggregate(value~group+time, data=df, FUN=function(x) c(length(x),mean(x)))

By defining your own function in aggregate() you can compute both the length(), i.e. the number of non-missing values that were used in the computation (this equals the number of subjects only if each subject contributes a single value per time point), and the mean() per group and per time point.

HTH,
Ivan

--
Ivan Calandra, PhD
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calandra at univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra
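One way to make the aggregate() count reflect subjects rather than rows is to collapse the data to one value per subject and time point first. The sketch below assumes the column names from the original post's layout, assumes df has already been restricted to complete subjects, and takes the per-subject mean at each time point as the summary. That last choice is only one possible reading of the question.

```r
# Example data from the original post (both subjects have data at 0, 4, 8).
# Column names follow the stated layout and are an assumption.
df <- data.frame(
  subject = rep(1:2, each = 5),
  group   = rep(c("control", "exp"), each = 5),
  time    = rep(c(0, 0, 0, 4, 8), times = 2),
  value   = c(100, NA, 55, 100, 100, 99, 67, 66, 110, 200)
)

# Step 1: one row per subject and time point. The formula interface of
# aggregate() drops rows with NA by default (na.action = na.omit).
per_subject <- aggregate(value ~ subject + group + time, data = df, FUN = mean)

# Step 2: count and mean per group and time point. Because each subject
# now contributes exactly one row per time point, length() counts subjects.
summary_tab <- aggregate(value ~ group + time, data = per_subject,
                         FUN = function(v) c(n = length(v), mean = mean(v)))

# The result column 'value' is a matrix with columns "n" and "mean".
print(summary_tab)
```

With the raw rows instead of per_subject, the same call would count observations, so a subject seen three times at time 0 would be counted three times.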