derek eder
2010-May-24 20:28 UTC
[R] calculating "treatment effects" (differences) in a data frame?
I am trying to calculate the treatment effect for individual subjects ("ID") as the difference in a measurement ("score") between two time points ("visit"); see the example below. The data are in an unbalanced data.frame in "long" format, with some missing values. I suspect that I am overlooking a very simple function, something along the lines of tapply().

Thank you for your attention!

Derek Eder

## Example:

myData = data.frame(
  ID    = c("a","a","b","c","c","d","d"),
  visit = c(1,2,1,1,2,1,2),
  score = c(10,2,12,16,0,NA,5)
)

> myData
  ID visit score
1  a     1    10
2  a     2     2
3  b     1    12
4  c     1    16
5  c     2     0
6  d     1    NA
7  d     2     5

# The desired result is a vector of score differences
# (visit 1 minus visit 2) by ID:
#  a  b  c  d
#  8 NA 16 NA
# (diff() returns visit 2 minus visit 1, so the outputs
# below carry the opposite sign.)

## Solutions?

# This works, but the returned data frame is awkward for me,
# because the "empty cells" (b and d) contain numeric(0)
# rather than the more familiar NA.
> aggregate(data=myData, score~ID, FUN=diff)
  ID score
1  a    -8
2  b      
3  c   -16
4  d      

# This works as desired ... but somehow seems unnecessarily complicated:
> reshape(data=myData, timevar="visit", idvar="ID", direction="wide")
  ID score.1 score.2
1  a      10       2
3  b      12      NA
4  c      16       0
6  d      NA       5

> apply(X = reshape(data=myData, timevar="visit", idvar="ID", direction="wide")[,-1],
+       MARGIN = 1, FUN = diff)
  1   3   4   6 
 -8  NA -16  NA 
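Since tapply() came up: a minimal sketch that indexes score by both ID and visit at once, assuming at most one row per ID/visit pair (the names `tab` and `effect` are illustrative). tapply() fills cells that have no observation with NA, so the missing visit for b shows up as NA rather than as an empty vector.

```r
myData <- data.frame(
  ID    = c("a","a","b","c","c","d","d"),
  visit = c(1,2,1,1,2,1,2),
  score = c(10,2,12,16,0,NA,5)
)

# Matrix of scores: rows = ID, columns = visit ("1", "2").
# identity() assumes at most one score per ID/visit cell;
# cells with no row (e.g. b at visit 2) are filled with NA.
tab <- tapply(myData$score, list(myData$ID, myData$visit), identity)

# Treatment effect per ID: visit 1 minus visit 2
effect <- tab[, "1"] - tab[, "2"]
effect
#  a  b  c  d 
#  8 NA 16 NA 
```

Subtracting the two columns gives visit 1 minus visit 2, matching the sign of the desired result, and an NA on either side (b's missing visit 2, d's NA score at visit 1) propagates to the result.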
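Another option, sketched here under the assumption of one row per ID and visit, is to split the long data by visit and merge the two halves on ID with all = TRUE, so subjects missing a visit are kept and padded with NA. The object names `v1`, `v2` and `both` are illustrative, not from the original post.

```r
myData <- data.frame(
  ID    = c("a","a","b","c","c","d","d"),
  visit = c(1,2,1,1,2,1,2),
  score = c(10,2,12,16,0,NA,5)
)

# One data frame per visit, keeping only ID and score
v1 <- subset(myData, visit == 1, select = c(ID, score))
v2 <- subset(myData, visit == 2, select = c(ID, score))

# Full outer join on ID: subjects missing a visit get NA
both <- merge(v1, v2, by = "ID", all = TRUE, suffixes = c(".1", ".2"))

# Treatment effect: visit 1 minus visit 2 (NA if either score is NA)
both$effect <- both$score.1 - both$score.2
both
```

This keeps ID, both scores and the effect (8, NA, 16, NA for a to d) side by side in one data frame, which can be handy for checking which visit caused an NA.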
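The reshape() route can also be shortened: once the data are wide, a vectorized subtraction of the two score columns replaces apply(..., diff) and lets the result be named by ID instead of by the original row numbers. A minimal sketch, assuming visits are coded 1 and 2 so the wide columns are score.1 and score.2 as in the reshape() output in the post (`wide` and `effect` are illustrative names):

```r
myData <- data.frame(
  ID    = c("a","a","b","c","c","d","d"),
  visit = c(1,2,1,1,2,1,2),
  score = c(10,2,12,16,0,NA,5)
)

# Long -> wide: one row per ID, columns score.1 and score.2
wide <- reshape(myData, timevar = "visit", idvar = "ID", direction = "wide")

# Vectorized difference (visit 1 minus visit 2), named by ID;
# an NA in either column gives NA
effect <- setNames(wide$score.1 - wide$score.2, as.character(wide$ID))
effect
#  a  b  c  d 
#  8 NA 16 NA 
```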