wphantomfr
2012-Mar-21 08:48 UTC
[R] Best way to compute the difference between two levels of a factor ?
Dear R-help members,

I am wondering whether anyone can think of an optimal way to compute, for several numeric variables, the difference between two levels of a factor.

To be clear, let's generate a simple data frame with two numeric variables collected for different subjects (ID) at two levels of a TIME factor (time of evaluation):

    data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))

      ID TIME         X         Y
    1 AA   T1  9.959540 11.140529
    2 AA   T2 12.949522  9.896559
    3 BB   T1  9.039486 13.469104
    4 BB   T2 10.056392 14.632169
    5 CC   T1  8.706590 14.939197
    6 CC   T2 10.799296 10.747609

I want to compute, for each subject and each variable (X, Y, ...), the difference between T2 and T1.

Until now I have done this by reshaping my data frame to the wide format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the difference between successive columns one by one:

    data$Xdiff=data$X.T2-data$X.T1
    data$Ydiff=data$Y.T2-data$Y.T1
    ...

but this is probably not optimal if the difference has to be computed for a large number of variables.

How would you handle it?

Thanks in advance,
Sylvain Clément
Peter Ehlers
2012-Mar-21 10:03 UTC
[R] Best way to compute the difference between two levels of a factor ?
On 2012-03-21 01:48, wphantomfr wrote:
> I want to compute for each subject and each variable (X, Y, ...) the
> difference between T2 and T1.
> [...]
> How will you handle it ?

One way is to use the plyr package:

    library(plyr)
    result <- ddply(data, "ID", summarize,
                    DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
                    DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers
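The ddply call above writes out one expression per variable, which gets tedious with many columns. A minimal base R sketch of the same label-based subtraction (T2 minus T1, matched by ID rather than by row position) might look like the following. It is not from the thread: the data values are copied from the table printed in the original post, the helper names num, t1, t2 and d are illustrative, and it assumes each ID has at most one T1 row and one T2 row.

```r
# Values copied from the table printed in the original post
data <- data.frame(
  ID   = c("AA", "AA", "BB", "BB", "CC", "CC"),
  TIME = c("T1", "T2", "T1", "T2", "T1", "T2"),
  X    = c(9.959540, 12.949522, 9.039486, 10.056392, 8.706590, 10.799296),
  Y    = c(11.140529, 9.896559, 13.469104, 14.632169, 14.939197, 10.747609)
)

# Names of all numeric columns, so no variable has to be listed by hand
num <- names(data)[sapply(data, is.numeric)]

# Split by TIME label, then align the T2 rows to the T1 rows by ID
t1 <- data[data$TIME == "T1", ]
t2 <- data[data$TIME == "T2", ]
t2 <- t2[match(t1$ID, t2$ID), ]

# Subtract as matrices: one row per ID, one column per numeric variable
d <- as.matrix(t2[num]) - as.matrix(t1[num])
colnames(d) <- paste("DIF", num, sep = ".")
result <- data.frame(ID = t1$ID, d, row.names = NULL)
print(result)
```

Because the rows are aligned with match(), the row order of data does not matter; an ID with no T2 row would simply get NA differences.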
Eik Vettorazzi
2012-Mar-21 11:51 UTC
[R] Best way to compute the difference between two levels of a factor ?
Hi Sylvain,

assuming your data frame is ordered by ID and TIME, how about this:

    aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))

    #or doing this for all but the first 2 columns of data:
    aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

Cheers.

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
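Since the aggregate() calls above rely on the rows being ordered by ID and TIME, one hedged way to make that assumption explicit is to sort first and then aggregate over every numeric column. The sketch below is not from the thread: it uses the values from the table in the original post, deliberately shuffles the rows, and assumes exactly one T1 and one T2 row per ID. The names shuffled, sorted and num are illustrative.

```r
# Values copied from the table printed in the original post
data <- data.frame(
  ID   = c("AA", "AA", "BB", "BB", "CC", "CC"),
  TIME = c("T1", "T2", "T1", "T2", "T1", "T2"),
  X    = c(9.959540, 12.949522, 9.039486, 10.056392, 8.706590, 10.799296),
  Y    = c(11.140529, 9.896559, 13.469104, 14.632169, 14.939197, 10.747609)
)

# Shuffle the rows to mimic data that does not arrive in order
shuffled <- data[c(4, 1, 6, 2, 5, 3), ]

# Restore the ordering the positional x[2] - x[1] trick depends on
sorted <- shuffled[order(shuffled$ID, shuffled$TIME), ]

# Select every numeric column instead of dropping columns by position
num <- sapply(sorted, is.numeric)

# Within each ID, second row (T2) minus first row (T1)
result <- aggregate(sorted[num], by = list(ID = sorted$ID),
                    FUN = function(x) x[2] - x[1])
print(result)
```

Selecting columns with is.numeric rather than data[,-(1:2)] keeps the call working if further non-numeric columns are added later.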