thr3ads.net - R help - [R] Best way to compute the difference between two levels of a factor ? [Mar 2012]

wphantomfr

2012-Mar-21 08:48 UTC

[R] Best way to compute the difference between two levels of a factor ?

Dear R-help Members,


I am wondering whether anyone can think of an optimal way of computing,
for several numeric variables, the difference between 2 levels of a factor.


To be clear, let's generate a simple data frame with 2 numeric variables
collected for different subjects (ID) at 2 levels of a TIME factor
(time of evaluation):

data=data.frame(ID=c("AA","AA","BB","BB","CC","CC"),TIME=c("T1","T2","T1","T2","T1","T2"),X=rnorm(6,10,2.3),Y=rnorm(6,12,1.9))

   ID TIME         X         Y
1 AA   T1  9.959540 11.140529
2 AA   T2 12.949522  9.896559
3 BB   T1  9.039486 13.469104
4 BB   T2 10.056392 14.632169
5 CC   T1  8.706590 14.939197
6 CC   T2 10.799296 10.747609

I want to compute for each subject and each variable (X, Y, ...) the 
difference between T2 and T1.

Until now I have done this by reshaping my data frame to the wide format
(the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the
difference between successive columns one by one:
data$Xdiff=data$X.T2-data$X.T1
data$Ydiff=data$Y.T2-data$Y.T1
...

but this approach is probably not optimal if the difference has to be
computed for a large number of variables.
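
For reference, the wide-format approach described above can be applied to all variables at once rather than column by column. The following is only a sketch under some assumptions: base R's reshape() is used for the reshaping (the post does not say which tool was used), the seed is added purely for reproducibility, and the names vars, wide and diffs are illustrative.

```r
# Toy data as in the post (the seed is an addition, for reproducibility only)
set.seed(1)
data <- data.frame(ID   = rep(c("AA", "BB", "CC"), each = 2),
                   TIME = rep(c("T1", "T2"), times = 3),
                   X    = rnorm(6, 10, 2.3),
                   Y    = rnorm(6, 12, 1.9))

# Long -> wide: one row per ID, with columns named X.T1, Y.T1, X.T2, Y.T2
wide <- reshape(data, idvar = "ID", timevar = "TIME", direction = "wide")

# Names of the measured variables (everything except ID and TIME)
vars <- setdiff(names(data), c("ID", "TIME"))

# Subtract the block of T1 columns from the block of T2 columns in one step
diffs <- wide[paste0(vars, ".T2")] - wide[paste0(vars, ".T1")]
names(diffs) <- paste0(vars, "diff")

# Append the difference columns (Xdiff, Ydiff, ...) to the wide data frame
wide <- cbind(wide, diffs)
```

Because the columns are selected by name, the order in which reshape() arranges them does not matter.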

How would you handle it?


Thanks in advance

Sylvain Clément

Peter Ehlers

2012-Mar-21 10:03 UTC


[R] Best way to compute the difference between two levels of a factor ?

On 2012-03-21 01:48, wphantomfr wrote:
> [...]
> How will you handle it ?
One way is to use the plyr package:

  library(plyr)
  result <- ddply(data, "ID", summarize,
              DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
              DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])

Peter Ehlers
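
The ddply() call above names each variable explicitly. For a large number of variables, the same idea could probably be written once for all measurement columns by passing ddply() a function instead of summarize. This is a sketch, not part of the original reply; it assumes exactly one T1 row and one T2 row per ID, and the names vars and result are illustrative.

```r
library(plyr)

# Toy data as in the original post (the seed is an addition, for reproducibility only)
set.seed(1)
data <- data.frame(ID   = rep(c("AA", "BB", "CC"), each = 2),
                   TIME = rep(c("T1", "T2"), times = 3),
                   X    = rnorm(6, 10, 2.3),
                   Y    = rnorm(6, 12, 1.9))

# All measurement columns: everything except the grouping columns
vars <- setdiff(names(data), c("ID", "TIME"))

# For each ID, subtract the T1 row from the T2 row across all of vars.
# drop = FALSE keeps a data frame even when vars has a single element.
result <- ddply(data, "ID", function(d) {
  d[d$TIME == "T2", vars, drop = FALSE] - d[d$TIME == "T1", vars, drop = FALSE]
})

# Rename X, Y, ... to DIF.X, DIF.Y, ... to match the naming used above
names(result)[-1] <- paste0("DIF.", vars)
```

The rows are matched on the value of TIME, so this version does not depend on the ordering of the data frame.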

Eik Vettorazzi

2012-Mar-21 11:51 UTC


[R] Best way to compute the difference between two levels of a factor ?

Hi Sylvain,

Assuming your data frame is ordered by ID and TIME, how about this:
aggregate(cbind(X,Y)~ID,data, function(x)(x[2]-x[1]))

#or doing this for all but the first 2 columns of data:
aggregate(data[,-(1:2)],by=list(data$ID), function(x)(x[2]-x[1]))

cheers.
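
Both aggregate() calls above take x[2] - x[1] within each ID, so they depend on the T1 row preceding the T2 row. If that ordering cannot be guaranteed, one possible base-R variant is to split the data by TIME and align the two halves by ID. This is a sketch under the assumption that each ID has at most one row per TIME level; the names shuffled, t1, t2, vars and result are illustrative.

```r
# Toy data as in the original post (the seed is an addition, for reproducibility only)
set.seed(1)
data <- data.frame(ID   = rep(c("AA", "BB", "CC"), each = 2),
                   TIME = rep(c("T1", "T2"), times = 3),
                   X    = rnorm(6, 10, 2.3),
                   Y    = rnorm(6, 12, 1.9))

# Deliberately scramble the row order to show that ordering is not needed
shuffled <- data[sample(nrow(data)), ]

# All measurement columns: everything except ID and TIME
vars <- setdiff(names(shuffled), c("ID", "TIME"))

# Separate the two time points, then line the T2 rows up with the T1 rows by ID
t1 <- shuffled[shuffled$TIME == "T1", ]
t2 <- shuffled[shuffled$TIME == "T2", ]
t2 <- t2[match(t1$ID, t2$ID), ]

# T2 - T1 for every measurement column, one row per ID
result <- data.frame(ID = t1$ID, t2[vars] - t1[vars])
```

An ID with no T2 row ends up with NA differences rather than a silently wrong value.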

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790
