thr3ads.net - R help - [R] data prep question [Jan 2011]


Matthew Strother

2011-Jan-15 21:26 UTC

[R] data prep question

I have a data set with several thousand observations across time, grouped by
subject (example format below)

ID      TIME    OBS
001     2200    23
001     2400    11
001     3200    10
001     4500    22
003     3900    45
003     5605    32
005     1800    56
005     1900    34
005     2300    23
...

I would like to identify the first time for each subject, and then subtract this
value from each subsequent time.  However, the number of observations per
subject varies widely (from 1 to 20), and the intervals between times vary
widely.  Is there a package that can help do this, or a loop that can be set up
to evaluate ID and then calculate the values?  The outcome I would like is
presented below.
ID      TIME    OBS
001     0       23
001     200     11
001     1000    10
001     2300    22
003     0       45
003     1705    32
005     0       56
005     100     34
005     500     23
...

Any help appreciated.

Bill.Venables at csiro.au

2011-Jan-16 11:48 UTC


Here is one way:
> con <- textConnection("
+ ID              TIME    OBS
+ 001             2200    23
+ 001             2400    11
+ 001             3200    10
+ 001             4500    22
+ 003             3900    45
+ 003             5605    32
+ 005             1800    56
+ 005             1900    34
+ 005             2300    23")
> dat <- read.table(con, header = TRUE,
+                   colClasses = c("factor", "numeric", "numeric"))
> closeAllConnections()
>
> tmp <- lapply(split(dat, dat$ID),
+               function(x) within(x, TIME <- TIME - min(TIME)))
> split(dat, dat$ID) <- tmp
> dat
   ID TIME OBS
1 001    0  23
2 001  200  11
3 001 1000  10
4 001 2300  22
5 003    0  45
6 003 1705  32
7 005    0  56
8 005  100  34
9 005  500  23


rstrothe

2011-Jan-16 14:15 UTC


Thanks so much, that did it.

I am new to this, so the help is greatly appreciated.

Matthew


Gabor Grothendieck

2011-Jan-16 15:04 UTC


Since the data frame appears to be already sorted by time within ID, we
can do this:

> transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - x[1]))
  ID TIME OBS
1  1    0  23
2  1  200  11
3  1 1000  10
4  1 2300  22
5  3    0  45
6  3 1705  32
7  5    0  56
8  5  100  34
9  5  500  23

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
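
A note on the x[1] shortcut in the reply above: it relies on the rows already being sorted by time within each ID. If that cannot be guaranteed, one option is to subtract the per-subject minimum instead. The following is only a minimal base R sketch under that assumption; the object names DF, shuffled and result and the new ELAPSED column are made up for illustration, and the data simply repeats the sample from the question.

```r
# Sample data from the question; IDs kept as character so "001" is preserved.
DF <- data.frame(
  ID   = c("001", "001", "001", "001", "003", "003", "005", "005", "005"),
  TIME = c(2200, 2400, 3200, 4500, 3900, 5605, 1800, 1900, 2300),
  OBS  = c(23, 11, 10, 22, 45, 32, 56, 34, 23),
  stringsAsFactors = FALSE
)

# Deliberately scramble the rows to mimic unsorted input (illustrative only).
shuffled <- DF[c(4, 1, 9, 6, 2, 5, 8, 3, 7), ]

# ave() applies FUN within each ID and returns values in the original row order.
# Using min(x) rather than x[1] means the row order no longer matters.
shuffled$ELAPSED <- ave(shuffled$TIME, shuffled$ID, FUN = function(x) x - min(x))

# Sort for display: by subject, then by time.
result <- shuffled[order(shuffled$ID, shuffled$TIME), ]
rownames(result) <- NULL
print(result)
```

The original TIME column is left untouched here and the offsets go into a separate column, which makes it easy to check the result against the raw values.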

Hadley Wickham

2011-Jan-16 17:47 UTC


On Sun, Jan 16, 2011 at 5:48 AM, <Bill.Venables at csiro.au> wrote:
> Here is one way:
>
>> tmp <- lapply(split(dat, dat$ID),
> +               function(x) within(x, TIME <- TIME - min(TIME)))
>> split(dat, dat$ID) <- tmp

Or, in one line with ddply:

library(plyr)
ddply(dat, "ID", transform, TIME = TIME - min(TIME))

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
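
For readers who would rather not depend on plyr, or who want to keep each subject's starting time as its own column, the same idea can be sketched in base R with a lookup table built by tapply(). This is only an illustration under assumed names: first_time, START and ELAPSED do not appear in the thread, and the data is a shortened version of the sample that includes a subject with a single observation (the question says counts range from 1 to 20).

```r
# Shortened sample; subject "003" has only one observation.
dat <- data.frame(
  ID   = c("001", "001", "003", "005", "005", "005"),
  TIME = c(2200, 2400, 3900, 1800, 1900, 2300),
  stringsAsFactors = FALSE
)

# One baseline per subject: a vector of minimum times named by ID.
first_time <- tapply(dat$TIME, dat$ID, min)

# Look up each row's baseline by its ID, then subtract it.
dat$START   <- as.numeric(first_time[dat$ID])
dat$ELAPSED <- dat$TIME - dat$START

print(dat)
```

A subject with a single row simply gets an elapsed time of 0, so no special case is needed. The lookup indexes by name, which is why ID is kept as a character column here.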
