thr3ads.net - R help - [R] the first and last observation for each subject [Jan 2009]

gallon li

2009-Jan-02 09:20 UTC

[R] the first and last observation for each subject

I have the following data

ID x y time
1  10 20 0
1  10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2
.....

x is time invariant, ID is the subject id number, y is changing over time.

I want to find out the difference between the first and last observed y
value for each subject and get a table like

ID x y
1 10 20
2 12 15
3 5 5
......

Is there any easy way to generate the data set?

Petr PIKAL

2009-Jan-02 09:55 UTC

[R] Odp: the first and last observation for each subject

Hi

r-help-bounces at r-project.org wrote on 02.01.2009 10:20:23:
> I want to find out the difference between the first and last observed y
> value for each subject and get a table like

Assuming your data frame is called test:

 sapply(split(test$y, test$ID), function(x) tail(x, 1)-head(x,1))

I am leaving the formatting of the resulting table to you. Hint: aggregate
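
One way to follow the aggregate hint, as a rough base R sketch. The data frame name test matches the one-liner above; the sample values are copied from the original question, and the rows are assumed to be already in time order within each ID.

```r
# Sample data from the original question, stored as 'test'
test <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Last minus first y within each ID/x group (formula interface of aggregate)
res <- aggregate(y ~ ID + x, data = test,
                 FUN = function(v) tail(v, 1) - head(v, 1))

# aggregate() chooses its own row order, so sort by ID for display
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

Grouping by both ID and x keeps x in the output without a separate step, which only makes sense because x is constant within each subject.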

Best regards
Petr

Carlos J. Gil Bellosta

2009-Jan-02 10:00 UTC

[R] the first and last observation for each subject

Hello,

First, order your data by ID and time.

The columns you want in your output dataframe are then

unique(ID),

tapply( x, ID, function( z ) z[ 1 ] )

and

tapply( y, ID, function( z ) z[ length( z ) ] - z[ 1 ] )
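
Put together, the steps above might look like the sketch below. The name mydata and the deliberately shuffled row order are assumptions made for illustration; the values come from the original question.

```r
# Sample data from the original question, entered out of order on purpose
mydata <- data.frame(
  ID   = c(3, 1, 2, 1, 2, 1, 3, 2, 2),
  x    = c(5, 10, 12, 10, 12, 10, 5, 12, 12),
  y    = c(15, 20, 25, 40, 38, 30, 10, 23, 28),
  time = c(2, 0, 1, 2, 3, 1, 0, 0, 2)
)

# Step 1: order by ID, then by time within ID
mydata <- mydata[order(mydata$ID, mydata$time), ]

# Step 2: build one row per subject from the three pieces
out <- with(mydata, data.frame(
  ID = unique(ID),
  x  = as.vector(tapply(x, ID, function(z) z[1])),
  y  = as.vector(tapply(y, ID, function(z) z[length(z)] - z[1]))
))
out
```

Sorting first matters twice: it makes z[1] and z[length(z)] the earliest and latest observations, and it makes unique(ID) come out in the same order as the tapply results.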

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com



Jorge Ivan Velez

2009-Jan-02 10:47 UTC

[R] the first and last observation for each subject

Dear Gallon,
Assuming that your data is called "mydata", something like this should
do the job:

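# Assumes rows are sorted by ID and that each subject has a distinct x value;
# otherwise the three columns will not line up.
newdf<-data.frame(
           ID = unique(mydata$ID),
            x = unique(mydata$x),
            y = with(mydata,tapply(y,ID,function(m) tail(m,1)-head(m,1)))
       )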

newdf

HTH,

Jorge



Gabor Grothendieck

2009-Jan-02 10:56 UTC

[R] the first and last observation for each subject

Try this:
> Lines <- "ID x y time
+ 1  10 20 0
+ 1  10 30 1
+ 1 10 40 2
+ 2 12 23 0
+ 2 12 25 1
+ 2 12 28 2
+ 2 12 38 3
+ 3 5 10 0
+ 3 5 15 2"
> DF <- read.table(textConnection(Lines), header = TRUE)
> aggregate(DF[3], DF[1:2], function(x) tail(x, 1) - head(x, 1))
  ID  x  y
1  3  5  5
2  1 10 20
3  2 12 15



hadley wickham

2009-Jan-02 13:52 UTC

[R] the first and last observation for each subject

One approach is to use the plyr package, as documented at
http://had.co.nz/plyr.  The basic idea is that your problem is easy to
solve if you have a subset for a single subject value:

one <- subset(DF, ID == 1)
with(one, y[length(y)] - y[1])

The difficulty is splitting up the original dataset into subjects,
applying the solution to each piece and then joining all the results
back together.  This is what the plyr package does for you:

library(plyr)

# ddply is for splitting up data frames and combining the results
# into a data frame.  .(ID) says to split up the data frame by the subject
# variable
ddply(DF, .(ID), function(one) with(one, y[length(y)] - y[1]))

# if you want a more informative variable name in the result
# return a named vector:
ddply(DF, .(ID), function(one) c(diff = with(one, y[length(y)] - y[1])))

# plyr takes care of labelling the result for you.
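
For comparison, the split, apply and combine steps described above can be written out by hand in base R. This is only a sketch of the idea, not how plyr is implemented; DF holds the sample data from the original question and the column name diff is borrowed from the named-vector example.

```r
# Sample data from the original question
DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Split: one data frame per subject
pieces <- split(DF, DF$ID)

# Apply: solve the single-subject problem on each piece
diffs <- lapply(pieces, function(one) {
  data.frame(ID = one$ID[1], diff = with(one, y[length(y)] - y[1]))
})

# Combine: stack the one-row results back into a single data frame
res <- do.call(rbind, diffs)
rownames(res) <- NULL
res
```

ddply wraps these three steps in one call and attaches the grouping column for you.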

You don't say why you want to include x, or what to do if x is not
invariant, but here are a couple of options:

# Split up by ID and x
ddply(DF, .(ID, x), function(one) c(diff = with(one, y[length(y)] - y[1])))

# Return the first x value
ddply(DF, .(ID), function(one) {
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})

# Throw an error if x is not unique

ddply(DF, .(ID), function(one) {
  stopifnot(length(unique(one$x)) == 1)
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})

Regards,

Hadley

-- 
http://had.co.nz/

Stavros Macrakis

2009-Jan-02 18:16 UTC

[R] the first and last observation for each subject

I think there's a pretty simple solution here, though probably not the
most efficient:

t(sapply(split(a,a$ID),
    function(q) with(q,c(ID=unique(ID),x=unique(x),y=max(y)-min(y)))))

Using 'unique' instead of min or [[1]] has the advantage that if x is
in fact not time-invariant, this gives an error rather than silently
ignoring inconsistencies.

Trying to package up this idiom into a function leads to:

select <-
  function(df, groupby, selection)
   {
     pf <- parent.frame()
     fields <- substitute(selection)
     t(sapply(split(df,eval(substitute(groupby),df,enclos=pf)),
             function(q) eval(fields,q,enclos=pf)))
   }

which I admit is rather ugly (and does no error-checking), but it does work:
> select(a,ID,list(min(ID),unique(x),max(y)-min(y)))
    [,1] [,2] [,3]
  1 1    10   20
  2 2    12   15
  3 3    5    5

Perhaps some of the more experienced people on the list could show me
how to write this more cleanly.

           -s



William Dunlap

2009-Jan-05 04:24 UTC

[R] the first and last observation for each subject

> [R] the first and last observation for each subject
> hadley wickham h.wickham at gmail.com
> Fri Jan 2 14:52:42 CET 2009
>
> library(plyr)
>
> # ddply is for splitting up data frames and combining the results
> # into a data frame.  .(ID) says to split up the data frame by the
> # subject variable
> ddply(DF, .(ID), function(one) with(one, y[length(y)] - y[1]))
> ...
The above is much quicker than the versions based on aggregate, and it
is easy to understand.  Another approach is more specialized but useful
when you have lots of IDs (e.g., millions) and speed is very important.
It finds the first and last entry for each ID in a vectorized
computation, akin to the one that rle() uses:

f0 <- 
function(DF){
   changes <- DF$ID[-1] != DF$ID[-length(DF$ID)]
   first <- c(TRUE, changes)
   last <- c(changes, TRUE)
   ydiff <- DF$y[last] - DF$y[first]
   DF <- DF[first,]
   DF$y <- ydiff
   DF
}
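
A possible usage sketch for f0. Because it compares each ID with its neighbour, it appears to rely on the rows of each subject being contiguous and in time order, so the example sorts first. The shuffled sample data and the name res are assumptions for illustration; f0 is repeated so the block runs on its own.

```r
# f0 as posted above: flag where the ID changes between neighbouring rows
f0 <- function(DF) {
  changes <- DF$ID[-1] != DF$ID[-length(DF$ID)]
  first <- c(TRUE, changes)   # TRUE on the first row of each ID run
  last <- c(changes, TRUE)    # TRUE on the last row of each ID run
  ydiff <- DF$y[last] - DF$y[first]
  DF <- DF[first, ]
  DF$y <- ydiff
  DF
}

# Sample data from the original question, entered out of order on purpose
DF <- data.frame(
  ID   = c(3, 1, 2, 1, 2, 1, 3, 2, 2),
  x    = c(5, 10, 12, 10, 12, 10, 5, 12, 12),
  y    = c(15, 20, 25, 40, 38, 30, 10, 23, 28),
  time = c(2, 0, 1, 2, 3, 1, 0, 0, 2)
)

# Sort so each subject's rows are contiguous and in time order, then apply f0
DF <- DF[order(DF$ID, DF$time), ]
res <- f0(DF)
res
```

The result keeps every column of each subject's first row, so time is carried along as well; drop it with res[c("ID", "x", "y")] if only the three requested columns are wanted.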


Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
