Hi,

Can someone help with an R question?

I have a data set like:

Name CheckInDate Temp
John 1/3/2014    97
Mary 1/3/2014    98.1
Sam  1/4/2014    97.5
John 1/4/2014    99

I'd like to return a dataset that, for each Name, contains the row with the latest CheckInDate for that person. For the example above it would be:

Name CheckInDate Temp
John 1/4/2014    99
Mary 1/3/2014    98.1
Sam  1/4/2014    97.5

Thank you for your help!

Richard
William Dunlap
2015-Jan-24 00:14 UTC
[R] get latest dates for different people in a dataset
Here is one way. Sort the data.frame, first by Name, then break ties with CheckInDate. Then choose the rows that are the last in a run of identical Name values.

> txt <- "Name CheckInDate Temp
+ John 1/3/2014 97
+ Mary 1/3/2014 98.1
+ Sam 1/4/2014 97.5
+ John 1/4/2014 99"
> d <- read.table(header=TRUE, colClasses=c("character","character","numeric"), text=txt)
> d$CheckInDate <- as.Date(d$CheckInDate, format="%d/%m/%Y")
> isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> dSorted <- d[order(d$Name, d$CheckInDate), ]
> dLatestVisit <- dSorted[isEndOfRun(dSorted$Name), ]
> dLatestVisit
  Name CheckInDate Temp
4 John  2014-04-01 99.0
2 Mary  2014-03-01 98.1
3  Sam  2014-04-01 97.5

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jan 23, 2015 at 3:43 PM, Tan, Richard <RTan at panagora.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
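A self-contained version of the sort-then-take-the-last-row approach, which can be pasted into a fresh R session. It builds the data frame directly instead of going through read.table(), and it assumes, like the reply, that the dates are day/month/year (reading them as month/day/year gives the same ordering for this example). The call to isEndOfRun() on a short vector shows what the helper returns: TRUE at the last element of each run of equal values.

```r
# Example data from the question; dates assumed to be day/month/year.
d <- data.frame(
  Name        = c("John", "Mary", "Sam", "John"),
  CheckInDate = as.Date(c("1/3/2014", "1/3/2014", "1/4/2014", "1/4/2014"),
                        format = "%d/%m/%Y"),
  Temp        = c(97, 98.1, 97.5, 99),
  stringsAsFactors = FALSE
)

# TRUE where the next element differs from the current one, plus TRUE for
# the final element: i.e. the last element of every run of equal values.
isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

print(isEndOfRun(c("John", "John", "Mary", "Sam")))
# [1] FALSE  TRUE  TRUE  TRUE

# Sort by Name, then by date, so each person's latest visit ends their run.
dSorted      <- d[order(d$Name, d$CheckInDate), ]
dLatestVisit <- dSorted[isEndOfRun(dSorted$Name), ]
print(dLatestVisit)
```

The helper only compares neighbours, so it relies entirely on the preceding sort putting each person's rows next to each other with the latest date last.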
> do.call(rbind, lapply(split(data, data$Name), function(x) x[order(x$CheckInDate),][nrow(x),]))
     Name CheckInDate Temp
John John  2014-04-01 99.0
Mary Mary  2014-03-01 98.1
Sam   Sam  2014-04-01 97.5
>

Is this what you are looking for? I hope this helps.

Chel Hee Lee
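Chel Hee Lee's reply sorts every per-person subset only to take its last row. A variant of the same split/lapply/rbind idea (a substitution, not what was posted) picks the row with which.max() instead, which avoids the per-group sort. It assumes the data frame is called d and that CheckInDate has already been converted with as.Date(); the reply itself used a data frame named data.

```r
# Example data from the question; dates assumed to be day/month/year.
d <- data.frame(
  Name        = c("John", "Mary", "Sam", "John"),
  CheckInDate = as.Date(c("1/3/2014", "1/3/2014", "1/4/2014", "1/4/2014"),
                        format = "%d/%m/%Y"),
  Temp        = c(97, 98.1, 97.5, 99),
  stringsAsFactors = FALSE
)

# Split the rows by person, keep each person's row with the latest date,
# then stack the one-row data frames back together.
latest <- do.call(rbind, lapply(split(d, d$Name), function(x) {
  # which.max() returns the position of the first maximum, so if a person
  # has two visits on their latest date, only the earlier-listed one is kept.
  x[which.max(x$CheckInDate), ]
}))
print(latest)
```

Because the pieces come from split(), the result is ordered by Name, as in the output shown in the reply.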
Hi Richard,

You could also do it using the package dplyr:

dta <- data.frame(Name=c('John','Mary','Sam','John'),
                  CheckInDate=as.Date(c('1/3/2014','1/3/2014','1/4/2014','1/4/2014'),
                                      format='%d/%m/%Y'),
                  Temp=c(97,98.1,97.5,99))
library(dplyr)
dta %>% group_by(Name) %>% filter(CheckInDate==max(CheckInDate))

Source: local data frame [3 x 3]
Groups: Name

  Name CheckInDate Temp
1 Mary  2014-03-01 98.1
2  Sam  2014-04-01 97.5
3 John  2014-04-01 99.0
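The filter(CheckInDate == max(CheckInDate)) step keeps every row that ties for a person's latest date, so someone who checked in twice on that date appears twice. The base-R sketch below does the same comparison with ave() instead of dplyr; the fifth row (a second Mary visit on 1/3/2014 with Temp 98.4) is a hypothetical addition to make the tie visible. If exactly one row per person is wanted, dplyr 1.0.0 and later provide slice_max(CheckInDate, n = 1, with_ties = FALSE).

```r
# Question's data plus one hypothetical extra row (Mary, same latest date).
d <- data.frame(
  Name        = c("John", "Mary", "Sam", "John", "Mary"),
  CheckInDate = as.Date(c("1/3/2014", "1/3/2014", "1/4/2014", "1/4/2014", "1/3/2014"),
                        format = "%d/%m/%Y"),
  Temp        = c(97, 98.1, 97.5, 99, 98.4),
  stringsAsFactors = FALSE
)

# For every row, the latest date recorded for that row's Name
# (computed on the numeric day count, then compared the same way).
dayNum        <- as.numeric(d$CheckInDate)
latestForName <- ave(dayNum, d$Name, FUN = max)

# Like filter(CheckInDate == max(CheckInDate)): all tied rows are kept,
# and rows stay in their original order.
dLatest <- d[dayNum == latestForName, ]
print(dLatest)
```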
Göran Broström
2015-Jan-25 09:01 UTC
[R] get latest dates for different people in a dataset
On 2015-01-24 01:14, William Dunlap wrote:
> Here is one way. Sort the data.frame, first by Name then break ties with
> CheckInDate.
> Then choose the rows that are the last in a run of identical Name values.

I do it by sorting by the reverse order of CheckInDate (last date first) within Name, then

> dLatestVisit <- dSorted[!duplicated(dSorted$Name), ]

I guess it is faster, but who knows?

Göran
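Göran's message shows only the last step. A minimal sketch of the whole approach, assuming the same d as in Bill's reply: sort by Name ascending and CheckInDate descending, so the first row of each Name is that person's latest visit, then drop repeated Names with duplicated(). Negating the numeric day count is one way to get a descending date key inside order(); that detail is an assumption, since the message does not say how the reverse sort was done.

```r
# Example data from the question; dates assumed to be day/month/year.
d <- data.frame(
  Name        = c("John", "Mary", "Sam", "John"),
  CheckInDate = as.Date(c("1/3/2014", "1/3/2014", "1/4/2014", "1/4/2014"),
                        format = "%d/%m/%Y"),
  Temp        = c(97, 98.1, 97.5, 99),
  stringsAsFactors = FALSE
)

# Name ascending, then date descending (negated day count), so each
# person's latest visit is the first of their rows.
dSorted <- d[order(d$Name, -as.numeric(d$CheckInDate)), ]

# duplicated() is FALSE only for the first occurrence of each Name.
dLatestVisit <- dSorted[!duplicated(dSorted$Name), ]
print(dLatestVisit)
```

In R 3.3.0 and later, order(d$Name, d$CheckInDate, decreasing = c(FALSE, TRUE), method = "radix") is another way to get the mixed sort direction without negating the dates.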
Thank you!

-----Original Message-----
From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
Sent: Friday, January 23, 2015 8:09 PM
To: Tan, Richard; 'r-help at R-project.org'
Subject: Re: [R] get latest dates for different people in a dataset
Thank you!

From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Friday, January 23, 2015 7:14 PM
To: Tan, Richard
Cc: r-help at R-project.org
Subject: Re: [R] get latest dates for different people in a dataset