Kondamani, Arjun (GMI - NY Corporate Bonds)
2008-Jan-10 22:36 UTC
[R] Extracting last time value
I have a dataframe as follows: Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 For every date, I want to extract the row that has the greatest time. Therefore, ending up like: 20110620 11:39:00 9 20110621 11:41:00 8 20110622 14:07:00 6 I am using for loops (for every date, find largest time value) to do this determination, but I am sure there is a simpler way in "R". Has anyone run into this before? -------------------------------------------------------- This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. -------------------------------------------------------- [[alternative HTML version deleted]]
This should do it:> x <- read.table(textConnection("Date time value+ 20110620 11:18:00 7 + 20110620 11:39:00 9 + 20110621 11:41:00 8 + 20110621 11:40:00 6 + 20110622 14:05:00 8 + 20110622 14:06:00 6"), header=TRUE, as.is=TRUE)> > x$time <- as.POSIXct(strptime(x$time, "%H:%M:%S")) > do.call(rbind, lapply(split(x, x$Date), function(.day){+ .day[which.max(.day$time),] + })) Date time value 20110620 20110620 2008-01-10 11:39:00 9 20110621 20110621 2008-01-10 11:41:00 8 20110622 20110622 2008-01-10 14:06:00 6>On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) <arjun_kondamani at ml.com> wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6 > > I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
On Thu, 10 Jan 2008, Kondamani, Arjun (GMI - NY Corporate Bonds) wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6You could use the time series functionality in "zoo" for this: ## data and package x <- read.table(textConnection("Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6"), header = TRUE, as.is = TRUE) library("zoo") ## turn data frame into a time series z <- zoo(x$value, as.POSIXct(strptime(paste(x$Date, x$time), "%Y%m%d %H:%M:%S"))) ## aggregate aggregate(z, as.Date, tail, 1) ## or if you want to retain the associated last time ix <- zoo(1:length(z), time(z)) last.ix <- aggregate(ix, as.Date, tail, 1) z[as.numeric(last.ix)] hth, Z> I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
Here is a slight variation of ones you have already received. This
also uses zoo.
In the DF<- line we read it in as a data frame. In the z<- line we
convert
it to zoo using chron date/times (see R News 4/1 for more).
We calculate the positions of duplicate dates and in the last line we
extract the rows corresponding to non-duplicates.
Lines <- "Date time value
20110620 11:18:00 7
20110620 11:39:00 9
20110621 11:41:00 8
20110621 11:40:00 6
20110622 14:05:00 8
20110622 14:06:00 6
"
library(zoo)
library(chron)
DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date),
"%Y%m%d"))), DF$time)
isdup <- duplicated(dates(time(z)), fromLast = TRUE)
z[!isdup,]
For more on zoo see:
library(zoo)
vignette("zoo")
vignette("zoo-quickref")
On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds)
<arjun_kondamani at ml.com> wrote:> I have a dataframe as follows:
>
> Date time value
> 20110620 11:18:00 7
> 20110620 11:39:00 9
> 20110621 11:41:00 8
> 20110621 11:40:00 6
> 20110622 14:05:00 8
> 20110622 14:06:00 6
>
> For every date, I want to extract the row that has the greatest time.
> Therefore, ending up like:
> 20110620 11:39:00 9
> 20110621 11:41:00 8
> 20110622 14:07:00 6
>
> I am using for loops (for every date, find largest time value) to do
> this determination, but I am sure there is a simpler way in "R".
Has
> anyone run into this before?
> --------------------------------------------------------
>
> This message w/attachments (message) may be privileged, confidential or
proprietary, and if you are not an intended recipient, please notify the sender,
do not use or share it and delete it. Unless specifically indicated, this
message is not an offer to sell or a solicitation of any investment products or
other financial product or service, an official confirmation of any transaction,
or an official statement of Merrill Lynch. Subject to applicable law, Merrill
Lynch may monitor, review and retain e-communications (EC) traveling through its
networks/systems. The laws of the country of each sender/recipient may impact
the handling of EC, and EC may be archived, supervised and produced in countries
other than the country in which you are located. This message cannot be
guaranteed to be secure or error-free. This message is subject to terms
available at the following link: http://www.ml.com/e-communications_terms/. By
messaging with Merrill Lynch you consent to the foregoing.
> --------------------------------------------------------
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Kondamani, Arjun (GMI - NY Corporate Bonds)
2008-Jan-11 16:15 UTC
[R] Extracting last time value
Thank you, Gabor, Jim, and Achim!! All of your suggestions/pointers were
extremely useful.
-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Thursday, January 10, 2008 7:46 PM
To: Kondamani, Arjun (GMI - NY Corporate Bonds)
Cc: r-help at r-project.org
Subject: Re: [R] Extracting last time value
Here is a slight variation of ones you have already received. This also
uses zoo.
In the DF<- line we read it in as a data frame. In the z<- line we
convert it to zoo using chron date/times (see R News 4/1 for more). We
calculate the positions of duplicate dates and in the last line we
extract the rows corresponding to non-duplicates.
Lines <- "Date time value
20110620 11:18:00 7
20110620 11:39:00 9
20110621 11:41:00 8
20110621 11:40:00 6
20110622 14:05:00 8
20110622 14:06:00 6
"
library(zoo)
library(chron)
DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) z
<- zoo(DF$value, chron(unclass(as.Date(format(DF$Date),
"%Y%m%d"))),
DF$time)
isdup <- duplicated(dates(time(z)), fromLast = TRUE)
z[!isdup,]
For more on zoo see:
library(zoo)
vignette("zoo")
vignette("zoo-quickref")
On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds)
<arjun_kondamani at ml.com> wrote:> I have a dataframe as follows:
>
> Date time value
> 20110620 11:18:00 7
> 20110620 11:39:00 9
> 20110621 11:41:00 8
> 20110621 11:40:00 6
> 20110622 14:05:00 8
> 20110622 14:06:00 6
>
> For every date, I want to extract the row that has the greatest time.
> Therefore, ending up like:
> 20110620 11:39:00 9
> 20110621 11:41:00 8
> 20110622 14:07:00 6
>
> I am using for loops (for every date, find largest time value) to do
> this determination, but I am sure there is a simpler way in "R".
Has
> anyone run into this before?
> --------------------------------------------------------
>
> This message w/attachments (message) may be privileged, confidential
> or proprietary, and if you are not an intended recipient, please
notify the sender, do not use or share it and delete it. Unless
specifically indicated, this message is not an offer to sell or a
solicitation of any investment products or other financial product or
service, an official confirmation of any transaction, or an official
statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may
monitor, review and retain e-communications (EC) traveling through its
networks/systems. The laws of the country of each sender/recipient may
impact the handling of EC, and EC may be archived, supervised and
produced in countries other than the country in which you are located.
This message cannot be guaranteed to be secure or error-free. This
message is subject to terms available at the following link:
http://www.ml.com/e-communications_terms/. By messaging with Merrill
Lynch you consent to the foregoing.> --------------------------------------------------------
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--------------------------------------------------------
This message w/attachments (message) may be privileged, confidential or
proprietary, and if you are not an intended recipient, please notify the sender,
do not use or share it and delete it. Unless specifically indicated, this
message is not an offer to sell or a solicitation of any investment products or
other financial product or service, an official confirmation of any transaction,
or an official statement of Merrill Lynch. Subject to applicable law, Merrill
Lynch may monitor, review and retain e-communications (EC) traveling through its
networks/systems. The laws of the country of each sender/recipient may impact
the handling of EC, and EC may be archived, supervised and produced in countries
other than the country in which you are located. This message cannot be
guaranteed to be secure or error-free. This message is subject to terms
available at the following link: http://www.ml.com/e-communications_terms/. By
messaging with Merrill Lynch you consent to the foregoing.
There were errors in the code I posted. Here it is again. See my prior post in this thread for comments. Lines <- "Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 " library(zoo) library(chron) DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d")), DF$time)) isdup <- duplicated(dates(time(z)), fromLast = TRUE) z[!isdup] and here is what it looks like when you run it:> Lines <- "Date time value+ 20110620 11:18:00 7 + 20110620 11:39:00 9 + 20110621 11:41:00 8 + 20110621 11:40:00 6 + 20110622 14:05:00 8 + 20110622 14:06:00 6 + "> > library(zoo) > library(chron) > DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) > z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d")), DF$time)) > isdup <- duplicated(dates(time(z)), fromLast = TRUE) > z[!isdup](06/20/11 11:39:00) (06/21/11 11:41:00) (06/22/11 14:06:00) 9 8 6 On Jan 10, 2008 7:46 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:> Here is a slight variation of ones you have already received. This > also uses zoo. > > In the DF<- line we read it in as a data frame. In the z<- line we convert > it to zoo using chron date/times (see R News 4/1 for more). > We calculate the positions of duplicate dates and in the last line we > extract the rows corresponding to non-duplicates. > > Lines <- "Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > " > > library(zoo) > library(chron) > DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) > z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d"))), DF$time) > > isdup <- duplicated(dates(time(z)), fromLast = TRUE) > z[!isdup,] > > For more on zoo see: > > library(zoo) > vignette("zoo") > vignette("zoo-quickref") > > > On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) > <arjun_kondamani at ml.com> wrote: > > > I have a dataframe as follows: > > > > Date time value > > 20110620 11:18:00 7 > > 20110620 11:39:00 9 > > 20110621 11:41:00 8 > > 20110621 11:40:00 6 > > 20110622 14:05:00 8 > > 20110622 14:06:00 6 > > > > For every date, I want to extract the row that has the greatest time. > > Therefore, ending up like: > > 20110620 11:39:00 9 > > 20110621 11:41:00 8 > > 20110622 14:07:00 6 > > > > I am using for loops (for every date, find largest time value) to do > > this determination, but I am sure there is a simpler way in "R". Has > > anyone run into this before? > > -------------------------------------------------------- > > > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > > -------------------------------------------------------- > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > >