Kondamani, Arjun (GMI - NY Corporate Bonds)
2008-Jan-10 22:36 UTC
[R] Extracting last time value
I have a dataframe as follows: Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 For every date, I want to extract the row that has the greatest time. Therefore, ending up like: 20110620 11:39:00 9 20110621 11:41:00 8 20110622 14:07:00 6 I am using for loops (for every date, find largest time value) to do this determination, but I am sure there is a simpler way in "R". Has anyone run into this before? -------------------------------------------------------- This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. -------------------------------------------------------- [[alternative HTML version deleted]]
This should do it:> x <- read.table(textConnection("Date time value+ 20110620 11:18:00 7 + 20110620 11:39:00 9 + 20110621 11:41:00 8 + 20110621 11:40:00 6 + 20110622 14:05:00 8 + 20110622 14:06:00 6"), header=TRUE, as.is=TRUE)> > x$time <- as.POSIXct(strptime(x$time, "%H:%M:%S")) > do.call(rbind, lapply(split(x, x$Date), function(.day){+ .day[which.max(.day$time),] + })) Date time value 20110620 20110620 2008-01-10 11:39:00 9 20110621 20110621 2008-01-10 11:41:00 8 20110622 20110622 2008-01-10 14:06:00 6>On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) <arjun_kondamani at ml.com> wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6 > > I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
On Thu, 10 Jan 2008, Kondamani, Arjun (GMI - NY Corporate Bonds) wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6You could use the time series functionality in "zoo" for this: ## data and package x <- read.table(textConnection("Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6"), header = TRUE, as.is = TRUE) library("zoo") ## turn data frame into a time series z <- zoo(x$value, as.POSIXct(strptime(paste(x$Date, x$time), "%Y%m%d %H:%M:%S"))) ## aggregate aggregate(z, as.Date, tail, 1) ## or if you want to retain the associated last time ix <- zoo(1:length(z), time(z)) last.ix <- aggregate(ix, as.Date, tail, 1) z[as.numeric(last.ix)] hth, Z> I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
Here is a slight variation of ones you have already received. This also uses zoo. In the DF<- line we read it in as a data frame. In the z<- line we convert it to zoo using chron date/times (see R News 4/1 for more). We calculate the positions of duplicate dates and in the last line we extract the rows corresponding to non-duplicates. Lines <- "Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 " library(zoo) library(chron) DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d"))), DF$time) isdup <- duplicated(dates(time(z)), fromLast = TRUE) z[!isdup,] For more on zoo see: library(zoo) vignette("zoo") vignette("zoo-quickref") On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) <arjun_kondamani at ml.com> wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6 > > I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Kondamani, Arjun (GMI - NY Corporate Bonds)
2008-Jan-11 16:15 UTC
[R] Extracting last time value
Thank you, Gabor, Jim, and Achim!! All of your suggestions/pointers were extremely useful. -----Original Message----- From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] Sent: Thursday, January 10, 2008 7:46 PM To: Kondamani, Arjun (GMI - NY Corporate Bonds) Cc: r-help at r-project.org Subject: Re: [R] Extracting last time value Here is a slight variation of ones you have already received. This also uses zoo. In the DF<- line we read it in as a data frame. In the z<- line we convert it to zoo using chron date/times (see R News 4/1 for more). We calculate the positions of duplicate dates and in the last line we extract the rows corresponding to non-duplicates. Lines <- "Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 " library(zoo) library(chron) DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d"))), DF$time) isdup <- duplicated(dates(time(z)), fromLast = TRUE) z[!isdup,] For more on zoo see: library(zoo) vignette("zoo") vignette("zoo-quickref") On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) <arjun_kondamani at ml.com> wrote:> I have a dataframe as follows: > > Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > > For every date, I want to extract the row that has the greatest time. > Therefore, ending up like: > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110622 14:07:00 6 > > I am using for loops (for every date, find largest time value) to do > this determination, but I am sure there is a simpler way in "R". Has > anyone run into this before? > -------------------------------------------------------- > > This message w/attachments (message) may be privileged, confidential > or proprietary, and if you are not an intended recipient, pleasenotify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing.> -------------------------------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-------------------------------------------------------- This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing.
There were errors in the code I posted. Here it is again. See my prior post in this thread for comments. Lines <- "Date time value 20110620 11:18:00 7 20110620 11:39:00 9 20110621 11:41:00 8 20110621 11:40:00 6 20110622 14:05:00 8 20110622 14:06:00 6 " library(zoo) library(chron) DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d")), DF$time)) isdup <- duplicated(dates(time(z)), fromLast = TRUE) z[!isdup] and here is what it looks like when you run it:> Lines <- "Date time value+ 20110620 11:18:00 7 + 20110620 11:39:00 9 + 20110621 11:41:00 8 + 20110621 11:40:00 6 + 20110622 14:05:00 8 + 20110622 14:06:00 6 + "> > library(zoo) > library(chron) > DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) > z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d")), DF$time)) > isdup <- duplicated(dates(time(z)), fromLast = TRUE) > z[!isdup](06/20/11 11:39:00) (06/21/11 11:41:00) (06/22/11 14:06:00) 9 8 6 On Jan 10, 2008 7:46 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:> Here is a slight variation of ones you have already received. This > also uses zoo. > > In the DF<- line we read it in as a data frame. In the z<- line we convert > it to zoo using chron date/times (see R News 4/1 for more). > We calculate the positions of duplicate dates and in the last line we > extract the rows corresponding to non-duplicates. > > Lines <- "Date time value > 20110620 11:18:00 7 > 20110620 11:39:00 9 > 20110621 11:41:00 8 > 20110621 11:40:00 6 > 20110622 14:05:00 8 > 20110622 14:06:00 6 > " > > library(zoo) > library(chron) > DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE) > z <- zoo(DF$value, chron(unclass(as.Date(format(DF$Date), "%Y%m%d"))), DF$time) > > isdup <- duplicated(dates(time(z)), fromLast = TRUE) > z[!isdup,] > > For more on zoo see: > > library(zoo) > vignette("zoo") > vignette("zoo-quickref") > > > On Jan 10, 2008 5:36 PM, Kondamani, Arjun (GMI - NY Corporate Bonds) > <arjun_kondamani at ml.com> wrote: > > > I have a dataframe as follows: > > > > Date time value > > 20110620 11:18:00 7 > > 20110620 11:39:00 9 > > 20110621 11:41:00 8 > > 20110621 11:40:00 6 > > 20110622 14:05:00 8 > > 20110622 14:06:00 6 > > > > For every date, I want to extract the row that has the greatest time. > > Therefore, ending up like: > > 20110620 11:39:00 9 > > 20110621 11:41:00 8 > > 20110622 14:07:00 6 > > > > I am using for loops (for every date, find largest time value) to do > > this determination, but I am sure there is a simpler way in "R". Has > > anyone run into this before? > > -------------------------------------------------------- > > > > This message w/attachments (message) may be privileged, confidential or proprietary, and if you are not an intended recipient, please notify the sender, do not use or share it and delete it. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Merrill Lynch. Subject to applicable law, Merrill Lynch may monitor, review and retain e-communications (EC) traveling through its networks/systems. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or error-free. This message is subject to terms available at the following link: http://www.ml.com/e-communications_terms/. By messaging with Merrill Lynch you consent to the foregoing. > > -------------------------------------------------------- > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > >