Hi all,

I have a simple data frame: the first column is a list of dates (in "%Y-%m-%d" format) and the second column is an observation on that particular date. There might not be observations every day. Let's just say there are no observations on Saturdays and Sundays. Now I want to select the first observation of every month into a list. Is there an easy way to do that?

Date        Observation
----        -----------
2007-05-23  20
2007-05-22  30
2007-05-21  10

2007-04-10  50
2007-04-09  40
2007-04-07  30

2007-03-05  10

The result I need is the data frame

2007-05-21  10
2007-04-07  30
2007-03-05  10

or I am equally happy with just the vector c(10, 30, 10).

I am new to R, and after going through the manuals and the documentation I could gather, I have come up with a convoluted way of doing it.

1) I first get the Date column into a vector. (I am artificially reproducing this vector below and calling it A.)

> A <- c(as.Date("2007-05-23"), as.Date("2007-05-22"), as.Date("2007-05-21"),
+        as.Date("2007-04-10"), as.Date("2007-04-09"), as.Date("2007-04-07"),
+        as.Date("2007-03-05"))
> A
[1] "2007-05-23" "2007-05-22" "2007-05-21" "2007-04-10" "2007-04-09"
[6] "2007-04-07" "2007-03-05"

2) Use cut with breaks falling on the months.

> B <- cut(A, breaks="month")
> B
[1] 2007-05-01 2007-05-01 2007-05-01 2007-04-01 2007-04-01 2007-04-01
[7] 2007-03-01
Levels: 2007-03-01 2007-04-01 2007-05-01

3) Then split to get a list of vectors grouped by the month boundary.

> C <- split(A, B)
> C
$`2007-03-01`
[1] "2007-03-05"

$`2007-04-01`
[1] "2007-04-10" "2007-04-09" "2007-04-07"

$`2007-05-01`
[1] "2007-05-23" "2007-05-22" "2007-05-21"

4) In a for loop I go through the elements of the list (the elements are vectors of dates). For each vector I find the minimum and concatenate it to a final vector D.

> D <- numeric()
> for (i in 1:length(C)) { D <- c(D, min(C[[i]])) }
> class(D) <- "Date"
> D
[1] "2007-03-05" "2007-04-07" "2007-05-21"

Next, with D, I go back and find the positions of the elements of D within A, and then use the result as an index vector into the vector of observations (which is not shown here).

I feel sure I am doing it the stupid way (or the procedural way). Is there a more declarative way of doing it? Any pointers will be greatly appreciated!

Thanks a lot in advance,

Albert Pang
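For reference, the last step described above (finding the positions and indexing the observations) is not shown in the post. A minimal sketch of how steps 2 to 4 plus that lookup might be written without the explicit loop is given below. The vector name obs and the helper names idx and first_obs are assumptions made for illustration; the observation values are copied from the table in the question.

```r
# Dates and observations from the example table.
# 'obs' is a hypothetical name; the original post does not show this vector.
A   <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                 "2007-04-09", "2007-04-07", "2007-03-05"))
obs <- c(20, 30, 10, 50, 40, 30, 10)

# Month factor, as in step 2
B <- cut(A, breaks = "month")

# For each month, keep the position (within A) of the earliest date.
# drop = TRUE skips months that fall inside the range but have no rows.
idx <- vapply(split(seq_along(A), B, drop = TRUE),
              function(i) i[which.min(A[i])],
              integer(1))

# Index the observations directly with those positions
first_obs <- obs[idx]
first_obs   # 10 30 10 (March, April, May)
```

Splitting the positions rather than the dates themselves means the separate lookup of D within A is no longer needed.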
Here is one way of doing it:

> x <- "Date Observation
+ 2007-05-23 20
+ 2007-05-22 30
+ 2007-05-21 10
+ 2007-04-10 50
+ 2007-04-09 40
+ 2007-04-07 30
+ 2007-03-05 10"
> x <- read.table(textConnection(x), header=TRUE,
+     colClasses=c("POSIXct", "integer"))
> # split the data by year-month and find the minimum day
> minDay <- lapply(split(x, cut(x$Date, breaks='month')), function(month){
+     month[which.min(month$Date),]  # minimum date in the month
+ })
> do.call('rbind', minDay)  # put it back together in a dataframe
                 Date Observation
2007-03-01 2007-03-05          10
2007-04-01 2007-04-07          30
2007-05-01 2007-05-21          10
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
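As a point of comparison with the split/lapply approach above, here is a hedged sketch of a different base R technique: sort by date, then keep the first row seen for each year-month using duplicated(). This is not the method from the reply, only an alternative. The data frame is rebuilt inline with a Date column instead of POSIXct, and the name first_rows is an assumption for illustration.

```r
# Example data from the thread, built directly as a data frame
x <- data.frame(
  Date = as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                   "2007-04-09", "2007-04-07", "2007-03-05")),
  Observation = c(20L, 30L, 10L, 50L, 40L, 30L, 10L)
)

# Sort ascending so the earliest date of each month comes first
x <- x[order(x$Date), ]

# format(..., "%Y-%m") gives a year-month key; duplicated() is FALSE only
# for the first row carrying each key, so negating it keeps those rows
first_rows <- x[!duplicated(format(x$Date, "%Y-%m")), ]
first_rows
```

The result keeps the original row names (7, 6, 3 here), which also gives the positions in the unsorted data if those are needed.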
Gabor Grothendieck
2007-May-27 14:48 UTC
[R] Looking for the first observation within the month
Use the zoo package to represent data like this. Here time(z) is a vector of the dates and as.yearmon(time(z)) is the year/month of each date. With FUN = head1, ave picks out the first date in each month, and aggregate then aggregates over all values in the same year/month, choosing the first one.

Lines <- "Date Observation
2007-05-23 20
2007-05-22 30
2007-05-21 10
2007-04-10 50
2007-04-09 40
2007-04-07 30
2007-03-05 10
"

library(zoo)

# z <- read.zoo("myfile.dat", header = TRUE)
z <- read.zoo(textConnection(Lines), header = TRUE)

head1 <- function(x, n = 1) head(x, n)
aggregate(z, ave(time(z), as.yearmon(time(z)), FUN = head1), head1)

For more on zoo try:

library(zoo)
vignette("zoo")

and also read the Help Desk article in R News 4/1 about dates.
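To make the role of ave in the zoo one-liner more concrete, the following base R sketch mimics the two steps without zoo. It is an illustration under stated assumptions, not the zoo implementation: format(d, "%Y-%m") stands in for as.yearmon, tapply stands in for aggregate, the dates are sorted by hand (a zoo object keeps its index ordered), and the names d, obs, key and first are hypothetical.

```r
# Example data, sorted ascending by date as a zoo index would be
d   <- as.Date(c("2007-03-05", "2007-04-07", "2007-04-09", "2007-04-10",
                 "2007-05-21", "2007-05-22", "2007-05-23"))
obs <- c(10, 30, 40, 50, 10, 30, 20)

head1 <- function(x, n = 1) head(x, n)

# Step 1: ave() replaces every date by the first date of its month.
# The day counts are used as plain numbers and converted back afterwards.
key <- as.Date(ave(as.numeric(d), format(d, "%Y-%m"), FUN = head1),
               origin = "1970-01-01")
format(key)
# "2007-03-05" "2007-04-07" "2007-04-07" "2007-04-07"
# "2007-05-21" "2007-05-21" "2007-05-21"

# Step 2: group the observations by that key and take the first in each group
first <- tapply(obs, format(key), head1)
first
```

Because every row in a month shares the same key (the month's first date), the grouped result is labelled by the actual first date rather than by the month.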