Dear R users,

Given this data:

    x <- seq(1,100,1)
    dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
    dfx <- data.frame(dx)

Now to play around for example:

    subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))

Ok. Now for some reason I want to extract the datapoints between hours 10:00:00 and 14:00:00, so I thought well:

    subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
    Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

Well that did not work. But why does the following work?

    14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10

Is there something I miss about subset()? Or is there even another way of aggregating over an hourly time interval in a nicer way?

Best Regards,
Michael Bach

On Mar 24, 2011, at 9:29 AM, Michael Bach wrote:

> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?

I'm not sure what problem is occurring with your method. The way I would have done it worked. The findInterval function also seemed to allow classification by intervals of 3600 seconds:

    > subset(dfx, dx > as.POSIXct("2007-06-01 10:00:00") & dx < as.POSIXct("2007-06-01 14:00:00"))
                        dx
    41 2007-06-01 10:15:00
    42 2007-06-01 10:30:00
    43 2007-06-01 10:45:00
    44 2007-06-01 11:00:00
    45 2007-06-01 11:15:00
    46 2007-06-01 11:30:00
    47 2007-06-01 11:45:00
    48 2007-06-01 12:00:00
    49 2007-06-01 12:15:00
    50 2007-06-01 12:30:00
    51 2007-06-01 12:45:00
    52 2007-06-01 13:00:00
    53 2007-06-01 13:15:00
    54 2007-06-01 13:30:00
    55 2007-06-01 13:45:00

    > findInterval(dfx$dx, c( as.numeric(range(dfx$dx)[1] +(1:24)*3600) ) )
     [1]  0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5  6  6  6  6  7
    [30]  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14
    [59] 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 18 19 19 19 19 20 20 20 20 21 21 21
    [88] 21 22 22 22 22 23 23 23 23 24 24 24 24

David Winsemius, MD
West Hartford, CT
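
As a rough sketch of how the findInterval() classification above could be turned into a count per hourly bin: the tz = "UTC" argument is an assumption added here (it is not in the original post) so that the result does not depend on the local time zone, and the names breaks, bin and counts are purely illustrative.

```r
x <- seq(1, 100, 1)
# tz = "UTC" is an assumption for reproducibility, not part of the original data
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# Hourly break points, the first one hour after the earliest observation
breaks <- as.numeric(min(dfx$dx)) + (1:24) * 3600

# Bin index: 0 for observations before the first break, 1 for the next hour, ...
bin <- findInterval(as.numeric(dfx$dx), breaks)

# Number of observations that fall into each hourly bin
counts <- table(bin)
print(counts)
```

The bin index could equally be stored as a column of dfx and used as a grouping variable.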

On Thu, Mar 24, 2011 at 1:29 PM, Michael Bach <phaebz at gmail.com> wrote:

> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)

did you mean

    subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") & 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)   # "&" instead of ","

I didn't completely "parse" the meaning of these conditions but the way you have it, there are three arguments to subset, first two as expected but the third one (select) would be for selecting columns and you have just one in your data frame. (?subset)
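
To make the argument matching concrete, here is a small sketch. It assumes tz = "UTC" for the timestamps (an addition for reproducibility), and arg_names, cutoff and res are illustrative names only. It keeps the poster's original conditions unchanged, which in effect select hours before 10 after the cutoff.

```r
x <- seq(1, 100, 1)
# tz = "UTC" is an assumption so the hours do not depend on the local time zone
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# The data frame method takes x, subset, select, ... in that order, so a
# third unnamed argument is matched to 'select' (column selection)
arg_names <- names(formals(getS3method("subset", "data.frame")))
print(arg_names)

# Joining the conditions with "&" keeps all of them in the row filter
cutoff <- as.POSIXct("2007-06-01 16:00:00", tz = "UTC")
res <- subset(dfx, dx > cutoff & 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
print(res)
```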

On 03/24/2011 06:29 AM, Michael Bach wrote:

> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10

It does work. Try it.

> Is there something I miss about subset()?

You have given three arguments to subset. Your third argument is a poor choice for selecting columns. Try:

    subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") & 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)

or better yet,

    tmp <- as.POSIXlt(dfx$dx)
    subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") & 14 > tmp$hour & tmp$hour < 10)

since as.POSIXlt is a rather heavyweight operation.

> Or is there even another way of
> aggregating over an hourly time interval in a nicer way?

This is not aggregation. This is selection. It is only when you summarize the selected data that you are aggregating. Normally, the term aggregating is applied when you use a grouping column and collapse many values with the same characteristics into one value per set of characteristics.

For example, using base functions,

    dfx$interval <- cut(tmp$hour, c(-1,10,14,24))
    aggregate(dfx$dx, list(Interval=dfx$interval), length)

or

    aggregate(dfx$dx, list(Hour=tmp$hour), length)

but I find that the plyr package is much more user-friendly than aggregate.
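
The distinction between selection and aggregation can be sketched side by side. This is only an illustration under the assumption that the timestamps are in UTC (tz = "UTC" is added here), and it reads "between 10:00 and 14:00" as hours 10 through 13; hour_of_day, selected and per_hour are illustrative names.

```r
x <- seq(1, 100, 1)
# tz = "UTC" is an assumption so the hours do not depend on the local time zone
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# Convert once and reuse, since as.POSIXlt is comparatively expensive
hour_of_day <- as.POSIXlt(dfx$dx)$hour

# Selection: keep the rows whose hour of day is 10, 11, 12 or 13
selected <- dfx[hour_of_day >= 10 & hour_of_day < 14, , drop = FALSE]

# Aggregation: collapse all rows into one count per hour of day
per_hour <- aggregate(dfx$dx, list(Hour = hour_of_day), length)

print(nrow(selected))
print(per_hour)
```

The selection returns rows of the original data, while the aggregation returns one row per group with the count in column x.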

On Thu, Mar 24, 2011 at 8:29 AM, Michael Bach <phaebz at gmail.com> wrote:

> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

As others have noted, you used a "," instead of "&". I wanted to point out that this is a little easier to express with the lubridate package:

    subset(dfx, dx > ymd("2007-06-01") & hour(dx) > 14 & hour(dx) < 10)

but I presume you meant:

    subset(dfx, dx > ymd("2007-06-01") & hour(dx) > 10 & hour(dx) < 14)

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

On Thu, Mar 24, 2011 at 9:29 AM, Michael Bach <phaebz at gmail.com> wrote:

> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?

Here is yet another solution:

    hr <- function(x) as.numeric(format(x, "%H"))
    subset(dfx, as.Date(dx) > "2007-06-01" & hr(dx) > 10 & hr(dx) < 14)

Although that seems to be what you asked for, perhaps you really meant to include 10:00 and 14:00. In that case, since the timestamps fall on whole minutes (every 15 minutes here), try this:

    hhmm <- function(x) as.numeric(format(x, "%H%M"))
    subset(dfx, as.Date(dx) > "2007-06-01" & hhmm(dx) >= 1000 & hhmm(dx) <= 1400)

Note that the above calculates days and hours relative to the current time zone. Since your data seems not to have time zones, you may be better off using chron rather than POSIXct to avoid potential time zone errors. In that case see R News 4/1 and its references, and note the availability of the hours() and related functions.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
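
The time zone caveat can be illustrated with a short sketch. The helper hr_tz below is a hypothetical variant of hr() with an explicit tz argument, added only for this illustration, and the choice of "America/New_York" as a second zone is arbitrary; it assumes the system's time zone database knows that name.

```r
# One instant: noon UTC on the first day of the example data
t0 <- as.POSIXct("2007-06-01 12:00:00", tz = "UTC")

# Hypothetical variant of hr() that formats in an explicitly named time zone
hr_tz <- function(x, tz) as.numeric(format(x, "%H", tz = tz))

# The same instant yields different hours depending on the zone used
hr_utc <- hr_tz(t0, "UTC")
hr_ny <- hr_tz(t0, "America/New_York")
print(c(UTC = hr_utc, New_York = hr_ny))
```

Fixing the zone explicitly, as here, is one way to keep an hour-based filter from silently changing when the code runs on a machine in a different zone.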