thr3ads.net - R help - [R] subset and as.POSIXct / as.POSIXlt oddness [Mar 2011]

Michael Bach

2011-Mar-24 13:29 UTC

[R] subset and as.POSIXct / as.POSIXlt oddness

Dear R users,

Given this data:

x <- seq(1,100,1)
dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
dfx <- data.frame(dx)

Now to play around for example:

subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))

Ok. Now for some reason I want to extract the datapoints between hours
10:00:00 and 14:00:00, so I thought well:

subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"),
       14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

Well that did not work. But why does the following work?

14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10

Is there something I miss about subset()? Or is there even another way of
aggregating over an hourly time interval in a nicer way?

Best Regards,
Michael Bach


David Winsemius

2011-Mar-24 13:44 UTC

On Mar 24, 2011, at 9:29 AM, Michael Bach wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 >  
> as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10
>
> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?

I'm not sure what problem is occurring with your method. The way I
would have done it worked. The findInterval function also seemed to
allow classification by intervals of 3600 seconds:

 > subset(dfx, dx > as.POSIXct("2007-06-01 10:00:00") &
          dx < as.POSIXct("2007-06-01 14:00:00"))
                     dx
41 2007-06-01 10:15:00
42 2007-06-01 10:30:00
43 2007-06-01 10:45:00
44 2007-06-01 11:00:00
45 2007-06-01 11:15:00
46 2007-06-01 11:30:00
47 2007-06-01 11:45:00
48 2007-06-01 12:00:00
49 2007-06-01 12:15:00
50 2007-06-01 12:30:00
51 2007-06-01 12:45:00
52 2007-06-01 13:00:00
53 2007-06-01 13:15:00
54 2007-06-01 13:30:00
55 2007-06-01 13:45:00

 > findInterval(dfx$dx, c(as.numeric(range(dfx$dx)[1] + (1:24)*3600)))
   [1]  0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5  6  6  6  6  7
  [30]  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14
  [59] 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 18 19 19 19 19 20 20 20 20 21 21 21
  [88] 21 22 22 22 22 23 23 23 23 24 24 24 24
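
For anyone following along, the binning above can be reproduced with a minimal sketch. It assumes the times are built in UTC (the original post did not set a time zone), and the names secs, breaks and bin are only illustrative, not from the thread:

```r
# Rebuild the example data; tz = "UTC" is an assumption made for reproducibility
dx <- as.POSIXct((1:100) * 900, origin = "2007-06-01 00:00:00", tz = "UTC")

secs   <- as.numeric(dx)              # seconds since the epoch
breaks <- secs[1] + (1:24) * 3600     # hourly break points, one hour after the first observation onwards

# findInterval() reports, for each time, how many break points are <= it
bin <- findInterval(secs, breaks)

table(bin)                            # observations per hourly bin
```

Because the breaks are counted from the first observation (00:15), the bins run from quarter past to quarter past rather than on clock hours.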
David Winsemius, MD
West Hartford, CT

Kenn Konstabel

2011-Mar-24 14:30 UTC

On Thu, Mar 24, 2011 at 1:29 PM, Michael Bach <phaebz at gmail.com> wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 >
> as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)

Did you mean

subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") &
       14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)
# "&" instead of ","

I didn't completely "parse" the meaning of these conditions, but the
way you have it, there are three arguments to subset: the first two are
as expected, but the third one (select) is for selecting columns, and
you have just one column in your data frame. (?subset)
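
To see why the third argument trips things up, here is a small sketch that imitates, as far as I understand the base R source, how subset() evaluates `select`: column names are mapped to column positions, so `dx` there is the number 1, not the vector of times. The names nl, sel_dx, lo, hi and res, and the explicit tz = "UTC", are assumptions for illustration:

```r
dfx <- data.frame(dx = as.POSIXct((1:100) * 900,
                                  origin = "2007-06-01 00:00:00", tz = "UTC"))

# Imitation of the lookup used for `select`: each column name maps to its index
nl <- as.list(seq_along(dfx))
names(nl) <- names(dfx)
sel_dx <- eval(quote(dx), nl)   # what `dx` means inside `select`
sel_dx

# With "&" everything stays inside the row condition, where `dx` is the column
lo  <- as.POSIXct("2007-06-01 10:00:00", tz = "UTC")
hi  <- as.POSIXct("2007-06-01 14:00:00", tz = "UTC")
res <- subset(dfx, dx > lo & dx < hi)
nrow(res)
```

So as.POSIXlt(dx) in the third argument is applied to a bare number, which would explain the 'origin' complaint in the R version used in this thread; other R versions may react differently to the same call.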
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10
>
> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?
>

Jeff Newmiller

2011-Mar-24 14:52 UTC

On 03/24/2011 06:29 AM, Michael Bach wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 >
> as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10
>
It does work. Try it.
> Is there something I miss about subset()?
You have given three arguments to subset.  Your third argument is a poor 
choice for selecting columns. Try:

subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") &
       14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10)

or better yet,

tmp <- as.POSIXlt(dfx$dx)

subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00") &
       14 > tmp$hour & tmp$hour < 10)

since as.POSIXlt is a rather heavyweight operation.

>   Or is there even another way of
> aggregating over an hourly time interval in a nicer way?
This is not aggregation.  This is selection. It is only when you summarize 
the selected data that you are aggregating.

Normally, the term aggregating is applied when you use a grouping column and 
collapse many values with the same characteristics into one value per set of 
characteristics.  For example using base functions,

dfx$interval <- cut(tmp$hour,c(-1,10,14,24))
aggregate(dfx$dx,list(Interval=dfx$interval),length)

or

aggregate(dfx$dx,list(Hour=tmp$hour),length)

but I find that the plyr library is much more user-friendly than aggregate.
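
A self-contained sketch of the first aggregate() call, for readers who want to run it. It assumes the data are built in UTC (not stated in the original post); the name counts is illustrative:

```r
dfx <- data.frame(dx = as.POSIXct((1:100) * 900,
                                  origin = "2007-06-01 00:00:00", tz = "UTC"))
tmp <- as.POSIXlt(dfx$dx)   # convert once, then reuse the $hour field

# cut() intervals are right-closed: (-1,10], (10,14], (14,24]
dfx$interval <- cut(tmp$hour, c(-1, 10, 14, 24))

# One row per interval, with the number of observations in it
counts <- aggregate(dfx$dx, list(Interval = dfx$interval), length)
counts
```

Note that with these breaks hour 10 falls in the first interval and hour 14 in the second, so the break points need adjusting if a different boundary convention is wanted.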

Hadley Wickham

2011-Mar-24 15:18 UTC

On Thu, Mar 24, 2011 at 8:29 AM, Michael Bach <phaebz at gmail.com> wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 >
> as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

As others have noted, you used a "," instead of "&".  I wanted to point
out that this is a little easier to express with the lubridate package:

subset(dfx, dx > ymd("2007-06-01") & hour(dx) > 14 & hour(dx) < 10)

but I presume you meant:

subset(dfx, dx > ymd("2007-06-01") & hour(dx) > 10 & hour(dx) < 14)

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Gabor Grothendieck

2011-Mar-24 16:09 UTC

On Thu, Mar 24, 2011 at 9:29 AM, Michael Bach <phaebz at gmail.com> wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 >
> as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10
>
> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?
>
Here is yet another solution:

hr <- function(x) as.numeric(format(x, "%H"))
subset(dfx, as.Date(dx) >= "2007-06-01" & hr(dx) > 10 & hr(dx) < 14)

Although that seems to be what you asked for, perhaps you really meant
to include 10:00 and 14:00.  In that case, since the data fall on whole
minutes, try this:

hhmm <- function(x) as.numeric(format(x, "%H%M"))
subset(dfx, as.Date(dx) >= "2007-06-01" & hhmm(dx) >= 1000 &
       hhmm(dx) <= 1400)

Note that the above calculates days and hours relative to the current
time zone.  Since your data seem not to have time zones, you may be
better off using chron rather than POSIXct to avoid potential time
zone errors.  In that case see R News 4/1 and its references, and note
the availability of the hours() and related functions.
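
The time zone dependence can be seen in a short sketch. Here hr_tz is a hypothetical variant of hr() with an explicit time zone argument, and the sketch assumes the system's time zone database knows "America/New_York":

```r
# One instant, read in two time zones
t0 <- as.POSIXct("2007-06-01 12:00:00", tz = "UTC")

# Hypothetical variant of hr() that takes the time zone explicitly
hr_tz <- function(x, tz) as.numeric(format(x, "%H", tz = tz))

h_utc <- hr_tz(t0, "UTC")
h_ny  <- hr_tz(t0, "America/New_York")
c(UTC = h_utc, New_York = h_ny)
```

The same instant yields different hours, so a filter such as hr(dx) > 10 can select different rows depending on the time zone the session runs in.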


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
