thr3ads.net - R help - [R] Subset of time observations where timediff

If this information is useful, please help other people find it:
Share via:

Karl Ove Hufthammer

2009-Dec-07 15:57 UTC

[R] Subset of time observations where timediff > 60 secs

Dear list members

I have a rather large vector (part of a data frame) giving the time 
(date + time, POSIXct) of observations. The times are irregular (with 
both small and large jumps) but increasing, and there are several 
millions of them.

I now wish to reduce my data set, so that I only have observations which 
are at least (for example) 60 seconds apart. Basically, I need (all) the 
indices of my time variable where the difference in times are at least 
60 seconds.

I thought this would be a rather simple task, but perhaps I'm tired, for 
I couldn't figure out how to do it in a even moderately elegant way (not 
looping over all the values, which is quite slow).

This solution seemed sensible:

x=cumsum(diff(timevar) %/% 60)
ind=c(1,cumsum(rle(x)$lengths)+1) # And perhaps removing the last value

but doesn't work, as it only captures the 'first times' in each 
60-second interval following the first time value, and thus may include 
times with values that are closer than 60 seconds.

I also considered round.POSIXct and trunc.POSIXct, but these are not 
appropriate either, for obvious reasons.

So, any ideas how to do this in an elegant and efficient way?

-- 
Karl Ove Hufthammer

Gabor Grothendieck

2009-Dec-07 17:27 UTC

head link

[R] Subset of time observations where timediff > 60 secs

There is an example at the end of the Prices and Returns section of
the zoo-quickref vignette in the zoo package.

library(zoo)
vignette("zoo-quickref")

If speed is your main concern check this recent thread that was posted
on R-sig-finance:
http://n4.nabble.com/SUMMARY-Reducing-an-intra-day-dataset-into-one-obs-per-second-td949612.html

On Mon, Dec 7, 2009 at 10:57 AM, Karl Ove Hufthammer <karl at huftis.org>
wrote:> Dear list members
>
> I have a rather large vector (part of a data frame) giving the time
> (date + time, POSIXct) of observations. The times are irregular (with
> both small and large jumps) but increasing, and there are several
> millions of them.
>
> I now wish to reduce my data set, so that I only have observations which
> are at least (for example) 60 seconds apart. Basically, I need (all) the
> indices of my time variable where the difference in times are at least
> 60 seconds.
>
> I thought this would be a rather simple task, but perhaps I'm tired,
for
> I couldn't figure out how to do it in a even moderately elegant way
(not
> looping over all the values, which is quite slow).
>
> This solution seemed sensible:
>
> x=cumsum(diff(timevar) %/% 60)
> ind=c(1,cumsum(rle(x)$lengths)+1) # And perhaps removing the last value
>
> but doesn't work, as it only captures the 'first times' in each
> 60-second interval following the first time value, and thus may include
> times with values that are closer than 60 seconds.
>
> I also considered round.POSIXct and trunc.POSIXct, but these are not
> appropriate either, for obvious reasons.
>
> So, any ideas how to do this in an elegant and efficient way?
>
> --
> Karl Ove Hufthammer
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Reasonably Related Threads

Search for more maybe matching threads

R help - Dec 2009 - Subset of time observations where timediff > 60 secs

[R] Subset of time observations where timediff > 60 secs

[R] Subset of time observations where timediff > 60 secs

Reasonably Related Threads