Thomas Barningham
2016-Feb-29 15:20 UTC
[R] removing data based on date pairs in a separate data frame
Dear R users,

I have two data frames.

The first contains a date/time column and the concentration of a species:

head(mydata)
                 date species
1 2016-01-31 23:59:53 -559.17
2 2016-02-01 00:00:53 -556.68
3 2016-02-01 00:01:53 -554.89
4 2016-02-01 00:02:53 -556.72
5 2016-02-01 00:03:53 -557.36
6 2016-02-01 00:13:53 -561.42

The second contains a list of start and end date pairs:

head(mydata_flag)
           start_date            end_date
1 2016-02-01 00:01:00 2016-02-01 00:03:00
2 2016-02-01 00:10:00 2016-02-01 00:15:00

I need to loop through all pairs of dates in the mydata_flag data frame and remove any rows of mydata whose date falls between any of the date pairs.

For the data shown here, the result would look something like this:

                 date species
1 2016-01-31 23:59:53 -559.17
2 2016-02-01 00:00:53 -556.68
3 2016-02-01 00:03:53 -557.36

I've searched high and low for an answer to this. I know it's a subsetting problem, but I don't know how to approach it. Subsetting answers tend to deal with a single start/end date pair and keep the data between the dates. I need to remove the data between the dates, and I have a whole data frame of date/time pairs to consider.

For background: this is to flag bad atmospheric data from periods when there were known instrumentation issues.

Thanks in advance,
Thomas

--
Thomas Barningham
Centre for Ocean and Atmospheric Sciences
School of Environmental Sciences
University of East Anglia
Norwich Research Park
Norwich
NR4 7TJ
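A minimal sketch of the loop the question describes, assuming the date columns are POSIXct in UTC (the data below are rebuilt by hand from the head() output above; the time zone is an assumption). Each flag period is tested independently as a closed interval, start_date <= date <= end_date, so overlapping or unsorted periods need no special handling.

```r
# Rebuild the two data frames from the head() output in the question.
# Assumption: times are UTC and stored as POSIXct.
mydata <- data.frame(
  date = as.POSIXct(c("2016-01-31 23:59:53", "2016-02-01 00:00:53",
                      "2016-02-01 00:01:53", "2016-02-01 00:02:53",
                      "2016-02-01 00:03:53", "2016-02-01 00:13:53"),
                    tz = "UTC"),
  species = c(-559.17, -556.68, -554.89, -556.72, -557.36, -561.42)
)
mydata_flag <- data.frame(
  start_date = as.POSIXct(c("2016-02-01 00:01:00", "2016-02-01 00:10:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2016-02-01 00:03:00", "2016-02-01 00:15:00"), tz = "UTC")
)

# Start with nothing flagged, then mark every row that falls inside
# any flag period (closed interval on both ends).
flagged <- rep(FALSE, nrow(mydata))
for (k in seq_len(nrow(mydata_flag))) {
  flagged <- flagged |
    (mydata$date >= mydata_flag$start_date[k] &
     mydata$date <= mydata_flag$end_date[k])
}

# Keep only the unflagged rows (original row names are preserved).
cleaned <- mydata[!flagged, ]
print(cleaned)
```

This costs one pass over the data per flag period, which is usually fine for a modest number of instrument-fault periods.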
Bert Gunter
2016-Feb-29 17:33 UTC
What is the format of your date columns: character, factor, POSIXct/POSIXlt, ...? Use str() (see ?str) to find out.

(Reply to the list, not just me; others are far more facile with dates than I am.)

Cheers,
Bert
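A minimal sketch of checking and fixing the column type Bert asks about. It assumes, hypothetically, that the dates were read from a file as character strings (for example with read.csv()) and that they are in UTC; the data frame name raw is illustrative only.

```r
# Hypothetical example: date column read from a file as character strings.
raw <- data.frame(
  date    = c("2016-01-31 23:59:53", "2016-02-01 00:00:53"),
  species = c(-559.17, -556.68),
  stringsAsFactors = FALSE
)
str(raw)  # date shows up as chr, not as a date-time class

# Convert to POSIXct with an explicit format and time zone (UTC assumed).
# Calling as.character() first also covers the case where date is a factor.
raw$date <- as.POSIXct(as.character(raw$date),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
str(raw)  # date is now POSIXct
```

Giving the format and tz explicitly avoids silently depending on the session's local time zone.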
William Dunlap
2016-Feb-29 18:44 UTC
If your start/end pairs are not overlapping, you can use findInterval() to do this pretty quickly. E.g.,

isInABound <- function(x, low, high) {
    stopifnot(length(low) == length(high))
    # Interleave the bounds: low[1], high[1], low[2], high[2], ...
    bounds <- rep(low, each = 2)
    bounds[seq(2, length(bounds), by = 2)] <- high
    # Stops if periods overlap, are out of time order, or have start after end
    stopifnot(!is.unsorted(bounds))
    # An odd interval index means low[k] <= x < high[k] for some k
    findInterval(x, bounds) %% 2 == 1
}

> i <- isInABound(mydata$date, mydata_flag$start_date, mydata_flag$end_date)
> mydata[!i, ]
                 date species
1 2016-01-31 23:59:53 -559.17
2 2016-02-01 00:00:53 -556.68
5 2016-02-01 00:03:53 -557.36

Bill Dunlap
TIBCO Software
wdunlap tibco.com
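A small check of how isInABound() behaves at the period boundaries, under stated assumptions: the flag periods and test times below are hypothetical, UTC is assumed, and the periods are deliberately listed out of order to show that they must be sorted by start time before the call. Because findInterval() assigns x to bounds[i] <= x < bounds[i+1], each period is effectively half-open, [start_date, end_date), so a measurement stamped exactly at end_date is kept rather than removed.

```r
# isInABound() as given in Bill Dunlap's message.
isInABound <- function(x, low, high) {
  stopifnot(length(low) == length(high))
  bounds <- rep(low, each = 2)
  bounds[seq(2, length(bounds), by = 2)] <- high
  stopifnot(!is.unsorted(bounds))
  findInterval(x, bounds) %% 2 == 1
}

# Hypothetical flag periods, given out of order on purpose (UTC assumed).
flags <- data.frame(
  start_date = as.POSIXct(c("2016-02-01 00:10:00", "2016-02-01 00:01:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2016-02-01 00:15:00", "2016-02-01 00:03:00"), tz = "UTC")
)
# Sort by start time first; otherwise the is.unsorted() check stops.
flags <- flags[order(flags$start_date), ]

# Times at and around the boundaries of the 00:01 to 00:03 period.
x <- as.POSIXct(c("2016-02-01 00:01:00",   # exactly start_date
                  "2016-02-01 00:02:00",   # strictly inside
                  "2016-02-01 00:03:00",   # exactly end_date
                  "2016-02-01 00:05:00"),  # between the two periods
                tz = "UTC")
inside <- isInABound(x, flags$start_date, flags$end_date)
print(inside)
```

If readings taken exactly at end_date should also be dropped, an explicit comparison with <= on the end time is one way to get closed intervals instead.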