Here is one way of doing it. Create a new dataframe with the values,
look for the breaks in times between the two sequences and then create
a results dataframe:
> # convert time to POSIXct
> d1$time1 <- as.POSIXct(paste(d1[[1]], d1[[2]]))
> d2$time2 <- as.POSIXct(paste(d2[[1]], d2[[2]]))
> d1$ID1 <- row.names(d1)
> d2$ID2 <- row.names(d2)
> d1
ID1 time1 dt
1 1 2008-01-02 13:11:00 10
2 2 2008-01-02 14:20:00 20
3 3 2008-01-02 15:42:00 30
4 4 2008-01-02 16:45:00 40
5 5 2008-01-02 17:42:00 50
6 6 2008-01-02 20:40:00 60> d2
ID2 time2 d1
101 101 2008-01-02 14:29:00 75
102 102 2008-01-02 17:55:00 105
103 103 2008-02-07 20:01:00 8> # create dataframe of times
> checkTime <- data.frame(index=c(seq(nrow(d1)), seq(nrow(d2))),
+ ID=c(d1$ID1, d2$ID2),
+ times=c(d1$time1, d2$time2), key=c(rep(0, nrow(d1)), rep(1,
nrow(d2))))> # sort by time
> checkTime <- checkTime[order(checkTime$times),]
> # find the breaks (transition from 0 -> 1)
> breaks <- which(diff(checkTime$key) == 1)
> # find out which ones are withing 15 minutes
> dif.15 <- which(difftime(checkTime$times[breaks + 1],
checkTime$times[breaks], units='mins') <= 15)
> # print out the values
> data.frame(ID1=checkTime$ID[breaks[dif.15]],
time1=checkTime$times[breaks[dif.15]],
+ time2=checkTime$times[breaks[dif.15] + 1])
ID1 time1 time2
1 2 2008-01-02 14:20:00 2008-01-02 14:29:00
2 5 2008-01-02 17:42:00 2008-01-02 17:55:00>
>
>
On Wed, Apr 9, 2008 at 4:53 AM, ravi <rv15i at yahoo.se>
wrote:> Hi,
> I would like to merge two data frames. It is just that I want the merging
to be done with some kind of a fuzzy criterion. Let me explain.
> My first data frame looks like this :
>
> ID1 time1 dt
> 1 2008-01-02 13:11 10
> 2 2008-01-02 14:20 20
> 3 2008-01-02 15:42 30
> 4 2008-01-02 16:45 40
> 5 2008-01-02 17:42 50
> 6 2008-01-02 20:40 60
>
>
> My second data frame :
>
> ID2 time2 d1
> 101 2008-01-02 14:29 75
> 102 2008-01-02 17:55 105
> 103 2008-02-07 20:01 8
>
>
>
> I want the merging to be done such that time2 is in the range between time1
and (time1+15 min).
> That is, my merged data frame should be :
>
> ID1 time1 time2
> 2 2008-01-02 14:20 2008-01-02 14:29
> 5 2008-01-02 17:42 2008-01-02 17:55
>
>
> My data frames have thousands of records. If the two data frames are d1 and
d2,
>
> d3<-merge(d1,d2,by.x=time1,by.y=time2)
> will work only for exact matching. One possible option is to match the
times for the date and hour times only (by filtering away the minute data).
> But this is only a partial solution as I am not interested in data where
the time difference is more than 15 minutes.
>
> How can I make the merge to work for fuzzy matching?
> Would it be easier to convert the times into data classes? Or, it better to
treat them as strings and use regular expresssions for doing the matching?
>
> I would appreciate any help that I can get.
> Thanking You,
> Ravi
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?