thr3ads.net - R help - [R] Fuzzy merge using timestamps [Nov 2010]

blurg

2010-Nov-08 22:04 UTC

[R] Fuzzy merge using timestamps

Greetings Supreme Council of R Masters,

Like a toddler, I have gotten my head stuck in the banisters of R ... again.
Let it be known that I am still a neophyte in the R-community forum world, so
please don't flame me too badly.

I have two sets of data, each with a set of timestamps.  I would like to
merge the datasets based on the timestamps and an individual identifier.
That is, there are several individuals, all with timestamps, and the times
could overlap.  By browsing through some of the older posts, I got the idea
to create a third data frame holding both sets of timestamps, the individual
identifiers, and a key recording which dataset each row came from, then find
the breaks in the key to determine which rows of each dataset should be
paired.  The code I have written so far looks something like this:

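# Convert both timestamp columns to POSIXct so they can be combined and sorted
gpsdata$t_datetimegps <- as.POSIXct(gpsdata$t_datetimegps)
urdata$t_datetimeur <- as.POSIXct(urdata$t_datetimeur)

# Keep each row's original row name so matches can be merged back later
gpsdata$ID1 <- row.names(gpsdata)
urdata$ID2 <- row.names(urdata)

# Source key: 0 = gpsdata, 1 = urdata
gpsdata$key1 <- rep(0, nrow(gpsdata))
urdata$key2 <- rep(1, nrow(urdata))

# Stack both datasets into one data frame
checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
	ARC=c(gpsdata$gpsARC, urdata$urARC),
	times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
	key=c(gpsdata$key1, urdata$key2))

# Sort by ARC, then by time within each ARC
checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times,
	decreasing = FALSE),]

# Positions where a gpsdata row (key 0) is immediately followed by a urdata row (key 1)
breaks <- which(diff(checkTime$key) == 1)

# Pair each such gpsdata row with the urdata row that follows it
match <- data.frame(ID1=checkTime$ID[breaks],
	gpsARC = checkTime$ARC[breaks],
	urARC = checkTime$ARC[breaks + 1],
	t_datetimegps=checkTime$times[breaks],
	t_datetimeur=checkTime$times[breaks + 1])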
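
For reference, here is a minimal self-contained sketch of the same interleave-and-find-breaks idea on toy data. The values are invented for illustration; only the column names come from the code above. It shows what the key column and `breaks` look like when the two datasets share ARC values and the sort interleaves them as intended.

```r
# Toy data (hypothetical values), reusing the column names from the post.
gpsdata <- data.frame(
  gpsARC = c("A", "A", "A", "B", "B"),
  t_datetimegps = as.POSIXct(c("2010-11-01 10:00:00", "2010-11-01 10:10:00",
                               "2010-11-01 10:30:00", "2010-11-01 09:00:00",
                               "2010-11-01 09:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
urdata <- data.frame(
  urARC = c("A", "B"),
  t_datetimeur = as.POSIXct(c("2010-11-01 10:15:00", "2010-11-01 09:25:00"),
                            tz = "UTC"),
  stringsAsFactors = FALSE
)

# Stack both datasets; key 0 marks gpsdata rows, key 1 marks urdata rows.
checkTimes <- data.frame(
  ID    = c(seq_len(nrow(gpsdata)), seq_len(nrow(urdata))),
  ARC   = c(gpsdata$gpsARC, urdata$urARC),
  times = c(gpsdata$t_datetimegps, urdata$t_datetimeur),
  key   = c(rep(0, nrow(gpsdata)), rep(1, nrow(urdata))),
  stringsAsFactors = FALSE
)

# Sort by ARC, then by time within each ARC.
checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times), ]

# A break is a gpsdata row immediately followed by a urdata row.
breaks <- which(diff(checkTime$key) == 1)

# Guard: keep only breaks where both rows belong to the same ARC.
same_arc <- checkTime$ARC[breaks] == checkTime$ARC[breaks + 1]
breaks <- breaks[same_arc]
```

The `same_arc` guard is an addition, not part of the original code: without it, a break could pair the last gpsdata row of one ARC with the first urdata row of the next ARC, because `diff()` only looks at adjacent rows of the sorted frame.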

Then I merge the 'match' data frame with the gpsdata data frame, and merge
the result with the urdata data frame.  The problem is that when I create the
checkTime data frame and sort it, the urdata portion is sorted first and then
the gpsdata portion.  So my key column looks like
1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc., even though I am not sorting on key.
S.O.S!!!!  Why is it doing this?  Shouldn't it just order the timestamps of
both data frames together?
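
One thing worth checking (an assumption, since the post does not show the data) is whether the first sort key, ARC, actually has the same values in both datasets. `order()` sorts by ARC before it looks at the timestamps, so if no ARC value is shared, every row of one dataset sorts ahead of every row of the other regardless of time. The sketch below uses made-up identifiers with a stray leading space on one side to show how such a mismatch can be spotted; running `str()` on both data frames would reveal type differences in the same way.

```r
# Hypothetical identifiers: the ur values carry a stray leading space.
gps_arc <- c("A1", "A2", "A1")
ur_arc  <- c(" A1", " A2")

# No identifiers in common means order() can never interleave the two sets.
shared_before <- intersect(gps_arc, ur_arc)

# Normalising both sides restores the overlap (trimws() needs R >= 3.2.0).
shared_after <- intersect(trimws(gps_arc), trimws(ur_arc))
```

If `shared_before` is empty for the real `gpsdata$gpsARC` and `urdata$urARC`, the identifiers need to be brought to a common form before stacking; if it is not, the next thing to compare would be `range()` of the two timestamp columns.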

Thanks for all your enlightenment.






-- 
View this message in context:
http://r.789695.n4.nabble.com/Fuzzy-merge-using-timestamps-tp3032745p3032745.html
Sent from the R help mailing list archive at Nabble.com.

blurg

2010-Nov-10 16:38 UTC

[R] Fuzzy merge using timestamps


Ian Craig

2010-Nov-10 17:57 UTC

[R] Fuzzy merge using timestamps

