Greetings Supreme Council of R Masters,

Like a toddler, I have gotten my head stuck in the banisters of R ... again. Let it be known I am still a neophyte in the R-community forum world, so please don't flame me too badly.

I have two sets of data, each with a set of timestamps. I would like to merge the datasets based on the timestamps and an individual identifier. That is, there are several individuals, each with timestamps, and the times could overlap between individuals.

By browsing through some of the older posts, I got the idea to create a third data frame holding both sets of timestamps, the individual identifiers, and a key that records which dataset each row came from, and then to find the breaks in the key to determine which rows of each dataset should be paired. The code I have written so far looks something like this:

    gpsdata$t_datetimegps <- as.POSIXct(gpsdata$t_datetimegps)
    urdata$t_datetimeur <- as.POSIXct(urdata$t_datetimeur)

    # Row names serve as per-dataset row IDs
    gpsdata$ID1 <- row.names(gpsdata)
    urdata$ID2 <- row.names(urdata)

    # key: 0 = row from gpsdata, 1 = row from urdata
    gpsdata$key1 <- rep(0, nrow(gpsdata))
    urdata$key2 <- rep(1, nrow(urdata))

    checkTimes <- data.frame(ID = c(gpsdata$ID1, urdata$ID2),
                             ARC = c(gpsdata$gpsARC, urdata$urARC),
                             times = c(gpsdata$t_datetimegps, urdata$t_datetimeur),
                             key = c(gpsdata$key1, urdata$key2))

    # Sort by individual (ARC), then by time within individual
    checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times, decreasing = FALSE), ]

    # A break is a gpsdata row (key 0) immediately followed by a urdata row (key 1)
    breaks <- which(diff(checkTime$key) == 1)

    match <- data.frame(ID1 = checkTime$ID[breaks],
                        gpsARC = checkTime$ARC[breaks],
                        urARC = checkTime$ARC[breaks + 1],
                        t_datetimegps = checkTime$times[breaks],
                        t_datetimeur = checkTime$times[breaks + 1])

Then I merge the 'match' data frame with the gpsdata data frame, and merge the product with the urdata data frame.

The problem is that when I create the checkTime data frame and sort it, it sorts the urdata portion first and then the gpsdata portion. So my key column looks like 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 instead of 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc., even though I am not sorting on key.

S.O.S!!!! Why is it doing this? Shouldn't it just order the timestamps of both data frames together?

Thanks for all your enlightenment.

--
View this message in context: http://r.789695.n4.nabble.com/Fuzzy-merge-using-timestamps-tp3032745p3032745.html
Sent from the R help mailing list archive at Nabble.com.
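
For reference, the combine-sort-and-find-breaks idea described above can be sketched on toy data. This is a minimal sketch, not a diagnosis of the real data: the values and the "UTC" time zone are made up, and only the column names come from the post. Two choices here are assumptions rather than part of the original code: the ARC columns are passed through as.character() before being combined (in case they are factors with different levels), and a break is only counted when both rows belong to the same individual.

```r
# Toy data (hypothetical values); column names follow the post.
gpsdata <- data.frame(
  gpsARC = c("A", "A", "A", "B", "B"),
  t_datetimegps = as.POSIXct(c("2010-11-01 10:00:00", "2010-11-01 10:10:00",
                               "2010-11-01 10:20:00", "2010-11-01 10:05:00",
                               "2010-11-01 10:15:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
urdata <- data.frame(
  urARC = c("A", "B"),
  t_datetimeur = as.POSIXct(c("2010-11-01 10:12:00",
                              "2010-11-01 10:07:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Stack both datasets; key 0 = gpsdata row, key 1 = urdata row.
# as.character() is an assumption: it guards against factor ARC columns.
checkTimes <- data.frame(
  ID = c(row.names(gpsdata), row.names(urdata)),
  ARC = c(as.character(gpsdata$gpsARC), as.character(urdata$urARC)),
  times = c(gpsdata$t_datetimegps, urdata$t_datetimeur),
  key = c(rep(0, nrow(gpsdata)), rep(1, nrow(urdata))),
  stringsAsFactors = FALSE
)

# Sort by individual first, then by time within each individual.
checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times), ]

# A break is a gps row directly followed by a ur row of the same individual.
n <- nrow(checkTime)
breaks <- which(diff(checkTime$key) == 1 &
                checkTime$ARC[-n] == checkTime$ARC[-1])

matched <- data.frame(
  ID1 = checkTime$ID[breaks],
  ID2 = checkTime$ID[breaks + 1],
  ARC = checkTime$ARC[breaks],
  t_datetimegps = checkTime$times[breaks],
  t_datetimeur = checkTime$times[breaks + 1]
)
print(checkTime$key)
print(matched)
```

Because ARC is the first sort key, the timestamps only interleave within rows that share an ARC value. If the identifiers in gpsARC and urARC do not compare equal (different type, spelling, or whitespace), each dataset ends up in its own block, so comparing unique(gpsdata$gpsARC) with unique(urdata$urARC) may be a useful first check.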