Ralf B
2010-Apr-12 17:50 UTC
[R] Data Synchronization -- detecting time differences in multi-source data
Hi R enthusiasts,

I am dealing with logging data on user activity recorded from several different sources. Each source contributes three columns: one with an Epoch timestamp and two with data (the x and y coordinates of mouse movements). I have up to 10 such sources, with hundreds of thousands of log entries. The header looks like this:

timestamp1, x1, y1, timestamp2, x2, y2, ...

Because the data is recorded by different sources, the timestamps of source 1 and source 2 differ for the same measurement. Sometimes the difference is constant (e.g. source 1 is always 10 ms off source 2), but it can also be dynamic (e.g. network latency can make the difference grow or shrink at any time). The x and y measurements themselves always match across sources, but since they are screen coordinates the same values may recur in various places. Some sources also start recording earlier than others, so entries on the same row do not necessarily correspond to each other.

I am looking for a pointer to general statistical methods that would let me detect time differences in such data sets automatically: methods that find matching blocks of measurements across sources, compare their timelines, and flag the cases where they diverge.

Which field of statistics deals with this? Which R packages specialize in such problems?

Thanks a lot,
Ralf
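
P.S. To make the problem concrete, here is a minimal sketch in base R of the kind of check described above. Everything in it is an assumption for illustration: the data is simulated (a random walk standing in for the mouse x-coordinate), the clock errors of 10 ms and 25 ms, the block boundaries, the 5 ms threshold, and the helper name `estimate_offset` are all made up. It is a plain grid search over candidate offsets that minimizes the mean squared mismatch, not a recommended method.

```r
set.seed(42)

# Simulated "true" mouse x-coordinate, one value per millisecond (made-up data).
true_time <- 0:4999
path_x <- cumsum(rnorm(length(true_time)))

# Source 1 logs every 5 ms on the reference clock.
i1 <- seq(1, 5000, by = 5)
src1 <- data.frame(timestamp = true_time[i1], x = path_x[i1])

# Source 2 starts 300 ms later and logs every ms. Its clock is assumed to be
# 10 ms ahead at first and 25 ms ahead from 2500 ms on (a "dynamic" offset).
i2 <- 301:5000
clock_error <- ifelse(true_time[i2] < 2500, 10, 25)
src2 <- data.frame(timestamp = true_time[i2] + clock_error, x = path_x[i2])

# For each candidate offset d, look up source b at (source a timestamp + d)
# by linear interpolation and score the mismatch. The d with the smallest
# mean squared difference is taken as the estimated clock offset.
estimate_offset <- function(a, b, candidates = 0:50) {
  mse <- sapply(candidates, function(d) {
    b_at_a <- approx(b$timestamp, b$x, xout = a$timestamp + d)$y
    mean((a$x - b_at_a)^2, na.rm = TRUE)
  })
  candidates[which.min(mse)]
}

# Estimate the offset separately for an early and a late block of source 1.
early <- src1[src1$timestamp >= 500 & src1$timestamp < 2000, ]
late  <- src1[src1$timestamp >= 3000 & src1$timestamp < 4500, ]
offset_early <- estimate_offset(early, src2)
offset_late  <- estimate_offset(late, src2)

# Flag a divergence if the two blocks disagree by more than 5 ms (arbitrary).
diverged <- abs(offset_late - offset_early) > 5
print(c(early = offset_early, late = offset_late))
```

On this simulated data the two blocks should come out at 10 ms and 25 ms, so the divergence flag is set. Real logs are harder: the coordinates repeat, the sources may sample at different rates, and the offset can drift within a block, which is why I am asking for a more principled method.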