Martin Tomko
2014-Mar-23 00:32 UTC
time series processing - count of datestamp deltas, per group
Apologies if the question is a bit naïve; I am a novice in time series data handling in R.

I have the following type of data, in long format (as the spacetime vignette calls it; the table also contains spatial columns, not shown here):

User | Date | Otherdata
A | 01/01/2014 | aa
A | 01/01/2014 | bb
A | 01/01/2014 | cc
B | 01/01/2014 | aa
B | 05/01/2014 | cc
A | 07/01/2014 | aa
C | 05/02/2014 | xx
C | 20/02/2014 | yy
etc.

[A, B, C, …] are user IDs (strings). Date is converted to the Date class (e.g. 2013-10-15). The table is sorted by User and then by Date, and is over 800K records long. There are about 20K users.

User | Date | Otherdata
A | 2014-01-01 | aa
A | 2014-01-01 | bb
A | 2014-01-01 | cc
A | 2014-01-07 | aa
B | 2014-01-01 | aa
B | 2014-01-05 | cc
C | 2014-02-05 | xx
C | 2014-02-20 | yy

I want to get a frequency table (and ultimately a plot) of the counts of differences (in days) between the records of a user. That is, I would first get the unique days recorded per user:

A | 2014-01-01
A | 2014-01-07
B | 2014-01-01
B | 2014-01-05
C | 2014-02-05
C | 2014-02-20

I then want to compute the differences between timestamps within each group defined by the user, in days:

A | 6
B | 4
C | 15

With tens of thousands of records, I then want a table with the counts of these differences across all users (in our case the values would be 6, 4 and 15, each with count = 1). In the larger sample, something like this:

DeltaDays | Count
1 | 150
2 | 320
…
N | X

I know there are all sorts of packages for time series analysis, but I could not find a simple function like this (including searching here: http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf ). I assume that something working on a simple data frame would be sufficient, but I am happy (and would perhaps prefer) to use TS. I would appreciate any hints. The ultimate analysis also involves space, so hints in the direction of space-time are welcome.
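For reference, the steps described above (unique days per user, day differences within each user, counts across all users) can be sketched in base R on a plain data frame. This is only a minimal sketch: the data frame name `df` and the toy rows are assumptions built from the example above, and "difference" is read here as the gap between consecutive recorded days of the same user.

```r
# Toy data mirroring the example table (the name `df` is an assumption)
df <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-07",
                   "2014-01-01", "2014-01-05", "2014-02-05", "2014-02-20")),
  Otherdata = c("aa", "bb", "cc", "aa", "aa", "cc", "xx", "yy"),
  stringsAsFactors = FALSE
)

# 1. One row per user and day, sorted by user and then by date
days <- unique(df[, c("User", "Date")])
days <- days[order(days$User, days$Date), ]

# 2. Gaps in days between consecutive dates within each user.
#    as.integer() on a Date gives days since 1970-01-01, so diff() is in days.
#    Users with a single recorded day contribute no gaps.
by_user <- split(days$Date, days$User)
deltas <- unlist(lapply(by_user, function(d) diff(as.integer(d))),
                 use.names = FALSE)

# 3. Frequency table of the gaps across all users
freq <- as.data.frame(table(DeltaDays = deltas),
                      responseName = "Count", stringsAsFactors = FALSE)
freq$DeltaDays <- as.integer(freq$DeltaDays)
print(freq)
```

On the toy data this gives the gaps 4, 6 and 15, each with a count of 1. For a plot, something like `barplot(freq$Count, names.arg = freq$DeltaDays)` should be enough as a first look.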
Ultimately, I would like to separate the records for each user into a dataset that can be handled separately, but splitting the table into a large number of files does not seem wise. Any hints on this are also appreciated.

Thanks,
Martin
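On the last point, one possible way to handle each user separately without writing many files is to keep the per-user pieces in a named list in memory. The following is a minimal sketch using base R `split()`; the data frame name `df` and the toy rows are assumptions taken from the example in the post.

```r
# Toy data mirroring the example table (the name `df` is an assumption)
df <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-07",
                   "2014-01-01", "2014-01-05", "2014-02-05", "2014-02-20")),
  Otherdata = c("aa", "bb", "cc", "aa", "aa", "cc", "xx", "yy"),
  stringsAsFactors = FALSE
)

# A named list with one data frame per user, all kept in memory
per_user <- split(df, df$User)

# Access the records of a single user by ID
print(per_user[["A"]])

# Apply the same function to every user, e.g. number of distinct days recorded
n_days <- sapply(per_user, function(u) length(unique(u$Date)))
print(n_days)
```

Each list element is an ordinary data frame, so any per-user analysis can be run with `lapply()` or `sapply()` over `per_user`.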