Hi, thanks everyone for any help in advance.

I found myself dealing with tabular time-series data in which each row is
formatted like [time stamp, ID, value]. I made a small example:

X = data.frame(t = c(1,1,1,2,2,2,2,3,3,3,4,4,4,5,5),
               id = c('a','b','c','c','b','d','e','b','a','e','a','b','d','b','c'))
X$x = rnorm(15)

't' is the time stamp, 'id' is the identifier, and 'x' holds the time series
values. The rows are not necessarily ordered and sometimes have missing
values. In order to do any analysis, I used to convert this type of data into
matrix form:

Y = matrix(NA, length(unique(X$id)), length(unique(X$t)))
rownames(Y) = sort(unique(X$id))
colnames(Y) = sort(unique(X$t))
for (i in 1:nrow(Y)) {
  xi = X[X$id == rownames(Y)[i], ]
  Y[i, match(xi$t, colnames(Y))] = xi$x
}

Then I run whatever R operations I need on Y. This conversion gets painfully
slow as my data gets substantially larger. I was wondering if there is a
better way to convert a table like 'X' into a matrix like 'Y', or an even
better way to re-format the data, not necessarily into matrix form.

Young
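One vectorized base-R route for exactly this conversion is to build the empty matrix once and fill it with a single two-column (row, column) index assignment instead of looping over ids. A minimal sketch, assuming the same X as above and that each (id, t) pair occurs at most once:

# Look up row and column positions for every record at once,
# then assign all values in one step; cells with no record stay NA.
ids <- sort(unique(X$id))
ts  <- sort(unique(X$t))
Y <- matrix(NA_real_, nrow = length(ids), ncol = length(ts),
            dimnames = list(ids, ts))
Y[cbind(match(X$id, ids), match(X$t, ts))] <- X$x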
Is this what you want:

> X = data.frame(t = c(1,1,1,2,2,2,2,3,3,3,4,4,4,5,5),
+                id = c('a','b','c','c','b','d','e','b','a','e','a','b','d','b','c'))
> X$x = rnorm(15)
> X
   t id          x
1  1  a -0.6264538
2  1  b  0.1836433
3  1  c -0.8356286
4  2  c  1.5952808
5  2  b  0.3295078
6  2  d -0.8204684
7  2  e  0.4874291
8  3  b  0.7383247
9  3  a  0.5757814
10 3  e -0.3053884
11 4  a  1.5117812
12 4  b  0.3898432
13 4  d -0.6212406
14 5  b -2.2146999
15 5  c  1.1249309
> require(reshape)
Loading required package: reshape
Loading required package: plyr
> X.m <- melt(X, measure = 'x')
> cast(X.m, id ~ t, sum)
  id          1          2          3          4         5
1  a -0.6264538  0.0000000  0.5757814  1.5117812  0.000000
2  b  0.1836433  0.3295078  0.7383247  0.3898432 -2.214700
3  c -0.8356286  1.5952808  0.0000000  0.0000000  1.124931
4  d  0.0000000 -0.8204684  0.0000000 -0.6212406  0.000000
5  e  0.0000000  0.4874291 -0.3053884  0.0000000  0.000000

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
This may be closer; I forgot about the NAs:

> cast(X.m, id ~ t, function(x) if (length(x) == 0) NA else sum(x))
  id          1          2          3          4         5
1  a -0.6264538         NA  0.5757814  1.5117812        NA
2  b  0.1836433  0.3295078  0.7383247  0.3898432 -2.214700
3  c -0.8356286  1.5952808         NA         NA  1.124931
4  d         NA -0.8204684         NA -0.6212406        NA
5  e         NA  0.4874291 -0.3053884         NA        NA

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
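The same wide, NA-filled layout can also be obtained without the reshape package. A small sketch using base R's reshape(), assuming the same X as above (W is just an illustrative name, and the x.<t> column labels are what reshape() produces by default):

# One row per id, one x.<t> column per time stamp; (id, t) combinations
# absent from X come out as NA automatically.
W <- reshape(X, idvar = "id", timevar = "t", v.names = "x",
             direction = "wide")
W <- W[order(W$id), ]   # put the rows in id order, as in the hand-built Y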
Jorge Ivan Velez
2009-Jul-01 20:59 UTC
[R] convert tabular time series into a matrix format
Dear Young,

Try this:

with(X, tapply(x, list(t, id), function(y) y))

HTH,
Jorge
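Note that list(t, id) puts the time stamps in the rows and the ids in the columns, i.e. the transpose of the Y built by the original loop. If the original orientation is wanted, swapping the grouping factors (or transposing the result) should do it; a small sketch, with Y2 just an illustrative name:

# ids as rows, time stamps as columns, NA where no observation exists
Y2 <- with(X, tapply(x, list(id, t), function(y) y))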
Gabor Grothendieck
2009-Jul-01 21:01 UTC
[R] convert tabular time series into a matrix format
Can't say whether it's any faster, but read.zoo in the devel version of zoo
can do this using the split= argument, where split = 2 in the example below
says to split the data into time series defined by the second column.

Lines <- '"t" "id" "x"
1 "a" -1.71941257904109
1 "b" 1.33629503083329
1 "c" 1.61337372076629
2 "c" 1.34439849784170
2 "b" 0.167475882421629
2 "d" -0.447868997085645
2 "e" -0.592497543309015
3 "b" 0.952268366091281
3 "a" -0.532804723204108
3 "e" -1.20064709102901
4 "a" 2.10892119828104
4 "b" -0.0550779510849278
4 "d" 1.98864434974374
5 "b" 1.59258204616364
5 "c" 0.65185104628371'

library(zoo)
source("http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/pkg/R/read.zoo.R?rev=588&root=zoo")

z <- read.zoo(textConnection(Lines), header = TRUE, split = 2)

See the three zoo vignettes and the zoo help pages for more on zoo.
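Once z exists, a plain id-by-time matrix like the original Y should be recoverable by dropping the zoo wrapper and transposing; a small sketch, assuming z was created as above (Y3 is just an illustrative name):

# coredata() returns the underlying matrix (rows = time stamps,
# columns = ids); t() flips it to ids in rows, times in columns.
Y3 <- t(coredata(z))
colnames(Y3) <- index(z)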
Henrique Dallazuanna
2009-Jul-01 21:56 UTC
[R] convert tabular time series into a matrix format
Try this:

xtabs(x ~ t + id, data = X)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
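One caveat: xtabs() fills (t, id) combinations that never occur with 0 rather than NA, and the formula above puts t in the rows. A short sketch of how the NA-filled, id-by-time layout could be recovered, assuming each (id, t) pair occurs at most once (Y4 is just an illustrative name):

tab    <- xtabs(x ~ id + t, data = X)   # ids in rows, time stamps in columns
counts <- xtabs(~ id + t, data = X)     # number of records behind each cell
tab[counts == 0] <- NA                  # cells with no record were 0; make them NA
Y4 <- unclass(tab)                      # drop the xtabs class for a plain matrix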