Hello List,

I am working on creating periodograms from IP network traffic logs using the Fast Fourier Transform. The FFT requires all the data points to be evenly spaced in the time domain (constant delta-T), so I have a step where I zero-pad the data.

Lately I've been wondering if there is a faster way to do this. Here's what I've got:

* data1 is a data frame consisting of a timestamp, in seconds from the beginning of the network log, and the number of network events that fell on that timestamp. Example:

time,events
0,1
1,30
5,14
10,4

* data2 is the zero-padded data frame. Its length equals the greatest value of "time" in data1:

time,events
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0

So I run this for loop:

for(i in 1:length(data1[,1])) {
    data2[data1[i,1],2] <- data1[i,2]
}

which goes to each row in data1, reads the timestamp, and writes the "events" value to the corresponding row in data2. The result is:

time,events
0,1
1,30
2,0
3,0
4,0
5,14
6,0
7,0
8,0
9,0
10,4

For a 24-hour log (86,400 seconds) this can take a while... Any advice on how to speed it up would be appreciated.

Thanks,
Pete Cap
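A side note not raised in the post itself, but relevant to the replies below: since "time" starts at 0 while R indexing starts at 1, the assignment data2[data1[i,1], 2] silently skips the time-0 row, because assigning to row 0 of a data frame selects zero rows and is a no-op. A minimal illustration:

data2 <- data.frame(time = 1:10, events = 0)
data2[0, 2] <- 99                        # selects zero rows; nothing is assigned
identical(data2$events, rep(0, 10))      # TRUE -- data2 is unchanged

Both replies below handle this by shifting the index by one.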
How about starting your time from 1 instead of 0 to make indexing easier (you can always subtract one later)? If so:

> x
  time events
1    1      1
2    2     30
3    6     14
4   11      4
> y <- data.frame(time=seq(max(x$time)), events=rep(0, max(x$time)))
> y
   time events
1     1      0
2     2      0
3     3      0
4     4      0
5     5      0
6     6      0
7     7      0
8     8      0
9     9      0
10   10      0
11   11      0
> y$events[x$time] <- x$events
> y
   time events
1     1      1
2     2     30
3     3      0
4     4      0
5     5      0
6     6     14
7     7      0
8     8      0
9     9      0
10   10      0
11   11      4

On 5/30/06, Pete Cap <peteoutside@yahoo.com> wrote:
> [quoted original post snipped]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
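For readers who want to paste and run the above, here is the same idea as a self-contained script, with the data values taken from Jim's transcript:

x <- data.frame(time = c(1, 2, 6, 11), events = c(1, 30, 14, 4))
y <- data.frame(time = seq(max(x$time)), events = rep(0, max(x$time)))
y$events[x$time] <- x$events    # one vectorized assignment replaces the whole loop
y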
Try this:

Lines <- "time,events
0,1
1,30
5,14
10,4"

library(zoo)
data1 <- read.zoo(textConnection(Lines), header = TRUE, sep = ",")
data2 <- as.ts(data1)        # gaps in "time" become NA
data2[is.na(data2)] <- 0     # omit this line if NAs in the extra positions are OK

On 5/30/06, Pete Cap <peteoutside at yahoo.com> wrote:
> [quoted original post snipped]
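To connect this back to the original goal: the zero-filled series can go straight into fft() or spec.pgram(). The continuation below is a sketch, not part of Gabor's reply:

pad   <- as.numeric(data2)   # plain numeric vector, constant delta-T of 1 second
spec  <- fft(pad)
power <- Mod(spec)^2         # raw periodogram ordinates
# or let spec.pgram() handle the bookkeeping (it plots by default):
# spec.pgram(pad)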
Why not something simple like:

# Toy example:
data1 <- data.frame(time=c(0,1,5,10), events=c(1,30,14,4))
data2 <- rep(0, 11)
# Or more generally:
data2 <- rep(0, 1 + max(data1$time))

# You don't need a for loop!  Use the indexing capabilities of R!
data2[data1$time + 1] <- data1$events   # The ``+1'' is to allow for 0-origin.
data2 <- ts(data2, start=0)

cheers,

Rolf Turner
rolf at math.unb.ca
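As a rough check of what the vectorized fill buys at the 24-hour scale mentioned in the original post, here is a timing sketch on synthetic data (the timestamps and event counts are illustrative, not from the thread):

set.seed(1)
big <- data.frame(time = sort(sample(0:86399, 10000)),
                  events = rpois(10000, 5))

system.time({
    pad <- rep(0, 1 + max(big$time))
    pad[big$time + 1] <- big$events
})
# effectively instantaneous; the row-by-row loop copies the 86,400-row
# data frame on every assignment, so it scales far worse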