Dear all,

We have a large data set with temperature data for weather stations across the globe (15000 stations).

For each station, we need to calculate the number of days on which a certain temperature is exceeded.

So far we have used the following S code, where mat88 is a matrix containing rows of 365 daily temperatures for each of the 15000 weather stations:

  m <- 37   # lowest threshold
  n <- 2    # step between thresholds
  # outmat88 columns: maximum, days > 37, days > 39, days > 41
  outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
  for(i in 1:nrow(mat88)) {
    # i <- 3
    row1 <- as.data.frame(df88[i, ])
    temprow37 <- select.rows(row1, row1 > m)
    temprow39 <- select.rows(row1, row1 > m + n)
    temprow41 <- select.rows(row1, row1 > m + 2 * n)
    outmat88[i, 1] <- max(row1, na.rm = T)
    outmat88[i, 2] <- count.rows(temprow37)
    outmat88[i, 3] <- count.rows(temprow39)
    outmat88[i, 4] <- count.rows(temprow41)
  }
  outmat88

We have transferred the data to a more powerful Linux box running R, but still hope to speed up the code.

I know a for loop should be avoided when looking for speed. I also know the answer is in something like tapply, but my understanding of these commands is still too limited to see the solution. Could someone show me the way!?

Thanks in advance,

Sander.

-- 
--------------------------------------------
Dr Sander P. Oom
Animal, Plant and Environmental Sciences,
University of the Witwatersrand
Private Bag 3, Wits 2050, South Africa
Tel (work) +27 (0)11 717 64 04
Tel (home) +27 (0)18 297 44 51
Fax +27 (0)18 299 24 64
Email sander at oomvanlieshout.net
Web www.oomvanlieshout.net/sander
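For readers without S-PLUS: select.rows() and count.rows() are S-PLUS functions that, as far as I know, base R does not provide, so the loop above will not run in R as written. A minimal sketch of the same loop in plain R, assuming mat88 is a numeric matrix with one row per station and one column per day, counts exceedances by summing a logical vector (TRUE counts as 1). Indexing mat88 directly rather than df88 is an assumption, and the data are hypothetical: 100 simulated stations with one deliberately missing value.

```r
# Plain-R sketch of the loop above (not the original S-PLUS code).
# Hypothetical data: 100 simulated stations (rows) x 365 days (columns).
set.seed(1)
n_stations <- 100
mat88 <- matrix(sample(-15:50, 365 * n_stations, replace = TRUE),
                nrow = n_stations, ncol = 365)
mat88[1, 10] <- NA  # one missing value, to exercise na.rm handling

m <- 37   # lowest threshold
n <- 2    # step between thresholds
outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
for (i in seq_len(nrow(mat88))) {
  row1 <- mat88[i, ]
  outmat88[i, 1] <- max(row1, na.rm = TRUE)              # hottest day
  outmat88[i, 2] <- sum(row1 > m, na.rm = TRUE)          # days above 37
  outmat88[i, 3] <- sum(row1 > m + n, na.rm = TRUE)      # days above 39
  outmat88[i, 4] <- sum(row1 > m + 2 * n, na.rm = TRUE)  # days above 41
}
head(outmat88)
```

This keeps the row-by-row structure of the question, so it is mainly useful as a baseline to check faster versions against.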
Maybe you are looking for something along these lines:

  # simulated data: 365 days (rows) x 15000 stations (columns)
  mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
  temps <- c(37, 39, 41)
  #################
  # ind: one row per threshold, one column per station
  ind <- matrix(0, length(temps), ncol(mat))
  for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
  ind

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Sander Oom" <slist at oomvanlieshout.net>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, June 10, 2005 10:50 AM
Subject: [R] Replacing for loop with tapply!?
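Dimitris's example stores stations as columns, while mat88 has one station per row. A sketch of the same idea for the stations-as-rows layout swaps colSums() for rowSums() and adds the maximum, so the result has the same four columns as outmat88. The data are hypothetical (100 simulated stations, one missing value), and na.rm = TRUE is an addition: without it a station with any missing day would get NA counts. The column names are illustrative only.

```r
# Hypothetical data: 100 simulated stations (rows) x 365 days (columns).
set.seed(1)
n_stations <- 100
mat88 <- matrix(sample(-15:50, 365 * n_stations, replace = TRUE),
                nrow = n_stations, ncol = 365)
mat88[1, 10] <- NA  # one missing value
temps <- c(37, 39, 41)

# rowSums() counts the TRUE values in each row (station); sapply() runs it
# once per threshold, giving a matrix with one column per threshold.
counts <- sapply(temps, function(thr) rowSums(mat88 > thr, na.rm = TRUE))

# Row maxima, ignoring missing days.
maxima <- apply(mat88, 1, max, na.rm = TRUE)

outmat88 <- cbind(maxima, counts)
colnames(outmat88) <- c("max", paste0("gt", temps))
head(outmat88)
```

Only the short loop over the three thresholds remains; the per-station work is done by rowSums().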
Sander Oom wrote:
> For each station, we need to calculate the number of days a certain
> temperature is exceeded.
> I also know the answer is in something like tapply, but my understanding
> of these commands is still to limited to see the solution.

What you need is not tapply but apply. Something like

  apply(mat88, 1, function(x) sum(x > 30))

where your threshold should replace 30 and the `1' refers to rows. For multiple thresholds:

  apply(mat88, 1, function(x) c(sum(x > 20), sum(x > 25), sum(x > 30)))

Kjetil

-- 
Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
   -- Mahdi Elmandjra
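When the function passed to apply() returns a vector, apply() puts one result per column, so the multi-threshold call gives a 3 x 15000 matrix; t() turns it back into one row per station. A sketch on hypothetical simulated data (100 stations, one missing value) that reproduces all four outmat88 columns this way and checks the result against a rowSums() version. The thresholds 37/39/41 and na.rm = TRUE are assumptions taken from the question. Note that apply() still runs an R-level loop over the rows, so it mostly tidies the code; rowSums() does the counting in compiled code and is usually the faster of the two.

```r
# Hypothetical data: 100 simulated stations (rows) x 365 days (columns).
set.seed(1)
n_stations <- 100
mat88 <- matrix(sample(-15:50, 365 * n_stations, replace = TRUE),
                nrow = n_stations, ncol = 365)
mat88[1, 10] <- NA  # one missing value
temps <- c(37, 39, 41)

# apply() over rows (MARGIN = 1): each call returns max plus three counts,
# so apply() yields a 4 x n_stations matrix and t() makes it n_stations x 4.
by_apply <- t(apply(mat88, 1, function(x)
  c(max(x, na.rm = TRUE),
    sapply(temps, function(thr) sum(x > thr, na.rm = TRUE)))))

# Same numbers, with the counting done by rowSums().
by_rowsums <- cbind(apply(mat88, 1, max, na.rm = TRUE),
                    sapply(temps, function(thr) rowSums(mat88 > thr, na.rm = TRUE)))

all.equal(by_apply, by_rowsums, check.attributes = FALSE)
```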