Hello,

I am having difficulty filtering data. I am working with flow data collected at a stream gage. For each record, I have a date and a flow value. I have filtered this data to include only days when the flow exceeds a given threshold.

Here is my problem. Within this subset of data, I often have several consecutive days above the threshold. From each such group of days, I wish to select the record (both date and flow) containing the maximum flow. If an exceedance is isolated (the preceding and succeeding days are below the threshold), I also wish to select that record.

For example, from the data set

    Day  Flow
      1    10
      4    13
      5    20
      6    15
      9    13

I would like the filter to return the 1st, 3rd and 5th records.

Any ideas on how I would write such an algorithm would be appreciated.

Thanks,

Matt Pocernich

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
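If the unfiltered daily series is still available, the grouping can also be done before thresholding: runs of consecutive exceedances are exactly the runs of TRUE in `flow > threshold`, which base R's `rle()` finds directly. This is a minimal sketch, assuming a complete daily series with no missing days and no NA values; the function `peak_rows()`, the threshold of 9 and the flow values are hypothetical, chosen so that the exceedances reproduce the Day/Flow example in the question.

```r
# Hypothetical complete daily series: one flow value per day, no gaps, no NAs.
day  <- 1:10
flow <- c(10, 3, 2, 13, 20, 15, 4, 1, 13, 5)
threshold <- 9  # hypothetical; "exceed" is taken as strictly greater than

# Return the index of the peak flow in each run of consecutive exceedances.
peak_rows <- function(flow, threshold) {
  r <- rle(flow > threshold)          # runs of TRUE (above) and FALSE (below)
  ends   <- cumsum(r$lengths)         # last index of each run
  starts <- ends - r$lengths + 1L     # first index of each run
  starts <- starts[r$values]          # keep only the above-threshold runs
  ends   <- ends[r$values]
  # which.max() returns the first maximum, so ties keep the earliest day
  vapply(seq_along(starts),
         function(k) starts[k] - 1L + which.max(flow[starts[k]:ends[k]]),
         integer(1))
}

idx <- peak_rows(flow, threshold)
cbind(Day = day[idx], Flow = flow[idx])
```

On this hypothetical series the sketch selects days 1, 5 and 9 with flows 10, 20 and 13, the same three records asked for above.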
Here is a way to do it:

    # your data (days have to be sorted!)
    da <- cbind(c(1,4,5,6,9), c(10,13,20,15,13))
    # make day-groups: a new group starts wherever the gap to the previous day is > 1
    gr <- cumsum(c(TRUE, diff(da[,1]) > 1))
    # find the index of the maximum within each group
    # (which.max keeps only the first one if the maximum is tied)
    mi <- tapply(da[,2], gr, which.max)
    # add the number of rows that precede each group
    mi <- c(0, cumsum(tapply(da[,2], gr, length)))[1:length(mi)] + mi
    # output
    da[mi,]

-- 
Joerg Maeder                      IACETH INSTITUTE
PhD Student                       FOR ATMOSPHERIC
Phone: +41 1 633 36 25            AND CLIMATE SCIENCE
Fax:   +41 1 633 10 58            ETH ZÜRICH
                                  Switzerland
At 08:39 PM 11/6/2001 -0700, Matt Pocernich wrote:
>Any ideas on how I would write such an algorithm would be appreciated.

Dear Matt,

Here's a function that does what you want with loops. Perhaps someone else will produce a more elegant solution:

    > select.rows <- function(data) {
    +     indices <- data[,1]
    +     values <- data[,2]
    +     n <- length(indices)
    +     if (n == 0) stop('no data')
    +     if (n == 1) return(data)
    +     selection <- rep(0, n)  # so as not to grow the selection vector
    +     current <- 1
    +     number <- 1
    +     for (i in 2:n){
    +         if (indices[i] == 1 + indices[i - 1]){
    +             if (values[i] > values[current]) current <- i
    +             }
    +         else {
    +             selection[number] <- current
    +             number <- number + 1
    +             current <- i
    +             }
    +         }
    +     selection[number] <- current
    +     data[selection,]
    +     }
    >
    > data <- matrix(c(1,4,5,6,9, 10,13,20,15,13), 5, 2)
    > colnames(data) <- c('Day', 'Flow')
    > select.rows(data)
         Day Flow
    [1,]   1   10
    [2,]   5   20
    [3,]   9   13

I hope that this isn't too inefficient.
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
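Since the real records carry dates rather than day numbers, here is a loop-free sketch that works on a data frame with a `Date` column. It orders the rows, starts a new run wherever the gap to the previous date exceeds one day, and keeps the row with the largest flow in each run via `split()` and `which.max()`. The function name `select_peaks()`, the column names `date` and `flow`, and the dates themselves are hypothetical. As in John's loop, when several days in a run share the maximum, the earliest one is kept.

```r
# Hypothetical exceedance records with real dates, mirroring the Day/Flow example.
exceed <- data.frame(
  date = as.Date("2001-06-01") + c(0, 3, 4, 5, 8),
  flow = c(10, 13, 20, 15, 13)
)

# Keep the highest-flow record from each run of consecutive dates.
select_peaks <- function(df) {
  if (nrow(df) == 0) return(df)
  df   <- df[order(df$date), ]                       # runs require sorted dates
  gap  <- as.numeric(diff(df$date), units = "days")  # days between records
  run  <- cumsum(c(TRUE, gap > 1))                   # new run after any gap > 1 day
  rows <- split(seq_len(nrow(df)), run)              # row positions in each run
  # which.max() picks the first maximum, so ties keep the earliest date
  keep <- vapply(rows, function(i) i[which.max(df$flow[i])], integer(1))
  df[keep, ]
}

select_peaks(exceed)
```

Because it only compares successive dates, this version does not need the days to be integers and would also accept records that arrive unsorted.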