I have a dataframe with the following structure:

id   date         value
-------------------------
1    22/08/2006    48
1    24/08/2006    50
1    28/08/2006   150
1    30/08/2006   100
1    01/09/2006    30
2    11/08/2006    30
2    22/08/2006   100
2    28/08/2006    11
2    02/09/2006     5
3    01/07/2006     3
3    01/08/2006   100
3    01/09/2006   100
4    22/08/2006    48
4    24/08/2006    50
4    28/08/2006   150
4    30/08/2006   100
4    01/09/2006    30
4    03/09/2006   100
4    06/09/2006   100

N.B.: dates are in European format (dd/mm/yyyy); the dataframe is ordered.

For each id I need to select the first of the rows that start a run of at least two consecutive rows with "value" > 50.

That is a rather convoluted explanation. I mean that for each id I have to select the first row in which value is > 50, but only if the row immediately following it has "value" > 50 too. If it does not, I repeat the test on each of the following rows in which "value" > 50 until I find a record that satisfies the condition.

This means that with my example dataframe the result is:

id   date         value
-------------------------
1    28/08/2006   150
3    01/08/2006   100
4    28/08/2006   150

It's clear that a for loop would work, but I think there is a better way.

I tried "by" and could obtain the first row for which "value" is > 50.

I thought of an iterative process (delete the first row > 50, find the second row > 50, check whether there are rows in between), but it is quite inelegant: if the first value is not the "good" one, I have to repeat the process an a priori unknown number of times.

Thanks in advance for your help

Sandro Bonfigli
Gabor Grothendieck
2006-Aug-22 23:22 UTC
[R] Selection on dataframe based on order of rows
Try this:

# data
DF <- structure(list(
    id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
    date = structure(c(8, 9, 10, 11, 3, 7, 8, 10, 4, 1, 2, 3, 8, 9, 10, 11, 3, 5, 6),
        .Label = c("01/07/2006", "01/08/2006", "01/09/2006", "02/09/2006",
            "03/09/2006", "06/09/2006", "11/08/2006", "22/08/2006",
            "24/08/2006", "28/08/2006", "30/08/2006"),
        class = "factor"),
    value = c(48, 50, 150, 100, 30, 30, 100, 11, 5, 3, 100, 100, 48, 50, 150,
        100, 30, 100, 100)),
    .Names = c("id", "date", "value"),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
        "12", "13", "14", "15", "16", "17", "18", "19"))

# x is the sub-dataframe for a single id
f <- function(x) {
    # rows with value > 50 whose following row also has value > 50;
    # the trailing 0 pads the shifted vector, so the last row never qualifies
    idx <- which(x$value > 50 & c(x$value[-1], 0) > 50)
    # return the first such row, or (implicitly) NULL if there is none
    if (length(idx) > 0) x[idx[1], ]
}

# apply f to each id and stack the results; rbind ignores the NULLs
do.call(rbind, by(DF, DF$id, f))
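For comparison, here is a minimal sketch of the same selection done without "by", working on the whole dataframe at once. It is not part of the reply above. It assumes the dataframe is already ordered by id and date, as stated in the question, and the names `same_id_next`, `next_value`, `starts_run`, `hits` and `res` are made up for illustration.

```r
# Example data from the question, with dates kept as plain strings
DF <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
  date  = c("22/08/2006", "24/08/2006", "28/08/2006", "30/08/2006", "01/09/2006",
            "11/08/2006", "22/08/2006", "28/08/2006", "02/09/2006",
            "01/07/2006", "01/08/2006", "01/09/2006",
            "22/08/2006", "24/08/2006", "28/08/2006", "30/08/2006",
            "01/09/2006", "03/09/2006", "06/09/2006"),
  value = c(48, 50, 150, 100, 30, 30, 100, 11, 5, 3, 100, 100,
            48, 50, 150, 100, 30, 100, 100),
  stringsAsFactors = FALSE
)

n <- nrow(DF)

# TRUE where the following row belongs to the same id (never for the last row)
same_id_next <- c(DF$id[-1] == DF$id[-n], FALSE)

# value of the following row; -Inf pads the end so the last row cannot qualify
next_value <- c(DF$value[-1], -Inf)

# rows that start a pair of consecutive rows with value > 50 within one id
starts_run <- DF$value > 50 & same_id_next & next_value > 50

# keep only the first qualifying row for each id
hits <- DF[starts_run, ]
res  <- hits[!duplicated(hits$id), ]
res
```

On the example data this selects the rows for ids 1, 3 and 4 listed in the question. The `same_id_next` check does the job that splitting with "by" does in the reply: it stops the last row of one id from being paired with the first row of the next.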