thr3ads.net - R help - [R] Selection on dataframe based on order of rows [Aug 2006]

Bonfigli Sandro

2006-Aug-22 18:15 UTC

[R] Selection on dataframe based on order of rows

I have a dataframe with the following structure

id    date         value
-------------------------
1    22/08/2006     48
1    24/08/2006     50
1    28/08/2006     150
1    30/08/2006     100
1    01/09/2006     30
2    11/08/2006     30
2    22/08/2006     100
2    28/08/2006     11
2    02/09/2006     5
3    01/07/2006     3
3    01/08/2006     100
3    01/09/2006     100
4    22/08/2006     48
4    24/08/2006     50
4    28/08/2006     150
4    30/08/2006     100
4    01/09/2006     30
4    03/09/2006     100
4    06/09/2006     100


N.B.: dates in european format; ordered dataframe

For each id I need to select the first row that starts a run of at
least two consecutive rows with "value" > 50.

That is a rather convoluted explanation. I mean that for each id I have to
select the first row in which "value" is > 50, but only if the row that
follows it also has "value" > 50. If it does not, I repeat the test on
each later row in which "value" > 50 until I find a record that meets
the condition.

This means that with my example data frame the result is:
id    date         value
-------------------------
1    28/08/2006     150
3    01/08/2006     100
4    28/08/2006     150

It's clear that a for loop would work, but I think that there is a better
way.

I tried "by" and could obtain the first row for which "value" is > 50.

I thought of an iterative process (delete the first row > 50, find the
second row > 50, examine if there are rows in the middle), but it is
quite inelegant: if the first value is not the "good" one, I have to
repeat the process an a priori unknown number of times.

Thanks in advance for your help

  Sandro Bonfigli

Gabor Grothendieck

2006-Aug-22 23:22 UTC

[R] Selection on dataframe based on order of rows

Try this:

# data
DF <- structure(list(
    id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
    date = structure(c(8, 9, 10, 11, 3, 7, 8, 10, 4, 1, 2, 3, 8, 9, 10, 11,
        3, 5, 6),
      .Label = c("01/07/2006", "01/08/2006", "01/09/2006", "02/09/2006",
        "03/09/2006", "06/09/2006", "11/08/2006", "22/08/2006",
        "24/08/2006", "28/08/2006", "30/08/2006"),
      class = "factor"),
    value = c(48, 50, 150, 100, 30, 30, 100, 11, 5, 3, 100, 100, 48, 50,
      150, 100, 30, 100, 100)),
  .Names = c("id", "date", "value"), class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
    "12", "13", "14", "15", "16", "17", "18", "19"))

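f <- function(x) {
	# rows where both this value and the next one exceed 50;
	# the appended 0 keeps the last row from starting a pair
	idx <- which(x$value > 50 & c(x$value[-1], 0) > 50)
	# first such row, or NULL (dropped by rbind) if there is none
	if (length(idx) > 0) x[idx[1],]
}
# apply f to each id and stack the selected rows
do.call(rbind, by(DF, DF$id, f))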
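
To see why the shifted comparison works, it may help to run it on a single
id's values. The sketch below is only an illustration: first_run_start is a
hypothetical helper name, not part of the reply above, and it uses -Inf as
the padding value instead of 0 so that it would also behave with a negative
threshold (an assumption beyond the original code).

```r
# Hypothetical helper: index of the first element that is above the
# threshold and is immediately followed by another element above it.
first_run_start <- function(value, threshold = 50) {
  # the same vector shifted up by one; -Inf pads the end so the last
  # element can never count as the start of a pair
  nxt <- c(value[-1], -Inf)
  # first position where both the element and its successor qualify;
  # indexing an empty result with [1] gives NA
  which(value > threshold & nxt > threshold)[1]
}

first_run_start(c(48, 50, 150, 100, 30))   # values for id 1 -> 3
first_run_start(c(30, 100, 11, 5))         # values for id 2 -> NA
```

For id 1 the third value (150, dated 28/08/2006) is the first one above 50
whose successor is also above 50, which matches the expected result. For
id 2 no such pair exists, so nothing is selected.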

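Note that this approach relies on the rows already being in date order
within each id, as stated in the question. If that were not the case, the
day/month/year strings would need to be parsed before sorting, because
sorting them as text does not give chronological order. A minimal sketch,
using a small made-up data frame d (not the thread's data):

```r
# Small hypothetical frame with day/month/year strings, deliberately unsorted
d <- data.frame(id = c(1, 1, 1),
                date = c("01/09/2006", "22/08/2006", "30/08/2006"),
                value = c(30, 48, 100))

# Parse the European-style dates; as.character() guards against the
# column being a factor
d$date <- as.Date(as.character(d$date), format = "%d/%m/%Y")

# Order by id, then chronologically within id
d <- d[order(d$id, d$date), ]
```

After this step the per-id selection can be applied as before.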
