thr3ads.net - R help - [R] numbering consecutive rows based on length criteria [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Morway, Eric

2015-Mar-02 17:43 UTC

[R] numbering consecutive rows based on length criteria

Using this dataset:

dat <- read.table(textConnection("day    noRes.Q    wRes.Q
 1  237074.41 215409.41
 2 2336240.20 164835.16
 3   84855.42 357062.72
 4   76993.48 386326.78
 5   73489.47 307144.09
 6   70246.96  75885.75
 7   69630.09  74054.33
 8   66714.78  70071.80
 9  122296.90  66579.08
10   63502.71  65811.37
11   63401.84  64795.12
12   63387.84  64401.14
13   63186.10  64163.95
14   63160.74  63468.25
15   60471.15  60719.15
16   58235.63  57655.14
17   58089.73  58061.34
18   57846.39  57357.89
19   57839.42  56495.69
20   57740.06  56219.97
21   58068.57  55810.91
22   58358.34  56437.81
23   76284.90  73722.92
24  105138.31 100729.00
25  147203.03 178079.38
26  109996.02 111113.95
27   91424.20  87391.56
28   89065.91  87196.69
29   86628.74  84809.07
30   79357.60  77555.62"),header=T)

I'm attempting to generate a column that continuously numbers consecutive
rows where wRes.Q is greater than noRes.Q.  To that end, I've come up with
the following:

dat$flg <- dat$wRes.Q>dat$noRes.Q
dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))

The problem with dat$cnt is that it doesn't start over with 1 when a
'new'
group of either true or false is encountered.  Thus, row 9's cnt value
should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
(the desired result is shown below)

In the larger dataset I'm working with (>6,000 rows), there are blocks of
rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
My goal is to plot these blocks of rows as polygons in a time series plot.
If, for the small example provided, the number of consecutive rows with
dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
satisfying this criteria in this small example are rows 3-8 and 10-15), is
there a way to add a column that uniquely numbers these blocks of rows? I'd
like to end up with the following, which shows the correct "cnt"
column and
a column called "plygn" that is my ultimate goal:

dat
# day    noRes.Q    wRes.Q   flg cnt  plygn
#   1  237074.41 215409.41 FALSE   1     NA
#   2 2336240.20 164835.16 FALSE   2     NA
#   3   84855.42 357062.72  TRUE   1      1
#   4   76993.48 386326.78  TRUE   2      1
#   5   73489.47 307144.09  TRUE   3      1
#   6   70246.96  75885.75  TRUE   4      1
#   7   69630.09  74054.33  TRUE   5      1
#   8   66714.78  70071.80  TRUE   6      1
#   9  122296.90  66579.08 FALSE   1     NA
#  10   63502.71  65811.37  TRUE   1      2
#  11   63401.84  64795.12  TRUE   2      2
#  12   63387.84  64401.14  TRUE   3      2
#  13   63186.10  64163.95  TRUE   4      2
#  14   63160.74  63468.25  TRUE   5      2
#  15   60471.15  60719.15  TRUE   6      2
#  16   58235.63  57655.14 FALSE   1     NA
#  17   58089.73  58061.34 FALSE   2     NA
#  18   57846.39  57357.89 FALSE   3     NA
#  19   57839.42  56495.69 FALSE   4     NA
#  20   57740.06  56219.97 FALSE   5     NA
#  21   58068.57  55810.91 FALSE   6     NA
#  22   58358.34  56437.81 FALSE   7     NA
#  23   76284.90  73722.92 FALSE   8     NA
#  24  105138.31 100729.00 FALSE   9     NA
#  25  147203.03 178079.38  TRUE   1     NA
#  26  109996.02 111113.95  TRUE   2     NA
#  27   91424.20  87391.56 FALSE   1     NA
#  28   89065.91  87196.69 FALSE   2     NA
#  29   86628.74  84809.07 FALSE   3     NA
#  30   79357.60  77555.62 FALSE   4     NA

Thanks, Eric

	[[alternative HTML version deleted]]

Marc Schwartz

2015-Mar-02 17:56 UTC

head link

[R] numbering consecutive rows based on length criteria

On Mar 2, 2015, at 11:43 AM, Morway, Eric <emorway at usgs.gov>
wrote:> 
> Using this dataset:
> 
> dat <- read.table(textConnection("day    noRes.Q    wRes.Q
> 1  237074.41 215409.41
> 2 2336240.20 164835.16
> 3   84855.42 357062.72
> 4   76993.48 386326.78
> 5   73489.47 307144.09
> 6   70246.96  75885.75
> 7   69630.09  74054.33
> 8   66714.78  70071.80
> 9  122296.90  66579.08
> 10   63502.71  65811.37
> 11   63401.84  64795.12
> 12   63387.84  64401.14
> 13   63186.10  64163.95
> 14   63160.74  63468.25
> 15   60471.15  60719.15
> 16   58235.63  57655.14
> 17   58089.73  58061.34
> 18   57846.39  57357.89
> 19   57839.42  56495.69
> 20   57740.06  56219.97
> 21   58068.57  55810.91
> 22   58358.34  56437.81
> 23   76284.90  73722.92
> 24  105138.31 100729.00
> 25  147203.03 178079.38
> 26  109996.02 111113.95
> 27   91424.20  87391.56
> 28   89065.91  87196.69
> 29   86628.74  84809.07
> 30   79357.60  77555.62"),header=T)
> 
> I'm attempting to generate a column that continuously numbers
consecutive
> rows where wRes.Q is greater than noRes.Q.  To that end, I've come up
with
> the following:
> 
> dat$flg <- dat$wRes.Q>dat$noRes.Q
> dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
> 
> The problem with dat$cnt is that it doesn't start over with 1 when a
'new'
> group of either true or false is encountered.  Thus, row 9's cnt value
> should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
> (the desired result is shown below)
> 
> In the larger dataset I'm working with (>6,000 rows), there are
blocks of
> rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
> My goal is to plot these blocks of rows as polygons in a time series plot.
> If, for the small example provided, the number of consecutive rows with
> dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
> satisfying this criteria in this small example are rows 3-8 and 10-15), is
> there a way to add a column that uniquely numbers these blocks of rows?
I'd
> like to end up with the following, which shows the correct "cnt"
column and
> a column called "plygn" that is my ultimate goal:
> 
> dat
> # day    noRes.Q    wRes.Q   flg cnt  plygn
> #   1  237074.41 215409.41 FALSE   1     NA
> #   2 2336240.20 164835.16 FALSE   2     NA
> #   3   84855.42 357062.72  TRUE   1      1
> #   4   76993.48 386326.78  TRUE   2      1
> #   5   73489.47 307144.09  TRUE   3      1
> #   6   70246.96  75885.75  TRUE   4      1
> #   7   69630.09  74054.33  TRUE   5      1
> #   8   66714.78  70071.80  TRUE   6      1
> #   9  122296.90  66579.08 FALSE   1     NA
> #  10   63502.71  65811.37  TRUE   1      2
> #  11   63401.84  64795.12  TRUE   2      2
> #  12   63387.84  64401.14  TRUE   3      2
> #  13   63186.10  64163.95  TRUE   4      2
> #  14   63160.74  63468.25  TRUE   5      2
> #  15   60471.15  60719.15  TRUE   6      2
> #  16   58235.63  57655.14 FALSE   1     NA
> #  17   58089.73  58061.34 FALSE   2     NA
> #  18   57846.39  57357.89 FALSE   3     NA
> #  19   57839.42  56495.69 FALSE   4     NA
> #  20   57740.06  56219.97 FALSE   5     NA
> #  21   58068.57  55810.91 FALSE   6     NA
> #  22   58358.34  56437.81 FALSE   7     NA
> #  23   76284.90  73722.92 FALSE   8     NA
> #  24  105138.31 100729.00 FALSE   9     NA
> #  25  147203.03 178079.38  TRUE   1     NA
> #  26  109996.02 111113.95  TRUE   2     NA
> #  27   91424.20  87391.56 FALSE   1     NA
> #  28   89065.91  87196.69 FALSE   2     NA
> #  29   86628.74  84809.07 FALSE   3     NA
> #  30   79357.60  77555.62 FALSE   4     NA
> 
> Thanks, Eric

Hi,

See ?rle
> unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq)) [1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4

cbind() the result above to your data frame.

Regards,

Marc Schwartz

Thierry Onkelinx

2015-Mar-02 18:02 UTC

head link

[R] numbering consecutive rows based on length criteria

Dear Eric,

Here is a solution using the plyr package.

library(plyr)
dat$flg <- dat$wRes.Q>dat$noRes.Q
dat$group <- cumsum(c(0, abs(diff(dat$flg))))
ddply(dat, "group", function(x){
  if(x$flg[1] && nrow(x) >= 5){
    x$plygn <- seq_along(x$group)
  } else {
    x$plygn <- NA
  }
  x
})

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-03-02 18:43 GMT+01:00 Morway, Eric <emorway at usgs.gov>:
> Using this dataset:
>
> dat <- read.table(textConnection("day    noRes.Q    wRes.Q
>  1  237074.41 215409.41
>  2 2336240.20 164835.16
>  3   84855.42 357062.72
>  4   76993.48 386326.78
>  5   73489.47 307144.09
>  6   70246.96  75885.75
>  7   69630.09  74054.33
>  8   66714.78  70071.80
>  9  122296.90  66579.08
> 10   63502.71  65811.37
> 11   63401.84  64795.12
> 12   63387.84  64401.14
> 13   63186.10  64163.95
> 14   63160.74  63468.25
> 15   60471.15  60719.15
> 16   58235.63  57655.14
> 17   58089.73  58061.34
> 18   57846.39  57357.89
> 19   57839.42  56495.69
> 20   57740.06  56219.97
> 21   58068.57  55810.91
> 22   58358.34  56437.81
> 23   76284.90  73722.92
> 24  105138.31 100729.00
> 25  147203.03 178079.38
> 26  109996.02 111113.95
> 27   91424.20  87391.56
> 28   89065.91  87196.69
> 29   86628.74  84809.07
> 30   79357.60  77555.62"),header=T)
>
> I'm attempting to generate a column that continuously numbers
consecutive
> rows where wRes.Q is greater than noRes.Q.  To that end, I've come up
with
> the following:
>
> dat$flg <- dat$wRes.Q>dat$noRes.Q
> dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
>
> The problem with dat$cnt is that it doesn't start over with 1 when a
'new'
> group of either true or false is encountered.  Thus, row 9's cnt value
> should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
> (the desired result is shown below)
>
> In the larger dataset I'm working with (>6,000 rows), there are
blocks of
> rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
> My goal is to plot these blocks of rows as polygons in a time series plot.
> If, for the small example provided, the number of consecutive rows with
> dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
> satisfying this criteria in this small example are rows 3-8 and 10-15), is
> there a way to add a column that uniquely numbers these blocks of rows?
I'd
> like to end up with the following, which shows the correct "cnt"
column and
> a column called "plygn" that is my ultimate goal:
>
> dat
> # day    noRes.Q    wRes.Q   flg cnt  plygn
> #   1  237074.41 215409.41 FALSE   1     NA
> #   2 2336240.20 164835.16 FALSE   2     NA
> #   3   84855.42 357062.72  TRUE   1      1
> #   4   76993.48 386326.78  TRUE   2      1
> #   5   73489.47 307144.09  TRUE   3      1
> #   6   70246.96  75885.75  TRUE   4      1
> #   7   69630.09  74054.33  TRUE   5      1
> #   8   66714.78  70071.80  TRUE   6      1
> #   9  122296.90  66579.08 FALSE   1     NA
> #  10   63502.71  65811.37  TRUE   1      2
> #  11   63401.84  64795.12  TRUE   2      2
> #  12   63387.84  64401.14  TRUE   3      2
> #  13   63186.10  64163.95  TRUE   4      2
> #  14   63160.74  63468.25  TRUE   5      2
> #  15   60471.15  60719.15  TRUE   6      2
> #  16   58235.63  57655.14 FALSE   1     NA
> #  17   58089.73  58061.34 FALSE   2     NA
> #  18   57846.39  57357.89 FALSE   3     NA
> #  19   57839.42  56495.69 FALSE   4     NA
> #  20   57740.06  56219.97 FALSE   5     NA
> #  21   58068.57  55810.91 FALSE   6     NA
> #  22   58358.34  56437.81 FALSE   7     NA
> #  23   76284.90  73722.92 FALSE   8     NA
> #  24  105138.31 100729.00 FALSE   9     NA
> #  25  147203.03 178079.38  TRUE   1     NA
> #  26  109996.02 111113.95  TRUE   2     NA
> #  27   91424.20  87391.56 FALSE   1     NA
> #  28   89065.91  87196.69 FALSE   2     NA
> #  29   86628.74  84809.07 FALSE   3     NA
> #  30   79357.60  77555.62 FALSE   4     NA
>
> Thanks, Eric
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

R help - Mar 2015 - numbering consecutive rows based on length criteria

[R] numbering consecutive rows based on length criteria

[R] numbering consecutive rows based on length criteria

[R] numbering consecutive rows based on length criteria