Morway, Eric
2015-Mar-02 17:43 UTC
[R] numbering consecutive rows based on length criteria
Using this dataset: dat <- read.table(textConnection("day noRes.Q wRes.Q 1 237074.41 215409.41 2 2336240.20 164835.16 3 84855.42 357062.72 4 76993.48 386326.78 5 73489.47 307144.09 6 70246.96 75885.75 7 69630.09 74054.33 8 66714.78 70071.80 9 122296.90 66579.08 10 63502.71 65811.37 11 63401.84 64795.12 12 63387.84 64401.14 13 63186.10 64163.95 14 63160.74 63468.25 15 60471.15 60719.15 16 58235.63 57655.14 17 58089.73 58061.34 18 57846.39 57357.89 19 57839.42 56495.69 20 57740.06 56219.97 21 58068.57 55810.91 22 58358.34 56437.81 23 76284.90 73722.92 24 105138.31 100729.00 25 147203.03 178079.38 26 109996.02 111113.95 27 91424.20 87391.56 28 89065.91 87196.69 29 86628.74 84809.07 30 79357.60 77555.62"),header=T) I'm attempting to generate a column that continuously numbers consecutive rows where wRes.Q is greater than noRes.Q. To that end, I've come up with the following: dat$flg <- dat$wRes.Q>dat$noRes.Q dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along)) The problem with dat$cnt is that it doesn't start over with 1 when a 'new' group of either true or false is encountered. Thus, row 9's cnt value should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc. (the desired result is shown below) In the larger dataset I'm working with (>6,000 rows), there are blocks of rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100. My goal is to plot these blocks of rows as polygons in a time series plot. If, for the small example provided, the number of consecutive rows with dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows satisfying this criteria in this small example are rows 3-8 and 10-15), is there a way to add a column that uniquely numbers these blocks of rows? I'd like to end up with the following, which shows the correct "cnt" column and a column called "plygn" that is my ultimate goal: dat # day noRes.Q wRes.Q flg cnt plygn # 1 237074.41 215409.41 FALSE 1 NA # 2 2336240.20 164835.16 FALSE 2 NA # 3 84855.42 357062.72 TRUE 1 1 # 4 76993.48 386326.78 TRUE 2 1 # 5 73489.47 307144.09 TRUE 3 1 # 6 70246.96 75885.75 TRUE 4 1 # 7 69630.09 74054.33 TRUE 5 1 # 8 66714.78 70071.80 TRUE 6 1 # 9 122296.90 66579.08 FALSE 1 NA # 10 63502.71 65811.37 TRUE 1 2 # 11 63401.84 64795.12 TRUE 2 2 # 12 63387.84 64401.14 TRUE 3 2 # 13 63186.10 64163.95 TRUE 4 2 # 14 63160.74 63468.25 TRUE 5 2 # 15 60471.15 60719.15 TRUE 6 2 # 16 58235.63 57655.14 FALSE 1 NA # 17 58089.73 58061.34 FALSE 2 NA # 18 57846.39 57357.89 FALSE 3 NA # 19 57839.42 56495.69 FALSE 4 NA # 20 57740.06 56219.97 FALSE 5 NA # 21 58068.57 55810.91 FALSE 6 NA # 22 58358.34 56437.81 FALSE 7 NA # 23 76284.90 73722.92 FALSE 8 NA # 24 105138.31 100729.00 FALSE 9 NA # 25 147203.03 178079.38 TRUE 1 NA # 26 109996.02 111113.95 TRUE 2 NA # 27 91424.20 87391.56 FALSE 1 NA # 28 89065.91 87196.69 FALSE 2 NA # 29 86628.74 84809.07 FALSE 3 NA # 30 79357.60 77555.62 FALSE 4 NA Thanks, Eric [[alternative HTML version deleted]]
Marc Schwartz
2015-Mar-02 17:56 UTC
[R] numbering consecutive rows based on length criteria
On Mar 2, 2015, at 11:43 AM, Morway, Eric <emorway at usgs.gov> wrote:> > Using this dataset: > > dat <- read.table(textConnection("day noRes.Q wRes.Q > 1 237074.41 215409.41 > 2 2336240.20 164835.16 > 3 84855.42 357062.72 > 4 76993.48 386326.78 > 5 73489.47 307144.09 > 6 70246.96 75885.75 > 7 69630.09 74054.33 > 8 66714.78 70071.80 > 9 122296.90 66579.08 > 10 63502.71 65811.37 > 11 63401.84 64795.12 > 12 63387.84 64401.14 > 13 63186.10 64163.95 > 14 63160.74 63468.25 > 15 60471.15 60719.15 > 16 58235.63 57655.14 > 17 58089.73 58061.34 > 18 57846.39 57357.89 > 19 57839.42 56495.69 > 20 57740.06 56219.97 > 21 58068.57 55810.91 > 22 58358.34 56437.81 > 23 76284.90 73722.92 > 24 105138.31 100729.00 > 25 147203.03 178079.38 > 26 109996.02 111113.95 > 27 91424.20 87391.56 > 28 89065.91 87196.69 > 29 86628.74 84809.07 > 30 79357.60 77555.62"),header=T) > > I'm attempting to generate a column that continuously numbers consecutive > rows where wRes.Q is greater than noRes.Q. To that end, I've come up with > the following: > > dat$flg <- dat$wRes.Q>dat$noRes.Q > dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along)) > > The problem with dat$cnt is that it doesn't start over with 1 when a 'new' > group of either true or false is encountered. Thus, row 9's cnt value > should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc. > (the desired result is shown below) > > In the larger dataset I'm working with (>6,000 rows), there are blocks of > rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100. > My goal is to plot these blocks of rows as polygons in a time series plot. > If, for the small example provided, the number of consecutive rows with > dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows > satisfying this criteria in this small example are rows 3-8 and 10-15), is > there a way to add a column that uniquely numbers these blocks of rows? I'd > like to end up with the following, which shows the correct "cnt" column and > a column called "plygn" that is my ultimate goal: > > dat > # day noRes.Q wRes.Q flg cnt plygn > # 1 237074.41 215409.41 FALSE 1 NA > # 2 2336240.20 164835.16 FALSE 2 NA > # 3 84855.42 357062.72 TRUE 1 1 > # 4 76993.48 386326.78 TRUE 2 1 > # 5 73489.47 307144.09 TRUE 3 1 > # 6 70246.96 75885.75 TRUE 4 1 > # 7 69630.09 74054.33 TRUE 5 1 > # 8 66714.78 70071.80 TRUE 6 1 > # 9 122296.90 66579.08 FALSE 1 NA > # 10 63502.71 65811.37 TRUE 1 2 > # 11 63401.84 64795.12 TRUE 2 2 > # 12 63387.84 64401.14 TRUE 3 2 > # 13 63186.10 64163.95 TRUE 4 2 > # 14 63160.74 63468.25 TRUE 5 2 > # 15 60471.15 60719.15 TRUE 6 2 > # 16 58235.63 57655.14 FALSE 1 NA > # 17 58089.73 58061.34 FALSE 2 NA > # 18 57846.39 57357.89 FALSE 3 NA > # 19 57839.42 56495.69 FALSE 4 NA > # 20 57740.06 56219.97 FALSE 5 NA > # 21 58068.57 55810.91 FALSE 6 NA > # 22 58358.34 56437.81 FALSE 7 NA > # 23 76284.90 73722.92 FALSE 8 NA > # 24 105138.31 100729.00 FALSE 9 NA > # 25 147203.03 178079.38 TRUE 1 NA > # 26 109996.02 111113.95 TRUE 2 NA > # 27 91424.20 87391.56 FALSE 1 NA > # 28 89065.91 87196.69 FALSE 2 NA > # 29 86628.74 84809.07 FALSE 3 NA > # 30 79357.60 77555.62 FALSE 4 NA > > Thanks, EricHi, See ?rle> unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq))[1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4 cbind() the result above to your data frame. Regards, Marc Schwartz
Thierry Onkelinx
2015-Mar-02 18:02 UTC
[R] numbering consecutive rows based on length criteria
Dear Eric, Here is a solution using the plyr package. library(plyr) dat$flg <- dat$wRes.Q>dat$noRes.Q dat$group <- cumsum(c(0, abs(diff(dat$flg)))) ddply(dat, "group", function(x){ if(x$flg[1] && nrow(x) >= 5){ x$plygn <- seq_along(x$group) } else { x$plygn <- NA } x }) Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-03-02 18:43 GMT+01:00 Morway, Eric <emorway at usgs.gov>:> Using this dataset: > > dat <- read.table(textConnection("day noRes.Q wRes.Q > 1 237074.41 215409.41 > 2 2336240.20 164835.16 > 3 84855.42 357062.72 > 4 76993.48 386326.78 > 5 73489.47 307144.09 > 6 70246.96 75885.75 > 7 69630.09 74054.33 > 8 66714.78 70071.80 > 9 122296.90 66579.08 > 10 63502.71 65811.37 > 11 63401.84 64795.12 > 12 63387.84 64401.14 > 13 63186.10 64163.95 > 14 63160.74 63468.25 > 15 60471.15 60719.15 > 16 58235.63 57655.14 > 17 58089.73 58061.34 > 18 57846.39 57357.89 > 19 57839.42 56495.69 > 20 57740.06 56219.97 > 21 58068.57 55810.91 > 22 58358.34 56437.81 > 23 76284.90 73722.92 > 24 105138.31 100729.00 > 25 147203.03 178079.38 > 26 109996.02 111113.95 > 27 91424.20 87391.56 > 28 89065.91 87196.69 > 29 86628.74 84809.07 > 30 79357.60 77555.62"),header=T) > > I'm attempting to generate a column that continuously numbers consecutive > rows where wRes.Q is greater than noRes.Q. To that end, I've come up with > the following: > > dat$flg <- dat$wRes.Q>dat$noRes.Q > dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along)) > > The problem with dat$cnt is that it doesn't start over with 1 when a 'new' > group of either true or false is encountered. Thus, row 9's cnt value > should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc. > (the desired result is shown below) > > In the larger dataset I'm working with (>6,000 rows), there are blocks of > rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100. > My goal is to plot these blocks of rows as polygons in a time series plot. > If, for the small example provided, the number of consecutive rows with > dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows > satisfying this criteria in this small example are rows 3-8 and 10-15), is > there a way to add a column that uniquely numbers these blocks of rows? I'd > like to end up with the following, which shows the correct "cnt" column and > a column called "plygn" that is my ultimate goal: > > dat > # day noRes.Q wRes.Q flg cnt plygn > # 1 237074.41 215409.41 FALSE 1 NA > # 2 2336240.20 164835.16 FALSE 2 NA > # 3 84855.42 357062.72 TRUE 1 1 > # 4 76993.48 386326.78 TRUE 2 1 > # 5 73489.47 307144.09 TRUE 3 1 > # 6 70246.96 75885.75 TRUE 4 1 > # 7 69630.09 74054.33 TRUE 5 1 > # 8 66714.78 70071.80 TRUE 6 1 > # 9 122296.90 66579.08 FALSE 1 NA > # 10 63502.71 65811.37 TRUE 1 2 > # 11 63401.84 64795.12 TRUE 2 2 > # 12 63387.84 64401.14 TRUE 3 2 > # 13 63186.10 64163.95 TRUE 4 2 > # 14 63160.74 63468.25 TRUE 5 2 > # 15 60471.15 60719.15 TRUE 6 2 > # 16 58235.63 57655.14 FALSE 1 NA > # 17 58089.73 58061.34 FALSE 2 NA > # 18 57846.39 57357.89 FALSE 3 NA > # 19 57839.42 56495.69 FALSE 4 NA > # 20 57740.06 56219.97 FALSE 5 NA > # 21 58068.57 55810.91 FALSE 6 NA > # 22 58358.34 56437.81 FALSE 7 NA > # 23 76284.90 73722.92 FALSE 8 NA > # 24 105138.31 100729.00 FALSE 9 NA > # 25 147203.03 178079.38 TRUE 1 NA > # 26 109996.02 111113.95 TRUE 2 NA > # 27 91424.20 87391.56 FALSE 1 NA > # 28 89065.91 87196.69 FALSE 2 NA > # 29 86628.74 84809.07 FALSE 3 NA > # 30 79357.60 77555.62 FALSE 4 NA > > Thanks, Eric > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]