Knut Krueger
2018-Sep-27 14:48 UTC
[R] subset only if f.e a column is successive for more than 3 values
Hi to all,

I need a subset of the rows where, for example, at least 3 values in a column of a data frame are successive. Example from the subset help page:

subset(airquality, Temp > 80, select = c(Ozone, Temp))
29 45 81
35 NA 84
36 NA 85
38 29 82
39 NA 87
40 71 90
41 39 87
42 NA 93
43 NA 92
44 23 82
.....

I would like to get only

...
40 71 90
41 39 87
42 NA 93
43 NA 92
44 23 82
....

because the left column (the row numbers) increases by one more than, e.g., three times in a row without a gap.

Any hints for a package, or do I need to build my own function?

Kind regards,
Knut
Bert Gunter
2018-Sep-27 15:09 UTC
1. I assume the values are integers, not floats/numerics (which would make it more complicated).
2. Strategy: take differences (e.g. see ?diff) and look for more than three 1s in a row.

I don't have time to work out the details, but perhaps that helps.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
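Bert's strategy can be sketched on the row numbers from Knut's listing (the data stop at row 44 because the post was truncated there). This is only an illustration: `minRun` is a made-up name for the threshold, and the key step is that a run of k TRUEs in `diff(idx) == 1` covers k + 1 consecutive rows. Note that with a threshold of 3 rows, rows 38 and 39 are picked up as well, because they continue without a gap into 40:44.

```r
# Sketch of the diff()/rle() strategy on the row numbers Knut listed
# (truncated after row 44, as in his post). minRun is an illustrative name.
idx <- c(29, 35, 36, 38, 39, 40, 41, 42, 43, 44)

d <- diff(idx)            # 6 1 2 1 1 1 1 1 1
r <- rle(d == 1)          # runs of "no gap to the next row"
r$lengths                 # 1 1 1 6
r$values                  # FALSE TRUE FALSE TRUE

# A run of k TRUEs in the differences covers k + 1 consecutive rows,
# so "at least minRun consecutive rows" means k >= minRun - 1.
minRun <- 3
ok <- r$values & r$lengths >= minRun - 1
w <- rep(ok, r$lengths)               # expand back to one flag per difference
inRun <- c(FALSE, w) | c(w, FALSE)    # keep a row if the step into or out of it qualifies
idx[inRun]                            # 38 39 40 41 42 43 44
```

Bert's "more than three 1s" corresponds to `minRun <- 5` (five rows, four unit steps), which on this listing still returns 38:44, since that stretch has six unit steps.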
Jim Lemon
2018-Sep-27 22:35 UTC
Hi Knut,

As Bert said, you can start with diff and work from there. I can easily get the text for the subset, but despite fooling around with "parse", "eval" and "expression", I couldn't get it to work:

# use a bigger subset to test whether multiple runs can be extracted
kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
kkdf$index<-as.numeric(rownames(kkdf))
# get the run length encoding of "the next row follows without a gap"
seqindx<-rle(diff(kkdf$index)==1)
# logical vector selecting runs of at least 3 unit steps (i.e. at least 4 consecutive rows)
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices (positions in kkdf) for the ends of the runs
ends<-cumsum(seqindx$lengths)[runsel]+1
# and the starts: a run of n unit steps spans n+1 rows
starts<-ends-seqindx$lengths[runsel]
# the character representation of the subset as indices is
paste0("c(",paste(starts,ends,sep=":",collapse=","),")")

I expect there will be a lightning response from someone who knows about converting the resulting string into whatever is needed.

Jim
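A possible answer to Jim's closing question, sketched under our own assumptions: the run boundaries are recomputed here (each run's start is taken as its end minus its number of unit steps, which is our derivation), and `txt`, `rows_from_text` and `rows` are hypothetical names. The string can be evaluated with `eval(parse(text = ...))`, but it is simpler to skip the string and expand each `start:end` pair directly with `Map()`.

```r
# Sketch: two ways to turn run boundaries into row positions of kkdf.
kkdf <- subset(airquality, Temp > 77, select = c("Ozone", "Temp"))
kkdf$index <- as.numeric(rownames(kkdf))        # original row numbers

seqindx <- rle(diff(kkdf$index) == 1)           # runs of "next row has no gap"
runsel <- seqindx$lengths >= 3 & seqindx$values # runs of >= 3 unit steps (>= 4 rows)
ends <- cumsum(seqindx$lengths)[runsel] + 1     # last position of each selected run
starts <- ends - seqindx$lengths[runsel]        # first position of each selected run

# 1. Build the string and evaluate it (works, but parsing text is fragile)
txt <- paste0("c(", paste(starts, ends, sep = ":", collapse = ","), ")")
rows_from_text <- eval(parse(text = txt))

# 2. Skip the string: expand each start:end pair and concatenate the pieces
rows <- unlist(Map(":", starts, ends))

kkdf[rows, ]   # the rows that belong to a long enough run
```

Both give the same positions; the `Map()` version avoids round-tripping through text and also behaves sensibly (returns `NULL`, hence zero rows) when no run qualifies.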
William Dunlap
2018-Sep-28 15:22 UTC
Do you also want lines 38 and 39 (in addition to 40:44), or do I misunderstand your problem?

When you deal with runs of data, think of the rle (run-length encoding) function. E.g. here is a barely tested function to find runs of a given minimum length and a given difference between successive values. It also returns a 'runNumber' so you can split the result into runs.

findRuns <- function(x, minRunLength=3, difference=1) {
    # for integral x, find runs of length at least 'minRunLength'
    # with 'difference' between successive values
    d <- diff(x)
    dRle <- rle(d)
    w <- rep(dRle$lengths>=minRunLength-1 & dRle$values==difference, dRle$lengths)
    values <- x[c(FALSE,w) | c(w,FALSE)]
    # note: two separate runs get the same runNumber if the last value of one
    # and the first value of the next happen to differ by 'difference'
    runNumber <- cumsum(c(TRUE, diff(values)!=difference))
    data.frame(values=values, runNumber=runNumber)
}

> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20))
  values runNumber
1      1         1
2      2         1
3      3         1
4     17         2
5     18         2
6     19         2
7     20         2
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), minRunLength=4)
  values runNumber
1     17         1
2     18         1
3     19         1
4     20         1
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), difference=-2)
  values runNumber
1     10         1
2      8         1
3      6         1
4      4         1

Bill Dunlap
TIBCO Software
wdunlap tibco.com
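To go from runs of row numbers back to rows of Knut's data frame, a logical mask is handy. The helper below, `runMask()`, is hypothetical (not from the thread) and swaps `rle()` for a `cumsum()` over the breaks in the sequence; like `findRuns()`, it assumes integer-valued input because `==` compares exactly. It returns a `keep` flag per element plus run numbers for the kept elements, so the rows can be selected and then split into runs.

```r
# Hypothetical helper: flag the elements of x that lie in a chain of at least
# minRunLength values with constant step 'difference', and number those chains.
# Assumes integer-valued x, since == is an exact comparison.
runMask <- function(x, minRunLength = 3, difference = 1) {
  step <- c(FALSE, diff(x) == difference)        # TRUE if x[i] continues from x[i-1]
  chain <- cumsum(!step)                         # new chain id wherever the step breaks
  len <- ave(seq_along(x), chain, FUN = length)  # size of the chain each element is in
  keep <- len >= minRunLength
  run <- match(chain[keep], unique(chain[keep])) # renumber kept chains as 1, 2, ...
  list(keep = keep, run = run)
}

# Bill's test vector: same values and run numbers as findRuns()
x <- c(10, 8, 6, 4, 1, 2, 3, 20, 17, 18, 19, 20)
res <- runMask(x)
data.frame(values = x[res$keep], runNumber = res$run)

# Applied to Knut's subset: keep the qualifying rows, then split them into runs
hot <- subset(airquality, Temp > 80, select = c(Ozone, Temp))
m <- runMask(as.integer(rownames(hot)), minRunLength = 3)
hotRuns <- hot[m$keep, ]
split(hotRuns, m$run)
```

Because the chain ids come from the breaks in `x` itself rather than from the differences between the kept values, two separate runs such as `1, 2, 3` and `4, 5, 6` in `c(1, 2, 3, 10, 4, 5, 6)` stay separate here, whereas numbering by `diff(values)` would merge them.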