Jim Lemon
2018-Sep-27 22:35 UTC
[R] subset only if f.e a column is successive for more than 3 values
Hi Knut,
As Bert said, you can start with diff and work from there. I can easily get the text for the subset, but despite fooling around with "parse", "eval" and "expression", I couldn't get it to work:

# use a bigger subset to test whether multiple runs can be extracted
kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
kkdf$index<-as.numeric(rownames(kkdf))
# get the run length encoding
seqindx<-rle(diff(kkdf$index)==1)
# get a logical vector of the starts of the runs
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
starts<-cumsum(seqindx$lengths)[runsel[-1]]+1
# and the ends
ends<-cumsum(seqindx$lengths)[runsel]+1
# the character representation of the subset as indices is
paste0("c(",paste(starts,ends,sep=":",collapse=","),")")

I expect there will be a lightning response from someone who knows about converting the resulting string into whatever is needed.

Jim

On Fri, Sep 28, 2018 at 1:13 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 1. I assume the values are integers, not floats/numerics (which would make
> it more complicated).
>
> 2. Strategy: Take differences (e.g. see ?diff) and look for >3 1's in a
> row.
>
> I don't have time to work out details, but perhaps that helps.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Sep 27, 2018 at 7:49 AM Knut Krueger <rhelp at krueger-family.de> wrote:
> > Hi to all
> >
> > I need a subset of the rows where, e.g., 3 values in a column of a
> > data frame are successive.
> > Example from the subset help page:
> >
> > subset(airquality, Temp > 80, select = c(Ozone, Temp))
> > 29 45 81
> > 35 NA 84
> > 36 NA 85
> > 38 29 82
> > 39 NA 87
> > 40 71 90
> > 41 39 87
> > 42 NA 93
> > 43 NA 92
> > 44 23 82
> > .....
> >
> > I would like to get only
> >
> > ...
> > 40 71 90
> > 41 39 87
> > 42 NA 93
> > 43 NA 92
> > 44 23 82
> > ....
> >
> > because the left column is ascending more than, e.g., three times without a gap.
> >
> > Any hints for a package, or do I need to build my own function?
> >
> > Kind Regards Knut
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
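To make the diff-and-runs strategy from this message concrete, here is a minimal sketch in R applied to the row numbers shown in the question. The variable names `rows`, `consecutive` and `runs` are illustrative and do not appear in the thread.

```r
# Row numbers from the example in the question (rows where Temp > 80)
rows <- c(29, 35, 36, 38, 39, 40, 41, 42, 43, 44)

# TRUE wherever a row number follows its predecessor without a gap
consecutive <- diff(rows) == 1

# Run-length encoding groups the TRUE/FALSE values into runs
runs <- rle(consecutive)
runs$lengths  # 1 1 1 6
runs$values   # FALSE TRUE FALSE TRUE
```

A run of six TRUE differences corresponds to seven consecutive rows, here 38 to 44, because each difference links two neighbouring rows.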
Jim Lemon
2018-Sep-27 22:43 UTC
[R] subset only if f.e a column is successive for more than 3 values
Bugger! It's

eval(parse(text=paste0("kkdf[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

What a mess!

Jim
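One possible way to avoid building and evaluating a string is to expand each start/end pair into row positions directly. The sketch below assumes `starts` and `ends` are numeric vectors of equal length; the data frame `df` and the example values are hypothetical stand-ins, not data from the thread.

```r
# Hypothetical data frame and run boundaries, for illustration only
df <- data.frame(id = 1:10, value = letters[1:10])
starts <- c(2, 7)
ends <- c(4, 9)

# Map() calls seq(start, end) for each pair; unlist() joins the pieces
# into one vector of row positions
rows <- unlist(Map(seq, starts, ends))

# Plain indexing then replaces eval(parse(text = ...))
df[rows, ]
```

Because `Map()` pairs the elements one by one, this form only makes sense when every run has both a start and an end.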
Knut Krueger
2018-Sep-28 15:08 UTC
[R] subset only if f.e a column is successive for more than 3 values
Hi Jim,

thanks, it is working with the given example, but what's the difference when using

testdata=data.frame(TIME=c("17:11:20", "17:11:21", "17:11:22", "17:11:23", "17:11:24", "17:11:25", "17:11:26", "17:11:27", "17:11:28", "17:21:43", "17:22:16", "17:22:19", "18:04:48", "18:04:49", "18:04:50", "18:04:51", "18:04:52", "19:50:09", "00:59:27", "00:59:28", "00:59:29", "04:13:40", "04:13:43", "04:13:44"),
index=c(8960,8961,8962,8963,8964,8965,8966,8967,8968,9583,9616,9619,12168,12169,12170,12171,12172,18489,37047,37048,37049,48700,48701,48702))
seqindx<-rle(diff(testdata$index)==1)
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
starts<-cumsum(seqindx$lengths)[runsel[-1]]+1
# and the ends
ends<-cumsum(seqindx$lengths)[runsel]+1
eval(parse(text=paste0("testdata[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

the result (index) is
12168,9619,9616,9583,8968,12168,12169,12170,12171,12172

Maybe it is caused by the gaps between .. 8967,8968,9583,9616,9619,12168,12169 ..?

Regards Knut
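As far as I can tell, the gaps are not the cause. In this data the very first run already qualifies, and `runsel[-1]` looks up the end of the run before each selected run, so a qualifying first run has no start. Here `starts` comes out as 13 only, while `ends` is 9 and 17, and `paste` recycles the single start into `13:9` and `13:17`. The sketch below is one possible correction, not the original poster's code: it derives each run's start from its own end and length. The names `run_ends`, `run_starts`, `rows` and `result` are illustrative.

```r
# Index column from the question; the TIME column is omitted for brevity
index <- c(8960:8968, 9583, 9616, 9619, 12168:12172, 18489,
           37047:37049, 48700:48702)
testdata <- data.frame(index = index)

# Runs of consecutive index values, as in the thread
seqindx <- rle(diff(testdata$index) == 1)
runsel <- seqindx$lengths >= 3 & seqindx$values

# Position of the last and first difference in every run
run_ends <- cumsum(seqindx$lengths)
run_starts <- run_ends - seqindx$lengths + 1

# A run of differences s..e covers rows s..(e + 1)
starts <- run_starts[runsel]
ends <- run_ends[runsel] + 1

# Expand the start/end pairs into row positions and subset
rows <- unlist(Map(seq, starts, ends))
result <- testdata[rows, , drop = FALSE]
result$index
```

With this data the selection is rows 1 to 9 and 13 to 17, that is, index 8960 to 8968 and 12168 to 12172. Note that `lengths >= 3` counts differences, so a run needs at least four consecutive rows; the three-row run 37047 to 37049 is therefore not selected. Using `>= 2` instead would include runs of three.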