Knut Krueger
2018-Sep-27 14:48 UTC
[R] subset only if f.e a column is successive for more than 3 values
Hi to all

I need a subset of the rows where there are e.g. 3 successive values in a column of a data frame.
Example from the subset help page:

subset(airquality, Temp > 80, select = c(Ozone, Temp))
29 45 81
35 NA 84
36 NA 85
38 29 82
39 NA 87
40 71 90
41 39 87
42 NA 93
43 NA 92
44 23 82
.....

I would like to get only

...
40 71 90
41 39 87
42 NA 93
43 NA 92
44 23 82
....

because the left column is ascending more than e.g. three times without a gap.

Any hints for a package, or do I need to build my own function?

Kind Regards Knut
Bert Gunter
2018-Sep-27 15:09 UTC
[R] subset only if f.e a column is successive for more than 3 values
1. I assume the values are integers, not floats/numerics (which would make it more complicated).

2. Strategy: Take differences (e.g. see ?diff) and look for >3 1's in a row.

I don't have time to work out details, but perhaps that helps.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
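For readers who want the details of Bert's diff strategy worked out, here is a minimal sketch. It is not Bert's own code: `keep_runs` and `min_len` are hypothetical names, and the sketch assumes integer-valued row indices without NAs, as point 1 does. It uses a cumulative sum over "step is not 1" to label runs, then keeps the elements whose run is long enough.

```r
# Hypothetical helper: flag the elements of an integer-valued vector that
# belong to a run of at least `min_len` successive values (step of exactly 1).
keep_runs <- function(idx, min_len = 3) {
  if (length(idx) == 0) return(logical(0))
  # A new run starts at the first element and wherever the step is not 1.
  run_id <- cumsum(c(TRUE, diff(idx) != 1))
  # Length of the run that each element belongs to.
  run_len <- ave(seq_along(idx), run_id, FUN = length)
  run_len >= min_len
}

# Row names from the question (rows of airquality with Temp > 80).
idx <- c(29, 35, 36, 38, 39, 40, 41, 42, 43, 44)
idx[keep_runs(idx, min_len = 3)]

# Applied to the data frame itself, using the row names as the index.
hot <- subset(airquality, Temp > 80, select = c(Ozone, Temp))
hot[keep_runs(as.integer(rownames(hot)), min_len = 3), ]
```

Note that a run of k successive values corresponds to k - 1 ones in the differences, so the threshold on the run length and the threshold on the number of 1's differ by one.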
Jim Lemon
2018-Sep-27 22:35 UTC
[R] subset only if f.e a column is successive for more than 3 values
Hi Knut,
As Bert said, you can start with diff and work from there. I can easily get the text for the subset, but despite fooling around with "parse", "eval" and "expression", I couldn't get it to work:
# use a bigger subset to test whether multiple runs can be extracted
kkdf <- subset(airquality, Temp > 77, select = c("Ozone", "Temp"))
kkdf$index <- as.numeric(rownames(kkdf))
# get the run length encoding of "next index is exactly one larger"
seqindx <- rle(diff(kkdf$index) == 1)
# get a logical vector marking the runs of at least 3 successive differences
runsel <- seqindx$lengths >= 3 & seqindx$values
# get the positions of the ends of the runs
ends <- cumsum(seqindx$lengths)[runsel] + 1
# and the starts (a run of k differences spans k + 1 rows)
starts <- ends - seqindx$lengths[runsel]
# the character representation of the subset as indices is
paste0("c(", paste(starts, ends, sep = ":", collapse = ","), ")")
I expect there will be a lightning response from someone who knows
about converting the resulting string into whatever is needed.
Jim
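One possible way around the string conversion Jim mentions is to skip the string entirely and expand each start/end pair into positions with `Map` and `seq`. The sketch below is an illustration under assumptions, not a reply from the thread: it uses the row names from Knut's question as a stand-in for `kkdf$index`, and it computes the starts as the ends minus the run lengths.

```r
# Row names that survive the Temp > 80 filter (taken from the question).
index <- c(29, 35, 36, 38, 39, 40, 41, 42, 43, 44)

# Run-length encode "is the next index exactly one larger?"
seqindx <- rle(diff(index) == 1)
runsel <- seqindx$values & seqindx$lengths >= 3

# A run of k TRUEs in the differences spans k + 1 elements of `index`.
ends <- cumsum(seqindx$lengths)[runsel] + 1
starts <- ends - seqindx$lengths[runsel]

# Expand each start:end pair into positions; no parse()/eval() required.
pos <- unlist(Map(seq, starts, ends))
index[pos]
```

With a data frame, the same `pos` vector can be used directly as a row selector, e.g. `kkdf[pos, ]`. If the pasted string is really wanted, `eval(parse(text = ...))` applied to it should give the same positions.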
William Dunlap
2018-Sep-28 15:22 UTC
[R] subset only if f.e a column is successive for more than 3 values
Do you also want lines 38 and 39 (in addition to 40:44), or do I misunderstand your problem?

When you deal with runs of data, think of the rle (run-length encoding) function. E.g. here is a barely tested function to find runs of a given minimum length and a given difference between successive values. It also returns a 'runNumber' so you can split the result into runs.
findRuns <- function(x, minRunLength=3, difference=1) {
    # for integral x, find runs of length at least 'minRunLength'
    # with 'difference' between successive values
    d <- diff(x)
    dRle <- rle(d)
    # TRUE for each difference that lies in a long enough run of 'difference'
    w <- rep(dRle$lengths>=minRunLength-1 & dRle$values==difference,
             dRle$lengths)
    # a difference at position i involves x[i] and x[i+1]
    values <- x[c(FALSE,w) | c(w,FALSE)]
    runNumber <- cumsum(c(TRUE, diff(values)!=difference))
    data.frame(values=values, runNumber=runNumber)
}
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20))
  values runNumber
1      1         1
2      2         1
3      3         1
4     17         2
5     18         2
6     19         2
7     20         2
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), minRunLength=4)
  values runNumber
1     17         1
2     18         1
3     19         1
4     20         1
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), difference=-2)
  values runNumber
1     10         1
2      8         1
3      6         1
4      4         1
Bill Dunlap
TIBCO Software
wdunlap tibco.com
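To select rows of a data frame rather than values, it may help to have the positions of the runs as well. The following is a hypothetical variant of the same rle idea, not part of Bill's post: `runPositions` is an invented name, and it assumes an integer-valued `x` without NAs. It numbers the runs from the rle result itself, so two separate runs are kept apart even when the second happens to continue the first numerically (e.g. 1,2,3 followed later by 4,5,6).

```r
# Hypothetical variant: report positions in x, the values there, and a run number.
# Assumes x is integer-valued and contains no NAs.
runPositions <- function(x, minRunLength = 3, difference = 1) {
  r <- rle(diff(x) == difference)
  ok <- r$values & r$lengths >= minRunLength - 1
  ends <- cumsum(r$lengths)[ok] + 1   # last position of each run in x
  starts <- ends - r$lengths[ok]      # first position of each run in x
  pos <- as.integer(unlist(Map(seq, starts, ends)))
  # Each run begins at one of `starts`, so counting them numbers the runs.
  runNumber <- cumsum(pos %in% starts)
  data.frame(position = pos, value = x[pos], runNumber = runNumber)
}

runPositions(c(1, 2, 3, 10, 4, 5, 6))

# Usage with a data frame: select the rows, then split them by run.
kkdf <- subset(airquality, Temp > 77, select = c("Ozone", "Temp"))
res <- runPositions(as.integer(rownames(kkdf)))
runs <- split(kkdf[res$position, ], res$runNumber)
```

The `position` column indexes into the vector that was passed in, so it can be used as a row selector for the data frame the vector came from.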