Cormac Long
2011-Jun-23 13:44 UTC
[R] problem (and solution) to rle on vector with NA values
Hello there R-help,

I'm not sure if this should be posted here, so apologies if this is the case.
I've found a problem while using rle and am proposing a solution to the issue.

Description:
I ran into a niggle with rle today when working with vectors with NA values
(using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values
is not encoded in the same way as a run of other values. See the following
example as an illustration:

Example:
The example

        rv<-c(1,1,NA,NA,3,3,3);rle(rv)

returns

        Run Length Encoding
          lengths: int [1:4] 2 1 1 3
          values : num [1:4] 1 NA NA 3

not

        Run Length Encoding
          lengths: int [1:3] 2 2 3
          values : num [1:3] 1 NA 3

as I expected. This caused my code to fail later (unsurprisingly).

Analysis:
The problem stems from the test

        y <- x[-1L] != x[-n]

in line 7 of the rle function body. In this test, any comparison involving an
NA returns a logical NA rather than TRUE/FALSE (again, unsurprisingly).

Resolution:
I modified the rle function code as included below. As far as I tested, this
modification appears safe. The convoluted construction of naMaskVal
should guarantee that the NA masking value is always different from
any value in the vector and should be safe regardless of the input vector
form (a raw vector is not handled since NA values do not apply there).

rle<-function (x)
{
    if (!is.vector(x) && !is.list(x))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
            class = "rle"))

    #### BEGIN NEW SECTION PART 1 ####
    # Initialise both flags up front: IS_LOGIC must exist even when x has
    # no NAs, otherwise if(IS_LOGIC) in part 2 fails with
    # "object 'IS_LOGIC' not found".
    naRepFlag <- FALSE
    IS_LOGIC <- FALSE
    if (any(is.na(x))) {
        naRepFlag <- TRUE
        IS_LOGIC <- typeof(x) == "logical"

        if (IS_LOGIC) {
            # work on 0/1 so that 2 can stand in for NA
            x <- as.integer(x)
            naMaskVal <- 2
        } else if (typeof(x) == "character") {
            # random 32-character string: a clash with a real value is
            # extremely unlikely, though not strictly impossible
            naMaskVal <- paste(sample(c(letters, LETTERS, 0:9), 32, replace = TRUE),
                               collapse = "")
        } else {
            # larger in magnitude than every finite value in x; for very large
            # values (e.g. 1e17) the +1 is lost to floating-point rounding
            naMaskVal <- max(0, abs(x[!is.infinite(x)]), na.rm = TRUE) + 1
        }

        x[which(is.na(x))] <- naMaskVal
    }
    #### END NEW SECTION PART 1 ####

    # with NAs masked, y holds only TRUE/FALSE
    y <- x[-1L] != x[-n]
    i <- c(which(y), n)

    #### BEGIN NEW SECTION PART 2 ####
    if (naRepFlag)
        x[which(x == naMaskVal)] <- NA

    if (IS_LOGIC)
        x <- as.logical(x)
    #### END NEW SECTION PART 2 ####

    structure(list(lengths = diff(c(0L, i)), values = x[i]),
        class = "rle")
}

Conclusion:
I think that the proposed code modification is an improvement on the existing
implementation of rle. Is it impertinent to suggest this R-modification to the
gurus at R?

Best wishes (in flame-war trepidation),
Dr. Cormac Long.
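The mechanism described in the Analysis section of Cormac's message can be reproduced at the prompt. The sketch below re-runs the pairwise comparison on his example vector. The step that turns NA comparisons into run breaks, `which(y | is.na(y))`, is an assumption about how base rle handles them; it is checked here only by comparing the result against `rle()` itself.

```r
# Cormac's example vector
rv <- c(1, 1, NA, NA, 3, 3, 3)
n <- length(rv)

# The pairwise test from the rle body: does each element differ from its predecessor?
y <- rv[-1L] != rv[-n]
print(y)  # FALSE NA NA NA FALSE FALSE: every comparison touching an NA is NA

# Counting an NA comparison as "different" ends a run at every NA
breaks <- c(which(y | is.na(y)), n)
lengths_manual <- diff(c(0L, breaks))
values_manual <- rv[breaks]
print(lengths_manual)  # 2 1 1 3
print(values_manual)   # 1 NA NA 3

# Base rle gives the same encoding on this input
r <- rle(rv)
print(identical(r$lengths, lengths_manual) && identical(r$values, values_manual))  # TRUE
```

So the "extra" runs come from treating an undecidable comparison as a break, not from the comparison itself.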
Peter Ehlers
2011-Jun-23 14:47 UTC
On 2011-06-23 06:44, Cormac Long wrote:
> Conclusion:
> I think that the proposed code modification is an improvement on the existing
> implementation of rle. Is it impertinent to suggest this R-modification to the
> gurus at R?
>
> Best wishes (in flame-war trepidation),

Well, it's not worth a flame, but ... from the help page (see 'Details'):

"Missing values are regarded as unequal to the previous value, even if that is also missing."

Peter Ehlers

> Dr. Cormac Long.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
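The sentence Peter quotes from ?rle can be checked directly. This sketch uses only base R; the all-NA input is just an illustrative choice. It shows that even a vector made entirely of NAs is encoded as one run per element, and that the encoding stays lossless: `inverse.rle()` rebuilds the input exactly.

```r
x <- c(NA, NA, NA)
r <- rle(x)
print(r$lengths)  # 1 1 1: each NA is "unequal to the previous value"
print(r$values)   # NA NA NA

# The split runs cost some compactness but lose no information
print(identical(inverse.rle(r), x))  # TRUE
```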
Nick Sabbe
2011-Jun-23 14:59 UTC
Hello Cormac.

I haven't thoroughly checked whether your code actually works, but the behavior
of rle you describe is the documented one (check the Details section of ?rle), and
it makes sense, as missingness could have different causes.
As such, changing this behavior would probably break a lot of
existing code that is built on top of rle.

There are other peculiarities and debatable choices in some base R
functions (the order of the arguments for sample trips me up every time), but
unless the argument is really strong or it is a downright bug, I doubt people will
be willing to change this. Perhaps making the new behavior optional (through
a new parameter such as na.action, with the original behavior as the default)
is an option?

Feel free to run your own version of rle in any case. I suggest you rename
it, though, as reusing the name may cause problems for some packages.

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove
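Nick's suggestion of an opt-in argument under a different name could look roughly like the sketch below. Both the function name `rle_na` and the `merge.na` argument are hypothetical, chosen for illustration only. Rather than masking NAs with a sentinel value as in Cormac's version, it compares missingness directly, so there is no risk of the mask colliding with a real value. Note that NA and NaN are both treated as missing and would be merged into the same run.

```r
# Hypothetical rle variant: rle_na() and merge.na are illustrative names, not base R.
# merge.na = FALSE (the default) defers to base rle and its documented behavior;
# merge.na = TRUE collapses consecutive missing values into a single run.
rle_na <- function(x, merge.na = FALSE) {
  if (!merge.na)
    return(rle(x))
  if (!is.vector(x) || !is.atomic(x))
    stop("'x' must be an atomic vector")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))

  cur <- x[-1L]    # elements 2..n
  prev <- x[-n]    # elements 1..(n-1)
  na_cur <- is.na(cur)
  na_prev <- is.na(prev)

  # A run ends where exactly one side is missing, or where both are present
  # and differ. FALSE & NA is FALSE in R, so y never contains NA.
  y <- (na_cur != na_prev) | (!na_cur & !na_prev & (cur != prev))
  i <- c(which(y), n)
  structure(list(lengths = diff(c(0L, i)), values = x[i]), class = "rle")
}

rv <- c(1, 1, NA, NA, 3, 3, 3)
print(rle_na(rv, merge.na = TRUE))     # lengths 2 2 3, values 1 NA 3
print(identical(rle_na(rv), rle(rv)))  # TRUE: the default keeps base behavior
```

Keeping `merge.na = FALSE` as the default means existing code that relies on the documented behavior of rle is unaffected, which is the compatibility concern raised above.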
Cormac Long
2011-Jun-23 15:37 UTC
D'oh! Completely missed that. Definitely a case of RTFMS (RTFM, Stupid).
My apologies for the spam.

Sincerely (with additional grovelling)
Cormac.

On 23 June 2011 15:59, Nick Sabbe <nick.sabbe at ugent.be> wrote:
> Hello Cormac.
>
> Not having thoroughly checked whether your code actually works, the behavior
> of rle you describe is the one documented (check the details of ?rle) and
> makes sense as the missingness could have different reasons.