Stephanie M. Gogarten
2015-Mar-03 22:09 UTC
[Rd] [R] Why does R replace all row values with NAs
On 3/3/15 1:26 PM, Herv? Pag?s wrote:> > > On 03/03/2015 02:28 AM, Martin Maechler wrote: >> Diverted from R-help : >> .... as it gets into musing about new R language "primitives" >> >>>>>>> William Dunlap <wdunlap at tibco.com> >>>>>>> on Fri, 27 Feb 2015 08:04:36 -0800 writes: >> >> > You could define functions like >> >> > is.true <- function(x) !is.na(x) & x >> > is.false <- function(x) !is.na(x) & !x >> >> > and use them in your selections. E.g., >> >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >> >> x[is.true(x$c >= 6), ] >> > a b c >> > 7 7 8 7 >> > 10 10 11 10 >> >> > Bill Dunlap >> > TIBCO Software >> > wdunlap tibco.com >> >> Yes; the Matrix package has had these >> >> is0 <- function(x) !is.na(x) & x == 0 >> isN0 <- function(x) is.na(x) | x != 0 >> is1 <- function(x) !is.na(x) & x # also == "isTRUE componentwise" > > Note that using %in% to block propagation of NAs is about 2x faster: > > > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE) > > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x) > Unit: milliseconds > expr min lq mean median uq > as.logical(x) %in% TRUE 6.034744 6.264382 6.999083 6.29488 6.346028 > !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576 > max neval > 40.36472 100 > 11.90916 100Unfortunately %in% does not preserve matrix dimensions: > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > dim(x) [1] 50 10 > dim(!is.na(x) & x) [1] 50 10 > dim(as.logical(x) %in% TRUE) NULL Stephanie> > > >> >> namespace hidden for a while [note the comment of the last one!] >> and using them for readibility in its own code. >> >> Maybe we should (again) consider providing some versions of >> these with R ? >> >> The Matrix package also has had fast >> >> allFalse <- all0 <- function(x) .Call(R_all0, x) >> anyFalse <- any0 <- function(x) .Call(R_any0, x) >> ## >> ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0 >> ## any0 <- function(x) isTRUE(any(x == 0)) ## ~= anyFalse >> >> namespace hidden as well, already, which probably could also be >> brought to base R. >> >> One big reason to *not* go there (to internal C code) at all with R is >> that >> S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics) >> and 'is.na() have been known and package writers have >> programmed methods for these. >> To ensure that S3 and S4 dispatch works "correctly" also inside >> such new internals is much less easily achieved, and so >> such a C-based internal function is0() would no longer be >> equivalent with !is.na(x) & x == 0 >> as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() >> method. > > Excellent point. Thank you! It really makes a big difference for > developers who maintain a complex hierarchy of S4 classes and methods, > when functions like is.true, anyFalse, etc..., which can be expressed in > terms of more basic operations like ==, !=, !, is.na, etc..., just work > out-of-the-box on objects for which these basic operations are defined. > > There is conceptually a small set of "building blocks", at least for > objects with a vector-like or list-like semantic, that can be used > to formally describe the semantic of many functions in base R. This > is what the man page for anyNA does by saying: > > anyNA implements any(is.na(x)) > > even though the actual implementation differs, but that's ok, as long > as anyNA is equivalent to doing any(is.na(x)) on any object for which > building block is.na() is implemented. > > Unfortunately there is no clearly identified set of building blocks > in base R. For example, if I want the comparison operations to work > on my object, I need to implement ==, >, <, !=, <=, and >= (the > 'Compare' group generics) even though it should be enough to implement > == and >=, because all the others can be described in terms of these > 2 building blocks. unique/duplicated is another example (unique(x) is > conceptually x[!duplicated(x)]). And so on... > > Cheers, > H. > >> >> OTOH, simple R versions such as your 'is.true', called 'is1' >> inside Matrix maybe optimizable a bit by the byte compiler (and >> jit and other such tricks) and still keep the full >> semantic including correct method dispatch. >> >> Martin Maechler, ETH Zurich >> >> >> > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < >> > dimitri.liakhovitski at gmail.com> wrote: >> >> >> Thank you very much, Duncan. >> >> All this being said: >> >> >> >> What would you say is the most elegant and most safe way to >> solve such >> >> a seemingly simple task? >> >> >> >> Thank you! >> >> >> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch >> >> <murdoch.duncan at gmail.com> wrote: >> >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >> >> >> So, Duncan, do I understand you correctly: >> >> >> >> >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so >> it returns >> >> >> a logical value of NA. >> >> > >> >> > Yes, when x$x is NA. (Though I think you meant x$c.) >> >> > >> >> >> When this logical value is applied to a row, the R says: >> hell, I don't >> >> >> know if I should keep it or not, so, just in case, I am >> going to keep >> >> >> it, but I'll replace all the values in this row with NAs? >> >> > >> >> > Yes. Indexing with a logical NA is probably a mistake, and >> this is one >> >> > way to signal it without actually triggering a warning or >> error. >> >> > >> >> > BTW, I should have mentioned that the example where you >> indexed using >> >> > -which(x$c>=6) is a bad idea: if none of the entries were 6 >> or more, >> >> > this would be indexing with an empty vector, and you'd get >> nothing, not >> >> > everything. >> >> > >> >> > Duncan Murdoch >> >> > >> >> > >> >> >> >> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >> >> >> <murdoch.duncan at gmail.com> wrote: >> >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >> >> >>>> I know how to get the output I need, but I would benefit >> from an >> >> >>>> explanation why R behaves the way it does. >> >> >>>> >> >> >>>> # I have a data frame x: >> >> >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >> >> >>>> x >> >> >>>> # I want to toss rows in x that contain values >=6. But I >> don't want >> >> >>>> to toss my NAs there. >> >> >>>> >> >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, >> understand why >> >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, >> understand >> >> why >> >> >>>> x[-which(x$c>=6),] # output I need >> >> >>>> >> >> >>>> # Here is my question: why does the following line >> replace the values >> >> >>>> of all rows that contain an NA # in x$c with NAs? >> >> >>>> >> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole >> row an NA. >> >> Why??? >> >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be >> >> super-explicit >> >> >>>> >> >> >>>> Thank you very much! >> >> >>> >> >> >>> Most of your examples (except the ones using which()) are >> doing logical >> >> >>> indexing. In logical indexing, TRUE keeps a line, FALSE >> drops the >> >> line, >> >> >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, >> you get the >> >> >>> third kind of indexing. >> >> >>> >> >> >>> Your last example works because in the cases where x$c is >> NA, it >> >> >>> evaluates NA | TRUE, and that evaluates to TRUE. In the >> cases where >> >> x$c >> >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as >> x$c < 6, >> >> >>> which will be either TRUE or FALSE. >> >> >>> >> >> >>> Duncan Murdoch >> >> >>> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> -- >> >> Dimitri Liakhovitski >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible >> code. >> >> >> >> > [[alternative HTML version deleted]] >> >> > ______________________________________________ >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >
Stephanie, Actually, it's as.logical that isn't preserving matrix dimensions, because it coerces to a logical vector:> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > dim(as.logical(x))NULL ~G On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten < sdmorris at u.washington.edu> wrote:> > > On 3/3/15 1:26 PM, Herv? Pag?s wrote: > >> >> >> On 03/03/2015 02:28 AM, Martin Maechler wrote: >> >>> Diverted from R-help : >>> .... as it gets into musing about new R language "primitives" >>> >>> William Dunlap <wdunlap at tibco.com> >>>>>>>> on Fri, 27 Feb 2015 08:04:36 -0800 writes: >>>>>>>> >>>>>>> >>> > You could define functions like >>> >>> > is.true <- function(x) !is.na(x) & x >>> > is.false <- function(x) !is.na(x) & !x >>> >>> > and use them in your selections. E.g., >>> >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >>> >> x[is.true(x$c >= 6), ] >>> > a b c >>> > 7 7 8 7 >>> > 10 10 11 10 >>> >>> > Bill Dunlap >>> > TIBCO Software >>> > wdunlap tibco.com >>> >>> Yes; the Matrix package has had these >>> >>> is0 <- function(x) !is.na(x) & x == 0 >>> isN0 <- function(x) is.na(x) | x != 0 >>> is1 <- function(x) !is.na(x) & x # also == "isTRUE componentwise" >>> >> >> Note that using %in% to block propagation of NAs is about 2x faster: >> >> > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE) >> > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x) >> Unit: milliseconds >> expr min lq mean median uq >> as.logical(x) %in% TRUE 6.034744 6.264382 6.999083 6.29488 6.346028 >> !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 >> 11.517576 >> max neval >> 40.36472 100 >> 11.90916 100 >> > > Unfortunately %in% does not preserve matrix dimensions: > > > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > > dim(x) > [1] 50 10 > > dim(!is.na(x) & x) > [1] 50 10 > > dim(as.logical(x) %in% TRUE) > NULL > > Stephanie > > > >> >> >> >>> namespace hidden for a while [note the comment of the last one!] >>> and using them for readibility in its own code. >>> >>> Maybe we should (again) consider providing some versions of >>> these with R ? >>> >>> The Matrix package also has had fast >>> >>> allFalse <- all0 <- function(x) .Call(R_all0, x) >>> anyFalse <- any0 <- function(x) .Call(R_any0, x) >>> ## >>> ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0 >>> ## any0 <- function(x) isTRUE(any(x == 0)) ## ~= anyFalse >>> >>> namespace hidden as well, already, which probably could also be >>> brought to base R. >>> >>> One big reason to *not* go there (to internal C code) at all with R is >>> that >>> S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics) >>> and 'is.na() have been known and package writers have >>> programmed methods for these. >>> To ensure that S3 and S4 dispatch works "correctly" also inside >>> such new internals is much less easily achieved, and so >>> such a C-based internal function is0() would no longer be >>> equivalent with !is.na(x) & x == 0 >>> as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() >>> method. >>> >> >> Excellent point. Thank you! It really makes a big difference for >> developers who maintain a complex hierarchy of S4 classes and methods, >> when functions like is.true, anyFalse, etc..., which can be expressed in >> terms of more basic operations like ==, !=, !, is.na, etc..., just work >> out-of-the-box on objects for which these basic operations are defined. >> >> There is conceptually a small set of "building blocks", at least for >> objects with a vector-like or list-like semantic, that can be used >> to formally describe the semantic of many functions in base R. This >> is what the man page for anyNA does by saying: >> >> anyNA implements any(is.na(x)) >> >> even though the actual implementation differs, but that's ok, as long >> as anyNA is equivalent to doing any(is.na(x)) on any object for which >> building block is.na() is implemented. >> >> Unfortunately there is no clearly identified set of building blocks >> in base R. For example, if I want the comparison operations to work >> on my object, I need to implement ==, >, <, !=, <=, and >= (the >> 'Compare' group generics) even though it should be enough to implement >> == and >=, because all the others can be described in terms of these >> 2 building blocks. unique/duplicated is another example (unique(x) is >> conceptually x[!duplicated(x)]). And so on... >> >> Cheers, >> H. >> >> >>> OTOH, simple R versions such as your 'is.true', called 'is1' >>> inside Matrix maybe optimizable a bit by the byte compiler (and >>> jit and other such tricks) and still keep the full >>> semantic including correct method dispatch. >>> >>> Martin Maechler, ETH Zurich >>> >>> >>> > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < >>> > dimitri.liakhovitski at gmail.com> wrote: >>> >>> >> Thank you very much, Duncan. >>> >> All this being said: >>> >> >>> >> What would you say is the most elegant and most safe way to >>> solve such >>> >> a seemingly simple task? >>> >> >>> >> Thank you! >>> >> >>> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch >>> >> <murdoch.duncan at gmail.com> wrote: >>> >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >>> >> >> So, Duncan, do I understand you correctly: >>> >> >> >>> >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so >>> it returns >>> >> >> a logical value of NA. >>> >> > >>> >> > Yes, when x$x is NA. (Though I think you meant x$c.) >>> >> > >>> >> >> When this logical value is applied to a row, the R says: >>> hell, I don't >>> >> >> know if I should keep it or not, so, just in case, I am >>> going to keep >>> >> >> it, but I'll replace all the values in this row with NAs? >>> >> > >>> >> > Yes. Indexing with a logical NA is probably a mistake, and >>> this is one >>> >> > way to signal it without actually triggering a warning or >>> error. >>> >> > >>> >> > BTW, I should have mentioned that the example where you >>> indexed using >>> >> > -which(x$c>=6) is a bad idea: if none of the entries were 6 >>> or more, >>> >> > this would be indexing with an empty vector, and you'd get >>> nothing, not >>> >> > everything. >>> >> > >>> >> > Duncan Murdoch >>> >> > >>> >> > >>> >> >> >>> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >>> >> >> <murdoch.duncan at gmail.com> wrote: >>> >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >>> >> >>>> I know how to get the output I need, but I would benefit >>> from an >>> >> >>>> explanation why R behaves the way it does. >>> >> >>>> >>> >> >>>> # I have a data frame x: >>> >> >>>> x = data.frame(a=1:10,b=2:11,c=c( >>> 1,NA,3,NA,5,NA,7,NA,NA,10)) >>> >> >>>> x >>> >> >>>> # I want to toss rows in x that contain values >=6. But I >>> don't want >>> >> >>>> to toss my NAs there. >>> >> >>>> >>> >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, >>> understand why >>> >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, >>> understand >>> >> why >>> >> >>>> x[-which(x$c>=6),] # output I need >>> >> >>>> >>> >> >>>> # Here is my question: why does the following line >>> replace the values >>> >> >>>> of all rows that contain an NA # in x$c with NAs? >>> >> >>>> >>> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole >>> row an NA. >>> >> Why??? >>> >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be >>> >> super-explicit >>> >> >>>> >>> >> >>>> Thank you very much! >>> >> >>> >>> >> >>> Most of your examples (except the ones using which()) are >>> doing logical >>> >> >>> indexing. In logical indexing, TRUE keeps a line, FALSE >>> drops the >>> >> line, >>> >> >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, >>> you get the >>> >> >>> third kind of indexing. >>> >> >>> >>> >> >>> Your last example works because in the cases where x$c is >>> NA, it >>> >> >>> evaluates NA | TRUE, and that evaluates to TRUE. In the >>> cases where >>> >> x$c >>> >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as >>> x$c < 6, >>> >> >>> which will be either TRUE or FALSE. >>> >> >>> >>> >> >>> Duncan Murdoch >>> >> >>> >>> >> >> >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> Dimitri Liakhovitski >>> >> >>> >> ______________________________________________ >>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, >>> see >>> >> https://stat.ethz.ch/mailman/listinfo/r-help >>> >> PLEASE do read the posting guide >>> >> http://www.R-project.org/posting-guide.html >>> >> and provide commented, minimal, self-contained, reproducible >>> code. >>> >> >>> >>> > [[alternative HTML version deleted]] >>> >>> > ______________________________________________ >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> > https://stat.ethz.ch/mailman/listinfo/r-help >>> > PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> > and provide commented, minimal, self-contained, reproducible code. >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>> >> > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc. [[alternative HTML version deleted]]
On 03/03/2015 02:17 PM, Gabriel Becker wrote:> Stephanie, > > Actually, it's as.logical that isn't preserving matrix dimensions, > because it coerces to a logical vector: > > > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > > dim(as.logical(x))It's true, as.logical() doesn't help here but Stephanie is right, %in% does not preserve the dimensions either: > dim(x %in% 1:5) NULL That's because match() itself doesn't preserve the dimensions: > dim(match(x, 1:5)) NULL So maybe my fast is.true() should be: is.true <- function(x) { ans <- as.logical(x) %in% TRUE if (is.null(dim(x))) { names(ans) <- names(x) } else { dim(ans) <- dim(x) dimnames(ans) <- dimnames(x) } ans } or something like that... H.> NULL > > ~G > > On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten > <sdmorris at u.washington.edu <mailto:sdmorris at u.washington.edu>> wrote: > > > > On 3/3/15 1:26 PM, Herv? Pag?s wrote: > > > > On 03/03/2015 02:28 AM, Martin Maechler wrote: > > Diverted from R-help : > .... as it gets into musing about new R language "primitives" > > William Dunlap <wdunlap at tibco.com > <mailto:wdunlap at tibco.com>> > on Fri, 27 Feb 2015 08:04:36 -0800 > writes: > > > > You could define functions like > > > is.true <- function(x) !is.na <http://is.na>(x) & x > > is.false <- function(x) !is.na <http://is.na>(x) & !x > > > and use them in your selections. E.g., > >> x <- > data.frame(a=1:10,b=2:11,c=c(__1,NA,3,NA,5,NA,7,NA,NA,10)) > >> x[is.true(x$c >= 6), ] > > a b c > > 7 7 8 7 > > 10 10 11 10 > > > Bill Dunlap > > TIBCO Software > > wdunlap tibco.com <http://tibco.com> > > Yes; the Matrix package has had these > > is0 <- function(x) !is.na <http://is.na>(x) & x == 0 > isN0 <- function(x) is.na <http://is.na>(x) | x != 0 > is1 <- function(x) !is.na <http://is.na>(x) & x # also => "isTRUE componentwise" > > > Note that using %in% to block propagation of NAs is about 2x faster: > > > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE) > > microbenchmark(as.logical(x) %in% TRUE, !is.na > <http://is.na>(x) & x) > Unit: milliseconds > expr min lq mean > median uq > as.logical(x) %in% TRUE 6.034744 6.264382 6.999083 > 6.29488 6.346028 > !is.na <http://is.na>(x) & x 11.202808 11.402437 > 11.469101 11.44848 11.517576 > max neval > 40.36472 100 <tel:40.36472%20%20%20100> > 11.90916 100 > > > Unfortunately %in% does not preserve matrix dimensions: > > > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), > nrow=50) > > dim(x) > [1] 50 10 > > dim(!is.na <http://is.na>(x) & x) > [1] 50 10 > > dim(as.logical(x) %in% TRUE) > NULL > > Stephanie > > > > > > > namespace hidden for a while [note the comment of the last > one!] > and using them for readibility in its own code. > > Maybe we should (again) consider providing some versions of > these with R ? > > The Matrix package also has had fast > > allFalse <- all0 <- function(x) .Call(R_all0, x) > anyFalse <- any0 <- function(x) .Call(R_any0, x) > ## > ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0 > ## any0 <- function(x) isTRUE(any(x == 0)) ## ~> anyFalse > > namespace hidden as well, already, which probably could also be > brought to base R. > > One big reason to *not* go there (to internal C code) at all > with R is > that > S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group > generics) > and 'is.na <http://is.na>() have been known and package > writers have > programmed methods for these. > To ensure that S3 and S4 dispatch works "correctly" also inside > such new internals is much less easily achieved, and so > such a C-based internal function is0() would no longer be > equivalent with !is.na <http://is.na>(x) & x == 0 > as soon as 'x' is an "object" with a '==', 'Compare' and/or > an is.na <http://is.na>() > method. > > > Excellent point. Thank you! It really makes a big difference for > developers who maintain a complex hierarchy of S4 classes and > methods, > when functions like is.true, anyFalse, etc..., which can be > expressed in > terms of more basic operations like ==, !=, !, is.na > <http://is.na>, etc..., just work > out-of-the-box on objects for which these basic operations are > defined. > > There is conceptually a small set of "building blocks", at least for > objects with a vector-like or list-like semantic, that can be used > to formally describe the semantic of many functions in base R. This > is what the man page for anyNA does by saying: > > anyNA implements any(is.na <http://is.na>(x)) > > even though the actual implementation differs, but that's ok, as > long > as anyNA is equivalent to doing any(is.na <http://is.na>(x)) on > any object for which > building block is.na <http://is.na>() is implemented. > > Unfortunately there is no clearly identified set of building blocks > in base R. For example, if I want the comparison operations to work > on my object, I need to implement ==, >, <, !=, <=, and >= (the > 'Compare' group generics) even though it should be enough to > implement > == and >=, because all the others can be described in terms of these > 2 building blocks. unique/duplicated is another example > (unique(x) is > conceptually x[!duplicated(x)]). And so on... > > Cheers, > H. > > > OTOH, simple R versions such as your 'is.true', called 'is1' > inside Matrix maybe optimizable a bit by the byte compiler (and > jit and other such tricks) and still keep the full > semantic including correct method dispatch. > > Martin Maechler, ETH Zurich > > > > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < > > dimitri.liakhovitski at gmail.com > <mailto:dimitri.liakhovitski at gmail.com>__> wrote: > > >> Thank you very much, Duncan. > >> All this being said: > >> > >> What would you say is the most elegant and most > safe way to > solve such > >> a seemingly simple task? > >> > >> Thank you! > >> > >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch > >> <murdoch.duncan at gmail.com > <mailto:murdoch.duncan at gmail.com>> wrote: > >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: > >> >> So, Duncan, do I understand you correctly: > >> >> > >> >> When I use x$x<6, R doesn't know if it's TRUE or > FALSE, so > it returns > >> >> a logical value of NA. > >> > > >> > Yes, when x$x is NA. (Though I think you meant x$c.) > >> > > >> >> When this logical value is applied to a row, the > R says: > hell, I don't > >> >> know if I should keep it or not, so, just in > case, I am > going to keep > >> >> it, but I'll replace all the values in this row > with NAs? > >> > > >> > Yes. Indexing with a logical NA is probably a > mistake, and > this is one > >> > way to signal it without actually triggering a > warning or > error. > >> > > >> > BTW, I should have mentioned that the example > where you > indexed using > >> > -which(x$c>=6) is a bad idea: if none of the > entries were 6 > or more, > >> > this would be indexing with an empty vector, and > you'd get > nothing, not > >> > everything. > >> > > >> > Duncan Murdoch > >> > > >> > > >> >> > >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch > >> >> <murdoch.duncan at gmail.com > <mailto:murdoch.duncan at gmail.com>> wrote: > >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: > >> >>>> I know how to get the output I need, but I > would benefit > from an > >> >>>> explanation why R behaves the way it does. > >> >>>> > >> >>>> # I have a data frame x: > >> >>>> x > data.frame(a=1:10,b=2:11,c=c(__1,NA,3,NA,5,NA,7,NA,NA,10)) > >> >>>> x > >> >>>> # I want to toss rows in x that contain values > >=6. But I > don't want > >> >>>> to toss my NAs there. > >> >>>> > >> >>>> subset(x,c<6) # Works correctly, but removes > NAs in c, > understand why > >> >>>> x[which(x$c<6),] # Works correctly, but > removes NAs in c, > understand > >> why > >> >>>> x[-which(x$c>=6),] # output I need > >> >>>> > >> >>>> # Here is my question: why does the following line > replace the values > >> >>>> of all rows that contain an NA # in x$c with NAs? > >> >>>> > >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes > the whole > row an NA. > >> Why??? > >> >>>> x[(x$c<6) | is.na <http://is.na>(x$c),] # > output I need - I have to be > >> super-explicit > >> >>>> > >> >>>> Thank you very much! > >> >>> > >> >>> Most of your examples (except the ones using > which()) are > doing logical > >> >>> indexing. In logical indexing, TRUE keeps a > line, FALSE > drops the > >> line, > >> >>> and NA returns NA. Since "x$c < 6" is NA if > x$c is NA, > you get the > >> >>> third kind of indexing. > >> >>> > >> >>> Your last example works because in the cases > where x$c is > NA, it > >> >>> evaluates NA | TRUE, and that evaluates to > TRUE. In the > cases where > >> x$c > >> >>> is not NA, you get x$c < 6 | FALSE, and that's > the same as > x$c < 6, > >> >>> which will be either TRUE or FALSE. > >> >>> > >> >>> Duncan Murdoch > >> >>> > >> >> > >> >> > >> >> > >> > > >> > >> > >> > >> -- > >> Dimitri Liakhovitski > >> > >> ________________________________________________ > >> R-help at r-project.org <mailto:R-help at r-project.org> > mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/__listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > >> PLEASE do read the posting guide > >> http://www.R-project.org/__posting-guide.html > <http://www.R-project.org/posting-guide.html> > >> and provide commented, minimal, self-contained, > reproducible > code. > >> > > > [[alternative HTML version deleted]] > > > ________________________________________________ > > R-help at r-project.org <mailto:R-help at r-project.org> > mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/__listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > > PLEASE do read the posting guide > http://www.R-project.org/__posting-guide.html > <http://www.R-project.org/posting-guide.html> > > and provide commented, minimal, self-contained, > reproducible code. > > ________________________________________________ > R-devel at r-project.org <mailto:R-devel at r-project.org> mailing > list > https://stat.ethz.ch/mailman/__listinfo/r-devel > <https://stat.ethz.ch/mailman/listinfo/r-devel> > > > > ________________________________________________ > R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list > https://stat.ethz.ch/mailman/__listinfo/r-devel > <https://stat.ethz.ch/mailman/listinfo/r-devel> > > > > > -- > Gabriel Becker, PhD > Computational Biologist > Bioinformatics and Computational Biology > Genentech, Inc.-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
Maybe Matching Threads
- [R] Why does R replace all row values with NAs
- [R] Why does R replace all row values with NAs
- [R] Why does R replace all row values with NAs
- function(x) !is.na(x) & x "is_true(.)" etc. was: [R] How to index..
- Efficiency question: replacing all NAs with a zero