Thank you very much, Duncan. All this being said: What would you say is the most elegant and most safe way to solve such a seemingly simple task? Thank you! On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >> So, Duncan, do I understand you correctly: >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns >> a logical value of NA. > > Yes, when x$x is NA. (Though I think you meant x$c.) > >> When this logical value is applied to a row, the R says: hell, I don't >> know if I should keep it or not, so, just in case, I am going to keep >> it, but I'll replace all the values in this row with NAs? > > Yes. Indexing with a logical NA is probably a mistake, and this is one > way to signal it without actually triggering a warning or error. > > BTW, I should have mentioned that the example where you indexed using > -which(x$c>=6) is a bad idea: if none of the entries were 6 or more, > this would be indexing with an empty vector, and you'd get nothing, not > everything. > > Duncan Murdoch > > >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >> <murdoch.duncan at gmail.com> wrote: >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >>>> I know how to get the output I need, but I would benefit from an >>>> explanation why R behaves the way it does. >>>> >>>> # I have a data frame x: >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >>>> x >>>> # I want to toss rows in x that contain values >=6. But I don't want >>>> to toss my NAs there. >>>> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why >>>> x[-which(x$c>=6),] # output I need >>>> >>>> # Here is my question: why does the following line replace the values >>>> of all rows that contain an NA # in x$c with NAs? >>>> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why??? >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit >>>> >>>> Thank you very much! >>> >>> Most of your examples (except the ones using which()) are doing logical >>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the line, >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the >>> third kind of indexing. >>> >>> Your last example works because in the cases where x$c is NA, it >>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6, >>> which will be either TRUE or FALSE. >>> >>> Duncan Murdoch >>> >> >> >> >-- Dimitri Liakhovitski
On 27/02/2015 10:27 AM, Dimitri Liakhovitski wrote:> Thank you very much, Duncan. > All this being said: > > What would you say is the most elegant and most safe way to solve such > a seemingly simple task?If you have NA values, test for them explicitly, e.g. your original x[(x$c<6) | is.na(x$c),] I would write it as x[is.na(x$c) | x$c < 6,] but that's purely a style difference, I don't think it would affect execution time (or results). I like to put the weird case first because it will remind me that things are more complicated than you might guess. Duncan Murdoch> > Thank you! > > On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch > <murdoch.duncan at gmail.com> wrote: >> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >>> So, Duncan, do I understand you correctly: >>> >>> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns >>> a logical value of NA. >> >> Yes, when x$x is NA. (Though I think you meant x$c.) >> >>> When this logical value is applied to a row, the R says: hell, I don't >>> know if I should keep it or not, so, just in case, I am going to keep >>> it, but I'll replace all the values in this row with NAs? >> >> Yes. Indexing with a logical NA is probably a mistake, and this is one >> way to signal it without actually triggering a warning or error. >> >> BTW, I should have mentioned that the example where you indexed using >> -which(x$c>=6) is a bad idea: if none of the entries were 6 or more, >> this would be indexing with an empty vector, and you'd get nothing, not >> everything. >> >> Duncan Murdoch >> >> >>> >>> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >>> <murdoch.duncan at gmail.com> wrote: >>>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >>>>> I know how to get the output I need, but I would benefit from an >>>>> explanation why R behaves the way it does. >>>>> >>>>> # I have a data frame x: >>>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >>>>> x >>>>> # I want to toss rows in x that contain values >=6. But I don't want >>>>> to toss my NAs there. >>>>> >>>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why >>>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why >>>>> x[-which(x$c>=6),] # output I need >>>>> >>>>> # Here is my question: why does the following line replace the values >>>>> of all rows that contain an NA # in x$c with NAs? >>>>> >>>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why??? >>>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit >>>>> >>>>> Thank you very much! >>>> >>>> Most of your examples (except the ones using which()) are doing logical >>>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the line, >>>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the >>>> third kind of indexing. >>>> >>>> Your last example works because in the cases where x$c is NA, it >>>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c >>>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6, >>>> which will be either TRUE or FALSE. >>>> >>>> Duncan Murdoch >>>> >>> >>> >>> >> > > >
You could define functions like is.true <- function(x) !is.na(x) & x is.false <- function(x) !is.na(x) & !x and use them in your selections. E.g., > x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) > x[is.true(x$c >= 6), ] a b c 7 7 8 7 10 10 11 10 Bill Dunlap TIBCO Software wdunlap tibco.com On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < dimitri.liakhovitski at gmail.com> wrote:> Thank you very much, Duncan. > All this being said: > > What would you say is the most elegant and most safe way to solve such > a seemingly simple task? > > Thank you! > > On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch > <murdoch.duncan at gmail.com> wrote: > > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: > >> So, Duncan, do I understand you correctly: > >> > >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns > >> a logical value of NA. > > > > Yes, when x$x is NA. (Though I think you meant x$c.) > > > >> When this logical value is applied to a row, the R says: hell, I don't > >> know if I should keep it or not, so, just in case, I am going to keep > >> it, but I'll replace all the values in this row with NAs? > > > > Yes. Indexing with a logical NA is probably a mistake, and this is one > > way to signal it without actually triggering a warning or error. > > > > BTW, I should have mentioned that the example where you indexed using > > -which(x$c>=6) is a bad idea: if none of the entries were 6 or more, > > this would be indexing with an empty vector, and you'd get nothing, not > > everything. > > > > Duncan Murdoch > > > > > >> > >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch > >> <murdoch.duncan at gmail.com> wrote: > >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: > >>>> I know how to get the output I need, but I would benefit from an > >>>> explanation why R behaves the way it does. > >>>> > >>>> # I have a data frame x: > >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) > >>>> x > >>>> # I want to toss rows in x that contain values >=6. But I don't want > >>>> to toss my NAs there. > >>>> > >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why > >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand > why > >>>> x[-which(x$c>=6),] # output I need > >>>> > >>>> # Here is my question: why does the following line replace the values > >>>> of all rows that contain an NA # in x$c with NAs? > >>>> > >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. > Why??? > >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be > super-explicit > >>>> > >>>> Thank you very much! > >>> > >>> Most of your examples (except the ones using which()) are doing logical > >>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the > line, > >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the > >>> third kind of indexing. > >>> > >>> Your last example works because in the cases where x$c is NA, it > >>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where > x$c > >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6, > >>> which will be either TRUE or FALSE. > >>> > >>> Duncan Murdoch > >>> > >> > >> > >> > > > > > > -- > Dimitri Liakhovitski > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Thank you very much guys! On Fri, Feb 27, 2015 at 11:04 AM, William Dunlap <wdunlap at tibco.com> wrote:> You could define functions like > is.true <- function(x) !is.na(x) & x > is.false <- function(x) !is.na(x) & !x > and use them in your selections. E.g., > > x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) > > x[is.true(x$c >= 6), ] > a b c > 7 7 8 7 > 10 10 11 10 > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski > <dimitri.liakhovitski at gmail.com> wrote: >> >> Thank you very much, Duncan. >> All this being said: >> >> What would you say is the most elegant and most safe way to solve such >> a seemingly simple task? >> >> Thank you! >> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch >> <murdoch.duncan at gmail.com> wrote: >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >> >> So, Duncan, do I understand you correctly: >> >> >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns >> >> a logical value of NA. >> > >> > Yes, when x$x is NA. (Though I think you meant x$c.) >> > >> >> When this logical value is applied to a row, the R says: hell, I don't >> >> know if I should keep it or not, so, just in case, I am going to keep >> >> it, but I'll replace all the values in this row with NAs? >> > >> > Yes. Indexing with a logical NA is probably a mistake, and this is one >> > way to signal it without actually triggering a warning or error. >> > >> > BTW, I should have mentioned that the example where you indexed using >> > -which(x$c>=6) is a bad idea: if none of the entries were 6 or more, >> > this would be indexing with an empty vector, and you'd get nothing, not >> > everything. >> > >> > Duncan Murdoch >> > >> > >> >> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >> >> <murdoch.duncan at gmail.com> wrote: >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >> >>>> I know how to get the output I need, but I would benefit from an >> >>>> explanation why R behaves the way it does. >> >>>> >> >>>> # I have a data frame x: >> >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >> >>>> x >> >>>> # I want to toss rows in x that contain values >=6. But I don't want >> >>>> to toss my NAs there. >> >>>> >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand >> >>>> why >> >>>> x[-which(x$c>=6),] # output I need >> >>>> >> >>>> # Here is my question: why does the following line replace the values >> >>>> of all rows that contain an NA # in x$c with NAs? >> >>>> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. >> >>>> Why??? >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be >> >>>> super-explicit >> >>>> >> >>>> Thank you very much! >> >>> >> >>> Most of your examples (except the ones using which()) are doing >> >>> logical >> >>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the >> >>> line, >> >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the >> >>> third kind of indexing. >> >>> >> >>> Your last example works because in the cases where x$c is NA, it >> >>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where >> >>> x$c >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6, >> >>> which will be either TRUE or FALSE. >> >>> >> >>> Duncan Murdoch >> >>> >> >> >> >> >> >> >> > >> >> >> >> -- >> Dimitri Liakhovitski >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > >-- Dimitri Liakhovitski
Diverted from R-help : .... as it gets into musing about new R language "primitives">>>>> William Dunlap <wdunlap at tibco.com> >>>>> on Fri, 27 Feb 2015 08:04:36 -0800 writes:> You could define functions like > is.true <- function(x) !is.na(x) & x > is.false <- function(x) !is.na(x) & !x > and use them in your selections. E.g., >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >> x[is.true(x$c >= 6), ] > a b c > 7 7 8 7 > 10 10 11 10 > Bill Dunlap > TIBCO Software > wdunlap tibco.com Yes; the Matrix package has had these is0 <- function(x) !is.na(x) & x == 0 isN0 <- function(x) is.na(x) | x != 0 is1 <- function(x) !is.na(x) & x # also == "isTRUE componentwise" namespace hidden for a while [note the comment of the last one!] and using them for readibility in its own code. Maybe we should (again) consider providing some versions of these with R ? The Matrix package also has had fast allFalse <- all0 <- function(x) .Call(R_all0, x) anyFalse <- any0 <- function(x) .Call(R_any0, x) ## ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0 ## any0 <- function(x) isTRUE(any(x == 0)) ## ~= anyFalse namespace hidden as well, already, which probably could also be brought to base R. One big reason to *not* go there (to internal C code) at all with R is that S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics) and 'is.na() have been known and package writers have programmed methods for these. To ensure that S3 and S4 dispatch works "correctly" also inside such new internals is much less easily achieved, and so such a C-based internal function is0() would no longer be equivalent with !is.na(x) & x == 0 as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() method. OTOH, simple R versions such as your 'is.true', called 'is1' inside Matrix maybe optimizable a bit by the byte compiler (and jit and other such tricks) and still keep the full semantic including correct method dispatch. Martin Maechler, ETH Zurich > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < > dimitri.liakhovitski at gmail.com> wrote: >> Thank you very much, Duncan. >> All this being said: >> >> What would you say is the most elegant and most safe way to solve such >> a seemingly simple task? >> >> Thank you! >> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch >> <murdoch.duncan at gmail.com> wrote: >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >> >> So, Duncan, do I understand you correctly: >> >> >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns >> >> a logical value of NA. >> > >> > Yes, when x$x is NA. (Though I think you meant x$c.) >> > >> >> When this logical value is applied to a row, the R says: hell, I don't >> >> know if I should keep it or not, so, just in case, I am going to keep >> >> it, but I'll replace all the values in this row with NAs? >> > >> > Yes. Indexing with a logical NA is probably a mistake, and this is one >> > way to signal it without actually triggering a warning or error. >> > >> > BTW, I should have mentioned that the example where you indexed using >> > -which(x$c>=6) is a bad idea: if none of the entries were 6 or more, >> > this would be indexing with an empty vector, and you'd get nothing, not >> > everything. >> > >> > Duncan Murdoch >> > >> > >> >> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >> >> <murdoch.duncan at gmail.com> wrote: >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >> >>>> I know how to get the output I need, but I would benefit from an >> >>>> explanation why R behaves the way it does. >> >>>> >> >>>> # I have a data frame x: >> >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >> >>>> x >> >>>> # I want to toss rows in x that contain values >=6. But I don't want >> >>>> to toss my NAs there. >> >>>> >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand >> why >> >>>> x[-which(x$c>=6),] # output I need >> >>>> >> >>>> # Here is my question: why does the following line replace the values >> >>>> of all rows that contain an NA # in x$c with NAs? >> >>>> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. >> Why??? >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be >> super-explicit >> >>>> >> >>>> Thank you very much! >> >>> >> >>> Most of your examples (except the ones using which()) are doing logical >> >>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the >> line, >> >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the >> >>> third kind of indexing. >> >>> >> >>> Your last example works because in the cases where x$c is NA, it >> >>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where >> x$c >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6, >> >>> which will be either TRUE or FALSE. >> >>> >> >>> Duncan Murdoch >> >>> >> >> >> >> >> >> >> > >> >> >> >> -- >> Dimitri Liakhovitski >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.