So, Duncan, do I understand you correctly:

When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
a logical value of NA.

When this logical value is applied to a row, R says: hell, I don't
know if I should keep it or not, so, just in case, I am going to keep
it, but I'll replace all the values in this row with NAs?

On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>> I know how to get the output I need, but I would benefit from an
>> explanation why R behaves the way it does.
>>
>> # I have a data frame x:
>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>> x
>> # I want to toss rows in x that contain values >=6. But I don't want
>> # to toss my NAs there.
>>
>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
>> x[-which(x$c>=6),] # output I need
>>
>> # Here is my question: why does the following line replace the values
>> # of all rows that contain an NA in x$c with NAs?
>>
>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why???
>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>>
>> Thank you very much!
>
> Most of your examples (except the ones using which()) are doing logical
> indexing. In logical indexing, TRUE keeps a line, FALSE drops the line,
> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the
> third kind of indexing.
>
> Your last example works because in the cases where x$c is NA, it
> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c
> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
> which will be either TRUE or FALSE.
>
> Duncan Murdoch

-- 
Dimitri Liakhovitski
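
The three outcomes of logical indexing described in the quoted reply can be checked directly on the data frame from the original post. This is only a small sketch; the variable names keep, na_rows, keep_or_na and wanted are introduced here for illustration and do not appear in the thread.

```r
# Data frame from the original post
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Comparing NA with 6 gives NA, so the index mixes TRUE, FALSE and NA
keep <- x$c < 6

# TRUE keeps the row, FALSE drops it, NA returns a row filled with NA
na_rows <- x[keep, ]
nrow(na_rows)             # 8: three real rows plus five all-NA rows
sum(is.na(na_rows$a))     # 5

# NA | TRUE is TRUE, so OR-ing with is.na() removes every NA from the index
keep_or_na <- keep | is.na(x$c)
wanted <- x[keep_or_na, ]
wanted$a                  # 1 2 3 4 5 6 8 9
```

Rows 7 and 10 (c = 7 and c = 10) are the only ones dropped in wanted, and the rows where c is NA come back with their original a and b values.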
On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
> So, Duncan, do I understand you correctly:
>
> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
> a logical value of NA.

Yes, when x$x is NA. (Though I think you meant x$c.)

> When this logical value is applied to a row, R says: hell, I don't
> know if I should keep it or not, so, just in case, I am going to keep
> it, but I'll replace all the values in this row with NAs?

Yes. Indexing with a logical NA is probably a mistake, and this is one
way to signal it without actually triggering a warning or error.

BTW, I should have mentioned that the example where you indexed using
-which(x$c>=6) is a bad idea: if none of the entries were 6 or more,
this would be indexing with an empty vector, and you'd get nothing, not
everything.

Duncan Murdoch
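
The -which() pitfall mentioned above is easy to reproduce. In the sketch below the threshold 100 is an arbitrary value chosen so that no entry of x$c matches, and the names drop, ok, none, empty and safe are made up for the illustration.

```r
# Data frame from the original post
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# With matches present, negative indexing behaves as hoped
drop <- which(x$c >= 6)       # 7 10
ok <- x[-drop, ]
nrow(ok)                      # 8

# With no matches, which() returns integer(0), and -integer(0) is still
# integer(0), so the index selects zero rows instead of all of them
none <- which(x$c >= 100)
empty <- x[-none, ]
nrow(empty)                   # 0

# The explicit logical form from the original post has no such edge case
safe <- x[(x$c < 100) | is.na(x$c), ]
nrow(safe)                    # 10
```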
Thank you very much, Duncan.
All this being said:

What would you say is the most elegant and safest way to solve such a
seemingly simple task?

Thank you!

-- 
Dimitri Liakhovitski
> On 27 Feb 2015, at 16:02, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> Yes. Indexing with a logical NA is probably a mistake, and this is one
> way to signal it without actually triggering a warning or error.

There are cases where it isn't (usually) a mistake, e.g.
pch=c(25,24)[sex], where it is quite crucial that the result has the
same length as the index (i.e., sex) and where it makes good sense to
use an NA plotting character if sex is unknown.

For logical index, it is harder to come up with a good excuse for the NA
behaviour, except that R's NA is by default logical so there would be
trouble explaining differences between c(x[NA], x[1]) and x[c(NA, 1)].

(The annoyance of getting a data frame half-full of NA was the reason
that subset() was written so that it removes rows corresponding to NA
indices).

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
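
The points in this message can be tried out with a few lines. This is a sketch under assumptions: sex is taken to be coded as 1 and 2 with NA for unknown (the thread does not say how it is stored), and the vector v and the other names are invented for the example.

```r
# Hypothetical coding of sex: 1 or 2, NA when unknown
sex <- c(1, 2, NA, 1)

# A positional index containing NA yields NA in that slot, so the result
# stays aligned with (and as long as) the index
pch <- c(25, 24)[sex]
pch                           # 25 24 NA 25

# A bare NA is logical, so it is recycled like any logical index
v <- c(10, 20, 30)
len_logical <- length(v[NA])            # 3
len_integer <- length(v[NA_integer_])   # 1
mixed <- v[c(NA, 1)]                    # NA 10: c(NA, 1) is numeric

# subset() drops the rows whose condition evaluates to NA
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))
kept <- subset(x, c < 6)
kept$a                        # 1 3 5
```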