I know how to get the output I need, but I would benefit from an explanation why R behaves the way it does.

# I have a data frame x:
x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
x
# I want to toss rows in x that contain values >=6. But I don't want to toss my NAs there.

subset(x,c<6)       # Works correctly, but removes NAs in c, understand why
x[which(x$c<6),]    # Works correctly, but removes NAs in c, understand why
x[-which(x$c>=6),]  # output I need

# Here is my question: why does the following line replace the values of all rows that contain an NA
# in x$c with NAs?

x[x$c<6,]                    # Leaves rows with c=NA, but makes the whole row an NA. Why???
x[(x$c<6) | is.na(x$c),]     # output I need - I have to be super-explicit

Thank you very much!

-- 
Dimitri Liakhovitski
On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
> I know how to get the output I need, but I would benefit from an
> explanation why R behaves the way it does.
>
> # I have a data frame x:
> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
> x
> # I want to toss rows in x that contain values >=6. But I don't want
> to toss my NAs there.
>
> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
> x[-which(x$c>=6),] # output I need
>
> # Here is my question: why does the following line replace the values
> of all rows that contain an NA # in x$c with NAs?
>
> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why???
> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>
> Thank you very much!

Most of your examples (except the ones using which()) are doing logical
indexing. In logical indexing, TRUE keeps a line, FALSE drops the line,
and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the
third kind of indexing.

Your last example works because in the cases where x$c is NA, it
evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c
is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
which will be either TRUE or FALSE.

Duncan Murdoch
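The three kinds of logical indexing described above can be checked on a plain vector, where the effect is easier to see than on a data frame. What follows is only a small sketch: the toy vector v and the names kept, cond, keep and res are made up for illustration and do not appear in the original post.

```r
# Hypothetical toy vector, used only for illustration
v <- c(10, 20, 30)

# TRUE keeps the element, FALSE drops it, NA returns NA
kept <- v[c(TRUE, FALSE, NA)]
print(kept)

# R's three-valued logic: NA | TRUE is TRUE, while NA | FALSE stays NA
print(NA | TRUE)
print(NA | FALSE)

# The data frame from the original post
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# The comparison is NA wherever c is NA
cond <- x$c < 6
print(cond)

# OR-ing with is.na() turns those NAs into TRUE, so the index contains
# only TRUE and FALSE and the rows with c = NA come back intact
keep <- cond | is.na(x$c)
res <- x[keep, ]
print(res)
```

Because keep has no NA left in it, only the first two kinds of indexing apply, which is why the "super-explicit" form gives the wanted output.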
So, Duncan, do I understand you correctly:

When I use x$c<6, R doesn't know if it's TRUE or FALSE for the rows where
c is NA, so it returns a logical value of NA for them.

When this logical value is applied to a row, R says: hell, I don't
know if I should keep it or not, so, just in case, I am going to keep
it, but I'll replace all the values in this row with NAs?

On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Most of your examples (except the ones using which()) are doing logical
> indexing. In logical indexing, TRUE keeps a line, FALSE drops the line,
> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the
> third kind of indexing.
>
> Your last example works because in the cases where x$c is NA, it
> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c
> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
> which will be either TRUE or FALSE.
>
> Duncan Murdoch

-- 
Dimitri Liakhovitski
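One way to examine the reading above is to look at what the rows produced by an NA index actually contain. The sketch below reuses the data frame from the original post; the names res, na_rows, idx and none are made up for illustration. It also shows the which() variants from the original post for comparison, including a caveat about negative indexing when nothing matches.

```r
# The data frame from the original post
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Logical index containing NAs: one result row per TRUE and one per NA
res <- x[x$c < 6, ]
print(nrow(res))

# The rows that came from an NA index hold NA in every column,
# including a and b, which had no missing values in x
na_rows <- res[is.na(res$c), ]
print(na_rows)
print(all(is.na(na_rows)))

# which() returns only the positions where the condition is TRUE,
# so NAs in the condition are silently dropped
idx <- which(x$c >= 6)
print(idx)
print(nrow(x[-idx, ]))

# Caveat: if no row matches, which() returns integer(0), and indexing
# with -integer(0) selects zero rows instead of all of them
none <- which(x$c >= 100)
print(nrow(x[-none, ]))
```

The zero-row result in the last step is one reason the explicit form (x$c<6) | is.na(x$c) may be the safer habit, even though it is more verbose.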