Hello,

I am currently working with a dataframe which has some missing values represented by "NA". Whenever I add two columns in which at least one value of the pair for an observation is "NA", the sum returns zero. That is, for the same observation, if

dataframe$A = 20
dataframe$B = NA

then dataframe$A + dataframe$B returns zero.

I do not want to delete the observations with the NAs. How do I go about carrying out the necessary operations without deleting the observations with the NAs?

Thank you
That is not how R works. 20+NA is NA, which is not the same as zero. This is not optional behaviour.

I notice that you put quotes around the NA. If those really are there, then you should be getting an error.

You need to assemble a reproducible example, such as is described at [1]. By doing so you will either see your mistake or have something we can help you debug. Be sure to give one or more examples of the results you expect to obtain, since your email below does not indicate what your desired result is.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On September 6, 2015 3:22:32 PM PDT, Olu Ola via R-help <r-help at r-project.org> wrote:
>Hello,
>I am currently working with a dataframe which has some missing values
>represented by "NA". [...]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
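A minimal sketch of the kind of reproducible example asked for above. The two-row data frame is made up for illustration; only the column names A and B come from the question. It shows that a sum involving a real NA is NA, and that a quoted "NA" is a character string that cannot be added at all.

```r
# Hypothetical two-row data frame; only the names A and B come from the question.
dataframe <- data.frame(A = c(20, 5), B = c(NA, 3))

# Element-wise addition: any pair containing NA gives NA, never zero.
total <- dataframe$A + dataframe$B
print(total)    # first element is NA, second is 8

# A quoted "NA" is an ordinary character string, not a missing value.
print(is.na(NA))      # TRUE
print(is.na("NA"))    # FALSE

# Adding a number to a character string is an error; capture its message.
result <- tryCatch(20 + "NA", error = function(e) conditionMessage(e))
print(result)
```

If the real data behaves differently, running str(dataframe) would show whether the columns are numeric or character, which is usually the first thing to check.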
On 07/09/15 10:22, Olu Ola via R-help wrote:
> Hello, I am currently working with a dataframe which has some missing
> values represented by "NA". whenever, I add two columns in which at
> least one of the pair of an observation is "NA", the sum returns
> zero. That is for the same observation, if
>
> dataframe$A = 20 dataframe$B = NA
>
> dataframe$A + dataframe$B returns zero.

No it does not. It returns NA. As it should.

> I do not want to delete the observations with the NA's. How do I go
> about carrying out the necessary operations without deleting the
> observations with the NA's.

Your question seems to demonstrate a substantial amount of confusion. Amongst other things you probably want to deal with vectors (or perhaps matrices) rather than data frames.

To sum a numeric vector, ignoring missing values, you can use the sum() function, setting the argument "na.rm" to TRUE. E.g.

    v <- c(1,NA,2,NA,3,NA,4,NA)
    sum(v,na.rm=TRUE) # Gives 10.

Ignore other advice that you were given, to replace NAs in your data frame (???) by zeroes. That is very dangerous, misleading and confusing. "Missing" and "zero" are *VERY* different concepts.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
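To illustrate why "missing" and "zero" are different, here is a small sketch that reuses the vector v from the message above. The name v_zero is introduced only for this illustration. Skipping the NAs and replacing them with zeros give the same sum but different means, because the zeros are counted in the denominator.

```r
v <- c(1, NA, 2, NA, 3, NA, 4, NA)

# Skipping NAs: only the four observed values are used.
sum_observed  <- sum(v, na.rm = TRUE)     # 10
mean_observed <- mean(v, na.rm = TRUE)    # 2.5, i.e. 10 / 4

# Replacing NAs by zero: all eight positions now count as observations.
v_zero    <- ifelse(is.na(v), 0, v)       # v_zero is a hypothetical name
mean_zero <- mean(v_zero)                 # 1.25, i.e. 10 / 8

print(c(sum_observed, mean_observed, mean_zero))
```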
# It's not clear what your question is, but here's a stab in the dark at a solution!
ind <- !is.na(dataframe$A) & !is.na(dataframe$B)
dataframe$A[ind] + dataframe$B[ind]

- Dan

P.S. I'm sure there are ways to do this using one of R's options for automatically removing NAs (na.rm = TRUE), but unless you use them all the time, their behavior is not always predictable (e.g., sometimes a whole row with one NA in it is ignored; sometimes NAs are treated as zeros). Explicitly defining the indices that you want to exclude may make it easier to avoid difficult-to-find errors.

--
View this message in context: http://r.789695.n4.nabble.com/Handling-NA-in-summation-tp4711923p4711932.html
Sent from the R help mailing list archive at Nabble.com.
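The following sketch, on made-up data, shows what the explicit index above returns and contrasts it with one na.rm-based alternative, rowSums(). The three-row data frame and the names both_present and row_totals are assumptions for illustration. Note that the indexed sum is shorter than the data frame, and that rowSums(..., na.rm = TRUE) gives 0 for a row in which every value is missing.

```r
# Hypothetical data: row 1 has one NA, row 2 is complete, row 3 is all NA.
dataframe <- data.frame(A = c(20, 10, NA), B = c(NA, 5, NA))

# Explicit index: TRUE only where both values are observed.
ind <- !is.na(dataframe$A) & !is.na(dataframe$B)
both_present <- dataframe$A[ind] + dataframe$B[ind]
print(both_present)    # 15 (one value, because only row 2 is complete)

# complete.cases() builds the same logical index.
print(all(ind == complete.cases(dataframe[, c("A", "B")])))    # TRUE

# rowSums() with na.rm = TRUE keeps every row, but an all-NA row becomes 0.
row_totals <- unname(rowSums(dataframe[, c("A", "B")], na.rm = TRUE))
print(row_totals)      # 20 15 0
```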
Hello,

I need to first apologize for the error in my first question:

dataframe$A = 20
dataframe$B = NA

dataframe$A + dataframe$B actually returns NA.

You quite understand my point of view. This is household-level data where you need to compute the total income of each household member before aggregating by household.

Assume you have a household with 5 members:
4 out of the 5 household members have a full-time job
3 of the household members do not have a part-time job, so the part-time job column records NA for these three household members
1 of the household members has neither a full-time nor a part-time job

When I add the full-time job and part-time job columns for the five household members, R returns NA as the total income for the two household members whose total income should at least equal their full-time job income. Based on the scenario described above, only one of the household members should have NA for the total income.

This is just the first step, because subsequently I will need to compute the mean. If I go ahead and replace the NAs with zeros, it will bias my mean. So all I need is a way to retain my NAs so that my mean and other relevant computations will not be biased.

Thank you

--------------------------------------------
On Sun, 9/6/15, Rolf Turner <r.turner at auckland.ac.nz> wrote:

 Subject: Re: [FORGED] [R] Handling "NA" in summation
 Date: Sunday, September 6, 2015, 7:16 PM

 [...]
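One possible way to get the totals described in the household scenario is sketched below. All of the data and the column names household, fulltime, parttime and total are hypothetical. It assumes that a member with at least one observed income should get the sum of the observed values, and that the total should stay NA only when both incomes are missing. Whether an NA part-time income really means "no part-time income" is a judgment about the data that the code cannot make.

```r
# Hypothetical five-member household: two members have both incomes,
# two have a full-time income only, and one has neither.
income <- data.frame(
  household = c(1, 1, 1, 1, 1),
  fulltime  = c(300, 250, 400, 150, NA),
  parttime  = c(50, NA, NA, 20, NA)
)

jobs <- income[, c("fulltime", "parttime")]

# Row-wise sum over the observed values only.
income$total <- unname(rowSums(jobs, na.rm = TRUE))

# rowSums() returns 0 when a row has no observed values; put NA back there.
all_missing <- rowSums(!is.na(jobs)) == 0
income$total[all_missing] <- NA

print(income$total)    # 350 250 400 170 NA

# The mean skips the member with no income data instead of counting a zero.
mean_income <- mean(income$total, na.rm = TRUE)    # 292.5

# Aggregate by household; the formula interface drops rows with an NA total.
household_total <- aggregate(total ~ household, data = income, FUN = sum)
print(household_total)    # household 1, total 1170
```

The NA is restored explicitly because rowSums() alone would report an income of 0 for the fifth member, which would pull the mean down in the way the question wants to avoid.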