Allen Bingham
2015-Jan-25 23:21 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
I understand that in order to get the sum function to ignore missing values I need to supply the argument na.rm=TRUE. However, when summing numeric values in which ALL components are "NA" ... the result is 0.0 ... instead of NA, which is what I would get from SAS (where a missing value is shown as ".").

Accordingly, I've had to go to 'extreme' measures to get the sum function to result in NA if all arguments are missing (otherwise give me a sum of all non-NA elements).

So for example here's a snippet of code that ALMOST does what I want:

  SumValue<-apply(subset(InputDataFrame,!is.na(Variable.1)|!is.na(Variable.2),
                         select=c(Variable.1,Variable.2)),1,sum,na.rm=TRUE)

In reality this does NOT give me records with NA for SumValue ... but it doesn't give me values for any records in which both Variable.1 and Variable.2 are NA --- which is "good enough" for my purposes.

I'm guessing with a little more work I could come up with a way to adapt the code above so that I could get it to work like SAS's sum function ...

... but before I go that extra mile I thought I'd ask others if they know of functions in either base R ... or in a package that will better mimic the SAS sum function.

Any suggestions?

Thanks.
______________________________________
Allen Bingham
aebingham2 at gmail.com
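To make the behaviour described in this post concrete, here is a minimal sketch of the snippet run against a small made-up data frame. The values in InputDataFrame are invented for illustration (the real data were not posted); only the column names come from the post.

```r
# Hypothetical stand-in for InputDataFrame; the third row is missing in both columns.
InputDataFrame <- data.frame(
  Variable.1 = c(1, NA, NA),
  Variable.2 = c(2, 5, NA)
)

# subset() drops every row where both variables are NA before anything is summed.
kept <- subset(InputDataFrame,
               !is.na(Variable.1) | !is.na(Variable.2),
               select = c(Variable.1, Variable.2))

# Row-wise sums of the remaining rows, ignoring the NAs that are left.
SumValue <- apply(kept, 1, sum, na.rm = TRUE)

print(nrow(InputDataFrame))  # 3 rows went in
print(length(SumValue))      # 2 sums came out
print(unname(SumValue))      # 3 5
```

The all-missing row is simply absent from SumValue rather than present with an NA, so the result can no longer be attached to the original data frame without extra bookkeeping.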
John Fox
2015-Jan-26 00:17 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
Dear Allen,

This seems reasonably straightforward to me, suggesting that I might not properly understand what you want to do. How about something like the following?

  > mysum <- function(...){
  +     x <- c(...)
  +     if (all(is.na(x))) NA else sum(x, na.rm=TRUE)
  + }
  > mysum(1, 2, 3, NA)
  [1] 6
  > mysum(1:3)
  [1] 6
  > mysum(NA, NA, NA)
  [1] NA
  > mysum(c(NA, NA, NA))
  [1] NA

I hope this helps,
 John

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
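As a sketch of how the mysum() function from this reply could be used row by row on a data frame, assuming a small made-up InputDataFrame (the values are invented; only the column names come from the original question):

```r
# mysum() as given in John Fox's reply: NA when every value is missing,
# otherwise the sum of the non-missing values.
mysum <- function(...) {
  x <- c(...)
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

# Hypothetical stand-in for InputDataFrame (values invented for illustration).
InputDataFrame <- data.frame(
  Variable.1 = c(1, NA, 3, NA),
  Variable.2 = c(10, 20, NA, NA)
)

# Apply mysum() to each row of the two columns; no rows are dropped.
InputDataFrame$SumValue <- apply(
  InputDataFrame[, c("Variable.1", "Variable.2")], 1, mysum
)

print(InputDataFrame$SumValue)  # 11 20  3 NA
```

Because every row is kept, the all-missing row shows up with SumValue equal to NA, which is the SAS-like result asked for.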
Jim Lemon
2015-Jan-26 00:21 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
Hi Allen,
How about this:

  sum_w_NA<-function(x) ifelse(all(is.na(x)),NA,sum(x,na.rm=TRUE))

Jim
peter dalgaard
2015-Jan-26 10:16 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
Ouch. Please avoid ifelse() in non-vectorized contexts. John Fox has the right idea.

-pd

On 26 Jan 2015, at 01:21 , Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Allen,
> How about this:
>
> sum_w_NA<-function(x) ifelse(all(is.na(x)),NA,sum(x,na.rm=TRUE))
>
> Jim

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
Martin Maechler
2015-Jan-26 12:45 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
>>>>> Jim Lemon <drjimlemon at gmail.com>
>>>>>     on Mon, 26 Jan 2015 11:21:03 +1100 writes:

    > Hi Allen, How about this:
    > sum_w_NA<-function(x) ifelse(all(is.na(x)),NA,sum(x,na.rm=TRUE))

Excuse me, Jim, but that's yet another "horrible misuse of ifelse()".

John Fox's reply *did* contain the "proper" solution

    if (all(is.na(x))) NA else sum(x, na.rm=TRUE)

The ifelse() function should never be used in such cases. Read more after googling

    "Do NOT use ifelse()"

-- include the quotes in your search -- or directly at
http://stat.ethz.ch/pipermail/r-help/2014-December/424367.html

Yes, this was on R-help a month ago.

Martin
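One reason for the advice against ifelse() with a single TRUE/FALSE test can be sketched as follows. ifelse() returns a value shaped like its test, so with a length-one test it silently keeps only the first element of whichever branch is chosen. The vector x and the use of range() are made up for this illustration; they are not from the thread.

```r
x <- c(4, NA, 9)

# ifelse() shapes its result like the test. The test here has length 1,
# so only the first element of range() survives.
bad <- ifelse(all(is.na(x)), NA, range(x, na.rm = TRUE))

# if / else evaluates exactly one branch and returns it whole.
good <- if (all(is.na(x))) NA else range(x, na.rm = TRUE)

print(bad)   # 4
print(good)  # 4 9
```

With sum() the chosen branch happens to have length one, so the ifelse() version returns the same number, but the if / else form does not depend on that coincidence.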
MacQueen, Don
2015-Jan-26 21:02 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
I'm a little puzzled by the assertion that the result is 0.0 when all the elements are NA:

  > sum(NA)
  [1] NA
  > sum(c(NA,NA))
  [1] NA
  > sum(rep(NA, 10))
  [1] NA
  > sum(as.numeric(letters[1:4]))
  [1] NA
  Warning message:
  NAs introduced by coercion

Considering that the example snippet of code has several other aspects besides using sum(), among them subsetting rows of a data frame when there are apparently NAs in some of its variables ... I wonder if the reason for the failure of that snippet has been misunderstood?

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Ista Zahn
2015-Jan-26 21:17 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
Try with na.rm=TRUE.

On Jan 26, 2015 4:04 PM, "MacQueen, Don" <macqueen1 at llnl.gov> wrote:

> I'm a little puzzled by the assertion that the result is 0.0 when all the
> elements are NA:
Allen Bingham
2015-Jan-26 21:49 UTC
[R] Sum function and missing values --- need to mimic SAS sum function
Don,

The default for the sum function is to NOT remove NA before summing (i.e., option na.rm=FALSE). Here are the results with na.rm=TRUE:

  > sum(NA,na.rm=TRUE)
  [1] 0
  > sum(c(NA,NA),na.rm=TRUE)
  [1] 0
  > sum(rep(NA,10),na.rm=TRUE)
  [1] 0
  > sum(as.numeric(letters[1:4]),na.rm=TRUE)
  [1] 0
  Warning message:
  NAs introduced by coercion

Hope that explains it a bit better.

Others have replied with suggested solutions to my 'problem', and the one by John Fox is what I need (an actual function that I can use in an apply statement), although the suggested code by Sven Templer is appealing in its simplicity.

Allen
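The zeros reported here follow from how na.rm=TRUE works: the NAs are dropped first, and the sum of what is left, an empty vector, is 0. A minimal sketch, in which the names all_missing, n_present and result are made up for illustration:

```r
all_missing <- c(NA_real_, NA_real_)

# With na.rm = TRUE the NAs are dropped first, leaving nothing to add.
print(sum(all_missing, na.rm = TRUE))  # 0
print(sum(numeric(0)))                 # 0, the empty sum

# Counting the non-missing values tells "all missing" apart from a true zero.
n_present <- sum(!is.na(all_missing))
result <- if (n_present == 0) NA_real_ else sum(all_missing, na.rm = TRUE)
print(result)  # NA
```

Returning NA_real_ rather than a plain NA keeps the result numeric in both branches, which can matter when many results are later combined into one vector.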