gregory_r_warnes@groton.pfizer.com
2003-May-22 22:33 UTC
[Rd] grep, gsub, sub have problems with NA values (PR#3078)
In a string context, grep, gsub, and sub improperly treat NA (missing) as the string "NA" and return unexpected results:

> grep("A", c(NA,"NA"))
[1] 1 2
# expected:
# [1] 2

> gsub("A", "X", c(NA,"NA"))
[1] "NX" "NX"
# expected:
# [1] NA "NX"

> sub("A", "X", c(NA,"NA"))
[1] "NX" "NX"
# expected:
# [1] NA "NX"

These same functions also don't like 'bare' NAs, presumably because a bare NA is technically a logical value rather than a character string.

> grep("A", NA)
Error in grep(pattern, x, ignore.case, extended, value) : invalid argument

This is understandable to users who are aware of the actual class of NA, but it would be helpful if bare NAs were treated the same as character NAs (once handling of these is fixed, of course!).

-Greg
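For readers who need the expected results regardless of how a given R version handles NA here, a minimal sketch of NA-preserving wrappers follows. The names grep_na_safe and sub_na_safe are hypothetical helpers, not part of base R, and the sketch assumes an R version that provides grepl.

```r
# Hypothetical helpers (not part of base R): keep missing values missing
# instead of letting them be matched as the string "NA".
grep_na_safe <- function(pattern, x) {
  x <- as.character(x)                  # a bare NA is logical; coerce it first
  which(!is.na(x) & grepl(pattern, x))  # NA elements never count as matches
}

sub_na_safe <- function(pattern, replacement, x, global = FALSE) {
  x <- as.character(x)
  ok <- !is.na(x)                       # only touch non-missing elements
  f <- if (global) gsub else sub
  x[ok] <- f(pattern, replacement, x[ok])
  x
}

grep_na_safe("A", c(NA, "NA"))          # index 2 only
sub_na_safe("A", "X", c(NA, "NA"))      # NA stays NA, "NA" becomes "NX"
grep_na_safe("A", NA)                   # no matches, no error
```

Masking the NA elements before calling sub or gsub means the result does not depend on how the underlying function treats missing values.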
Thomas Lumley
2003-May-23 17:46 UTC
[Rd] grep, gsub, sub have problems with NA values (PR#3078)
On Thu, 22 May 2003 gregory_r_warnes@groton.pfizer.com wrote:
>
> In a string context, grep, gsub, sub are improperly treating NA (missing) as
> the string "NA", and returning unexpected results
>

as were chartr, abbreviate, substr, substring, strsplit. Fixed in r-devel, for the case of NA in the `main' string. Haven't yet decided what to do about
    grep(as.character(NA), x)
or
    substr(x,1,2)<-as.character(NA)

-thomas
Warnes, Gregory R
2003-May-23 23:22 UTC
[Rd] grep, gsub, sub have problems with NA values (PR#3078)
formatC also has the reverse problem: it detects any factor containing the string "NA" and converts it to a factor:

> formatC(factor("NAME"),width=8)
[1] <NA>
Levels: NAME

-Greg
Warnes, Gregory R
2003-May-24 14:17 UTC
[Rd] grep, gsub, sub have problems with NA values (PR#3078)
I see that this came out garbled. It should have read:

formatC also has problems: it incorrectly converts any factor level *containing* the characters 'NA' to a missing value.

> formatC(factor("NAME"),width=8)
[1] <NA>
Levels: NAME

-G
Warnes, Gregory R
2003-May-24 14:36 UTC
[Rd] grep, gsub, sub have problems with NA values (PR#3078)
Oh dear, more careful checking shows that all elements of a factor get converted to NA by formatC, but the results retain the factor levels:

> x <- factor(letters[1:5])
> formatC(x, width=8)
[1] <NA> <NA> <NA> <NA> <NA>
Levels: a b c d e

I have a hard time justifying this behavior. I expected it to act like format.char:

> format.char(x,width=8)
[1] "a " "b " "c " "d " "e "
Warning message:
format.char: coercing 'x' to 'character' in: format.char(x, width = 8)

This came up while formatting all of the elements of a data frame to have width 8, so that I could create a fixed-width output file...

-G
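One way to sidestep the factor issue for the fixed-width use case is to convert each column to character before padding. The following is only a sketch under that assumption; the data frame df and the helper pad8 are made up for illustration, and flag = "-" asks formatC to left-justify.

```r
# Hypothetical helper: coerce to character first, then pad to 8 characters.
pad8 <- function(col) formatC(as.character(col), width = 8, flag = "-")

x <- factor(letters[1:5])
padded <- pad8(x)                        # character vector, each element 8 wide

# Illustrative data frame (invented for this example), including the
# "NAME" level from the earlier report.
df <- data.frame(name = factor(c("NAME", "b")), value = c(1.5, 22))
cols <- lapply(df, pad8)                 # pad every column
rows <- do.call(paste0, unname(cols))    # one fixed-width line per row
writeLines(rows)
```

Because the coercion happens before formatC sees the data, the outcome does not depend on how formatC treats factors; values longer than 8 characters are not truncated by this sketch.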