Jameson C . Burt
2001-Dec-05 22:00 UTC
[R] Detecting numerical value in character variable
I have a variable that can have either numeric or character values.
When numeric, I take one action; when not numeric, I take another action.
Unfortunately, my approaches are awkward, so I look for others' approaches.

To detect a numeric value, I have semi-successfully used two approaches.
I somewhat simplify here using direct character values like "123" rather than a variable.

1.  !is.na(as.numeric("123"))
    which responds "TRUE", but
        !is.na(as.numeric("abc"))
    responds
        FALSE    #so I know it is not numeric
        Warning message:
        NAs introduced by coercion
    This all works well enough except the warning message looks bad
    when printed, and hints that I use the wrong approach.

2.  !as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
    This responds "TRUE" for the string "123" having only numeric characters.
    However, notice how harsh this is on the reader.
    Unfortunately, "regexpr" here responds in -1 and 1 rather than FALSE and TRUE,
    so this becomes an extra verbose approach.

My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
One first imagines trying,
    is.numeric("123")
but this responds FALSE, telling us merely that this is a character string.

This problem arises in an R program I have used for years to balance my checkbook,
producing 5 lines identical to my bank's statement.
I input my checkbook data from a file with one natural column having entries like
(excluding # comments),
    3117              #check number
    SALARY:10-1-01    #salary deposited on 10/1/2001
    TRANSF:10-23-01   #transfer between accounts on 10/23/2001
These non-numerical descriptive entries speed balancing my checkbook,
especially when I err.

--
Jameson C. Burt, NJ9L   Fairfax, Virginia, USA
jameson at coost.com       http://www.coost.com
(202) 690-0380 (work)

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Prof Brian Ripley
2001-Dec-06 08:22 UTC
[R] Detecting numerical value in character variable
On Wed, 5 Dec 2001, Jameson C . Burt wrote:

> I have a variable that can have either numeric or character values.
> When numeric, I take one action; when not numeric, I take another action.
> Unfortunately, my approaches are awkward, so I look for others' approaches.
>
> To detect a numeric value, I have semi-successfully used two approaches.
> I somewhat simplify here using direct character values like "123" rather than a variable.
> 1. !is.na(as.numeric("123"))
>    which responds "TRUE", but
>        !is.na(as.numeric("abc"))
>    responds
>        FALSE    #so I know it is not numeric
>        Warning message:
>        NAs introduced by coercion
>    This all works well enough except the warning message looks bad
>    when printed, and hints that I use the wrong approach.

That is the best current approach. Set options(warn=-1) around the piece
of code using it.

Another approach in 1.4.0 (real soon now) is to use type.convert, and check
if the answer is of mode "numeric".

> 2. !as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
>    This responds "TRUE" for the string "123" having only numeric characters.
>    However, notice how harsh this is on the reader.

Well, numbers can have decimal points in them, and you are only testing
whether any character is non-numeric.

    regexpr("[^.0-9]","123") == -1

would be pretty good. This would not allow exponential notation nor
Inf or -Inf, though.

>    Unfortunately, "regexpr" here responds in -1 and 1 rather than FALSE and TRUE,
>    so this becomes an extra verbose approach.

See above.

> My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
> One first imagines trying,
>     is.numeric("123")
> but this responds FALSE, telling us merely that this is a character string.

Correct, as documented.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
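
The two alternatives mentioned in this reply (a character-class test and type.convert) might be sketched as below. This is only an illustration under stated assumptions: the helper names looks_numeric_regex and looks_numeric_convert are hypothetical and do not come from the thread, the sample entries are taken from the original question plus one made-up decimal value, and the code assumes a reasonably current R in which grepl and the as.is argument of type.convert are available.

```r
# Hypothetical helpers; the names are illustrative, not from the thread.

# Character-class test: TRUE when the string is non-empty and contains
# nothing but digits and decimal points. As noted in the reply, this
# rejects signs, exponential notation, Inf and -Inf; it also accepts
# oddities such as "1.2.3".
looks_numeric_regex <- function(x) {
  nzchar(x) & !grepl("[^.0-9]", x)
}

# type.convert test: convert each element on its own and check whether
# the converted value is numeric (integer or double).
looks_numeric_convert <- function(x) {
  vapply(x,
         function(s) is.numeric(type.convert(s, as.is = TRUE)),
         logical(1),
         USE.NAMES = FALSE)
}

# Entries from the checkbook example, plus an assumed decimal amount.
entries <- c("3117", "SALARY:10-1-01", "TRANSF:10-23-01", "12.50")

looks_numeric_regex(entries)    # TRUE FALSE FALSE TRUE
looks_numeric_convert(entries)  # TRUE FALSE FALSE TRUE
```

The two helpers disagree on input such as "1e3": the character-class test rejects it, while type.convert reads it as a number. Which behaviour is wanted depends on what the data file may contain.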
Jameson C . Burt <jameson at monumental.com> writes:

>I have a variable that can have either numeric or character values.
>When numeric, I take one action; when not numeric, I take another action.
>Unfortunately, my approaches are awkward, so I look for others' approaches.
>
>To detect a numeric value, I have semi-successfully used two approaches.
>I somewhat simplify here using direct character values like "123" rather than a variable.
>1. !is.na(as.numeric("123"))
>   which responds "TRUE", but
>       !is.na(as.numeric("abc"))
>   responds
>       FALSE    #so I know it is not numeric
>       Warning message:
>       NAs introduced by coercion
>   This all works well enough except the warning message looks bad
>   when printed, and hints that I use the wrong approach.
>
>2. !as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
>   This responds "TRUE" for the string "123" having only numeric characters.
>   However, notice how harsh this is on the reader.
>
>   Unfortunately, "regexpr" here responds in -1 and 1 rather than FALSE and TRUE,
>   so this becomes an extra verbose approach.
>
>My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
>One first imagines trying,
>    is.numeric("123")
>but this responds FALSE, telling us merely that this is a character string.
>
>This problem arises in an R program I have used for years to balance my checkbook,
>producing 5 lines identical to my bank's statement.
>I input my checkbook data from a file with one natural column having entries like
>(excluding # comments),
>    3117              #check number
>    SALARY:10-1-01    #salary deposited on 10/1/2001
>    TRANSF:10-23-01   #transfer between accounts on 10/23/2001
>These non-numerical descriptive entries speed balancing my checkbook,
>especially when I err.

Your first solution is fine:

    a <- c("a", "b", 3, 4, "f")
    b <- as.numeric(a)
    a[!is.na(b)]

but gives warnings.
Suppress them with options():

    a <- c("a", "b", 3, 4, "f")
    options(warn = -1)
    b <- as.numeric(a)
    a[!is.na(b)]

Remember to reinstate warnings when you are finished:

    options(warn = 1)

See help(options).

Mark

--
Mark Myatt
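
As a rough sketch of the same idea with the warning silenced more locally: base R's suppressWarnings() mutes warnings for a single expression, and options() returns the previous settings so they can be restored without assuming what they were (the default warn level is 0, not 1). The helper name is_numeric_string is hypothetical and not from the thread, and the code assumes a current R rather than the version in use when the thread was written.

```r
# Hypothetical helper; the name is illustrative, not from the thread.
is_numeric_string <- function(x) {
  # suppressWarnings() silences the coercion warning for this call only,
  # so the global warning level is never changed.
  !is.na(suppressWarnings(as.numeric(x)))
}

a <- c("a", "b", 3, 4, "f")   # stored as the strings "a" "b" "3" "4" "f"
a[is_numeric_string(a)]       # "3" "4"

# If options() is preferred, saving the old value and restoring it
# avoids guessing what the warning level was beforehand.
old <- options(warn = -1)
b <- as.numeric(a)
options(old)
a[!is.na(b)]                  # "3" "4"
```

Note that this test treats a missing value or the literal string "NA" as non-numeric, since both coerce to NA.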