Jameson C . Burt
2001-Dec-05 22:00 UTC
[R] Detecting numerical value in character variable
I have a variable that can have either numeric or character values.
When numeric, I take one action; when not-numeric, I take another action.
Unfortunately, my approaches are awkward, so I look for others' approaches.
To detect a numeric value, I have semi-successfully used two approaches.
I somewhat simplify here using direct character values like "123"
rather than a variable.
1. !is.na(as.numeric("123"))
which responds "TRUE", but
!is.na(as.numeric("abc"))
responds
FALSE #so I know it is not numeric
Warning message:
NAs introduced by coercion
This all works well enough, except that the warning message looks bad
when printed and hints that I am using the wrong approach.
2.
!as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
This responds "TRUE" for the string "123" having only
numeric characters.
However, notice how harsh this is on the reader.
Unfortunately, "regexpr" here responds in -1 and 1 rather than
FALSE and TRUE,
so this becomes an especially verbose approach.
My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
One first imagines trying,
is.numeric("123")
but this responds FALSE, telling us merely that this is a character string.
This problem arises in an R program I have used for years to balance my
checkbook,
producing 5 lines identical to my bank's statement.
I input my checkbook data from a file with one natural column having entries
like
(excluding # comments),
3117 #check number
SALARY:10-1-01 #salary deposited on 10/1/2001
TRANSF:10-23-01 #transfer between accounts on 10/23/2001
These non-numerical descriptive entries speed balancing my checkbook,
especially when I make an error.
--
Jameson C. Burt, NJ9L Fairfax, Virginia, USA
jameson at coost.com http://www.coost.com
(202) 690-0380 (work)
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Prof Brian Ripley
2001-Dec-06 08:22 UTC
[R] Detecting numerical value in character variable
On Wed, 5 Dec 2001, Jameson C . Burt wrote:

> I have a variable that can have either numeric or character values.
> When numeric, I take one action; when not-numeric, I take another action.
> Unfortunately, my approaches are awkward, so I look for others' approaches.
>
> To detect a numeric value, I have semi-successfully used two approaches.
> I somewhat simplify here using direct character values like "123"
> rather than a variable.
> 1. !is.na(as.numeric("123"))
>    which responds "TRUE", but
>    !is.na(as.numeric("abc"))
>    responds
>    FALSE #so I know it is not numeric
>    Warning message:
>    NAs introduced by coercion
>    This all works well enough except the error message looks bad
>    when printed, and hints that I use the wrong approach.

That is the best current approach. Set options(warn=-1) around the piece
of code using it. Another approach in 1.4.0 (real soon now) is to use
type.convert and check whether the answer is of mode "numeric".

> 2. !as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
>    This responds "TRUE" for the string "123" having only numeric characters.
>    However, notice how harsh this is on the reader.

Well, numbers can have decimal points in them, and you are only testing
whether any character is non-numeric.

    regexpr("[^.0-9]","123") == -1

would be pretty good. This would not allow exponential notation, Inf
or -Inf, though.

>    Unfortunately, "regexpr" here responds in -1 and 1 rather than FALSE and TRUE,
>    so this becomes an extra verbose approach.

See above.

> My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
> One first imagines trying,
>    is.numeric("123")
> but this responds FALSE, telling us merely that this is a character string.

Correct, as documented.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
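The two alternatives mentioned in this reply, a pattern test and type.convert, might look roughly like the sketch below. It assumes a current R in which grepl() and the as.is argument of type.convert() are available; the helper names looks_numeric and converts_to_numeric are invented for illustration and do not come from the thread.

```r
# Hypothetical helper: TRUE when the whole string looks like a plain decimal
# number (optional sign, digits, optional decimal point). Like the regexpr
# test, it rejects exponential notation, Inf and -Inf.
looks_numeric <- function(x) {
  grepl("^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)$", x)
}

# Hypothetical helper: let type.convert() decide, one string at a time,
# and report whether the converted value is numeric.
converts_to_numeric <- function(x) {
  vapply(x, function(s) is.numeric(type.convert(s, as.is = TRUE)),
         logical(1), USE.NAMES = FALSE)
}

entries <- c("3117", "12.50", "SALARY:10-1-01", "1e3")
looks_numeric(entries)        # TRUE TRUE FALSE FALSE
converts_to_numeric(entries)  # TRUE TRUE FALSE TRUE
```

The two helpers disagree on "1e3": the pattern rejects exponential notation, as noted above, while type.convert accepts it. Neither raises the coercion warning.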
Jameson C . Burt <jameson at monumental.com> writes:

>I have a variable that can have either numeric or character values.
>When numeric, I take one action; when not-numeric, I take another action.
>Unfortunately, my approaches are awkward, so I look for others' approaches.
>
>To detect a numeric value, I have semi-successfully used two approaches.
>I somewhat simplify here using direct character values like "123"
>rather than a variable.
>1. !is.na(as.numeric("123"))
>   which responds "TRUE", but
>   !is.na(as.numeric("abc"))
>   responds
>   FALSE #so I know it is not numeric
>   Warning message:
>   NAs introduced by coercion
>   This all works well enough except the error message looks bad
>   when printed, and hints that I use the wrong approach.
>
>2. !as.logical(gsub("1","T",gsub("-1","F",as.character(regexpr("[^0-9]","123")))))
>   This responds "TRUE" for the string "123" having only numeric characters.
>   However, notice how harsh this is on the reader.
>
>   Unfortunately, "regexpr" here responds in -1 and 1 rather than FALSE and TRUE,
>   so this becomes an extra verbose approach.
>
>My question: CAN ONE BETTER DETECT NUMERIC DATA IN A CHARACTER VARIABLE?
>One first imagines trying,
>   is.numeric("123")
>but this responds FALSE, telling us merely that this is a character string.
>
>This problem arises in an R program I have used for years to balance my
>checkbook,
>producing 5 lines identical to my bank's statement.
>I input my checkbook data from a file with one natural column having entries
>like
>(excluding # comments),
>   3117 #check number
>   SALARY:10-1-01 #salary deposited on 10/1/2001
>   TRANSF:10-23-01 #transfer between accounts on 10/23/2001
>These non-numerical descriptive entries speed balancing my checkbook,
>especially when I error.

Your first solution is fine:

    a <- c("a", "b", 3, 4, "f")
    b <- as.numeric(a)
    a[!is.na(b)]

but gives warnings. Suppress them with options():

    a <- c("a", "b", 3, 4, "f")
    options(warn = -1)
    b <- as.numeric(a)
    a[!is.na(b)]

Remember to reinstate warnings:

    options(warn = 1)

when you are finished. See help(options).

Mark

--
Mark Myatt
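A minimal sketch of the same idea without changing the global warn option by hand follows. It assumes a current R in which suppressWarnings() is available; the helper name is_number and the variable names are illustrative only, and the sample entries follow the checkbook example from the original question.

```r
# Hypothetical helper: TRUE where the string converts to a number.
# suppressWarnings() silences the "NAs introduced by coercion" warning
# for this one call only, so nothing has to be reinstated afterwards.
is_number <- function(x) {
  !is.na(suppressWarnings(as.numeric(x)))
}

entries <- c("3117", "SALARY:10-1-01", "TRANSF:10-23-01", "3118")
num <- is_number(entries)

checks <- as.numeric(entries[num])  # the check numbers: 3117 3118
notes  <- entries[!num]             # the descriptive entries
```

If options() is preferred, old <- options(warn = -1) followed later by options(old) restores whatever setting was in force before, rather than assuming a particular value.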