Morway, Eric
2014-Feb-19 18:26 UTC
[R] Generalizing a regex for retrieving numbers with and without scientific notation
I'm trying to extract all of the values from edm in the example below. However, the first attempt retrieves only the final number in the sequence, since it is the only one recorded in scientific notation. The second attempt retrieves all of the numbers but omits the scientific-notation component of the final number. How can I make the regular expression more general, so that I get every value AND its corresponding "E"-value (i.e., "...E-06") where pertinent? I've spent time reading through ?regex, but my attempts to use the "*" character, where the preceding item will be matched zero or more times, have so far proven syntactically incorrect or generally unsuccessful.

Appreciate the help,
Eric

library(gsubfn)  # provides strapply()

edm <- c("","param_value","6.301343","6.366305","6.431268","6.496230","6.561192","6.626155","9.091117E-06")

# Attempt 1: the exponent is mandatory, so only the last element matches
param_values <- strapply(edm,"\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify=cbind)
param_values
#[1,] 9.091117e-06

# Attempt 2: no exponent in the pattern, so "E-06" is dropped from the last element
param_values <- strapply(edm,"\\d+\\.\\d+", as.numeric, simplify=cbind)
param_values
#[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117
Marc Schwartz
2014-Feb-19 18:38 UTC
[R] Generalizing a regex for retrieving numbers with and without scientific notation
On Feb 19, 2014, at 12:26 PM, Morway, Eric <emorway at usgs.gov> wrote:

> I'm trying to extract all of the values from edm in the example below.
> However, the first attempt only retrieves the final number in the sequence
> since it is recorded using scientific notation. The second attempt
> retrieves all of the numbers, but omits the scientific notation component
> of the final number. How can I make the regular expression more general
> such that I get every value AND its corresponding "E"-value (i.e.,
> "...E-06"), where pertinent? I've spent time reading through ?regex, but
> my attempts to use the "*" character, where the preceding item will be
> matched zero or more times, have so far proven syntactically incorrect or
> generally unsuccessful. Appreciate the help, Eric
>
> edm <-
> c("","param_value","6.301343","6.366305","6.431268","6.496230","6.561192","6.626155","9.091117E-06")
>
> param_values <- strapply(edm,"\\d+\\.\\d+E[-+]?\\d+", as.numeric,
> simplify=cbind)
> param_values
> #[1,] 9.091117e-06
>
> param_values <- strapply(edm,"\\d+\\.\\d+", as.numeric, simplify=cbind)
> param_values
> #[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117

If the individual elements of the vector are either numeric or non-numeric, why not just use:

> as.numeric(edm)
[1]           NA           NA 6.301343e+00 6.366305e+00 6.431268e+00
[6] 6.496230e+00 6.561192e+00 6.626155e+00 9.091117e-06
Warning message:
NAs introduced by coercion

The non-numeric elements are returned as NA's, which you can remove by using ?na.omit.

The only reason to use a regex would be if the individual elements themselves contained both numeric and non-numeric characters.

If you then want to explicitly format numeric output (which would yield a character vector), you can use ?sprintf or ?format. Keep in mind the difference between how R *PRINTS* a numeric value and how R *STORES* a numeric value internally.

Regards,

Marc Schwartz
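The points above can be sketched in base R as follows. This is only an illustration, not code from either poster: the `mixed` vector is made-up input for the case where numbers are embedded in other text, and the names `vals`, `fixed`, `sci`, `num_re`, and `embedded` are arbitrary. For that embedded case, one possible generalization of the original pattern is to wrap the exponent in an optional group, so that "E-06" is matched when present and skipped otherwise.

```r
# Example input from the thread.
edm <- c("", "param_value", "6.301343", "6.366305", "6.431268",
         "6.496230", "6.561192", "6.626155", "9.091117E-06")

# Case 1: each element is wholly numeric or wholly non-numeric.
# Coerce, silence the expected coercion warning, then drop the NAs.
vals <- suppressWarnings(as.numeric(edm))
vals <- as.numeric(na.omit(vals))  # as.numeric() strips the na.action attribute

# Formatting is separate from storage: format() and sprintf() return
# character vectors and leave the numeric vector vals unchanged.
fixed <- format(vals, scientific = FALSE)
sci   <- sprintf("%.6e", vals)

# Case 2 (hypothetical input): numbers embedded in other text.
# The group (?:[eE][-+]?\\d+)? makes the whole exponent optional;
# the non-capturing (?: ) syntax requires perl = TRUE.
mixed  <- c("k = 6.301343 m/d", "ss = 9.091117E-06 1/m")
num_re <- "[-+]?\\d+\\.\\d+(?:[eE][-+]?\\d+)?"
embedded <- as.numeric(unlist(
  regmatches(mixed, gregexpr(num_re, mixed, perl = TRUE))
))
```

Here the optional group `( ... )?` does what the "*" attempts were aiming for: the "?" quantifier applies to the exponent as a unit rather than to a single preceding character.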