thr3ads.net - R help - [R] parsing numeric values [Nov 2009]

baptiste auguie

2009-Nov-18 11:57 UTC

[R] parsing numeric values

Dear list,

I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,

input <-
readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))

## this is what I want
results <- c(
  as.numeric(strsplit(grep("<ax>",  input, value = TRUE), " ")[[1]][8]),
  as.numeric(strsplit(grep("<ay>",  input, value = TRUE), " ")[[1]][8]),
  as.numeric(strsplit(grep("<aax>", input, value = TRUE), " ")[[1]][9]),
  as.numeric(strsplit(grep("<aay>", input, value = TRUE), " ")[[1]][9])
)

## [1] 0.00137700 0.00019412 0.00137700 0.00019412

The use of strsplit is not ideal here as there is a different number
of space characters in the lines containing <ax> and <aax> for
instance (hence the indices 8 and 9 respectively).

I tried to use gsubfn for a cleaner construct,

strapply(input, "<ax> += +([0-9.]+)", c, simplify = rbind, combine = as.numeric)

but I can't seem to find the correct regular expression to deal with
the exponent.


Any tips are welcome!


Best regards,

baptiste

Henrique Dallazuanna

2009-Nov-18 12:28 UTC

[R] parsing numeric values

Try this:

strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind,
combine = as.numeric)
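
The pattern above picks up every number of the form d.ddddE-dd on each line, including the <bx> and <by> values. As a rough base R sketch of the label-anchored extraction the original post asks for, something along these lines might work. Note that extract_value() and the num pattern are illustrative names of my own, not part of gsubfn or of the thread, and the sketch assumes each label occurs exactly once in the log.

```r
input <- readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))

## Number pattern: optional sign, digits, optional decimal part,
## optional exponent with an optional sign.
num <- "[-+]?[0-9]+\\.?[0-9]*([eE][-+]?[0-9]+)?"

## Hypothetical helper: return the value that follows "<label> =",
## however many spaces surround the equals sign.
extract_value <- function(label, lines) {
  pattern <- paste0("<", label, "> *= *(", num, ")")
  m <- regmatches(lines, regexec(pattern, lines))
  hits <- Filter(function(x) length(x) > 0, m)   # drop non-matching lines
  ## x[1] is the whole match, x[2] the first capture group (the number)
  as.numeric(vapply(hits, function(x) x[2], character(1)))
}

## Assumes exactly one match per label, hence numeric(1)
results <- vapply(c("ax", "ay", "aax", "aay"), extract_value,
                  numeric(1), lines = input)
```

Because the pattern is anchored on "<label>", asking for "ax" does not match "<aax>", and the varying number of spaces no longer matters.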


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Bert Gunter

2009-Nov-18 17:44 UTC

[R] parsing numeric values

The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether still
relatively simple base R solutions can be found, as they are often (but not
always!) much faster. And anyway, it seems to be in the spirit of your query
to try such a solution. So here is one base R approach that I believe works.
I'll break it up into 2 lines so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces
> z <- gsub("[^[:digit:]E.+-]"," ",input)
> z
[1] "         "
[2] "            1.3770E-03               3.4644E-07"   
[3] "            1.9412E-04               4.8840E-08"   
[4] ""                                                  
[5] "          "                                        
[6] "              1.3770E-03                3.4644E-07"
[7] "              1.9412E-04                4.8840E-08"

## Now it can be scanned to a numeric via
> z<-scan(textConnection(z),what=0)
Read 8 items
> z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
[7] 1.9412e-04 4.8840e-08

########
I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up (e.g.
perhaps with NA's).
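
One case where I would expect the replace-then-scan strategy to struggle is surrounding text that itself contains a capital E, a digit, a dot, or a sign, since those characters survive the gsub step and leave tokens that are not numbers. A possible alternative, sketched here under the assumption that the log always writes its values in scientific notation with a decimal point, is to pull out only the substrings that look like numbers. The example lines and the num pattern are made up for illustration.

```r
## Made-up log lines: the first and last contain characters ("E", "-",
## "2") that are kept by gsub("[^[:digit:]E.+-]", " ", ...).
lines <- c("Energy E-field summary",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  step 2 - done")

## Assumed number format: digits, a decimal point, digits, then an
## exponent with an optional sign (e.g. 1.3770E-03).
num <- "[-+]?[0-9]+\\.[0-9]+[eE][-+]?[0-9]+"

## gregexpr() finds every match per line; regmatches() returns the
## matched substrings, with character(0) for lines that have none.
tokens <- unlist(regmatches(lines, gregexpr(num, lines)))
values <- as.numeric(tokens)
```

Here values holds only the two numbers from the middle line; the stray "E", "-" and "2" in the other lines are never handed to as.numeric.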

Best,

Bert Gunter
Genentech Nonclinical Biostatistics
 