Dear list,

I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,

input <-
readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))

## this is what I want
results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
             as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
             )

## [1] 0.00137700 0.00019412 0.00137700 0.00019412

The use of strsplit is not ideal here as there is a different number
of space characters in the lines containing <ax> and <aax> for
instance (hence the indices 8 and 9 respectively).

I tried to use gsubfn for a cleaner construct,

strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)

but I can't seem to find the correct regular expression to deal with
the exponent.

Any tips are welcome!

Best regards,

baptiste
Try this:

strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind,
         combine = as.numeric)

On Wed, Nov 18, 2009 at 9:57 AM, baptiste auguie
<baptiste.auguie at googlemail.com> wrote:
> I tried to use gsubfn for a cleaner construct,
>
> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>
> but I can't seem to find the correct regular expression to deal with
> the exponent.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
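Note that the pattern above requires a negative exponent ("E-") and returns every number on a line, including the <bx> and <by> columns. If only the values behind particular labels are wanted, something along the following lines may help. It is only a minimal base R sketch: extract_value() is a hypothetical helper (not part of gsubfn or base R), the number pattern is one reasonable guess at the log format, and the spacing of the example input is reconstructed from the quoted message, so treat all of these as assumptions.

```r
## Example input; spacing reconstructed from the thread (an assumption)
input <- c("some text",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
           "",
           "other text",
           "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
           "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08")

## A number with an optional sign and an optional exponent of either sign
num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

## Hypothetical helper: return the number that follows "<label> =",
## however many spaces surround the "=" sign
extract_value <- function(label, lines) {
  pattern <- paste0("<", label, ">[[:space:]]*=[[:space:]]*(", num, ")")
  hits <- regmatches(lines, regexec(pattern, lines))
  ## keep only lines that matched; element 2 is the first capture group,
  ## i.e. the whole number
  as.numeric(vapply(hits[lengths(hits) > 0], `[`, character(1), 2))
}

results <- sapply(c("ax", "ay", "aax", "aay"), extract_value, lines = input)
```

Because the label is part of the pattern, "<ax>" cannot match "<aax>", and the result does not depend on counting spaces. If a label could occur on more than one line, sapply() would return a list rather than a vector, so that case would need separate handling.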
The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether still
relatively simple base R solutions can be found, as they are often (but
not always!) much faster. And anyway, it seems to be in the spirit of
your query to try such a solution.

So here is one base R approach that I believe works. I'll break it up
into 2 lines so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces

> z <- gsub("[^[:digit:]E.+-]"," ",input)
> z
[1] " "
[2] " 1.3770E-03 3.4644E-07"
[3] " 1.9412E-04 4.8840E-08"
[4] ""
[5] " "
[6] " 1.3770E-03 3.4644E-07"
[7] " 1.9412E-04 4.8840E-08"

## Now it can be scanned to a numeric via

> z<-scan(textConnection(z),what=0)
Read 8 items
> z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08

########

I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up
(e.g. perhaps with NA's).

Best,

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of baptiste auguie
Sent: Wednesday, November 18, 2009 3:57 AM
To: r-help
Subject: [R] parsing numeric values

Dear list,

I'm seeking advice to extract some numeric values from a log file
created by an external program.
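On the question of where the gsub/scan strategy might trip up, two cases seem worth checking: free text that happens to contain a capital E, a full stop, a plus or a hyphen, and labels that contain digits. The sketch below wraps the two steps in a hypothetical helper, strip_and_scan(), and uses scan(text = ...) instead of textConnection() purely for brevity. The test strings are made up for illustration and do not come from the original log file.

```r
## Hypothetical wrapper around the two steps: blank out everything that
## cannot be part of a number, then let scan() parse what is left
strip_and_scan <- function(lines) {
  z <- gsub("[^[:digit:]E.+-]", " ", lines)
  scan(text = z, what = 0, quiet = TRUE)
}

## A well-behaved line works as intended
ok <- strip_and_scan("  <ax> =    1.3770E-03     <bx> =    3.4644E-07")

## A stray "E" or "." in free text survives the gsub, and scan() then
## stops with an error; tryCatch() turns that into NA for the demo
bad <- tryCatch(strip_and_scan("End of run. Energy = 1.5E+02"),
                error = function(e) NA_real_)

## A digit inside a label is kept and silently read as a value
with_digit <- strip_and_scan("<a1> = 2.5E-01")
```

Here ok holds the two values from the line, bad is NA because scan() fails on the leftover "E", and with_digit comes back as 1 and 0.25 rather than just 0.25. Restricting the substitution to lines that actually carry a "label = value" pair, or matching on the label itself, would avoid both problems.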