Assuming only START fields match pat:
> ## this one has more fields: how do I generalize the regular expression?
> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
+ "START text1 23.4 text2 3.1415 text3 6")
> pat <- "[[:alnum:]]+ +([0-9.]+)"
> s <- strapply(st2, pat, c, simplify = rbind)
>
> pat2 <- "([[:alnum:]]+) +[0-9.]+"
> colnames(s) <- strapply(st2[1], pat2, c, simplify = rbind)
> s
     text1  text2    text3
[1,] "1"    "2.3"    "5"
[2,] "23.4" "3.1415" "6"
If there are non-START lines that also match pat, then use grep to keep
only the START lines first.
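A minimal sketch of that filtering step, in base R only. The vector st3 and its middle line are made up for illustration: the middle line contains "note7 42", which pat would also match. The filtered vector start_lines could equally be passed to strapply in place of st2; the strsplit-based parsing below is just one alternative that avoids the regular expression altogether, assuming the fields are separated by spaces and alternate name, value.

```r
# Hypothetical input: the middle line is not a START line, but it
# contains "note7 42", which pat would also pick up.
st3 <- c("START text1 1 text2 2.3 text3 5",
         "note7 42 is not a START line",
         "START text1 23.4 text2 3.1415 text3 6")

# Keep only the lines that begin with START
start_lines <- grep("^START", st3, value = TRUE)

# Split each line on runs of spaces and drop the leading "START" token
fields <- lapply(strsplit(start_lines, " +"), function(x) x[-1])

# Odd positions hold the names, even positions hold the values
nms  <- fields[[1]][c(TRUE, FALSE)]
vals <- do.call(rbind,
                lapply(fields, function(x) as.numeric(x[c(FALSE, TRUE)])))
colnames(vals) <- nms

d <- as.data.frame(vals)
d
```

Because the names are taken from the first START line only, this relies on every START line in a file carrying the same fields in the same order, as stated in the original question.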
On Mon, Oct 26, 2009 at 9:30 AM, baptiste auguie
<baptiste.auguie at googlemail.com> wrote:
> Dear list,
>
> I have the following text to parse (originating from readLines as some
> lines have unequal size),
>
> st = c("START text1 1 text2 2.3", "whatever intermediate text",
>        "START text1 23.4 text2 3.1415")
>
> from which I'd like to extract the lines starting with "START", and
> group the subsequent fields in a data.frame in this format:
>
>  text1  text2
>      1    2.3
>   23.4 3.1415
>
>
> All the lines containing "START" have the same number of fields, but
> this number may vary from file to file.
>
> I have managed to get this minimal example to work, but I am at a loss
> as to how to handle an arbitrary number of (text, value) pairs:
>
> library(gsubfn)
>
> ( parsed <- strapply(st, "^START +([[:alnum:]]+) +([0-9.]+) +([[:alnum:]]+) +([0-9.]+)",
>                      c, simplify = rbind, combine = c) )
>
> d = data.frame(parsed[ ,c(2,4)])
> names(d) <- apply(parsed[ ,c(1,3)], 2, unique)
> d
>
> ## this one has more fields: how do I generalize the regular expression?
> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
>         "START text1 23.4 text2 3.1415 text3 6")
>
> Best regards,
>
>
> Baptiste