Some comments:
1. Unless perl=TRUE, [^\s] does not mean "not whitespace": it excludes a literal 's', so [^\s]* matches everything up to the next 's'.
2. The (.*) groups are greedy, so you'll need lazy versions such as
(.*?)"\s"(.*?)"\s"(.*?)"$ or
similar at the end of the expression.
With those changes (and after removing a space inserted by the newsgroup
posting), the expression works for me:
> (pat <- readLines("/tmp/b.txt")[1])
[1] "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$"
> regexpr(pat, test, perl=TRUE)
[1] 1
attr(,"match.length")
[1] 436
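A minimal sketch of points 1 and 2 in code, assuming a fresh R session: the corrected pattern is rebuilt as a string literal (split with paste0() only for readability) and the sample line from [2] is rejoined onto one line. With perl = TRUE, regexpr() also records each group's position in its "capture.start" and "capture.length" attributes, which substring() turns into field values. The POSIX class [^[:space:]] at the end is my own addition, shown as an alternative that means "not whitespace" under the default (non-Perl) engine.

```r
# Corrected pattern (lazy groups at the end), written as an R string literal.
# paste0() only splits it across lines for readability.
pat <- paste0(
  "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)",
  "\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)",
  "\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$"
)

# The sample log line from [2], rejoined onto one line (no wrapping spaces).
test <- paste0(
  '220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] ',
  '"GET /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/',
  '20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2F',
  'intl.properties/beta/Linux%202.6.32.9/default/default/6/6/1/ HTTP/1.1" ',
  '200 3243 "-" "Mozilla/5.0 (Android; Linux armv7l; rv:2.0b12pre) ',
  'Gecko/20110217 Firefox/4.0b12pre Fennec/4.0b5" ',
  '"BLOCKLIST_v3=110.163.217.169.1299218425.9706"'
)

# perl = TRUE uses PCRE, where [^\s] means "not whitespace".
m <- regexpr(pat, test, perl = TRUE)

# Start position and length of each of the 12 capture groups.
start <- as.vector(attr(m, "capture.start"))
len <- as.vector(attr(m, "capture.length"))
groups <- substring(test, start, start + len - 1)
print(groups)

# Alternative for the default engine: the POSIX class [^[:space:]].
x <- "addons.mozilla.org -"
host <- regmatches(x, regexpr("^[^[:space:]]+", x))
print(host)
```

In the thread, grep(pat, test) without perl = TRUE returned integer(0), so the perl argument is the essential part here.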
3. Consider a different approach, e.g. scan(textConnection(test),
what=character(0))
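Point 3 as a sketch: scan() splits on whitespace but keeps double-quoted fields together, so most of the log fields come out without any regex. This reuses the sample line from [2]; quote = "\"" (double quotes only, so an apostrophe in a user agent would not start a quoted field) and quiet = TRUE are my additions, not part of the original suggestion. One caveat: the bracketed timestamp is not quoted, so it arrives as two tokens.

```r
# The sample log line from [2], rejoined onto one line.
test <- paste0(
  '220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] ',
  '"GET /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/',
  '20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2F',
  'intl.properties/beta/Linux%202.6.32.9/default/default/6/6/1/ HTTP/1.1" ',
  '200 3243 "-" "Mozilla/5.0 (Android; Linux armv7l; rv:2.0b12pre) ',
  'Gecko/20110217 Firefox/4.0b12pre Fennec/4.0b5" ',
  '"BLOCKLIST_v3=110.163.217.169.1299218425.9706"'
)

# Split on whitespace; double-quoted fields are kept whole, quotes removed.
con <- textConnection(test)
fields <- scan(con, what = character(0), quote = "\"", quiet = TRUE)
close(con)
print(fields)

# The unquoted timestamp is split at its space; paste the halves back together.
timestamp <- paste(fields[4], fields[5])
print(timestamp)
```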
Hope this helps
Allan
On 16/03/11 22:18, Saptarshi Guha wrote:
> Hello R users,
>
> I have this regex (see [1]) for Apache log lines. I tried using R to parse
> some data (only because I wanted to stay in R).
> A sample line is [2]
>
> (a) I saved the line in [1] into "~/tmp/a.txt" and [2] into "/tmp/a.txt"
>
> pat <- readLines("~/tmp/a.txt")
> test <- readLines("/tmp/a.txt")
> test
> grep(pat,test)
>
> returns integer(0)
>
> The same pattern works in Python via re.match(....) (i.e. it does return groups).
>
> Reading the pattern with readLines takes care of escaping it. Do Python and R use
> different regex styles?
>
> Cheers
> Saptarshi
>
> [1]
>
> ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$
>
> [2]
> 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] "GET /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux%202.6.32.9/default/default/6/6/1/ HTTP/1.1" 200 3243 "-" "Mozilla/5.0 (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre Fennec/4.0b5" "BLOCKLIST_v3=110.163.217.169.1299218425.9706"
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.