UNIX grep selects lines in a file, and R's grep similarly selects
components of a character vector. Python's re.findall, on the other
hand, extracts substrings from strings. These are different operations,
so there is no reason to expect the two commands to behave the same.
Instead, try strapply from the gsubfn package:
> library(gsubfn)
> text <- "=832,1*R[1]K[1]*R[2]K[1]*25%"
> pat <- "[^[[]([0-9]+[,.%]?[0-9]*)[^]]?"
> strapply(text, pat, c)[[1]]
[1] "832,1" "25%"
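To make the select-versus-extract distinction concrete, here is a minimal sketch in Python using the pattern and formula from the quoted message below. The helper names grep_like and findall_like are made up for illustration and are not part of any library; grep_like only approximates what grep(..., value=TRUE) does.

```python
import re

# Pattern and input copied from the quoted Python session.
pattern = "[^[[]([0-9]+[,.%]?[0-9]*)[^]]?"
formula = "=832.1*R[1]K[1]*R[2]K[1]*25%"


def grep_like(pattern, strings):
    """Select whole elements that contain a match (hypothetical helper,
    roughly what grep(pattern, x, value=TRUE) does in R)."""
    return [s for s in strings if re.search(pattern, s)]


def findall_like(pattern, string):
    """Extract the captured substring of every match (hypothetical helper,
    a thin wrapper around re.findall)."""
    return re.findall(pattern, string)


# Selection returns the matching element unchanged and drops the other one.
selected = grep_like(pattern, [formula, "no digits here"])

# Extraction returns only the captured pieces of the one string.
extracted = findall_like(pattern, formula)

print(selected)
print(extracted)
```

Selection keeps or drops each element as a whole, which is why grep returned the entire formula; extraction is what strapply does above.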
On Fri, Apr 30, 2010 at 11:59 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Hi,
>
> The regular expression (grep) below does not behave at all like the
> equivalent in Python. Also, I would be happy if somebody could tell me what the
> R equivalent for Python's re.findall is. The regex filters out any numbers
> not enclosed by square brackets, including fractions (with either comma or dot
> as the separator) and percentages. How should the R code below be modified so it
> does the same as the Python code?
>
> # python code
> >>> import re
> >>> pattern = "[^[[]([0-9]+[,.%]?[0-9]*)[^]]?"
> >>> formula = "=832.1*R[1]K[1]*R[2]K[1]*25%"
> >>> re.findall(pattern, formula)
> ['832.1', '25%']
>
> # partial R code
> > formula <- "=832,1*R[1]K[1]*R[2]K[1]*25%"
> > pattern <- "[^[[]([0-9]+[,.%]?[0-9]*)[^]]?"
> > grep(pattern, formula, value=TRUE, perl=TRUE)
> [1] "=832,1*R[1]K[1]*R[2]K[1]*25%"
>
> Thank you, and have a good weekend!
>
> Cheers!!
>
> Albert-Jan
>
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> All right, but apart from the sanitation, the medicine, education, wine,
> public order, irrigation, roads, a fresh water system, and public health, what
> have the Romans ever done for us?
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>