thr3ads.net - R help - [R] unexpected behaviour of sub() / usage of regexp [Dec 2011]

Jannis

2011-Dec-09 14:20 UTC

[R] unexpected behaviour of sub() / usage of regexp

Dear R users,


the way I understand the documentation of sub() and regexp the following code: 



sub('[[:digit:]]{1,2}', '', '9ewww')



... should yield:

'ewww'


It returns, however:

'www'


Why is this the case? My code should just substitute 1 (minimum) or up to 2
(maximum) digits, i.e. numbers and not the 'e' in the string. Do I
misinterpret something here?


Thanks for any ideas
Jannis

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Duncan Murdoch

2011-Dec-09 14:25 UTC

On 09/12/2011 9:20 AM, Jannis wrote:
> Dear R users,
>
>
> the way I understand the documentation of sub() and regexp the following code:
>
>
>
> sub('[[:digit:]]{1,2}', '', '9ewww')
>
>
>
> ... should yield:
>
> 'ewww'
>
>
> It returns, however:
>
> 'www'
>
>
> Why is this the case? My code should just substitute 1 (minimum) or up to 2
> (maximum) digits, i.e. numbers and not the 'e' in the string. Do I
> misinterpret something here?

I get your expected output of "ewww" running 2.14.0 or 2.14.0-patched on
Windows.   So it's not a universal problem...
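
Since the result differs between a Linux and a Windows session, it may help to record the platform and locale of a session before comparing outputs. The following is a minimal sketch using base R only. That the locale is relevant is an assumption here, not something the comparison above establishes.

```r
# Describe the session whose sub() result is being compared.
R.version$platform            # corresponds to the "Platform:" line of sessionInfo()
Sys.getlocale("LC_CTYPE")     # character-type locale in effect
l10n_info()[["UTF-8"]]        # TRUE if the session runs in a UTF-8 locale

# The call from the original post, using the default regex engine.
# No expected value is given because the thread reports both
# "ewww" and "www", depending on the setup.
sub('[[:digit:]]{1,2}', '', '9ewww')
```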

Duncan Murdoch

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Prof Brian Ripley

2011-Dec-09 14:29 UTC

This is, AFAICS, an instance of bug PR#14408: it seems that in UTF-8
locales the grammar that the TRE engine generates for repetitions is
buggy in odd cases. As the author has vanished, our hopes of his
fixing it are slim.

Try perl=TRUE .
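
A minimal sketch of that workaround: passing perl = TRUE makes sub() use the PCRE engine instead of the default TRE engine. The variable names x and pattern are illustrative only and do not come from the original post.

```r
x <- '9ewww'
pattern <- '[[:digit:]]{1,2}'

# Default engine (TRE). On affected setups this reportedly returns "www",
# so no expected value is shown for it.
sub(pattern, '', x)

# PCRE engine, as suggested above: only the leading digit is removed.
sub(pattern, '', x, perl = TRUE)
# [1] "ewww"
```

The perl argument is also accepted by gsub(), grepl(), regexpr() and gregexpr(), so the engine can be switched wherever the same pattern is used.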


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
