thr3ads.net - R help - [R] help with regexp [Oct 2011]

Jannis

2011-Oct-05 11:56 UTC

[R] help with regexp

Dear list members,


I am stuck with using regular expressions.


Imagine I have a vector of character strings like:

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

How could I use regular expressions to extract only the 'def'/'abc'
parts of these strings?


My own attempt yielded no results:

testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)',
perl = TRUE, value = TRUE)

I seem to be missing some important concept here. Until now I have always used
nested sub() calls like:

testresults <- sub('.pdf$', '',
sub('^filename_[[:digit:]]_', '' , test))


but this tends to become cumbersome and I was wondering whether there is a more
elegant way to do this?



Thanks for any help

Jannis

Albert-Jan Roskam

2011-Oct-05 12:37 UTC

[R] help with regexp

Hello!
 
library(gsubfn)
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
gsubfn("(.+_)([a-z]+)(\\.pdf)", "\\2", test)

Cheers!!
Albert-Jan




Eik Vettorazzi

2011-Oct-05 13:11 UTC

[R] help with regexp

Hi Jannis,
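Just use backreferences in gsub; see ?gsub, under the 'replacement' argument.

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
# [A-Za-z] rather than [A-z], which also spans the punctuation between Z and a
gsub(".*_([A-Za-z]+)\\.pdf", "\\1", test)

Hope this helps.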
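
For anyone reading along, here is a minimal sketch of the backreference idea in base R. The anchors (^ and $), the [[:alpha:]] class, and the variable names middle and unmatched are my own choices for illustration and are not part of the reply above.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# The pattern matches the whole string; the parentheses capture the letters
# between the last underscore and the ".pdf" suffix, and "\\1" keeps only those.
middle <- sub("^.*_([[:alpha:]]+)\\.pdf$", "\\1", test)
print(middle)

# Caveat: when the pattern does not match, sub() returns the input unchanged.
unmatched <- sub("^.*_([[:alpha:]]+)\\.pdf$", "\\1", "README.txt")
print(unmatched)
```

Because a non-matching string comes back untouched, it may be worth checking with grepl() first if the file names are not all of the expected form.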


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790


Gabor Grothendieck

2011-Oct-05 15:13 UTC

[R] help with regexp

Here are a couple of solutions:

# remove everything up to and including the last _ as well as everything from the first . onwards
gsub(".*_|[.].*", "", test)

# extract the run of non-_ characters that is immediately followed by a .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)
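
The lookaround pattern from the original question can also be made to work in base R. grep(..., value = TRUE) returns the whole matching elements rather than the matched substrings, and the original call also left out the vector to search. The following is only a sketch, assuming a version of R that provides regmatches(); the variable names pattern, hits and parts are mine, and the dot before pdf is escaped here so that it matches a literal dot.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Lookbehind and lookahead assert the surrounding text without consuming it,
# so the match itself is only the part between "filename_<digit>_" and ".pdf".
pattern <- "(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)"

# regexpr() locates the first match in each string; regmatches() extracts it.
hits <- regexpr(pattern, test, perl = TRUE)
parts <- regmatches(test, hits)
print(parts)
```

Note that regmatches() drops elements without a match, so the result can be shorter than the input vector.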

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
