Dear list members,

I am stuck with using regular expressions.

Imagine I have a vector of character strings like:

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

How could I use regular expressions to extract only the 'def'/'abc' parts of these strings?

An attempt from my side yielded no results:

testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE)

Somehow I seem to be missing some important concept here. Until now I have always used nested sub expressions like:

testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test))

but this tends to become cumbersome, and I was wondering whether there is a more elegant way to do this.

Thanks for any help,
Jannis
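A note on the attempt in the question: grep() with value = TRUE returns the whole elements that contain a match, not the matched portion, and the call as posted also passes no input vector. A minimal sketch of how the same lookaround pattern could be made to extract the substring in base R, assuming regmatches() is available (R 2.14.0 or later) and with the dot before "pdf" escaped:

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Lookbehind/lookahead pattern from the question, with the dot escaped
# so that it only matches a literal "." before "pdf".
pattern <- '(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)'

# regexpr() locates the first match in each element (perl = TRUE is
# needed for lookarounds); regmatches() extracts the matched text.
testresults <- regmatches(test, regexpr(pattern, test, perl = TRUE))
testresults
```

Elements without a match are dropped by regmatches(), so the result can be shorter than the input when some file names do not follow the pattern.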
Hello!

library(gsubfn)
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
gsubfn("(.+_)([a-z]+)(\\.pdf)", "\\2", test)

Cheers!!
Albert-Jan

> From: Jannis <bt_jannis@yahoo.de>
> To: r-help@stat.math.ethz.ch
> Sent: Wednesday, October 5, 2011 1:56 PM
> Subject: [R] help with regexp
>
> How could I use regular expressions to extract only the 'def'/'abc' parts of these strings?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Jannis,

just use the backreferences in gsub; see ?gsub, argument 'replacement'.

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
gsub(".*_([A-z]+)\\.pdf", "\\1", test)

HTH.

On 05.10.2011 13:56, Jannis wrote:
> How could I use regular expressions to extract only the 'def'/'abc' parts of these strings?

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
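A small caveat on the character class in the pattern above: read in ASCII order, the range A-z spans more than letters (it also covers [ \ ] ^ _ and the backtick), and ?regex notes that ranges are locale-dependent. A hedged variant of the same backreference approach, using the POSIX class [[:alpha:]] and an end anchor (both are my substitutions, not part of the original reply):

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Same idea as the reply: capture the letters between the last
# underscore and ".pdf", then replace the whole string with group 1.
# [[:alpha:]] restricts the capture to alphabetic characters only.
letters_only <- sub(".*_([[:alpha:]]+)\\.pdf$", "\\1", test)
letters_only
```

For the two example file names this gives the same result as the original pattern; the difference would only show up for names with non-letter characters in the last segment.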
On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jannis at yahoo.de> wrote:
> How could I use regular expressions to extract only the 'def'/'abc' parts of these strings?

Here are a couple of solutions:

# remove everything up to the last _ as well as everything from . onwards
gsub(".*_|[.].*", "", test)

# extract everything that is not a _ provided it is immediately followed by .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
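To see how the first solution above works, it may help to apply the two alternatives of the pattern separately. This is only an illustrative sketch; the variable names are made up for the example:

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# First alternative: the greedy ".*_" removes everything up to and
# including the last underscore.
after_underscore <- sub(".*_", "", test)

# Second alternative: "[.].*" removes everything from the first dot onwards.
before_dot <- sub("[.].*", "", test)

# Both alternatives combined; gsub() replaces every match, so the
# prefix and the extension are removed in a single call.
both <- gsub(".*_|[.].*", "", test)
both
```

Note that the second alternative cuts at the first dot it finds after the prefix has been removed, so a name such as 'filename_1_de.f.pdf' would be reduced to 'de' rather than 'de.f'.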