Joris Meys
2010-May-28 12:21 UTC
[R] How to get values out of a string using regular expressions?
Dear all,

I have a vector of filenames which begins like this:

X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln",
       "OrthoP101_DNA_str.aln", "OrthoP102_DNA_str.aln", "OrthoP103_DNA_str.aln",
       "OrthoP104_DNA_str.aln", "OrthoP105_DNA_str.aln", "OrthoP106_DNA_str.aln",
       "OrthoP107_DNA_str.aln")

Using

grep("(\\d+)", X, perl=T, value=T)

I get the complete strings back. Yet I want a vector:

c(1,10,100,101,102,103,104,105,106,107)

In Perl, using the brackets allows for extracting only the numbers (using a construct with $1, for those who know Perl).

I want to do the same in R, but can't find a way of doing that without extensive string manipulation. The problem is that the length of the numbers differs, so I can't use substr. I tried

> strsplit(X,"\\d+")
[[1]]
[1] "OrthoP"       "_DNA_str.aln"

which gives me exactly what I want to throw away. So:

> strsplit(X,"\\D+")
[[1]]
[1] ""  "1"

[[2]]
[1] ""   "10"

gives something I can use, but it still requires a lot of list manipulation afterwards to get the right vector. Is there an option or a function I'm missing somewhere?

Cheers
Joris

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Gabor Grothendieck
2010-May-28 12:25 UTC
[R] How to get values out of a string using regular expressions?
Try this:

as.numeric(gsub("\\D", "", X))
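For comparison, here is a small sketch of that suggestion next to two alternatives, run on a shortened copy of X. Only the gsub() line comes from the reply above. The other two are additions for illustration: a sub() call with a capture group and the backreference \\1, which is probably the closest analogue to Perl's $1, and regmatches() with regexpr(), which assumes R 2.14.0 or later. The variable names ids_gsub, ids_sub and ids_match are made up for this example.

```r
# Shortened copy of the vector from the question
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln")

# 1. Suggestion from the reply: delete every non-digit, then convert.
#    Note this glues together ALL digits in a string, so it relies on
#    each filename containing a single run of digits.
ids_gsub <- as.numeric(gsub("\\D", "", X))

# 2. Capture group plus backreference (illustrative alternative):
#    match the whole string, keep only the first run of digits.
ids_sub <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", X))

# 3. Extract the first match directly (illustrative alternative):
#    regexpr() locates the first run of digits, regmatches() pulls it out.
#    Elements without any match would be dropped from the result.
ids_match <- as.numeric(regmatches(X, regexpr("[0-9]+", X)))

print(ids_gsub)   # 1 10 100
print(ids_sub)    # 1 10 100
print(ids_match)  # 1 10 100
```

All three agree on these filenames. They would differ if a name held digits in more than one place: option 1 concatenates them, while options 2 and 3 return only the first run.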