Hi

I have a question concerning regexp - I want to select with grep all character strings which contain the numbers 11:20 (code below).

At the moment I am using [], but that obviously does not work, as it matches each element in the []. Is there a way to specify that the regexp should match 11, but not 1?

Here is the code:

x <- paste("suff", 1:40, "pref", sep="_")
x
##  [1] "suff_1_pref"  "suff_2_pref"  "suff_3_pref"  "suff_4_pref"  "suff_5_pref"
##  [6] "suff_6_pref"  "suff_7_pref"  "suff_8_pref"  "suff_9_pref"  "suff_10_pref"
## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref"
## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"
## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref" "suff_25_pref"
## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref" "suff_30_pref"
## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref" "suff_35_pref"
## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref" "suff_40_pref"

i <- paste(11:20, collapse=",")
i
## [1] "11,12,13,14,15,16,17,18,19,20"

grep(paste("suff_[", i, "]", sep=""), x, value=TRUE)
##  [1] "suff_1_pref"  "suff_2_pref"  "suff_3_pref"  "suff_4_pref"  "suff_5_pref"
##  [6] "suff_6_pref"  "suff_7_pref"  "suff_8_pref"  "suff_9_pref"  "suff_10_pref"
## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref"
## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"
## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref" "suff_25_pref"
## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref" "suff_30_pref"
## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref" "suff_35_pref"
## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref" "suff_40_pref"

## But I would like to have
## [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref"
## [6] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"

Version and platform info:

> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          13.0
year           2011
month          04
day            13
svn rev        55427
language       R
version.string R version 2.13.0 (2011-04-13)

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape_0.8.4   plyr_1.5.2      tgp_2.4-2       lhs_0.5
[5] RSQLite_0.9-4   DBI_0.2-5       date_1.2-29     simecol_0.7-2
[9] lattice_0.19-26 deSolve_1.10-2

loaded via a namespace (and not attached):
[1] grid_2.13.0  tools_2.13.0

Thanks in advance,

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax (F):    +33 - (0)9 58 10 27 44
Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer@krugs.de
Skype:      RMkrug

On Fri, Jul 1, 2011 at 11:02 AM, Rainer M Krug <r.m.krug at gmail.com> wrote:
> I have a question concerning regexp - I want to select with grep all
> character strings which contain the numbers 11:20 (code below).
> [...]
> Is there a way to specify that the regexp should match 11, but not 1?
> [...]
> ## But I would like to have
> ## [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref"
> ## [6] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"

Here are two approaches:

# 11 to 19, or 20 (note that "1\\d" would also pick up "suff_10_pref")
grep("1[1-9]|20", x, value = TRUE)

# alternation built from the numbers themselves: "11|12|...|20"
grep(paste(11:20, collapse = "|"), x, value = TRUE)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

On Jul 1, 2011, at 11:02 AM, Rainer M Krug wrote:
> I have a question concerning regexp - I want to select with grep all
> character strings which contain the numbers 11:20 (code below).
> [...]
> x <- paste("suff", 1:40, "pref", sep="_")
> [...]

grep("suff_1[1-9]|suff_20", x, value=TRUE)
 [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref" "suff_16_pref"
 [7] "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"

> i <- paste(11:20, collapse=",")
> i
> ## [1] "11,12,13,14,15,16,17,18,19,20"

That does not look right.
You now have a single element with lots of commas.

> grep(paste("suff_[", i, "]", sep=""), x, value=TRUE)
> ##  [1] "suff_1_pref"  "suff_2_pref"  "suff_3_pref"  "suff_4_pref"  "suff_5_pref"
> [...]
> ## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref" "suff_40_pref"

The list of values in a [ ] expression is not delimited by commas. Your pattern matches whenever the first character following the underscore is any one of the characters in the "i" string (including the comma).

x[40] <- 'suff_,zz_pref'
grep(paste("suff_[", i, "]", sep=""), x, value=TRUE)   # x[40] matches

> ## But I would like to have
> ## [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref"
> ## [6] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref"
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
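
To make the point about [ ] concrete, here is a minimal sketch (an illustration added for clarity, not code from the thread). It shows that the bracket expression built from "i" reduces to a set of single characters, namely the ten digits and the comma, so it behaves like the much shorter class [0-9,]. The names class_chars and same are made up for this example.

```r
x <- paste("suff", 1:40, "pref", sep = "_")
i <- paste(11:20, collapse = ",")

# A bracket expression is a set of single characters; repeats add nothing.
# Splitting "i" into characters shows what the set really contains.
class_chars <- unique(strsplit(i, "")[[1]])
class_chars            # the digits 0-9 and ","

# So "suff_[11,12,...,20]" selects the same elements as "suff_[0-9,]":
# any string with a digit or a comma right after "suff_".
same <- identical(grep(paste("suff_[", i, "]", sep = ""), x),
                  grep("suff_[0-9,]", x))
same                   # TRUE
```

To match whole numbers rather than single characters, use alternation, as in "suff_1[1-9]|suff_20" above.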