Dear all,

I am again (as usual) lost in regular expression use for selection. Here are my data:

> dput(mena)
c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
"138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m55.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m56.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m57.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m58.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

I want to select only the values "m" followed by the numbers 53 to 59.

I used

sub("m5.", "", mena)

which correctly matches those m53 - m59 values but, contrary to my expectation, replaces the matched values with the specified replacement, in this case an empty string.

What should I use if I want to get rid of everything but m53-m59 in those strings?

Regards
Petr
Hi,

Try grepl instead of sub:

mena[grepl("m5.", mena)]

HTH,

baptiste

On 14 November 2011 21:45, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> What I shall use if I want to get rid of all but m53-m59 from those
> strings?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On 11/14/2011 07:45 PM, Petr PIKAL wrote:
> What I shall use if I want to get rid of all but m53-m59 from those
> strings?

Hi Petr,
How about:

grep("m5",mena)

Jim
Hi

> Try grepl instead of sub,
>
> mena[grepl("m5.", mena)]

That does not extract the "m5?" substrings from the character vector; it returns the whole strings. As output I need the vector

m53, m54, m55, m56, m57, m58, m59

Regards
Petr
Hi

> Hi Petr,
> How about:
>
> grep("m5",mena)

It gives numeric values, which tell me that there is a match in each string, but as a result I need only the m53-m59 substrings.

Regards
Petr
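To make the difference between the suggestions so far concrete, here is a minimal sketch of what each call returns. It uses only the first two and the last of the strings from the original post (an abbreviation made here for brevity) and base R functions only; the variable names idx, hit, kept and removed are made up for this illustration.

```r
# First two and last strings from the original post (shortened for brevity).
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

idx     <- grep("m5", mena)      # integer positions of the elements that match
hit     <- grepl("m5.", mena)    # one TRUE/FALSE per element
kept    <- mena[hit]             # the whole matching strings, not the "m5x" part
removed <- sub("m5.", "", mena)  # each string with its first match deleted

print(idx)
print(hit)
print(removed)
```

None of these returns the matched substring itself: grep and grepl only report where a match occurs, and sub returns the string after the replacement has been made.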
On 14.11.2011 10:22, Petr PIKAL wrote:
> It gives numeric values which tells me that there is a match in each
> string, but as a result I need only
>
> m53-m59 substrings.

gsub(".*_(m5.).*", "\\1", mena)

Uwe Ligges
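The same idea can be sketched with a slightly stricter pattern. This is a variation for illustration, not the code posted above: it assumes the wanted codes are exactly m53 to m59, so the dot is replaced by the class [3-9], and it uses sub because the pattern already spans the whole string. The input is again shortened to three of the original strings.

```r
# First two and last strings from the original post (shortened for brevity).
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

# ".*_"        everything up to an underscore
# "(m5[3-9])"  the part to keep: "m5" followed by one digit from 3 to 9
# ".*"         the rest of the string
pattern <- ".*_(m5[3-9]).*"

# "\\1" in the replacement stands for whatever the first (...) group matched.
codes <- sub(pattern, "\\1", mena)
print(codes)

# A string that does not match is returned unchanged, not dropped.
unchanged <- sub(pattern, "\\1", "no_match_here.imp")
print(unchanged)
```

Because non-matching elements come back unchanged, it may be worth filtering with grepl first when the input is not known to match everywhere.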
Does

library( stringr )
str_extract( mena, "m5[0-9]" )

achieve what you are looking for?

Rgds,

Rainer

On Monday 14 November 2011 10:22:09 Petr PIKAL wrote:
> It gives numeric values which tells me that there is a match in each
> string, but as a result I need only
>
> m53-m59 substrings.
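For readers without stringr installed, a rough base R counterpart can be sketched with regexpr and regmatches. This is an alternative shown for comparison, not something proposed in the thread; the inputs are three of the original strings plus one made-up non-matching string.

```r
# First two and last strings from the original post (shortened for brevity).
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

# regexpr() locates the first match in each element;
# regmatches() then pulls out the matched text.
found <- regmatches(mena, regexpr("m5[0-9]", mena))
print(found)

# Elements without a match are silently dropped, so the result can be
# shorter than the input ("no_match.imp" is a made-up example).
with_miss <- c(mena, "no_match.imp")
found2 <- regmatches(with_miss, regexpr("m5[0-9]", with_miss))
print(length(found2))
```

One difference to keep in mind: str_extract returns NA for an element without a match, whereas this base R form omits it, so positions no longer line up with the input.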
Hi

Thank you. It is pure magic, something taught at Unseen University.

This is what I got as help for selecting only the letters from a character vector.

> vzor
 [1] "61A"     "62C/27"  "65A/27"  "66C/29"  "69A/29"  "70C/31"  "73A/31"
 [8] "74C/33"  "77A/33"  "81A/35"  "82C/37"  "85A/37"  "86C/39"  "89A/39"
[15] "90C/41"  "93A/41"  "94C/43"  "97A/43"  "98C/45"  "101A/45" "102C/47"
[22] "105A/47" "106C/49" "109A/49" "110C/51" "113A/51"
> gsub("[^A-z]", "", vzor)
 [1] "A" "C" "A" "C" "A" "C" "A" "C" "A" "A" "C" "A" "C" "A" "C" "A" "C"
[18] "A" "C" "A" "C" "A" "C" "A" "C" "A"

Therefore I expected that

sub("m5.", "\\1", mena) or sub("m5.", "", mena)

would select what I wanted. But that was not the case.

Can you please correct me as I try to interpret your solution?

gsub(".*_(m5.).*", "\\1", mena)

or

gsub(".*(m5.).*", "\\1", mena)

.* matches any characters

() negation? or marking a selection for a back reference?

Finally, the expression matches the whole string and evaluates what is matched by the parenthesised part. That result is returned by the backreference.

Is this interpretation correct?

Regards
Petr
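The contrast between the two cases can be sketched as follows. The letters example works by deleting everything that is not wanted, which only looks like a selection; for the file names the unwanted part cannot be described by one simple character class, so the whole string is matched and only the captured group is put back. The inputs are three of the vzor values and one of the original file names. In ASCII order the range A-z also spans a few punctuation characters (such as _ and ^), so the sketch writes the class as [A-Za-z]; that substitution is a choice made here, not part of the advice quoted above.

```r
vzor <- c("61A", "62C/27", "101A/45")   # three of the values shown above

# Deletion: remove every character that is NOT a letter; letters remain.
letters_only <- gsub("[^A-Za-z]", "", vzor)
print(letters_only)

x <- "138516_10g_50ml_50c_250utes1_m54.00_s1.imp"

# Deleting the match removes exactly the part that was wanted ...
deleted <- sub("m5[0-9]", "", x)
print(deleted)

# ... so instead match the whole string and keep only the group.
kept <- sub(".*(m5[0-9]).*", "\\1", x)
print(kept)
```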
On 14.11.2011 11:27, Petr PIKAL wrote:
> Please can you correct me when I try to evaluate your solution?
>
> gsub(".*_(m5.).*", "\\1", mena)
>
> or
>
> gsub(".*(m5.).*", "\\1", mena)
>
> .* matches any characters

Yes.

> () negation? or matching selection for back reference?

The latter. See books about regular expressions. I think it is also mentioned in ?regexp and with an example in ?gsub

> Finally the expression matches whole string and evaluates what is matched
> by parenthesised value. This evaluation is returned by backreference.
>
> Is it correct evaluation?

Indeed, where \\1 is the first backreference.

Best,
Uwe
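As a small illustration of how numbered backreferences map onto groups, here is a sketch with two groups instead of one. The two-group pattern is invented for this example and is not from the thread; the input is one of the original file names.

```r
x <- "138516_10g_50ml_50c_250utes1_m54.00_s1.imp"

# Group 1 captures the letter "m", group 2 captures the two digits.
pattern <- ".*_(m)(5[0-9]).*"

# In an R string literal "\\1" is the regex replacement \1 (first group),
# and "\\2" is \2 (second group).
both   <- sub(pattern, "\\1\\2", x)   # both groups, in order
digits <- sub(pattern, "\\2", x)      # only the second group

print(both)
print(digits)
print(as.integer(digits))
```

Splitting the match into groups like this is handy when only the numeric part is needed for further computation.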