Ista Zahn
2016-Sep-07 13:34 UTC
[R] element wise pattern recognition and string substitution
On Tue, Sep 6, 2016 at 11:59 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> Hi Ista,
>
> Thanks for the suggestion. I didn't know mapply can be used this way! Let me
> take one more step. Instead of defining a pattern for each string, I would
> like to define a set of patterns from all the possible combinations of the
> unique values of those variables. Then I need each string to find a pattern
> for itself.

Uh, humn, what?!? I have no idea what this means. Example?

--Ista

> I know this is getting to be a bit of a stretch. Thanks for all the
> suggestions/comments from everyone.
>
> Jun
>
> On Tue, Sep 6, 2016 at 9:44 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>
>> If you want to match each element of 'strings' to a different regex, do
>> it. Here are three ways, using your original example.
>>
>> pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>> pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>
>> patterns <- c(pattern1,pattern2)
>> strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>
>> for(i in seq_along(strings)) print(sub(patterns[i], "\\2", strings[i]))
>>
>> mapply(sub, pattern = patterns, x = strings,
>>        MoreArgs = list(replacement = "\\2"))
>>
>> library(stringi)
>> stri_replace_all_regex(strings, patterns, "$2")
>>
>> Best,
>> Ista
>> On Tue, Sep 6, 2016 at 9:20 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> > Hi Jeff,
>> >
>> > Thanks for the reply. I tried your suggestion and it doesn't seem to
>> > work, so I tried a simple pattern as follows and it works as expected:
>> >
>> > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\1', "3.mg.kg.>50-70.kg.P05")
>> > [1] "3.mg.kg"
>> >
>> > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\2', "3.mg.kg.>50-70.kg.P05")
>> > [1] ">50-70.kg"
>> >
>> > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\3', "3.mg.kg.>50-70.kg.P05")
>> > [1] "P05"
>> >
>> > My problem is the pattern has to be dynamically constructed from the input
>> > data of the function I am writing.
>> > It's actually not too difficult to assemble the final.pattern with some
>> > code like the following:
>> >
>> > sort.var <- c('TX','WTCUT')
>> > combn.sort.var <- do.call(expand.grid, lapply(sort.var,
>> >   function(x)paste('(',gsub('\\.','\\\\.',unlist(unique(all.exposure[x]))),
>> >   ')', sep='')))
>> > all.patterns <- do.call(paste, c(combn.sort.var, '(.*)', sep='\\.'))
>> > final.pattern <- paste0(all.patterns, collapse='|')
>> >
>> > You cannot run the code directly since the data object "all.exposure" is
>> > not provided here.
>> >
>> > Jun
>> >
>> > On Tue, Sep 6, 2016 at 8:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> >
>> >> I am not near my computer today, but each parenthesis gets its own result
>> >> number, so you should put the parentheses around the whole pattern of
>> >> alternatives instead of having many parentheses.
>> >>
>> >> I recommend thinking in terms of what common information you expect to
>> >> find in these various strings, and place your parentheses to capture that
>> >> information. There is no other reason to put parentheses in the pattern...
>> >> they are not grouping symbols.
>> >> --
>> >> Sent from my phone. Please excuse my brevity.
>> >>
>> >> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >> >Jun:
>> >> >
>> >> >1. Tell us your desired result from your test vector and maybe someone
>> >> >will help.
>> >> >
>> >> >2. As we played this game once already (you couldn't do it; I showed
>> >> >you how), this seems to be a function of your limitations with regular
>> >> >expressions. I'm probably not much better, but in any case, I don't
>> >> >intend to be your consultant. See if you can find someone locally to
>> >> >help you if you do not receive a satisfactory reply from the list.
>> >> >There are many people here who are pretty good at this sort of thing,
>> >> >but I don't know if they'll reply.
>> >> >Regexes are certainly complex. Perl people tend to be pretty good at
>> >> >them, I believe. There are numerous web sites and books on them if you
>> >> >need to acquire expertise for your work.
>> >> >
>> >> >Cheers,
>> >> >Bert
>> >> >Bert Gunter
>> >> >
>> >> >"The trouble with having an open mind is that people keep coming along
>> >> >and sticking things into it."
>> >> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >> >
>> >> >
>> >> >On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >> >> Hi Bert,
>> >> >>
>> >> >> I still couldn't make the multiple patterns work. Here is an example.
>> >> >> I make the pattern as follows:
>> >> >>
>> >> >> final.pattern <- "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
>> >> >>
>> >> >> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
>> >> >>                  '240.m.g.>50-70.kg.geo.mean')
>> >> >>
>> >> >> sub(final.pattern, '\\1', test.string)
>> >> >> sub(final.pattern, '\\2', test.string)
>> >> >> sub(final.pattern, '\\3', test.string)
>> >> >>
>> >> >> Only the third string has been correctly parsed, which matches the
>> >> >> first pattern. It seems the rest of the patterns are not called.
>> >> >>
>> >> >> Jun
>> >> >>
>> >> >>
>> >> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >> >>>
>> >> >>> Just noticed: My clumsy do.call() line in my previously posted code
>> >> >>> below should be replaced with:
>> >> >>> pat <- paste(pat,collapse = "|")
>> >> >>>
>> >> >>>
>> >> >>> > pat <- c(pat1,pat2)
>> >> >>> > paste(pat,collapse="|")
>> >> >>> [1] "a+\\.*a+|b+\\.*b+"
>> >> >>>
>> >> >>> ************ replace this **************************
>> >> >>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
>> >> >>> ********************************************
>> >> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
>> >> >>> [1] "a.a"   "bb"    "b.bbb"
>> >> >>>
>> >> >>>
>> >> >>> -- Bert
>> >> >>>
>> >> >>>
>> >> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >> >>> > Jun:
>> >> >>> >
>> >> >>> > You need to provide a clear specification via regular expressions of
>> >> >>> > the patterns you wish to match -- at least for me to decipher it.
>> >> >>> > Others may be smarter than I, though...
>> >> >>> >
>> >> >>> > Jeff: Thanks. I have now convinced myself that it can be done (a
>> >> >>> > "proof" of sorts): If pat1, pat2, ..., patm are m different patterns
>> >> >>> > (in a vector of patterns) to be matched in a vector of n strings,
>> >> >>> > where only one of the patterns will match in any string, then use
>> >> >>> > paste() (probably via do.call()) or otherwise to paste them together
>> >> >>> > separated by "|" to form the concatenated pattern, pat.
>> >> >>> > Then
>> >> >>> >
>> >> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
>> >> >>> >
>> >> >>> > should extract the matching pattern in each (perhaps with a little
>> >> >>> > fiddling due to precedence rules); e.g.
>> >> >>> >
>> >> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
>> >> >>> >
>> >> >>> >> pat1 <- "a+\\.*a+"
>> >> >>> >> pat2 <-"b+\\.*b+"
>> >> >>> >> pat <- c(pat1,pat2)
>> >> >>> >
>> >> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
>> >> >>> >> pat
>> >> >>> > [1] "a+\\.*a+|b+\\.*b+"
>> >> >>> >
>> >> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
>> >> >>> > [1] "a.a"   "bb"    "b.bbb"
>> >> >>> >
>> >> >>> > Cheers,
>> >> >>> > Bert
>> >> >>> >
>> >> >>> >
>> >> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >> >>> >> Thanks for the reply, Bert.
>> >> >>> >>
>> >> >>> >> Your solution solves the example. I actually have a more general
>> >> >>> >> situation where I have this dot-concatenated string from multiple
>> >> >>> >> variables. The problem is those variables may have values with dots
>> >> >>> >> in them. The number of dots is not consistent for all values of a
>> >> >>> >> variable. So I am thinking of defining a vector of patterns for the
>> >> >>> >> vector of strings and hopefully finding a way to use a pattern from
>> >> >>> >> the pattern vector for each value of the string vector. The only way
>> >> >>> >> I can think of is a "for" loop, which can be slow.
>> >> >>> >> Also these are happening in a function I am writing.
>> >> >>> >> Just wondering if there is another, more efficient way. Thanks a lot.
>> >> >>> >>
>> >> >>> >> Jun
>> >> >>> >>
>> >> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >> >>> >>>
>> >> >>> >>> Well, he did provide an example, and...
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>> >> >>> >>>
>> >> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>> >> >>> >>> [1] "WT.CUT" "tx"
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> ## seems to do what was requested.
>> >> >>> >>>
>> >> >>> >>> Jeff would have to amplify on his initial statement however: do you
>> >> >>> >>> mean that separate patterns can always be combined via "|"? Or
>> >> >>> >>> something deeper?
>> >> >>> >>>
>> >> >>> >>> Cheers,
>> >> >>> >>> Bert
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> >> >>> >>> > Your opening assertion is false.
>> >> >>> >>> >
>> >> >>> >>> > Provide a reproducible example and someone will demonstrate.
>> >> >>> >>> >
>> >> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >> >>> >>> >>Dear list,
>> >> >>> >>> >>
>> >> >>> >>> >>I have a vector of strings that cannot be described by one pattern.
>> >> >>> >>> >>So let's say I construct a vector of patterns of the same length as
>> >> >>> >>> >>the vector of strings; can I do element-wise pattern recognition
>> >> >>> >>> >>and string substitution?
>> >> >>> >>> >>
>> >> >>> >>> >>For example,
>> >> >>> >>> >>
>> >> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>> >> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>> >> >>> >>> >>
>> >> >>> >>> >>patterns <- c(pattern1,pattern2)
>> >> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>> >> >>> >>> >>
>> >> >>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from
>> >> >>> >>> >>the second string. If I do
>> >> >>> >>> >>
>> >> >>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>> >> >>> >>> >>
>> >> >>> >>> >>Looping over the patterns doesn't work the way I want. I'd
>> >> >>> >>> >>appreciate any comments. Thanks.
>> >> >>> >>> >>
>> >> >>> >>> >>Jun
>> >> >>> >>> >>
>> >> >>> >>> >>______________________________________________
>> >> >>> >>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >>> >>> >>PLEASE do read the posting guide
>> >> >>> >>> >>http://www.R-project.org/posting-guide.html
>> >> >>> >>> >>and provide commented, minimal, self-contained, reproducible code.
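Jeff's remark that each parenthesis gets its own result number explains what Jun saw with final.pattern. In a pattern of the form (A)(B)(C)|(D)(E)(F)|..., the groups are numbered left to right across all alternatives. So \\1 to \\3 only ever refer to the first alternative, and a string that matches any other alternative comes back empty. Below is a minimal sketch of the suggested fix: keep exactly three groups and put the alternatives inside them. It assumes the TX and WTCUT values are the ones visible in Jun's final.pattern. The names `tx`, `wtcut`, `per.alt` and `one.group` are illustrative only.

```r
# TX and WTCUT values copied from Jun's final.pattern (already regex-escaped).
tx    <- c("240\\.m\\.g", "3\\.mg\\.kg")
wtcut <- c(">50-70\\.kg", ">70-90\\.kg", ">90-110\\.kg",
           "50\\.kg\\.or\\.less", ">110\\.kg")

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

# Original layout: three groups per combination, joined with "|".
# Groups are numbered 1..30 across the alternatives, so "\\1" belongs to
# the first alternative only and is empty for strings matching any other.
combos  <- expand.grid(tx = tx, wtcut = wtcut, stringsAsFactors = FALSE)
per.alt <- paste0("(", combos$tx, ")\\.(", combos$wtcut, ")\\.(.*)",
                  collapse = "|")
sub(per.alt, "\\1", test.string)
# [1] ""        ""        "240.m.g"

# Suggested layout: one group per field, alternatives inside each group.
one.group <- paste0("^(", paste(tx, collapse = "|"), ")\\.(",
                    paste(wtcut, collapse = "|"), ")\\.(.*)$")
sub(one.group, "\\1", test.string)  # "240.m.g"  "3.mg.kg"  "240.m.g"
sub(one.group, "\\2", test.string)  # ">110.kg"  ">110.kg"  ">50-70.kg"
sub(one.group, "\\3", test.string)  # "geo.mean" "P05"      "geo.mean"
```

Because the group numbers no longer depend on which combination matched, the same \\1, \\2 and \\3 work for every string.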
Jun Shen
2016-Sep-09 04:14 UTC
Hi Ista,

Imagine we have a data set called "all.exposure" with variables "TX" and
"WTCUT" for a function. The concatenated strings are generated by some
procedure within the function (the dot is used as the separator; I can't
change that). Now I want to parse the strings back into the original values
of "TX" and "WTCUT" (there could be more than two variables). Since the data
set is provided by users, I cannot pre-define the pattern. The patterns have
to be figured out from the values in "TX" and "WTCUT". It's easy if the values
in "TX" or "WTCUT" don't contain any "." but much trickier if they do.
However, the number of patterns is limited by the combinations of the unique
values in "TX" and "WTCUT". All possible patterns can be constructed by the
code I posted in this thread. Now I need to figure out a way to match the
patterns to the strings so that each string can be parsed correctly. I have
made some progress...

Jun

On Wed, Sep 7, 2016 at 9:34 AM, Ista Zahn <istazahn at gmail.com> wrote:
> Uh, humn, what?!? I have no idea what this means. Example?
>
> --Ista
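The set of possible values is known in advance, so the parsing Jun describes can also be sketched without building a regex at all. Paste every combination of the unique values with ".", then look for the combination that starts each string. This sidesteps escaping entirely.

This is a sketch under stated assumptions:

- The data frame is a hypothetical stand-in for all.exposure, filled with the TX and WTCUT values that appear in Jun's final.pattern.
- `parse_strings` is an illustrative helper, not an existing function.
- The "longest matching key wins" rule is an assumed tie-breaker.

`startsWith()` needs R 3.3.0 or later.

```r
# Hypothetical stand-in for the user-supplied data set.
all.exposure <- data.frame(
  TX    = c("240.m.g", "3.mg.kg", "240.m.g", "3.mg.kg", "240.m.g"),
  WTCUT = c(">50-70.kg", ">70-90.kg", ">90-110.kg", "50.kg.or.less", ">110.kg"),
  stringsAsFactors = FALSE
)
sort.var <- c("TX", "WTCUT")

# All combinations of the unique values, one column per variable.
combos <- expand.grid(lapply(all.exposure[sort.var], unique),
                      stringsAsFactors = FALSE)

# Split each string into the original variable values plus the remainder.
parse_strings <- function(x, combos, sep = ".") {
  # The key each combination would produce, e.g. "240.m.g.>110.kg".
  keys <- do.call(paste, c(combos, sep = sep))
  idx <- vapply(x, function(s) {
    hits <- which(startsWith(s, paste0(keys, sep)))
    if (length(hits) == 0L) return(NA_integer_)
    hits[which.max(nchar(keys[hits]))]  # assumption: longest key wins
  }, integer(1), USE.NAMES = FALSE)
  out <- combos[idx, , drop = FALSE]    # unmatched strings give an NA row
  rownames(out) <- NULL
  out$rest <- substring(x, nchar(keys[idx]) + nchar(sep) + 1L)
  out
}

parse_strings(c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                "240.m.g.>50-70.kg.geo.mean"), combos)
```

One caveat applies to any approach once values may contain the separator. Two different combinations can paste to the same key (TX "a.b" with WTCUT "c" versus TX "a" with WTCUT "b.c"), and no pattern can tell those apart. The tie-breaking rule above simply picks one.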
Ista Zahn
2016-Sep-09 12:29 UTC
On Sep 9, 2016 12:14 AM, "Jun Shen" <jun.shen.ut at gmail.com> wrote:
>
> Hi Ista,
>
> Imagine we have a data set called "all.exposure" with variables "TX",
> "WTCUT" for a function.

I don't think imagining your situation is the best way. Make an example so we
can actually see what you are working with.
>> >> >> >> >> >> >> >> Jun >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter >> >> >> >> <bgunter.4567 at gmail.com> >> >> >> >wrote: >> >> >> >>> >> >> >> >>> Just noticed: My clumsy do.call() line in my previously postedcode>> >> >> >>> below should be replaced with: >> >> >> >>> pat <- paste(pat,collapse = "|") >> >> >> >>> >> >> >> >>> >> >> >> >>> > pat <- c(pat1,pat2) >> >> >> >>> > paste(pat,collapse="|") >> >> >> >>> [1] "a+\\.*a+|b+\\.*b+" >> >> >> >>> >> >> >> >>> ************ replace this ************************** >> >> >> >>> > pat <- do.call(paste,c(as.list(pat), sep="|")) >> >> >> >>> ******************************************** >> >> >> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z) >> >> >> >>> [1] "a.a" "bb" "b.bbb" >> >> >> >>> >> >> >> >>> >> >> >> >>> -- Bert >> >> >> >>> Bert Gunter >> >> >> >>> >> >> >> >>> "The trouble with having an open mind is that people keepcoming>> >> >> >along >> >> >> >>> and sticking things into it." >> >> >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comicstrip )>> >> >> >>> >> >> >> >>> >> >> >> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter >> >> >> ><bgunter.4567 at gmail.com> >> >> >> >>> wrote: >> >> >> >>> > Jun: >> >> >> >>> > >> >> >> >>> > You need to provide a clear specification via regularexpressions>> >> >> >of >> >> >> >>> > the patterns you wish to match -- at least for me todecipher it.>> >> >> >>> > Others may be smarter than I, though... >> >> >> >>> > >> >> >> >>> > Jeff: Thanks. 
I have now convinced myself that it can bedone (a>> >> >> >>> > "proof" of sorts): If pat1, pat2,..., patn are m different >> >> >> >patterns >> >> >> >>> > (in a vector of patterns) to be matched in a vector of n >> >> >> >>> > strings, >> >> >> >>> > where only one of the patterns will match in any string,then>> >> >> >>> > use >> >> >> >>> > paste() (probably via do.call()) or otherwise to paste them >> >> >> >together >> >> >> >>> > separated by "|" to form the concatenated pattern, pat. Then >> >> >> >>> > >> >> >> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector) >> >> >> >>> > >> >> >> >>> > should extract the matching pattern in each (perhaps with a >> >> >> >>> > little >> >> >> >>> > fiddling due to precedence rules); e.g. >> >> >> >>> > >> >> >> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy") >> >> >> >>> > >> >> >> >>> >> pat1 <- "a+\\.*a+" >> >> >> >>> >> pat2 <-"b+\\.*b+" >> >> >> >>> >> pat <- c(pat1,pat2) >> >> >> >>> > >> >> >> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|")) >> >> >> >>> >> pat >> >> >> >>> > [1] "a+\\.*a+|b+\\.*b+" >> >> >> >>> > >> >> >> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z) >> >> >> >>> > [1] "a.a" "bb" "b.bbb" >> >> >> >>> > >> >> >> >>> > Cheers, >> >> >> >>> > Bert >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > Bert Gunter >> >> >> >>> > >> >> >> >>> > "The trouble with having an open mind is that people keepcoming>> >> >> >along >> >> >> >>> > and sticking things into it." >> >> >> >>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comicstrip>> >> >> >>> > ) >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen.ut at gmail.com>>> >> >> >wrote: >> >> >> >>> >> Thanks for the reply, Bert. >> >> >> >>> >> >> >> >> >>> >> Your solution solves the example. I actually have a moregeneral>> >> >> >>> >> situation >> >> >> >>> >> where I have this dot concatenated string from multiple >> >> >> >variables. 
The >> >> >> >>> >> problem is those variables may have values with dots inthere.>> >> >> >The >> >> >> >>> >> number of >> >> >> >>> >> dots are not consistent for all values of a variable. So Iam>> >> >> >thinking >> >> >> >>> >> to >> >> >> >>> >> define a vector of patterns for the vector of the string and >> >> >> >hopefully >> >> >> >>> >> to >> >> >> >>> >> find a way to use a pattern from the pattern vector for each >> >> >> >value of >> >> >> >>> >> the >> >> >> >>> >> string vector. The only way I can think of is "for" loop,which>> >> >> >can be >> >> >> >>> >> slow. >> >> >> >>> >> Also these are happening in a function I am writing. Justwonder>> >> >> >if >> >> >> >>> >> there is >> >> >> >>> >> another more efficient way. Thanks a lot. >> >> >> >>> >> >> >> >> >>> >> Jun >> >> >> >>> >> >> >> >> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter >> >> >> ><bgunter.4567 at gmail.com> >> >> >> >>> >> wrote: >> >> >> >>> >>> >> >> >> >>> >>> Well, he did provide an example, and... >> >> >> >>> >>> >> >> >> >>> >>> >> >> >> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv') >> >> >> >>> >>> >> >> >> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z) >> >> >> >>> >>> [1] "WT.CUT" "tx" >> >> >> >>> >>> >> >> >> >>> >>> >> >> >> >>> >>> ## seems to do what was requested. >> >> >> >>> >>> >> >> >> >>> >>> Jeff would have to amplify on his initial statementhowever: do>> >> >> >you >> >> >> >>> >>> mean that separate patterns can always be combined via "|"?>> >> >> >>> >>> Or >> >> >> >>> >>> something deeper? >> >> >> >>> >>> >> >> >> >>> >>> Cheers, >> >> >> >>> >>> Bert >> >> >> >>> >>> Bert Gunter >> >> >> >>> >>> >> >> >> >>> >>> "The trouble with having an open mind is that people keep >> >> >> >>> >>> coming >> >> >> >along >> >> >> >>> >>> and sticking things into it." 
>> >> >> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic >> >> >> >>> >>> strip >> >> >> >) >> >> >> >>> >>> >> >> >> >>> >>> >> >> >> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller >> >> >> >>> >>> <jdnewmil at dcn.davis.ca.us> >> >> >> >>> >>> wrote: >> >> >> >>> >>> > Your opening assertion is false. >> >> >> >>> >>> > >> >> >> >>> >>> > Provide a reproducible example and someone willdemonstrate.>> >> >> >>> >>> > -- >> >> >> >>> >>> > Sent from my phone. Please excuse my brevity. >> >> >> >>> >>> > >> >> >> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen >> >> >> >>> >>> > <jun.shen.ut at gmail.com> >> >> >> >>> >>> > wrote: >> >> >> >>> >>> >>Dear list, >> >> >> >>> >>> >> >> >> >> >>> >>> >>I have a vector of strings that cannot be described byone>> >> >> >pattern. >> >> >> >>> >>> >> So >> >> >> >>> >>> >>let's say I construct a vector of patterns in the samelength>> >> >> >as the >> >> >> >>> >>> >>vector >> >> >> >>> >>> >>of strings, can I do the element wise patternrecognition and>> >> >> >string >> >> >> >>> >>> >>substitution. >> >> >> >>> >>> >> >> >> >> >>> >>> >>For example, >> >> >> >>> >>> >> >> >> >> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)" >> >> >> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)" >> >> >> >>> >>> >> >> >> >> >>> >>> >>patterns <- c(pattern1,pattern2) >> >> >> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv') >> >> >> >>> >>> >> >> >> >> >>> >>> >>Say I want to extract "WT.CUT" from the first string and"tx">> >> >> >from >> >> >> >>> >>> >> the >> >> >> >>> >>> >>second string. If I do >> >> >> >>> >>> >> >> >> >> >>> >>> >>sub(patterns, '\\2', strings), only the first patternwill be>> >> >> >used. >> >> >> >>> >>> >> >> >> >> >>> >>> >>looping the patterns doesn't work the way I want.Appreciate>> >> >> >any >> >> >> >>> >>> >>comments. >> >> >> >>> >>> >>Thanks. 
>> >> >> >>> >>> >> >> >> >> >>> >>> >>Jun >> >> >> >>> >>> >> >> >> >> >>> >>> >> [[alternative HTML version deleted]] >> >> >> >>> >>> >> >> >> >> >>> >>> >>______________________________________________ >> >> >> >>> >>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE andmore,>> >> >> >see >> >> >> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help >> >> >> >>> >>> >>PLEASE do read the posting guide >> >> >> >>> >>> >>http://www.R-project.org/posting-guide.html >> >> >> >>> >>> >>and provide commented, minimal, self-contained,reproducible>> >> >> >code. >> >> >> >>> >>> > >> >> >> >>> >>> > ______________________________________________ >> >> >> >>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE andmore,>> >> >> >see >> >> >> >>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help >> >> >> >>> >>> > PLEASE do read the posting guide >> >> >> >>> >>> > http://www.R-project.org/posting-guide.html >> >> >> >>> >>> > and provide commented, minimal, self-contained,reproducible>> >> >> >code. >> >> >> >>> >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> > [[alternative HTML version deleted]] >> >> > >> >> > ______________________________________________ >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> > https://stat.ethz.ch/mailman/listinfo/r-help >> >> > PLEASE do read the posting guide >> >> > http://www.R-project.org/posting-guide.html >> >> > and provide commented, minimal, self-contained, reproducible code. >> > >> > > >[[alternative HTML version deleted]]
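For reference, a runnable form of the element-wise approaches quoted earlier in the thread. In the archived copy the mapply() call reads `MoreArgs=list(replacement "\\2")`; the `=` after `replacement` appears to have been lost in transit, and the call does not parse without it. `USE.NAMES = FALSE` is my addition and only stops mapply() from naming the result after the patterns; seq_along() in the loop is just the usual indexing idiom.

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# Apply sub() pairwise: patterns[i] is used only on strings[i]
res_mapply <- mapply(sub, pattern = patterns, x = strings,
                     MoreArgs = list(replacement = "\\2"),
                     USE.NAMES = FALSE)

# The same thing with an explicit loop
res_loop <- character(length(strings))
for (i in seq_along(strings)) {
  res_loop[i] <- sub(patterns[i], "\\2", strings[i])
}

res_mapply  # "WT.CUT" "tx"
```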
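Jeff's point that every pair of parentheses gets its own number is what breaks the long final.pattern posted earlier in the thread: in `A|B`, the groups inside `B` are numbered after all the groups in `A`, so `\\1` comes back empty whenever a string matches a later alternative. A cut-down illustration (the two-alternative pattern is made up for the demo):

```r
# Two alternatives with two groups each: groups 1-2 belong to the first
# alternative, groups 3-4 to the second.
pat <- "(a)\\.(x)|(b)\\.(y)"
x   <- c("a.x", "b.y")

sub(pat, "\\1", x)  # "a" ""   group 1 takes no part in matching "b.y"
sub(pat, "\\3", x)  # ""  "b"  group 3 takes no part in matching "a.x"
```

This is why putting one pair of parentheses around each variable's whole set of alternatives, as Jeff suggested, fixes the numbering: the group number then no longer depends on which alternative matched.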
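A minimal sketch of that idea applied to Jun's description: one capture group per variable, each holding the escaped unique values of that variable joined with "|", followed by a catch-all group for the remainder. parse_concat() and escape_regex() are hypothetical helper names, and the all.exposure built with expand.grid() is a made-up stand-in whose values are copied from the test strings posted in the thread. Unlike the thread's code it escapes every regex metacharacter, not just ".", on the assumption that user-supplied values might contain "+", "(" and the like.

```r
# Escape regex metacharacters so each value is matched literally
escape_regex <- function(x) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)

# Split dot-joined strings back into the variables they were built from.
# One capture group per variable keeps the numbering fixed: group i is
# always variable i, and the last group is whatever follows.
parse_concat <- function(x, data, vars) {
  # sub() back-references only go up to \\9
  stopifnot(length(vars) + 1 <= 9)
  groups <- vapply(vars, function(v) {
    vals <- unique(as.character(data[[v]]))
    paste0("(", paste(escape_regex(vals), collapse = "|"), ")")
  }, character(1))
  pattern <- paste0("^", paste(c(groups, "(.*)"), collapse = "\\."), "$")
  hit <- grepl(pattern, x)
  # Non-matching strings get NA instead of being returned unchanged
  out <- lapply(seq_len(length(vars) + 1), function(i)
    ifelse(hit, sub(pattern, paste0("\\", i), x), NA_character_))
  names(out) <- c(vars, "rest")
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Hypothetical stand-in for the user-supplied data set
all.exposure <- expand.grid(
  TX    = c("240.m.g", "3.mg.kg"),
  WTCUT = c(">50-70.kg", ">70-90.kg", ">90-110.kg", "50.kg.or.less", ">110.kg"),
  stringsAsFactors = FALSE)

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")
res <- parse_concat(test.string, all.exposure, c("TX", "WTCUT"))
res
```

For the three test strings this gives TX = 240.m.g, 3.mg.kg, 240.m.g; WTCUT = >110.kg, >110.kg, >50-70.kg; rest = geo.mean, P05, geo.mean. One caveat: if a string can be split in more than one valid way (say TX has values "a" and "a.b" and WTCUT has "b.c" and "c", so "a.b.c.mean" fits both), no single pattern can know which split was meant; the values or the separator have to rule that out.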
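Since the set of valid combinations is finite, a regex is not strictly needed. A hedged alternative sketch that swaps pattern matching for literal prefix matching with startsWith() (base R since 3.3.0): build every combination of the unique values with expand.grid(), paste each one with the "." separator, and for each string keep the longest prefix it starts with. The all.exposure data, the sort.var vector and the helper match_prefix() are hypothetical stand-ins; the values are copied from the thread's test strings, and "longest prefix wins" is an assumed tie-break rule.

```r
# Hypothetical stand-in data, same values as the thread's test strings
all.exposure <- expand.grid(
  TX    = c("240.m.g", "3.mg.kg"),
  WTCUT = c(">50-70.kg", ">70-90.kg", ">90-110.kg", "50.kg.or.less", ">110.kg"),
  stringsAsFactors = FALSE)
sort.var    <- c("TX", "WTCUT")
test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

# Every combination of the unique values, one row per combination
uniq   <- lapply(setNames(sort.var, sort.var),
                 function(v) unique(as.character(all.exposure[[v]])))
combos <- do.call(expand.grid, c(uniq, stringsAsFactors = FALSE))

# The literal text each combination contributes, including the trailing "."
prefixes <- paste0(do.call(paste, c(combos, sep = ".")), ".")

# Index of the longest prefix that s starts with (ties go to the first
# combination), or NA if none matches
match_prefix <- function(s) {
  hits <- which(startsWith(s, prefixes))
  if (length(hits) == 0) return(NA_integer_)
  hits[which.max(nchar(prefixes[hits]))]
}

idx    <- vapply(test.string, match_prefix, integer(1), USE.NAMES = FALSE)
parsed <- combos[idx, , drop = FALSE]
parsed$rest <- substring(test.string, nchar(prefixes[idx]) + 1)
rownames(parsed) <- NULL
parsed
```

Because startsWith() compares plain text, no escaping of "." or other metacharacters is needed. The cost is one comparison per string and combination, which should be fine while the number of combinations stays small.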