Ista Zahn
2016-Sep-07 01:44 UTC
[R] element wise pattern recognition and string substitution
If you want to match each element of 'strings' to a different regex, do
it. Here are three ways, using your original example.
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1,pattern2)
strings <- c('TX.WT.CUT.mean','mg.tx.cv')
for(i in seq(strings)) print(sub(patterns[i], "\\2", strings[i]))
mapply(sub, pattern = patterns, x = strings,
       MoreArgs = list(replacement = "\\2"))
library(stringi)
stri_replace_all_regex(strings, patterns, "$2")
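
A fourth base-R variant, offered only as a sketch: Vectorize() wraps sub() so
that it walks 'pattern' and 'x' in parallel, and USE.NAMES = FALSE drops the
pattern strings that would otherwise be attached as names. The helper name
'vsub' is made up for this example.

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# 'vsub' is a hypothetical helper name: sub() vectorized over both
# 'pattern' and 'x', so element i of 'patterns' is applied to element i
# of 'strings'.
vsub <- Vectorize(sub, vectorize.args = c("pattern", "x"), USE.NAMES = FALSE)

result <- vsub(patterns, "\\2", strings)
print(result)
```

This does the same work as the mapply() call above, so it should not be
expected to run any faster than the loop.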
Best,
Ista
On Tue, Sep 6, 2016 at 9:20 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> Hi Jeff,
>
> Thanks for the reply. I tried your suggestion and it doesn't seem to work
> and I tried a simple pattern as follows and it works as expected
>
> sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\1', "3.mg.kg.>50-70.kg.P05")
> [1] "3.mg.kg"
>
> sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\2', "3.mg.kg.>50-70.kg.P05")
> [1] ">50-70.kg"
>
> sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\3', "3.mg.kg.>50-70.kg.P05")
> [1] "P05"
>
> My problem is the pattern has to be dynamically constructed on the input
> data of the function I am writing. It's actually not too difficult to
> assemble the final.pattern with some code like the following
>
> sort.var <- c('TX','WTCUT')
> combn.sort.var <- do.call(expand.grid, lapply(sort.var,
>   function(x) paste('(', gsub('\\.','\\\\.',unlist(unique(all.exposure[x]))),
>   ')', sep='')))
> all.patterns <- do.call(paste, c(combn.sort.var, '(.*)', sep='\\.'))
> final.pattern <- paste0(all.patterns, collapse='|')
>
> You cannot run the code directly since the data object "all.exposure" is
> not provided here.
>
> Jun
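
A rough sketch of how the dynamically built pattern could follow Jeff's advice
quoted below: one capture group per variable, with the alternatives placed
inside that group, so the group numbers stay the same whichever values match.
The 'all.exposure' data frame here is an invented stand-in (the real object
was not posted), 'group.for' is a made-up helper name, and only dots are
escaped, which assumes the values contain no other regex metacharacters.

```r
# Hypothetical stand-in for the 'all.exposure' data that was not posted.
all.exposure <- data.frame(
  TX    = c("240.m.g", "3.mg.kg", "240.m.g"),
  WTCUT = c(">110.kg", ">110.kg", ">50-70.kg"),
  stringsAsFactors = FALSE
)
sort.var <- c("TX", "WTCUT")

# Build one capture group per variable: "(value1|value2|...)".
# Only literal dots are escaped (assumption: no other metacharacters).
group.for <- function(v) {
  vals <- gsub("\\.", "\\\\.", unique(all.exposure[[v]]))
  paste0("(", paste(vals, collapse = "|"), ")")
}

# Join the per-variable groups and a trailing "(.*)" with escaped dots.
final.pattern <- paste0(
  "^",
  paste(c(sapply(sort.var, group.for), "(.*)"), collapse = "\\."),
  "$"
)

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

tx    <- sub(final.pattern, "\\1", test.string)
wtcut <- sub(final.pattern, "\\2", test.string)
stat  <- sub(final.pattern, "\\3", test.string)
print(data.frame(tx, wtcut, stat))
```

With this layout every string is parsed by the same three groups, so no
per-string pattern lookup is needed.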
>
>
>
> On Tue, Sep 6, 2016 at 8:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
>
>> I am not near my computer today, but each parenthesis gets its own result
>> number, so you should put the parenthesis around the whole pattern of
>> alternatives instead of having many parentheses.
>>
>> I recommend thinking in terms of what common information you expect to
>> find in these various strings, and place your parentheses to capture that
>> information. There is no other reason to put parentheses in the pattern...
>> they are not grouping symbols.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4567 at gmail.com>
>> wrote:
>> >Jun:
>> >
>> >1. Tell us your desired result from your test vector and maybe someone
>> >will help.
>> >
>> >2. As we played this game once already (you couldn't do it; I showed
>> >you how), this seems to be a function of your limitations with regular
>> >expressions. I'm probably not much better, but in any case, I don't
>> >intend to be your consultant. See if you can find someone locally to
>> >help you if you do not receive a satisfactory reply from the list.
>> >There are many people here who are pretty good at this sort of thing,
>> >but I don't know if they'll reply. Regex's are certainly complex. PERL
>> >people tend to be pretty good at them, I believe. There are numerous
>> >web sites and books on them if you need to acquire expertise for your
>> >work.
>> >
>> >Cheers,
>> >Bert
>> >Bert Gunter
>> >
>> >"The trouble with having an open mind is that people keep coming along
>> >and sticking things into it."
>> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >
>> >
>> >On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >> Hi Bert,
>> >>
>> >> I still couldn't make the multiple patterns to work. Here is an example. I
>> >> make the pattern as follows
>> >>
>> >> final.pattern <-
>> >> "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
>> >>
>> >> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
>> >> '240.m.g.>50-70.kg.geo.mean')
>> >>
>> >> sub(final.pattern, '\\1', test.string)
>> >> sub(final.pattern, '\\2', test.string)
>> >> sub(final.pattern, '\\3', test.string)
>> >>
>> >> Only the third string has been correctly parsed, which matches the first
>> >> pattern. It seems the rest of the patterns are not called.
>> >>
>> >> Jun
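
A small sketch of why only the third string parses, under the usual rule that
capture groups are numbered left to right across the whole pattern: in an
alternation each branch owns different group numbers, so '\\1' to '\\3' are
filled only when the first branch is the one that matches. The toy pattern
and strings below are invented for illustration.

```r
# Toy pattern with two branches: groups 1-2 belong to the first branch,
# groups 3-4 to the second.
toy.pattern <- "(a)-(x)|(b)-(y)"
toy.strings <- c("a-x", "b-y")

# Group 1 is filled only for the string that matches the first branch.
first <- sub(toy.pattern, "\\1", toy.strings)

# Group 3 is filled only for the string that matches the second branch.
third <- sub(toy.pattern, "\\3", toy.strings)

# One group per field, with the alternatives inside the group, keeps the
# numbering identical whichever alternative matches.
grouped <- sub("(a|b)-(x|y)", "\\1", toy.strings)

print(first)
print(third)
print(grouped)
```

In the long pattern above the same thing happens: the second and later
branches do match, but their pieces land in groups 4 and up.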
>> >>
>> >>
>> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4567 at gmail.com>
>> >> wrote:
>> >>>
>> >>> Just noticed: My clumsy do.call() line in my previously posted code
>> >>> below should be replaced with:
>> >>> pat <- paste(pat,collapse = "|")
>> >>>
>> >>>
>> >>> > pat <- c(pat1,pat2)
>> >>> > paste(pat,collapse="|")
>> >>> [1] "a+\\.*a+|b+\\.*b+"
>> >>>
>> >>> ************ replace this **************************
>> >>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
>> >>> ********************************************
>> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
>> >>> [1] "a.a" "bb" "b.bbb"
>> >>>
>> >>>
>> >>> -- Bert
>> >>> Bert Gunter
>> >>>
>> >>> "The trouble with having an open mind is that people keep coming along
>> >>> and sticking things into it."
>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>>
>> >>>
>> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4567 at gmail.com>
>> >>> wrote:
>> >>> > Jun:
>> >>> >
>> >>> > You need to provide a clear specification via regular expressions of
>> >>> > the patterns you wish to match -- at least for me to decipher it.
>> >>> > Others may be smarter than I, though...
>> >>> >
>> >>> > Jeff: Thanks. I have now convinced myself that it can be done (a
>> >>> > "proof" of sorts): If pat1, pat2,..., patm are m different patterns
>> >>> > (in a vector of patterns) to be matched in a vector of n strings,
>> >>> > where only one of the patterns will match in any string, then use
>> >>> > paste() (probably via do.call()) or otherwise to paste them together
>> >>> > separated by "|" to form the concatenated pattern, pat. Then
>> >>> >
>> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
>> >>> >
>> >>> > should extract the matching pattern in each (perhaps with a little
>> >>> > fiddling due to precedence rules); e.g.
>> >>> >
>> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
>> >>> >
>> >>> >> pat1 <- "a+\\.*a+"
>> >>> >> pat2 <-"b+\\.*b+"
>> >>> >> pat <- c(pat1,pat2)
>> >>> >
>> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
>> >>> >> pat
>> >>> > [1] "a+\\.*a+|b+\\.*b+"
>> >>> >
>> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
>> >>> > [1] "a.a" "bb" "b.bbb"
>> >>> >
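
As a possible alternative to wrapping the combined pattern in "^[^b]*(" and
").*$", here is a sketch that pulls out the first match directly with
regexpr() and regmatches(). It reuses Bert's example data. Note that
regmatches() silently drops strings with no match, so this assumes every
string matches one of the patterns.

```r
z   <- c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
pat <- paste(c("a+\\.*a+", "b+\\.*b+"), collapse = "|")

# regexpr() locates the first match of the combined pattern in each string;
# regmatches() extracts the matched text. Unmatched strings would be
# dropped, shortening the result.
hits <- regmatches(z, regexpr(pat, z))
print(hits)
```

Because nothing has to be consumed before the match, no prefix such as
"^[^b]*" needs tuning for each set of patterns.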
>> >>> > Cheers,
>> >>> > Bert
>> >>> >
>> >>> >
>> >>> > Bert Gunter
>> >>> >
>> >>> > "The trouble with having an open mind is that people keep coming along
>> >>> > and sticking things into it."
>> >>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>> >
>> >>> >
>> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>> >>> >> Thanks for the reply, Bert.
>> >>> >>
>> >>> >> Your solution solves the example. I actually have a more general
>> >>> >> situation where I have this dot concatenated string from multiple
>> >>> >> variables. The problem is those variables may have values with dots
>> >>> >> in there. The number of dots are not consistent for all values of a
>> >>> >> variable. So I am thinking to define a vector of patterns for the
>> >>> >> vector of the string and hopefully to find a way to use a pattern
>> >>> >> from the pattern vector for each value of the string vector. The
>> >>> >> only way I can think of is "for" loop, which can be slow.
>> >>> >> Also these are happening in a function I am writing. Just wonder if
>> >>> >> there is another more efficient way. Thanks a lot.
>> >>> >>
>> >>> >> Jun
>> >>> >>
>> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4567 at gmail.com>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Well, he did provide an example, and...
>> >>> >>>
>> >>> >>>
>> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>> >>> >>>
>> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>> >>> >>> [1] "WT.CUT" "tx"
>> >>> >>>
>> >>> >>>
>> >>> >>> ## seems to do what was requested.
>> >>> >>>
>> >>> >>> Jeff would have to amplify on his initial statement however: do you
>> >>> >>> mean that separate patterns can always be combined via "|" ? Or
>> >>> >>> something deeper?
>> >>> >>>
>> >>> >>> Cheers,
>> >>> >>> Bert
>> >>> >>> Bert Gunter
>> >>> >>>
>> >>> >>> "The trouble with having an open mind is that people keep coming along
>> >>> >>> and sticking things into it."
>> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>> >>>
>> >>> >>>
>> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
>> >>> >>> <jdnewmil at dcn.davis.ca.us>
>> >>> >>> wrote:
>> >>> >>> > Your opening assertion is false.
>> >>> >>> >
>> >>> >>> > Provide a reproducible example and someone will demonstrate.
>> >>> >>> > --
>> >>> >>> > Sent from my phone. Please excuse my brevity.
>> >>> >>> >
>> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen
>> >>> >>> > <jun.shen.ut at gmail.com>
>> >>> >>> > wrote:
>> >>> >>> >>Dear list,
>> >>> >>> >>
>> >>> >>> >>I have a vector of strings that cannot be described by one pattern.
>> >>> >>> >>So let's say I construct a vector of patterns in the same length as
>> >>> >>> >>the vector of strings, can I do the element wise pattern recognition
>> >>> >>> >>and string substitution.
>> >>> >>> >>
>> >>> >>> >>For example,
>> >>> >>> >>
>> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>> >>> >>> >>
>> >>> >>> >>patterns <- c(pattern1,pattern2)
>> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>> >>> >>> >>
>> >>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from
>> >>> >>> >>the second string. If I do
>> >>> >>> >>
>> >>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>> >>> >>> >>
>> >>> >>> >>looping the patterns doesn't work the way I want. Appreciate any
>> >>> >>> >>comments.
>> >>> >>> >>Thanks.
>> >>> >>> >>
>> >>> >>> >>Jun
>> >>> >>> >>
>> >>> >>> >> [[alternative HTML version deleted]]
>> >>> >>> >>
>> >>> >>> >>______________________________________________
>> >>> >>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> >>> >>PLEASE do read the posting guide
>> >>> >>> >>http://www.R-project.org/posting-guide.html
>> >>> >>> >>and provide commented, minimal, self-contained, reproducible code.
>> >>> >>> >
>> >>> >>> >
>> >>> >>
>> >>> >>
>> >>
>> >>
>>
>>
>
> [[alternative HTML version deleted]]
>
Ista Zahn
2016-Sep-07 13:34 UTC
[R] element wise pattern recognition and string substitution
On Tue, Sep 6, 2016 at 11:59 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> Hi Ista,
>
> Thanks for the suggestion. I didn't know mapply can be used this way! Let me
> take one more step. Instead of defining a pattern for each string, I would
> like to define a set of patterns from all the possible combination of the
> unique values of those variables. Then I need each string to find a pattern
> for itself.

Uh, humn, what?!? I have no idea what this means. Example?

--Ista

> I know this is getting a little stretching. Thanks for all the
> suggestion/comments from everyone.
>
> Jun