Jun Shen
2016-Sep-07 01:20 UTC
[R] element wise pattern recognition and string substitution
Hi Jeff,
Thanks for the reply. I tried your suggestion and it doesn't seem to work.
A simple pattern, as follows, does work as expected:
sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\1',
"3.mg.kg.>50-70.kg.P05")
[1] "3.mg.kg"
sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\2',
"3.mg.kg.>50-70.kg.P05")
[1] ">50-70.kg"
sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\3',
"3.mg.kg.>50-70.kg.P05")
[1] "P05"
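As a side note, the three sub() calls above can probably be collapsed into one. The following is a minimal sketch, assuming base R's regexec() and regmatches(); the variable names pattern, x and parts are only illustrative and do not come from the thread.

```r
# Same pattern and input as in the three sub() calls above.
pattern <- "(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)"
x <- "3.mg.kg.>50-70.kg.P05"

# regexec() records the position of the full match and of each
# parenthesised group; regmatches() turns those positions into strings.
parts <- regmatches(x, regexec(pattern, x))[[1]]

# The first element is the full match; the rest are groups 1 to 3.
parts[-1]
```

This returns all captured pieces in a single character vector, so the pattern is matched only once per string.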
My problem is that the pattern has to be constructed dynamically from the input
data of the function I am writing. It's actually not too difficult to
assemble the final.pattern with some code like the following:
sort.var <- c('TX','WTCUT')
combn.sort.var <- do.call(expand.grid, lapply(sort.var, function(x)
  paste('(', gsub('\\.', '\\\\.', unlist(unique(all.exposure[x]))), ')', sep='')))
all.patterns <- do.call(paste, c(combn.sort.var, '(.*)', sep='\\.'))
final.pattern <- paste0(all.patterns, collapse='|')
You cannot run the code directly since the data object "all.exposure" is
not provided here.
Jun
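A possible explanation for the behaviour described in this thread: capture groups are numbered left to right across the whole pattern, so in "(a)\\.(b)\\.(.*)|(c)\\.(d)\\.(.*)" a string that matches the second alternative fills groups 4 to 6 and leaves \\1 to \\3 empty. In addition, the documentation for sub() only describes backreferences \\1 through \\9. The sketch below moves the alternation inside the groups, in the spirit of Jeff's advice, so that there are always exactly three groups. The contents of all.exposure and the helper one.group() are hypothetical, invented here only to make the example runnable.

```r
# Hypothetical stand-in for the all.exposure data frame, which was not posted.
all.exposure <- data.frame(
  TX    = c("240.m.g", "3.mg.kg", "240.m.g"),
  WTCUT = c(">50-70.kg", ">110.kg", ">110.kg"),
  stringsAsFactors = FALSE
)
sort.var <- c("TX", "WTCUT")

# one.group() is a hypothetical helper: escape the literal dots, then fold
# the unique values of one variable into a single group, (value1|value2|...).
one.group <- function(v) {
  vals <- gsub("\\.", "\\\\.", unique(all.exposure[[v]]))
  paste0("(", paste(vals, collapse = "|"), ")")
}

# Exactly three groups in total, whichever combination of values matches.
final.pattern <- paste0(
  "^",
  paste(vapply(sort.var, one.group, character(1)), collapse = "\\."),
  "\\.(.*)$"
)

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

tx   <- sub(final.pattern, "\\1", test.string)  # first variable
wt   <- sub(final.pattern, "\\2", test.string)  # second variable
stat <- sub(final.pattern, "\\3", test.string)  # remainder of the string
```

Only dots are escaped here, as in the original code; values containing other regex metacharacters (for example "+" or "(") would need further escaping.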
On Tue, Sep 6, 2016 at 8:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I am not near my computer today, but each parenthesis gets its own result
> number, so you should put the parenthesis around the whole pattern of
> alternatives instead of having many parentheses.
>
> I recommend thinking in terms of what common information you expect to
> find in these various strings, and place your parentheses to capture that
> information. There is no other reason to put parentheses in the pattern...
> they are not grouping symbols.
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >Jun:
> >
> >1. Tell us your desired result from your test vector and maybe someone
> >will help.
> >
> >2. As we played this game once already (you couldn't do it; I showed
> >you how), this seems to be a function of your limitations with regular
> >expressions. I'm probably not much better, but in any case, I don't
> >intend to be your consultant. See if you can find someone locally to
> >help you if you do not receive a satisfactory reply from the list.
> >There are many people here who are pretty good at this sort of thing,
> >but I don't know if they'll reply. Regex's are certainly complex. PERL
> >people tend to be pretty good at them, I believe. There are numerous
> >web sites and books on them if you need to acquire expertise for your
> >work.
> >
> >Cheers,
> >Bert
> >Bert Gunter
> >
> >"The trouble with having an open mind is that people keep coming along
> >and sticking things into it."
> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> >
> >
> >On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >> Hi Bert,
> >>
> >> I still couldn't get the multiple patterns to work. Here is an example.
> >> I build the pattern as follows:
> >>
> >> final.pattern <-
> >> "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
> >>
> >> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
> >> '240.m.g.>50-70.kg.geo.mean')
> >>
> >> sub(final.pattern, '\\1', test.string)
> >> sub(final.pattern, '\\2', test.string)
> >> sub(final.pattern, '\\3', test.string)
> >>
> >> Only the third string is parsed correctly; it matches the first
> >> alternative. It seems the rest of the patterns are not called.
> >>
> >> Jun
> >>
> >>
> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>
> >>> Just noticed: My clumsy do.call() line in my previously posted code
> >>> below should be replaced with:
> >>> pat <- paste(pat,collapse = "|")
> >>>
> >>>
> >>> > pat <- c(pat1,pat2)
> >>> > paste(pat,collapse="|")
> >>> [1] "a+\\.*a+|b+\\.*b+"
> >>>
> >>> ************ replace this **************************
> >>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
> >>> ********************************************
> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
> >>> [1] "a.a" "bb" "b.bbb"
> >>>
> >>>
> >>> -- Bert
> >>>
> >>>
> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>> > Jun:
> >>> >
> >>> > You need to provide a clear specification via regular expressions of
> >>> > the patterns you wish to match -- at least for me to decipher it.
> >>> > Others may be smarter than I, though...
> >>> >
> >>> > Jeff: Thanks. I have now convinced myself that it can be done (a
> >>> > "proof" of sorts): If pat1, pat2,..., patm are m different patterns
> >>> > (in a vector of patterns) to be matched in a vector of n strings,
> >>> > where only one of the patterns will match in any string, then use
> >>> > paste() (probably via do.call()) or otherwise to paste them together
> >>> > separated by "|" to form the concatenated pattern, pat. Then
> >>> >
> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
> >>> >
> >>> > should extract the matching pattern in each (perhaps with a little
> >>> > fiddling due to precedence rules); e.g.
> >>> >
> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
> >>> >
> >>> >> pat1 <- "a+\\.*a+"
> >>> >> pat2 <-"b+\\.*b+"
> >>> >> pat <- c(pat1,pat2)
> >>> >
> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
> >>> >> pat
> >>> > [1] "a+\\.*a+|b+\\.*b+"
> >>> >
> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
> >>> > [1] "a.a" "bb" "b.bbb"
> >>> >
> >>> > Cheers,
> >>> > Bert
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>> >> Thanks for the reply, Bert.
> >>> >>
> >>> >> Your solution solves the example. I actually have a more general
> >>> >> situation where I have this dot-concatenated string built from multiple
> >>> >> variables. The problem is that those variables may have values with dots
> >>> >> in them, and the number of dots is not consistent across the values of a
> >>> >> variable. So I am thinking of defining a vector of patterns for the vector
> >>> >> of strings and, hopefully, finding a way to use one pattern from the
> >>> >> pattern vector for each value of the string vector. The only way I can
> >>> >> think of is a "for" loop, which can be slow. Also, this is happening
> >>> >> inside a function I am writing. I just wonder if there is another, more
> >>> >> efficient way. Thanks a lot.
> >>> >>
> >>> >> Jun
> >>> >>
> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>> >>>
> >>> >>> Well, he did provide an example, and...
> >>> >>>
> >>> >>>
> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
> >>> >>>
> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
> >>> >>> [1] "WT.CUT" "tx"
> >>> >>>
> >>> >>>
> >>> >>> ## seems to do what was requested.
> >>> >>>
> >>> >>> Jeff would have to amplify on his initial statement however: do you
> >>> >>> mean that separate patterns can always be combined via "|" ? Or
> >>> >>> something deeper?
> >>> >>>
> >>> >>> Cheers,
> >>> >>> Bert
> >>> >>>
> >>> >>>
> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> >>> >>> > Your opening assertion is false.
> >>> >>> >
> >>> >>> > Provide a reproducible example and someone will demonstrate.
> >>> >>> > --
> >>> >>> > Sent from my phone. Please excuse my brevity.
> >>> >>> >
> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>> >>> >>Dear list,
> >>> >>> >>
> >>> >>> >>I have a vector of strings that cannot be described by one pattern.
> >>> >>> >>So let's say I construct a vector of patterns of the same length as the
> >>> >>> >>vector of strings. Can I do element-wise pattern recognition and string
> >>> >>> >>substitution?
> >>> >>> >>
> >>> >>> >>For example,
> >>> >>> >>
> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
> >>> >>> >>
> >>> >>> >>patterns <- c(pattern1,pattern2)
> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
> >>> >>> >>
> >>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from the
> >>> >>> >>second string. If I do
> >>> >>> >>
> >>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
> >>> >>> >>
> >>> >>> >>Looping over the patterns doesn't work the way I want. I would
> >>> >>> >>appreciate any comments.
> >>> >>> >>Thanks.
> >>> >>> >>
> >>> >>> >>Jun
> >>> >>> >>
> >>> >>> >> [[alternative HTML version deleted]]
> >>> >>> >>
> >>> >>> >>______________________________________________
> >>> >>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> >>PLEASE do read the posting guide
> >>> >>> >>http://www.R-project.org/posting-guide.html
> >>> >>> >>and provide commented, minimal, self-contained, reproducible code.
Ista Zahn
2016-Sep-07 01:44 UTC
[R] element wise pattern recognition and string substitution
If you want to match each element of 'strings' to a different regex, do
it. Here are three ways, using your original example.
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1,pattern2)
strings <- c('TX.WT.CUT.mean','mg.tx.cv')
for(i in seq(strings)) print(sub(patterns[i], "\\2", strings[i]))
mapply(sub, pattern = patterns, x = strings, MoreArgs=list(replacement =
"\\2"))
library(stringi)
stri_replace_all_regex(strings, patterns, "$2")
Best,
Ista
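One detail about the mapply() variant that may be worth knowing: by default mapply() names its result after its first vectorised argument, which here is the vector of pattern strings, so the output carries the regexes as names. The sketch below is an assumption about what is usually wanted, not part of the original suggestion; it adds USE.NAMES = FALSE to return a plain character vector.

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# Pair patterns[i] with strings[i]; the replacement is shared by every call.
# USE.NAMES = FALSE stops the pattern strings from becoming element names.
res <- mapply(sub, pattern = patterns, x = strings,
              MoreArgs = list(replacement = "\\2"), USE.NAMES = FALSE)
res
```

Internally this is still a loop over the elements, so it is mainly a matter of convenience rather than speed compared with the explicit for loop.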
Jun Shen
2016-Sep-07 03:59 UTC
[R] element wise pattern recognition and string substitution
Hi Ista, Thanks for the suggestion. I didn't know mapply can be used this way! Let me take one more step. Instead of defining a pattern for each string, I would like to define a set of patterns from all the possible combination of the unique values of those variables. Then I need each string to find a pattern for itself. I know this is getting a little stretching. Thanks for all the suggestion/comments from everyone. Jun On Tue, Sep 6, 2016 at 9:44 PM, Ista Zahn <istazahn at gmail.com> wrote:> If you want to mach each element of 'strings' to a different regex, do > it. Here are three ways, using your original example. > > pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)" > pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)" > > patterns <- c(pattern1,pattern2) > strings <- c('TX.WT.CUT.mean','mg.tx.cv') > > for(i in seq(strings)) print(sub(patterns[i], "\\2", strings[i])) > > mapply(sub, pattern = patterns, x = strings, MoreArgs=list(replacement > "\\2")) > > library(stringi) > stri_replace_all_regex(strings, patterns, "$2") > > Best, > Ista > On Tue, Sep 6, 2016 at 9:20 PM, Jun Shen <jun.shen.ut at gmail.com> wrote: > > Hi Jeff, > > > > Thanks for the reply. I tried your suggestion and it doesn't seem to work > > and I tried a simple pattern as follows and it works as expected > > > > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\1', "3.mg.kg > .>50-70.kg.P05") > > [1] "3.mg.kg" > > > > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\2', "3.mg.kg > .>50-70.kg.P05") > > [1] ">50-70.kg" > > > > sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\3', "3.mg.kg > .>50-70.kg.P05") > > [1] "P05" > > > > My problem is the pattern has to be dynamically constructed on the input > > data of the function I am writing. It's actually not too difficult to > > assemble the final.pattern with some code like the following > > > > sort.var <- c('TX','WTCUT') > > combn.sort.var <- do.call(expand.grid, lapply(sort.var, > > function(x)paste('(',gsub('\\.','\\\\.',unlist(unique(all. 
> exposure[x]))), > > ')', sep=''))) > > all.patterns <- do.call(paste, c(combn.sort.var, '(.*)', sep='\\.')) > > final.pattern <- paste0(all.patterns, collapse='|') > > > > You cannot run the code directly since the data object "all.exposure" is > > not provided here. > > > > Jun > > > > > > > > On Tue, Sep 6, 2016 at 8:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us > > > > wrote: > > > >> I am not near my computer today, but each parenthesis gets its own > result > >> number, so you should put the parenthesis around the whole pattern of > >> alternatives instead of having many parentheses. > >> > >> I recommend thinking in terms of what common information you expect to > >> find in these various strings, and place your parentheses to capture > that > >> information. There is no other reason to put parentheses in the > pattern... > >> they are not grouping symbols. > >> -- > >> Sent from my phone. Please excuse my brevity. > >> > >> On September 6, 2016 5:01:04 PM PDT, Bert Gunter < > bgunter.4567 at gmail.com> > >> wrote: > >> >Jun: > >> > > >> >1. Tell us your desired result from your test vector and maybe someone > >> >will help. > >> > > >> >2. As we played this game once already (you couldn't do it; I showed > >> >you how), this seems to be a function of your limitations with regular > >> >expressions. I'm probably not much better, but in any case, I don't > >> >intend to be your consultant. See if you can find someone locally to > >> >help you if you do not receive a satisfactory reply from the list. > >> >There are many people here who are pretty good at this sort of thing, > >> >but I don't know if they'll reply. Regex's are certainly complex. PERL > >> >people tend to be pretty good at them, I believe. There are numerous > >> >web sites and books on them if you need to acquire expertise for your > >> >work. 
> >> > > >> >Cheers, > >> >Bert > >> >Bert Gunter > >> > > >> >"The trouble with having an open mind is that people keep coming along > >> >and sticking things into it." > >> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >> > > >> > > >> >On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen.ut at gmail.com> > wrote: > >> >> Hi Bert, > >> >> > >> >> I still couldn't make the multiple patterns to work. Here is an > >> >example. I > >> >> make the pattern as follows > >> >> > >> >> final.pattern <- > >> >> > >> >"(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(> > >> 50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\ > >> .mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\. > >> kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\ > >> .g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\ > >> \.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\. > >> mg\\.kg)\\.(>110\\.kg)\\.(.*)" > >> >> > >> >> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05', > >> >> '240.m.g.>50-70.kg.geo.mean') > >> >> > >> >> sub(final.pattern, '\\1', test.string) > >> >> sub(final.pattern, '\\2', test.string) > >> >> sub(final.pattern, '\\3', test.string) > >> >> > >> >> Only the third string has been correctly parsed, which matches the > >> >first > >> >> pattern. It seems the rest of the patterns are not called. 
> >> >> > >> >> Jun > >> >> > >> >> > >> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4567 at gmail.com > > > >> >wrote: > >> >>> > >> >>> Just noticed: My clumsy do.call() line in my previously posted code > >> >>> below should be replaced with: > >> >>> pat <- paste(pat,collapse = "|") > >> >>> > >> >>> > >> >>> > pat <- c(pat1,pat2) > >> >>> > paste(pat,collapse="|") > >> >>> [1] "a+\\.*a+|b+\\.*b+" > >> >>> > >> >>> ************ replace this ************************** > >> >>> > pat <- do.call(paste,c(as.list(pat), sep="|")) > >> >>> ******************************************** > >> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z) > >> >>> [1] "a.a" "bb" "b.bbb" > >> >>> > >> >>> > >> >>> -- Bert > >> >>> Bert Gunter > >> >>> > >> >>> "The trouble with having an open mind is that people keep coming > >> >along > >> >>> and sticking things into it." > >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >> >>> > >> >>> > >> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter > >> ><bgunter.4567 at gmail.com> > >> >>> wrote: > >> >>> > Jun: > >> >>> > > >> >>> > You need to provide a clear specification via regular expressions > >> >of > >> >>> > the patterns you wish to match -- at least for me to decipher it. > >> >>> > Others may be smarter than I, though... > >> >>> > > >> >>> > Jeff: Thanks. I have now convinced myself that it can be done (a > >> >>> > "proof" of sorts): If pat1, pat2,..., patn are m different > >> >patterns > >> >>> > (in a vector of patterns) to be matched in a vector of n strings, > >> >>> > where only one of the patterns will match in any string, then use > >> >>> > paste() (probably via do.call()) or otherwise to paste them > >> >together > >> >>> > separated by "|" to form the concatenated pattern, pat. 
Then
> >> >>> >
> >> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
> >> >>> >
> >> >>> > should extract the matching pattern in each (perhaps with a little
> >> >>> > fiddling due to precedence rules); e.g.
> >> >>> >
> >> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
> >> >>> >
> >> >>> >> pat1 <- "a+\\.*a+"
> >> >>> >> pat2 <-"b+\\.*b+"
> >> >>> >> pat <- c(pat1,pat2)
> >> >>> >
> >> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
> >> >>> >> pat
> >> >>> > [1] "a+\\.*a+|b+\\.*b+"
> >> >>> >
> >> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
> >> >>> > [1] "a.a" "bb" "b.bbb"
> >> >>> >
> >> >>> > Cheers,
> >> >>> > Bert
> >> >>> >
> >> >>> >
> >> >>> > Bert Gunter
> >> >>> >
> >> >>> > "The trouble with having an open mind is that people keep coming along
> >> >>> > and sticking things into it."
> >> >>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >>> >
> >> >>> >
> >> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >> >>> >> Thanks for the reply, Bert.
> >> >>> >>
> >> >>> >> Your solution solves the example. I actually have a more general
> >> >>> >> situation where I have this dot concatenated string from multiple
> >> >>> >> variables. The problem is those variables may have values with dots
> >> >>> >> in there. The number of dots are not consistent for all values of a
> >> >>> >> variable. So I am thinking to define a vector of patterns for the
> >> >>> >> vector of the string and hopefully to find a way to use a pattern
> >> >>> >> from the pattern vector for each value of the string vector. The only
> >> >>> >> way I can think of is "for" loop, which can be slow.
> >> >>> >> Also these are happening in a function I am writing. Just wonder if
> >> >>> >> there is another more efficient way. Thanks a lot.
> >> >>> >>
> >> >>> >> Jun
> >> >>> >>
> >> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> Well, he did provide an example, and...
> >> >>> >>>
> >> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
> >> >>> >>>
> >> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
> >> >>> >>> [1] "WT.CUT" "tx"
> >> >>> >>>
> >> >>> >>> ## seems to do what was requested.
> >> >>> >>>
> >> >>> >>> Jeff would have to amplify on his initial statement however: do you
> >> >>> >>> mean that separate patterns can always be combined via "|" ? Or
> >> >>> >>> something deeper?
> >> >>> >>>
> >> >>> >>> Cheers,
> >> >>> >>> Bert
> >> >>> >>> Bert Gunter
> >> >>> >>>
> >> >>> >>> "The trouble with having an open mind is that people keep coming along
> >> >>> >>> and sticking things into it."
> >> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> >> >>> >>> > Your opening assertion is false.
> >> >>> >>> >
> >> >>> >>> > Provide a reproducible example and someone will demonstrate.
> >> >>> >>> > --
> >> >>> >>> > Sent from my phone. Please excuse my brevity.
> >> >>> >>> >
> >> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >> >>> >>> >>Dear list,
> >> >>> >>> >>
> >> >>> >>> >>I have a vector of strings that cannot be described by one pattern. So
> >> >>> >>> >>let's say I construct a vector of patterns in the same length as the
> >> >>> >>> >>vector of strings, can I do the element wise pattern recognition and
> >> >>> >>> >>string substitution.
> >> >>> >>> >>
> >> >>> >>> >>For example,
> >> >>> >>> >>
> >> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
> >> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
> >> >>> >>> >>
> >> >>> >>> >>patterns <- c(pattern1,pattern2)
> >> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
> >> >>> >>> >>
> >> >>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from the
> >> >>> >>> >>second string. If I do
> >> >>> >>> >>
> >> >>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
> >> >>> >>> >>
> >> >>> >>> >>looping the patterns doesn't work the way I want. Appreciate any
> >> >>> >>> >>comments.
> >> >>> >>> >>Thanks.
> >> >>> >>> >>
> >> >>> >>> >>Jun
> >> >>> >>> >>
> >> >>> >>> >>______________________________________________
> >> >>> >>> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>> >>> >>PLEASE do read the posting guide
> >> >>> >>> >>http://www.R-project.org/posting-guide.html
> >> >>> >>> >>and provide commented, minimal, self-contained, reproducible code.
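For the element-wise case in the original post (one pattern per string), a minimal sketch in base R is to pair each pattern with its string using mapply(). This was not proposed in the thread; it only illustrates that sub() itself uses just the first element of a pattern vector, so the pairing has to be done outside it. The variable name `middle` is made up for the illustration.

```r
# One pattern per string, as in the original post.
patterns <- c("([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)",
              "([^.]*)\\.([^.]*)\\.(.*)")
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# mapply() walks both vectors in parallel, so pattern i is applied
# only to string i; "\\2" keeps the second capture group.
middle <- mapply(function(p, s) sub(p, "\\2", s),
                 patterns, strings, USE.NAMES = FALSE)
print(middle)
```

This is still a loop under the hood, so it is not faster than a "for" loop; it is only more compact. A single combined pattern, as discussed in the replies, avoids the pairing altogether.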
Jeff Newmiller
2016-Sep-07 07:04 UTC
[R] element wise pattern recognition and string substitution
Here are some suggestions:
test.string <- c( '240.m.g.>110.kg.geo.mean'
                , '3.mg.kg.>110.kg.P05'
                , '240.m.g.>50-70.kg.geo.mean'
                )

# based on your literal idea
suggested.pattern1 <-
  "(240\\.m\\.g|3\\.mg\\.kg)\\.(>50-70\\.kg|>70-90\\.kg|>90-110\\.kg|50\\.kg\\.or\\.less|>110\\.kg)\\.(.*)"

resultL <- strsplit( sub( suggested.pattern1
                        , "\\1\t\\2\t\\3"
                        , test.string
                        )
                   , split = "\t"
                   )

# equivalent based on apparent repetitive patterns in your sample data
suggested.pattern2 <- "(.*?m\\.g|kg)\\.(.*?kg|.*?less)\\.(.*)"

resultL2 <- strsplit( sub( suggested.pattern2
                         , "\\1\t\\2\t\\3"
                         , test.string
                         )
                    , split = "\t"
                    )

# put results into an organized table
DF <- setNames( data.frame( do.call( rbind, resultL ) )
              , c( "First", "Second", "Third" )
              )

By the way... please aim to make your examples reproducible. It would have
been easy for you to define the necessary variables in example form
rather than sending a non-reproducible example.
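
To make the point about group numbering concrete, here is a small sketch using a cut-down, two-alternative version of the earlier final.pattern. The names `bad`, `good`, `first_bad`, `second_good` and `third_good` are made up for the illustration. In the alternated form every parenthesis still gets its own number, so the second alternative fills groups 4 to 6 and "\\1" has nothing to return for strings that match it.

```r
x <- c("240.m.g.>50-70.kg.geo.mean", "240.m.g.>110.kg.geo.mean")

# Alternation of fully parenthesized patterns: six groups in total.
bad <- "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)"
first_bad <- sub(bad, "\\1", x)   # group 1 is only set by the first alternative
print(first_bad)

# Alternatives moved inside each group: always exactly three groups.
good <- "(240\\.m\\.g)\\.(>50-70\\.kg|>110\\.kg)\\.(.*)"
second_good <- sub(good, "\\2", x)
third_good  <- sub(good, "\\3", x)
print(second_good)
print(third_good)
```

With the alternatives inside the groups, the same backreference number means the same thing for every string, which is what the tab-separated replacement above relies on.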
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Jun Shen
2016-Sep-10 04:06 UTC
[R] element wise pattern recognition and string substitution
Hi Jeff,
I have been trying different methods and found that your approach is the
most efficient. I was able to resolve the string-parsing problem, so let me
report back to the group.
The following example shows what I was trying to achieve. melt.results is
where the strings reside; testdata is a snippet of the data from which the
unique values are derived; replace.metaChar is a function I defined. Thanks
to everyone for the help. Any comments are appreciated.
Jun
################################################################
melt.results <- structure(list(param = c("Cmin1",
"Cminss", "Cmaxss",
"Cmin1",
"Cminss", "Cmin1", "Cminss", "Cmaxss",
"Cmin1", "Cminss"), variable structure(c(1L,
5L, 9L, 14L, 18L, 21L, 25L, 29L, 34L, 38L), .Label
c("240.mg.>110.kg.geo.mean",
"240.mg.>110.kg.cv", "240.mg.>110.kg.P05",
"240.mg.>110.kg.P95",
"3.mg.kg.>110.kg.geo.mean", "3.mg.kg.>110.kg.cv",
"3.mg.kg.>110.kg.P05",
"3.mg.kg.>110.kg.P95", "240.mg.>50-70.kg.geo.mean",
"240.mg.>50-70.kg.cv",
"240.mg.>50-70.kg.P05", "240.mg.>50-70.kg.P95",
"3.mg.kg.>50-70.kg.geo.mean",
"3.mg.kg.>50-70.kg.cv", "3.mg.kg.>50-70.kg.P05",
"3.mg.kg.>50-70.kg.P95",
"240.mg.50.kg.or.less.geo.mean", "240.mg.50.kg.or.less.cv",
"240.mg.50.kg.or.less.P05",
"240.mg.50.kg.or.less.P95",
"3.mg.kg.50.kg.or.less.geo.mean",
"3.mg.kg.50.kg.or.less.cv", "3.mg.kg.50.kg.or.less.P05",
"3.mg.kg.50.kg.or.less.P95",
"240.mg.>70-90.kg.geo.mean", "240.mg.>70-90.kg.cv",
"240.mg.>70-90.kg.P05",
"240.mg.>70-90.kg.P95", "3.mg.kg.>70-90.kg.geo.mean",
"3.mg.kg.>70-90.kg.cv",
"3.mg.kg.>70-90.kg.P05", "3.mg.kg.>70-90.kg.P95",
"240.mg.>90-110.kg.geo.mean",
"240.mg.>90-110.kg.cv", "240.mg.>90-110.kg.P05",
"240.mg.>90-110.kg.P95",
"3.mg.kg.>90-110.kg.geo.mean",
"3.mg.kg.>90-110.kg.cv",
"3.mg.kg.>90-110.kg.P05",
"3.mg.kg.>90-110.kg.P95"), class = "factor"), value =
c(97L,
144L, 76L, 137L, 18L, 104L, 92L, 87L, 111L, 41L)), .Names = c("param",
"variable", "value"), row.names = c(1L, 14L, 27L, 40L, 53L,
61L,
74L, 87L, 100L, 113L), class = "data.frame")
testdata <- structure(list(TX = c("240.mg", "3.mg.kg",
"240.mg", "3.mg.kg",
"240.mg", "3.mg.kg", "240.mg",
"3.mg.kg", "240.mg", "3.mg.kg"
), WTCUT = c(">50-70.kg", ">50-70.kg",
">70-90.kg", ">70-90.kg",
">90-110.kg", ">90-110.kg", "50.kg.or.less",
"50.kg.or.less",
">110.kg", ">110.kg")), .Names = c("TX",
"WTCUT"), row.names = c(1L,
2L, 7L, 8L, 19L, 20L, 21L, 22L, 129L, 130L), class = "data.frame")
replace.metaChar <- function(string) {
  metaChar <- c("\\$","\\*","\\+","\\.","\\?","\\[","\\]","\\^","\\{","\\}","\\|","\\(","\\)","\\\\")
  metaReplace <- paste('\\',metaChar, sep='')
  for(r in seq(metaChar)) gsub(metaChar[r], metaReplace[r], string) -> string
  return(string)
}

sort.var <- c('TX','WTCUT')
one.pattern <- paste('\\b',
  paste(sapply(sapply(sort.var,
                      function(x) replace.metaChar(unique(testdata[[x]]))),
               function(y) paste('(',paste(y,collapse='|'),')', sep='')),
        collapse='\\.'),
  '\\.(.*)', sep='')
n.sort.var <- length(sort.var)
one.replacement <- paste('\\', seq(n.sort.var+1), collapse='\t', sep='')
one.results <- strsplit(sub(one.pattern, one.replacement,
                            melt.results$variable), split='\t')
melt.results[c(sort.var,'STATS')] <- as.data.frame(do.call(rbind,
                                                           one.results))
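
A variation on the same idea, offered only as a sketch under stated assumptions: the helper names `escape_regex` and `build_pattern` are made up here, the data are a two-row stand-in for testdata, and the capture groups are read with regexec()/regmatches() instead of going through a tab-separated replacement. It assumes every label matches the pattern; a label that does not match yields an empty element and the rbind() step would then misalign.

```r
# Escape regex metacharacters in one pass (hypothetical helper).
escape_regex <- function(x) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)

# One capture group per sort variable, alternatives inside each group,
# plus a final catch-all group for the statistic name.
build_pattern <- function(data, vars) {
  groups <- vapply(vars, function(v) {
    paste0("(", paste(escape_regex(unique(data[[v]])), collapse = "|"), ")")
  }, character(1))
  paste0("^", paste(groups, collapse = "\\."), "\\.(.*)$")
}

# Small stand-in for testdata and melt.results$variable.
testdata <- data.frame(TX    = c("240.mg", "3.mg.kg"),
                       WTCUT = c(">50-70.kg", "50.kg.or.less"),
                       stringsAsFactors = FALSE)
labels <- c("240.mg.>50-70.kg.geo.mean", "3.mg.kg.50.kg.or.less.P05")

pat <- build_pattern(testdata, c("TX", "WTCUT"))

# regexec() returns the full match followed by each capture group;
# dropping the first column leaves one column per group.
parts <- do.call(rbind, regmatches(labels, regexec(pat, labels)))[, -1, drop = FALSE]
colnames(parts) <- c("TX", "WTCUT", "STATS")
print(parts)
```

Reading the groups directly avoids picking a separator character that must never occur in the data, at the cost of handling non-matching labels explicitly.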