Thank you all for all the very helpful answers!

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 20/08/2020 3:28, Richard O'Keefe wrote:
> There are & and | operators in the R language.
> There is an | operator in regular expressions.
> There is NOT any & operator in regular expressions.
> grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
> looks for elements of mydata containing the literal
> string 'ConfoMap&GuineaPigs'.
>
> > foo <- c("a","b","cab","back")
> > foo[grepl("a",foo) & grepl("b",foo)]
> [1] "cab"  "back"
>
> grepl returns a TRUE/FALSE vector.
>
> On Thu, 20 Aug 2020 at 02:53, Ivan Calandra <calandra at rgzm.de> wrote:
>
>     Dear useRs,
>
>     I feel really stupid, but I cannot understand why "&" doesn't work
>     as I expect, while "|" does.
>
>     I have the following vector:
>     mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
>     "SSFA-ConfoMap_Lithics_NMPfilled.csv",
>     "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
>     "SSFA-Toothfrax_GuineaPigs.xlsx",
>     "SSFA-Toothfrax_Lithics.xlsx", "SSFA-Toothfrax_Sheeps.xlsx")
>     and I want to find the values that include both "ConfoMap" and
>     "GuineaPigs".
>
>     If I do:
>     grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
>     it returns an empty vector, character(0).
>
>     But if I do:
>     grep("ConfoMap|GuineaPigs", mydata, value=TRUE)
>     it returns all the elements that include either "ConfoMap" or
>     "GuineaPigs", as I would expect.
>
>     So what is wrong with my "&" construct? How can I return the
>     elements that include both parts?
>
>     Thank you for your help!
>     Ivan
>
>     ______________________________________________
>     R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
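Applied to the vector from the original question, the grepl() approach Richard describes might look like the sketch below. The helper name has_all() is hypothetical (it is not part of base R), and fixed = TRUE is an assumption made here so that the search terms are treated as literal strings rather than regular expressions.

```r
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

# Two terms: each grepl() returns a logical vector; combine them with R's & operator.
both <- mydata[grepl("ConfoMap", mydata, fixed = TRUE) &
               grepl("GuineaPigs", mydata, fixed = TRUE)]

# Any number of terms: one grepl() per term, then AND the results together.
# has_all() is a hypothetical helper name used only for this illustration.
has_all <- function(x, terms) {
  hits <- lapply(terms, grepl, x = x, fixed = TRUE)
  Reduce(`&`, hits, rep(TRUE, length(x)))
}

both_general <- mydata[has_all(mydata, c("ConfoMap", "GuineaPigs"))]
print(both)
```

Keeping the conjunction in R rather than in the regular expression means each term stays a plain string, so no escaping is needed when a term contains characters such as "." or "(".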
The single grep regex solutions offered to Ivan's problem were fine, but do not readily generalize to the conjunction of multiple (>2, say) regex patterns that can appear anywhere in a string and in any order. However, note that this can easily be done using the Perl zero-width lookahead construction, "(?=...)", e.g.

> test <- c("xyCz","xAyCz","xAyBzC","xCByAz","xACyB","BAyyC","CBxBAy")
## to search for strings that contain "A", "B", & "C" in any order
> grep("(?=.*A)(?=.*B)(?=.*C)", test, perl = TRUE)
[1] 3 4 5 6 7

Note that this matches on one or multiple instances of the patterns. If one wants only exactly one instance of each conjunct, then something like this should do:

> lookfor <- c("A","B","C")
> notme <- paste0("[^",lookfor,"]*")
> z <- paste0("(?=", notme, lookfor, notme, "$)",collapse = "")
> grep(z, test, perl = TRUE)
[1] 3 4 5 6

Cheers,
Bert
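The lookahead construction above can be wrapped in small helpers so that the pattern is built from a character vector. This is only a sketch: all_of_regex() and once_each_regex() are hypothetical names, and the second helper assumes each element is a single character with no special meaning inside a bracket expression (so not "^", "]", "-" or "\\"). The leading "^" in the second helper is an addition made here: grep() may start a match at any position, so without an anchor the lookaheads can be satisfied from a point after an earlier repeat of a character.

```r
test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")

# Hypothetical helper: one zero-width lookahead per pattern, so every
# pattern must occur somewhere in the string, in any order.
all_of_regex <- function(patterns) {
  paste0("(?=.*", patterns, ")", collapse = "")
}
at_least_once <- grep(all_of_regex(c("A", "B", "C")), test, perl = TRUE)

# Hypothetical helper: each character must occur exactly once.
# Assumes single, non-special characters; "^" pins the lookaheads to the
# start of the string so the whole string is checked.
once_each_regex <- function(chars) {
  notme <- paste0("[^", chars, "]*")
  paste0("^", paste0("(?=", notme, chars, notme, "$)", collapse = ""))
}
exactly_once <- grep(once_each_regex(c("A", "B", "C")), test, perl = TRUE)

# "AxABC" contains two "A"s: the anchored pattern rejects it, while the
# same pattern without "^" matches starting at the second character.
anchored   <- grepl(once_each_regex(c("A", "B", "C")), "AxABC", perl = TRUE)
unanchored <- grepl(sub("^\\^", "", once_each_regex(c("A", "B", "C"))),
                    "AxABC", perl = TRUE)

print(at_least_once)
print(exactly_once)
```

On the test vector both variants of the exactly-once pattern give the same indices; the anchor only matters for strings such as "AxABC", where a repeated character precedes a stretch that satisfies all lookaheads.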
Thank you Bert, this is wonderful!

Best wishes,
Ivan