Paul Miller
2012-Apr-23 16:10 UTC
[R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
Hello All,

I started out a while ago trying to select columns in a data frame whose names contain some variation of the word "mutant", using code like:

    names(KRASyn)[grep("muta", names(KRASyn))]

The idea was then to add the various columns together using code like:

    KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered, though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".

So I'd like to know how to select columns that contain some variation of the word "mutant" without the "non" or the "un". I've looked around for an example of how to do that but haven't found one yet.

Can anyone show me how to select the columns I need?

Thanks,
Paul
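A minimal, self-contained reproduction of the problem may help readers. The data frame below is a hypothetical stand-in for `KRASyn` (the real column names and values were not posted); it shows that the unanchored pattern "muta" also picks up the "non" and "un" columns.

```r
# Hypothetical stand-in for KRASyn; names and values are made up for illustration
KRASyn <- data.frame(
  id         = 1:3,
  mutated    = c(1, 0, 1),
  mutation   = c(0, 1, 0),
  mutational = c(1, 1, 0),
  nonmutated = c(0, 1, 1),
  unmutated  = c(1, 0, 0)
)

# Unanchored pattern: matches every name containing "muta" anywhere,
# including "nonmutated" and "unmutated"
sel <- names(KRASyn)[grep("muta", names(KRASyn))]
sel
# returns "mutated", "mutation", "mutational", "nonmutated", "unmutated"
```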
David Winsemius
2012-Apr-23 16:16 UTC
On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

> Can anyone show me how to select the columns I need?

If you want only columns whose names _begin_ with "muta", then add the "^" character at the beginning of your pattern:

    names(KRASyn)[grep("^muta", names(KRASyn))]

(This should be explained on the ?regex page.)

--
David Winsemius, MD
West Hartford, CT
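A quick check of the anchored pattern against a few sample names (hypothetical; not the original data). Note that "^muta" also drops any name in which "muta" does not start the string, such as the made-up "verymutated" below, which may or may not be what is wanted.

```r
# Hypothetical column names, chosen only to illustrate the anchor
nm <- c("mutated", "mutation", "mutational",
        "nonmutated", "unmutated", "verymutated")

# "^" anchors the match at the start of each name
kept <- nm[grep("^muta", nm)]
kept
# returns "mutated", "mutation", "mutational"
```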
Bert Gunter
2012-Apr-23 17:01 UTC
On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller <pjmiller_57 at yahoo.com> wrote:

> So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what can be selected and what can't. Software can only do what you tell it to; it cannot read minds. Once you have provided a complete and accurate specification of the inclusion/exclusion criteria, it should be easy to write a regex procedure.

"The fault, dear Brutus, is not in our stars, but in ourselves."

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
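To illustrate how a written specification maps onto code, here is a sketch under an assumed criterion (an assumption for illustration, not a rule stated in the thread): keep a name if it contains "muta", unless "muta" is immediately preceded by "non" or "un". Writing each rule as its own `grepl()` test and combining them with `&` keeps the specification readable.

```r
# Hypothetical column names for illustration
nm <- c("mutated", "mutation", "nonmutated", "unmutated",
        "verymutated", "other")

# Assumed rule 1: the name must contain "muta"
has_muta <- grepl("muta", nm)
# Assumed rule 2: exclude names where "muta" directly follows "non" or "un"
negated  <- grepl("(non|un)muta", nm)

chosen <- nm[has_muta & !negated]
chosen
# returns "mutated", "mutation", "verymutated"
```

One design consequence: rule 2 excludes a name outright if *any* occurrence of "muta" in it is negated, so a hypothetical name like "unmutated_mutation" would be dropped. Whether that is correct depends on the specification.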
Greg Snow
2012-Apr-23 21:05 UTC
Here is a method that uses negative lookbehind:

    > tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
    > grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
    [1] 1 4

It looks for "muta" that is not immediately preceded by "un" or "non" (but it would match "unusually mutated", since the "un" there is not immediately before the "muta").

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
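Combining the negative-lookbehind pattern with the `rowSums()` call from the original question should give the combined column Paul was after. The data frame below is a hypothetical stand-in for `KRASyn`. `drop = FALSE` keeps the subset a data frame even if only one column matches, since `rowSums()` needs at least two dimensions.

```r
# Hypothetical stand-in for KRASyn; values are made up
KRASyn <- data.frame(
  mutated    = c(1, 0, 1),
  mutation   = c(0, 1, 0),
  nonmutated = c(1, 1, 1),
  unmutated  = c(0, 0, 1)
)

# Lookbehind assertions need the PCRE engine, hence perl = TRUE
keep <- grepl("(?<!un)(?<!non)muta", names(KRASyn), perl = TRUE)

# Sum only the selected columns ("mutated" and "mutation"), row by row
KRASyn$Mutant_comb <- rowSums(KRASyn[, keep, drop = FALSE])
KRASyn$Mutant_comb  # 1 in every row for this toy data
```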