Josh B
2009-Feb-27 17:27 UTC
[R] Filtering a dataset's columns by another dataset's column names
Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only the columns that are in common with the other dataset. Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual SNP1 SNP2 SNP3 SNP4 SNP5
1          A    G    T    C    A
2          T    C    A    G    T
3          A    C    T    C    A

Dataset 2:

Individual SNP1 SNP3 SNP5 SNP6 SNP7
4          A    T    T    G    C
5          T    A    A    G    G
6          A    A    T    C    G

I want Dataset 1 to have only the columns that are also represented in Dataset 2, i.e., I want to generate a new Dataset 3 that looks like this:

Individual SNP1 SNP3 SNP5
1          A    T    A
2          T    A    T
3          A    T    A

Does anyone know how I could do this? Keep in mind that this is not a simple merge, as in the "merge" function.

Thanks very much for your help everyone.

Josh B.
Rowe, Brian Lee Yung (Portfolio Analytics)
2009-Feb-27 17:35 UTC
[R] Filtering a dataset's columns by another dataset's column names
Try this:

d1[,intersect(names(d1),names(d2))]

HTH,
Brian

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
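A minimal sketch of the intersect() one-liner on the toy data from the original post. The object names d1, d2 and d3 are assumptions chosen to match the reply, and drop = FALSE is an addition that is not in the reply: without it, R returns a bare vector rather than a data frame when only one column name is shared.

```r
# Toy data copied from the original post; d1/d2 are assumed object names.
d1 <- data.frame(Individual = 1:3,
                 SNP1 = c("A", "T", "A"),
                 SNP2 = c("G", "C", "C"),
                 SNP3 = c("T", "A", "T"),
                 SNP4 = c("C", "G", "C"),
                 SNP5 = c("A", "T", "A"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Individual = 4:6,
                 SNP1 = c("A", "T", "A"),
                 SNP3 = c("T", "A", "A"),
                 SNP5 = c("T", "A", "T"),
                 SNP6 = c("G", "G", "C"),
                 SNP7 = c("C", "G", "G"),
                 stringsAsFactors = FALSE)

# Column names present in both data frames, in the order they appear in d1.
common <- intersect(names(d1), names(d2))

# Keep only those columns of d1; drop = FALSE guarantees a data frame.
d3 <- d1[, common, drop = FALSE]

# Edge case: a single shared column still comes back as a data frame.
only_id <- d1[, intersect(names(d1), "Individual"), drop = FALSE]
```

Printing d3 should show the Individual, SNP1, SNP3 and SNP5 columns of Dataset 1, which is the Dataset 3 asked for in the question.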
Marc Schwartz
2009-Feb-27 17:36 UTC
[R] Filtering a dataset's columns by another dataset's column names
Same.Cols <- intersect(names(DF1), names(DF2))

> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"

> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T

See ?intersect, which gives you the common column names, which you can then use in rbind().

HTH,

Marc Schwartz
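The intersect() plus rbind() steps above could be wrapped in a small helper, sketched here under stated assumptions. The function name stack_common is hypothetical, and the check for an empty intersection is an addition that is not part of the reply.

```r
# Toy data from the original post, read from inline text.
DF1 <- read.table(text = "Individual SNP1 SNP2 SNP3 SNP4 SNP5
1 A G T C A
2 T C A G T
3 A C T C A", header = TRUE, stringsAsFactors = FALSE)

DF2 <- read.table(text = "Individual SNP1 SNP3 SNP5 SNP6 SNP7
4 A T T G C
5 T A A G G
6 A A T C G", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: stack two data frames on their shared columns only.
stack_common <- function(df1, df2) {
  same_cols <- intersect(names(df1), names(df2))
  if (length(same_cols) == 0L) {
    stop("the two data frames share no column names")
  }
  # Select the shared columns from each data frame, then stack the rows.
  rbind(df1[, same_cols, drop = FALSE], df2[, same_cols, drop = FALSE])
}

stacked <- stack_common(DF1, DF2)
```

Note that this stacks both datasets, which goes one step beyond the question; taking DF1[, Same.Cols] alone gives the filtered Dataset 1.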
Jorge Ivan Velez
2009-Feb-27 17:39 UTC
[R] Filtering a dataset's columns by another dataset's column names
Dear Josh,

Try this:

dataset1[, colnames(dataset1) %in% colnames(dataset2)]

Take a look at ?colnames and ?"%in%" for more information.

HTH,

Jorge
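With hundreds of columns it may help to see which columns the %in% filter discards. The sketch below reuses the toy data from the question; the variable names keep, dataset3 and dropped are assumptions for illustration only.

```r
dataset1 <- read.table(text = "Individual SNP1 SNP2 SNP3 SNP4 SNP5
1 A G T C A
2 T C A G T
3 A C T C A", header = TRUE, stringsAsFactors = FALSE)

dataset2 <- read.table(text = "Individual SNP1 SNP3 SNP5 SNP6 SNP7
4 A T T G C
5 T A A G G
6 A A T C G", header = TRUE, stringsAsFactors = FALSE)

# Logical mask: TRUE for each column of dataset1 whose name occurs in dataset2.
keep <- colnames(dataset1) %in% colnames(dataset2)

# Filtered copy of dataset1 (drop = FALSE keeps it a data frame).
dataset3 <- dataset1[, keep, drop = FALSE]

# Names of the columns that were filtered out.
dropped <- colnames(dataset1)[!keep]
```

Because the mask is built from dataset1's own column names, the retained columns stay in dataset1's original order.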
David Winsemius
2009-Feb-27 17:41 UTC
[R] Filtering a dataset's columns by another dataset's column names
So you want the data that is in Dataset 1, but only the columns whose names are also in Dataset 2. How about:

subset(DS1, select = names(DS1) %in% names(DS2))

> DS1 <- read.table(textConnection("Individual SNP1 SNP2 SNP3 SNP4 SNP5
+ 1 A G T C A
+ 2 T C A G T
+ 3 A C T C A"), header=TRUE)
> DS2 <- read.table(textConnection("Individual SNP1 SNP3 SNP5 SNP6 SNP7
+ 4 A T T G C
+ 5 T A A G G
+ 6 A A T C G"), header=TRUE)
> subset(DS1, select = names(DS1) %in% names(DS2))
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A

Tested!

-- 
David Winsemius
Heritage Labs
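One caveat when reading nucleotide data with read.table() as in the example above: by default, a column whose values are all "T" is converted to logical TRUE. This point is not raised in the thread, and the tiny input below is hypothetical; it is only a sketch of how colClasses = "character" avoids the conversion.

```r
# Hypothetical input in which every individual carries "T" at SNP1.
txt <- "Individual SNP1 SNP2
1 T G
2 T C
3 T C"

# Default type conversion: the all-"T" column is read as logical.
default_read <- read.table(text = txt, header = TRUE)

# Forcing character columns keeps the nucleotide codes as text
# (this also reads Individual as character).
char_read <- read.table(text = txt, header = TRUE, colClasses = "character")

class(default_read$SNP1)
class(char_read$SNP1)
```

The column filtering itself is unaffected, since it works on column names only, but the stored genotypes would differ between the two reads.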
Daniel Malter
2009-Feb-27 17:47 UTC
[R] Filtering a dataset's columns by another dataset's column names
Hi Josh B,

this looks like homework to me. Please obey the posting rules, i.e., provide self-contained code/examples and show the point at which you are stuck.

To solve your problem, you need the "which" and "names" functions as well as the %in% operator. It is then easy to rbind the two datasets once you have figured out what the common column names are. Please try on your own first and report back if and where you are stuck, along with the self-contained code. If this is indeed homework, please ask your professor or teacher.

Example for two simulated datasets:

x=rnorm(30)
dim(x)=c(5,6)
x=data.frame(x)
names(x)=c("a","b","c","x","y","z")

y=rnorm(30)
dim(y)=c(5,6)
y=data.frame(y)
names(y)=c("a","b","d","v","w","x")

Daniel

-------------------------
cuncta stricte discussurus
-------------------------
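One way the hints above (which, names, %in%, then rbind) might fit together on the two simulated datasets is sketched below. The fixed seed, the matrix() construction and the variable names common and stacked are assumptions added for repeatability; they are not part of the reply.

```r
set.seed(1)  # assumption: fixed seed so the simulated values are repeatable

# Two simulated 5 x 6 data frames with partly overlapping column names.
x <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(x) <- c("a", "b", "c", "x", "y", "z")

y <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(y) <- c("a", "b", "d", "v", "w", "x")

# Positions of the columns of x whose names also occur in y, then their names.
common <- names(x)[which(names(x) %in% names(y))]

# Stack the shared columns of both data frames.
stacked <- rbind(x[, common], y[, common])
```

Here the shared names are "a", "b" and "x", so stacked has 10 rows and 3 columns.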