thr3ads.net - R help - [R] Filtering a dataset's columns by another dataset's column names [Feb 2009]

Josh B

2009-Feb-27 17:27 UTC

[R] Filtering a dataset's columns by another dataset's column names

Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only the
columns that are in common with the other dataset.
Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual    SNP1    SNP2    SNP3    SNP4    SNP5
1    A    G    T    C    A
2    T    C    A    G    T
3    A    C    T    C    A

Dataset 2:

Individual    SNP1    SNP3    SNP5    SNP6    SNP7
4    A    T    T    G    C
5    T    A    A    G    G
6    A    A    T    C    G

I want Dataset1 to have only columns that are also represented in Dataset 2,
i.e., I want to generate a new Dataset 3 that looks like this:

Individual    SNP1    SNP3    SNP5
1    A    T    A
2    T    A    T
3    A    T    A

Does anyone know how I could do this? Keep in mind that this is not a simple
merge, as in the "merge" function.

Thanks very much for your help everyone.
Josh B.



      

Rowe, Brian Lee Yung (Portfolio Analytics)

2009-Feb-27 17:35 UTC


[R] Filtering a dataset's columns by another dataset's column names

Try this:

d1[,intersect(names(d1),names(d2))]

HTH, Brian
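For readers who want to run the one-liner above, here is a minimal sketch on the toy data from the original post. The object names d1 and d2 follow the reply; d3, common, and the drop = FALSE argument are additions for illustration and are not part of the original answer.

```r
# Toy data rebuilt from the original post (assumed to be data frames).
d1 <- data.frame(
  Individual = 1:3,
  SNP1 = c("A", "T", "A"),
  SNP2 = c("G", "C", "C"),
  SNP3 = c("T", "A", "T"),
  SNP4 = c("C", "G", "C"),
  SNP5 = c("A", "T", "A")
)
d2 <- data.frame(
  Individual = 4:6,
  SNP1 = c("A", "T", "A"),
  SNP3 = c("T", "A", "A"),
  SNP5 = c("T", "A", "T"),
  SNP6 = c("G", "G", "C"),
  SNP7 = c("C", "G", "G")
)

# Column names present in both data frames.
common <- intersect(names(d1), names(d2))

# Keep only those columns of d1; drop = FALSE keeps the result a
# data frame even if a single column happens to be shared.
d3 <- d1[, common, drop = FALSE]
print(d3)
```

The drop = FALSE argument is optional here, but it guards against the result collapsing to a plain vector when the two datasets share only one column.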

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Marc Schwartz

2009-Feb-27 17:36 UTC


[R] Filtering a dataset's columns by another dataset's column names

> Same.Cols <- intersect(names(DF1), names(DF2))
> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"
> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T


See ?intersect, which gives you the common column names, which you can
then use in rbind().

HTH,

Marc Schwartz

Jorge Ivan Velez

2009-Feb-27 17:39 UTC


[R] Filtering a dataset's columns by another dataset's column names

Dear Josh,
Try this:

dataset1[,colnames(dataset1) %in% colnames(dataset2)]

Take a look at ?colnames and ?"%in%" for more information.

HTH,

Jorge
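As a small illustration of what the %in% expression above evaluates to, the following sketch rebuilds the toy data from the original post and prints the intermediate logical vector. The names keep and by_mask are hypothetical helpers introduced only for this example.

```r
# Toy data rebuilt from the original post (assumed to be data frames).
dataset1 <- data.frame(
  Individual = 1:3,
  SNP1 = c("A", "T", "A"),
  SNP2 = c("G", "C", "C"),
  SNP3 = c("T", "A", "T"),
  SNP4 = c("C", "G", "C"),
  SNP5 = c("A", "T", "A")
)
dataset2 <- data.frame(
  Individual = 4:6,
  SNP1 = c("A", "T", "A"),
  SNP3 = c("T", "A", "A"),
  SNP5 = c("T", "A", "T"),
  SNP6 = c("G", "G", "C"),
  SNP7 = c("C", "G", "G")
)

# One TRUE/FALSE per column of dataset1: is that name also in dataset2?
keep <- colnames(dataset1) %in% colnames(dataset2)
print(keep)

# A logical index selects the columns flagged TRUE, in dataset1's order.
by_mask <- dataset1[, keep, drop = FALSE]
print(by_mask)
```

Because the mask has one entry per column of dataset1, the selected columns always come out in dataset1's own order, regardless of how dataset2 orders its columns.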



David Winsemius

2009-Feb-27 17:41 UTC


[R] Filtering a dataset's columns by another dataset's column names

So you want the data from Dataset 1, restricted to the columns whose
names also appear in Dataset 2.

How about:

  subset(DS1, select = names(DS1) %in% names(DS2) )

 > DS1 <- read.table(textConnection("Individual    SNP1    SNP2    SNP3    SNP4    SNP5
+ 1    A    G    T    C    A
+ 2    T    C    A    G    T
+ 3    A    C    T    C    A"), header=TRUE)
 > DS2 <- read.table(textConnection("Individual    SNP1    SNP3    SNP5    SNP6    SNP7
+ 4    A    T    T    G    C
+ 5    T    A    A    G    G
+ 6    A    A    T    C    G"), header=TRUE)

 > subset(DS1, select= names(DS1) %in% names(DS2) )
   Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A

Tested!
-- 
David Winsemius
Heritage Labs


Daniel Malter

2009-Feb-27 17:47 UTC


[R] Filtering a dataset's columns by another dataset's column names

Hi Josh B,

This looks like homework to me. Please follow the posting guide, i.e.,
provide self-contained code and examples, and show exactly where you are
stuck.

To solve your problem, you need the "which" and "names" functions as well
as the %in% operator. Once you have figured out which column names are
common to both datasets, it is easy to rbind them. Please try on your own
first, and report back if and where you get stuck, along with the
self-contained code. If this is indeed homework, please ask your professor
or teacher.

Example for two simulated datasets:

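# Dataset x: 5 rows, 6 columns of random normal values
x <- rnorm(30)
dim(x) <- c(5, 6)
x <- data.frame(x)
names(x) <- c("a", "b", "c", "x", "y", "z")

# Dataset y: same shape, sharing only columns "a", "b" and "x" with x
y <- rnorm(30)
dim(y) <- c(5, 6)
y <- data.frame(y)
names(y) <- c("a", "b", "d", "v", "w", "x")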
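The example above stops after building the two data frames. One possible way to finish it with the functions named in this message ("which", "names", %in%, then rbind) is sketched below. The seed and the names idx, common, and stacked are assumptions added for illustration; this is not necessarily the solution the poster had in mind.

```r
set.seed(1)  # assumed seed, only so that repeated runs give the same numbers

# Rebuild the two simulated data frames from the message.
x <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(x) <- c("a", "b", "c", "x", "y", "z")
y <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(y) <- c("a", "b", "d", "v", "w", "x")

# Positions of the columns of x whose names also occur in y.
idx <- which(names(x) %in% names(y))

# The shared names themselves, in the order they appear in x.
common <- names(x)[idx]

# Stack the shared columns of both data frames on top of each other.
stacked <- rbind(x[, common], y[, common])
print(common)
print(dim(stacked))
```

Selecting both data frames by name rather than by position matters here, because the shared columns sit at different positions in x and y.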

Daniel


-------------------------
cuncta stricte discussurus
-------------------------

