Dear R users,

I am interested in taking the columns from multiple dataframes. The problem is that the different dataframes have different combinations of the same variable names. Here's a simple example:

a<-rep(1:10)
b<-rep(1:10)
c<-rep(21:30)
d<-rep(31:40)

dat.a<-data.frame(a,b,c,d)
names(dat.a)<-c("a", "b", "c", "d")

dat.b<-data.frame(a,c,d)
names(dat.b)<-c("a", "c", "d")

I would like to first see if the names in the larger dataframe match those of the smaller (they have the same variables):

names(dat.a)%in%names(dat.b)

Could anyone help with this problem? I would basically like to form a subset of dat.a that matches the variable names in dat.b. If there were only a few variables, this would be easier, but I have between 4 and 5 thousand variables in each dataset.

Any help would be greatly appreciated.
Best,
Corey

Corey Sparks
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
College of Public Policy
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210 458 3166
corey.sparks 'at' utsa.edu
On Sep 22, 2009, at 5:58 PM, Corey Sparks wrote:

> Could anyone help with this problem, I would basically like to form
> a subset of the dat.a that matches the variable names in dat.b. If
> there were only a few variables, this would be easier, but I have
> between 4 and 5 thousand variables in each dataset

I have never tried this on the scale you propose, but on your toy example, here's what works:

> names(dat.a)%in%names(dat.b)   # your code, which returns a logical vector
[1]  TRUE FALSE  TRUE  TRUE
> subset(dat.a, select= names(dat.a)%in%names(dat.b) )
    a  c  d
1   1 21 31
2   2 22 32
3   3 23 33
4   4 24 34
5   5 25 35
6   6 26 36
7   7 27 37
8   8 28 38
9   9 29 39
10 10 30 40

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
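The logical-vector approach can also be tried at roughly the scale mentioned in the question. The following is only a sketch under stated assumptions: the data frames wide.a and wide.b, the column names v1..v5000, and the choice of 4000 shared columns are all invented for illustration and do not come from the original data. It uses plain bracket indexing with the same logical vector that subset() receives above.

```r
# Hypothetical wide data: 10 rows, 5000 columns named v1..v5000
# (names and sizes are assumptions made for this illustration only)
set.seed(1)
n_vars <- 5000
wide.a <- as.data.frame(matrix(rnorm(10 * n_vars), nrow = 10))
names(wide.a) <- paste0("v", seq_len(n_vars))

# A second data frame holding a random 4000 of those columns
keep_names <- sample(names(wide.a), 4000)
wide.b <- wide.a[keep_names]

# Logical vector: TRUE where a column of wide.a also appears in wide.b
keep <- names(wide.a) %in% names(wide.b)

# Bracket indexing with the logical vector; drop = FALSE keeps a data frame
# even if only one column matches
sub.a <- wide.a[, keep, drop = FALSE]

dim(sub.a)
```

The columns of sub.a come out in the order they have in wide.a, not in the order of wide.b, because the logical vector is aligned with names(wide.a).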
Henrique Dallazuanna
2009-Sep-23 00:18 UTC
[R] Subsetting dataframes based on column names
You can use intersect also:

dat.a[intersect(names(dat.a), names(dat.b))]

On Tue, Sep 22, 2009 at 6:58 PM, Corey Sparks <corey.sparks at utsa.edu> wrote:
> Could anyone help with this problem, I would basically like to form a subset
> of the dat.a that matches the variable names in dat.b. If there were only a
> few variables, this would be easier, but I have between 4 and 5 thousand
> variables in each dataset

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
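Since the question mentions multiple dataframes, the intersect() idea might be extended to more than two. The sketch below rebuilds the toy data from the original post and adds a third data frame, dat.c, which is hypothetical and not part of the thread. It also uses setdiff() to check whether dat.b has any names that dat.a lacks, since such names would be silently ignored by intersect().

```r
# Toy data from the original post
a <- 1:10; b <- 1:10; c <- 21:30; d <- 31:40
dat.a <- data.frame(a, b, c, d)
dat.b <- data.frame(a, c, d)

# Hypothetical third data frame, added only for illustration
dat.c <- data.frame(c, d, e = 41:50)

# Names present in both dat.a and dat.b, in the order they appear in dat.a
common <- intersect(names(dat.a), names(dat.b))
sub.a <- dat.a[common]

# Names in dat.b that dat.a does not have (none in this example)
missing_in_a <- setdiff(names(dat.b), names(dat.a))

# Names shared by every data frame in a list, then the matching subsets
dfs <- list(dat.a, dat.b, dat.c)
shared <- Reduce(intersect, lapply(dfs, names))
subsets <- lapply(dfs, function(df) df[shared])
```

Here common is c("a", "c", "d") and shared is c("c", "d"), so every element of subsets has the same two columns in the same order.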