I have two data frames

data1 <- as.data.frame(matrix(data=c(1:4,5:8,9:12,13:24), nrow=4, ncol=6,
byrow=F, dimnames=list(c(1:4),c("a","b","c","d","e","z"))))
data2 <- as.data.frame(matrix(data=c(1:4,5:8,9:12,37:48), nrow=4, ncol=6,
byrow=F, dimnames=list(c(1:4),c("a","b","c","f","g","z"))))

that have some common column names.

Comparing the names of the columns within each data frame to the other

setdiff(names(data1), names(data2))
setdiff(names(data2), names(data1))

shows which columns are different.

For each column that appears in data1 that DOES NOT appear in data2, I need
to create that column in data2 and fill it with NA values. The same is true
for the reverse. So, I can create a vector of new column names that need to
be filled with NA values, but here is where I'm stuck. I don't know how to
get the names from inside the vector into the respective data frame.

tmp1 <- as.factor(paste("data2$", setdiff(names(data1), names(data2)),
sep=""))
tmp2 <- as.factor(paste("data1$", setdiff(names(data2), names(data1)),
sep=""))

Of course, if it were only a few columns, I could do all of this by hand,
but in my original data frames, I have 60 different columns that need to be
created and filled with NA values for both data1 and data2.

Eventually, the point of this exercise is so that I can rbind(data1, data2)
and create a SQL table out of the merged data frames. Unfortunately, I
can't rbind() everything until the column names are common across both
data1 and data2.

Thoughts?

Thanks -

SR

Steven H. Ranney
Hi,

Not sure about your expected result.

library(plyr)
data2New <- join_all(lapply(setdiff(names(data1), names(data2)),
function(x) {data2[,x] <- NA; data2}))
data1New <- join_all(lapply(setdiff(names(data2), names(data1)),
function(x) {data1[,x] <- NA; data1}))
data1New
#  a b  c  d  e  z  f  g
#1 1 5  9 13 17 21 NA NA
#2 2 6 10 14 18 22 NA NA
#3 3 7 11 15 19 23 NA NA
#4 4 8 12 16 20 24 NA NA

A.K.

----- Original Message -----
From: Steven Ranney <steven.ranney at gmail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Thursday, August 8, 2013 2:01 PM
Subject: [R] Creating new vectors from other dataFrames

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
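For comparison, the same result can probably be reached in base R without plyr. The sticking point in the question was that pasting "data2$" onto a name only produces a string, not a reference to a column; a character vector of names can instead be used directly as a column index on the left-hand side of an assignment. The following is only a sketch under that assumption, and the names missing_in_1 and missing_in_2 are made up for illustration (they do not appear in the thread):

```r
# Example data, as defined in the original post.
data1 <- as.data.frame(matrix(data = c(1:4, 5:8, 9:12, 13:24), nrow = 4, ncol = 6,
  byrow = FALSE, dimnames = list(c(1:4), c("a", "b", "c", "d", "e", "z"))))
data2 <- as.data.frame(matrix(data = c(1:4, 5:8, 9:12, 37:48), nrow = 4, ncol = 6,
  byrow = FALSE, dimnames = list(c(1:4), c("a", "b", "c", "f", "g", "z"))))

# Column names present in one data frame but not in the other
# (hypothetical helper names, chosen for this sketch).
missing_in_2 <- setdiff(names(data1), names(data2))
missing_in_1 <- setdiff(names(data2), names(data1))

# Indexing with a character vector creates every missing column at once
# and fills it with NA; the originals are left untouched.
data2New <- data2
data2New[missing_in_2] <- NA

data1New <- data1
data1New[missing_in_1] <- NA

names(data1New)
names(data2New)
```

The new columns are appended on the right, so the two results hold the same set of column names but not necessarily in the same order.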
This is exactly what I'm looking for. Each data frame will have those columns
that are endemic to the other filled with NA.

Thanks.

Steven H. Ranney

On Thu, Aug 8, 2013 at 12:17 PM, arun <smartpink111@yahoo.com> wrote:
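As a follow-up to the stated goal of calling rbind(), here is a short sketch of that last step once both data frames carry the same set of column names. It assumes the padded frames are built with plain column-name indexing rather than with the plyr code from the reply, and the name combined is made up for illustration:

```r
# Example data, as defined in the original post.
data1 <- as.data.frame(matrix(data = c(1:4, 5:8, 9:12, 13:24), nrow = 4, ncol = 6,
  byrow = FALSE, dimnames = list(c(1:4), c("a", "b", "c", "d", "e", "z"))))
data2 <- as.data.frame(matrix(data = c(1:4, 5:8, 9:12, 37:48), nrow = 4, ncol = 6,
  byrow = FALSE, dimnames = list(c(1:4), c("a", "b", "c", "f", "g", "z"))))

# Pad each data frame with the columns it lacks, filled with NA.
data1New <- data1
data1New[setdiff(names(data2), names(data1))] <- NA
data2New <- data2
data2New[setdiff(names(data1), names(data2))] <- NA

# rbind() on data frames matches columns by name, so the differing
# column order of the two inputs does not matter here.
combined <- rbind(data1New, data2New)

dim(combined)
names(combined)
```

The result takes its column order from the first argument, so swapping the arguments changes the order of the columns but not the values stored under each name.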