christiaan pauw
2009-Oct-07 13:32 UTC
[R] merging dataframes with an unequal number of variables
Hello everyone,

I have the kind of problem that one should never have, because one must always plan well and communicate with one's team. But I didn't, so here is my problem.

I have data coming in daily from surveys in 10 towns. The questionnaire has 62 variables, but some of the regions have used older versions of the questionnaire that have a few variables fewer. I want to combine everything into a single data frame on a daily basis. The problem is that I cannot rbind() the data because the numbers of variables do not correspond. I have found that I can first subset all datasets to keep just the variables they all have in common, but that is very unsatisfactory. What I want to do is take a complete list of variable names, look at each data frame, create the variables that are missing, and fill them with NAs. At least then I can merge the data and use the data that I have.

Short example:

# Make a data frame with 4 variables
var1=c(1,2,3,4,5,6)
var2=c("a","b","a","b","a","a")
var3=c(1,NA,NA,2,3,NA)
var4=c(100,200,300,100,200,300)
df1=data.frame(var1,var2,var3,var4)

# Data frames 2 and 3 each have three of the 4 variables and 4 has everything
df2=df1[,c(1,2,4)]
df3=df1[,c(2,3,4)]
df4=df1

# I wanted to do this, but it produces an error because the numbers of variables differ
df=data.frame(rbind(df1,df2,df3,df4))

# I have figured out how to print the names of the variables that DO match the 'master' list (in this case df1):
# example with df3
names(df3[,na.omit(match(names(df1),names(df3)))])

# What I need is the names of the variables that each specific data frame does NOT contain
# Something like this, but this gives an error
names(df1[-names(df3[,na.omit(match(names(df1),names(df3)))])])

Thanks in advance,
Christiaan
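The approach described above (find the names each data frame lacks, add them as NA, then rbind) can be sketched in base R. This is only a minimal sketch under stated assumptions: the master list is taken from df1, setdiff() is used in place of the match()/na.omit() attempt, and pad_to_master is a hypothetical helper name that does not appear in the original post.

```r
# Rebuild the example data; data.frame() is called directly so that the
# numeric columns stay numeric
df1 <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var2 = c("a", "b", "a", "b", "a", "a"),
  var3 = c(1, NA, NA, 2, 3, NA),
  var4 = c(100, 200, 300, 100, 200, 300),
  stringsAsFactors = FALSE
)
df2 <- df1[, c(1, 2, 4)]
df3 <- df1[, c(2, 3, 4)]
df4 <- df1

# Master list of variable names (assumption: df1 has the complete set)
master <- names(df1)

# Names in the master list that df3 does NOT contain
missing_in_df3 <- setdiff(master, names(df3))
print(missing_in_df3)

# Hypothetical helper: add every missing column filled with NA,
# then return the columns in master order
pad_to_master <- function(d, master) {
  for (v in setdiff(master, names(d))) {
    d[[v]] <- NA
  }
  d[, master, drop = FALSE]
}

# Pad each data frame, then stack them
padded <- lapply(list(df1, df2, df3, df4), pad_to_master, master = master)
combined <- do.call(rbind, padded)
print(dim(combined))
```

Once every data frame has the same columns in the same order, rbind() no longer complains about the differing numbers of variables.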
Gabor Grothendieck
2009-Oct-07 13:41 UTC
[R] merging dataframes with an unequal number of variables
See ?rbind.fill in the plyr package.
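For illustration, the suggestion might be applied to the example data roughly as follows. This is a sketch that assumes the plyr package is installed; the requireNamespace() guard is an addition for safety and is not part of the reply.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var2 = c("a", "b", "a", "b", "a", "a"),
  var3 = c(1, NA, NA, 2, 3, NA),
  var4 = c(100, 200, 300, 100, 200, 300),
  stringsAsFactors = FALSE
)
df2 <- df1[, c(1, 2, 4)]
df3 <- df1[, c(2, 3, 4)]
df4 <- df1

# rbind.fill() stacks data frames and fills columns that are absent
# from a given data frame with NA
if (requireNamespace("plyr", quietly = TRUE)) {
  combined <- plyr::rbind.fill(df1, df2, df3, df4)
  print(dim(combined))
} else {
  message("plyr is not installed; try install.packages(\"plyr\")")
}
```

rbind.fill() also accepts a single list of data frames, which may be convenient when the daily files are read in a loop.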