I have a list of data frames which I would like to combine into one data frame, doing something like rbind. I wish to combine in column order and not by names. However, there are issues.

The number of columns is not the same for each data frame. This is an intermediate step to a problem, and the number of columns could be 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another problem is that the names of the columns produced by the first step are garbage.

Below is a method that I obtained by asking a question on Stack Overflow. Unfortunately, my example was not general enough. The code below works for the simple case where the names of the people are consistent. It does not work when, realistically, the names are not the same.

https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432

Please note that the lapply step sets things up except for the column name issue. If I could figure out a way to change the column names, then I believe the bind_rows step would work.

So I really have two questions: how to change all column names of all the data frames, and then how to solve the original problem.

library(dplyr)   # provides bind_rows()

# The non-general case works fine. It produces one data frame, and I can
# then change the column names to
# c("first1", "last1", "first2", "last2", "first3", "last3")

# Non-general easy case
employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
                       data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
                       data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones", "Smith", "Adams")),
                       data.frame(first1 = "Al", second1 = "Jones"))
employees4BList

bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))

# This produces a nice list of data frames, except for the names
lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))

# This list is a disaster. I am looking for a solution that works in this case.
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                      data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
                      data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
                      data.frame(first4 = "Al", second4 = "Jones2"))

bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))

Thanks.

Ira
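Ira's proposed route (give each flattened one-row data frame consistent names, then bind by name) can be sketched as follows. This is a minimal sketch, not code from the thread: the helper `wide_names()` is hypothetical, and the final NA-filling step is written in base R where `dplyr::bind_rows()` would otherwise be used, so the sketch runs without packages.

```r
# Hypothetical helper: names for n columns, i.e. first1, last1, first2, last2, ...
wide_names <- function(n) paste0(c("first", "last"), rep(seq_len(n / 2), each = 2))

employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2")
)

# Question 1: flatten each data frame row by row into a one-row data frame
# and give it clean, predictable column names (replacing the garbage names).
rows <- lapply(employees4List, function(x) {
  v <- as.vector(t(as.matrix(x)))   # as.matrix() turns factors into character
  names(v) <- wide_names(length(v))
  as.data.frame(as.list(v), stringsAsFactors = FALSE)
})

# Question 2: bind by name, filling absent columns with NA (what
# dplyr::bind_rows(rows) would do); base R here to avoid the dependency.
all_cols <- wide_names(max(vapply(rows, ncol, integer(1))))
combined <- do.call(rbind, lapply(rows, function(r) {
  for (nm in setdiff(all_cols, names(r))) r[[nm]] <- NA_character_
  r[all_cols]                       # same column order in every piece
}))
combined
```

With dplyr loaded, the last step should reduce to `bind_rows(rows)`, since `bind_rows()` matches columns by name and fills missing ones with NA.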
Hi,

It isn't super clear to me what you're after. Is this what you intend?

> dfbycol(employees4BList)
  first1 last1 first2 last2 first3 last3
1     Al Jones   <NA>  <NA>   <NA>  <NA>
2     Al Jones   Barb Smith   <NA>  <NA>
3     Al Jones   Barb Smith  Carol Adams
4     Al Jones   <NA>  <NA>   <NA>  <NA>
>
> dfbycol(employees4List)
  first1  last1  first2 last2 first3 last3
1     Al  Jones    <NA>  <NA>   <NA>  <NA>
2    Al2  Jones    Barb Smith   <NA>  <NA>
3    Al3  Jones Barbara Smith  Carol Adams
4     Al Jones2    <NA>  <NA>   <NA>  <NA>

If so:

employees4BList = list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
  data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones", "Smith", "Adams")),
  data.frame(first1 = "Al", second1 = "Jones"))

employees4List = list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2"))

###

dfbycol <- function(x) {
  # flatten each data frame row by row: first, last, first, last, ...
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  # pad every vector with NA to the length of the longest one
  x <- lapply(x, function(y) { length(y) <- max(sapply(x, length)); y })
  # stack the padded vectors as the rows of a character matrix
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors = FALSE)
  # name the columns first1, last1, first2, last2, ...
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
  x
}

###

dfbycol(employees4BList)

dfbycol(employees4List)

On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help <r-help at r-project.org> wrote:
> I have a list of data frames which I would like to combine into one data
> frame doing something like rbind. I wish to combine in column order and
> not by names. However, there are issues.
>
> The number of columns is not the same for each data frame. This is an
> intermediate step to a problem and the number of columns could be
> 2,4,6,8,or10. There might be a few thousand data frames. Another problem
> is that the names of the columns produced by the first step are garbage.
-- 
Sarah Goslee
http://www.functionaldiversity.org
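For a few thousand data frames, one possible refinement of dfbycol() is to compute the target width once (dfbycol() recomputes max(sapply(x, length)) inside its second lapply(), once per list element) and to fill a preallocated NA matrix instead of padding each vector. A minimal sketch; `dfbycol2()` is a hypothetical name, and like dfbycol() it assumes every data frame has exactly two columns (first and last name).

```r
# Hypothetical variant of dfbycol(): same layout, width computed only once.
dfbycol2 <- function(lst) {
  # flatten each data frame row by row: first, last, first, last, ...
  flat <- lapply(lst, function(y) as.vector(t(as.matrix(y))))
  width <- max(lengths(flat))
  # preallocate an all-NA character matrix and copy each vector into its row
  out <- matrix(NA_character_, nrow = length(flat), ncol = width)
  for (i in seq_along(flat)) out[i, seq_along(flat[[i]])] <- flat[[i]]
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  names(out) <- paste0(c("first", "last"), rep(seq_len(width / 2), each = 2))
  out
}

employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2")
)
res <- dfbycol2(employees4List)
res
```

The matrix holds the NA padding from the start, so no vector is grown or re-padded inside the loop.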
> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>
> Hi,
>
> It isn't super clear to me what you're after.

Agree. Had a different read of the request. Thought the request was for a first step that "harmonized" the names of the columns, followed by `dplyr::bind_rows`:

library(dplyr)
newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
bind_rows(newList)
#---------
   first1 second1
1      Al   Jones
2     Al2   Jones
3    Barb   Smith
4     Al3   Jones
5 Barbara   Smith
6   Carol   Adams
7      Al  Jones2

Might want to wrap suppressWarnings() around the bind_rows(newList) call, since there were many warnings regarding incongruent factor levels.

-- 
David.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law