thr3ads.net - R help - [R] Convert list of data frames to one data frame [Jun 2018]

David Winsemius

2018-Jun-29 19:49 UTC

[R] Convert list of data frames to one data frame

> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> 
> Hi,
> 
> It isn't super clear to me what you're after.
Agree.

I had a different read of the request. I thought the request was for a first step
that "harmonized" the names of the columns and then used
`dplyr::bind_rows`:

library(dplyr)
newList <- lapply(employees4List, 'names<-', names(employees4List[[1]]))
bind_rows(newList)

#---------

   first1 second1
1      Al   Jones
2     Al2   Jones
3    Barb   Smith
4     Al3   Jones
5 Barbara   Smith
6   Carol   Adams
7      Al  Jones2

Might want to wrap suppressWarnings around the right side of that assignment
since there were many warnings regarding incongruent factor levels.
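For comparison, here is a minimal base-R sketch of the same harmonize-then-stack idea (not David's code). It assumes the frames are built with `stringsAsFactors = FALSE`, so the columns stay character and combining them should not trigger factor-level warnings. `setNames()` plays the role of `'names<-'`, base `rbind()` stands in for `bind_rows()`, and the names `harmonized` and `stacked` are illustrative.

```r
# Sketch (base R only): copy the first frame's column names onto every
# frame, then stack with rbind(). stringsAsFactors = FALSE keeps the
# columns character, so no factor levels have to be reconciled.
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = FALSE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = FALSE),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams"), stringsAsFactors = FALSE),
  data.frame(first4 = "Al", second4 = "Jones2", stringsAsFactors = FALSE)
)

# setNames(frame, nm) returns the frame with its names replaced by nm.
harmonized <- lapply(employees4List, setNames, names(employees4List[[1]]))

# Every frame now has the names first1/second1, so rbind() can stack them.
stacked <- do.call(rbind, harmonized)
rownames(stacked) <- NULL
stacked
```

This gives the same seven-row, two-column layout as the `bind_rows()` output above.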

-- 
David.

> Is this what you intend?
> 
>> dfbycol(employees4BList)
>  first1 last1 first2 last2 first3 last3
> 1     Al Jones   <NA>  <NA>   <NA>  <NA>
> 2     Al Jones   Barb Smith   <NA>  <NA>
> 3     Al Jones   Barb Smith  Carol Adams
> 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>> 
>> dfbycol(employees4List)
>  first1  last1  first2 last2 first3 last3
> 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
> 2    Al2  Jones    Barb Smith   <NA>  <NA>
> 3    Al3  Jones Barbara Smith  Carol Adams
> 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
> 
> 
> If so:
> 
> employees4BList = list(
>   data.frame(first1 = "Al", second1 = "Jones"),
>   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>   data.frame(first1 = c("Al", "Barb", "Carol"),
>              second1 = c("Jones", "Smith", "Adams")),
>   data.frame(first1 = ("Al"), second1 = "Jones"))
>
> employees4List = list(
>   data.frame(first1 = ("Al"), second1 = "Jones"),
>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>              second3 = c("Jones", "Smith", "Adams")),
>   data.frame(first4 = ("Al"), second4 = "Jones2"))
> 
> ###
> 
> dfbycol <- function(x) {
>   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>   x <- do.call(rbind, x)
>   x <- data.frame(x, stringsAsFactors = FALSE)
>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
>   x
> }
> 
> ###
> 
> dfbycol(employees4BList)
> 
> dfbycol(employees4List)
> 
> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
> <r-help at r-project.org> wrote:
>> I have a list of data frames which I would like to combine into one data
>> frame doing something like rbind. I wish to combine in column order and
>> not by names. However, there are issues.
>> 
>> The number of columns is not the same for each data frame. This is an
>> intermediate step to a problem, and the number of columns could be
>> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another problem
>> is that the names of the columns produced by the first step are garbage.
>> 
>> Below is a method that I obtained by asking a question on Stack
>> Overflow. Unfortunately, my example was not general enough. The code
>> below works for the simple case where the names of the people are
>> consistent. It does not work when the names are realistically not the same.
>>
>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>> 
>> 
>> Please note that the lapply step sets things up except for the column
>> name issue. If I could figure out a way to change the column names, then
>> the bind_rows step will, I believe, work.
>>
>> So I really have two questions: how to change all column names of all
>> the data frames, and then how to solve the original problem.
>> 
>> # The non-general case works fine. It produces one data frame, and I can
>> # then change the column names to
>>
>> # c("first1", "last1", "first2", "last2", "first3", "last3")
>> 
>> # Non-general easy case
>>
>> library(dplyr)
>>
>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>>                        data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>                        data.frame(first1 = c("Al", "Barb", "Carol"),
>>                                   second1 = c("Jones", "Smith", "Adams")),
>>                        data.frame(first1 = ("Al"), second1 = "Jones"))
>>
>> employees4BList
>>
>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>> 
>> # This produces a nice list of data frames, except for the names
>> 
>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>> 
>> # This list is a disaster. I am looking for a solution that works in
>> # this case.
>>
>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>>                       data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>                       data.frame(first3 = c("Al3", "Barbara", "Carol"),
>>                                  second3 = c("Jones", "Smith", "Adams")),
>>                       data.frame(first4 = ("Al"), second4 = "Jones2"))
>>
>> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>> 
>> Thanks.
>> 
>> Ira
>> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' 
-Gehm's Corollary to Clarke's Third Law
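The first two steps of Sarah's `dfbycol()` can be seen on a single frame. This is a minimal sketch; the two-row frame `df` and the target width of 6 are illustrative assumptions, not data from the thread.

```r
# One two-row frame: first names in column 1, last names in column 2.
df <- data.frame(first = c("Al", "Barb"), last = c("Jones", "Smith"),
                 stringsAsFactors = FALSE)

# as.matrix() gives a character matrix and t() swaps rows and columns;
# as.vector() then reads the transposed matrix column by column, which is
# the original frame row by row: first, last, first, last, ...
flat <- as.vector(t(as.matrix(df)))
print(flat)

# Assigning a larger length pads the vector with NA; this is how each
# flattened frame is brought up to the width of the widest one.
length(flat) <- 6
print(flat)

# The column-name pattern used in the final step.
col_names <- paste0(c("first", "last"), rep(seq(1, length(flat) / 2), each = 2))
print(col_names)
```

In `dfbycol()` the target width is `max(sapply(x, length))`, so every flattened frame is padded to the longest one before `do.call(rbind, x)` stacks them into a character matrix.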

Ira Sharenow

2018-Jun-30 00:29 UTC

[R] Convert list of data frames to one data frame

Sarah and David,

Thank you for your responses. I will try to be clearer.

Base R solution: Sarah's method worked perfectly.

Is there a dplyr solution?

START: list of data frames

FINISH: one data frame

DETAILS: The initial list of data frames might have hundreds or a few thousand
data frames. Every data frame will have two columns. The first column will
represent first names. The second column will represent last names. The column
names are not consistent. Data frames will most likely have from one to five
rows.

SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames. Then
somehow do an rbind even though the number of columns differs from data frame to
data frame.

EXAMPLE: List with two data frames

# DF1

First    Last
George   Washington

# DF2

Start    End
John     Adams
Thomas   Jefferson

# End result: one data frame

First1   Second1      First2   Second2
George   Washington   NA       NA
John     Adams        Thomas   Jefferson
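The suggested strategy can be sketched in base R on the two-frame example above. This is a sketch under stated assumptions, not a dplyr answer: columns are character (`stringsAsFactors = FALSE`), rows are read in order, `to_wide()` is a hypothetical helper, and narrower frames get their missing columns filled with `NA` before a plain `rbind()`.

```r
DF1 <- data.frame(First = "George", Last = "Washington", stringsAsFactors = FALSE)
DF2 <- data.frame(Start = c("John", "Thomas"), End = c("Adams", "Jefferson"),
                  stringsAsFactors = FALSE)
DFL <- list(DF1, DF2)

# Step 1 (hypothetical helper): n-by-2 frame -> 1-by-2n frame, read row by
# row and named First1, Second1, First2, Second2, ...
to_wide <- function(df) {
  v <- as.vector(t(as.matrix(df)))
  names(v) <- paste0(c("First", "Second"), rep(seq_len(nrow(df)), each = 2))
  as.data.frame(as.list(v), stringsAsFactors = FALSE)
}
wide <- lapply(DFL, to_wide)

# Step 2: give every frame the full set of columns, filling the ones it
# lacks with NA, then stack with rbind().
all_cols <- unique(unlist(lapply(wide, names)))
padded <- lapply(wide, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
  df[all_cols]
})
result <- do.call(rbind, padded)
result
```

Padding the frames first is what lets base `rbind()`, which needs matching column names, stack frames of different widths; `dplyr::bind_rows()` fills missing columns with `NA` on its own.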

DISCUSSION: As mentioned, I posted something on Stack Overflow. Unfortunately, my
example was not general enough, and so the suggested solutions worked on the easy
case which I provided but not when the names were different.

The suggested solution was:

library(dplyr)

bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))

On this site I pointed out that the inner function, lapply(employees4List,
function(x) rbind.data.frame(c(t(x)))), correctly spread the multiple rows of
each data frame into a 1 by 2n data frame. However, the column names were
derived from the values and were a mess. This caused a problem with bind_rows.

I felt that if I knew how to change all the names of all of the data frames that
were created after lapply, then I could use bind_rows. So if someone knows
how to change all of the names at this intermediate stage, I hope that person
will provide the solution.

In the end, a 1 by 2 data frame would have names First1, Second1. A 1 by 4
data frame would have names First1, Second1, First2, Second2.

Ira
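One way to do the renaming Ira asks about is sketched below. It assumes every intermediate frame has an even number of columns (first/last pairs) and derives the names from the column count alone. `name_pairs()` is a hypothetical helper, and the intermediate frames are rebuilt with `as.data.frame(t(...))` instead of `rbind.data.frame(c(t(x)))`, so their placeholder names (`V1`, `V2`, ...) are predictable.

```r
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = FALSE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = FALSE),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams"), stringsAsFactors = FALSE),
  data.frame(first4 = "Al", second4 = "Jones2", stringsAsFactors = FALSE)
)

# Intermediate step: each n-by-2 frame becomes a 1-by-2n frame read row by row.
# as.data.frame() on an unnamed one-row matrix gives placeholder names V1, V2, ...
wide <- lapply(employees4List, function(x) {
  as.data.frame(t(as.vector(t(as.matrix(x)))), stringsAsFactors = FALSE)
})

# Hypothetical helper: rename a 1-by-2n frame to First1, Second1, First2, ...
# using only its column count.
name_pairs <- function(df) {
  names(df) <- paste0(c("First", "Second"), rep(seq_len(ncol(df) / 2), each = 2))
  df
}

wide_named <- lapply(wide, name_pairs)
names(wide_named[[3]])
```

Once every frame carries names of this form, `dplyr::bind_rows()` should line the columns up by name and fill the columns a narrower frame lacks with `NA`; that step is not run here.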


Bert Gunter

2018-Jun-30 01:33 UTC

[R] Convert list of data frames to one data frame

Well, I don't know your constraints, of course; but if I understand
correctly, in situations like this, it is usually worthwhile to reconsider
your data structure.

This is a one-liner if you simply rbind all your data frames into one with
2 columns. Here's an example to indicate how:

## list of two data frames with different column names and numbers of rows:
zz <-list(one = data.frame(f=1:3,g=letters[2:4]), two = data.frame(a 5:9,b =
letters[11:15]))

## create common column names and bind them up:
do.call(rbind,lapply(zz,function(x){   names(x) <-
c("first","last"); x}))

Note that the row names of the result tell you which original frame the
rows came from. This can also be obtained just from the row counts (?nrow)
of the frames in the original list.
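Bert's point about row names can be made explicit with key columns. A minimal sketch, assuming character columns and using the illustrative column names `source` and `pos` (not from the thread): the row counts of the original frames give both the source label and each row's position within its frame.

```r
zz <- list(one = data.frame(f = 1:3, g = letters[2:4], stringsAsFactors = FALSE),
           two = data.frame(a = 5:9, b = letters[11:15], stringsAsFactors = FALSE))

# Common names, then one long two-column frame.
long <- do.call(rbind, lapply(zz, function(x) { names(x) <- c("first", "last"); x }))

# Explicit keys instead of row names: which frame each row came from,
# and the row's position within that frame, both derived from nrow().
n <- vapply(zz, nrow, integer(1), USE.NAMES = FALSE)
long$source <- rep(names(zz), times = n)
long$pos <- sequence(n)
rownames(long) <- NULL
long
```

If a wide layout is still needed later, `source` and `pos` are the kind of identifiers that base `reshape()` expects as `idvar` and `timevar` when going back to wide form.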

Apologies if I misunderstand your query or if your constraints make this
simple approach impossible.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Jeff Newmiller

2018-Jun-30 02:50 UTC

[R] Convert list of data frames to one data frame

Code below...

a) Just because something can be done with dplyr does not mean that is the 
best way to do it. A solution in the hand is worth two on the Internet, 
and dplyr is not always the fastest method anyway.

b) I highly recommend that you read Hadley Wickham's paper on tidy data 
[1]. Also, having a group of one or more columns at all times that 
uniquely identify where the data came from is a "key" to success [2].

c) Please read and follow one of the various online documents about making 
reproducible examples in R (e.g. [3]). HTML formatting is really a pain 
(at best... at worst, it corrupts your code) on a plain-text-only list 
(you have read the Posting Guide, right?). Consider my example below as a 
model for you to follow in the future, and make sure to set your email 
program to send plain text. (Obviously your examples don't have to achieve 
success... but they should bring us up to speed with where you are having 
troubles IN R.)

[1] https://www.jstatsoft.org/article/view/v059i10
[2] http://r4ds.had.co.nz/relational-data.html#keys
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

----
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text = "First          Last
George          Washington
", header=TRUE, as.is = TRUE )

# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington")
#>, .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))

DF2 <- read.table( text = "Start              End
John               Adams
Thomas        Jefferson
", header = TRUE, as.is = TRUE )

DFL <- list( DF1, DF2 )

# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                   , data = DFL
                   )

DFL2 <- (   DFL1
         %>% mutate( data = lapply( data
                                  , function( DF ) {
                                      DF[[ ".PK" ]] <- seq.int( nrow( DF ) )
                                      gather( DF, ".Col", "value", -.PK )
                                    }
                                  )
                   )
         %>% unnest
         %>% spread( .Col, value )
         )
DFL2
#> # A tibble: 3 x 6
#>   .DFNames   .PK End       First  Last       Start
#>   <chr>    <int> <chr>     <chr>  <chr>      <chr>
#> 1 DF1          1 <NA>      George Washington <NA>
#> 2 DF2          1 Adams     <NA>   <NA>       John
#> 3 DF2          2 Jefferson <NA>   <NA>       Thomas

#' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
----

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
