Code below...
a) Just because something can be done with dplyr does not mean that is the
best way to do it. A solution in the hand is worth two on the Internet,
and dplyr is not always the fastest method anyway.
b) I highly recommend that you read Hadley Wickham's paper on tidy data
[1]. Also, having a group of one or more columns at all times that
uniquely identify where the data came from is a "key" to success [2].
c) Please read and follow one of the various online documents about making
reproducible examples in R (e.g. [3]). HTML formatting is really a pain
(at best... at worst, it corrupts your code) on a plain-text-only list
(you have read the Posting Guide, right?). Consider my example below as a
model for you to follow in the future, and make sure to set your email
program to send plain text. (Obviously your examples don't have to achieve
success... but they should bring us up to speed with where you are having
troubles IN R.)
[1] https://www.jstatsoft.org/article/view/v059i10
[2] http://r4ds.had.co.nz/relational-data.html#keys
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
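As a rough illustration of point (b), here is a minimal base R sketch of stacking data frames while carrying a key. It is not part of the reprex below; the column names `source`, `row`, `first` and `last` are assumptions chosen for illustration. The pair (`source`, `row`) uniquely identifies where each value came from.

```r
# Two small data frames with inconsistent column names
dfs <- list(
  DF1 = data.frame(First = "George", Last = "Washington"),
  DF2 = data.frame(Start = c("John", "Thomas"), End = c("Adams", "Jefferson"))
)

# Stack by column position, recording the source name and the row number
stacked <- do.call(rbind, lapply(names(dfs), function(id) {
  df <- dfs[[id]]
  data.frame(source = id,                      # which data frame
             row    = seq_len(nrow(df)),       # which row within it
             first  = as.character(df[[1]]),   # first column, by position
             last   = as.character(df[[2]]),   # second column, by position
             stringsAsFactors = FALSE)
}))
stacked
```

With the key in place, the long table can be reshaped or filtered later without losing track of the original grouping.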
----
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text = "First Last
George Washington
", header=TRUE, as.is = TRUE )
# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington"), .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))
DF2 <- read.table( text = "Start End
John Adams
Thomas Jefferson
", header = TRUE, as.is = TRUE )
DFL <- list( DF1, DF2 )
# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                  , data = DFL
                  )
DFL2 <- (   DFL1
        %>% mutate( data = lapply( data
                                 , function( DF ) {
                                     DF[[ ".PK" ]] <- seq.int( nrow( DF ) )
                                     gather( DF, ".Col", "value", -.PK )
                                   }
                                 )
                  )
        %>% unnest
        %>% spread( .Col, value )
        )
DFL2
#> # A tibble: 3 x 6
#>   .DFNames   .PK End       First  Last       Start
#>   <chr>    <int> <chr>     <chr>  <chr>      <chr>
#> 1 DF1          1 <NA>      George Washington <NA>
#> 2 DF2          1 Adams     <NA>   <NA>       John
#> 3 DF2          2 Jefferson <NA>   <NA>       Thomas
#' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
----
On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:
>
> Sarah and David,
>
> Thank you for your responses. I will try to be clearer.
>
> Base R solution: Sarah's method worked perfectly.
>
> Is there a dplyr solution?
>
> START: list of data frames
>
> FINISH: one data frame
>
> DETAILS: The initial list of data frames might have hundreds or a few
> thousand data frames. Every data frame will have two columns. The first column
> will represent first names. The second column will represent last names. The
> column names are not consistent. Data frames will most likely have from one to
> five rows.
>
> SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames.
> Then somehow do an rbind even though the number of columns differs from data
> frame to data frame.
>
> EXAMPLE: List with two data frames
>
> # DF1
>
> First    Last
>
> George   Washington
>
> # DF2
>
> Start    End
>
> John     Adams
>
> Thomas   Jefferson
>
> # End Result. One data frame
>
> First1   Second1      First2   Second2
>
> George   Washington   NA       NA
>
> John     Adams        Thomas   Jefferson
>
> DISCUSSION: As mentioned, I posted something on Stack Overflow.
> Unfortunately, my example was not general enough, and so the suggested solutions
> worked on the easy case which I provided but not when the names were different.
>
> The suggested solution was:
>
> library(dplyr)
>
> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>
> On this site I pointed out that the inner call,
> lapply(employees4List, function(x) rbind.data.frame(c(t(x)))),
> correctly spreads the multiple rows of each data frame into a 1 by 2n data
> frame. However, the column names were derived from the values and were a mess.
> This caused a problem with bind_rows.
>
> I felt that if I knew how to change all the names of all of the data frames
> that were created after lapply, then I could use bind_rows. So if someone
> knows how to change all of the names at this intermediate stage, I hope that
> person will provide the solution.
>
> In the end a 1 by 2 data frame would have names First1, Second1. A 1 by
> 4 data frame would have names First1, Second1, First2, Second2.
>
> Ira
>
>
> On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius <dwinsemius at comcast.net> wrote:
>
>
>> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>>
>> Hi,
>>
>> It isn't super clear to me what you're after.
>
> Agree.
>
> Had a different read of the request. Thought the request was for a first
> step that "harmonized" the names of the columns and then used
> `dplyr::bind_rows`:
>
> library(dplyr)
> newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
> bind_rows(newList)
>
> #---------
>
>    first1 second1
> 1      Al   Jones
> 2     Al2   Jones
> 3    Barb   Smith
> 4     Al3   Jones
> 5 Barbara   Smith
> 6   Carol   Adams
> 7      Al  Jones2
>
> Might want to wrap suppressWarnings around the right side of that
> assignment since there were many warnings regarding incongruent factor levels.
>
> --
> David.
>> Is this what you intend?
>>
>> > dfbycol(employees4BList)
>>   first1 last1 first2 last2 first3 last3
>> 1     Al Jones   <NA>  <NA>   <NA>  <NA>
>> 2     Al Jones   Barb Smith   <NA>  <NA>
>> 3     Al Jones   Barb Smith  Carol Adams
>> 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>> >
>> > dfbycol(employees4List)
>>   first1  last1  first2 last2 first3 last3
>> 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
>> 2    Al2  Jones    Barb Smith   <NA>  <NA>
>> 3    Al3  Jones Barbara Smith  Carol Adams
>> 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
>>
>>
>> If so:
>>
>> employees4BList = list(
>>   data.frame(first1 = "Al", second1 = "Jones"),
>>   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>   data.frame(first1 = c("Al", "Barb", "Carol"),
>>              second1 = c("Jones", "Smith", "Adams")),
>>   data.frame(first1 = ("Al"), second1 = "Jones"))
>>
>> employees4List = list(
>>   data.frame(first1 = ("Al"), second1 = "Jones"),
>>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>>              second3 = c("Jones", "Smith", "Adams")),
>>   data.frame(first4 = ("Al"), second4 = "Jones2"))
>>
>> ###
>>
>> dfbycol <- function(x) {
>>   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>>   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>>   x <- do.call(rbind, x)
>>   x <- data.frame(x, stringsAsFactors=FALSE)
>>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>>   x
>> }
>>
>> ###
>>
>> dfbycol(employees4BList)
>>
>> dfbycol(employees4List)
>>
>> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
>> <r-help at r-project.org> wrote:
>>> I have a list of data frames which I would like to combine into one data
>>> frame doing something like rbind. I wish to combine in column order and
>>> not by names. However, there are issues.
>>>
>>> The number of columns is not the same for each data frame. This is an
>>> intermediate step to a problem and the number of columns could be
>>> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another problem
>>> is that the names of the columns produced by the first step are garbage.
>>>
>>> Below is a method that I obtained by asking a question on Stack
>>> Overflow. Unfortunately, my example was not general enough. The code
>>> below works for the simple case where the names of the people are
>>> consistent. It does not work when the names are realistically not the same.
>>>
>>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>>>
>>> Please note that the lapply step sets things up except for the column
>>> name issue. If I could figure out a way to change the column names, then
>>> the bind_rows step will, I believe, work.
>>>
>>> So I really have two questions. How to change all column names of all
>>> the data frames and then how to solve the original problem.
>>>
>>> # The non general case works fine. It produces one data frame and I can
>>> # then change the column names to
>>>
>>> # c("first1", "last1", "first2", "last2", "first3", "last3")
>>>
>>> # Non general easy case
>>>
>>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>>>
>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>>
>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones", "Smith", "Adams")),
>>>
>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>
>>> employees4BList
>>>
>>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>>>
>>> # This produces a nice list of data frames, except for the names
>>>
>>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>>>
>>> # This list is a disaster. I am looking for a solution that works in
>>> # this case.
>>>
>>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>>>
>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>>
>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
>>>
>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>
>>> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>>
>>> Thanks.
>>>
>>> Ira
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
>
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
I would like to thank everyone who helped me out. I have obtained some offline
help, so I would like to summarize all the information I have received.
Before I summarize the thread, there is one loose end.
Initially I thought
library(dplyr)
dplyr::bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
would work, but there were problems.
lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
spreads out the data frames, converting them from long to wide, but it
messes up the names. So one question I still have is: how can I programmatically
change all of the names?
After this initial step, the first data frame's names might be derived from
c("George", "Washington")
and the second data frame's names might be derived from
c("John", "Adams", "Thomas", "Jefferson")
What I want to change the names to:
c("First1", "Second1")
and
c("First1", "Second1", "First2", "Second2")
I believe that will enable me to then go back and use bind_rows and complete
that method of solution:
Step 1: lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
Step 2: Clean up the names
Step 3: bind_rows
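Steps 1 and 2 can be folded into one helper, sketched below in base R. This is only an illustration under stated assumptions: `widen_and_name` is a hypothetical name that does not appear in the thread, and the sketch assumes every data frame has exactly two columns and at least one row. Building the names from the row count avoids deriving them from the values.

```r
# Hypothetical helper: widen one n-by-2 data frame to 1-by-2n and
# name the columns First1, Second1, First2, Second2, ...
widen_and_name <- function(df) {
  stopifnot(ncol(df) == 2, nrow(df) >= 1)
  n <- nrow(df)
  # as.matrix() gives a character matrix for character or factor columns;
  # c(t(.)) then reads it row by row: first, last, first, last, ...
  vals <- c(t(as.matrix(df)))
  wide <- as.data.frame(matrix(vals, nrow = 1), stringsAsFactors = FALSE)
  names(wide) <- paste0(c("First", "Second"), rep(seq_len(n), each = 2))
  wide
}

employees <- list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"))
)
wide_list <- lapply(employees, widen_and_name)
lapply(wide_list, names)
```

Each element of `wide_list` now has consistent names, so Step 3 (combining the pieces, with missing columns filled by NA) no longer trips over names taken from the data.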
Immediately below is hopefully a clear and precise statement of the problem and
the proposed solution path. Then there are the various solutions.
# Starting list of data frames
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                      data.frame(first2 = c("Al2", "Barb"),
                                 second2 = c("Jones", "Smith")),
                      data.frame(first3 = c("Al3", "Barbara", "Carol"),
                                 second3 = c("Jones", "Smith", "Adams")),
                      data.frame(first4 = ("Al"), second4 = "Jones2"))
employees4List
# Intermediate step that messes up the names but successfully converts from long
# to wide
lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
# The intermediate list should likely look like this listFinal
df1 = data.frame(First1 = "Al", Second1 = "Jones", First2 = NA, Second2 = NA,
                 First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
df2 = data.frame(First1 = "Al2", Second1 = "Jones", First2 = "Barb", Second2 = "Smith",
                 First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
df3 = data.frame(First1 = "Al3", Second1 = "Jones", First2 = "Barbara", Second2 = "Smith",
                 First3 = "Carol", Second3 = "Adams", First4 = NA, Second4 = NA)
df4 = data.frame(First1 = "Al", Second1 = "Jones2", First2 = NA, Second2 = NA,
                 First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
listFinal = list(df1, df2, df3, df4)
listFinal
# Requested data frame (except that the columns are not just character but some
# are factor or even logical)
dplyr::bind_rows(listFinal)
Sarah Goslee solved the problem using base R.
Given
employees4List = list(
  data.frame(first1 = ("Al"), second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones2", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones3", "Smith", "Adams")),
  data.frame(first4 = ("Al"), second4 = "Jones2"))
This function produces the solution in the requested structure.
dfbycol <- function(x) {
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors=FALSE)
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
  x
}
dfbycol(employees4List)
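The padding step inside dfbycol relies on a base R feature that is easy to miss: assigning a larger value to `length()` extends a vector with NA. A minimal sketch of just that step (the variable name `y` mirrors the function above; the target length 6 is an arbitrary choice for illustration):

```r
# A widened two-element record, as produced by the first lapply() in dfbycol
y <- c("Al", "Jones")

# Extending the length pads the tail with NA, so every record ends up
# with the same number of elements before rbind()
length(y) <- 6
y
```

Because every vector has the same length after padding, `do.call(rbind, x)` can stack them into a rectangular character matrix.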
Offline, Jeff Newmiller and Bert Gunter provided alternative approaches to the
problem as well as other advice. Their solutions meet the "tidy" criterion.
Bert suggested this online.
## list of two data frames with different column names and numbers of rows:
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))
## create common column names and bind them up:
do.call(rbind, lapply(zz, function(x) { names(x) <- c("first", "last"); x }))
This and the next suggestion by Jeff produced useful solutions but not in the
requested form.
library(dplyr)
library(tidyr)
# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text = "First Last
George Washington
", header=TRUE, as.is = TRUE )
# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington"), .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))
DF2 <- read.table( text = "Start End
John Adams
Thomas Jefferson
", header = TRUE, as.is = TRUE )
DFL <- list( DF1, DF2 )
# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                  , data = DFL
                  )
DFL2 <- (   DFL1
        %>% mutate( data = lapply( data
                                 , function( DF ) {
                                     DF[[ ".PK" ]] <- seq.int( nrow( DF ) )
                                     gather( DF, ".Col", "value", -.PK )
                                   }
                                 )
                  )
        %>% unnest
        %>% spread( .Col, value )
        )
DFL2
During the discussion, useful links were recommended:
[1] https://www.jstatsoft.org/article/view/v059i10  Hadley on tidy data
[2] http://r4ds.had.co.nz/relational-data.html#keys  Hadley on relational data
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example  How to make a great reproducible example
http://adv-r.had.co.nz/Functionals.html  Improving lapply and related skills
Thanks again to everyone!
Ira
On Friday, June 29, 2018, 7:47:13 PM PDT, Jeff Newmiller <jdnewmil at
dcn.davis.ca.us> wrote:
Your request is getting a bit complicated with so much re-hashing, but
here are three solutions: base only, a bit of dplyr, and dplyr+tidyr:
#########
# input data
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                      data.frame(first2 = c("Al2", "Barb"),
                                 second2 = c("Jones", "Smith")),
                      data.frame(first3 = c("Al3", "Barbara", "Carol"),
                                 second3 = c("Jones", "Smith", "Adams")),
                      data.frame(first4 = ("Al"), second4 = "Jones2"))
employees4List
#> [[1]]
#> first1 second1
#> 1 Al Jones
#>
#> [[2]]
#> first2 second2
#> 1 Al2 Jones
#> 2 Barb Smith
#>
#> [[3]]
#> first3 second3
#> 1 Al3 Jones
#> 2 Barbara Smith
#> 3 Carol Adams
#>
#> [[4]]
#> first4 second4
#> 1 Al Jones2
# expected output
df1 = data.frame(First1 = "Al", Second1 = "Jones",
                 First2 = NA, Second2 = NA,
                 First3 = NA, Second3 = NA,
                 First4 = NA, Second4 = NA)
df2 = data.frame(First1 = "Al2", Second1 = "Jones",
                 First2 = "Barb", Second2 = "Smith",
                 First3 = NA, Second3 = NA,
                 First4 = NA, Second4 = NA)
df3 = data.frame(First1 = "Al3", Second1 = "Jones",
                 First2 = "Barbara", Second2 = "Smith",
                 First3 = "Carol", Second3 = "Adams",
                 First4 = NA, Second4 = NA)
df4 = data.frame(First1 = "Al", Second1 = "Jones2",
                 First2 = NA, Second2 = NA,
                 First3 = NA, Second3 = NA,
                 First4 = NA, Second4 = NA)
listFinal = list(df1, df2, df3, df4)
listFinal
#> [[1]]
#> First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1 Al Jones NA NA NA NA NA NA
#>
#> [[2]]
#> First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1 Al2 Jones Barb Smith NA NA NA NA
#>
#> [[3]]
#> First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1 Al3 Jones Barbara Smith Carol Adams NA NA
#>
#> [[4]]
#> First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1 Al Jones2 NA NA NA NA NA NA
myrename1 <- function( DF, m ) {
  # if a pair of columns is not present, raise an error
  stopifnot( 2 == length( DF ) )
  n <- nrow( DF )
  # use memory layout of elements of matrix
  # t() automatically converts to matrix (nrow=2)
  # matrix(,nrow=1) re-interprets the column-major output of t()
  # as a single row matrix
  result <- as.data.frame( matrix( t( DF ), nrow = 1 )
                         , stringsAsFactors = FALSE
                         )
  if ( n < m ) {
    result[ , seq( 2 * n + 1, 2 * m ) ] <- NA
  }
  setNames( result
          , sprintf( "%s%d"
                   , c( "First", "Second" )
                   , rep( seq.int( m ), each = 2 )
                   )
          )
}
m <- max( unlist( lapply( employees4List, nrow ) ) )
listFinal1 <- lapply( employees4List, myrename1, m = m )
listFinal1
#> [[1]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al Jones NA NA NA NA
#>
#> [[2]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al2 Jones Barb Smith NA NA
#>
#> [[3]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al3 Jones Barbara Smith Carol Adams
#>
#> [[4]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al Jones2 NA NA NA NA
result1 <- do.call( rbind, listFinal1 )
result1
#>   First1 Second1  First2 Second2 First3 Second3
#> 1     Al   Jones    <NA>    <NA>   <NA>    <NA>
#> 2    Al2   Jones    Barb   Smith   <NA>    <NA>
#> 3    Al3   Jones Barbara   Smith  Carol   Adams
#> 4     Al  Jones2    <NA>    <NA>   <NA>    <NA>
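The memory-layout comment in myrename1 can be checked in isolation. The sketch below is an illustration only; the data frame `DF` and the intermediate names `tm` and `row1` are assumptions made up for this example. It shows that t() yields a 2-row matrix and that re-reading its column-major storage as one row interleaves first and last names.

```r
DF <- data.frame(a = c("Al3", "Barbara", "Carol"),
                 b = c("Jones", "Smith", "Adams"),
                 stringsAsFactors = FALSE)

# t() on a data frame converts to a matrix first: 2 rows (a, b), 3 columns
tm <- t(DF)
dim(tm)

# R stores matrices column by column, so the storage order of tm is
# a1, b1, a2, b2, a3, b3; matrix(, nrow = 1) lays that out as one row
row1 <- matrix(tm, nrow = 1)
as.vector(row1)
```

This is why no explicit loop over rows is needed to build the First1, Second1, First2, ... ordering.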
myrename2 <- function( DF ) {
  # if a pair of columns is not present, raise an error
  stopifnot( 2 == length( DF ) )
  n <- nrow( DF )
  # use memory layout of elements of matrix
  # t() automatically converts to matrix (nrow=2)
  # matrix(,nrow=1) re-interprets the column-major output of t()
  # as a single row matrix
  setNames( as.data.frame( matrix( t( DF ), nrow = 1 )
                         , stringsAsFactors = FALSE
                         )
          , sprintf( "%s%d"
                   , c( "First", "Second" )
                   , rep( seq.int( n ), each = 2 )
                   )
          )
}
listFinal2 <- lapply( employees4List, myrename2 )
listFinal2
#> [[1]]
#> First1 Second1
#> 1 Al Jones
#>
#> [[2]]
#> First1 Second1 First2 Second2
#> 1 Al2 Jones Barb Smith
#>
#> [[3]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al3 Jones Barbara Smith Carol Adams
#>
#> [[4]]
#> First1 Second1
#> 1 Al Jones2
result2 <- dplyr::bind_rows( listFinal2 )
result2
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al Jones <NA> <NA> <NA> <NA>
#> 2 Al2 Jones Barb Smith <NA> <NA>
#> 3 Al3 Jones Barbara Smith Carol Adams
#> 4 Al Jones2 <NA> <NA> <NA> <NA>
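The column names in myrename1 and myrename2 depend on sprintf() recycling its shorter vector arguments. A small sketch of just that step, assuming two name pairs (n_pairs and new_names are illustrative names, not from the thread):

```r
# Assumed number of first/last name pairs, for illustration only
n_pairs <- 2L
# c( "First", "Second" ) is recycled against 1, 1, 2, 2
new_names <- sprintf( "%s%d"
                    , c( "First", "Second" )
                    , rep( seq.int( n_pairs ), each = 2 )
                    )
print( new_names )
#> [1] "First1"  "Second1" "First2"  "Second2"
```

Because the names depend only on the number of pairs, every data frame with the same row count gets the same names, which is what lets bind_rows line the columns up.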
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
myrename3 <- function( DF ) {
# if a pair of columns is not present, raise an error
stopifnot( 2 == length( DF ) )
names( DF ) <- c( "a", "b" )
m <- nrow( DF )
( DF
%>% mutate_all( as.character )
%>% mutate( rw = LETTERS[ seq.int( n() ) ] )
%>% gather( col, val, -rw )
%>% tidyr::unite( "labels", rw, col, sep="" )
%>% spread( labels, val )
%>% setNames( sprintf( "%s%d"
, c( "First", "Second" )
, rep( seq.int( m ), each = 2 )
)
)
)
}
listFinal3 <- lapply( employees4List, myrename3 )
listFinal3
#> [[1]]
#> First1 Second1
#> 1 Al Jones
#>
#> [[2]]
#> First1 Second1 First2 Second2
#> 1 Al2 Jones Barb Smith
#>
#> [[3]]
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al3 Jones Barbara Smith Carol Adams
#>
#> [[4]]
#> First1 Second1
#> 1 Al Jones2
result3 <- dplyr::bind_rows( listFinal3 )
result3
#> First1 Second1 First2 Second2 First3 Second3
#> 1 Al Jones <NA> <NA> <NA> <NA>
#> 2 Al2 Jones Barb Smith <NA> <NA>
#> 3 Al3 Jones Barbara Smith Carol Adams
#> 4 Al Jones2 <NA> <NA> <NA> <NA>
#' Created on 2018-06-30 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
#########
On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:
> I would like to thank everyone who helped me out. I have obtained some
> offline help, so I would like to summarize all the information I have
> received.
> Before I summarize the thread, there is one loose end.
> Initially I thought
> library(dplyr)
> dplyr::bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
> would work, but there were problems.
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> spreads out the data frames, converting them from long to wide,
> but it messes up the names. So one question I still have is: how can I
> programmatically change all of the names?
> After this initial step, the first data frame's names might be derived
> from
> c("George", "Washington")
> and the second data frame's names might be derived from
> c("John", "Adams", "Thomas", "Jefferson")
> What I want to change the names to:
> c("First1", "Second1")
> and
> c("First1", "Second1", "First2", "Second2")
> I believe that will enable me to then go back and use bind_rows and
> complete that method of solution:
> Step 1: lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> Step 2: Clean up the names
> Step 3: bind_rows
> Immediately below is hopefully a clear and precise statement of the problem
> and the proposed solution path. Then there are the various solutions.
> # Starting list of data frames
> employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
>                       data.frame(first2 = c("Al2", "Barb"),
>                                  second2 = c("Jones", "Smith")),
>                       data.frame(first3 = c("Al3", "Barbara", "Carol"),
>                                  second3 = c("Jones", "Smith", "Adams")),
>                       data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> employees4List
>
> # Intermediate step that messes up the names but successfully converts from
> # long to wide
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
>
> # The intermediate list should likely look like this listFinal
> df1 = data.frame(First1 = "Al", Second1 = "Jones",
>                  First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> df2 = data.frame(First1 = "Al2", Second1 = "Jones",
>                  First2 = "Barb", Second2 = "Smith",
>                  First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
>
> df3 = data.frame(First1 = "Al3", Second1 = "Jones",
>                  First2 = "Barbara", Second2 = "Smith",
>                  First3 = "Carol", Second3 = "Adams",
>                  First4 = NA, Second4 = NA)
> df4 = data.frame(First1 = "Al", Second1 = "Jones2",
>                  First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> listFinal = list(df1, df2, df3, df4)
> listFinal
>
> # Requested data frame (except that the columns are not just character but
> # some are factor or even logical)
> dplyr::bind_rows(listFinal)
> Sarah Goslee solved the problem using base R.
> Given
> employees4List = list(
>   data.frame(first1 = ("Al"), second1 = "Jones"),
>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones2", "Smith")),
>   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>              second3 = c("Jones3", "Smith", "Adams")),
>   data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> This function produces the solution in the requested structure.
> dfbycol <- function(x) {
>   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>   x <- do.call(rbind, x)
>   x <- data.frame(x, stringsAsFactors=FALSE)
>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>   x
> }
> dfbycol(employees4List)
> Offline, Jeff Newmiller and Bert Gunter provided alternative approaches to
> the problem as well as other advice. Their solutions meet the "tidy"
> criterion.
> Bert suggested this online.
> ## list of two data frames with different column names and numbers of rows:
> zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
>            two = data.frame(a = 5:9, b = letters[11:15]))
> ## create common column names and bind them up:
> do.call(rbind, lapply(zz, function(x) { names(x) <- c("first", "last"); x }))
> This and the next suggestion by Jeff produced useful solutions but not in
> the requested form.
> library(dplyr)
> library(tidyr)
> # note that these data frames all have character columns
> # rather than factors, due to the as.is option when the
> # data are read in.
> DF1 <- read.table( text =
> "First          Last
> George          Washington
> ", header=TRUE, as.is = TRUE )
> # dput looks ugly but is actually much more practical for
> # providing R data on the mailing list... here is an example
> dput( DF1 )
> #> structure(list(First = "George", Last = "Washington"), .Names = c("First",
> #> "Last"), class = "data.frame", row.names = c(NA, -1L))
>
> DF2 <- read.table( text =
> "Start              End
> John              Adams
> Thomas        Jefferson
> ", header = TRUE, as.is = TRUE )
>
> DFL <- list( DF1, DF2 )
>
> # DFNames is a set of unique identifiers
> DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
>                   , data = DFL
>                   )
>
> DFL2 <- (   DFL1
>         %>% mutate( data = lapply( data
>                                  , function( DF ) {
>                                      DF[[ ".PK" ]] <- seq.int( nrow( DF ))
>                                      gather( DF, ".Col", "value", -.PK )
>                                    }
>                                  )
>                   )
>         %>% unnest
>         %>% spread( .Col, value )
>         )
> DFL2
> During the discussion, useful links were recommended
> [1] https://www.jstatsoft.org/article/view/v059i10
>     Hadley on tidy data
> [2] http://r4ds.had.co.nz/relational-data.html#keys
>     Hadley on relational data
> [3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>     How to make a great reproducible example
> http://adv-r.had.co.nz/Functionals.html
>     Improving lapply and related skills
> Thanks again to everyone!
> Ira
>
> On Friday, June 29, 2018, 7:47:13 PM PDT, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> DFL2
> #> # A tibble: 3 x 6
> #>   .DFNames   .PK End       First  Last       Start
> #>   <chr>    <int> <chr>     <chr>  <chr>      <chr>
> #> 1 DF1          1 <NA>      George Washington <NA>
> #> 2 DF2          1 Adams     <NA>   <NA>       John
> #> 3 DF2          2 Jefferson <NA>   <NA>       Thomas
>
> #' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
> ----
>
> On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:
>
>>
>> Sarah and David,
>>
>> Thank you for your responses. I will try to be clearer.
>>
>> Base R solution: Sarah's method worked perfectly.
>>
>> Is there a dplyr solution?
>>
>> START: list of data frames
>>
>> FINISH: one data frame
>>
>> DETAILS: The initial list of data frames might have hundreds or a few
>> thousand data frames. Every data frame will have two columns. The first
>> column will represent first names. The second column will represent last
>> names. The column names are not consistent. Data frames will most likely
>> have from one to five rows.
>>
>> SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data
>> frames. Then somehow do an rbind even though the number of columns differs
>> from data frame to data frame.
>>
>> EXAMPLE: List with two data frames
>>
>> # DF1
>> First   Last
>> George  Washington
>>
>> # DF2
>> Start   End
>> John    Adams
>> Thomas  Jefferson
>>
>> # End result. One data frame
>> First1  Second1     First2  Second2
>> George  Washington  NA      NA
>> John    Adams       Thomas  Jefferson
>>
>> DISCUSSION: As mentioned, I posted something on Stack Overflow.
>> Unfortunately, my example was not general enough, and so the suggested
>> solutions worked on the easy case which I provided but not when the names
>> were different.
>>
>> The suggested solution was:
>>
>> library(dplyr)
>>
>> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>
>> On this site I pointed out that the inner function
>> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
>> correctly spread the multiple rows of each data frame into 1 by 2n data
>> frames. However, the column names were derived from the values and were a
>> mess. This caused a problem with bind_rows.
>>
>> I felt that if I knew how to change all the names of all of the data
>> frames that were created after lapply, then I could use bind_rows. So if
>> someone knows how to change all of the names at this intermediate stage, I
>> hope that person will provide the solution.
>>
>> In the end a 1 by 2 data frame would have names First1 Second1. A
>> 1 by 4 data frame would have names First1 Second1 First2 Second2.
>>
>> Ira
>>
>>
>>     On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>
>>> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> It isn't super clear to me what you're after.
>>
>> Agree.
>>
>> Had a different read of the request. Thought the request was for a
>> first step that "harmonized" the names of the columns and then used
>> `dplyr::bind_rows`:
>>
>> library(dplyr)
>> newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
>> bind_rows(newList)
>>
>> #---------
>>
>>    first1 second1
>> 1      Al   Jones
>> 2     Al2   Jones
>> 3    Barb   Smith
>> 4     Al3   Jones
>> 5 Barbara   Smith
>> 6   Carol   Adams
>> 7      Al  Jones2
>>
>> Might want to wrap suppressWarnings around the right side of that
>> assignment since there were many warnings regarding incongruent factor
>> levels.
>>
>> --
>> David.
>>> Is this what you intend?
>>>
>>>> dfbycol(employees4BList)
>>>   first1 last1 first2 last2 first3 last3
>>> 1     Al Jones   <NA>  <NA>   <NA>  <NA>
>>> 2     Al Jones   Barb Smith   <NA>  <NA>
>>> 3     Al Jones   Barb Smith  Carol Adams
>>> 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>>>>
>>>> dfbycol(employees4List)
>>>   first1  last1  first2 last2 first3 last3
>>> 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
>>> 2    Al2  Jones    Barb Smith   <NA>  <NA>
>>> 3    Al3  Jones Barbara Smith  Carol Adams
>>> 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
>>>
>>>
>>> If so:
>>>
>>> employees4BList = list(
>>> data.frame(first1 = "Al", second1 = "Jones"),
>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>>> "Smith", "Adams")),
>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>
>>> employees4List = list(
>>> data.frame(first1 = ("Al"), second1 = "Jones"),
>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>>> "Smith", "Adams")),
>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>
>>> ###
>>>
>>> dfbycol <- function(x) {
>>>   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>>>   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>>>   x <- do.call(rbind, x)
>>>   x <- data.frame(x, stringsAsFactors=FALSE)
>>>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>>>   x
>>> }
>>>
>>> ###
>>>
>>> dfbycol(employees4BList)
>>>
>>> dfbycol(employees4List)
>>> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
>>> <r-help at r-project.org> wrote:
>>>> I have a list of data frames which I would like to combine into one
>>>> data frame doing something like rbind. I wish to combine in column
>>>> order and not by names. However, there are issues.
>>>>
>>>> The number of columns is not the same for each data frame. This is an
>>>> intermediate step to a problem and the number of columns could be
>>>> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another
>>>> problem is that the names of the columns produced by the first step are
>>>> garbage.
>>>>
>>>> Below is a method that I obtained by asking a question on Stack
>>>> Overflow. Unfortunately, my example was not general enough. The code
>>>> below works for the simple case where the names of the people are
>>>> consistent. It does not work when the names are realistically not the
>>>> same.
>>>>
>>>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>>>>
>>>> Please note that the lapply step sets things up except for the column
>>>> name issue. If I could figure out a way to change the column names, then
>>>> the bind_rows step will, I believe, work.
>>>>
>>>> So I really have two questions. How to change all column names of all
>>>> the data frames and then how to solve the original problem.
>>>>
>>>> # The non general case works fine. It produces one data frame and I can
>>>> # then change the column names to
>>>>
>>>> # c("first1", "last1","first2", "last2","first3", "last3",)
>>>>
>>>> #Non general easy case
>>>>
>>>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>>>>
>>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>>>
>>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>>>> "Smith", "Adams")),
>>>>
>>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>>
>>>> employees4BList
>>>>
>>>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>>>>
>>>> # This produces a nice list of data frames, except for the names
>>>>
>>>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>>>>
>>>> # This list is a disaster. I am looking for a solution that works in
>>>> # this case.
>>>>
>>>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>>>>
>>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>>>
>>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>>>> "Smith", "Adams")),
>>>>
>>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>>
>>>> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>>>
>>>> Thanks.
>>>>
>>>> Ira
>>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> 'Any technology distinguishable from magic is insufficiently
>> advanced.'  -Gehm's Corollary to Clarke's Third Law
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------