Stefano Sofia
2022-Jun-03 12:43 UTC
[R] rbind of multiple data frames by column name, when each data frames can contain different columns
Thank you to all who provided useful hints, great as always.
In my opinion the solution given by Andrew is perfect, exactly what I wanted to
do without changing the format of my data frames:
df_list <- list(df1, df2, df3)
allNms <- unique(unlist(lapply(df_list, names)))
do.call(rbind, c(lapply(df_list, function(x) data.frame(x,
sapply(setdiff(allNms, names(x)), function(y) NA, simplify = FALSE))),
make.row.names=FALSE))
Now I encountered this final problem: when I load my data frames in R through
df1 <- read.table(file="/mypath/df1.csv", header = TRUE, sep="
", dec = ".", stringsAsFactors = TRUE)
df1$data_POSIX <- as.POSIXct(df1$data_POSIX, format = "%Y-%m-%d",
tz="Etc/GMT-1")
I get the following error:
Error in data.frame(x, sapply(setdiff(allNms, names(x)), function(y) NA, :
  arguments imply differing number of rows: 5, 0
Why? df1 is a data frame correctly filled in.
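One situation that can produce this message, offered as an assumption because the real files are not shown: a data frame that already contains every name in allNms. In that case setdiff() returns character(0), sapply() returns an empty list, and data.frame() is handed a 0-row argument next to the 5-row x. A minimal sketch that sidesteps this by adding missing columns one at a time follows; add_missing_cols and the toy data frames are hypothetical names invented for illustration, not part of the thread.

```r
# Hypothetical helper (not from the thread): add each missing column as NA,
# one at a time, so an empty set of missing names is simply a no-op.
add_missing_cols <- function(x, all_names) {
  for (nm in setdiff(all_names, names(x))) {
    x[[nm]] <- NA
  }
  x
}

# Toy data, invented for illustration: 'full' already has every column.
full    <- data.frame(id = 1:2, a = c(10, 20), b = c(0, 1))
partial <- data.frame(id = 3:4, a = c(30, 40))

df_list <- list(full, partial)
allNms  <- unique(unlist(lapply(df_list, names)))

# For 'full' there is nothing to add, so the set of missing names is empty.
missing_for_full <- setdiff(allNms, names(full))

# rbind() on data frames matches columns by name, so column order is not an issue.
combined <- do.call(rbind, c(lapply(df_list, add_missing_cols, all_names = allNms),
                             make.row.names = FALSE))
combined
```

If this is indeed the cause, checking length(setdiff(allNms, names(x))) for each element of the real df_list should show at least one zero.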
Thank you again
Stefano
(oo)
--oOO--( )--OOo--------------------------------------
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.sofia at regione.marche.it
---Oo---------oO----------------------------------------
________________________________
From: Andrew Simmons <akwsimmo at gmail.com>
Sent: Friday, 3 June 2022 11:06
To: Stefano Sofia
Subject: Re: [R] rbind of multiple data frames by column name, when each data
frames can contain different columns
I think I see the problem. I forgot that lapply doesn't assign names where
sapply does. I think you might want to use sapply instead, but also supply
the argument simplify = FALSE:
sapply(setdiff(allNms, names(x)), function(y) NA, simplify = FALSE)
It should produce something more like
$Station2_Hs
[1] NA
$Station2_Hn
[1] NA
$Station2_flag
[1] NA
which should combine nicely with data.frame(x,)
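The naming difference described above can be checked directly. This is a small sketch with a made-up two-row data frame x; only the three station names are taken from the thread.

```r
missing_names <- c("Station2_Hs", "Station2_Hn", "Station2_flag")

# lapply() returns an unnamed list when its input is an unnamed vector.
unnamed <- lapply(missing_names, function(y) NA)

# sapply() uses the character input as names (USE.NAMES = TRUE by default);
# simplify = FALSE keeps the result as a list instead of a named vector.
named <- sapply(missing_names, function(y) NA, simplify = FALSE)

names(unnamed)
names(named)

# Toy data frame, invented for illustration.
x <- data.frame(Station1_Hs = c(30, 40))

# The named list contributes columns under the intended names,
# and each length-1 NA is recycled to the number of rows of x.
padded <- data.frame(x, named)
names(padded)
```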
On Fri, Jun 3, 2022, 03:25 Stefano Sofia <stefano.sofia at regione.marche.it> wrote:
Good morning Andrew.
Thank you for your help.
Unfortunately your suggestion does not work, and I got different errors
depending on the use of the three small examples or real data frames.
If I apply your code to the three small examples I gave you (df1, df2 and df3) I
get:
Error in match.names(clabs, names(xi)) :
names do not match previous names
I tried to understand the origin of the problem.
lapply(setdiff(allNms, names(x)), function(y) NA)
gives
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] NA
[[4]]
[1] NA
[[5]]
[1] NA
[[6]]
[1] NA
[[7]]
[1] NA
[[8]]
[1] NA
[[9]]
[1] NA
[[10]]
[1] NA
and
data.frame(x, lapply(setdiff(allNms, names(x)), function(y) NA))
gives
x NA. NA..1 NA..2 NA..3 NA..4 NA..5 NA..6 NA..7 NA..8 NA..9
1 1 NA NA NA NA NA NA NA NA NA NA
2 1 NA NA NA NA NA NA NA NA NA NA
3 1 NA NA NA NA NA NA NA NA NA NA
4 2 NA NA NA NA NA NA NA NA NA NA
5 2 NA NA NA NA NA NA NA NA NA NA
6 3 NA NA NA NA NA NA NA NA NA NA
7 4 NA NA NA NA NA NA NA NA NA NA
8 4 NA NA NA NA NA NA NA NA NA NA
9 4 NA NA NA NA NA NA NA NA NA NA
10 5 NA NA NA NA NA NA NA NA NA NA
11 6 NA NA NA NA NA NA NA NA NA NA
12 6 NA NA NA NA NA NA NA NA NA NA
13 6 NA NA NA NA NA NA NA NA NA NA
Why?
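A likely reading of the two symptoms above, sketched on invented toy data (x1, x2 and the three-name allNms are assumptions for illustration): when the list passed to data.frame() has no names, data.frame() has to invent column names, so the padded data frames no longer share the names that rbind() needs to match.

```r
# Toy inputs, invented for illustration.
x1 <- data.frame(a = 1:2)
x2 <- data.frame(b = 3:4)
allNms <- c("a", "b", "c")

# lapply() gives unnamed lists, so the NA columns get made-up names.
padded1 <- data.frame(x1, lapply(setdiff(allNms, names(x1)), function(y) NA))
padded2 <- data.frame(x2, lapply(setdiff(allNms, names(x2)), function(y) NA))
names(padded1)
names(padded2)

# rbind() matches data frame columns by name; capture the failure message
# instead of stopping the script.
msg <- tryCatch(
  {
    rbind(padded1, padded2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```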
Unfortunately this code is too difficult for me.
Sorry for bothering you, thank you for what you have already done.
Stefano
________________________________
From: Andrew Simmons <akwsimmo at gmail.com>
Sent: Thursday, 2 June 2022 08:21
To: Stefano Sofia
Subject: Re: [R] rbind of multiple data frames by column name, when each data
frames can contain different columns
I would change this:
do.call(rbind, c(lapply(df_list, function(x) data.frame(c(x,
sapply(setdiff(allNms, names(x)), function(y) NA)))), make.row.names=FALSE))
to:
do.call(rbind, c(lapply(df_list, function(x) data.frame(x,
lapply(setdiff(allNms, names(x)), function(y) NA))), make.row.names=FALSE))
On Thu, Jun 2, 2022, 02:13 Stefano Sofia <stefano.sofia at regione.marche.it> wrote:
Dear R-list users,
for each winter season from 2000 to 2022 I have a data frame collecting for
different weather stations snowpack height (Hs), snowfall in the last 24h (Hn)
and a validation flag.
Suppose I have these three following data frames
df1 <- data.frame(
  data_POSIX = seq(as.POSIXct("2000-12-01", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   as.POSIXct("2000-12-05", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   by = "1 days"),
  Station1_Hs = c(30, 40, 50, NA, 55),
  Station1_Hn = c(10, 20, 10, NA, 5),
  Station1_flag = c(0, 0, 0, NA, 0),
  Station2_Hs = c(20, 20, 30, 30, 0),
  Station2_Hn = c(0, 0, 10, 0, 5),
  Station2_flag = c(0, 0, 0, 1, 0))
df2 <- data.frame(
  data_POSIX = seq(as.POSIXct("2001-12-01", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   as.POSIXct("2001-12-05", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   by = "1 days"),
  Station1_Hs = c(50, 60, 70, NA, NA),
  Station1_Hn = c(20, 20, 20, NA, NA),
  Station1_flag = c(0, 0, 0, NA, NA),
  Station3_Hs = c(20, 20, 30, 30, 0),
  Station3_Hn = c(0, 0, 10, 0, 5),
  Station3_flag = c(0, 0, 0, 1, 0))
df3 <- data.frame(
  data_POSIX = seq(as.POSIXct("2002-12-01", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   as.POSIXct("2002-12-05", format = "%Y-%m-%d", tz = "Etc/GMT-1"),
                   by = "1 days"),
  Station2_Hs = c(50, 60, 70, NA, NA),
  Station2_Hn = c(20, 20, 20, NA, NA),
  Station2_flag = c(0, 0, 0, NA, NA),
  Station3_Hs = c(20, 20, 30, 30, 0),
  Station3_Hn = c(0, 0, 10, 0, 5),
  Station3_flag = c(0, 0, 0, 1, 0))
As you can see, each data frame can have different stations loaded.
I would need to call rbind matching data frames by column name (i.e. by station
name), keeping in mind that the number of stations loaded in each data frame may
differ. The result should be
data_POSIX Station1_Hs Station1_Hn Station1_flag Station2_Hs Station2_Hn Station2_flag Station3_Hs Station3_Hn Station3_flag
2000-12-01          30          10             0          20           0             0          NA          NA            NA
2000-12-02          40          20             0          20           0             0          NA          NA            NA
2000-12-03          50          10             0          30          10             0          NA          NA            NA
2000-12-04          NA          NA            NA          30           0             1          NA          NA            NA
2000-12-05          55           5             0           0           5             0          NA          NA            NA
2001-12-01          50          20             0          NA          NA            NA          20           0             0
2001-12-02          60          20             0          NA          NA            NA          20           0             0
2001-12-03          70          20             0          NA          NA            NA          30          10             0
2001-12-04          NA          NA            NA          NA          NA            NA          30           0             1
2001-12-05          NA          NA            NA          NA          NA            NA           0           5             0
2002-12-01          NA          NA            NA          50          20             0          20           0             0
2002-12-02          NA          NA            NA          60          20             0          20           0             0
2002-12-03          NA          NA            NA          70          20             0          30          10             0
2002-12-04          NA          NA            NA          NA          NA            NA          30           0             1
2002-12-05          NA          NA            NA          NA          NA            NA           0           5             0
I tried this code
df_list <- list(df1, df2, df3)
allNms <- unique(unlist(lapply(df_list, names)))
do.call(rbind, c(lapply(df_list, function(x) data.frame(c(x,
sapply(setdiff(allNms, names(x)), function(y) NA)))), make.row.names=FALSE))
but I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names =
TRUE, :
arguments imply differing number of rows
Could someone please help me?
Thank you for your attention
Stefano