thr3ads.net - R help - [R] merge data frames with same column names of different lengths and missing values [Mar 2009]

If this information is useful, please help other people find it:
Share via:

Steven Lubitz

2009-Mar-07 05:02 UTC

[R] merge data frames with same column names of different lengths and missing values

Hello, I'm switching over from SAS to R and am having trouble merging data
frames. The data frames have several columns with the same name, and each has a
different number of rows. Some of the values are missing from cells with the
same column names in each data frame. I had hoped that when I merged the
dataframes, every column with the same name would be merged, with the value in a
complete cell overwriting the value in an empty cell from the other data frame.
I cannot seem to achieve this result, though I've tried several merge
adaptations:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)


merge(x,y,by="id") #I lose observations here (n=1 in this example),
and my items are duplicated - I do not want this result
  id item1.x item2.x item1.y item2.y
1  1      NA       1      NA      NA
2  2      NA      NA       2      NA
3  3       3      NA      NA       3
4  4       4       4       4       4
5  5       5       5       5       5


merge(x,y,by=c("id","item1","item2")) #again I
lose observations (n=4 here) and do not want this result
  id item1 item2
1  4     4     4
2  5     5     5


merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are duplicated and the NA values are retained - I instead want one row
per ID
  id item1 item2
1  1    NA     1
2  1    NA    NA
3  2     2    NA
4  2    NA    NA
5  3     3    NA
6  3    NA     3
7  4     4     4
8  5     5     5
9  6     6    NA

In reality I have multiple data frames with numerous columns, all with this
problem. I can do the merge seamlessly in SAS, but am trying to learn and stick
with R for my analyses. Any help would be greatly appreciated.

Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and
Massachusetts General Hospital

Dieter Menne

2009-Mar-07 08:56 UTC

head link

[R] merge data frames with same column names of different lengths and missing values

Steven Lubitz <slubitz1 <at> yahoo.com> writes:
> 
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
> 
....>
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are duplicated
and the NA values are> retained - I instead want one row per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
> I think you only got the wrong (too complex) function. Try rbind(x,y)

Dieter

Domenico Vistocco

2009-Mar-07 10:24 UTC

head link

[R] merge data frames with same column names of different lengths and missing values

Steven Lubitz wrote:> Hello, I'm switching over from SAS to R and am having trouble merging
data frames. The data frames have several columns with the same name, and each
has a different number of rows. Some of the values are missing from cells with
the same column names in each data frame. I had hoped that when I merged the
dataframes, every column with the same name would be merged, with the value in a
complete cell overwriting the value in an empty cell from the other data frame.
I cannot seem to achieve this result, though I've tried several merge
adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
>
> merge(x,y,by="id") #I lose observations here (n=1 in this
example), and my items are duplicated - I do not want this result
>   id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #again
I lose observations (n=4 here) and do not want this result
>   id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
>
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are duplicated and the NA values are retained - I instead want one row
per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>   You should obtain the desired solution using:
merge(y, x, by=c("id","item1","item2"), all=TRUE)

In database terminology all=TRUE corresponds to the full outer join, 
all.x to the left outer join and all.y to the right outer join.

Ciao,
domenico
> In reality I have multiple data frames with numerous columns, all with this
problem. I can do the merge seamlessly in SAS, but am trying to learn and stick
with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and
Massachusetts General Hospital
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

Jun Shen

2009-Mar-07 21:52 UTC

head link

[R] merge data frames with same column names of different lengths and missing values

Steve,

I don't know if R has such a function to perform the task you were asking. I
wrote one myself. Try the following to see if it works for you. The new
function "merge.new" has one additional argument col.ID, which is the
column
number of ID column. To use your x, y as examples, type:

merge.new(x,y,all=TRUE,col.ID=3)

#################################################

merge.new<-function(...,col.ID){
    inter<-merge(...)
    inter<-inter[order(inter[col.ID]),] #merged data sorted by ID

    #total columns and rows for the target dataframe
    total.row<-length(unique(inter[[col.ID]]))
    total.col<-dim(inter)[2]
    row.ID<-unique(inter[[col.ID]])
    target<-matrix(NA,total.row,total.col)
    target<-as.data.frame(target)
    names(target)<-names(inter)

    for (i in 1:total.row){
        inter.part<-inter[inter[col.ID]==row.ID[i],] #select all rows with
the same ID
        for (j in 1:total.col){
            if (is.na(inter.part[1,j])){
                if(is.na(inter.part[2,j])) {target[i,j]=NA}
                else {target[i,j]=inter.part[2,j]}
            }
            else {target[i,j]=inter.part[1,j]}

        }
    }
print(paste("total rows=",total.row))
print(paste("total columns=",total.col))
return(target)
}
#################################################
-- 
Jun Shen PhD
PK/PD Scientist
BioPharma Services
Millipore Corporation
15 Research Park Dr.
St Charles, MO 63304
Direct: 636-720-1589

On Fri, Mar 6, 2009 at 11:02 PM, Steven Lubitz <slubitz1@yahoo.com> wrote:
>
> Hello, I'm switching over from SAS to R and am having trouble merging
data
> frames. The data frames have several columns with the same name, and each
> has a different number of rows. Some of the values are missing from cells
> with the same column names in each data frame. I had hoped that when I
> merged the dataframes, every column with the same name would be merged,
with
> the value in a complete cell overwriting the value in an empty cell from
the
> other data frame. I cannot seem to achieve this result, though I've
tried
> several merge adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
>
> merge(x,y,by="id") #I lose observations here (n=1 in this
example), and my
> items are duplicated - I do not want this result
>  id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #again
I lose observations (n=4 here)
> and do not want this result
>  id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
>
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are
> duplicated and the NA values are retained - I instead want one row per ID
>  id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>
> In reality I have multiple data frames with numerous columns, all with this
> problem. I can do the merge seamlessly in SAS, but am trying to learn and
> stick with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and
> Massachusetts General Hospital
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Steven Lubitz

2009-Mar-07 22:05 UTC

head link

[R] merge data frames with same column names of different lengths and missing values

Subject: Re: [R] merge data frames with same column names of different lengths
and missing values
To: "Phil Spector" <spector@stat.berkeley.edu>
Date: Saturday, March 7, 2009, 5:01 PM

Phil,
Thank you - this is very helpful. However I realized that with my real data sets
(not the example I have here), I also have different numbers of columns in each
data frame. rbind doesn't seem to like this. Here's a modified example:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5),
item3=c(NA,2,NA,4,NA), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

rbind(x,y)
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Any ideas?
Thanks,
Steve


--- On Sat, 3/7/09, Phil Spector <spector@stat.berkeley.edu> wrote:
From: Phil Spector <spector@stat.berkeley.edu>
Subject: Re: [R] merge data frames with
 same column names of different lengths and missing values
To: "Steven Lubitz" <slubitz1@yahoo.com>
Date: Saturday, March 7, 2009, 1:56 AM

Steven -
  I believe this gives the output that you desire:
> xy = rbind(x,y)
>aggregate(subset(xy,select=-id),xy['id'],function(x)rev(x[!is.na(x)])[1])
   id item1 item2
1  1    NA     1
2  2     2    NA
3  3     3     3
4  4     4     4
5  5     5     5
6  6     6    NA

  But I think what merge x y; by id; would give you is
> aggregate(subset(xy,select=-id),xy['id'],function(x)x[length(x)])   id item1 item2
1  1    NA    NA
2  2     2    NA
3  3    NA     3
4  4     4     4
5  5     5     5
6  6     6    NA

                                        - Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					
 spector@stat.berkeley.edu





On Fri, 6 Mar 2009, Steven Lubitz wrote:
>
> Hello, I'm switching over from SAS to R and am having trouble mergingdata frames. The data frames have several columns with the same name, and each
has a different number of rows. Some of the values are missing from cells with
the same column names in each data frame. I had hoped that when I merged the
dataframes, every column with the same name would be merged, with the value in a
complete cell overwriting the value in an empty cell from the other data frame.
I cannot seem to achieve this result, though I've tried several merge
adaptations:>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA),
id=1:6)>
>
> merge(x,y,by="id") #I lose observations here (n=1 in thisexample), and my items are
 duplicated - I do not want this result>  id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #againI lose observations (n=4 here) and do not want this
result>  id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
>merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are duplicated and the NA values are retained - I instead want one row
per ID>  id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>
> In reality I have multiple data frames with numerous columns, all with
this problem. I can do the merge seamlessly in SAS, but am trying to learn and
stick with R for my analyses. Any help would be greatly
appreciated.>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and
Massachusetts General Hospital>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.
>


      


      
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more possibly parallel threads

R help - Mar 2009 - merge data frames with same column names of different lengths and missing values

[R] merge data frames with same column names of different lengths and missing values

[R] merge data frames with same column names of different lengths and missing values

[R] merge data frames with same column names of different lengths and missing values

[R] merge data frames with same column names of different lengths and missing values

[R] merge data frames with same column names of different lengths and missing values

Maybe Matching Threads