James Howison
2006-Aug-31 04:36 UTC
[R] Combine 'overlapping' dataframes, respecting row names
Hi,

I've examined the archives and found quite a few questions on concatenating data frames, but none that really addressed my issue, I'm afraid. I've also examined the cbind and rbind documentation, but nonetheless here I am writing to r-help ;)

This is what I have (the row names are dates used for conversion to an irregular time series with the its package):

> cvsFrame
           cvsactions
2002-11-15          4
2002-12-15          9
2003-01-15          5
2003-02-15          5

> downloadsFrame
           downloads
2002-09-15         1
2002-10-15         2
2002-11-15        12
2002-12-15         8

(Notice how the dates overlap?)

The output I'd like is:

           cvsactions downloads
2002-09-15         NA         1
2002-10-15         NA         2
2002-11-15          4        12
2002-12-15          9         8
2003-01-15          5        NA
2003-02-15          5        NA

i.e. merge the data frames, respecting the row names and inserting NAs where a frame has no entry for a row in the final frame.

This is the closest I've gotten (I'm sure cbind is doing what it's meant to do, but it's obviously not what I need):

> cbind(downloadsFrame,cvsFrame)
           downloads cvsactions
2002-09-15         1          4
2002-10-15         2          9
2002-11-15        12          5
2002-12-15         8          5

It takes the row names from the first frame given and then just adds the data in rows 1 through 4, regardless of their row names. And it doesn't work at all if the column lengths are different.

(Yes, it would be nice if the 'its' class had a way to merge 'its' objects, but the question seemed general enough to ask on list.)

Thanks,
James
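For readers who want to reproduce the behaviour described in the question, here is a minimal sketch that rebuilds the two frames and repeats the cbind call. The post shows only the printed frames, so the numeric column type and the construction via data.frame(row.names = ...) are assumptions; the variable names combined and shared are introduced here purely for illustration.

```r
# Rebuild the two frames from the printed output in the question.
# (Assumption: plain numeric columns with the dates as character row names.)
cvsFrame <- data.frame(
  cvsactions = c(4, 9, 5, 5),
  row.names  = c("2002-11-15", "2002-12-15", "2003-01-15", "2003-02-15")
)
downloadsFrame <- data.frame(
  downloads = c(1, 2, 12, 8),
  row.names = c("2002-09-15", "2002-10-15", "2002-11-15", "2002-12-15")
)

# cbind() pairs rows by position and keeps the row names of the first frame,
# so values end up next to dates they do not belong to.
combined <- cbind(downloadsFrame, cvsFrame)
print(combined)

# Only these dates actually occur in both frames.
shared <- intersect(rownames(cvsFrame), rownames(downloadsFrame))
print(shared)
```

Comparing shared with the row names of combined makes the mismatch visible: four rows are paired up even though only two dates are common to both frames.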
Gabor Grothendieck
2006-Aug-31 04:51 UTC
[R] Combine 'overlapping' dataframes, respecting row names
If you are converting them to 'its' anyway, then after the conversion to 'its' use the 'its' union command.
Prof Brian Ripley
2006-Aug-31 04:54 UTC
[R] Combine 'overlapping' dataframes, respecting row names
'merge' is the key here. You say you want to merge, but it seems you did not try merge():

> (res <- merge(cvsFrame, downloadsFrame, by="row.names", all=TRUE))
   Row.names cvsactions downloads
1 2002-11-15          4        12
2 2002-12-15          9         8
3 2003-01-15          5        NA
4 2003-02-15          5        NA
5 2002-09-15         NA         1
6 2002-10-15         NA         2

You can sort on Row.names later, say:

res[order(as.character(res$Row.names)), ]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
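The merge() approach above can be sketched end to end as follows. The frames are rebuilt from the printed output in the question (their construction is an assumption), and the last step, which moves the Row.names column back into the row names so the result matches the layout requested in the question, is an addition that is not part of the reply itself.

```r
# Rebuild the two frames from the printed output in the question.
# (Assumption: plain numeric columns with the dates as character row names.)
cvsFrame <- data.frame(
  cvsactions = c(4, 9, 5, 5),
  row.names  = c("2002-11-15", "2002-12-15", "2003-01-15", "2003-02-15")
)
downloadsFrame <- data.frame(
  downloads = c(1, 2, 12, 8),
  row.names = c("2002-09-15", "2002-10-15", "2002-11-15", "2002-12-15")
)

# Outer join on the row names: all = TRUE keeps rows found in either frame
# and fills the missing cells with NA.
res <- merge(cvsFrame, downloadsFrame, by = "row.names", all = TRUE)

# Sort by date. Dates written as YYYY-MM-DD sort correctly as strings.
res <- res[order(as.character(res$Row.names)), ]

# Move the key column back into the row names (not part of the reply).
rownames(res) <- as.character(res$Row.names)
res$Row.names <- NULL

print(res)
```

Sorting explicitly keeps the example independent of whatever row order merge() happens to return. If the dates were stored in a format that does not sort lexically, converting them with as.Date() before ordering would be the safer choice.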