James Howison
2006-Aug-31 04:36 UTC
[R] Combine 'overlapping' dataframes, respecting row names
Hi,
I've examined the archives and found quite a few questions on
concatenating dataframes, but none that really addressed my issue,
I'm afraid. I've also examined the cbind and rbind documentation but
nonetheless here I am writing to r-help ;)
This is what I have (the row names are dates used for conversion to
an irregular time series with the its package):
> cvsFrame
           cvsactions
2002-11-15          4
2002-12-15          9
2003-01-15          5
2003-02-15          5
> downloadsFrame
           downloads
2002-09-15         1
2002-10-15         2
2002-11-15        12
2002-12-15         8
(notice how the dates are overlapping?)
The output I'd like is:
           cvsactions downloads
2002-09-15         NA         1
2002-10-15         NA         2
2002-11-15          4        12
2002-12-15          9         8
2003-01-15          5        NA
2003-02-15          5        NA
i.e., merge the data.frames, respecting the row.names and inserting NAs
where a frame didn't contain info for a row in the final frame.
This is the closest I've gotten (I'm sure cbind is doing what it's meant
to do, but it's obviously not what I need):
> cbind(downloadsFrame,cvsFrame)
           downloads cvsactions
2002-09-15         1          4
2002-10-15         2          9
2002-11-15        12          5
2002-12-15         8          5
It takes the row.names from the first frame given and then just adds
the data in rows 1 through 4, regardless of their row names. And it
doesn't work at all if the frames have different numbers of rows.
(Yes, it would be nice if the 'its' class had a way to merge 'its'
objects, but the question seemed general enough to ask on list.)
Thanks,
James
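
For anyone wanting to reproduce the behaviour described above, here is a minimal sketch. It assumes the two frames hold plain numeric columns with the dates stored as character row names (the post does not show how they were built). It shows that cbind() pairs rows by position and that it stops with an error when the row counts differ.

```r
# Rebuild the two frames from the post; the dates are stored as row names.
# (Construction is an assumption; the original post only shows printed output.)
cvsFrame <- data.frame(
  cvsactions = c(4, 9, 5, 5),
  row.names  = c("2002-11-15", "2002-12-15", "2003-01-15", "2003-02-15")
)
downloadsFrame <- data.frame(
  downloads = c(1, 2, 12, 8),
  row.names = c("2002-09-15", "2002-10-15", "2002-11-15", "2002-12-15")
)

# cbind() pairs rows by position and keeps the first frame's row names,
# so the cvsactions values end up against the wrong dates.
bound <- cbind(downloadsFrame, cvsFrame)
print(bound)

# With unequal row counts, cbind() on data frames stops with an error.
shorter <- cvsFrame[1:3, , drop = FALSE]
result <- tryCatch(
  cbind(downloadsFrame, shorter),
  error = function(e) "error"
)
print(result)
```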
Gabor Grothendieck
2006-Aug-31 04:51 UTC
[R] Combine 'overlapping' dataframes, respecting row names
If you are converting them to 'its' anyways then after the conversion to 'its' use the 'its' union command.
Prof Brian Ripley
2006-Aug-31 04:54 UTC
[R] Combine 'overlapping' dataframes, respecting row names
'merge' is the key here. You say you want to merge, but it seems you did not try merge().

> (res <- merge(cvsFrame, downloadsFrame, by="row.names", all=TRUE))
   Row.names cvsactions downloads
1 2002-11-15          4        12
2 2002-12-15          9         8
3 2003-01-15          5        NA
4 2003-02-15          5        NA
5 2002-09-15         NA         1
6 2002-10-15         NA         2

You can sort on Row.names later: say

res[order(as.character(res$Row.names)), ]

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
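
The merge() approach can be put together as a self-contained sketch. It assumes the frames are built as below (the thread only shows their printed form) and goes one step beyond the reply by moving the Row.names column back into the row names, so the result has the shape asked for in the original question. Sorting the keys as strings relies on the dates being in ISO yyyy-mm-dd form.

```r
# Rebuild the two frames from the thread; construction is an assumption.
cvsFrame <- data.frame(
  cvsactions = c(4, 9, 5, 5),
  row.names  = c("2002-11-15", "2002-12-15", "2003-01-15", "2003-02-15")
)
downloadsFrame <- data.frame(
  downloads = c(1, 2, 12, 8),
  row.names = c("2002-09-15", "2002-10-15", "2002-11-15", "2002-12-15")
)

# Full outer join on the row names; unmatched cells are filled with NA.
res <- merge(cvsFrame, downloadsFrame, by = "row.names", all = TRUE)

# merge() returns the keys in a "Row.names" column; move them back
# into the row names and drop the helper column.
rownames(res) <- as.character(res$Row.names)
res$Row.names <- NULL

# ISO-formatted dates sort chronologically when ordered as strings.
res <- res[order(rownames(res)), ]
print(res)
```

If real Date handling is needed later, the row names can be converted with as.Date(rownames(res)) before building the time series.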