Hi, I have a history dataset, a matrix with about 1590 obs, and 242 cols, and I need to update this matrix with an 'update' matrix that has about 30 rows, and roughly similar number of columns as the history ds (but not necessarily equal). The update dataset is read from an Excel ODBC connection. When I try and merge these datasets, I get counter-intuitive results. library(RODBC) chn <- odbcConnectExcel(UpdateFile) sqlTables(chn) UpdateData <- sqlFetch(chn,"MCap243") colnames(UpdateData) <- gsub("#",".",colnames(UpdateData)) close(chn) # specify just how many rows we need from the Update file. We'd only read five # rows at a time. UpdateRows = 20 UpdateData <- UpdateData[1:UpdateRows,] # Delete Unwanted stocks. UpdateData <- UpdateData[,!names(UpdateData) %in% ToBeDeleted] x <- tail(UpdateData[c("Date","ABAN.BO")],n=50) print(x) Gives x as: Date ABAN.BO 1 2007-04-30 96448.40 2 2007-05-01 96448.40 3 2007-05-02 96448.40 4 2007-05-03 96300.44 5 2007-05-04 93718.52 6 2007-05-05 93718.52 7 2007-05-06 93718.52 8 2007-05-07 92743.82 9 2007-05-08 90374.60 10 2007-05-09 89126.18 11 2007-05-10 87082.47 12 2007-05-11 85493.73 13 2007-05-12 85493.73 14 2007-05-13 85493.73 15 2007-05-14 85033.21 16 2007-05-15 89209.41 17 2007-05-16 89089.19 18 2007-05-17 90472.62 19 2007-05-18 90326.51 20 2007-05-19 90326.51 But when I merge this file with the history dataset, I get the dates misaligned by one row. whistory <- merge(rhistory,UpdateData,by.x=rhistory["Date"],all=TRUE) tail(WHist4[c("Date","ABAN.BO")],n=30) Date ABAN.BO 1581 2007-04-19 83632.60 1582 2007-04-20 85942.00 1583 2007-04-23 88244.00 1584 2007-04-24 90309.50 1585 2007-04-25 92048.00 1586 2007-04-26 92051.70 1587 2007-04-27 95863.10 1588 2007-04-29 96448.40 1589 2007-04-30 96448.40 1590 2007-04-30 96343.40 1591 2007-05-01 96448.40 1592 2007-05-02 96300.44 1593 2007-05-03 93718.52 1594 2007-05-03 96195.60 1595 2007-05-04 93718.52 1596 2007-05-04 93616.50 1597 2007-05-05 93718.52 1598 2007-05-06 92743.82 1599 2007-05-07 90374.60 1600 2007-05-08 89126.18 1601 2007-05-09 87082.47 1602 2007-05-10 85493.73 1603 2007-05-11 85493.73 1604 2007-05-12 85493.73 1605 2007-05-13 85033.21 1606 2007-05-14 89209.41 1607 2007-05-15 89089.19 1608 2007-05-16 90472.62 1609 2007-05-17 90326.51 1610 2007-05-18 90326.51 Any reasons why the dates are shifted by one date? Am I missing some parameters in the merge statement? TIA and best, -Tir
Take a look at the help for merge(): in all the examples by.x is a character string, not a one-column data frame which is what rhistory["Date"] would appear to be. Please note the trailer of this messsage. On Tue, 5 Jun 2007, Patnaik, Tirthankar wrote:> Hi, > I have a history dataset, a matrix with about 1590 obs, and 242 cols, > and I need to update this matrix with an 'update' matrix that has about > 30 rows, and roughly similar number of columns as the history ds (but > not necessarily equal). The update dataset is read from an Excel ODBC > connection. When I try and merge these datasets, I get counter-intuitive > results. > > library(RODBC) > chn <- odbcConnectExcel(UpdateFile) > sqlTables(chn) > UpdateData <- sqlFetch(chn,"MCap243") > colnames(UpdateData) <- gsub("#",".",colnames(UpdateData)) > close(chn) > # specify just how many rows we need from the Update file. We'd only > read five > # rows at a time. > UpdateRows = 20 > UpdateData <- UpdateData[1:UpdateRows,] > > # Delete Unwanted stocks. > UpdateData <- UpdateData[,!names(UpdateData) %in% ToBeDeleted] > x <- tail(UpdateData[c("Date","ABAN.BO")],n=50) > print(x) > > > Gives x as: > > Date ABAN.BO > 1 2007-04-30 96448.40 > 2 2007-05-01 96448.40 > 3 2007-05-02 96448.40 > 4 2007-05-03 96300.44 > 5 2007-05-04 93718.52 > 6 2007-05-05 93718.52 > 7 2007-05-06 93718.52 > 8 2007-05-07 92743.82 > 9 2007-05-08 90374.60 > 10 2007-05-09 89126.18 > 11 2007-05-10 87082.47 > 12 2007-05-11 85493.73 > 13 2007-05-12 85493.73 > 14 2007-05-13 85493.73 > 15 2007-05-14 85033.21 > 16 2007-05-15 89209.41 > 17 2007-05-16 89089.19 > 18 2007-05-17 90472.62 > 19 2007-05-18 90326.51 > 20 2007-05-19 90326.51 > > But when I merge this file with the history dataset, I get the dates > misaligned by one row. > > whistory <- merge(rhistory,UpdateData,by.x=rhistory["Date"],all=TRUE) > > tail(WHist4[c("Date","ABAN.BO")],n=30) > > Date ABAN.BO > 1581 2007-04-19 83632.60 > 1582 2007-04-20 85942.00 > 1583 2007-04-23 88244.00 > 1584 2007-04-24 90309.50 > 1585 2007-04-25 92048.00 > 1586 2007-04-26 92051.70 > 1587 2007-04-27 95863.10 > 1588 2007-04-29 96448.40 > 1589 2007-04-30 96448.40 > 1590 2007-04-30 96343.40 > 1591 2007-05-01 96448.40 > 1592 2007-05-02 96300.44 > 1593 2007-05-03 93718.52 > 1594 2007-05-03 96195.60 > 1595 2007-05-04 93718.52 > 1596 2007-05-04 93616.50 > 1597 2007-05-05 93718.52 > 1598 2007-05-06 92743.82 > 1599 2007-05-07 90374.60 > 1600 2007-05-08 89126.18 > 1601 2007-05-09 87082.47 > 1602 2007-05-10 85493.73 > 1603 2007-05-11 85493.73 > 1604 2007-05-12 85493.73 > 1605 2007-05-13 85033.21 > 1606 2007-05-14 89209.41 > 1607 2007-05-15 89089.19 > 1608 2007-05-16 90472.62 > 1609 2007-05-17 90326.51 > 1610 2007-05-18 90326.51 > > Any reasons why the dates are shifted by one date? Am I missing some > parameters in the merge statement? > > TIA and best, > -Tir > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Prof. Ripley, Date is a column in a matrix, which should be ok for the merge. Actually the IMHO problem was with the default sorting option, I set sort=FALSE, and the merge happened as it should. Best, -Tir Tirthankar Patnaik Analyst, India Strategy Citigroup Investment Research +91-22-6631 9887 P.S. Agree with you on the trailer of the message--I had to post the message in a hurry, though!> -----Original Message----- > From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] > Sent: Tuesday, June 05, 2007 6:08 PM > To: Patnaik, Tirthankar [GWM-CIR] > Cc: r-help at stat.math.ethz.ch > Subject: Re: [R] Problems with Merge > > Take a look at the help for merge(): in all the examples by.x > is a character string, not a one-column data frame which is > what rhistory["Date"] would appear to be. > > Please note the trailer of this messsage. > > On Tue, 5 Jun 2007, Patnaik, Tirthankar wrote: > > > Hi, > > I have a history dataset, a matrix with about 1590 obs, and > 242 cols, > > and I need to update this matrix with an 'update' matrix that has > > about 30 rows, and roughly similar number of columns as the > history ds > > (but not necessarily equal). The update dataset is read > from an Excel > > ODBC connection. When I try and merge these datasets, I get > > counter-intuitive results. > > > > library(RODBC) > > chn <- odbcConnectExcel(UpdateFile) > > sqlTables(chn) > > UpdateData <- sqlFetch(chn,"MCap243") > > colnames(UpdateData) <- gsub("#",".",colnames(UpdateData)) > > close(chn) > > # specify just how many rows we need from the Update > file. We'd only > > read five > > # rows at a time. > > UpdateRows = 20 > > UpdateData <- UpdateData[1:UpdateRows,] > > > > # Delete Unwanted stocks. > > UpdateData <- UpdateData[,!names(UpdateData) %in% ToBeDeleted] > > x <- tail(UpdateData[c("Date","ABAN.BO")],n=50) > > print(x) > > > > > > Gives x as: > > > > Date ABAN.BO > > 1 2007-04-30 96448.40 > > 2 2007-05-01 96448.40 > > 3 2007-05-02 96448.40 > > 4 2007-05-03 96300.44 > > 5 2007-05-04 93718.52 > > 6 2007-05-05 93718.52 > > 7 2007-05-06 93718.52 > > 8 2007-05-07 92743.82 > > 9 2007-05-08 90374.60 > > 10 2007-05-09 89126.18 > > 11 2007-05-10 87082.47 > > 12 2007-05-11 85493.73 > > 13 2007-05-12 85493.73 > > 14 2007-05-13 85493.73 > > 15 2007-05-14 85033.21 > > 16 2007-05-15 89209.41 > > 17 2007-05-16 89089.19 > > 18 2007-05-17 90472.62 > > 19 2007-05-18 90326.51 > > 20 2007-05-19 90326.51 > > > > But when I merge this file with the history dataset, I get > the dates > > misaligned by one row. > > > > whistory <- > merge(rhistory,UpdateData,by.x=rhistory["Date"],all=TRUE) > > > > tail(WHist4[c("Date","ABAN.BO")],n=30) > > > > Date ABAN.BO > > 1581 2007-04-19 83632.60 > > 1582 2007-04-20 85942.00 > > 1583 2007-04-23 88244.00 > > 1584 2007-04-24 90309.50 > > 1585 2007-04-25 92048.00 > > 1586 2007-04-26 92051.70 > > 1587 2007-04-27 95863.10 > > 1588 2007-04-29 96448.40 > > 1589 2007-04-30 96448.40 > > 1590 2007-04-30 96343.40 > > 1591 2007-05-01 96448.40 > > 1592 2007-05-02 96300.44 > > 1593 2007-05-03 93718.52 > > 1594 2007-05-03 96195.60 > > 1595 2007-05-04 93718.52 > > 1596 2007-05-04 93616.50 > > 1597 2007-05-05 93718.52 > > 1598 2007-05-06 92743.82 > > 1599 2007-05-07 90374.60 > > 1600 2007-05-08 89126.18 > > 1601 2007-05-09 87082.47 > > 1602 2007-05-10 85493.73 > > 1603 2007-05-11 85493.73 > > 1604 2007-05-12 85493.73 > > 1605 2007-05-13 85033.21 > > 1606 2007-05-14 89209.41 > > 1607 2007-05-15 89089.19 > > 1608 2007-05-16 90472.62 > > 1609 2007-05-17 90326.51 > > 1610 2007-05-18 90326.51 > > > > Any reasons why the dates are shifted by one date? Am I > missing some > > parameters in the merge statement? > > > > TIA and best, > > -Tir > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 >