Dirk Eddelbuettel
2004-Apr-25 00:32 UTC
[Rd] Yahoo bug in tseries::get.hist.quote and its::priceIts
Both get.hist.quote, and its derivative priceIts, rely on download.file() to fetch financial data series from Yahoo! in .csv format. They allow for nice interactive demonstrations of what one can do with R.

Unfortunately, both are currently broken as Yahoo! decided to add a somewhat useless html comment at the end of the csv 'stream', breaking the regular format of n rows with k columns. Here is an example for the S&P500 index since the beginning of the month (to keep it compact):

Date,Open,High,Low,Close,Volume,Adj. Close*
23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->

Is there an _elegant and portable_ way of reading this with the last line? I needed this, and used the somewhat clunky

    data <- read.csv(destfile)
    unlink(destfile)
    data <- data[-(nlines-1),]     # skip very last line with comment

which uses nlines, which had already been computed (and has an offset of one because of the header line).
I'd be happy to send this as a patch to tseries and its, but I have the feeling we could do better. How?

Thanks, Dirk

-- 
The relationship between the computed price and reality is as yet unknown.
                                             -- From the pac(8) manual page
Douglas Bates
2004-Apr-25 00:50 UTC
[Rd] Yahoo bug in tseries::get.hist.quote and its::priceIts
Dirk Eddelbuettel <edd@debian.org> writes:

> Both get.hist.quote, and its derivative priceIts, rely on download.file() to
> fetch financial data series from Yahoo! in .csv format. They allow for nice
> interactive demonstrations of what one can do with R.
>
> Unfortunately, both are currently broken as Yahoo! decided to add a somewhat
> useless html comment at the end of the csv 'stream', breaking the regular
> format of n rows with k columns. Here is an example for the S&P500 index
> since the beginning of the month (to keep it compact):
>
> Date,Open,High,Low,Close,Volume,Adj. Close*
> 23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
> 22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
> 21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
> 20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
> 19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
> 16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
> 15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
> 14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
> 13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
> 12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
> 9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
> 8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
> 7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
> 6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
> 5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
> 2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
> 1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
> <!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->
>
> Is there an _elegant and portable_ way of reading this with the last line?

If you do not expect to encounter the "<" character in your data you could try adding comment.char = "<" to your call to read.csv.
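
The comment.char suggestion above can be tried without a network connection. The following is only a minimal sketch: the two data rows are an excerpt of the sample quoted in this thread, and the temporary file stands in for the destfile that download.file() would write (both are assumptions for illustration, not part of get.hist.quote or priceIts).

```r
# Excerpt of the Yahoo! output quoted in this thread, including the
# trailing HTML comment that breaks the regular csv layout.
csv_text <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

# Stand-in for the file written by download.file() (assumption).
destfile <- tempfile(fileext = ".csv")
writeLines(csv_text, destfile)

# Treat "<" as a comment character, so the trailing line is ignored.
# This is only safe if "<" never occurs in the data itself.
data <- read.csv(destfile, comment.char = "<")
unlink(destfile)

str(data)
```

With this approach no row count such as nlines is needed, since the comment line is never read as data.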
Peter Dalgaard
2004-Apr-25 01:19 UTC
[Rd] Yahoo bug in tseries::get.hist.quote and its::priceIts
Dirk Eddelbuettel <edd@debian.org> writes:

> Both get.hist.quote, and its derivative priceIts, rely on download.file() to
> fetch financial data series from Yahoo! in .csv format. They allow for nice
> interactive demonstrations of what one can do with R.

Er, how does this affect get.hist.quote? I see some flakiness, but the basic conversion appears to work:

> spc <- get.hist.quote(instrument = "spc", start = "1998-01-01")
trying URL `http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Error in download.file(url, destfile, method = method) :
        cannot open URL `http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
In addition: Warning message:
cannot open: HTTP status was `404 Not Found'
> spc <- get.hist.quote(instrument = "spc", start = "1998-01-01")
trying URL `http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Content type `application/octet-stream' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... ..
downloaded 72Kb

time series starts 1998-01-02
time series ends   2004-04-01

(Yes, that's the same URL, a few seconds later!)

> Unfortunately, both are currently broken as Yahoo! decided to add a somewhat
> useless html comment at the end of the csv 'stream', breaking the regular
> format of n rows with k columns. Here is an example for the S&P500 index
> since the beginning of the month (to keep it compact):
>
> Date,Open,High,Low,Close,Volume,Adj. Close*
> 23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
> 22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
> 21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
> 20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
> 19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
> 16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
> 15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
> 14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
> 13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
> 12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
> 9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
> 8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
> 7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
> 6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
> 5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
> 2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
> 1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
> <!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->
>
> Is there an _elegant and portable_ way of reading this with the last line?
> I needed this, and used the somewhat clunky
>
>     data <- read.csv(destfile)
>     unlink(destfile)
>     data <- data[-(nlines-1),]     # skip very last line with comment
>
> which uses nlines, which had already been computed (and has an offset of one
> because of the header line).

How about this?

> v <- readLines(url("http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=ibm&x=.csv"))
> x <- read.csv(textConnection(v[-grep("^<!",v)]))
> str(x)
`data.frame':   1586 obs. of  7 variables:
 $ Date       : Factor w/ 1586 levels "1-Apr-02","1-Ap..",..: 786 732 681 629 524 368 315 263 210 157 ...
 $ Open       : num  91.0 90.5 91.2 92.0 91.9 ...
 $ High       : num  91.6 91.5 91.4 92.5 92.3 ...
 $ Low        : num  90.4 89.7 90.7 90.7 91.7 ...
 $ Close      : num  91.3 90.7 91.3 90.7 91.9 ...
 $ Volume     : int  5063200 7988000 4623400 4260200 4159400 1111800 6844200 5316300 5013600 3112600 ...
 $ Adj..Close.: num  91.3 90.7 91.3 90.7 91.9 ...

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
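
One edge case of the readLines()/grep() idea above: if no line matches, grep() returns integer(0), and v[-integer(0)] selects nothing rather than everything. A hedged variant using grepl() avoids this. The helper name read_yahoo_lines is hypothetical, and the sample lines are an excerpt of the data quoted in this thread, so the sketch runs without contacting Yahoo!.

```r
# Hypothetical helper: drop lines that start with "<!" and parse the
# remainder as csv. grepl() returns a logical vector, so the filter
# keeps every line when no comment line is present.
read_yahoo_lines <- function(v) {
  keep <- !grepl("^<!", v)
  con <- textConnection(v[keep])
  on.exit(close(con))   # textConnection() opens the connection itself
  read.csv(con)
}

# Excerpt of the sample quoted in the thread (illustrative input).
v <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

x_with    <- read_yahoo_lines(v)       # trailing comment present
x_without <- read_yahoo_lines(v[1:3])  # comment already absent

str(x_with)
```

Both calls return the same two rows, so the filter behaves the same whether or not the trailing comment is still being sent.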