All,

I am trying to read the S&P 500 constituents from the iShares website using the following code:

    URL <- "http://www.ishares.com/us/239726/fund-download.dl"
    setInternet2(TRUE)
    download.file(url=URL, destfile="temp.xls")
    out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings", header=TRUE, startRow=13)

R returns the following error:

    > out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings", header=TRUE, startRow=13)
    Error: IllegalArgumentException (Java): Your InputStream was neither an OLE2 stream, nor an OOXML stream
    In addition: Warning message:
    In download.file(url = URL, destfile = "temp.xls") :
      downloaded length 1938303 != reported length 200

Upon further examination this is because the format is really XML. Is there any way to get XLConnect or any other excel reader to read in an XML file? I thought XML was for new Excel format.

Barring that, can we read in the file using the XML package? I tried the following code...

    require(XML)
    tmp <- xmlParse(URL)

... but I get this error:

    Opening and ending tag mismatch: Style line 14 and Style
    Error: 1: Opening and ending tag mismatch: Style line 14 and Style

Thanks in advance for any help or hints,

Roger

***************************************************************
This message and any attachments are for the named person's use only. This message may contain confidential, proprietary or legally privileged information. No right to confidential or privileged treatment of this message is waived or lost by an error in transmission. If you have received this message in error, please immediately notify the sender by e-mail, delete the message, any attachments and all copies from your system and destroy any hard copies. You must not, directly or indirectly, use, disclose, distribute, print or copy any part of this message or any attachments if you are not the intended recipient.
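One way to check the suspicion that the download is "really XML" is to look at the first bytes of the file before choosing a reader. The sketch below is not from the thread: `sniff_format` is a hypothetical helper name, and it relies only on the usual file signatures (OLE2 files start with D0 CF 11 E0 A1 B1 1A E1, OOXML files are zip archives starting with "PK", and an XML document starts with "<", possibly after a byte-order mark or whitespace).

```r
# Hypothetical helper (not from the thread): guess a file's real format
# from its leading bytes instead of trusting the .xls extension.
sniff_format <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8L)
  ole2  <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  zip   <- as.raw(c(0x50, 0x4B, 0x03, 0x04))

  if (length(bytes) >= 8L && identical(bytes[1:8], ole2)) return("ole2")
  if (length(bytes) >= 4L && identical(bytes[1:4], zip)) return("ooxml")

  # Drop a UTF-8 byte-order mark if present, then ignore whitespace bytes
  # so that bytes[1] is the first non-blank byte that was read.
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) bytes <- bytes[-(1:3)]
  bytes <- bytes[!bytes %in% as.raw(c(0x09, 0x0A, 0x0D, 0x20))]
  if (length(bytes) > 0L && bytes[1] == as.raw(0x3C)) return("xml")  # 0x3C is "<"
  "unknown"
}
```

Calling `sniff_format("temp.xls")` on the downloaded file would then show which family of reader is worth trying. XLConnect's error message above names only the OLE2 and OOXML cases.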
Raghuraman Ramachandran
2015-Feb-27 19:04 UTC
[R] Reading in an XLS (really XML) file from website
This works. Change the destination directory to suit you.

    MyURL1 = "http://www.ishares.com/us/239726/fund-download.dl"
    # date1 was not defined in the original post; a date stamp is assumed here
    date1 <- format(Sys.Date(), "%Y%m%d")
    download.file(MyURL1, paste("C:/Data/Rtest1", date1, "r.xls", sep=""),
                  method="wget", quiet=TRUE, mode="wb",
                  extra="--header=\"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0\"")

Cheers
Raghu

On Fri, Feb 27, 2015 at 4:01 PM, Bos, Roger <roger.bos at rothschild.com> wrote:
> URL <- "http://www.ishares.com/us/239726/fund-download.dl"
> setInternet2(TRUE)
> download.file(url=URL, destfile="temp.xls")
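For readers without wget installed, a similar request can be sketched with the `headers` argument that `download.file()` gained in R 3.6.0. This is an alternative to the call above, not what was posted: `fetch_binary` is a hypothetical wrapper name, the default User-Agent string is simply copied from the wget example, and whether the server actually requires that header has not been verified here.

```r
# Sketch only: fetch a URL in binary mode while sending a browser-like
# User-Agent via the `headers` argument of download.file() (R >= 3.6.0).
# `fetch_binary` and its default `user_agent` are illustrative choices.
fetch_binary <- function(url, destfile,
                         user_agent = paste("Mozilla/5.0 (X11; Linux x86_64; rv:30.0)",
                                            "Gecko/20100101 Firefox/30.0")) {
  status <- download.file(url, destfile,
                          mode = "wb",     # binary mode: no line-ending translation
                          quiet = TRUE,
                          headers = c("User-Agent" = user_agent))
  # download.file() returns 0 on success.
  if (status != 0L) stop("download failed with status ", status)
  invisible(destfile)
}

# Example call (needs network access):
# fetch_binary("http://www.ishares.com/us/239726/fund-download.dl", "temp.xls")
```

The `mode = "wb"` setting matters mainly on Windows, where the default text mode can alter the bytes written to disk.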
On Fri, Feb 27, 2015 at 10:01 AM, Bos, Roger <roger.bos at rothschild.com> wrote:
> Barring that, can we read in the file using the XML package? I tried the
> following code...
>
> require(XML)
> tmp <- xmlParse(URL)
>
> ... but I get this error:
>
> Opening and ending tag mismatch: Style line 14 and Style
> Error: 1: Opening and ending tag mismatch: Style line 14 and Style

The problem is indeed on line 14 of the file. The contents of that line are:

    </style>

but should be

    </ss:style>

That is, the file is malformed. I edited the file to make that change and saved it. After I did this, I was able to open it as a spreadsheet using LibreOffice. I did all of this on my home Linux system. I don't have Windows, and thus no Excel either, available here, so I can't test with Excel.

You should be able to download this file as shown by Raghuraman. On Windows (which I _assume_ you are using since most do), you can edit the file using Notepad, or Wordpad. I would use Wordpad myself. Notepad is "iffy" on some things. Save it back, then try readWorksheetFromFile() as you originally did.

-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
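The manual edit described above could also be scripted in base R rather than done in a text editor. The following is a minimal sketch, not something posted in the thread: `fix_closing_tag` is a hypothetical helper, and the line number and tag strings are passed in by the caller because the exact spelling and case of the tag should be checked against the downloaded file first.

```r
# Hypothetical helper (not from the thread): replace a wrong closing tag on
# one line of a text file and write the result to a new file.
fix_closing_tag <- function(infile, outfile, line_no, bad, good) {
  lines <- readLines(infile, warn = FALSE)
  if (line_no > length(lines)) stop("file has fewer than ", line_no, " lines")
  # fixed = TRUE treats `bad` as a literal string, not a regular expression.
  if (!grepl(bad, lines[line_no], fixed = TRUE)) {
    stop("line ", line_no, " does not contain ", bad)
  }
  lines[line_no] <- sub(bad, good, lines[line_no], fixed = TRUE)
  writeLines(lines, outfile)
  invisible(outfile)
}

# Example call with the values reported in this message; adjust the case of
# the tag if your copy of the file differs:
# fix_closing_tag("temp.xls", "temp_fixed.xml", 14, "</style>", "</ss:style>")
```

Writing to a separate output file keeps the original download untouched. Since the XLConnect error quoted earlier in the thread says the stream was neither OLE2 nor OOXML, it is not certain that repairing the tag alone makes the file readable by readWorksheetFromFile(); parsing the repaired file with xmlParse() from the XML package is another option to try.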