The iShares website offers the S&P 500 stocks as a downloadable XLS file. It opens fine in Excel, but I am not able to open it in R because of what seems to be invalid XML formatting. I tried XLConnect and XML, as shown below. Does anyone know a workaround, or can anyone point out what I am doing wrong? Here is my reproducible code:

library(XLConnect)  # provides readWorksheetFromFile()
temp <- "https://www.ishares.com/us/239726/fund-download.dl"
fname <- "ivv.xls"
download.file(url = temp, destfile = fname)
readWorksheetFromFile(fname)
library(XML)
xmlfile <- xmlTreeParse(fname)

The resulting errors:

09:06:17 > readWorksheetFromFile(fname)
Error: InvalidFormatException (Java): Your InputStream was neither an OLE2 stream, nor an OOXML stream
09:06:17 > library(XML)
09:06:25 > xmlfile <- xmlTreeParse(fname)
Opening and ending tag mismatch: Style line 14 and Style
Error: 1: Opening and ending tag mismatch: Style line 14 and Style

Thanks in advance,
Roger
Jeff Newmiller
2016-Jul-28 14:33 UTC
[R] problems reading XML type file from ishares website
XLS has nothing to do with XML. The shift from XLS to XLSX/XLSM formats was where XML was introduced. You might occasionally find mislabelled files that seem to work anyway, but there is a significant difference inside true XLS files.

Use a package designed to handle your data format. There are a few, and most seem to require external software support (e.g. Perl or Java or Windows OS), so you have to decide what overhead support headaches you can tolerate.
Jeff Newmiller
2016-Jul-28 18:20 UTC
[R] problems reading XML type file from ishares website
Please keep the list included in the thread (e.g. reply-all?).

I looked at the file and agree that it looks like XML with a UTF-8 byte order mark and Unix line endings, which means it is not XLS and it is not XLSX (which is a zipped directory of XML files with DOS line endings). Excel complains but manages to open the file if it has the XLS extension, but I am not aware that any of the usual R Excel packages will understand this file.

The byte order mark can be addressed by opening the file with encoding="UTF-8-BOM", but as you mentioned originally the XML structure is still broken (cf. the error message about the Style ending tag). Line 16 seems to use /Style rather than /ss:Style. Maybe

library(XML)
con <- file( fname, encoding="UTF-8-BOM" )  # set the encoding on the connection so the BOM is dropped
txt <- readLines( con )
close( con )
txt <- sub( "</Style>", "</ss:Style>", txt )
fnamenobom <- "nobom.xml"
xmlfile <- xmlTreeParse( "nobom.xml" )

On July 28, 2016 8:26:44 AM PDT, "Bos, Roger" <roger.bos at rothschild.com> wrote:
> Jeff,
>
> Thanks for your suggestions. I mentioned XLS because that is the extension the ishares website provides. I have tried many packages such as xml, xml2, XLConnect, and readxl. I am not even sure what data format the file is, but it looks to me like XML and the extension is XLS. If you have the names of specific packages you think I should try, that would be very helpful.
>
> Thanks,
>
> Roger
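The distinction drawn above (true XLS, zipped XLSX, or plain XML text with a byte order mark) can be checked from the first few bytes of a file, whatever its extension says. The following is a minimal sketch and not part of the original exchange: `sniff_format` is a hypothetical helper name, and the check is only a heuristic based on well-known file signatures (the OLE2 compound-file header, the ZIP local-file header, and the UTF-8 byte order mark).

```r
# Classify a file by its leading bytes. This is a heuristic, not a validator:
# it says what kind of container the file claims to be, not whether it is intact.
sniff_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 8)

  # TRUE when the file begins with the given raw signature
  starts_with <- function(sig) {
    length(magic) >= length(sig) && all(magic[seq_along(sig)] == sig)
  }

  if (starts_with(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)))) {
    "OLE2 container (true XLS)"
  } else if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) {
    "ZIP archive (XLSX/XLSM)"
  } else if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "text with UTF-8 byte order mark"
  } else if (starts_with(charToRaw("<?xml"))) {
    "XML text"
  } else {
    "unknown"
  }
}
```

Calling `sniff_format("ivv.xls")` on the downloaded file would be one way to confirm what kind of file it is before choosing a package to read it with.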
Jeff Newmiller
2016-Jul-28 18:55 UTC
[R] problems reading XML type file from ishares website
Er, I failed to include the step to write the repaired data to a file...

fnamenobom <- "nobom.xml"
cat( paste( txt, collapse="\n" ), file=fnamenobom )
xmlfile <- xmlTreeParse( fnamenobom )
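For reference, the steps discussed in this thread (drop the byte order mark, patch the closing Style tag, write the result out, then parse it) can be collected into one sketch. This is an illustration under stated assumptions, not a tested fix for the iShares file: the helper names `read_without_bom`, `fix_style_tags` and `repair_file` are hypothetical, and the code assumes the only structural problem is the un-prefixed `</Style>` closing tag reported above. The encoding is set on a `file()` connection because the `encoding` argument of `readLines()` only marks strings and does not remove a byte order mark.

```r
# Read a text file through a connection that drops a leading UTF-8 byte order mark.
read_without_bom <- function(path) {
  con <- file(path, open = "r", encoding = "UTF-8-BOM")
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Replace every un-prefixed closing Style tag with the namespace-prefixed form.
# Assumption: this is the only mismatch in the file.
fix_style_tags <- function(lines) {
  gsub("</Style>", "</ss:Style>", lines, fixed = TRUE)
}

# Read, patch, and write the repaired text; returns the output path invisibly.
repair_file <- function(infile, outfile) {
  lines <- fix_style_tags(read_without_bom(infile))
  writeLines(lines, outfile)
  invisible(outfile)
}

# Usage with the file names from this thread; runs only if the download exists.
if (file.exists("ivv.xls")) {
  repaired <- repair_file("ivv.xls", "nobom.xml")
  if (requireNamespace("XML", quietly = TRUE)) {
    xmlfile <- XML::xmlTreeParse(repaired)
  }
}
```

Using `gsub()` rather than `sub()` patches every occurrence on a line instead of only the first, and `fixed = TRUE` treats the pattern as a literal string, so applying the repair twice leaves the text unchanged.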