Hi Dennis
That those files are in a directory/folder suggests that they were extracted
from their zip (.xlsx) file. The following are the basic contents of the
.xlsx file:
1484 02-28-11 12:48 [Content_Types].xml
733 02-28-11 12:48 _rels/.rels
972 02-28-11 12:48 xl/_rels/workbook.xml.rels
846 02-28-11 12:48 xl/workbook.xml
940 02-28-11 12:48 xl/styles.xml
1402 02-28-11 12:48 xl/worksheets/sheet2.xml
7562 02-28-11 12:48 xl/theme/theme1.xml
1888 02-28-11 12:48 xl/worksheets/sheet1.xml
470 02-28-11 12:48 xl/sharedStrings.xml
196 02-28-11 12:48 xl/calcChain.xml
21316 02-28-11 12:48 docProps/thumbnail.jpeg
629 02-28-11 12:48 docProps/core.xml
828 02-28-11 12:48 docProps/app.xml
If most of these are present, I would explore whether the sender could give
them to you without unzipping them, or make sure that your software isn't
automatically unzipping them for you.
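
A quick way to check is to compare the file names you have against the core
entries in the listing above. This is only a rough sketch: has_xlsx_parts and
the choice of "core" parts are my own illustration, not something from any
package, and the folder and file names in the usage comments are hypothetical.

```r
# Core parts one would expect from an extracted .xlsx
# (an assumption based on the listing above, not an exhaustive rule).
core_parts <- c("[Content_Types].xml",
                "_rels/.rels",
                "xl/workbook.xml",
                "xl/_rels/workbook.xml.rels")

# Given a character vector of relative paths, report which core parts exist.
has_xlsx_parts <- function(paths) {
  paths <- gsub("\\\\", "/", paths)   # normalise Windows path separators
  structure(core_parts %in% paths, names = core_parts)
}

# On a folder (hypothetical name); all.files = TRUE is needed because
# "_rels/.rels" starts with a dot and is otherwise skipped:
# has_xlsx_parts(list.files("received", recursive = TRUE, all.files = TRUE))

# On an intact archive (hypothetical name):
# has_xlsx_parts(unzip("book.xlsx", list = TRUE)$Name)
```

If all four come back TRUE for the folder, the files were almost certainly
unzipped somewhere along the way.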
Note that not all files in the .xlsx are sheets; the worksheet is the
basic entity that corresponds to a .csv file.
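
To pick out just the worksheet parts from a list of paths, something like the
following might do. It assumes the sheets live under xl/worksheets/ as in the
listing above (strictly, xl/_rels/workbook.xml.rels says where they are), and
worksheet_parts is a name I made up for this sketch.

```r
# Keep only the paths that look like worksheet parts.
# Assumes the conventional xl/worksheets/<name>.xml layout.
worksheet_parts <- function(paths) {
  grep("^xl/worksheets/[^/]+\\.xml$", paths, value = TRUE)
}

# Paths taken from the listing above.
paths <- c("[Content_Types].xml", "xl/workbook.xml", "xl/styles.xml",
           "xl/worksheets/sheet2.xml", "xl/theme/theme1.xml",
           "xl/worksheets/sheet1.xml", "xl/sharedStrings.xml")

sheets <- worksheet_parts(paths)
```

Each element of sheets is then one candidate for conversion to a .csv file.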
The xlsx package and my RExcelXML package will probably get you a fair bit
of the way in extracting the content, but they will probably need some
tinkering since they expect the different components to be in a zip archive.
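
One form of tinkering is simply to put the pieces back into a zip archive
before handing it to those packages. A minimal sketch, assuming an external
zip program is on the path (utils::zip() relies on one) and that the folder
really does hold the complete contents of one workbook; rezip_as_xlsx and the
paths in the usage comment are hypothetical, and I have not checked that
every reader accepts an archive rebuilt this way.

```r
# Re-create a .xlsx archive from a folder holding its extracted parts.
# Returns TRUE (invisibly) if the external zip command reported success.
rezip_as_xlsx <- function(dir, out) {
  # Make the output path absolute before changing directory.
  out <- file.path(normalizePath(dirname(out)), basename(out))
  old <- setwd(dir)
  on.exit(setwd(old))
  # Zip the folder contents recursively (default flags are "-r9X"), so the
  # entries are stored relative to the folder, e.g. "xl/workbook.xml".
  status <- utils::zip(out, files = ".")
  invisible(status == 0)
}

# Usage (hypothetical paths):
# rezip_as_xlsx("received/book1", "book1.xlsx")
# unzip("book1.xlsx", list = TRUE)   # inspect what went in
```

Zipping "." from inside the folder, rather than naming each file, avoids
having to pass names such as [Content_Types].xml to the zip program.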
There is also an office2010 package which seems to have an overlap with what
is in xlsx, and ROOXML, RWordXML and RExcelXML.
D.
On 8/10/11 7:26 AM, Dennis Fisher wrote:
> R version 2.13.1
> OS X (or Windows)
>
> Colleagues,
>
> I received a number of files with a .xls extension. These files open in XL
> and, by all appearances, are XL files. However, it appears to me that the
> files are actually XML:
>
>> readLines(dir()[16])[1:10]
> [1] "<?xml version=\"1.0\"?>"
> [2] "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""
> [3] " xmlns:o=\"urn:schemas-microsoft-com:office:office\""
> [4] " xmlns:x=\"urn:schemas-microsoft-com:office:excel\""
> [5] " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\""
> [6] " xmlns:html=\"http://www.w3.org/TR/REC-html40\">"
> [7] " <DocumentProperties xmlns=\"urn:schemas-microsoft-com:office:office\">"
> [8] " <Version>12.0</Version>"
> [9] " </DocumentProperties>"
> [10] " <OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\">"
>
> I had initially tried to read the files using read.xls (gdata) but that
> failed (not surprisingly). I could open each Excel file, then "save as"
> csv, then use read.csv. However, there are many files so I would love to
> have a solution that does not require this brute force approach.
>
> Are there any packages that would allow me to read these files without the
> additional steps?
>
> Dennis
>
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com