I have Excel files containing data that I would like to move to R. They are in the standard form of a one row header followed by rows of data, one record per row EXCEPT that there are a few rows of comments before the header. The number of rows of comments varies. For Excel files of this form without comments I have had success with: require(RODBC) z <- odbcConnectExcel("C:/myspread.xls") z.df <- sqlFetch(z,"Sheet1") close(z) but the comments interfere with this. I don't want to manually delete the rows but want the entire process from Excel file to R to be automatic. I can accomplish this with a free utility, Baird's dataload that I found on the net. This will convert the Excel files to text and then the text can be processed using R to locate the start of the header and only process the remainder of the file. (There is also another free utility called xlhtml that I don't use, but could have, that does this too.) Thus at this point I have an adequate automated solution. Nevertheless, I was wondering, for sake of interest, if there is some solution in R that does not involve such an external program such as dataload or xlhtml. Thanks. (I am using Windows 2000.) _______________________________________________ No banners. No pop-ups. No kidding. Introducing My Way - http://www.myway.com
On Tue, Oct 21, 2003 at 08:31:16PM -0400, Gabor Grothendieck wrote:> > > I have Excel files containing data that I would like to move to R. > They are in the standard form of a one row header followed by > rows of data, one record per row EXCEPT that there are a few > rows of comments before the header. The number of rows of comments > varies. For Excel files of this form without comments I have had > success with: > > require(RODBC) > z <- odbcConnectExcel("C:/myspread.xls") > z.df <- sqlFetch(z,"Sheet1") > close(z) > > but the comments interfere with this. > > I don't want to manually delete the rows but want the entire > process from Excel file to R to be automatic. > > I can accomplish this with a free utility, Baird's dataload that > I found on the net. This will convert the Excel files to text > and then the text can be processed using R to locate the start of > the header and only process the remainder of the file. (There is > also another free utility called xlhtml that I don't use, but could > have, that does this too.) Thus at this point I have an > adequate automated solution.There is also Spreadsheet::ParseExcel, which comes with a simple xls2csv which I once extended, and posted here (or maybe only to BDR following a discussion here). Being Perl, it can easily be automated, and will cope with your comment lines. If I recall, ActiveState provides this as well for win* platforms.> Nevertheless, I was wondering, for sake of interest, if there is > some solution in R that does not involve such an external program > such as dataload or xlhtml.There are a few candidates for a cross-platform solution: - GNU Gretl (an econometric program with a nice Gnome GUI, and a win32 port) has code for this, taken from a C program xls2csv as well as from Gnumeric. I had planned to look into this for R, but never got around to it. - Gnumeric just added a standalone tool 'ssconvert', this may compile on Windows. - Also, OpenOffice has code for this which one could extract, but I am not familiar with the details. Someone just has to sit down and do it. Typically the person with the greatest urge wins. As I nowadays get all my data directly from databases systems, I will probably not be the one. Hth, Dirk -- Those are my principles, and if you don't like them... well, I have others. -- Groucho Marx
Indeed, keeping the computations in R and working directly with the Excel view of the cells is desirable for generality. It minimizes the loss in information from translating to simpler forms such as CSV, strings, etc.>From within R on Windows, one can use DCOM to create an Excelapplication, load the file and then read each of the worksheets in the workbook cell by cell and obtain all sorts of information about each cell. The RDCOMClient package on the Omegahat Web site (http://www.omegahat.org/RDCOMClient) and Thomas Baier's rcom package provide the DCOM facilities from R. David James has already done some work to process Excel Worksheet and Range objects using the RDCOMClient and create R data frames. And there is C code in the GGobi distribution which does something very similar. D. Gabor Grothendieck wrote:> > > I have Excel files containing data that I would like to move to R. > They are in the standard form of a one row header followed by > rows of data, one record per row EXCEPT that there are a few > rows of comments before the header. The number of rows of comments > varies. For Excel files of this form without comments I have had > success with: > > require(RODBC) > z <- odbcConnectExcel("C:/myspread.xls") > z.df <- sqlFetch(z,"Sheet1") > close(z) > > but the comments interfere with this. > > I don't want to manually delete the rows but want the entire > process from Excel file to R to be automatic. > > I can accomplish this with a free utility, Baird's dataload that > I found on the net. This will convert the Excel files to text > and then the text can be processed using R to locate the start of > the header and only process the remainder of the file. (There is > also another free utility called xlhtml that I don't use, but could > have, that does this too.) Thus at this point I have an > adequate automated solution. > > Nevertheless, I was wondering, for sake of interest, if there is > some solution in R that does not involve such an external program > such as dataload or xlhtml. > > Thanks. > > (I am using Windows 2000.) > > _______________________________________________ > No banners. No pop-ups. No kidding. > Introducing My Way - http://www.myway.com > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help-- _______________________________________________________________ Duncan Temple Lang duncan at research.bell-labs.com Bell Labs, Lucent Technologies office: (908)582-3217 700 Mountain Avenue, Room 2C-259 fax: (908)582-3340 Murray Hill, NJ 07974-2070 http://cm.bell-labs.com/stat/duncan