Marc Schwartz
2007-Jul-08 20:20 UTC
[R] Writing Excel (.xls) files on non-Windows OSs using Perl
Hi all, There have been quite a few threads in the recent months pertaining to the ability to directly write native Excel (.xls) files from R. For example, exporting R matrices and/or data frames to an Excel file, with perhaps the ability to create multiple tabs (worksheets) within a single file, with one tab/sheet per R object. There exists the xlsReadWrite package on CRAN by Hans-Peter Suter, which is restricted to Windows, since it utilizes the non-FOSS MS Office API to write the Excel formats. I recently had the need, under Linux (FC6/F7) to create an Excel file containing multiple worksheets, each worksheet containing an 'exported' data frame from R. While one could export the data frames to delimited files (ie. using write.table() ) and then open those files from Excel (or OO.org's Calc), it was rather tedious to do so with a larger number of R objects. Since I would now have the need to engage in this process with some level of frequency, the preceding approach would not be time efficient. I thus embarked on a mini-project to create a Perl script utilizing openly available functions from CPAN and then facilitate the calling of the script directly from R. I am posting the Perl code here for the benefit of others who may have similar requirements. Please note that I am providing this 'as is' and don't have any plans to substantively modify or enhance the code. It does what I need it to do. Feel free to modify for other needs as may be required. The basic calling schema is: WriteXLS.pl [--CSVpath] [--CSVfiles] ExcelFileName Where: CSVpath = Path to the csv files created in R, typically done using write.table() CSVfiles = globbed file name specification (ie. *.csv) ExcelFileName = FULL name of Excel .xls file to create When the Excel file is created, a new worksheet (tab) will be created for each CSV file imported. The worksheet name will be the basename (no path or extension) of the CSV file, up to the first 31 characters, which is a limitation for Excel worksheet names. Note of course that Excel has certain (version specific) limitations with respect to file formats. I list the MS link below for Excel 2007. Similar specs are available for earlier versions: http://office.microsoft.com/en-us/excel/HP100738491033.aspx Finally, note that I use 'Spreadsheet::WriteExcel::Big', as the regular version of the Perl package has a constraint where the ENTIRE Excel file cannot be larger than 7 Mb, which was a problem for my application. Here is the Perl code: #!/usr/bin/perl -w use strict; use Spreadsheet::WriteExcel::Big; use Getopt::Long; use File::Glob; use File::Basename; use Text::CSV_XS; # Initialize and get command line arguments my $CSVPath = '.'; my $CSVFiles = "*.csv"; GetOptions ('CSVpath=s' => \$CSVPath, 'CSVfiles=s' => \$CSVFiles); my $ExcelFileName = $ARGV[0]; # Create Excel XLS File print "Creating Excel File: $ExcelFileName\n\n"; my $XLSFile = Spreadsheet::WriteExcel::Big->new($ExcelFileName); # Glob file path and names my @FileNames = <$CSVPath/$CSVFiles>; foreach my $FileName (@FileNames) { print "Reading: $FileName\n"; # Open CSV File my $csv = Text::CSV_XS->new(); open (CSVFILE, "$FileName") || die "ERROR: cannot open $FileName. $!\n"; # Create new sheet with filename prefix # ($base, $dir, $ext) = fileparse ($FileName, '..*'); my $FName = (fileparse ($FileName, '\..*'))[0]; # Only take the first 31 chars, which is the # limit for a worksheet name my $SheetName = substr($FName, 0, 31); print "Creating New WorkSheet: $SheetName\n\n"; my $WorkSheet = $XLSFile->add_worksheet($SheetName); # Rows and columns are zero indexed my $Row = 0; # Write to Sheet while (<CSVFILE>) { if ($csv->parse($_)) { my @Fields = $csv->fields(); my $Col = 0; foreach my $Fld (@Fields) { $WorkSheet->write($Row, $Col, $Fld); $Col++; } $Row++; } } close CSVFILE; } A 'typical' sequence for the use of the code from within R might be: # Create a character vector of R objects to be exported RObjects <- c(VectorOfRObjectNames, ...) # Now loop through the vector, creating CSV files # In this case, export to a 'CSVFILES' sub-directory for (i in RObjects) { write.table(get(i), file = paste("CSVFILES/", i, ".csv", sep = ""), sep = ",", quote = TRUE, na = "", row.names = FALSE) } # Now call the Perl script from within R, presuming # that the script is in the current default directory system("./WriteXLS.pl --CSVPath CSVFILES RExport.xls") This process has worked for me, given the current functional requirements for my project. I hope that this is of some help to others. Regards, Marc Schwartz
Hans-Peter
2007-Jul-09 13:42 UTC
[R] Writing Excel (.xls) files on non-Windows OSs using Perl
Hi, 2007/7/8, Marc Schwartz <marc_schwartz at comcast.net>:> [snip] > There exists the xlsReadWrite package on CRAN by Hans-Peter Suter, which > is restricted to Windows, since it utilizes the non-FOSS MS Office API > to write the Excel formats.The non-FOSS API is not the problem(#) but its implementation is: The 3rd party library I use is written in Pascal and supports Delphi and Kylix. Kylix would allow to port the package to Linux but as Kylix has unfortunately been abandoned by CodeGear (Borland) I am not ready/interested to spend my time on this dead road. Though it probably could be done quickly. A much more interesting way is to port the package using FreePascal. --> I plan to do this since long but... --> Maybe someone fluent on Linux and FreePascal could have a look at the pascal header files (treetron.googlepages.com) and make the demos run on Linux..., that would be great and speed up an eventual xlsReadWrite port! -- Regards, Hans-Peter (#) at least for people who are not in principle opposed to run a package which contains a non-FOSS code part.