I?ve got an Excel workbook with about 30 worksheets. Each worksheet has 10000 rows of data over 30 columns. I?d like to read the data from each worksheet into a dataframe or matrix in R for processing. Normally, I use read.csv when interacting with Excel but I?d rather manipulate a multisheet workbook directly than set about splitting the original workbook and saving down each part as a csv. So far, I?ve tried using read.xlsx from the xlsx package. This works fine for small test files ? e.g. suppose I?m trying to read from the test_file workbook on my desktop. The following code extracts rows 1 and 2 from worksheet = ?johnny?. setwd("C:\\Documents and Settings\\dmenezes\\Desktop") info<- read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE) info However, when I try to apply this to my real, large workbook, things go wrong, with the following error message. Any ideas/workarounds? Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.OutOfMemoryError: Java heap space
Java uses "heap space" when creating new objects. My guess is that since the default size is 128 Mb iirc, you are reading in an object larger than this. I don't know the guts of the xlsx package or if there is a way to increase the heap, but you may get by if you can divide up your data import. HTH, Jon On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:> I?ve got an Excel workbook with about 30 worksheets. ?Each worksheet > has 10000 rows of data over 30 columns. > > I?d like to read the data from each worksheet into a dataframe or > matrix in R for processing. ?Normally, I use read.csv when interacting > with Excel but I?d rather manipulate a multisheet workbook directly > than set about splitting the original workbook and saving down each > part as a csv. > > So far, I?ve tried using read.xlsx from the xlsx package. ?This works > fine for small test files ? e.g. suppose I?m trying to read from the > test_file workbook on my desktop. ?The following code extracts rows 1 > and 2 from worksheet = ?johnny?. > > setwd("C:\\Documents and Settings\\dmenezes\\Desktop") > info<- > read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE) > info > > However, when I try to apply this to my real, large workbook, things > go wrong, with the following error message. ?Any ideas/workarounds? > > Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", > cl, ?: > ?java.lang.OutOfMemoryError: Java heap space > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- ==============================================Jon Daily Technician ==============================================#!/usr/bin/env outside # It's great, trust me.
dM/ <david.n.menezes <at> gmail.com> writes:> > I?ve got an Excel workbook with about 30 worksheets. Each worksheet > has 10000 rows of data over 30 columns. > > I?d like to read the data from each worksheet into a dataframe or > matrix in R for processing. Normally, I use read.csv when interacting > with Excel but I?d rather manipulate a multisheet workbook directly > than set about splitting the original workbook and saving down each > part as a csv. >[snip]> However, when I try to apply this to my real, large workbook, things > go wrong, with the following error message. Any ideas/workarounds? >Try read.xls from the gdata package? Or see http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel for more suggestions on I/O between R and Excel.
On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:> I?ve got an Excel workbook with about 30 worksheets. ?Each worksheet > has 10000 rows of data over 30 columns. > > I?d like to read the data from each worksheet into a dataframe or > matrix in R for processing. ?Normally, I use read.csv when interacting > with Excel but I?d rather manipulate a multisheet workbook directly > than set about splitting the original workbook and saving down each > part as a csv. > > So far, I?ve tried using read.xlsx from the xlsx package. ?This works > fine for small test files ? e.g. suppose I?m trying to read from the > test_file workbook on my desktop. ?The following code extracts rows 1 > and 2 from worksheet = ?johnny?. > > setwd("C:\\Documents and Settings\\dmenezes\\Desktop") > info<- > read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE) > info > > However, when I try to apply this to my real, large workbook, things > go wrong, with the following error message. ?Any ideas/workarounds? > > Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", > cl, ?: > ?java.lang.OutOfMemoryError: Java heap spaceTry increasing the heap size. Restart R and: options(java.parameters="-Xmx1024m") library(...) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com