Ritter, Christian C MCIL-CTANL/S
2004-Sep-09 08:58 UTC
[R] Handling the windows clipboard/32KB limit
(R 1.9.1; Windows 2000;)
I'm just comparing ease of use, speed, etc for methods of transferring data
frames in the Excel, MySQL, R triangle. It turns out that going from Excel to R
(when doing this carefully). Using the clipboard is actually quite fast and
efficient (2 seconds for transferring 120 000 cells on a common desktop computer
as compared to much longer for going the RODBC route, maybe also substantially
longer using the R(D)COM route). Other advantage: Relatively flexible handling
of missing values from Excel which may be "empty", "#N/A",
etc. So I thought I would also look at ways of going back via the clipboard.
There I'm hitting a funny snag. The documentation explains that there is a
32KB limit on writing to the clipboard. This may make sense as a default, but
does it make sense as a hard restriction? Any ideas of getting around this
limitation (besides cumbersome R-code buffering/breaking up the frame to send
into small pieces to be collected and assembled in Excel?
Chris
P.S. (some details for those who are interested)
Here is what I use to go from Excel to R (which in turn uses the RExcel add in
to send the R call, but not the data):
Range("Alldata").Copy
Call
RRun("Alldata<-read.table(file=file(description='clipboard'),sep='\t',na.strings=c('#N/A',''),header=TRUE,comment.char=';')")
There are two funny things about this:
1. It's very fast.
2. (this may be bug or feature depending on view point):
Range("Alldata").Copy is filter sensitive. So, if I use a common
filter to select a part of Alldata in Excel, executing the code will transfer
only this part. If we turn this volatile and add a few lines for some meaninful
analysis which reports itself back or displays itself in a graph, we arrive at
an interesting level of animation).
Here is what I thought of using for going back:
write.table(Alldata,file=con<-file(description='clipboard',open="w"),sep='\t',na='#N/A',col.names=TRUE,row.names=FALSE);close(con)
And this works great if Alldata is small (less than 32KB when converted to
text). It no longer works if Alldata is larger (only the first 32KB go on the
clipboard).
Christian Ritter
Functional Specialist Statistics
Shell Coordination Centre S.A.
Monnet Centre International Laboratory, Avenue Jean Monnet 1, B-1348
Louvain-La-Neuve, Belgium
Tel: +32 10 477 349 Fax: +32 10 477 219
Email: christian.ritter@shell.com
Internet: http://www.shell.com/chemicals
[[alternative HTML version deleted]]
On Thu, 9 Sep 2004, Ritter, Christian C MCIL-CTANL/S wrote:> (R 1.9.1; Windows 2000;) > > I'm just comparing ease of use, speed, etc for methods of transferring > data frames in the Excel, MySQL, R triangle. It turns out that going > from Excel to R (when doing this carefully). Using the clipboard is > actually quite fast and efficient (2 seconds for transferring 120 000 > cells on a common desktop computer as compared to much longer for going > the RODBC route, maybe also substantially longer using the R(D)COM > route). Other advantage: Relatively flexible handling of missing values > from Excel which may be "empty", "#N/A", etc.You clearly haven't looked at RODBC carefully as that has equally flexible handling.> So I thought I would also look at ways of going back via the clipboard. > There I'm hitting a funny snag. The documentation explains that there is > a 32KB limit on writing to the clipboard. This may make sense as a > default, but does it make sense as a hard restriction?I guess you are talking about using file="clipboard" in a connection, but there is also writeClipboard, whose documentation does not mention a limit. My Windows documentation says there is a 32Kb limit on the size of an text object put on the clipboard, and I believe that is the case for Windows 95/98/ME at least. So a hard restriction makes sense.> Any ideas of getting around this limitation (besides cumbersome R-code > buffering/breaking up the frame to send into small pieces to be > collected and assembled in Excel?R is Open Source, so this is an opportunity for you to experiment, test on various versions of Windows and contribute a patch back to the R project. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Ritter, Christian C MCIL-CTANL/S
2004-Sep-09 12:47 UTC
[R] Handling the windows clipboard/32KB limit
Thanks Brian for the hint.
Indeed, writeClipboard does not suffer from the 32KB limit. Here is a line which
can put large quantities of data on the clipboard:
writeClipboard(gsub("NA","\#N/A",apply(Alldata,1,paste,collapse="\t")))
this can then be pasted either manually or via a VBA macro into a spreadsheet.
Here is some more info for R/Excel junkies:
==========================================VBA example using RExcel to pass the
command:
Sub FetchDataframe(Dfname As String, TargetRange As Range)
Dim commandString As String
commandString = "writeClipboard(c(paste(colnames(" & Dfname
&
"),collapse='\t'),gsub('NA','\#N/A',apply("
& Dfname & ",1,paste,collapse='\t'))))"
Call RRun(commandString)
TargetRange.PasteSpecial
End Sub
Again, the time for
FetchDataframe "Alldata",Range("Sheet1!$A$1")
on about 120000 cells is on the order of two or three seconds. I looked at
scaling. For 360000 (mixed type) cells, the required time was on the order of 15
seconds.
I also looked at putting data from Excel to R: Here things scale even better.
Using the code:
Sub PutDataframe(Dfname As String, TargetRange As Range)
Dim commandString As String
commandString = Dfname &
"<-read.table(file=file(description='clipboard'),sep='\t',na.strings=c('#N/A',''),header=TRUE,comment.char=';')"
TargetRange.Copy
Call RRun(commandString)
End Sub
the transfer
PutDataframe "Alldata",Range("Alldata")
took 7 seconds for 360000 (mixed type) cells.
As Brian points out: Obviously, RODBC is more flexible for selecting, cleaning,
etc. But in my experience it is a bit slow when dealing with Excel as a data
source (via dsn<-odbcConnectExcel("Myfile.xls"). I redid my 120000
cell example using Alldata<-sqlFetch(dsn,"Alldata") and needed
about 80 seconds. Now, this is not the fault of RODBC but probably of starting
and using the jet data engine within Windows.
Moreover, in particular, if the workbook Myfile.xls is actually open wile odbc
is working on it, there may be memory or performance leaks (probably unrelated
to RODBC). If I retrieve the same data frame (from Excel) several times, (either
opening and closing the connection every time or opening it once and fetching
several times) the retrieval times can become much slower as a go. At some time
there may be "waiting for an OLE action to complete messages".
RExcel/R(D)COM gives a third possibility with probably much more
flexibility/safety than my crude clipboard method and we are currently working
hard at increasing its speed. Currently, we are at about 20 seconds for the
120000 cell example.
In summary, what strikes me about the clipboard approach is its speed and the
size of objects which can be moved between Excel and R. Some fine tuning will be
needed to deal with exceptions.
-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Thursday, 09 September, 2004 12:41 PM
To: Ritter, Christian C MCIL-CTANL/S
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Handling the windows clipboard/32KB limit
On Thu, 9 Sep 2004, Ritter, Christian C MCIL-CTANL/S wrote:
> (R 1.9.1; Windows 2000;)
>
> I'm just comparing ease of use, speed, etc for methods of transferring
> data frames in the Excel, MySQL, R triangle. It turns out that going
> from Excel to R (when doing this carefully). Using the clipboard is
> actually quite fast and efficient (2 seconds for transferring 120 000
> cells on a common desktop computer as compared to much longer for going
> the RODBC route, maybe also substantially longer using the R(D)COM
> route). Other advantage: Relatively flexible handling of missing values
> from Excel which may be "empty", "#N/A", etc.
You clearly haven't looked at RODBC carefully as that has equally flexible
handling.
> So I thought I would also look at ways of going back via the clipboard.
> There I'm hitting a funny snag. The documentation explains that there
is
> a 32KB limit on writing to the clipboard. This may make sense as a
> default, but does it make sense as a hard restriction?
I guess you are talking about using file="clipboard" in a connection,
but
there is also writeClipboard, whose documentation does not mention a
limit.
My Windows documentation says there is a 32Kb limit on the size of an
text object put on the clipboard, and I believe that is the case for
Windows 95/98/ME at least. So a hard restriction makes sense.
> Any ideas of getting around this limitation (besides cumbersome R-code
> buffering/breaking up the frame to send into small pieces to be
> collected and assembled in Excel?
R is Open Source, so this is an opportunity for you to experiment, test on
various versions of Windows and contribute a patch back to the R project.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595