Ritter, Christian C MCIL-CTANL/S
2004-Sep-09 08:58 UTC
[R] Handling the windows clipboard/32KB limit
(R 1.9.1; Windows 2000;) I'm just comparing ease of use, speed, etc for methods of transferring data frames in the Excel, MySQL, R triangle. It turns out that going from Excel to R (when doing this carefully). Using the clipboard is actually quite fast and efficient (2 seconds for transferring 120 000 cells on a common desktop computer as compared to much longer for going the RODBC route, maybe also substantially longer using the R(D)COM route). Other advantage: Relatively flexible handling of missing values from Excel which may be "empty", "#N/A", etc. So I thought I would also look at ways of going back via the clipboard. There I'm hitting a funny snag. The documentation explains that there is a 32KB limit on writing to the clipboard. This may make sense as a default, but does it make sense as a hard restriction? Any ideas of getting around this limitation (besides cumbersome R-code buffering/breaking up the frame to send into small pieces to be collected and assembled in Excel? Chris P.S. (some details for those who are interested) Here is what I use to go from Excel to R (which in turn uses the RExcel add in to send the R call, but not the data): Range("Alldata").Copy Call RRun("Alldata<-read.table(file=file(description='clipboard'),sep='\t',na.strings=c('#N/A',''),header=TRUE,comment.char=';')") There are two funny things about this: 1. It's very fast. 2. (this may be bug or feature depending on view point): Range("Alldata").Copy is filter sensitive. So, if I use a common filter to select a part of Alldata in Excel, executing the code will transfer only this part. If we turn this volatile and add a few lines for some meaninful analysis which reports itself back or displays itself in a graph, we arrive at an interesting level of animation). Here is what I thought of using for going back: write.table(Alldata,file=con<-file(description='clipboard',open="w"),sep='\t',na='#N/A',col.names=TRUE,row.names=FALSE);close(con) And this works great if Alldata is small (less than 32KB when converted to text). It no longer works if Alldata is larger (only the first 32KB go on the clipboard). Christian Ritter Functional Specialist Statistics Shell Coordination Centre S.A. Monnet Centre International Laboratory, Avenue Jean Monnet 1, B-1348 Louvain-La-Neuve, Belgium Tel: +32 10 477 349 Fax: +32 10 477 219 Email: christian.ritter@shell.com Internet: http://www.shell.com/chemicals [[alternative HTML version deleted]]
On Thu, 9 Sep 2004, Ritter, Christian C MCIL-CTANL/S wrote:> (R 1.9.1; Windows 2000;) > > I'm just comparing ease of use, speed, etc for methods of transferring > data frames in the Excel, MySQL, R triangle. It turns out that going > from Excel to R (when doing this carefully). Using the clipboard is > actually quite fast and efficient (2 seconds for transferring 120 000 > cells on a common desktop computer as compared to much longer for going > the RODBC route, maybe also substantially longer using the R(D)COM > route). Other advantage: Relatively flexible handling of missing values > from Excel which may be "empty", "#N/A", etc.You clearly haven't looked at RODBC carefully as that has equally flexible handling.> So I thought I would also look at ways of going back via the clipboard. > There I'm hitting a funny snag. The documentation explains that there is > a 32KB limit on writing to the clipboard. This may make sense as a > default, but does it make sense as a hard restriction?I guess you are talking about using file="clipboard" in a connection, but there is also writeClipboard, whose documentation does not mention a limit. My Windows documentation says there is a 32Kb limit on the size of an text object put on the clipboard, and I believe that is the case for Windows 95/98/ME at least. So a hard restriction makes sense.> Any ideas of getting around this limitation (besides cumbersome R-code > buffering/breaking up the frame to send into small pieces to be > collected and assembled in Excel?R is Open Source, so this is an opportunity for you to experiment, test on various versions of Windows and contribute a patch back to the R project. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Ritter, Christian C MCIL-CTANL/S
2004-Sep-09 12:47 UTC
[R] Handling the windows clipboard/32KB limit
Thanks Brian for the hint. Indeed, writeClipboard does not suffer from the 32KB limit. Here is a line which can put large quantities of data on the clipboard: writeClipboard(gsub("NA","\#N/A",apply(Alldata,1,paste,collapse="\t"))) this can then be pasted either manually or via a VBA macro into a spreadsheet. Here is some more info for R/Excel junkies: ==========================================VBA example using RExcel to pass the command: Sub FetchDataframe(Dfname As String, TargetRange As Range) Dim commandString As String commandString = "writeClipboard(c(paste(colnames(" & Dfname & "),collapse='\t'),gsub('NA','\#N/A',apply(" & Dfname & ",1,paste,collapse='\t'))))" Call RRun(commandString) TargetRange.PasteSpecial End Sub Again, the time for FetchDataframe "Alldata",Range("Sheet1!$A$1") on about 120000 cells is on the order of two or three seconds. I looked at scaling. For 360000 (mixed type) cells, the required time was on the order of 15 seconds. I also looked at putting data from Excel to R: Here things scale even better. Using the code: Sub PutDataframe(Dfname As String, TargetRange As Range) Dim commandString As String commandString = Dfname & "<-read.table(file=file(description='clipboard'),sep='\t',na.strings=c('#N/A',''),header=TRUE,comment.char=';')" TargetRange.Copy Call RRun(commandString) End Sub the transfer PutDataframe "Alldata",Range("Alldata") took 7 seconds for 360000 (mixed type) cells. As Brian points out: Obviously, RODBC is more flexible for selecting, cleaning, etc. But in my experience it is a bit slow when dealing with Excel as a data source (via dsn<-odbcConnectExcel("Myfile.xls"). I redid my 120000 cell example using Alldata<-sqlFetch(dsn,"Alldata") and needed about 80 seconds. Now, this is not the fault of RODBC but probably of starting and using the jet data engine within Windows. Moreover, in particular, if the workbook Myfile.xls is actually open wile odbc is working on it, there may be memory or performance leaks (probably unrelated to RODBC). If I retrieve the same data frame (from Excel) several times, (either opening and closing the connection every time or opening it once and fetching several times) the retrieval times can become much slower as a go. At some time there may be "waiting for an OLE action to complete messages". RExcel/R(D)COM gives a third possibility with probably much more flexibility/safety than my crude clipboard method and we are currently working hard at increasing its speed. Currently, we are at about 20 seconds for the 120000 cell example. In summary, what strikes me about the clipboard approach is its speed and the size of objects which can be moved between Excel and R. Some fine tuning will be needed to deal with exceptions. -----Original Message----- From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] Sent: Thursday, 09 September, 2004 12:41 PM To: Ritter, Christian C MCIL-CTANL/S Cc: r-help at stat.math.ethz.ch Subject: Re: [R] Handling the windows clipboard/32KB limit On Thu, 9 Sep 2004, Ritter, Christian C MCIL-CTANL/S wrote:> (R 1.9.1; Windows 2000;) > > I'm just comparing ease of use, speed, etc for methods of transferring > data frames in the Excel, MySQL, R triangle. It turns out that going > from Excel to R (when doing this carefully). Using the clipboard is > actually quite fast and efficient (2 seconds for transferring 120 000 > cells on a common desktop computer as compared to much longer for going > the RODBC route, maybe also substantially longer using the R(D)COM > route). Other advantage: Relatively flexible handling of missing values > from Excel which may be "empty", "#N/A", etc.You clearly haven't looked at RODBC carefully as that has equally flexible handling.> So I thought I would also look at ways of going back via the clipboard. > There I'm hitting a funny snag. The documentation explains that there is > a 32KB limit on writing to the clipboard. This may make sense as a > default, but does it make sense as a hard restriction?I guess you are talking about using file="clipboard" in a connection, but there is also writeClipboard, whose documentation does not mention a limit. My Windows documentation says there is a 32Kb limit on the size of an text object put on the clipboard, and I believe that is the case for Windows 95/98/ME at least. So a hard restriction makes sense.> Any ideas of getting around this limitation (besides cumbersome R-code > buffering/breaking up the frame to send into small pieces to be > collected and assembled in Excel?R is Open Source, so this is an opportunity for you to experiment, test on various versions of Windows and contribute a patch back to the R project. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595