Dear members of the R Development Team,

I am looking for people with a deep understanding of R internals to assist in bridging R to OpenOffice.

While R is a state-of-the-art statistical environment, less experienced users often find it difficult to work with. Therefore, I believe that a bridge between R and a spreadsheet program will make this transition less painful. I sincerely believe that this will benefit both the R community and potential new users.

OpenOffice is an open-source office suite that includes a spreadsheet program (Calc).

OpenOffice.org (OOo) is participating in the Google Summer of Code 2007 initiative sponsored by Google, and one of the proposed projects involves the creation of an add-on component that allows an OOo Calc user to let the R environment do calculations on data from Calc cells and put the results back into the spreadsheet. A brief description can be found on the OOo Summer of Code wiki page (http://wiki.services.openoffice.org/wiki/Summer_of_Code_2007).

Two students have already shown interest in this project (see the OOo mailing list, http://sc.openoffice.org/servlets/BrowseList?listName=dev&from=2007-03-01&to=2007-03-31&by=date&first=21&selectedPage=2, the "Summer of Code: R and Calc" thread).

While mentoring is already available from a member of the OpenOffice team (I will try to offer a helping hand on statistics and R syntax, but NOT on the coding part itself), I feel that we still need someone with R-core expertise. I am aware of various existing packages (rcom, RDCOM) and of various online resources (like http://developer.r-project.org/embedded.html). However, more specific questions may arise in the future, especially as this embedding should be platform-independent, and I would welcome any help from the R-core team members.

I am looking forward to hearing from you and hope that this project will be a great success. I would like to thank you in advance for your effort.

Sincerely,

Leonard Mada
Hmm, if all you are interested in is reading and writing Excel spreadsheets from R, there are much lighter and easier ways of doing it than hooking up with OpenOffice. The Perl people have had Spreadsheet::ParseExcel and Spreadsheet::WriteExcel for years (and they work quite well, in my personal experience). Those are tiny (a couple of MB?) compared to the size of OpenOffice.

HTL

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Since I implemented the RExcel interface, I would also like this discussion to continue on r-devel or some other list where I can follow it.

Let me add some thoughts:

As Leonard Mada suggested, probably the most needed connection between R and a spreadsheet program is a way to transfer data frames easily from the spreadsheet to R and analysis results back from R to Excel. To do this, one needs a mechanism to transfer large amounts of data of different types. An additional complication for the way back from R to the spreadsheet is that R results quite often have data types not supported by the spreadsheet program (complex numbers, for example).

The convenience tool that people really want in the first place is a menu item that transfers a range of data from the spreadsheet to R. The question then is: should the users see the R command line? There has to be a way of telling R what kind of analysis to perform. Using a menu like the one supplied by RCommander is a sensible option. It would also reuse all the work invested in designing a good menu structure for end users.

Getting results back into the spreadsheet is more difficult, not technically, but from the design point of view. Analysis results in R usually are not arrays but lists, i.e. compounds of compounds of data of different basic types and of different sizes. There is no clear general rule for how to put R results into spreadsheet ranges. The basic compound data type in spreadsheets is the array, the data types in R are much more complicated, and the conceptual mapping of result lists to spreadsheet ranges has to be designed differently for different types of analyses and results.

Of course, a brute-force method (implemented, for example, by the connection mechanism between NAG and Excel) would be to "just print" the results into the spreadsheet as strings. This way, spreadsheet rows become printed lines without further structure, and numbers in the results are not easily accessible for further computations on the spreadsheet.

Such a "transfer data frame and get results" connection, however, does not really use the spreadsheet program as a spreadsheet program, but as a data grid and output formatting machinery, since it is completely independent of the spreadsheet program's most important feature: automatic recalculation triggered by changes of cell values.

A really tight integration of R and a spreadsheet can extend the spreadsheet program's computational engine by the complete R engine. It could allow spreadsheet formulas like RApply("pchisq",A1,A2), which would have R compute the value of the chi-squared distribution function with arguments taken from cells A1 and A2 of the spreadsheet. Changing the value in A1 would trigger R to recalculate the chi-squared value.

In this case, the connection between R and the spreadsheet program has to be very fast, since the spreadsheet program is essentially using R as a dynamically linked library. The problem of incompatible data types also becomes much harder to deal with. The results of R computations are put directly into spreadsheet ranges, so having R results consisting of lists makes things really difficult.

Thomas Baier and I recently published a paper in Computational Statistics which discusses different models of integration between R and spreadsheets. Excel is used as an example, but the concepts are independent of the concrete implementation. It is accessible at http://dx.doi.org/10.1007/s00180-007-0023-6. If you cannot access it, write to me and I will send you a copy.

Currently, we are working on a cross-platform alternative to using COM to connect the spreadsheet to R. The platforms in mind are (at least) Windows, Linux and Mac OS X. The spreadsheet program of choice for our next integration will be Gnumeric, where students are already working on the integration.
--
Erich Neuwirth, Didactic Center for Computer Science
University of Vienna
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-9394
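
The mapping problem raised in the message above (R results are nested lists, spreadsheet ranges are flat arrays) can be illustrated with a small sketch. This is not how RExcel or any existing bridge works; it is only one possible convention, written in Python with a dictionary standing in for an R list. The names `flatten`, `to_range` and `fake_result` are hypothetical, and the sample values are made up rather than real R output. Each leaf value gets its own cell next to a label, so numbers stay usable in further formulas, unlike the "just print" approach.

```python
from typing import Any, Iterator, List, Tuple, Union

Scalar = Union[int, float, str, bool, None]


def flatten(result: Any, path: str = "") -> Iterator[Tuple[str, Scalar]]:
    """Yield (label, scalar) pairs for every leaf of a nested result."""
    if isinstance(result, dict):
        # Named components, labelled R-style with "$".
        for name, value in result.items():
            label = f"{path}${name}" if path else str(name)
            yield from flatten(value, label)
    elif isinstance(result, (list, tuple)):
        # Unnamed components, labelled with 1-based indices as in R.
        for index, value in enumerate(result, start=1):
            yield from flatten(value, f"{path}[{index}]")
    elif isinstance(result, complex):
        # Spreadsheets have no complex cell type, so fall back to text.
        yield path, str(result)
    else:
        yield path, result


def to_range(result: Any) -> List[List[Scalar]]:
    """Lay the leaves out as a two-column range: label, value."""
    return [[label, value] for label, value in flatten(result)]


# Hypothetical stand-in for an analysis result; the values are invented.
fake_result = {
    "statistic": 2.5,
    "parameter": 9,
    "conf.int": [0.1, 1.9],
    "method": "One Sample t-test",
    "roots": [1 + 2j],
}

for row in to_range(fake_result):
    print(row)
```

A two-column layout is the simplest choice that works for any result, but it is rarely the most readable one, which is why the mapping probably has to be designed per type of analysis, as argued above.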
Hi.

Indeed, there is still some time left before the students get allocated. However, I would like to have various issues analysed before the actual coding starts (I am already drawing up a tough schedule for the students ;-) ).

Some projects are OK when you simply start to write code and refine it later. I often favour such an approach, too. But I feel this one is much too important for that, and there are surely many issues where clever ideas and solutions would make a difference.

1. I am still unsure about many of these issues. One involves importing the data back from R (as Prof. Neuwirth pointed out). Unfortunately, OOo Calc has only simple data types (numbers, dates, strings and formulas) and NO complex objects. More complex objects would be helpful (having various methods, such as a summary that determines what to display in the cell, and also various built-in functions interpreting only some subvalues of the object). I will probably raise this issue on the Calc development list (though it would be difficult to implement, too).

2. As there is interest in this discussion, should we move back to the devel list?

Sincerely,

Leonard

Prof Brian Ripley wrote:
> Quick comments:
>
> 1) I had intended to mention Java, as I believe Oo uses that on the
> database side. I tend to struggle with Java (e.g. it is only recently
> working with JNI on my main platform, AMD64 Linux). Simon Urbanek and
> Duncan Temple Lang are our Java gurus.
>
> 2) Doug Bates and I mentored a SoC student last year, so we know about
> a lot of the issues. Unless this year's allocations are done a _lot_
> more smoothly, I would recommend sitting this out until a student is
> definitely allocated. At that point design work can begin in earnest
> (and one mistake we made was not being *sure* the student understood
> the design right at the beginning). I think it is then that we can
> help most (in choosing between strategies).
>
> Brian Ripley
>
>
> On Wed, 28 Mar 2007, Leonard Mada wrote:
>
>> Dear Prof. Ripley, dear R-core Team,
>>
>> thank you very much for your kind comments.
>>
>>
>> Prof Brian Ripley wrote:
>>> My guess is that you intended to contact the R core team: I am sorry
>>> that you have had a somewhat unhelpful response from the R-devel list
>>> members.
>>
>> I was indeed unsure which list was best suited for my question. Most
>> responses were nevertheless quite interesting.
>>
>> 1. Cross-Platform
>> Indeed, there may be a problem with the cross-platform requirement, as
>> you pointed out. I had some thoughts on this problem. My feeling is that
>> the code should be split into 2-3 modules: an OOo module
>> (platform-independent), an abstraction layer (platform-DEpendent) and
>> an R module (hopefully as platform-independent as possible). This
>> raises another problem, as there would now be 3 inter-module interfaces.
>> If the R module (package) has to be platform-dependent, then embedding
>> the abstraction layer into this module seems OK (only 2 modules to deal
>> with). I still believe that the bridge should work on most platforms
>> (but somebody may want to challenge my view).
>>
>> [One of the students came up with the idea of using a Java connector,
>> though this surely has to be analysed more thoroughly.]
>>
>>
>> 2. What type of embedding/bridge?
>> I left this question deliberately open. I favour building, in the initial
>> phase, something that more closely resembles a GUI to some common
>> statistical functions (there will definitely be another discussion about
>> which functions these should be).
>>
>> My hope, too, is that more advanced features are added later on, making
>> it a true bridge. While having the ability to read/write .ods files from
>> within R is a sensible alternative for power users, there will always be
>> things that are easier to do in a spreadsheet. Therefore, a real
>> connection will ease the work even for more advanced users.
>>
>> However, if the development of this more powerful bridge needs a very
>> different approach from the simple GUI approach, then starting directly
>> with the more complex code should be analysed more closely. I am still
>> unsure about the right path, especially because I have no understanding
>> of the R internals.
>>
>> 3. Who will do the work?
>> Well, the Google Summer of Code is intended for students: students apply
>> for various projects and those who are accepted will get paid over
>> the summer by Google. There are various mentors and mentoring
>> organisations, but much of the coding should be done by the students. [I
>> missed R on the Google Summer of Code list: http://code.google.com/soc/ .
>> While it is unlikely that a student will work on the core, there would
>> be enough alternatives, like writing some packages. I am sure that the
>> listing on the Google site alone will make the program more popular in
>> various other communities.]
>>
>> That said, 2 students showed interest in this project and I hope that
>> they will be accepted (results are still pending).
>>
>> However, I am aware that this will be a tough project and the students
>> will surely need much help. Bridging 2 open-source applications proves
>> (again and again) not to be that simple, and expertise from both
>> communities will be invaluable.
>>
>> Independent of the outcome of the Google Summer of Code initiative, I
>> believe that creating a powerful bridge between R and OOo is essential
>> for the future.
>>
>> There is surely much more to discuss, but I am optimistic that most
>> problems will be solved.
>>
>> Sincerely,
>>
>> Leonard
>>
>> _______________________________________________
>> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
>>
>
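
The module split proposed in the quoted message above (an OOo module, a platform-dependent abstraction layer, and an R module) might be sketched roughly as follows. This is a toy illustration in Python, not a design taken from OpenOffice or R: the class names `CalcAddOn`, `Transport`, `InProcessTransport` and `REngine` are hypothetical, and the "R engine" is a dictionary of ordinary Python functions. The point is only that the spreadsheet side depends on the `Transport` interface, so a platform-specific transport could be swapped in without touching the other two modules.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Callable, Dict, List, Sequence


class REngine:
    """Stand-in for the R module: looks a function up by name and applies it."""

    def __init__(self) -> None:
        # Python functions pretending to be R functions (an assumption).
        self._functions: Dict[str, Callable[[Sequence[float]], float]] = {
            "sum": sum,
            "mean": mean,
        }

    def evaluate(self, name: str, values: Sequence[float]) -> float:
        if name not in self._functions:
            raise KeyError(f"unknown function: {name}")
        return self._functions[name](values)


class Transport(ABC):
    """Abstraction layer: the only part that would differ per platform."""

    @abstractmethod
    def call(self, name: str, values: Sequence[float]) -> float:
        ...


class InProcessTransport(Transport):
    """Simplest possible transport: a direct call in the same process."""

    def __init__(self, engine: REngine) -> None:
        self._engine = engine

    def call(self, name: str, values: Sequence[float]) -> float:
        return self._engine.evaluate(name, values)


class CalcAddOn:
    """Stand-in for the OOo module: knows cell values and Transport only."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def apply_to_range(self, name: str, cells: List[float]) -> float:
        return self._transport.call(name, cells)


addon = CalcAddOn(InProcessTransport(REngine()))
print(addon.apply_to_range("mean", [1.0, 2.0, 3.0]))
```

Folding the abstraction layer into the R module, as the message also considers, would correspond to merging `InProcessTransport` into `REngine` and leaving two modules with a single interface between them.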