Is R an appropriate tool for data manipulation, data reshaping, and data organizing? I think so, but someone who recently joined our group thinks not. The new recruit believes that Python or another language is a far better tool for developing data manipulation scripts that can then be used by several members of our research group. Her assessment is that R is useful only when it comes to data analysis and working with statistical models.

So what do you think:

1) R is a phenomenally powerful and flexible tool, and since you are going to do analyses in R you might as well use it to read data in, merge it, and reshape it to whatever you need.
OR
2) Are you crazy? Nobody in their right mind uses R to pipe the data around their lab and assemble it for analysis.

Your insights would be appreciated.

Details if you are interested:

Our setup: hundreds of patients recorded as cases with about 60 variables, entered and stored in a Sybase relational database. High-throughput SNP genotyping platforms save their output to CSV or Excel tables. Previously, not knowing any SQL, I had used Microsoft Access to write queries to get the data that I needed and to merge the genotyping with the clinical database. It was horrible. I could not even use it on anything other than my desktop machine at work. When I realized that I was going to need to learn R to handle the genetic analyses, I decided to keep Sybase as the data repository for the clinical information and then do all the data manipulation, merging, and piping with R using RODBC.

I was and am a very amateur coder. Nevertheless, many, many hours later I have scripts that do what I need them to do, and I understand R code and can tinker with it as needed. My scripts work for me, but they are not exactly user-friendly for others in the laboratory to just run. For instance, depending on which machine the script is being run from, one may need to change the file name or file path and tinker under the hood to accomplish that.

My bias is to do all our data manipulation and reshaping with R. Since I am the principal investigator, it is I who stays constant, while coders and analysts may come and go.

I am even more enamored with R for data manipulation since reading a book about it.
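One possible way to avoid editing file paths by hand on each machine is to let the script look up its data directory at run time. The sketch below is only an illustration: the function name `resolve_data_dir`, the environment variable `LAB_DATA_DIR`, and the file name `genotypes.csv` are assumptions made for this example and do not come from the original scripts.

```r
# Hypothetical helper: find the data directory without editing the script.
# Priority: first command-line argument, then an environment variable
# (LAB_DATA_DIR is an assumed name), then a default directory.
resolve_data_dir <- function(args = commandArgs(trailingOnly = TRUE),
                             env_var = "LAB_DATA_DIR",
                             default = getwd()) {
  if (length(args) >= 1 && dir.exists(args[1])) {
    return(normalizePath(args[1]))
  }
  from_env <- Sys.getenv(env_var, unset = NA)
  if (!is.na(from_env) && dir.exists(from_env)) {
    return(normalizePath(from_env))
  }
  normalizePath(default)
}

data_dir <- resolve_data_dir()

# file.path() joins path components with a separator that works on
# every platform, so the same line runs on Windows, Mac, and Linux.
genotype_file <- file.path(data_dir, "genotypes.csv")  # hypothetical file name
print(genotype_file)
```

With something like this in place, a colleague could run the script as `Rscript merge_data.R /path/to/data` (the script name is again hypothetical) or set the environment variable once per machine, rather than changing the code.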
Take a look at the sqldf package (http://code.google.com/p/sqldf/); you will be amazed.

On Wed, May 6, 2009 at 12:22 AM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
WenSui Liu
Acquisition Risk, Chase
Blog: statcompute.spaces.live.com
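To give a rough idea of what sqldf offers, here is a minimal sketch that merges a toy clinical table with toy genotyping output, once with base R's `merge()` and once with an SQL join through `sqldf()`. The data frames and column names (`clinical`, `geno`, `patient_id`, and so on) are invented for illustration, and the SQL part only runs if the sqldf package happens to be installed.

```r
# Toy stand-ins for the clinical table and the genotyping output
clinical <- data.frame(
  patient_id = c(1, 2, 3, 4),
  age        = c(34, 51, 47, 29)
)
geno <- data.frame(
  patient_id = c(1, 2, 4),
  genotype   = c("AA", "AG", "GG"),
  stringsAsFactors = FALSE
)

# Base R inner join: keeps only patients present in both tables
merged_base <- merge(clinical, geno, by = "patient_id")
print(merged_base)

# The same join written in SQL against the data frames themselves.
# Guarded so the example still runs where sqldf is not installed.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  merged_sql <- sqldf("
    SELECT c.patient_id, c.age, g.genotype
    FROM clinical c
    JOIN geno g ON c.patient_id = g.patient_id
    ORDER BY c.patient_id")
  print(merged_sql)
}
```

The appeal for someone coming from Access queries is that the SQL they already know can be applied directly to data frames, without first loading them into a database.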
Well, I am less proficient in R than in other tools and languages. Therefore my biased opinion is: it is possible in R, but it may be easier if you use other tools, especially if you have to build a user-friendly GUI.

The most accessible method (although limited to MS Windows) would be building the GUI with HTA (HTML Application) and JavaScript, which is nearly the same as creating a web page, and calling R from there when necessary. Less limited, but with a steeper learning curve, are Python, Perl, and Tcl/Tk: all open source tools that can communicate with R, and all have decent GUI building tools. Then there are the proprietary Adobe Flex, Flash, and AIR (the latter somewhat resembles HTA) or Runtime Revolution (RR). All of these make it easy to build cross-platform eye candy, but they are not free, although not too expensive either if you can allocate some resources for your project. I usually hide all the command line utilities behind GUIs built with RR.

All the tools listed above can easily do any kind of data manipulation and reshaping, but each has its strengths:

- Python: tidy object-oriented syntax, tons of third-party modules
- Perl: powerful regular expressions, tons of modules
- RR: database connectivity, chunk expressions (item, char, word, line, etc.) and a syntax that makes data manipulation much, much easier

But I may be wrong, so please let me ask another related question here (new thread?) for the group: what do you use to build graphical user interfaces for end users of your tools in R?

All the best
Viktoras

Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> [...]
On Wednesday, 6 May 2009 at 00:22 -0400, Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing?

[ Large Snip ! ... ]

Depends on what you have to do.

I've done what can be more or less termed "data management" with almost uncountable tools (from Excel (sigh...) to R with SQL, APL, Pascal, C, Basic (in 1982!), Fortran and even Lisp in passing...).

SQL has strong points: a join is, to my taste, more easily expressed in SQL than in most languages, and projection and aggregation are natural.

However, in SQL there is no "natural" ordering of table rows, which makes it difficult to express algorithms that rely on this order. Try, for example, to express the differences of a time series... (it can be done, but it is *not* a pretty sight).

On the other hand, R has some unique expressive possibilities (reshape() comes to mind).

So I tend to use a combination of tools: except for very small samples, I tend to manage my data in SQL and with associated tools (think data editing, for example; a simple form in OpenOffice's Base is quite easy to create, can handle anything for which an ODBC driver exists, and won't crap out beyond a few hundred lines...). Finer manipulation is usually done in R with native tools and sqldf.

But, at least in my trade, the ability to handle Excel files is a must (this is considered a standard for data entry. Sigh...). So the first task is usually a) to import the data into an SQL database, and b) to prepare some routines to dump SQL tables / R data frames to Excel for returning them to the original data author...

HTH

Emmanuel Charpentier
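The two R strengths mentioned above, order-dependent operations and reshape(), can be sketched on a toy data set. The data frame `long` and its columns are made up for this example; only base R functions (`order()`, `ave()`, `diff()`, `reshape()`) are used.

```r
# Toy long-format data: one row per patient per visit
long <- data.frame(
  patient = rep(c("p1", "p2"), each = 3),
  visit   = rep(1:3, times = 2),
  value   = c(10, 12, 15, 20, 19, 23),
  stringsAsFactors = FALSE
)

# Rows in a data frame have a definite order, so sort once and rely on it
long <- long[order(long$patient, long$visit), ]

# Visit-to-visit change within each patient; the first visit has no
# predecessor, so it gets NA
long$change <- ave(long$value, long$patient,
                   FUN = function(v) c(NA, diff(v)))
print(long)

# Long to wide: one row per patient, one column per visit
wide <- reshape(long[, c("patient", "visit", "value")],
                idvar = "patient", timevar = "visit",
                v.names = "value", direction = "wide")
print(wide)
```

The differencing step is the kind of thing that is awkward in plain SQL, because it depends on each row knowing which row came before it.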
On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group.

I happily use both approaches, depending on the original format the data come in:

For data that are not in a "well behaved" format and require actual parsing, I tend to use Python scripts for transmogrifying the data into nice and tidy tables (and maybe some very basic filtering). For everything after that I prefer R. I also use Python if the relevant data need to be harvested and assembled from many different sources (e.g. data files + web + databases).

Once the data files are easy to read (csv, tab separated, database, ...) and the task is to reshape, filter and clean the data, I usually do it in R. R has true advantages here:

- After reading a table into a data frame I can immediately tell whether all measurements are what they are supposed to be (integer, numeric, factor, boolean), and functions like read.table even do quite some error checking for me (equal number of columns etc.)
- Finding out if factors have the right (or plausible) number of levels is easy
- Filtering by logical indexing
- Powerful and reliable reshaping (reshape package)
- Very convenient diagnostics: str(), dim(), table(), summary(), plotting the data in various ways, ...

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
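The checks listed above can be illustrated with a small, made-up table. In this sketch the data are read from an inline string rather than a file, and one value of `sex` is deliberately mistyped so that the diagnostics have something to find; the column names are assumptions for the example.

```r
# A small, made-up table; note the lowercase "f" in the last row
raw <- "id,sex,age,affected
1,F,34,TRUE
2,M,51,FALSE
3,F,47,TRUE
4,f,29,FALSE"

# stringsAsFactors = TRUE is set explicitly because the default differs
# between R versions
patients <- read.csv(text = raw, stringsAsFactors = TRUE)

# Are the columns the types they are supposed to be?
str(patients)
col_classes <- sapply(patients, class)

# Does the factor have a plausible number of levels?
# Three levels for sex reveal the data-entry typo.
print(nlevels(patients$sex))
print(table(patients$sex))

# Filtering by logical indexing: affected patients older than 30
selected <- patients[patients$age > 30 & patients$affected, ]
print(selected)
```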
--- On Wed, 5/6/09, Farrel Buchinsky <fjbuch at gmail.com> wrote:

> Is R an appropriate tool for data
> manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our
> group thinks not.

I only do small scale projects and am by no means a programmer. Isn't Perl something for earrings?

That said, I find R to be extremely useful at data manipulation and have used it exclusively in my last three projects. The different data structures alone are worth their weight in gold, if for nothing else than making it harder to make stupid mistakes in coding.

> The new recruit believes that python or another language is
> a far better tool for developing data manipulation scripts that can be
> then used by several members of our research group. Her assessment is
> that R is useful only when it comes to data analysis and working with
> statistical models.

Any reason that she thinks this? How well does she know R? It is not exactly a language that one picks up in a week, especially if one is coming from using a stats package like SAS or SPSS. As an ex-SAS and SYSTAT user, it took me weeks just to get comfortable with the power of subscripting and the ability to do all kinds of calculations "in-line".

> So what do you think:
> 1)R is a phenomenally powerful and flexible tool and since you are going
> to do analyses in R you might as well use it to read data in and merge
> it and reshape it to whatever you need.

Definitely. I am not a computer scientist or a statistician. I usually am working as a single contractor and normally with small datasets as part of a larger project. R does what I want, usually very elegantly (albeit perhaps after a lot of headbanging and calls for help to the R-list), and it would be stupid for me to use more than one language when it is not needed.

Another plus is that I can easily leave my data analysis work and a working copy of R with the client. He/she may have a problem seeing what I did, but it is clearly readable & replicable by either the client or another consultant.

> OR
> 2) Are you crazy? Nobody in their right mind uses R to pipe
> the data around their lab and assemble it for analysis.

Well, I don't work in a lab, but why complicate things? If everyone is using the same tools then you have a good situation. Others who do work in labs can address this point more cogently.

From a personnel point of view, do you expect everyone in the lab to be proficient with R and, for example, Perl? What happens when/if you lose your Perl expert(s)? I've had occasions where I waited a week for data simply because the division's MS Access "expert" was on holiday and the only other "Access" person there only knew how to enter data and run the monthly reports. Anything more complicated required the "expert".
Thanks Laura,

I deal with huge data sets and have to do a lot of fancy juggling of data to get the job done in R. I have recently been granted access to a cluster at a university, which means 64-bit machines with 8 GB of memory, which could prove to be a saviour... hopefully.

Simon.

----- Original Message -----
From: Laura Arsanto
To: simon.pickett at bto.org
Sent: Wednesday, May 06, 2009 3:09 PM
Subject: RE: [R] Do you use R for data manipulation?

Dear Simon,

My job now is doing a benchmark between SAS and open source tools (like R, Weka, etc.) for data and text mining, so I'm using both of them. Personally I would prefer using the open source tools, and I really would do so if possible, but at the moment... they simply do not work! And in 99% of cases that is because of the dimensions of the data!

> From: simon.pickett at bto.org
> Subject: Re: [R] Do you use R for data manipulation?
> Date: Wed, 6 May 2009 15:01:33 +0100
>
> My institute uses SAS religiously, I am the only R "heathen".
>
> I have resisted learning to use SAS because I don't see the point after years
> of using R and I like being able to do everything using one program.
> However, my colleagues maintain that SAS is "better" for programming without
> really ever giving me a good reason why other than memory issues.
>
> I don't want to hijack the thread but would be interested in hearing some
> other views, especially since my organisation spends (wastes?) a lot of money
> every year on SAS licences...
>
> Simon.
>
> ----- Original Message -----
> From: "Laura Arsanto"
> Sent: Wednesday, May 06, 2009 2:53 PM
> Subject: Re: [R] Do you use R for data manipulation?
>
> I used R for my master's thesis (with great effort, anyway) and now I find it
> difficult to use R in my daily work, because it has really serious problems
> with datasets of large dimensions, both in the data manipulation step and in
> the analysis step.
>
> But I really would love to use it, as I like its transparency compared to
> other software.
>
> Laura
>
> > Date: Wed, 6 May 2009 06:42:45 -0700
> > From: jrkrideau at yahoo.ca
> > Subject: Re: [R] Do you use R for data manipulation?
> >
> > [...]
In my opinion, no statistician's toolbox should contain only one tool (even if it is as amazing a tool as R). Learning the different tools helps you appreciate when each is most appropriate to use and teaches you different ways of looking at problems.

There are some tasks for which I (it could easily differ for others) find it quickest to do the data extraction in Perl and then load the results into R.

Having said the above, I do admit that the percentage of time that I spend using tools other than R for working with data has gone down quite a bit with time. Three possible reasons:

1. my clients are getting better at giving me the data in appropriate forms
2. my proficiency with R continues to grow and I can better see how to do something using R
3. R continues to grow with more and more tools to help manage data.

And a possible fourth:

4. I am getting too lazy in my old age to switch to other programs.

While I like to think that I am having success at educating my clients, number 1 contributes only very little to the overall effect, 3 is definitely a big contributor, and hopefully 2 is part of the reason as well.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org On Behalf Of Farrel Buchinsky
> Sent: Tuesday, May 05, 2009 10:23 PM
> To: R
> Subject: [R] Do you use R for data manipulation?
>
> Is R an appropriate tool for data manipulation and data reshaping and
> data organizing? I think so but someone who recently joined our group
> thinks not.
> [...]
Another tool I find useful is Matthew Dowle's data.table package. It has very fast indexing, can have much lower memory requirements than a data frame, and has some built-in data manipulation capability. Especially with a 64-bit OS, you can use this to keep things in memory where you otherwise would have to use a database. See here: http://article.gmane.org/gmane.comp.lang.r.packages/282 - Tom
Simon Pickett <simon.pickett at bto.org> wrote:

>My institute uses SAS religiously, I am the only R "heathen".
>
>I have resisted learning to use SAS because I don't see the point after years
>of using R and I like being able to do everything using one program.
>However, my colleagues maintain that SAS is "better" for programming without
>really ever giving me a good reason why other than memory issues.
>
>I don't want to hijack the thread but would be interested in hearing some
>other views, especially since my organisation spends (wastes?) a lot of money
>every year on SAS licences...

I think that, to find out what SAS can do that R can't, or vice versa, you'd have to ask both SAS-L and R-help for some challenging data manipulation problems. Asking only on R-help, you will (naturally) get people who use R a lot and like it, and have figured out ways to do things with it. Similarly, if you asked only on SAS-L, you would find people who use SAS.

I have read a couple of statistics books where the authors use both, because they find SAS easier for some things (principally data manipulation and textual output) and R easier for others (especially fancy statistics and graphics).

Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com
>> Farrel Buchinsky wrote:
>>> Is R an appropriate tool for data manipulation and data reshaping and data
>>> organizing? I think so but someone who recently joined our group thinks
>>> not.
>>> The new recruit believes that python or another language is a far better
>>> tool for developing data manipulation scripts that can be then used by
>>> several members of our research group. Her assessment is that R is useful
>>> only when it comes to data analysis and working with statistical models.

If the project data are complex and heterogeneous, I use an SQL database for manipulating the data. Ideally, your data should be entered into the database at the point of creation; if not, then you are bound to be using Python, Perl, Java, bash, etc. programs to input the stuff. PostgreSQL is a very good choice these days; there are even generators available that automatically produce web forms for data entry and viewing, for those who have to use the web or can't be bothered with SQL. But once the data are in an SQL database, they are immensely more usable and manipulable in a natural way (what they used to call the data-centric way). From R, it is trivial to get the data into a data frame with automatic generation of column names.

Regards,
Kostas Savvidis
Nanjing University

PS: Perl, Python, and so on are definitely not to be pushed onto everybody by the "expert" in the lab. But perhaps SQL is, especially if you would like a web interface to your data.

-----------------------------------------------
Histion Partners LP and Nanjing University
tel: +8625 8622 8040 (h)
+86 13451 911 944 (m)
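As a rough illustration of how a query result lands in a data frame with its column names already set, here is a minimal sketch using the DBI interface with an in-memory SQLite database. This is a stand-in chosen so the example is self-contained; it is not the Sybase/RODBC or PostgreSQL setup discussed in the thread, and the table and column names are invented. The code only runs if the DBI and RSQLite packages are installed.

```r
# Guarded so the example is skipped where DBI or RSQLite are missing
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # In-memory SQLite database as a stand-in for a real server
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Load a made-up patients table into the database
  DBI::dbWriteTable(con, "patients",
                    data.frame(patient_id = 1:3, age = c(34, 51, 47)))

  # The query result comes back as a data frame; column names are
  # taken from the SELECT list
  older <- DBI::dbGetQuery(
    con, "SELECT patient_id, age FROM patients WHERE age > 40")

  DBI::dbDisconnect(con)
  str(older)
}
```

With a real server, the connection call would be replaced by the one for the appropriate driver, while the query step would look much the same.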
Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? I think so but someone who recently joined our group thinks not.
> The new recruit believes that python or another language is a far better
> tool for developing data manipulation scripts that can be then used by
> several members of our research group. Her assessment is that R is useful
> only when it comes to data analysis and working with statistical models.

It's hard to shift people's individual preferences, but impressive objective comparisons are easy to come by. Ask her how many lines it would take to do this trivial R task in Python:

    data <- read.csv('original-data.csv')
    write.csv(data * 10, 'scaled-data.csv')

R's ability to do something to an entire data structure -- or a slice of it, or some other subset -- in a single operation is very useful when cleaning up data for presentation and analysis.

Also point out how easy it is to get data *out* of R, as above, not just into it, so you can then hack on it in Python, if that's the better language for further manipulation.

If she gives you static about how a few more lines are no big deal, remind her that it has long been observed that bug count tends to grow with line count. This has been known since the 70s.

While making your points, remember that she has a good one, too: R is not the only good language out there. You should learn Python while she's learning R.
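For anyone who wants to try the whole-table and slice operations mentioned above without having a data file at hand, here is a self-contained sketch. It writes a small, made-up numeric table to a temporary file first; the column names `a` and `b` are placeholders, and multiplying a data frame by a number like this assumes that every column is numeric.

```r
# Temporary files stand in for 'original-data.csv' and 'scaled-data.csv'
in_file  <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")

# A made-up, all-numeric table to work with
write.csv(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5)),
          in_file, row.names = FALSE)

data <- read.csv(in_file)

# Whole structure at once: every cell multiplied by 10
scaled <- data * 10

# Only a slice: scale column b, and only in rows where a > 1
partly <- data
rows <- partly$a > 1
partly[rows, "b"] <- partly[rows, "b"] * 10

# Getting the data back out; note that the data frame comes first
# and the file name second
write.csv(scaled, out_file, row.names = FALSE)
round_trip <- read.csv(out_file)
print(round_trip)
print(partly)
```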