Hi,

I find that a typical workflow for me looks something like this:

1) import some data from files
2) mess around with the data for a while
3) mess around with plotting for a while
4) get a plot or analysis that looks good
5) go back through my history to make a list of the shortest command sequence to recreate the plot or analysis
6) send out that sequence to colleagues, along with the generated plots or analysis output

I wonder if there are any tools people have developed to help with step 5. Typically I do something like this:

5a) save my entire history to a text file
5b) open it up in Emacs
5c) prune any lines that don't have assignment operators
5d) prune any plotting commands that were superseded by later plots

and then start on other more subtle stuff like pruning assignments that were later overwritten, unless the later assignments have variable overlap between the LHS and the RHS. Then I just start eyeballing it.

Would any deeper introspection of the history expressions be feasible, e.g. detecting statements that have no side effects, dead ends, etc.?

The holy grail would be something like "show me all the statements that contributed to the current plot" or the like.

Thanks.

--
Ken Williams
Research Scientist
The Thomson Reuters Corporation
Eagan, MN
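Step 5c above can be approximated by parsing the history rather than matching text, so that multi-line commands and right-hand assignments (`->`) are handled. The following is only a minimal sketch: `keep_assignments` and `history_lines` are hypothetical names, and `history_lines` stands in for the contents of a file written by `savehistory()`. Note that this heuristic also drops calls that matter for their side effects, such as `library()` or `set.seed()`.

```r
# Keep only top-level assignment expressions from a history transcript.
# `keep_assignments` is a hypothetical helper, not an existing R function.
keep_assignments <- function(history_lines) {
  # parse() turns the text into an expression vector, one element per command
  exprs <- parse(text = history_lines, keep.source = FALSE)
  is_assignment <- vapply(exprs, function(e) {
    # the first element of a call is the function being called
    is.call(e) && as.character(e[[1]])[1] %in% c("<-", "<<-", "=")
  }, logical(1))
  # turn the surviving expressions back into source text
  vapply(exprs[is_assignment],
         function(e) paste(deparse(e), collapse = "\n"),
         character(1))
}

# Stand-in for lines read from a saved history file
history_lines <- c(
  "x <- c(1, 2, 3)",
  "summary(x)",
  "y = x * 2",
  "plot(x, y)",
  "mean(y) -> m"
)

pruned <- keep_assignments(history_lines)
print(pruned)
```

The parser rewrites `mean(y) -> m` as `m <- mean(y)`, so right-hand assignments come back in the usual left-hand form.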
on 07/30/2008 01:12 PM Ken Williams wrote:

> Hi,
>
> I find that a typical workflow for me looks something like this:
>
> 1) import some data from files
> 2) mess around with the data for a while
> 3) mess around with plotting for a while
> 4) get a plot or analysis that looks good
> 5) go back through my history to make a list of the shortest command
> sequence to recreate the plot or analysis
> 6) send out that sequence to colleagues, along with the generated plots
> or analysis output
>
> I wonder if there are any tools people have developed to help with step
> 5. Typically I do something like this:
>
> 5a) save my entire history to a text file
> 5b) open it up in Emacs
> 5c) prune any lines that don't have assignment operators
> 5d) prune any plotting commands that were superseded by later plots
>
> and then start on other more subtle stuff like pruning assignments that
> were later overwritten, unless the later assignments have variable
> overlap between the LHS and the RHS. Then I just start eyeballing it.
>
> Would any deeper introspection of the history expressions be feasible,
> e.g. detecting statements that have no side effects, dead ends, etc.
>
> The holy grail would be something like "show me all the statements that
> contributed to the current plot" or the like.
>
> Thanks.

I (and many others) use ESS (Emacs Speaks Statistics), in which case, I have an R source buffer in the upper frame and an R session in the lower frame.

In my particular case, I also happen to use ECB (Emacs Code Browser) which also has a left hand column spanning the full vertical length, to provide access to other things (file browser, R function and data objects, etc.). It also helps integrate Sweave/LaTeX functionality to further centralize things and increase productivity. I have also tied in Subversion functionality to enable me to engage in version control of my code and other key files.
I do all of my editing in the upper frame and use the built-in ESS functions to submit the code to the R session. This also provides for code syntax highlighting, which makes it easier to visualize code as well as to check for things like matching parens/braces, etc.

In this way, your working code (including comments) is kept functionally intact in the upper frame and you can edit and use it without having to scroll through a long history of commands (which is still there if you need it).

More information here:

http://ess.r-project.org/

HTH,

Marc Schwartz
On 7/31/08 11:01 AM, "hadley wickham" <h.wickham at gmail.com> wrote:

> I think that would be a very hard task -

Well, at least medium-hard. But I think significant automatic steps could be made, and then a human can take over for the last few steps. That's why I was enquiring about "tools" rather than a complete solution.

Does R provide facilities for introspection or interrogation of expression objects? I couldn't find anything useful on first look:

  > methods(class="expression")
  no methods were found
  > dput(expression(foo <- 5 * bar))
  expression(foo <- 5 * bar)
  > str(expression(foo <- 5 * bar))
  expression(foo <- 5 * bar)

> it's equivalent to taking a
> long rambling conversation and then automatically turning it into a
> concise summary of what was said. I think you must have human
> intervention.

It's not really equivalent, natural language has ambiguities and subtleties that computer languages, especially functional languages, intentionally don't have. By their nature, computer languages can be turned into parse trees unambiguously and then those trees can be manipulated.

But coincidentally I work in a Natural Language Processing group, and one of the things we do is create exactly the kind of concise summaries you describe. =)

--
Ken Williams
Research Scientist
The Thomson Reuters Corporation
Eagan, MN
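On the introspection question: an expression object can be indexed like a list, each element is a call whose parts can be pulled apart, and base R's `all.vars()` reports the variable names a call mentions. Building on that, the sketch below walks a history backwards and keeps only the assignments a chosen variable depends on, which is one rough reading of "show me all the statements that contributed". `backward_slice` is a hypothetical helper written for illustration. It assumes only top-level assignments matter, so it ignores side effects of other calls (`library()`, `set.seed()`, `par()`, `assign()`) and should be treated as a first pass before a human takes over.

```r
# Part 1: pulling an expression apart with base R
ex <- expression(foo <- 5 * bar)
cl <- ex[[1]]                  # the call `foo <- 5 * bar`
parts <- as.list(cl)           # the function `<-`, the target, the right-hand side
vars_all <- all.vars(cl)       # every variable mentioned in the call
vars_rhs <- all.vars(cl[[3]])  # only the variables on the right-hand side

# Part 2: hypothetical helper that keeps only the assignments `target`
# depends on, scanning from the last command to the first.
backward_slice <- function(history_lines, target) {
  exprs <- parse(text = history_lines, keep.source = FALSE)
  needed <- target                   # variables whose definitions are still wanted
  keep <- logical(length(exprs))
  for (i in rev(seq_along(exprs))) {
    e <- exprs[[i]]
    is_assign <- is.call(e) &&
      as.character(e[[1]])[1] %in% c("<-", "<<-", "=")
    if (!is_assign) next
    lhs <- all.vars(e[[2]])[1]       # variable being defined or modified
    if (is.na(lhs) || !(lhs %in% needed)) next
    keep[i] <- TRUE
    if (is.symbol(e[[2]])) {
      # A plain overwrite: earlier definitions of lhs matter only if the
      # right-hand side uses lhs again (the LHS/RHS overlap case).
      needed <- union(setdiff(needed, lhs), all.vars(e[[3]]))
    } else {
      # A partial update such as x[i] <- v still needs the old x.
      needed <- union(needed, all.vars(e))
    }
  }
  vapply(exprs[keep],
         function(e) paste(deparse(e), collapse = "\n"),
         character(1))
}

history_lines <- c(
  "x <- 1:10",
  "y <- x^2",          # later overwritten without being used
  "z <- rnorm(10)",    # dead end
  "y <- sqrt(x)",
  "x <- x + 1",        # overlap between LHS and RHS, so the first x is kept
  "summary(z)",
  "fit <- lm(y ~ x)"
)

sliced <- backward_slice(history_lines, "fit")
print(sliced)
```

Here the slice for `fit` drops the first definition of `y`, everything involving `z`, and the `summary()` call, while retaining both assignments to `x` because the second one reads the first.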
JGR's "Copy Commands" command works well for me (even if it is both fascinating and embarrassing how little is sometimes left over). It retains only commands that worked, so it is still not the minimum possible.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218
antony.unwin@math.uni-augsburg.de
http://stats.math.uni-augsburg.de/
> 5a) save my entire history to a text file
> 5b) open it up in Emacs
> 5c) prune any lines that don't have assignment operators
>
> Ken Williams
> Research Scientist
> The Thomson Reuters Corporation
> Eagan, MN

No one has yet mentioned the obvious. ESS does your 5a 5b 5c with

    M-x ess-transcript-clean-buffer

It works in either the *R* buffer or a *.rt or *.st buffer. It handles multiple-line commands correctly.

Make sure the buffer is writable (C-x C-q on the *.rt buffer)

    M-x ess-transcript-clean-buffer

Save the buffer as a *.r file.

On automatic content analysis, that is tougher. I would be scared to do your

> 5d) prune any plotting commands that were superseded by later plots

because I don't know what supersede means. I can imagine situations, for example,

    par(mfrow=c(1,2))
    plot(y ~ x)
    x <- x + 1
    plot(y ~ x)

where I want to keep both plots. You also have to trust that there are no side effects, which I wouldn't want to do, because plot() changes the value of par() parameters.
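The point about side effects can be checked directly. The sketch below is only an illustration: it assumes a null PDF device (`pdf(NULL)`) so that it runs without opening a window or writing a file, and it compares one graphics parameter, the user coordinate range `par("usr")`, before and after a `plot()` call.

```r
# Open a null PDF device so nothing is drawn on screen or written to disk
pdf(NULL)

before <- par("usr")   # user coordinate limits on a fresh device
plot(1:10)             # a plotting call with no assignment on the line
after <- par("usr")    # limits now reflect the data that was plotted

invisible(dev.off())

# The plot changed graphics state even though nothing was assigned
print(identical(before, after))
```

Because of this, a pruning rule that looks only at assignment operators cannot tell whether a later command depends on state left behind by an earlier `plot()`.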
Ken,

Others have given hints on pruning the history, but are you committed to doing it this way?

An alternative would be something more like sink: when you get to a place where you know you want to start saving commands, you run a function to start saving them, then at the end you run a command to stop saving them.

One tool for doing this is in the TeachingDemos package, see the help on ?txtStart. The main goal of this set of functions was more to save a transcript of a session (including graphical output if you use the etxtStart interface and an external tool), but it has a possible side effect of saving the commands issued in a file that could be 'source'd to rerun the set of commands (which seems similar to what you want). Commands (actually expressions) that result in an error are not included, and you can use the txtSkip function to run a command without saving it in the file (for things like "?plot" that you don't want to rerun). This may give you what you want, or at least something that needs less editing to get at what you want.

Another option would be to take the source code for the above utilities and add some checks that decide whether to save the command or not (check if an assignment was made, check if any 'par'ameters were changed, etc.).

Another option, if you just want some code to recreate the current plot, is to look at the plot2script function in the TeachingDemos package. It will create a script (put on the clipboard by default) to recreate the current plot. It does NOT use the same set of commands that you used to create the plot, but rather low-level commands; still, it creates a script that you can edit to recreate the plot with just your changes. The current version needs some edits (line wrapping, fixing the box command) before the script can be run, but it may be another place to start.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ken Williams
> Sent: Wednesday, July 30, 2008 12:13 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] History pruning
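The idea of recording a command only when it runs without error can be illustrated in base R. This is a minimal sketch under stated assumptions, not how txtStart is implemented: `run_and_record` is a hypothetical helper, commands are passed in as text, and a temporary file and a scratch environment stand in for a real script file and workspace.

```r
# Hypothetical helper, not part of TeachingDemos: evaluate one command given
# as text and append it to `logfile` only if it ran without error.
run_and_record <- function(cmd, logfile, envir = globalenv()) {
  ok <- tryCatch({
    eval(parse(text = cmd), envir = envir)
    TRUE
  }, error = function(e) FALSE)   # parse and run-time errors both end up here
  if (ok) cat(cmd, "\n", file = logfile, append = TRUE, sep = "")
  invisible(ok)
}

logfile <- tempfile(fileext = ".R")   # stand-in for a script file
work <- new.env()                     # scratch workspace for the session

run_and_record("x <- c(2, 4, 6)", logfile, work)
run_and_record("y <- x + undefined_name", logfile, work)  # fails, so not recorded
run_and_record("m <- mean(x)", logfile, work)

recorded <- readLines(logfile)
print(recorded)

# The recorded file can be rerun with source() in a clean environment
replay <- new.env()
source(logfile, local = replay)
```

Further checks of the kind suggested above (for example, recording a command only if it is an assignment or changes `par()` settings) could be added inside the helper before the `cat()` call.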