Ross Boylan
2010-Jan-21 19:38 UTC
[Rd] Rgeneric.py assists in rearranging generic function definitions
I've attached a script I wrote that pulls all the setGeneric definitions out of a set of R files and puts them in a separate file, default allGenerics.R. I thought it might help others who find themselves in a similar situation.

The "situation" was that I had to change the order in which files in my package were parsed; the scheme in which the generic definition is in the "first" file that has the corresponding setMethod breaks under re-ordering. So I pulled out all the definitions and put them first.

In retrospect, it is clearly preferable to create allGenerics.R from the start. If you didn't, and discover you should have, the script automates the conversion.

Thanks to everyone who helped me with my packaging problems. The package finally made it to CRAN as http://cran.r-project.org/web/packages/mspath/index.html. I'll send a public notice of that to the general R list.

Ross Boylan
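To make the ordering problem concrete, here is a minimal sketch (Python 3, not part of Rgeneric.py) that scans R sources in collation order and reports any setMethod that appears before a setGeneric of the same name. The function name `methods_before_generics` and both regular expressions are illustrative assumptions; they only recognise a double-quoted name as the first argument, and generics that come from other packages will show up as false positives.

```python
import re

# Assumed call style: setGeneric("foo", ...) and setMethod("foo", ...),
# with the name as a double-quoted first argument.
GENERIC_RE = re.compile(r'setGeneric\(\s*"([^"]+)"')
METHOD_RE = re.compile(r'setMethod\(\s*"([^"]+)"')


def methods_before_generics(sources):
    """Return (name, file) pairs where setMethod precedes any setGeneric.

    sources is a list of (file_name, text) pairs in collation order.
    """
    seen = set()       # generic names defined so far
    problems = []
    for file_name, text in sources:
        for line in text.splitlines():
            # Crude comment stripping; a '#' inside a string also cuts the line.
            code = line.split("#", 1)[0]
            m = GENERIC_RE.search(code)
            if m:
                seen.add(m.group(1))
            m = METHOD_RE.search(code)
            if m and m.group(1) not in seen:
                problems.append((m.group(1), file_name))
    return problems
```

Calling it once with the files in the old order and once in the new order shows which methods would lose their generic after re-ordering.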
Ross Boylan
2010-Jan-25 19:46 UTC
[Rd] Rgeneric.py assists in rearranging generic function definitions [inline]
On Thu, 2010-01-21 at 11:38 -0800, Ross Boylan wrote:
> I've attached a script I wrote that pulls all the setGeneric definitions
> out of a set of R files and puts them in a separate file, default
> allGenerics.R. I thought it might help others who find themselves in a
> similar situation.
>
> The "situation" was that I had to change the order in which files in my
> package were parsed; the scheme in which the generic definition is in
> the "first" file that has the corresponding setMethod breaks under
> re-ordering. So I pulled out all the definitions and put them first.
>
> In retrospect, it is clearly preferable to create allGenerics.R from
> the start. If you didn't, and discover you should have, the script
> automates the conversion.
>
> Thanks to everyone who helped me with my packaging problems. The
> package finally made it to CRAN as
> http://cran.r-project.org/web/packages/mspath/index.html. I'll send a
> public notice of that to the general R list.
>
> Ross Boylan

Apparently the attachment didn't make it through. I've pasted Rgeneric.py below.

#! /usr/bin/python
# python 2.5 required for with statement
from __future__ import with_statement

# Rgeneric.py extracts setGeneric definitions from R sources and
# writes them to a special file, while removing them from the
# original.
#
# Context: In a system with several R files, having generic
# definitions sprinkled throughout, there are errors arising from the
# sequencing of files, or of definitions within files. In general,
# changing the order in which files are parsed (e.g., by the Collate:
# field in DESCRIPTION) will break things even when they were
# working. For example, a setMethod may occur before the
# corresponding setGeneric, and then fail. Given that it is not safe
# to call setGeneric twice for the same function, the cleanest
# solution may be to move all the generic definitions to a separate
# file that will be read before any of the setMethod's. Rgeneric.py
# helps automate that process.
#
# It is, of course, preferable not to get into this situation in the
# first place, for example by creating an allGenerics.R file as you
# go.

# Typical usage: ./Rgeneric.py *.R
# Will create allGenerics.R with all the extracted generic
# definitions, including any preceding comments.
# Rewrites the *.R files, replacing the setGeneric's with comments
# indicating the generic has moved to allGenerics.R.
# *.R.old has the original .R files.
#
# The program does not work for all conceivable styles. In
# particular, it assumes that
#   1. setGeneric is immediately followed by an open parenthesis and
#      a quoted name of the function. Subsequent parts of the
#      definition may be split across lines and have interspersed
#      comments.
#
#   2. Comments precede the definition. They are optional, and will
#      be left in place in the .R file and copied to allGenerics.R.
#
#   3. If you first define an ordinary function foo, and then do
#      setGeneric("foo") the setGeneric will be moved to
#      allGenerics.R. It will not work properly there; you should
#      make manual adjustments such as moving it back to the
#      original. The code at the bottom reports on all such
#      definitions, and then lists all the generic functions processed.
#
#   4. allGenerics.R will contain generic definitions in the order of
#      files examined, and in the order they are defined within the
#      file. This is to preserve context for the comments, in
#      particular for comments which apply to a block of
#      definitions. If you would like something else, e.g.,
#      alphabetical ordering, you should post-process the AllForKey
#      object created at the bottom of this file.
#
# There are program (not command line) options to do a read-only scan,
# and a class to hold the results, which can be inspected in various
# ways.

# Copyright 2010 Regents of University of California
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# See <http://www.gnu.org/licenses/> for the full license.

# Author: Ross Boylan <ross at biostat.ucsf.edu>
#
# Revision History:
#
# 1.0 2010-01-21 Initial release.

import os, os.path, re, sys

class ParseGeneric:
    """Extract setGeneric functions and preceding comments in one file.

    states of the parser:
    findComment -- look for start of comment
    inComment -- found comment; accumulate and look for end
    inGeneric -- extract setGeneric definition.

    Typical use:
    p = ParseGeneric()
    results = p.parse("myfile.R")
    or
    p.parse("myfile.R")
    results = p.generics()
    """

    def __init__(self):
        self.reStartComment = re.compile(r"^\s*#")
        self.reInComment = re.compile(r"^(\s*#)|(\s*$)", re.DOTALL)
        self.reStartGeneric = re.compile(r"^([^#]*)(\s*setGeneric\(\"([^\"]+)\".*)$", re.DOTALL)
        self._gfname = "allGenerics.R"
        if os.path.exists(self._gfname):
            os.remove(self._gfname)

    def parse(self, fname, makeOutput=True):
        "parse the entire file. Return list of generics"
        self._fname = fname
        self._state = self.findComment
        self._generics = [] # results will go here
        self._currentGeneric = None # holds current parse
        if makeOutput:
            ofname = fname+".new"
            self._ofname = ofname
            self._ofile = open(ofname, "w")
            self._gfile = open(self._gfname, "a")
        else:
            self._ofname = None
            self._ofile = None
            self._gfile = None
        try:
            with open(fname, "r") as fin:
                if self._gfile:
                    self._gfile.write("\n\n########## generics from %s #############\n\n"%fname)
                for line in fin:
                    self._state(line)
            return self.generics()
        finally:
            if makeOutput:
                self.cleanup()

    def cleanup(self):
        "Final processing when we output a revised file"
        if self._ofile:
            self._ofile.close()
            self._ofile = None
        if self._gfile:
            self._gfile.close()
            self._gfile = None
        backupName = self.fileName()+".old"
        if os.path.exists(backupName):
            os.remove(backupName)
        # on Unix, but not MS Windows, preceding step is unnecessary
        os.rename(self.fileName(), backupName)
        os.rename(self._ofname, self.fileName())

    def fileName(self):
        return self._fname

    def write(self, line):
        if self._ofile:
            self._ofile.write(line)

    def stripGeneric(self, pre, name):
        """strip generic function name from file.
        pre is preceding material on line, before setGeneric."""
        if not self._ofile:
            return
        if pre and not pre.isspace():
            self._ofile.write(pre+"\n")
        self._ofile.write("# %s generic definition stripped out"%name)
        if self._gfname and self._gfile:
            self._ofile.write(" and put in %s.\n"%(self._gfname))
        else:
            self._ofile.write(".\n")

    def currentGeneric(self):
        "Return current generic, creating it if necessary--for internal use"
        s = self._currentGeneric
        if s:
            return s
        s = SetGeneric(self.fileName())
        self._currentGeneric = s
        return s

    def findComment(self, line):
        "look for start of a comment"
        if self.reStartComment.match(line):
            self.currentGeneric().addComment(line)
            self._state = self.inComment
        if not self.checkGeneric(line):
            self.write(line)

    def inComment(self, line):
        "scan through a comment"
        if self.reInComment.match(line):
            self.currentGeneric().addComment(line)
        elif self.checkGeneric(line):
            return
        else:
            self._state = self.findComment
            self._currentGeneric = None
        self.write(line)

    def checkGeneric(self, line):
        "True if line starts generic definition"
        m = self.reStartGeneric.match(line)
        if m:
            self._state = self.inGeneric
            self._parenDepth = 0
            self._commas = 0
            self.stripGeneric(m.group(1), m.group(3))
            self.currentGeneric().setName(m.group(3))
            self.inGeneric(m.group(2))
            return True
        return False

    def inGeneric(self, line):
        "extract entire generic definition"
        i = 1 # 1 past current parse position
        for c in line:
            i += 1
            if c == "(":
                self._parenDepth += 1
            elif c == ",":
                self._commas += 1
            elif c == ")":
                self._parenDepth -= 1
                if self._parenDepth <= 0:
                    self.currentGeneric().addDef(line[0:i])
                    post = line[i:len(line)]
                    if not post.isspace():
                        self.write(post)
                    return self.makeGeneric(self._commas)
        self.currentGeneric().addDef(line)

    def makeGeneric(self, ncommas):
        "Record generic based on _currentGeneric. It has ncommas+1 arguments"
        self.currentGeneric().setNargs(ncommas+1)
        self._generics.append(self.currentGeneric())
        if self._gfile:
            self._gfile.write("%s\n"%(self.currentGeneric().asText()))
        self._currentGeneric = None
        self._state = self.findComment

    def generics(self):
        "return list of SetGeneric instances I found"
        return self._generics

class SetGeneric:
    """Describes a single generic function definition."""

    def __init__(self, sourceFile):
        "sourceFile <String> where this generic was defined"
        self._comment = []
        self._code = []
        self._file = sourceFile

    def addComment(self, line):
        "Add a line that is a comment"
        self._comment.append(line)

    def addDef(self, line):
        self._code.append(line)

    def setName(self, genericName):
        "set name of function being defined"
        self._name = genericName

    def setNargs(self, nargs):
        self._nargs = nargs

    def isFull(self):
        """True if generic definition is complete in itself, rather than
        relying on an existing regular function definition."""
        return self._nargs > 1

    def name(self):
        return self._name

    def file(self):
        "<String> file name where generic was defined"
        return self._file

    def hasComment(self):
        "True if there is a comment defined for me"
        return len(self._comment) > 0

    def comment(self):
        "return comment as a (possibly multi-line) string"
        return "".join(self._comment)

    def code(self):
        "return definition as (possibly multi-line) string"
        return "".join(self._code)

    def asText(self):
        return self.comment() + self.code()

    def __str__(self):
        "Summary description"
        if self.hasComment():
            wc = "with"
        else:
            wc = "without"
        if self.isFull():
            f = ""
        else:
            f = "(need prior plain fn def)"
        return "setGeneric(%s) %s comment %s"%(self.name(), wc, f)

class AllForKey:
    "track situations in which there may be more than one entry per key"

    def __init__(self):
        "values are lists containing the real values"
        self._dict = dict()

    def addKey(self, key, value):
        vs = self._dict.setdefault(key, [])
        vs.append(value)

    def keys(self):
        return self._dict.keys()

    def values(self, key):
        "return LIST of values for key"
        return self._dict[key]

    def duplicateKeys(self):
        "return a list of all keys with multiple entries"
        ks = [ k for (k, v) in self._dict.iteritems() if len(v)>1 ]
        return ks

p = ParseGeneric()
all = AllForKey()
for fn in sys.argv[1:len(sys.argv)]:
    #print fn
    xs = p.parse(fn)
    for x in xs:
        all.addKey(x.name(), x)

dups = all.duplicateKeys()
if dups:
    print "There were duplicates."
    dups.sort()
    for k in (dups[0:1]):
        print "%s: "%k ,
        for v in all.values(k):
            print "%s "%(v.file()) ,
        print
else:
    print "No Duplicates"

keys = all.keys()
keys.sort()
print "Report for all definitions"
for key in keys:
    print "%s: "%key ,
    for v in all.values(key):
        wc = "" if v.hasComment() else "without comment "
        f = "" if v.isFull() else "PARTIAL DEFINITION "
        if not v.isFull():
            print "%s%s in %s; "%(f, wc , v.file()) ,
    print
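
Point 4 in the header comments suggests post-processing the collected definitions if you want an order other than file order, e.g. alphabetical. A minimal sketch of that idea follows, written for Python 3 rather than the Python 2 of the script. The `Generic` tuple is a hypothetical stand-in for the script's SetGeneric class, and a plain dict of lists plays the role of AllForKey; neither name comes from Rgeneric.py.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for SetGeneric: just the fields this sketch needs.
# text is the comment plus definition, as asText() would return it.
Generic = namedtuple("Generic", ["name", "file", "text"])


def group_by_name(generics):
    """Map each generic name to the list of definitions using that name."""
    grouped = defaultdict(list)
    for g in generics:
        grouped[g.name].append(g)
    return grouped


def alphabetical_text(generics):
    """Join definitions sorted by generic name, then by source file."""
    grouped = group_by_name(generics)
    parts = []
    for name in sorted(grouped):
        for g in sorted(grouped[name], key=lambda d: d.file):
            parts.append(g.text)
    return "\n".join(parts)


def duplicate_names(generics):
    """Names defined more than once, sorted, like duplicateKeys()."""
    grouped = group_by_name(generics)
    return sorted(n for n, defs in grouped.items() if len(defs) > 1)
```

Sorting this way separates a definition from any block comment it shared with its neighbours, which is the reason the script keeps file order by default.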