I have just committed some changes to R-devel (which will become R 2.5.0
next spring) to add source references to parsed R code. Here's a
description of the scheme:

The design is done through 2 old-style classes.

"srcfile" corresponds to a source file: it contains a filename, the
working directory in which that filename is to be interpreted, the last
modified timestamp of the file at the time the object is created, plus
some internal components. It is implemented as an environment so that
there can be multiple references to it.

"srcref" is a reference to a particular range of characters (as the
parser sees them; I think that really means bytes, but I haven't tested
with MBCSs) in a source file. It is implemented as a vector of 4
integers (first line, first column, last line, last column), with the
srcfile as an attribute.

The parser attaches a srcref attribute to each complete statement as it
gets parsed, if option("useSource") is TRUE. (I've left the old source
attribute in place as well for functions; I think it won't be needed in
the long run, but it is needed now.)

When printing an object with a srcref attribute, print.default tries to
read the srcfile to obtain the text. If it fails, it falls back to an
ugly display of the reference. Using a new argument useSource=FALSE in
printing will stop this attempt: when printing language, it will
deparse; when printing a srcref, it will print the ugly fallback.

source(echo=T) will echo all the lines of the file including comments
and formatting. demo() does the same, and I would guess Sweave will do
this too, but I haven't tested that yet. I think this will improve
Sweave output, but will need changes to the input file: people may have
comments there that they don't want shown. Some sort of
"useSource=FALSE" option will need to be added.

The browser used with debug() etc. will display statements as they were
formatted in the original source. It will not display leading or
following comments, but will display embedded comments.

Parsing errors display the name of the source file that was parsed, and
display verbose error messages describing what's wrong. This display
could still be improved, e.g. by displaying the whole source line with a
pointer to the error, instead of just the text up to the location of the
error.

I plan to add some sort of equivalent of C "#line" directives, so that
preprocessed source files (e.g. the concatenated source that is
installed) can include references back to the original source files, for
syntax error reporting, and/or debugging. This will require
modification of the INSTALL process, but I haven't started on this yet.

It would probably be a good idea to have some utility functions to play
with the srcref records for debugging and other purposes, but I haven't
written those yet. For example, the current source record on a function
could be replaced with a srcref, but only by expanding the srcref to
include some of the surrounding comments.

Comments and problem reports are welcome.

Duncan Murdoch
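
The two classes described above can be poked at directly. What follows is only a sketch, assuming the srcfile() and srcref() constructors as they exist in a current R release; the internals of the 2006 R-devel snapshot may well differ, and the temporary file and variable names are made up for illustration.

```r
# Sketch for a current R release; not code from the original message.
# The demo file and every variable name here are hypothetical.
tmp <- tempfile(fileext = ".R")
writeLines(c("x <- 1  # set x",
             "y <- x +",
             "  2"), tmp)

# A srcfile is an environment holding the filename, working directory
# and last-modified timestamp of the file.
sf <- srcfile(tmp)
is.environment(sf)
sf$filename
sf$wd
sf$timestamp

# A srcref built from 4 integers:
# first line, first column, last line, last column.
ref <- srcref(sf, c(2L, 1L, 3L, 3L))

# The srcfile travels with the reference as an attribute.
attr(ref, "srcfile")

# Reading the referenced range back from the file.
txt <- as.character(ref)
cat(txt, sep = "\n")
```

Because the srcfile is an environment, any number of srcref objects can point at the same one without copying it.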

On 11/25/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> I have just committed some changes to R-devel (which will become R 2.5.0
> next spring) to add source references to parsed R code.
> [...]
> Comments and problem reports are welcome.

I haven't tested this, but the idea seems useful. Will this have any
effect on code parsed using parse(text = "...")? Can it be extended to
have some such effect?

I ask because this is relevant in the context of Sweave, where I have
always wanted the ability to retain the original formatting. I'm
currently testing a patch that allows me to do this specifically for
Sweave, but a more general solution is obviously preferable.

-Deepayan

A few days ago Brian Ripley pointed out a bug with the design of this,
so I've changed it. See the notes below if you were trying to work with
it.

On 11/25/2006 1:51 PM, Duncan Murdoch wrote:
> I have just committed some changes to R-devel (which will become R 2.5.0
> next spring) to add source references to parsed R code. Here's a
> description of the scheme:
>
> The design is done through 2 old-style classes.
>
> "srcfile" corresponds to a source file: it contains a filename, the
> working directory in which that filename is to be interpreted, the last
> modified timestamp of the file at the time the object is created, plus
> some internal components. It is implemented as an environment so that
> there can be multiple references to it.
>
> "srcref" is a reference to a particular range of characters (as the
> parser sees them; I think that really means bytes, but I haven't tested
> with MBCSs) in a source file. It is implemented as a vector of 4
> integers (first line, first column, last line, last column), with the
> srcfile as an attribute.
>
> The parser attaches a srcref attribute to each complete statement as it
> gets parsed, if option("useSource") is TRUE. (I've left the old source
> attribute in place as well for functions; I think it won't be needed in
> the long run, but it is needed now.)

This is the part that changed. The srcref attribute is no longer
attached to each statement, because some statements are objects that
can't have attributes. Now a list of srcref objects is attached to the
container of the statements: the expression() list in the case of
parse(), or the call to "{" which is how the parser stores a block of
code.

> When printing an object with a srcref attribute, print.default tries to
> read the srcfile to obtain the text. If it fails, it falls back to an
> ugly display of the reference. Using a new argument useSource=FALSE in
> printing will stop this attempt: when printing language, it will
> deparse; when printing a srcref, it will print the ugly fallback.
>
> source(echo=T) will echo all the lines of the file including comments
> and formatting. demo() does the same, and I would guess Sweave will do
> this too, but I haven't tested that yet. I think this will improve
> Sweave output, but will need changes to the input file: people may have
> comments there that they don't want shown. Some sort of
> "useSource=FALSE" option will need to be added.

As discussed, this facility was added to Sweave, but currently (and
probably permanently) defaults to not being turned on.

> The browser used with debug() etc. will display statements as they were
> formatted in the original source. It will not display leading or
> following comments, but will display embedded comments.

I think the debugger will now only use deparsed output, since the srcref
is no longer part of the statement.

> Parsing errors display the name of the source file that was parsed, and
> display verbose error messages describing what's wrong. This display
> could still be improved, e.g. by displaying the whole source line with a
> pointer to the error, instead of just the text up to the location of the
> error.
>
> I plan to add some sort of equivalent of C "#line" directives, so that
> preprocessed source files (e.g. the concatenated source that is
> installed) can include references back to the original source files, for
> syntax error reporting, and/or debugging. This will require
> modification of the INSTALL process, but I haven't started on this yet.

I haven't done this yet, and I'm not sure I'll have time to get to it
before 2.5.0.

> It would probably be a good idea to have some utility functions to play
> with the srcref records for debugging and other purposes, but I haven't
> written those yet. For example, the current source record on a function
> could be replaced with a srcref, but only by expanding the srcref to
> include some of the surrounding comments.

This hasn't been done either.

> Comments and problem reports are welcome.

That's still true.

Duncan Murdoch

> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
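
The revised scheme (a list of srcref objects on the container rather than one srcref per statement) can be inspected in a current R release. This is a minimal sketch and not code from the thread: it assumes the keep.source argument of parse() as found in recent R versions, and the sample code text and all variable names are invented for illustration.

```r
# Illustrative only: the sample code text and variable names are hypothetical.
code <- c("f <- function(x) {",
          "  # double the input",
          "  x * 2",
          "}",
          "y <- f(21)")

# keep.source = TRUE asks the parser to record source references
# (assumed to behave as in recent R releases).
exprs <- parse(text = code, keep.source = TRUE)

# The individual statements carry no srcref attribute ...
is.null(attr(exprs[[2]], "srcref"))

# ... but the expression() container holds a list with one srcref
# per top-level statement.
refs <- attr(exprs, "srcref")
length(refs) == length(exprs)

# Original text of the first statement, comment and layout included.
first_stmt <- as.character(refs[[1]])
cat(first_stmt, sep = "\n")

# Deparsing ignores the reference and regenerates the code instead.
deparse(exprs[[2]])

# Inside the function, the list hangs off the call to "{".
fun_body <- exprs[[1]][[3]][[3]]
block_refs <- attr(fun_body, "srcref")
is.list(block_refs)
```

Keeping the references on the container sidesteps the problem mentioned above: a statement that is a bare symbol or constant cannot carry attributes, but the expression() list and the "{" call always can.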