Florian Sihler
2022-Dec-06 14:08 UTC
[R] Preexisting Work on Data- and Control-Flow Analysis
Hello R-help mailing list,

I hope I've found the correct mailing list for my question (if not, please point me to the correct one).

For my master's thesis I plan to design and implement a program-slicing algorithm for R programs using (probably only static) data- and control-flow analysis. While researching the problem, I was unable to find any preexisting work on the matter. Does anyone here know of any preexisting work on data- and control-flow analysis (or even program slicing) in the context of R programs? I would be really glad for any pointer in the right direction (or reasons why doing this would be a stupid idea).

Regarding my background: I am a computer science student and usually program in C++, Java, TypeScript, and Haskell. Although I've worked with R for roughly a year now (mostly in my spare time), I am still getting used to some constructs.

Thank you,
Florian
Richard O'Keefe
2022-Dec-08 05:15 UTC
[R] Preexisting Work on Data- and Control-Flow Analysis
You should probably look at R's byte-code compiler (the compiler package).

One issue with data- and control-flow analysis in R is that

    f <- function (x, y) x + y
    f(ping, pong)

may invoke an S3 method (see ?S3groupGeneric, Ops) or an S4 method (see ?Arith), which might not have existed when f was analysed. Indeed, in

    f <- function (x, y) { foo(x); bar(y); x + y }

the calls to foo and/or bar may change the method(s) bound to +, so the bindings may change while f is running.

Then there's the whole "imperative programming with lazy function argument evaluation" thing, which is definitely going to make analysis a wee bit challenging.

And then there's this little gem:

    > x <- 17
    > f <- function () {
    +     cat(x, "\n")
    +     x <- 42
    +     cat(x, "\n")
    + }
    > f()

The first and the last occurrences of x in f refer to *different* variables. The last occurrence refers to a local variable, but that local variable did not exist until it was assigned to. (Yep, the set of local variables of a function is *dynamic*.)

While there are oh so many ways that R can make life horrible for analysis, even R programmers don't go out of their way to make life difficult for themselves. It will probably be good enough to define a "sane" subset of R. A tool that reports that an R function falls outside that subset will be useful in its own right, because most of the time straying outside it won't be intentional.

I set out to write an R compiler 20+ years ago and filled several exercise books with issues. I suggest you start by considering the question "what variables are in the environment at this control point?" Start with ?get, ?assign, ?exists, and ?rm.
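To make the dispatch point concrete, here is a minimal sketch in which the + inside f ends up calling an S3 method that did not exist when f was defined. The class name "money", the Ops.money method and the dispatched log are hypothetical, made up for illustration; nothing in the body of f reveals which + will run.

```r
# f is defined before any method for the (hypothetical) class "money" exists.
f <- function(x, y) x + y

res_plain <- f(1, 2)  # plain numeric addition

# Hypothetical S3 group method, defined only after f. It assumes binary use
# (e1 and e2 both supplied), logs that it ran, and re-applies the operator
# to the unclassed values.
dispatched <- character(0)
Ops.money <- function(e1, e2) {
  dispatched <<- c(dispatched, .Generic)  # .Generic is the operator name, e.g. "+"
  v <- get(.Generic)(unclass(e1), unclass(e2))
  structure(v, class = "money")
}

ping <- structure(1, class = "money")
pong <- structure(2, class = "money")

# Same f, unchanged, but its + now dispatches to Ops.money.
res_money <- f(ping, pong)
```

A static view of f sees only a call to +; which function that is depends on the classes of the arguments and on which methods are visible at call time.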
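For the point about foo and bar changing the meaning of + while f is running, a minimal sketch, assuming a hypothetical foo that installs an Ops method for a made-up class "tag" as a side effect (bar does nothing here):

```r
# Hypothetical helpers: foo (re)defines the S3 Ops method for the made-up
# class "tag" in the global environment as a side effect; bar does nothing.
foo <- function(x) {
  assign("Ops.tag", function(e1, e2) "patched", envir = globalenv())
  invisible(NULL)
}
bar <- function(y) invisible(NULL)

f <- function(x, y) { foo(x); bar(y); x + y }

a <- structure(1, class = "tag")
b <- structure(2, class = "tag")

before <- a + b   # no Ops.tag yet: default arithmetic, class attribute is kept
after <- f(a, b)  # foo() installs Ops.tag, so the + inside f dispatches to it
```

The same expression x + y has a different meaning before and after the call to foo, so any analysis of f has to account for what its callees may do to the method tables.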
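A small sketch of why lazy argument evaluation combines badly with imperative code (trace_log, noisy and g are hypothetical names): an argument expression runs when it is first used inside the callee, not at the call site, so both the order and the very occurrence of its side effects depend on control flow inside the callee.

```r
trace_log <- character(0)

# Hypothetical helper: records that its argument expression was evaluated.
noisy <- function(label) {
  trace_log <<- c(trace_log, label)
  label
}

g <- function(a, b, use_a) {
  b              # forces b first, although a comes first at the call site
  if (use_a) a   # a is forced only on this path
  invisible(NULL)
}

g(noisy("a"), noisy("b"), use_a = FALSE)
after_first <- trace_log    # only "b": the expression for a never ran
g(noisy("a"), noisy("b"), use_a = TRUE)
after_second <- trace_log   # c("b", "b", "a"): b was evaluated before a
```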
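Along the lines of the suggested starting question, the following sketch uses exists(), assign() and rm() to show that the set of local variables, and therefore which variable a name refers to, can depend on the run-time path. The functions h, k and m are hypothetical, written only for illustration.

```r
x <- 17

# Hypothetical h: whether a *local* x exists depends on the argument.
h <- function(flag) {
  before <- exists("x", inherits = FALSE)  # FALSE: no local x yet
  if (flag) x <- 42                        # local x is created only on this path
  after <- exists("x", inherits = FALSE)
  list(before = before, after = after, value = x)  # local 42 or global 17
}

# Hypothetical k: the name of the new local variable is only known at run time.
k <- function(name) {
  assign(name, 1)                # assigns into k's own frame
  exists("y", inherits = FALSE)
}

# Hypothetical m: rm() deletes the local x, so the final x resolves
# to the global x instead.
m <- function(flag) {
  x <- 42
  if (flag) rm(x)
  x
}
```

In each case a single syntactic occurrence of a name can denote a local or a global variable depending on values that are only available when the function runs.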