Hi, In an effort to learn S4 objects, I am trying to port some c based tree code to S4 object. My immediate goal is to use .Call() interface for calling my c code from R. My long term goal is to understand how to write c structs that follows S4 classes and not the other-way-around. The c struct for the node is : typedef struct node{ int c; int n; inode **nodes; //length = n } inode; I am trying to define corresponding S4 class. The closest I could come was: setClassUnion("listOrNULL",members=c("list", "NULL")) setClass(Class = "Inode", representation(c "numeric",n="numeric",nodes="listOrNULL")) This does not work, as setting an element of the list of nodes in a node object overwrites the original object. I have read that there are dirty ways of writing code that gets around this issue. For example, using the assign() operator. I am not even sure if it will work. I would appreciate if someone suggest ways in which this problem is best approached. Is it possible to have recursive data-structures within R ? What is the best way of writing an interface with pre-existing c code? What are the best practices for deveolping new code in c for such purposes? Thanks, Russ [[alternative HTML version deleted]]
Martin Morgan
2010-Jan-24 17:40 UTC
[Rd] recursive data-structures in R - An S4 "node" Class
Hi Russ -- some ideas below... On 01/23/2010 06:18 PM, rt wrote:> Hi, > > In an effort to learn S4 objects, I am trying to port some c based tree > code to S4 object. My immediate goal is to use .Call() interface for > calling my c code from R. My long term goal is to understand how to write c > structs that follows S4 classes and not the other-way-around. > > The c struct for the node is : > typedef struct node{ > int c; > int n; > inode **nodes; //length = n > } inode; > > I am trying to define corresponding S4 class. The closest I could come was: > setClassUnion("listOrNULL",members=c("list", "NULL")) > setClass(Class = "Inode", representation(c > "numeric",n="numeric",nodes="listOrNULL")) > > This does not work, as setting an element of the list of nodes in a node > object overwrites the original object. I have read that there are dirty ways > of writing code that gets around this issue. For example, using the assign() > operator. > I am not even sure if it will work. > > I would appreciate if someone suggest ways in which this problem is best > approached. > Is it possible to have recursive data-structures within R ?Maybe for your C structure I'd have started with setClass("INodeList", contains="list") setValidity("INodeList", function(object) { ok <- sapply(object, is, "INode") if (!all(ok)) "'INodeList' elements must all be 'INode'" else TRUE }) setClass("INode", representation=representation( c="integer", n="integer", nodes="INodeList")) and then> tree <- new("INode", n=2L,nodes=new("INodeList", list( new("INode"), new("INode", n=1L, nodes=new("INodeList", list(new("INode"))))))) creates a recursive structure. One might update with> anode <- new("INode", n=1L,nodes=new("INodeList", list(new("INode", c=1L, n=0L))))> tree at nodes[[2]]@nodes[[1]] <- anodeor write a recursive procedure (e.g., to count the number of nodes) setGeneric("icount", function(x) standardGeneric("icount")) setMethod(icount, "INode", function(x) { res <- sapply(x at nodes, icount) if (length(res) == 0) 1 else sum(res) + 1 })> icount(tree)[1] 5 or maybe if 'c' were meant to be the number of nodes below the current setGeneric("icumm", function(x) standardGeneric("icumm")) setMethod("icumm", "INode", function(x) { if (length(x at nodes) == 0) new("INode", x, c=1L) else { nodes = new("INodeList", lapply(x at nodes, icumm)) c <- 1L + sum(sapply(nodes, slot, "c")) new("INode", x, c=c, nodes=nodes) } })> t1 <- icumm(tree) > t1 at c[1] 5> t1 at nodes[[1]]@c[1] 1> t1 at nodes[[2]]@c[1] 3 All of this says 'yes, recursive data structures are possible within [S4 classes in] R'. But something like icumm makes a full copy of 'tree' (probably several more!, actually, thinking about it, probably many more since the last new("INodes", ...) likely recopies all the nodes below the current; rewriting this to reduce the copying [a revised algorithm, rather than clever R tricks or jumping to C [as ?rapply does] would be an interesting exercise for the algorithmically inclined reader), and the result is a different object, rather than an updated 'tree'. This is the copy-on-change semantics of R, and you'll not get around it easily (the usual approach is to use an environment, but then your R user accustomed to copy-on-change is surprised with action-at-a-distance, where say tree2 <- tree and changing 'tree' unexpectedly alters the value of 'tree2'). Another interesting exercise for the reader is to use standard OOP patterns to replace the conditional code with method dispatch, e.g., with a derived class NullINode rather than the R-centric classUnion.> What is the best way of writing an interface with pre-existing c code?'Best' is a strong word. If you're trying to capture data structures that have well-developed C-level support, then perhaps what you want is to return an 'ExternalPtr'; your S4 class would have a slot containing the ExternalPtr, and any access is via methods that end up doing .Call to a C-level manipulation on the pointer's content. You have some responsibility to make sure that the object is managed appropriately (e.g., with a finalizer that frees memory when the ExternalPtr is no longer referenced in R). You also need to protect or prepare your user for the action-at-distance problems mentioned above. Hope that helps. Martin> What are the best practices for deveolping new code in c for such purposes? > > Thanks, > > Russ > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793