Markus Weisner
2010-Jul-19 20:56 UTC
[R] to extend data.frame or not ... that is the question
I am designing a package to analyze fire department data. As part of this package I created an S4 class called CAD (short for Computer Aided Dispatch data) that essentially extends the data.frame class. When the user initializes a CAD object, a data.frame object is broken down into separate incident data and unit response data, which are stored in separate slots. This separation is intentional because typically a user would want to analyze either incident data (e.g. plot fires by day of week) or response data (e.g. what is the 90th percentile response time for each unit).

The two data sets are linked by a common incident number and, in preparing the data for analysis, they need to remain linked. For instance, it may be found that certain incidents (in the incident slot) had incident type "remove"; the user would want to remove all of those incidents from both the incident table and the response table.

In messing around with my package, I found the easiest (and most intuitive) way of cleaning data was to have the CAD object work exactly like a data.frame, but store the data in separate slots. To accomplish this, I set up methods that merge the data back together and treat the object as a data.frame. In some cases, such as head() and tail(), the result is left as a data frame. In other cases, such as subset() and the indexing functions, the result is converted back to a CAD object.

I ran into some problems implementing the rbind() method but, in the process of researching the issue, realized I may be reinventing the wheel here. I realized that maybe CAD should be defined as an extension of the data.frame class using setAs() and a defined coercion method. I played around with trying to get this to work, including trying things such as setOldClass() to convert the S3 data.frame to an S4 class, but with no luck. I also tried reading portions of John Chambers' book, Software for Data Analysis: Programming with R.
Although the book really goes into detail about how to extend data.frames, I can't seem to make it work. My code is below. I am looking to answer the following questions:

Should I extend data.frame?
If so, what code would I use to do this?
If I do not extend the data.frame class, how would I implement an rbind method that would allow me to bind CAD objects?

Thanks in advance for any help,

Cheers,
Markus

##########################################

### define CAD class
setClass('CAD',
         representation=representation(responses='data.frame', incidents='data.frame'),
         prototype=list(responses=data.frame(), incidents=data.frame()))

### define conversion between data.frame and S4 CAD object -- used to initialize new objects
as.CAD = function(from, ...) attributes(from, ...)
setGeneric("as.CAD")
setMethod("as.CAD", "data.frame", function(from, row.names=NULL, optional=FALSE, ...){
  #new("NFIRS", data=x)
  new("CAD",
      incidents=unique(from[,c("incident_num", "incident_type")]),
      responses=unique(from[,c("incident_num", "unit", "response_time")]))
})

### define methods
setMethod("head", "CAD", function(x, n=5, row.names=NULL, optional=FALSE, ...){
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  head(tdata, n=n, row.names=row.names, optional=optional)
})

setMethod("tail", "CAD", function(x, n=5, row.names=NULL, optional=FALSE, ...){
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  tail(tdata, n=n, row.names=row.names, optional=optional)
})

setMethod("$", "CAD", function(x, name){
  merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)[,name]
})

setReplaceMethod("$", "CAD", function(x, name, value) {
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  tdata[,name] <- value
  x = as.CAD(tdata)
  if(validObject(x)) x
})

setMethod("subset", "CAD", function(x, subset_index, row.names=NULL, optional=FALSE, ...){
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  tdata = subset(tdata, subset_index)
  as.CAD(tdata)
})

setMethod("[", "CAD", function(x, i, j, ..., drop=TRUE) {
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  tdata[i, j, drop=drop]
})

setReplaceMethod("[", "CAD", function (x, i, j, ..., value) {
  tdata = merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
  tdata[i, j] <- value
  x = as.CAD(tdata)
  if(validObject(x)) x
})

setMethod("as.data.frame", "CAD", function(x, row.names=NULL, optional=FALSE, ...){
  merge(x@responses, x@incidents, by="incident_num", all.x=TRUE)
})

### create example object
DF = data.frame(incident_num=c(1,2,2,3,4,4),
                incident_type=c("EMS", "FIRE", "FIRE", "EMS", "FIRE", "FIRE"),
                unit=c("E1", "E5", "T1", "E3", "E1", "T1"),
                response_time=c(300,400,350,250,500,200))
data = as.CAD(DF)

### test methods on CAD example
head(data)
tail(data)
subset(data, data$unit %in% c("E5", "T1"))
as.data.frame(data[2, c("incident_num", "unit")])

###################################################

Markus Weisner, Firefighter Medic and GIS Analyst
Charlottesville Fire Department
203 Ridge Street
Charlottesville, Virginia 22901
(434) 970-3240
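
On the rbind question, one possible approach (a sketch only, not a tested part of the package) is to skip the merge step and bind the two slots separately. The example below assumes the same two-slot CAD class as in the post and defines a method on rbind2(), the S4 generic in the methods package with signature (x, y, ...). The constructor makeCAD() is a hypothetical stand-in for as.CAD(), and the unique() calls assume that an incident appearing in both objects should be kept once.

```r
library(methods)

# Same two-slot layout as the CAD class in the post
setClass("CAD",
         representation(responses = "data.frame", incidents = "data.frame"),
         prototype(responses = data.frame(), incidents = data.frame()))

# Hypothetical helper (not from the post): split one flat data.frame
# into the two slots, mirroring what as.CAD() does
makeCAD <- function(df) {
  new("CAD",
      incidents = unique(df[, c("incident_num", "incident_type")]),
      responses = unique(df[, c("incident_num", "unit", "response_time")]))
}

# Bind two CAD objects slot by slot; unique() drops rows that appear
# in both objects (an assumption about the desired behaviour)
setMethod("rbind2", signature(x = "CAD", y = "CAD"), function(x, y, ...) {
  new("CAD",
      incidents = unique(rbind(x@incidents, y@incidents)),
      responses = unique(rbind(x@responses, y@responses)))
})

a <- makeCAD(data.frame(incident_num  = c(1, 2, 2),
                        incident_type = c("EMS", "FIRE", "FIRE"),
                        unit          = c("E1", "E5", "T1"),
                        response_time = c(300, 400, 350)))
b <- makeCAD(data.frame(incident_num  = c(3, 4, 4),
                        incident_type = c("EMS", "FIRE", "FIRE"),
                        unit          = c("E3", "E1", "T1"),
                        response_time = c(250, 500, 200)))

ab <- rbind2(a, b)
nrow(ab@incidents)  # one row per distinct incident
nrow(ab@responses)  # one row per distinct unit response
```

Calling rbind2() directly avoids relying on how rbind() itself dispatches for S4 arguments. Whether rbind(a, b) reaches this method depends on the R version, so that is worth checking against ?rbind before relying on it.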