Dear all,

I'd like to have a data frame store information about the units of the data it contains.

Below is a minimal example of what I do so far: I add a "units" attribute to the data frame. But I don't like the long syntax needed to access the unit of a given variable, namely something like:

var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame, "names"))]]

Can anybody point me to a better solution?

Thanks in advance,

Bruno.

# Dataframe creation
x <- c(1:10)
y <- c(11:20)
z <- c(101:110)
my_frame <- data.frame(x, y, z)
attr(my_frame, "units") <- c("x_unit", "y_unit")

#
# later on, using dataframe
for (var_name in c("x", "y")) {
  idx <- match(var_name, attr(my_frame, "names"))
  var_unit <- attr(my_frame, "units")[[idx]]
  print(paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
}
Marc Schwartz
2011-Oct-03 15:08 UTC
[R] Best method to add unit information to dataframe ?
On Oct 3, 2011, at 9:35 AM, bruno Piguet wrote:

> I'd like to have a dataframe store information about the units of
> the data it contains.
> [...]
> I add a "units" attribute to the dataframe. But I don't like the long
> syntax needed to access the unit of a given variable.

The problem is that there are operations on data frames (e.g. subset()) that will end up stripping your attributes.

> str(my_frame)
'data.frame':   10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: int  11 12 13 14 15 16 17 18 19 20
 $ z: int  101 102 103 104 105 106 107 108 109 110
 - attr(*, "units")= chr  "x_unit" "y_unit"

> newDF <- subset(my_frame, x <= 5)
> str(newDF)
'data.frame':   5 obs. of  3 variables:
 $ x: int  1 2 3 4 5
 $ y: int  11 12 13 14 15
 $ z: int  101 102 103 104 105

You might want to look at either ?comment or the ?label function in Frank's Hmisc package on CRAN, either to use directly or as example code for how he handles this.

HTH,

Marc Schwartz
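As a rough illustration of the ?comment suggestion above, here is a minimal base-R sketch that stores each unit on its own column instead of on the data frame, so that no position matching is needed. The unit strings are the placeholders from the original post. One caveat to keep in mind: attributes on plain vectors are generally dropped by `[`, so row subsetting may still lose them unless a class with its own subsetting method is used.

```r
x <- 1:10
y <- 11:20
z <- 101:110
my_frame <- data.frame(x, y, z)

# Attach the unit to each column rather than to the data frame as a whole.
comment(my_frame$x) <- "x_unit"
comment(my_frame$y) <- "y_unit"

# Later on: the unit travels with the column, so no index lookup is needed.
for (var_name in c("x", "y")) {
  col <- my_frame[[var_name]]
  print(paste0("max ", var_name, ": ", max(col), " ", comment(col)))
}
```

Columns without a comment, such as z here, simply return NULL from comment().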
Hi Bruno,

It sounds like what you want is really a separate class, one that stores information about units for each variable. This is far from an elegant example, but depending on your situation it may be useful.

I create a new class inheriting from the data frame class. This is likely fraught with problems because a formal S4 class is inheriting from an informal S3 one. A data frame can then be stored in the .Data slot (special---I did not make it), and character data can also be stored in the units slot (which I did define). You could get fancier by imposing constraints, for example that the length of units be equal to the number of columns in the data frame.

S3 methods for data frames should still mostly work, but you also have the ability to access the new units slot. You could define special S4 methods to do the extraction, if you wanted, so that your ultimate syntax to get the units of a particular variable would be shorter.

setOldClass("data.frame")

setClass("mydf", representation(units = "character"),
  contains = "data.frame", S3methods = TRUE)

tmp <- new("mydf")
tmp@.Data <- mtcars
tmp@row.names <- rownames(mtcars)
tmp@units <- c("x", "y")

## data frameish
colMeans(tmp)
tmp + 10
# but
tmp@units

Cheers,

Josh

N.B. I've read Chambers' book once and skimmed it again, but I still do not have a solid grasp of S4, so I may have made some fundamental blunder in the example.

On Mon, Oct 3, 2011 at 7:35 AM, bruno Piguet <bruno.piguet at gmail.com> wrote:

> I'd like to have a dataframe store information about the units of
> the data it contains.
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
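The two ideas mentioned in this message, a constraint on the units slot and an S4 method for shorter extraction syntax, could be sketched roughly as follows. The class name unit_frame and the generic unit_of are hypothetical names chosen for illustration, and keying the units by column name is an assumption rather than something the message specifies.

```r
library(methods)

# Hypothetical class: a data frame plus a named character vector of units.
setClass("unit_frame",
         contains = "data.frame",
         representation(units = "character"),
         validity = function(object) {
           # Every named unit must refer to an existing column.
           unknown <- setdiff(names(object@units), object@names)
           if (length(unknown) > 0) {
             paste("units given for unknown columns:",
                   paste(unknown, collapse = ", "))
           } else {
             TRUE
           }
         })

# Hypothetical accessor giving the short syntax asked for in the thread.
setGeneric("unit_of", function(obj, var) standardGeneric("unit_of"))
setMethod("unit_of", "unit_frame", function(obj, var) obj@units[[var]])

uf <- new("unit_frame",
          data.frame(x = 1:10, y = 11:20, z = 101:110),
          units = c(x = "x_unit", y = "y_unit"))

unit_of(uf, "x")
```

Keying the units by name rather than by position means that columns without a unit (z here) do not need a placeholder entry. The caveat from the message still applies: data frame operations that return a plain data.frame will not carry the units slot along.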
Gabor Grothendieck
2011-Oct-03 16:56 UTC
[R] Best method to add unit information to dataframe ?
On Mon, Oct 3, 2011 at 10:35 AM, bruno Piguet <bruno.piguet at gmail.com> wrote:

> I'd like to have a dataframe store information about the units of
> the data it contains.
> [...]

The Hmisc package has some support for this:

library(Hmisc)
DF <- data.frame(x, y, z)
units(DF$x) <- "my x units"
units(DF$y) <- "my y units"
units(DF$x)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com