Dear r-devel,

Among the R functions I have written and later shared with colleagues, there are five that I hope will become a part of the R base package. The tasks are neither specific nor marginal, so rather than creating one more 'misc' package, I would be happy if the R Development Core Team would adopt these functions and hammer them into shape.

The functions are available from http://students.washington.edu/arnima/s, and a short demo session below explains their behaviour. Sorry about the length of this message, but it should be a quick read.

> env()
  Environment                      Objects   Kb
1 .GlobalEnv                            19  982
2 package:lme4                         180  992
3 file:f:/gnu/home/r/toolbox.rdata      49  457
4 Autoloads                             44    0
5 package:base                        1657 8196

These environments are currently loaded. Verbose version of search().

> ll()
          Class       Kb
chinook   data.frame  66
chinook.0 glm        244
coho      data.frame 127
coho.0    glm        466
fig2      function    21
fig3      function    13
table3    data.frame   1
table4    data.frame   2
x         numeric      1
y         integer      1
z         list         5

My workspace contains these objects. Verbose version of ls(). I think package:R.oo provides something similar.

> elem(coho)
            Class     Kb
<row.names> character 21
Species     factor     7
Estuary     ordered    8
EstSize     factor     7
EstSizeLog  numeric   14
EstNatural  numeric   14
Oyster      factor     7
RelYear     factor     7
SSTsummer   numeric   14
Survival    numeric   14
TxF         numeric   14

The coho data frame contains these elements. This compactly describes the data frame without overlapping with summary() or describe() in package:Hmisc. I use this function when choosing an appropriate storage mode for elements in large data frames. It has also helped me locate errors in imported data (a numeric column containing both . and , as decimal separators shows up with class factor).

> elem(coho.0, dim=T)
                  Class       Kb Dim
coefficients      numeric      0 1
residuals         numeric     35 1768
fitted.values     numeric     35 1768
effects           numeric     35 1768
R                 matrix       0 1 x 1
rank              integer      0 1
qr                list        35 5
family            family       7 11
linear.predictors numeric     35 1768
deviance          numeric      0 1
aic               numeric      0 1
null.deviance     numeric      0 1
iter              integer      0 1
weights           numeric     35 1768
prior.weights     numeric     35 1768
df.residual       numeric      0 1
df.null           numeric      0 1
y                 numeric     35 1768
converged         logical      0 1
boundary          logical      0 1
model             data.frame  50 1768 x 2
call              call         1 5
formula           formula      0 3
terms             terms        1 3
data              data.frame 127 1768 x 10
offset            NULL         0
control           list         0 3
method            character    0 1
contrasts         NULL         0
xlevels           NULL         0

This gives me a good idea of the elements of this GLM.

> is.what(y)
[1] "is.atomic"  "is.finite"  "is.integer" "is.numeric" "is.vector"

Now I know which is.* tests are positive on an integer object. Inspired by demo(is.things).

> keep(fig2, fig3)
[1] "chinook"   "chinook.0" "coho"      "coho.0"    "table3"    "table4"
[7] "x"         "y"         "z"

Shows which objects will be removed if I'm sure.

> keep(fig2, fig3, sure=T)

The default workspace has been cleared; only fig2 and fig3 were kept.

---

I understand that functions for the base package have to be selected very carefully, but I believe the functions above can save a significant amount of time and effort for many R users. My implementations are reasonably generic and robust, but I'm hoping the R Development Core Team will adopt and improve them.

Regards,
Arni Magnusson (fish biologist)
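The real functions are at the URL above. For readers who only want to see the idea, the following is a minimal base-R sketch of three of them. It is not Arni's code. The names ll_sketch(), is_what_sketch() and keep_sketch(), their arguments, and the use of object.size() for the size column are assumptions made for illustration.

```r
# Hypothetical base-R sketches (NOT the author's implementations) of
# ll(), is.what() and keep() from the demo session.

# ll()-like: class and approximate size (kB, via object.size) of every object
ll_sketch <- function(envir = globalenv()) {
  nms  <- ls(envir = envir)
  objs <- mget(nms, envir = envir)
  data.frame(
    Class = unname(vapply(objs, function(o) class(o)[1], character(1))),
    kB    = unname(vapply(objs, function(o) round(as.numeric(object.size(o)) / 1024),
                          numeric(1))),
    row.names = nms,
    stringsAsFactors = FALSE
  )
}

# is.what()-like: which is.* predicates in package:base return a single TRUE for x.
# This also picks up some S3 methods and replacement functions; any candidate
# that errors or warns when called with x alone is simply counted as negative.
is_what_sketch <- function(x) {
  fns <- ls("package:base", pattern = "^is\\.")
  hits <- vapply(fns, function(f) {
    res <- tryCatch(get(f, envir = baseenv())(x),
                    error = function(e) NULL, warning = function(w) NULL)
    isTRUE(res)
  }, logical(1))
  fns[hits]
}

# keep()-like: remove everything except the named objects, but only if sure = TRUE
keep_sketch <- function(..., sure = FALSE, envir = globalenv()) {
  # Bare object names to keep, taken unevaluated from the call
  kept   <- vapply(as.list(match.call(expand.dots = FALSE)$...), deparse, character(1))
  doomed <- setdiff(ls(envir = envir), kept)
  if (!sure) return(doomed)   # dry run: just report what would be removed
  rm(list = doomed, envir = envir)
  invisible(doomed)
}
```

As in the demo, keep_sketch(fig2, fig3) only returns the names that would be removed. Adding sure = TRUE actually removes them. Making the dry run the default is the safety feature here, because rm() cannot be undone.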
Good points, Martin. Thanks for looking at this.

I have to admit I had only briefly looked at str() and ls.str() until I read your message. I guess they seemed to me more like tools for developers than for end users, but now I realize str() is a valid contender alongside dim(), summary.data.frame(), describe(), and now elem() for viewing data frames 'from the outside'.

Granted, different users are looking for different information. I often use dim(), summary(), and describe(), so I implemented elem() in a way that wouldn't overlap with those. Here is how compact and easy to read the output is for my coho data frame (1768x10):

                non-whitespace characters   lines
dim(coho)                               6       1
elem(coho)                            184      12
str(coho)                             543      11
summary(coho)                         688       8
describe(coho)                       2242      70

The main reason I include element size (sorry about the misspelling of kB) is simply that no other function provides it recursively, yet the user might be interested. It makes it easier to evaluate how much the data containers could be shrunk by removing certain elements, coercing to other storage modes, or using a matrix. This is mainly relevant for very large datasets, and perhaps for beginners who are familiarizing themselves with different storage modes.

The object size is more important in the ll() output. When I tidy my workspace, the heavy objects are often the first to go, but sometimes vice versa (using the keep() function). It also helps me spot major data frames and models in a sea of objects.

One way to avoid the unwanted function name ll() would be to take the Unix analogy all the way and add an argument to ls(). Those who want could then define

  ll <- function(...) ls(..., long=TRUE)

or something along those lines. The compactness stats are:

                       non-whitespace characters   lines
ls()                                          74       1
ll()                                         156      12
ll(dim=T)                                    197      12
ls.str(max.level=-1)                         353      12

Discussing the core info-utilities in R is worth the time. Perhaps others will comment on what information they look for, and how they go about getting it.
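The argument about element sizes can be made concrete in base R. The sketch below has two parts. First, an elem()-like helper reports the class, approximate size, and length or dimensions of each element of a list-like object. Second, a short comparison shows how much a change of storage mode can save. The helper name elem_sketch() is hypothetical. It relies on object.size(), which, as Martin notes below, is only approximate.

```r
# Hypothetical elem()-like helper (not the author's code): class, approximate
# size in kB, and length or dimensions of each element of a list-like object
elem_sketch <- function(x) {
  stopifnot(is.list(x))  # data frames and glm objects are lists underneath
  data.frame(
    Class = unname(vapply(x, function(el) class(el)[1], character(1))),
    kB    = unname(vapply(x, function(el) as.numeric(object.size(el)) / 1024, numeric(1))),
    Dim   = unname(vapply(x, function(el) {
      d <- if (is.null(dim(el))) length(el) else dim(el)
      paste(d, collapse = " x ")
    }, character(1))),
    row.names = names(x),
    stringsAsFactors = FALSE
  )
}

# Storing whole numbers as integer instead of double roughly halves the size
v <- rep(c(1, 2, 3), 1e4)          # double: 8 bytes per value, roughly 240,000 bytes
print(object.size(v))
print(object.size(as.integer(v)))  # integer: 4 bytes per value, roughly 120,000 bytes
```

Calling elem_sketch(coho.0) on a fitted glm would list its elements in the same spirit as elem(coho.0, dim=T) above. The sizes come from object.size() and are therefore estimates, not exact memory use.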
My only feedback so far is from colleagues who are using my functions.

Cheers,
Arni

On Thu, 21 Aug 2003, Martin Maechler wrote:

> Thank you, Arni.
>
> Note that "base" already has
> str() which seems more useful than elem()
> -- at least for human inspection, str() does not produce output;
>
> ls.str() building on str() which is somewhat
> related to your ll() {the name of which would be too short for R base}.
>
> As a matter of fact, for a really compact ls.str() output, I'd
> have to change the default value of 'max.level = 0' to '-1' and
> change 'max.level = 0' to mean no recursion into the list
> structure at all.
>
> Why is it important for you to know sizes in kB or MB, instead
> of just length/dim ? Note that object.size() basically
> recursively builds on length() {and typeof()}, but still is
> only approximative.
>
> Regards,
> Martin
>
> Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
> Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
> ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
> phone: x-41-1-632-3408 fax: ...-1228 <><