Bengoechea Bartolomé Enrique (SIES 73)
2009-Jul-08 08:15 UTC
[Rd] Suggestion: Dimension-sensitive attributes
Hi,

I agree with Henrik that his suggestion to have "dimension vector attributes" working like dimnames (see below) would be an extremely useful infrastructure addition to R.

If this is not considered for R-core, I am happy to try to implement this in a package, as a new class, and possibly do the same thing for data frames. Should you have any comments, ideas or suggestions about it, please share!

Best,

Enrique

-----------------------------------------------------------------------------
Subject:
From: Henrik Bengtsson <hb_at_stat.berkeley.edu>
Date: Sun, 07 Jun 2009 14:42:08 -0700

Hi,

maybe this has been suggested before, but would it be possible, without breaking too much existing code, to add other "dimension vector attributes" in addition to 'dimnames'? These attributes would then be subsetted just like dimnames.

Something like this:

> x <- array(1:30, dim=c(2,3,5))
> dimnames(x) <- list(c("a", "b"), c("a1", "a2", "a3"), NULL);
> dimattr(x, "misc") <- list(1:2, list(x=1:5, y=letters[1:8], z=NA), letters[1:5]);

> y <- x[,1:2,2:3]
> str(dimnames(y))
List of 3
 $ : chr [1:2] "a" "b"
 $ : chr [1:2] "a1" "a2"
 $ : NULL

> str(dimattr(y, "misc"))
List of 3
 $ : int [1:2] 1 2
 $ :List of 2
  ..$ x: int [1:5] 1 2 3 4 5
  ..$ y: chr [1:8] "a" "b" "c" "d" ...
 $ : chr [1:2] "b" "c"

I can imagine this needs to be added in several places, and functions such as is.vector() need to be updated, etc. It is not a quick migration, but is it something worth considering for the future?

/Henrik
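For readers who want to try Henrik's example today, here is a minimal sketch of how it could be emulated in plain R. It rests on several assumptions: dimattr() and `dimattr<-`() are hypothetical functions (they do not exist in base R), all values are stored in one attribute named "dimattr", and an explicit helper subset_dims() stands in for a modified "[". The helper takes one index per dimension in a list, with TRUE meaning "the whole dimension".

```r
# Hypothetical helpers: dimattr(), `dimattr<-`() and subset_dims() are
# illustrative names, not part of base R.

`dimattr<-` <- function(x, name, value) {
  # one element per dimension, each NULL or as long as that dimension
  stopifnot(is.list(value), length(value) == length(dim(x)))
  lens_ok <- mapply(function(v, n) is.null(v) || length(v) == n,
                    value, dim(x))
  stopifnot(all(lens_ok))
  da <- attr(x, "dimattr")
  if (is.null(da)) da <- list()
  da[[name]] <- value
  attr(x, "dimattr") <- da
  x
}

dimattr <- function(x, name) attr(x, "dimattr")[[name]]

subset_dims <- function(x, idx) {
  stopifnot(is.list(idx), length(idx) == length(dim(x)))
  # ordinary subsetting: keeps dim and dimnames, drops other attributes
  y <- do.call(`[`, c(list(x), idx, list(drop = FALSE)))
  da <- attr(x, "dimattr")
  if (!is.null(da)) {
    # subset every stored value along each dimension, like dimnames
    attr(y, "dimattr") <- lapply(da, function(value) {
      Map(function(v, i) v[i], value, idx)
    })
  }
  y
}

x <- array(1:30, dim = c(2, 3, 5))
dimnames(x) <- list(c("a", "b"), c("a1", "a2", "a3"), NULL)
dimattr(x, "misc") <- list(1:2,
                           list(x = 1:5, y = letters[1:8], z = NA),
                           letters[1:5])

y <- subset_dims(x, list(TRUE, 1:2, 2:3))   # stands in for x[, 1:2, 2:3]
str(dimattr(y, "misc"))
```

Using an explicit index list avoids having to handle empty arguments such as x[, 1:2, ] inside a "[" method, which is where most of the real work would be.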
There have been times when I've thought this could be useful too.

One way to go about it could be to introduce a special attribute that controls how attributes are dealt with in subsetting, e.g., "attr.dimname.like". The contents of this would be character data; on subsetting, any attribute whose name appears in this vector would be treated like dimnames, i.e. subsetted along each dimension. At the same time, it might be nice to also introduce "attr.keep.on.subset", which would specify which attributes should be kept on the result of a subsetting operation (could be useful for attributes that specify units).

This of course could be a way of implementing Henrik's suggestion: dimattr(x, "misc") <- value would add "misc" to the "attr.dimname.like" attribute and also set the attribute "misc".

The tricky part would be modifying the "[" methods. However, the most useful would probably be the one for ordinary matrices and arrays, and others could be modified when and if their maintainers see the need.

-- Tony Plate
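A rough sketch of how the two control attributes described by Tony could drive subsetting. This is only an illustration under stated assumptions: subset_with_attrs() is a hypothetical stand-in for a modified "[" (it takes one index per dimension in a list), and this `dimattr<-`() follows the behaviour described in the message rather than any existing function.

```r
# Sketch only: subset_with_attrs() and `dimattr<-`() are hypothetical.

`dimattr<-` <- function(x, name, value) {
  # register the attribute name as dimnames-like, then set the attribute
  attr(x, "attr.dimname.like") <- union(attr(x, "attr.dimname.like"), name)
  attr(x, name) <- value
  x
}

subset_with_attrs <- function(x, idx) {
  stopifnot(is.list(idx), length(idx) == length(dim(x)))
  y <- do.call(`[`, c(list(x), idx, list(drop = FALSE)))
  dim_like <- attr(x, "attr.dimname.like")
  keep <- attr(x, "attr.keep.on.subset")
  # attributes listed as dimnames-like are subsetted per dimension
  for (nm in dim_like) {
    attr(y, nm) <- Map(function(v, i) v[i], attr(x, nm), idx)
  }
  # attributes listed as "keep" are copied unchanged
  for (nm in keep) attr(y, nm) <- attr(x, nm)
  # the control attributes travel with the result
  attr(y, "attr.dimname.like") <- dim_like
  attr(y, "attr.keep.on.subset") <- keep
  y
}

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
attr(m, "units") <- "kg"
attr(m, "attr.keep.on.subset") <- "units"
dimattr(m, "misc") <- list(c(10, 20), c("lo", "mid", "hi"))

s <- subset_with_attrs(m, list(2, c(1, 3)))   # stands in for m[2, c(1, 3)]
attr(s, "units")
attr(s, "misc")
```

The cost mentioned later in the thread is visible here: every subsetting call has to look up and interpret the control attributes before touching the data.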
Bengoechea Bartolomé Enrique (SIES 73)
2009-Jul-09 09:14 UTC
[Rd] Suggestion: Dimension-sensitive attributes
> If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting.

My proposed design would be that:

 * "objattr" would be a list of attributes (just preserved on subsetting)
 * "dimattr" would be a list with as many elements as array dimensions. Each element can be any object whose length matches the corresponding array dimension's length and that can itself be subsetted with "[": so it could be a vector, a list, a data frame...
 * "cellattr" would be any object whose dimensions match the array dimensions: another array, a data frame...

> In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

Indeed, that's the objective: attaching user-defined metadata to the actual data so that it is automatically kept in sync by subsetting operations. I've had dozens of use cases in my own R programs that needed this type of pattern, and have seen it implemented in different ways in several classes (xts, timeSeries, AnnotatedDataFrame, etc.). As you point out, this could offer a unified design for a common need.

Enrique

-----Original Message-----
From: Heinz Tuechler [mailto:tuechler at gmx.at]
Sent: Thursday, 09 July 2009 10:56
To: Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; r-devel at r-project.org
Cc: Henrik Bengtsson
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes

At 10:01 09.07.2009, SIES 73 wrote:
>I've also had several use cases where I needed "cell-like" attributes,
>that is, attributes that have the same dimensions as the original array
>and are subsetted in the same way --along all its dimensions.
>
>So we're talking about a way to add metadata to matrices/arrays at 3
>possible levels:
>
>   1) at the "whole object" level: attributes that are not dropped on subsetting
>   2) at the "dimension" level: attributes that behave like "dimnames", i.e. subsetted along each dimension
>   3) at the "cell" level: attributes that are subsetted in the same way as the original array
>
>My proposal would be simpler than Tony's suggestion: like "dimnames",
>just have reserved attribute names for each case, say "objdata",
>"dimdata", and "celldata" (or "objattr", "dimattr" and "cellattr").

If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

>On the other hand, Tony's pattern would allow as many attributes of
>each type as necessary (some multiplicity is already possible with the
>simpler design as dimdata or celldata could be lists of lists), at the
>cost of a more complex scheme of attributes that needs to be "parsed"
>each time.
>
>On Tony's suggestion, "attr.keep.on.subset" and "attr.dimname.like"
>(and possibly "attr.cell.like") could be kept on a single list with 3
>elements, something like:
>
>   > attr(x, "attr.subset.with") <- list(object=..., dims=..., cells=...)
>
>Would something like this make sense for R-core --either for standard
>arrays or as a new class-- or would it be better implemented in a
>package?
>
>Enrique
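The three levels with reserved attribute names can be sketched as follows. This is an illustration under assumptions, not a proposal for the actual implementation: subset_meta() is a hypothetical helper standing in for "[" (one index per dimension, passed as a list), the "dimattr" elements are restricted to plain vectors or lists, and "cellattr" is assumed to be an array of the same shape as the data.

```r
# Hypothetical helper; "objattr", "dimattr" and "cellattr" are the reserved
# attribute names suggested in the thread.
subset_meta <- function(x, idx) {
  stopifnot(is.list(idx), length(idx) == length(dim(x)))
  take <- function(a) do.call(`[`, c(list(a), idx, list(drop = FALSE)))
  y <- take(x)

  # level 1, whole object: copied unchanged
  attr(y, "objattr") <- attr(x, "objattr")

  # level 2, dimension: each element subsetted with its dimension's index
  da <- attr(x, "dimattr")
  if (!is.null(da)) attr(y, "dimattr") <- Map(function(v, i) v[i], da, idx)

  # level 3, cell: subsetted exactly like the data
  ca <- attr(x, "cellattr")
  if (!is.null(ca)) attr(y, "cellattr") <- take(ca)
  y
}

x <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 2)
attr(x, "objattr") <- list(units = "kg")
attr(x, "dimattr") <- list(c("ctl", "trt"), c("d1", "d2", "d3"))
# e.g. a per-cell quality flag
attr(x, "cellattr") <- matrix(c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
                              nrow = 2)

y <- subset_meta(x, list(2, 2:3))   # stands in for x[2, 2:3, drop = FALSE]
attr(y, "objattr")
attr(y, "dimattr")
attr(y, "cellattr")
```

Because the names are fixed, the helper never has to inspect a control attribute to decide how each piece of metadata is treated, which is the simplification argued for above.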
Starting by working on an interface for such object(s) is probably the first step toward a unified solution, and this should come before deciding whether and how R attributes are used. It would also help to ensure a smooth transition from the existing classes implementing a similar solution (first the interface is added to those classes, then after a grace period the classes are eventually refactored).

Dimension-level is what seems to be the most needed... but I am not convinced of the practicality of the object-level and cell-level schemes proposed:

- Object-level, if not linked to any dimension attribute, amounts to saying that one wants to attach anything to any object. That's what attr() is already doing.

- Cell-level is maybe out of scope for a first trial (but maybe I missed the use cases for it).

If starting with behaviour, it seems to boil down to having "["/"[<-" and "dimmeta()"/"dimmeta<-()":

- extract "[" / replace "[<-":
  * keeps working the way it already does
  * extracts a subset of the object as well as a subset of the dimension-associated metadata.
  * departing too much from the way "[" works, or adding behind-the-curtain name matching, will only compromise the chances of adoption.
  * forget about the bit about which metadata is kept and which isn't when using "[". Make a function "unmeta()" (similar in behavior to "unname()") to drop them all, or work it out with something like

    > dimmeta(x, 1) <- NULL   # drop the metadata associated with dimension 1

- access the dimension-associated metadata:
  * maybe a function called "dimmeta()" (for consistency with "dimnames()")? The signature could be dimmeta(x, i), with x the object and i the dimension requested. A replacement function "dimmeta<-"(x, i, value) would be provided.

In the abstract, the "names" associated with a given dimension are just one of the possible kinds of metadata, but I'd keep away from meddling with them for a start.

It would seem natural for the metadata associated with one dimension to be a table-like object (data.frame seems natural in R, and unfortunately there is no data.frame-like structure in R).

L.
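To make the interface above concrete, here is a minimal sketch of dimmeta(), `dimmeta<-`() and unmeta(). None of these functions exist in R; the names and signatures come from the message, while everything else is an assumption: the metadata is stored in an attribute called "dimmeta", each dimension's metadata is a data.frame with one row per index, and subset_dimmeta() is a hypothetical explicit helper used in place of a "[" method.

```r
# Hypothetical interface sketch; not existing R functions.

dimmeta <- function(x, i) attr(x, "dimmeta")[[i]]

`dimmeta<-` <- function(x, i, value) {
  nd <- length(dim(x))
  stopifnot(i >= 1, i <= nd)
  # NULL clears the slot; otherwise one row of metadata per index
  stopifnot(is.null(value) ||
              (is.data.frame(value) && nrow(value) == dim(x)[i]))
  dm <- attr(x, "dimmeta")
  if (is.null(dm)) dm <- vector("list", nd)
  dm[i] <- list(value)   # list(NULL) empties the slot without removing it
  attr(x, "dimmeta") <- dm
  x
}

# drop all dimension metadata, in the spirit of unname()
unmeta <- function(x) {
  attr(x, "dimmeta") <- NULL
  x
}

subset_dimmeta <- function(x, idx) {
  stopifnot(is.list(idx), length(idx) == length(dim(x)))
  y <- do.call(`[`, c(list(x), idx, list(drop = FALSE)))
  dm <- attr(x, "dimmeta")
  if (!is.null(dm)) {
    # keep the rows of each metadata table that match the selected indices
    attr(y, "dimmeta") <- Map(function(d, i) {
      if (is.null(d)) NULL else d[i, , drop = FALSE]
    }, dm, idx)
  }
  y
}

x <- matrix(1:6, nrow = 2)
dimmeta(x, 2) <- data.frame(label = c("height", "weight", "age"),
                            unit = c("cm", "kg", "yr"),
                            stringsAsFactors = FALSE)

y <- subset_dimmeta(x, list(TRUE, 2:3))   # stands in for x[, 2:3]
dimmeta(y, 2)

z <- y
dimmeta(z, 2) <- NULL   # drop the metadata associated with dimension 2
```

Keeping the per-dimension metadata as a data.frame means that several descriptors (label, unit, and so on) can share one row-wise subsetting step.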