Laurent Gautier
2008-Mar-12 21:06 UTC
[Rd] Subsetting vectors/arrays using factors can be seen as misleading
Dear list,

Subsetting vectors/arrays using factors can be seen as misleading, and I was thinking that it could be discouraged (at least by issuing a warning). I could not find whether this was discussed earlier, but I can be pointed to a reference if I missed any.

The "extract" operator "[" can take as arguments either vectors of integers or vectors of characters in order to subset a data structure. For example:

> x <- seq(1, 5)
> names(x) <- letters[1:5]
>
> x[1]
a
1
> x["a"]
a
1

Using a factor caused some confusion to someone here, and I have to admit that it can indeed appear misleading:

> f <- factor("a", levels=c("b", "a", "c"))
> f
[1] a
Levels: b a c
> x[f] # here the integer is used, rather than the level
b
2

The dual nature of the factor (vector of integers, with an attached vector of levels), is not always clear to many users, especially since factors are treated differently in other situations. Example:

> f == 1
[1] FALSE
> f == "a" # here the level is used, not the integer
[1] TRUE

This is making me suggest that indexing using a factor could issue a warning, and the user should explicitly wrap the vector with either "as.integer" or "as.character".

L.

PS: All examples above were run with

platform        x86_64-unknown-linux-gnu
arch            x86_64
os              linux-gnu
system          x86_64, linux-gnu
status          Under development (unstable)
major           2
minor           7.0
year            2008
month           03
day             12
svn rev         44742
language        R
version.string  R version 2.7.0 Under development (unstable) (2008-03-12 r44742)
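The explicit wrapping proposed in the message above can be sketched as follows. This only illustrates the two conversions named there; the variable names by_factor, by_code and by_label are made up for the example and are not part of any proposal.

```r
# Same named vector and one-element factor as in the message above
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# Plain factor indexing: the underlying integer code (2) is used
by_factor <- x[f]

# Explicit conversions state which of the two meanings is intended
by_code  <- x[as.integer(f)]    # index by code: selects the element named "b"
by_label <- x[as.character(f)]  # index by level: selects the element named "a"
```

With the explicit forms, a reader of the code does not need to know which of the two natures of the factor "[" picks up.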
Prof Brian Ripley
2008-Mar-14 09:55 UTC
[Rd] Subsetting vectors/arrays using factors can be seen as misleading
This is long established and documented on the basic help page for '['.

Further, the convention is widely used in R itself: running 'make check' would give a few hundred warnings and then fail. Working around those warnings would be inefficient (involving unnecessary copying of large objects).

One place where this matters is the advice to use levels(x)[x] as in as.character.factor() -- that construction is widespread, perhaps so widespread as to make it worthwhile making that an internal operation.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
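For readers unfamiliar with the levels(x)[x] construction mentioned in the reply, here is a small sketch of why it relies on factor indexing using the integer codes. The factor f and the result names are invented for the illustration.

```r
# A factor whose level order differs from the order of its values
f <- factor(c("a", "c", "a", "b"), levels = c("b", "a", "c"))

# The factor's integer codes index into its own level vector,
# which recovers the label of each element
via_levels <- levels(f)[f]

# The same result obtained through the usual conversion
via_as_chr <- as.character(f)

same_result <- identical(via_levels, via_as_chr)
```

If "[" warned on factor indices, every use of this idiom would have to be rewritten with an explicit as.integer() call, which is the cost the reply is pointing at.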