Hi All,

I would like to report this bug related to subsetting a matrix by row names when the names are passed as a factor. Factors may not be safe to use for this, but then R should generate a warning message. We often use values returned by some packages as factors to subset a matrix, and that may silently result in a wrong calculation.

If a factor is not expected in a matrix operation, I wish it would throw an error or warning message.

Below is the code to reproduce it.

> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"),c("A","B","C")))
>
> rNames <- as.factor(c("X","Z"))
> # Some functions from different packages return factors, which could be overlooked
> rNames
[1] X Z
Levels: X Z
>
> x[rNames,]
  A B C
X 1 4 7
Y 2 5 8
>
> ## The intended matrix should return X and Z rows instead of X and Y
>
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1

With regards
Rishi Das Roy
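The warning asked for above can be approximated in user code. The following is only a sketch: rows_by_name() is a hypothetical helper, not a base R function, and it assumes the caller wants a factor index to be matched by its labels.

```r
x <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))
rNames <- as.factor(c("X", "Z"))

# Hypothetical helper (not part of base R): select matrix rows by name and
# warn when the index is a factor, converting it to its labels first.
rows_by_name <- function(m, i) {
  if (is.factor(i)) {
    warning("row index is a factor; using its labels, not its integer codes")
    i <- as.character(i)
  }
  m[i, , drop = FALSE]
}

# Issues the warning and returns rows X and Z.
res <- suppressWarnings(rows_by_name(x, rNames))
```

Using drop = FALSE keeps the result a matrix even when a single row is selected.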
Hi,

I get the same behavior in R 3.5.2 on macOS.

Others may feel differently, but I am not so sure that this is a bug, as opposed to perhaps the need to clarify in ?Extract that the following, which is found under Atomic vectors:

"The index object i can be numeric, logical, character or empty. Indexing by factors is allowed and is equivalent to indexing by the numeric codes (see factor) and not by the character values which are printed (for which use [as.character(i)])."

also applies to the indexing of matrices and arrays.

Since matrices and arrays in R are vectors with 'dim' attributes, the behavior is essentially consistent as described above.

Thus, perhaps just add the second sentence above or similar wording to the section for Matrices and arrays.

Regards,

Marc Schwartz

> On Feb 20, 2019, at 4:23 AM, rIsHi <rishi.dasroy at gmail.com> wrote:
>
> I like to report this bug related to matrix subset by rownames when passed
> as factors. [...]
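To make the quoted ?Extract passage concrete, here is a small sketch using the same x and rNames as the original report. The variable names codes, by_factor, by_codes and by_name are introduced only for illustration.

```r
x <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))
rNames <- as.factor(c("X", "Z"))

# The factor has only two levels, so "X" is stored as code 1 and "Z" as code 2.
codes <- as.integer(rNames)

# Indexing by the factor behaves like indexing by those integer codes,
# which selects rows 1 and 2 (X and Y).
by_factor <- x[rNames, ]
by_codes  <- x[codes, ]

# Converting to character first indexes by row name instead (X and Z).
by_name <- x[as.character(rNames), ]
```

The mismatch arises because the factor's levels (X, Z) are not the same set, in the same order, as the matrix row names (X, Y, Z).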
With no official weight, I second the opinion that the existing behavior is appropriate and not a bug.

Functions should not "unexpectedly" return factors... a common example is the read.table family of functions, which by default return factors, but the behaviour is deterministic and controllable with the as.is or stringsAsFactors arguments. If you have functions that randomly return different types then the bug is in those functions.

Don't confuse factors and character data types... they are distinct and used for different purposes.

On February 20, 2019 12:59:54 PM PST, Marc Schwartz via R-help <r-help at r-project.org> wrote:
>I get the same behavior in R 3.5.2 on macOS.
>
>Others may feel differently, but I am not so sure that this is a bug, [...]

-- 
Sent from my phone. Please excuse my brevity.
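As a rough illustration of the stringsAsFactors control mentioned above, the sketch below reads the same small table twice. The inline text and the column names id and value are made up for the example.

```r
x <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

# Made-up input standing in for a file on disk.
txt <- "id value\nX 1\nZ 3"

# Ask explicitly for factors.
d_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Ask explicitly for character vectors.
d_char <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

class(d_factor$id)  # "factor"
class(d_char$id)    # "character"

# A character column indexes the matrix by row name, as intended.
sel <- x[d_char$id, ]
```

Setting the argument explicitly, rather than relying on the default, keeps the column type predictable across R versions and options settings.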