tlumley@u.washington.edu
2004-Sep-04 01:47 UTC
[Rd] Inconsistencies in subassignment (PR#7210)
I have made the 3-d case do the same as the vector case, which is what the
C code clearly intended (a goto label was in the wrong place).

This leaves the bigger question of the right thing to do. I note that data
frames give an error when any indices are NA.

	-thomas

On Fri, 3 Sep 2004 ripley@stats.ox.ac.uk wrote:

> Apart from the inconsistencies, there are two clear bugs here:
>
> 1) miscalculating the number of values needed, in the matrix case. E.g.
>
> > AA[idx, 1] <- B[1:4]
> Error in "[<-"(`*tmp*`, idx, 1, value = B[1:4]) :
>         number of items to replace is not a multiple of replacement length
>
> although only 4 values are replaced by AA[idx, 1] <- B.
>
> 2) the behaviour of the 3D case.
>
> ---------- Forwarded message ----------
> Date: Fri, 3 Sep 2004 16:40:24 +0100 (BST)
> From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
> To: "Yao, Minghua" <myao@ou.edu>
> Cc: R Help <r-help@stat.math.ethz.ch>
> Subject: Re: [R] Different Index behaviors of Array and Matrix
>
> [I will copy a version of this to R-bugs: please be careful when you reply
> to only copy to R-bugs a version with a PR number in the subject.]
>
> On Fri, 3 Sep 2004, Yao, Minghua wrote:
>
> > I found a difference between the indexing of an array and that of a
> > matrix when there are NA's in the index array. The screen copy is as
> > follows.
> >
> > > A <- array(NA, dim=6)
> > > A
> > [1] NA NA NA NA NA NA
> > > idx <- c(1,NA,NA,4,5,6)
> > > B <- c(10,20,30,40,50,60)
> > > A[idx] <- B
> > > A
> > [1] 10 NA NA 40 50 60
> > > AA <- matrix(NA,6,1)
> > > AA
> >      [,1]
> > [1,]   NA
> > [2,]   NA
> > [3,]   NA
> > [4,]   NA
> > [5,]   NA
> > [6,]   NA
> > > AA[idx,1] <- B
> > > AA
> >      [,1]
> > [1,]   10
> > [2,]   NA
> > [3,]   NA
> > [4,]   20
> > [5,]   30
> > [6,]   40
> > >
> >
> > In the case of an array, we miss the elements (20 and 30) in B
> > corresponding to the NA's in the index array. In the case of a matrix,
> > 20 and 30 are assigned to the elements indexed by the indexes following
> > the NA's. Is this reasonable behavior? Thanks in advance for
> > explanation.
>
> A is a 1D array but it behaves just like a vector.
> Weirder things happen with multi-dimensional arrays
>
> > A <- array(NA, dim=c(6,1,1))
> > A[idx,1,1] <- B
> > A
> , , 1
>
>      [,1]
> [1,]   10
> [2,]   NA
> [3,]   NA
> [4,]   NA
> [5,]   NA
> [6,]   NA
>
> One problem with what happens for matrices is that
>
> > idx <- c(1,4,5,6)
> > AA <- matrix(NA,6,1)
> > AA[idx,1] <- B
> Error in "[<-"(`*tmp*`, idx, 1, value = B) :
>         number of items to replace is not a multiple of replacement length
>
> is an error, so it is not counting the values consistently.
>
> The only discussion I could find (Blue Book p.103, which is also
> discussing LHS subscripting) just says
>
>   If a subscript is NA, an NA is returned.
>
> S normally does not use up values when encountering an NA in an index set,
> although it does for logical matrix indexing of data frames.
>
> I can see two possible interpretations.
>
> 1) The NA indicates the value was lost after assignment. We don't know
> what index the first NA was, so 20 got assigned somewhere. And as we
> don't know where, all the elements had better be NA. However, that is
> unless the NA was 0, when no assignment took place and no value was used.
>
> 2) The NA indicates the value was lost before assignment, so no assignment
> took place and no value was used.
>
> R does neither of those. I suspect the correct course of action is to ban
> NAs in subscripted assignments.
>
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley@u.washington.edu        University of Washington, Seattle
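
The two behaviours in the report can be written out explicitly without putting any NA in a subscript. What follows is only a minimal sketch in R: the helper names assign_consume and assign_skip are hypothetical and are not part of R. The first mimics what the vector case was observed to do (an NA index skips the assignment but still uses up the corresponding value). The second mimics what the matrix case was observed to do (an NA index skips the assignment and no value is used up).

```r
# Hypothetical helpers, for illustration only; neither exists in R.

# Vector-case behaviour from the report: an NA index assigns nothing,
# but the value at the same position is still consumed.
assign_consume <- function(x, idx, value) {
  keep <- !is.na(idx)
  x[idx[keep]] <- value[keep]
  x
}

# Matrix-case behaviour from the report: an NA index assigns nothing
# and does not consume a value, so later indices get earlier values.
assign_skip <- function(x, idx, value) {
  keep <- !is.na(idx)
  x[idx[keep]] <- value[seq_len(sum(keep))]
  x
}

x   <- rep(NA_real_, 6)
idx <- c(1, NA, NA, 4, 5, 6)
B   <- c(10, 20, 30, 40, 50, 60)

consumed <- assign_consume(x, idx, B)  # 10 NA NA 40 50 60
skipped  <- assign_skip(x, idx, B)     # 10 NA NA 20 30 40
```

Both helpers remove the NAs before subscripting, so they should behave the same whatever the version of R does with NA subscripts in assignments.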
On Sat, 4 Sep 2004 tlumley@u.washington.edu wrote:

> I have made the 3-d case do the same as the vector case, which is what the
> C code clearly intended (a goto label was in the wrong place).
>
> This leaves the bigger question of the right thing to do. I note that data
> frames give an error when any indices are NA.

One case is unambiguous and common:

	x[ind] <- val

where `val' is of length one. I've written code to ban all other
subassignments involving NAs. Once I fixed occurrences in R itself
(notably in ifelse), only three problems remain in tests over the CRAN
packages

< Running examples in ape-Ex.R failed.
< > ### * popsize
area[a == 0] <- stepfunction[a == 0]

< Running examples in RandomFields-Ex.R failed.
< > ### * ShowModels
expr[pmatch(covlist, namen)] <- exprlist

< Running examples in sm-Ex.R failed.
< > ### * sm.sphere
z[xyzok < 0] <- (za - zb)[xyzok < 0]

The first and third are a typical usage, where R makes more sense than S.
[Worryingly, sm was written for S-PLUS and would seem to be incorrect
there.]

So in R 2.0.0 we will have

\section{NAs in indexing}{
  When subscripting, a numerical, logical or character \code{NA} picks an
  unknown element and so returns \code{NA} in the corresponding element
  of a logical, integer, numeric, complex or character result, and
  \code{NULL} for a list.

  When replacing (that is using subscripting on the lhs of an
  assignment) \code{NA} does not select any element to be replaced. As
  there is ambiguity as to whether an element of the rhs should be used
  or not (and \R handled this inconsistently prior to \R 2.0.0), this is
  only allowed if the rhs value is of length one (so the two
  interpretations would have the same outcome).
}

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
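
A short sketch of the rule in the help text above, assuming R 2.0.0 or later. The vectors x, ind and new are made up for illustration. Dropping the NAs with which() is one possible way to make the intent explicit in code of the form z[cond] <- w[cond]; it is not taken from the thread.

```r
x   <- c(1, 2, 3, 4)
ind <- c(TRUE, NA, FALSE, TRUE)

# rhs of length one: the NA selects nothing, so this is allowed.
y <- x
y[ind] <- 0                       # y is now 0 2 3 0

# rhs longer than one: ambiguous, so an error is signalled.
res <- tryCatch({
  y2 <- x
  y2[ind] <- c(10, 20, 30)
  y2
}, error = function(e) e)
failed <- inherits(res, "error")  # TRUE when the assignment is refused

# One way to state the intent: which() keeps only the TRUE positions,
# so no NA reaches the subscript on either side of the assignment.
sel <- which(ind)
new <- x * 10
z <- x
z[sel] <- new[sel]                # z is now 10 2 3 40
```

Here which(ind) treats NA as "do not select", which matches what the help text says NA does on the lhs of an assignment. Whether that is the right reading for a given analysis is for the caller to decide.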