Whenever going from working with a data.frame to a matrix, I get annoyed that I cannot assign and subset at the same time with matrices - like I can with data.frames.

For example, if I want to add a new column to a data.frame, I can do something like `myDataFrame[, "newColumn"] <- NA`.

However, with a matrix, this syntax does not work and I have to use a call to `cbind` and create a new object. For example, `mymatrix2 <- cbind(mymatrix, "newColumn" = NA)`.

Is there a programming reason that base R does not have a matrix method for `[<-`, or is it something that arguably should be added?

--
David J. Disabato, Ph.D.
Postdoctoral Research Scholar
Kent State University
ddisab01 at gmail.com
Hello David,

On Sat, 23 Nov 2019 11:58:42 -0500
David Disabato <ddisab01 at gmail.com> wrote:

> For example, if I want to add a new column to a data.frame, I can do
> something like `myDataFrame[, "newColumn"] <- NA`.

<Opinion>
Arguably, iterative growth of data structures is not the "R style", since it may lead to costly reallocations, resulting in the worst case scenario of quadratic behaviour for linear operations. If iterative processing is unavoidable, it might help to store partial results in a list, then build the final matrix with a single call to do.call(cbind, results).
</Opinion>

> However, with a matrix, this syntax does not work and I have to use a
> call to `cbind` and create a new object. For example, `mymatrix2 <-
> cbind(mymatrix, "newColumn" = NA)`.

> Is there a programming reason that base R does not have a matrix
> method for `[<-` or is it something that arguably should be added?

A data frame is a list of columns, so adding a new column is relatively cheap: allocate enough memory for one column and append (roughly speaking) a pointer to the list of pointers-to-column-data. This results in reallocation of the *latter* list, but, since that list is small in comparison to the whole data frame, it's okay. Note that this operation does not affect any of the other columns belonging to the same data frame.

A matrix, on the other hand, is a single vector containing the whole matrix, with the array dimensions stored as an attribute. Since R matrices are stored by column [*], adding a new column to the matrix means resizing the buffer to hold length(matrix) + nrow(matrix) elements, then appending the new column to the end of the buffer. If the allocator cannot enlarge the buffer in place (because the buffer is followed in memory by another buffer), it has to allocate the new buffer elsewhere, copy the memory, then free the old buffer. To build a matrix by appending columns one at a time, one needs to perform this O(n) operation O(n) times, resulting in O(n^2) performance.
Adding rows is even worse, because the memory has to be copied in parts (one piece per column), not as a whole.

Disclaimer: this is one reason I can think of for why R doesn't offer subassignment to non-existent matrix columns by default. The actual reason might be different.

--
Best regards,
Ivan

[*] https://github.com/wch/r-source/blob/bac4cd3013ead1379e20127d056ee036278b47ff/src/main/duplicate.c#L443
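For anyone who wants to try the advice above, here is a minimal sketch that builds the same matrix three ways. The names `make_column()`, `n_row` and `n_col` are made up for this example, and the preallocation variant is a common alternative that was not mentioned in the message above. Timings are deliberately left out, since they depend on the machine and the R version.

```r
n_row <- 1000L
n_col <- 200L

# Hypothetical stand-in for whatever computation produces one column.
make_column <- function(j) rep(as.numeric(j), n_row)

# Pattern 1: grow the matrix inside the loop.
# Every cbind() call allocates a new, larger matrix and copies the old one.
grown <- NULL
for (j in seq_len(n_col)) {
  grown <- cbind(grown, make_column(j), deparse.level = 0)
}

# Pattern 2: keep partial results in a list, then bind once at the end.
results <- vector("list", n_col)
for (j in seq_len(n_col)) {
  results[[j]] <- make_column(j)
}
bound <- do.call(cbind, results)

# Pattern 3 (assumption, not from the message above): allocate the full
# matrix up front and fill in existing columns.
prealloc <- matrix(NA_real_, nrow = n_row, ncol = n_col)
for (j in seq_len(n_col)) {
  prealloc[, j] <- make_column(j)
}

# All three approaches should produce the same values.
stopifnot(all(grown == bound), all(bound == prealloc))
```

Wrapping each pattern in system.time() with a larger `n_col` is one way to see how the cost of the first pattern grows compared with the other two.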
The subject is misguided. It is not a problem to assign to a subset of columns.

The issue is that the assignment operation does not want to _expand_ the matrix automatically upon seeing an out-of-bounds index. E.g.:

> M <- matrix(0,2,2)
> M[,3]<-1
Error in `[<-`(`*tmp*`, , 3, value = 1) : subscript out of bounds
> M[,2]<-1
> M
     [,1] [,2]
[1,]    0    1
[2,]    0    1

You can, however, do things like this:

> M <- M[,c(1,2,2)]
> M[,3]<-3
> M
     [,1] [,2] [,3]
[1,]    0    1    3
[2,]    0    1    3

-pd
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
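As a small illustration of the "expand explicitly, then assign" idea, the sketch below widens a matrix once with cbind() and then uses ordinary subassignment on the column that now exists. The column names "a", "b" and "newColumn" and the values 10 and 20 are chosen only for this example.

```r
M <- matrix(0, 2, 2, dimnames = list(NULL, c("a", "b")))

# Explicit expansion: this creates a new, wider matrix with a named column.
M <- cbind(M, newColumn = NA_real_)

# The column now exists, so plain subassignment is in bounds.
M[, "newColumn"] <- c(10, 20)

colnames(M)
```

The expansion is a visible step in the code, so a change in dimensions never happens as a side effect of an assignment.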
Re: [<-.

It is perhaps worth noting that the OP seems "misguided" in another sense. His complaint seems to rest on the assumption that because matrices and data frames both have a row/column structure, certain operations on them should be similar.

I disagree. In fact, data frames and matrices are very different structures with very different semantics and wholly different purposes. Their "similarity" is superficial. First and foremost, (numeric) matrices are numerical objects, the basic building blocks for linear algebra, with a whole devoted set of algebraic functionality for them (see also: BLAS); while data frames are essentially data storage/manipulation structures, internal data bases for R.

As a result, imo, there is good reason that [<-. should *not* behave with matrices as it does with data frames: when doing complex matrix calculations, returning an error message when indices go out of range seems much more desirable than silently changing dimensions. Indeed, I think one might make a better argument for doing that for data frames also, but, as it is both relatively innocuous and convenient to add columns in that context -- the data frame method is just a wrapper for data.frame() as the man page says -- it's not really an issue (and certainly shouldn't be altered now).

Perhaps a moral: one should be very wary of assuming that behavior that you think is "natural" and "desirable" would be assumed to be so by others. Especially for long used and extensively exercised core functionality.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
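To make the difference in semantics concrete, here is a minimal sketch of what happens when a column name is misspelled. The column name "height" and its deliberate misspelling "heigth" are invented for this example. The data frame quietly gains a column, while the matrix signals an error.

```r
df <- data.frame(height = c(1.7, 1.8))
m  <- as.matrix(df)

# Data frame: the misspelled name silently creates a second column.
df[, "heigth"] <- 0
names(df)

# Matrix: the same assignment signals an error, which is caught here
# so that the script keeps running.
res <- tryCatch({
  m[, "heigth"] <- 0
  "assigned"
}, error = function(e) conditionMessage(e))
res

# The matrix is unchanged after the failed assignment.
dim(m)
```

Whether the silent growth is a convenience or a hazard depends on the use case, which is roughly the point made above.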