On Sat, 18 Dec 2021 11:50:54 +0100
Arnaud FELD <arnaud.feldmann at gmail.com> wrote:

> However, I'm a bit troubled about the "address" argument. What is it
> intended for since (as far as I know) "address equality" is until now
> something that isn't really let for the user to decide within R.

Using the words from "Extending R" by John M. Chambers, the concept of
address identity could be related to the question:

>> If some of the data in the object has changed, is this still the
>> same object?

Most objects in R are defined by their content. If you had a 100x100
matrix and changed an element at [50,50], it's now a different matrix,
even if it's stored in the same variable. If you create another 100x100
matrix in a different variable but fill it with the same numbers, it
should still compare equal to your original matrix.

Not all types of R objects are like that. Environments are good
candidates for pointer equality comparison. For example, the contents
of the global environment change every time you assign some variable in
the R command line, but it remains the same global environment. Indeed,
identical() for environments just compares their pointers: even if two
different environments only contain objects that compare equal, they
cannot be considered the same environment, because different closures
might be referring to them. Similar are data.tables: if you had a giant
dataset and, as part of cleaning it up, removed some outliers, perhaps
it should be considered the same dataset, even if the contents aren't
strictly the same any more. Same goes for reference class and R6
objects: unlike the pass-by-value semantics associated with most
objects in R, these are assumed to carry global state within them, and
modifications to them are reflected everywhere they are referenced, not
limited to the current function call.
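To make the distinction above concrete, here is a minimal sketch in base R. The variable names are only illustrative, and the matrix size simply mirrors the 100x100 example in the text.

```r
# Value semantics: two separately built matrices with equal contents
a <- matrix(0, nrow = 100, ncol = 100)
b <- matrix(0, nrow = 100, ncol = 100)
same_content <- identical(a, b)      # TRUE: compared by content

# After changing one element, the matrices no longer compare equal
a[50, 50] <- 1
after_change <- identical(a, b)      # FALSE

# Reference semantics: environments are compared by pointer
e1 <- new.env()
e2 <- new.env()
assign("x", 1, envir = e1)
assign("x", 1, envir = e2)
same_env <- identical(e1, e2)        # FALSE: equal contents, distinct environments

# A second name for the same environment sees every modification
e3 <- e1
assign("y", 2, envir = e3)
alias_env <- identical(e1, e3)       # TRUE: same environment
visible_via_e1 <- exists("y", envir = e1, inherits = FALSE)  # TRUE
```

The last two lines show why pointer comparison suits environments: e1 and e3 stay "the same object" even though the contents changed.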
I *think* that most (if not all) objects with reference semantics
already use pointer comparison when being compared by identical(), so
the default of "identical" is, as the help page says, almost always the
right choice, but if it matters to your code whether the objects are
actually stored in the same area in the memory, use hashes of type
"address".

(Perhaps this topic could be a better fit for R-help.)

-- 
Best regards,
Ivan
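As an illustration of the two hash table types discussed above, the following is a small sketch that assumes R 4.2.0 or later, where utils::hashtab() is available and documented as experimental. The keys and stored strings are made up for the example.

```r
# Assumes R >= 4.2.0 (utils::hashtab is experimental and may change).
x <- runif(3)
y <- x + 0    # same values as x, but a separately allocated vector

# type = "identical": keys are matched by content, as with identical()
h_ident <- utils::hashtab(type = "identical")
utils::sethash(h_ident, x, "stored under x")
by_value <- utils::gethash(h_ident, y)      # found, because identical(x, y)

# type = "address": keys are matched by the object's address
h_addr <- utils::hashtab(type = "address")
utils::sethash(h_addr, x, "stored under x")
same_object  <- utils::gethash(h_addr, x)   # found: the very same object
other_object <- utils::gethash(h_addr, y)   # NULL: equal values, different object
```

With the default type the lookup by y succeeds, so the "address" type only matters when the code needs to tell apart two objects that happen to hold equal data.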
luke-tierney at uiowa.edu
2021-Dec-22 15:11 UTC
[Rd] [External] Re: hashtab address arg
On Wed, 22 Dec 2021, Ivan Krylov wrote:

> On Sat, 18 Dec 2021 11:50:54 +0100
> Arnaud FELD <arnaud.feldmann at gmail.com> wrote:
>
>> However, I'm a bit troubled about the "address" argument. What is it
>> intended for since (as far as I know) "address equality" is until now
>> something that isn't really let for the user to decide within R.
>
> Using the words from "Extending R" by John M. Chambers, the concept of
> address identity could be related to the question:
>
>>> If some of the data in the object has changed, is this still the
>>> same object?
>
> Most objects in R are defined by their content. If you had a 100x100
> matrix and changed an element at [50,50], it's now a different matrix,
> even if it's stored in the same variable. If you create another 100x100
> matrix in a different variable but fill it with the same numbers, it
> should still compare equal to your original matrix.
>
> Not all types of R objects are like that. Environments are good
> candidates for pointer equality comparison. For example, the contents
> of the global environment change every time you assign some variable in
> the R command line, but it remains the same global environment. Indeed,
> identical() for environments just compares their pointers: even if two
> different environments only contain objects that compare equal, they
> cannot be considered the same environment, because different closures
> might be referring to them. Similar are data.tables: if you had a giant
> dataset and, as part of cleaning it up, removed some outliers, perhaps
> it should be considered the same dataset, even if the contents aren't
> strictly the same any more. Same goes for reference class and R6
> objects: unlike the pass-by-value semantics associated with most
> objects in R, these are assumed to carry global state within them, and
> modifications to them are reflected everywhere they are referenced, not
> limited to the current function call.

This is still experimental and the 'address' option may not survive at
the R level. There are some C level applications where it can be
useful; maybe it will only be retained there.

> I *think* that most (if not all) objects with reference semantics
> already use pointer comparison when being compared by identical(), so
> the default of "identical" is, as the help page says, almost always the
> right choice, but if it matters to your code whether the objects are
> actually stored in the same area in the memory, use hashes of type
> "address".

Unfortunately not all: External pointer objects are reference objects
but by default are not compared based on object address. Fixing the
default is not an option in the short term as it breaks too much code
(mostly through dependencies on a few packages).

> (Perhaps this topic could be a better fit for R-help.)

R-devel is the right place for this.

Best,

luke

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu