Michael Chirico
2019-May-26 05:16 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
Debugging this issue: https://github.com/Rdatatable/data.table/issues/2008 We have custom class 'IDate' which inherits from 'Date' (it just forces integer storage for efficiency, hence, I). The concatenation done by rbind, however, breaks this and returns a double: library(data.table) DF = data.frame(date = as.IDate(Sys.Date())) storage.mode(rbind(DF, DF)$date) # [1] "double" This is specific to base::rbind (data.table's rbind returns an integer as expected); in ?rbind we see: The method dispatching is not done via UseMethod(), but by C-internal dispatching. Therefore there is no need for, e.g., rbind.default. The dispatch algorithm is described in the source file (?.../src/main/bind.c?) as 1. For each argument we get the list of possible class memberships from the class attribute. 2. *We inspect each class in turn to see if there is an applicable method.* 3. If we find an applicable method we make sure that it is identical to any method determined for prior arguments. If it is identical, we proceed, otherwise we immediately drop through to the default code. It's not clear what #2 means -- an applicable method *for what*? Glancing at the source code would suggest it's looking for rbind.IDate: https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should be rbind here const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over the classes, should get to IDate first sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate but adding this method (or even exporting it) is no help [ simply defining rbind.IDate = function(...) as.IDate(NextMethod()) ] Lastly, it appears that as.Date.IDate is called, which is causing the type conversion: debug(data.table:::as.Date.IDate) rbind(DF, DF) # launches debugger x # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not c(DF$date, DF$date) undebug(data.table:::as.Date.IDate) I can't really wrap my head around why as.Date is being called here, and even allowing that, why the end result is still the original class [ class(rbind(DF, DF)$date) == c('IDate', 'Date') ] So, I'm beginning to think this might be a bug. Am I missing something? [[alternative HTML version deleted]]
Michael Chirico
2019-May-26 09:05 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
Have finally managed to come up with a fix after checking out sys.calls() from within the as.Date.IDate debugger, which shows something like: [[1]] rbind(DF, DF) [[2]] rbind(deparse.level, ...) [[3]] `[<-`(`*tmp*`, ri, value = 18042L) [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) [[5]] as.Date(value) [[6]] as.Date.IDate(value) I'm not sure why [<- is called, I guess the implementation is to assign to the output block by block? Anyway, we didn't have a [<- method. And [<-.Date looks like: value <- unclass(as.Date(value)) # <- converts to double .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class So we can fix our bug by defining a [<- class; the question that I still don't see answered in documentation or source code is, why/where is [<- called, exactly? Mike C On Sun, May 26, 2019 at 1:16 PM Michael Chirico <michaelchirico4 at gmail.com> wrote:> Debugging this issue: > > https://github.com/Rdatatable/data.table/issues/2008 > > We have custom class 'IDate' which inherits from 'Date' (it just forces > integer storage for efficiency, hence, I). > > The concatenation done by rbind, however, breaks this and returns a double: > > library(data.table) > DF = data.frame(date = as.IDate(Sys.Date())) > storage.mode(rbind(DF, DF)$date) > # [1] "double" > > This is specific to base::rbind (data.table's rbind returns an integer as > expected); in ?rbind we see: > > The method dispatching is not done via UseMethod(), but by C-internal > dispatching. Therefore there is no need for, e.g., rbind.default. > The dispatch algorithm is described in the source file > (?.../src/main/bind.c?) as > 1. For each argument we get the list of possible class memberships from > the class attribute. > 2. *We inspect each class in turn to see if there is an applicable > method.* > 3. If we find an applicable method we make sure that it is identical to > any method determined for prior arguments. If it is identical, we proceed, > otherwise we immediately drop through to the default code. > > It's not clear what #2 means -- an applicable method *for what*? Glancing > at the source code would suggest it's looking for rbind.IDate: > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should > be rbind here > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over > the classes, should get to IDate first > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > but adding this method (or even exporting it) is no help [ simply defining > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > Lastly, it appears that as.Date.IDate is called, which is causing the type > conversion: > > debug(data.table:::as.Date.IDate) > rbind(DF, DF) # launches debugger > x > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not > c(DF$date, DF$date) > undebug(data.table:::as.Date.IDate) > > I can't really wrap my head around why as.Date is being called here, and > even allowing that, why the end result is still the original class [ > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > So, I'm beginning to think this might be a bug. Am I missing something? >[[alternative HTML version deleted]]
Joshua Ulrich
2019-May-26 11:47 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
On Sun, May 26, 2019 at 4:06 AM Michael Chirico <michaelchirico4 at gmail.com> wrote:> > Have finally managed to come up with a fix after checking out sys.calls() > from within the as.Date.IDate debugger, which shows something like: > > [[1]] rbind(DF, DF) > [[2]] rbind(deparse.level, ...) > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > [[5]] as.Date(value) > [[6]] as.Date.IDate(value) > > I'm not sure why [<- is called, I guess the implementation is to assign to > the output block by block? Anyway, we didn't have a [<- method. And > [<-.Date looks like: > > value <- unclass(as.Date(value)) # <- converts to double > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class > > So we can fix our bug by defining a [<- class; the question that I still > don't see answered in documentation or source code is, why/where is [<- > called, exactly? >Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The `[<-` call is this line: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij That's where the storage.mode changes from integer to double. debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij Browse[2]> debug: xij Browse[2]> storage.mode(xij) [1] "integer" Browse[2]> value[[jj]][ri] [1] "2019-05-26" Browse[2]> storage.mode(value[[jj]][ri]) [1] "integer" Browse[2]> debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm Browse[2]> storage.mode(value[[jj]][ri]) [1] "double"> Mike C > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico <michaelchirico4 at gmail.com> > wrote: > > > Debugging this issue: > > > > https://github.com/Rdatatable/data.table/issues/2008 > > > > We have custom class 'IDate' which inherits from 'Date' (it just forces > > integer storage for efficiency, hence, I). > > > > The concatenation done by rbind, however, breaks this and returns a double: > > > > library(data.table) > > DF = data.frame(date = as.IDate(Sys.Date())) > > storage.mode(rbind(DF, DF)$date) > > # [1] "double" > > > > This is specific to base::rbind (data.table's rbind returns an integer as > > expected); in ?rbind we see: > > > > The method dispatching is not done via UseMethod(), but by C-internal > > dispatching. Therefore there is no need for, e.g., rbind.default. > > The dispatch algorithm is described in the source file > > (?.../src/main/bind.c?) as > > 1. For each argument we get the list of possible class memberships from > > the class attribute. > > 2. *We inspect each class in turn to see if there is an applicable > > method.* > > 3. If we find an applicable method we make sure that it is identical to > > any method determined for prior arguments. If it is identical, we proceed, > > otherwise we immediately drop through to the default code. > > > > It's not clear what #2 means -- an applicable method *for what*? Glancing > > at the source code would suggest it's looking for rbind.IDate: > > > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should > > be rbind here > > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over > > the classes, should get to IDate first > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > > > but adding this method (or even exporting it) is no help [ simply defining > > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > > > Lastly, it appears that as.Date.IDate is called, which is causing the type > > conversion: > > > > debug(data.table:::as.Date.IDate) > > rbind(DF, DF) # launches debugger > > x > > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not > > c(DF$date, DF$date) > > undebug(data.table:::as.Date.IDate) > > > > I can't really wrap my head around why as.Date is being called here, and > > even allowing that, why the end result is still the original class [ > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > > > So, I'm beginning to think this might be a bug. Am I missing something? > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Joshua Ulrich | about.me/joshuaulrich FOSS Trading | www.fosstrading.com R/Finance 2019 | www.rinfinance.com
Apparently Analagous Threads
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)