Joshua Ulrich
2019-May-26 11:47 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
On Sun, May 26, 2019 at 4:06 AM Michael Chirico <michaelchirico4 at gmail.com> wrote:> > Have finally managed to come up with a fix after checking out sys.calls() > from within the as.Date.IDate debugger, which shows something like: > > [[1]] rbind(DF, DF) > [[2]] rbind(deparse.level, ...) > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > [[5]] as.Date(value) > [[6]] as.Date.IDate(value) > > I'm not sure why [<- is called, I guess the implementation is to assign to > the output block by block? Anyway, we didn't have a [<- method. And > [<-.Date looks like: > > value <- unclass(as.Date(value)) # <- converts to double > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class > > So we can fix our bug by defining a [<- class; the question that I still > don't see answered in documentation or source code is, why/where is [<- > called, exactly? >Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The `[<-` call is this line: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij That's where the storage.mode changes from integer to double. debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij Browse[2]> debug: xij Browse[2]> storage.mode(xij) [1] "integer" Browse[2]> value[[jj]][ri] [1] "2019-05-26" Browse[2]> storage.mode(value[[jj]][ri]) [1] "integer" Browse[2]> debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm Browse[2]> storage.mode(value[[jj]][ri]) [1] "double"> Mike C > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico <michaelchirico4 at gmail.com> > wrote: > > > Debugging this issue: > > > > https://github.com/Rdatatable/data.table/issues/2008 > > > > We have custom class 'IDate' which inherits from 'Date' (it just forces > > integer storage for efficiency, hence, I). > > > > The concatenation done by rbind, however, breaks this and returns a double: > > > > library(data.table) > > DF = data.frame(date = as.IDate(Sys.Date())) > > storage.mode(rbind(DF, DF)$date) > > # [1] "double" > > > > This is specific to base::rbind (data.table's rbind returns an integer as > > expected); in ?rbind we see: > > > > The method dispatching is not done via UseMethod(), but by C-internal > > dispatching. Therefore there is no need for, e.g., rbind.default. > > The dispatch algorithm is described in the source file > > (?.../src/main/bind.c?) as > > 1. For each argument we get the list of possible class memberships from > > the class attribute. > > 2. *We inspect each class in turn to see if there is an applicable > > method.* > > 3. If we find an applicable method we make sure that it is identical to > > any method determined for prior arguments. If it is identical, we proceed, > > otherwise we immediately drop through to the default code. > > > > It's not clear what #2 means -- an applicable method *for what*? Glancing > > at the source code would suggest it's looking for rbind.IDate: > > > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should > > be rbind here > > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over > > the classes, should get to IDate first > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > > > but adding this method (or even exporting it) is no help [ simply defining > > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > > > Lastly, it appears that as.Date.IDate is called, which is causing the type > > conversion: > > > > debug(data.table:::as.Date.IDate) > > rbind(DF, DF) # launches debugger > > x > > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not > > c(DF$date, DF$date) > > undebug(data.table:::as.Date.IDate) > > > > I can't really wrap my head around why as.Date is being called here, and > > even allowing that, why the end result is still the original class [ > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > > > So, I'm beginning to think this might be a bug. Am I missing something? > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Joshua Ulrich | about.me/joshuaulrich FOSS Trading | www.fosstrading.com R/Finance 2019 | www.rinfinance.com
Joshua Ulrich
2019-May-27 14:25 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:> > On Sun, May 26, 2019 at 4:06 AM Michael Chirico > <michaelchirico4 at gmail.com> wrote: > > > > Have finally managed to come up with a fix after checking out sys.calls() > > from within the as.Date.IDate debugger, which shows something like: > > > > [[1]] rbind(DF, DF) > > [[2]] rbind(deparse.level, ...) > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > > [[5]] as.Date(value) > > [[6]] as.Date.IDate(value) > > > > I'm not sure why [<- is called, I guess the implementation is to assign to > > the output block by block? Anyway, we didn't have a [<- method. And > > [<-.Date looks like: > > > > value <- unclass(as.Date(value)) # <- converts to double > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class > > > > So we can fix our bug by defining a [<- class; the question that I still > > don't see answered in documentation or source code is, why/where is [<- > > called, exactly? > > > Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The > `[<-` call is this line: > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > That's where the storage.mode changes from integer to double. > > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > Browse[2]> > debug: xij > Browse[2]> storage.mode(xij) > [1] "integer" > Browse[2]> value[[jj]][ri] > [1] "2019-05-26" > Browse[2]> storage.mode(value[[jj]][ri]) > [1] "integer" > Browse[2]> > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm > Browse[2]> storage.mode(value[[jj]][ri]) > [1] "double" >To be clear, I don't think this is a bug in rbind() or rbind.data.frame(). The confusion is that rbind.data.frame() calls `[<-` for each column of the data.frame, and there is no `[<-.IDate` method. So the parent class method is dispatched, which converts the storage mode to double. Someone may argue that this is an issue with `[<-.Date`, and that it shouldn't convert the storage.mode from integer to double.> > > Mike C > > > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico <michaelchirico4 at gmail.com> > > wrote: > > > > > Debugging this issue: > > > > > > https://github.com/Rdatatable/data.table/issues/2008 > > > > > > We have custom class 'IDate' which inherits from 'Date' (it just forces > > > integer storage for efficiency, hence, I). > > > > > > The concatenation done by rbind, however, breaks this and returns a double: > > > > > > library(data.table) > > > DF = data.frame(date = as.IDate(Sys.Date())) > > > storage.mode(rbind(DF, DF)$date) > > > # [1] "double" > > > > > > This is specific to base::rbind (data.table's rbind returns an integer as > > > expected); in ?rbind we see: > > > > > > The method dispatching is not done via UseMethod(), but by C-internal > > > dispatching. Therefore there is no need for, e.g., rbind.default. > > > The dispatch algorithm is described in the source file > > > (?.../src/main/bind.c?) as > > > 1. For each argument we get the list of possible class memberships from > > > the class attribute. > > > 2. *We inspect each class in turn to see if there is an applicable > > > method.* > > > 3. If we find an applicable method we make sure that it is identical to > > > any method determined for prior arguments. If it is identical, we proceed, > > > otherwise we immediately drop through to the default code. > > > > > > It's not clear what #2 means -- an applicable method *for what*? Glancing > > > at the source code would suggest it's looking for rbind.IDate: > > > > > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > > > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should > > > be rbind here > > > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over > > > the classes, should get to IDate first > > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > > > > > but adding this method (or even exporting it) is no help [ simply defining > > > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > > > > > Lastly, it appears that as.Date.IDate is called, which is causing the type > > > conversion: > > > > > > debug(data.table:::as.Date.IDate) > > > rbind(DF, DF) # launches debugger > > > x > > > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not > > > c(DF$date, DF$date) > > > undebug(data.table:::as.Date.IDate) > > > > > > I can't really wrap my head around why as.Date is being called here, and > > > even allowing that, why the end result is still the original class [ > > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > > > > > So, I'm beginning to think this might be a bug. Am I missing something? > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-devel at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > Joshua Ulrich | about.me/joshuaulrich > FOSS Trading | www.fosstrading.com > R/Finance 2019 | www.rinfinance.com-- Joshua Ulrich | about.me/joshuaulrich FOSS Trading | www.fosstrading.com R/Finance 2019 | www.rinfinance.com
Michael Chirico
2019-May-27 14:31 UTC
[Rd] rbind has confusing result for custom sub-class (possible bug?)
Yes, thanks for following up on thread here. And thanks again for clearing things up, your email was a finger snap of clarity on the whole issue. I'll add that actually it was data.table's code at fault on the storage conversion -- note that if you use an arbitrary sub-class 'foo' with no methods defined, it'll stay integer. That's because [<- calls as.Date and then as.Date.IDate, and that method (ours) has as.numeric(); earlier I had recognized that if we commented that line, the issue was "fixed" but I still wasn't understanding the root cause. My last curiosity on this issue will be in my follow-up thread. Mike C On Mon, May 27, 2019, 10:25 PM Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:> On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich <josh.m.ulrich at gmail.com> > wrote: > > > > On Sun, May 26, 2019 at 4:06 AM Michael Chirico > > <michaelchirico4 at gmail.com> wrote: > > > > > > Have finally managed to come up with a fix after checking out > sys.calls() > > > from within the as.Date.IDate debugger, which shows something like: > > > > > > [[1]] rbind(DF, DF) > > > [[2]] rbind(deparse.level, ...) > > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > > > [[5]] as.Date(value) > > > [[6]] as.Date.IDate(value) > > > > > > I'm not sure why [<- is called, I guess the implementation is to > assign to > > > the output block by block? Anyway, we didn't have a [<- method. And > > > [<-.Date looks like: > > > > > > value <- unclass(as.Date(value)) # <- converts to double > > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class > > > > > > So we can fix our bug by defining a [<- class; the question that I > still > > > don't see answered in documentation or source code is, why/where is [<- > > > called, exactly? > > > > > Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The > > `[<-` call is this line: > > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > > > That's where the storage.mode changes from integer to double. > > > > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > Browse[2]> > > debug: xij > > Browse[2]> storage.mode(xij) > > [1] "integer" > > Browse[2]> value[[jj]][ri] > > [1] "2019-05-26" > > Browse[2]> storage.mode(value[[jj]][ri]) > > [1] "integer" > > Browse[2]> > > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm > > Browse[2]> storage.mode(value[[jj]][ri]) > > [1] "double" > > > To be clear, I don't think this is a bug in rbind() or > rbind.data.frame(). The confusion is that rbind.data.frame() calls > `[<-` for each column of the data.frame, and there is no `[<-.IDate` > method. So the parent class method is dispatched, which converts the > storage mode to double. > > Someone may argue that this is an issue with `[<-.Date`, and that it > shouldn't convert the storage.mode from integer to double. > > > > > Mike C > > > > > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico < > michaelchirico4 at gmail.com> > > > wrote: > > > > > > > Debugging this issue: > > > > > > > > https://github.com/Rdatatable/data.table/issues/2008 > > > > > > > > We have custom class 'IDate' which inherits from 'Date' (it just > forces > > > > integer storage for efficiency, hence, I). > > > > > > > > The concatenation done by rbind, however, breaks this and returns a > double: > > > > > > > > library(data.table) > > > > DF = data.frame(date = as.IDate(Sys.Date())) > > > > storage.mode(rbind(DF, DF)$date) > > > > # [1] "double" > > > > > > > > This is specific to base::rbind (data.table's rbind returns an > integer as > > > > expected); in ?rbind we see: > > > > > > > > The method dispatching is not done via UseMethod(), but by C-internal > > > > dispatching. Therefore there is no need for, e.g., rbind.default. > > > > The dispatch algorithm is described in the source file > > > > (?.../src/main/bind.c?) as > > > > 1. For each argument we get the list of possible class memberships > from > > > > the class attribute. > > > > 2. *We inspect each class in turn to see if there is an applicable > > > > method.* > > > > 3. If we find an applicable method we make sure that it is identical > to > > > > any method determined for prior arguments. If it is identical, we > proceed, > > > > otherwise we immediately drop through to the default code. > > > > > > > > It's not clear what #2 means -- an applicable method *for what*? > Glancing > > > > at the source code would suggest it's looking for rbind.IDate: > > > > > > > > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > > > > > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // > should > > > > be rbind here > > > > const char *s = translateChar(STRING_ELT(classlist, i)); // > iterating over > > > > the classes, should get to IDate first > > > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > > > > > > > but adding this method (or even exporting it) is no help [ simply > defining > > > > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > > > > > > > Lastly, it appears that as.Date.IDate is called, which is causing > the type > > > > conversion: > > > > > > > > debug(data.table:::as.Date.IDate) > > > > rbind(DF, DF) # launches debugger > > > > x > > > > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, > not > > > > c(DF$date, DF$date) > > > > undebug(data.table:::as.Date.IDate) > > > > > > > > I can't really wrap my head around why as.Date is being called here, > and > > > > even allowing that, why the end result is still the original class [ > > > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > > > > > > > So, I'm beginning to think this might be a bug. Am I missing > something? > > > > > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-devel at r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > > > > -- > > Joshua Ulrich | about.me/joshuaulrich > > FOSS Trading | www.fosstrading.com > > R/Finance 2019 | www.rinfinance.com > > > > -- > Joshua Ulrich | about.me/joshuaulrich > FOSS Trading | www.fosstrading.com > R/Finance 2019 | www.rinfinance.com >[[alternative HTML version deleted]]
Reasonably Related Threads
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)
- rbind has confusing result for custom sub-class (possible bug?)