HB
2024-Jul-14 17:09 UTC
[Rd] xftrm is more than 100x slower for AsIs than for character vectors
Dear Ivan,

thanks for the confirmation and the proposed patch.

I just wanted to add some notes regarding the relevance of this: base::merge using by.x=0 or by.y=0 (i.e. matching on row.names) will automatically add a column Row.names, which is I(row.names(x)), to the corresponding input table (I() has been used since revision 39026 to avoid conversion of character to factor). When this column is used for sorting (sort=TRUE is the default in merge; sorting should happen at least if all.x=T or all.y=T), execution will be slower.

xtfrm.AsIs has been unchanged since its addition in r50992 (likely unrelated to the former).

So I guess that this just went unnoticed since it will not cause problems on small data frames.

Best regards

Hilmar
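For readers who want to check the point about merge and row names themselves, here is a minimal sketch. The data, the size n and all variable names are made up for illustration; the timings depend on the R version and machine, so none are quoted here.

```r
# Hypothetical example data; n is chosen only to keep the run short.
set.seed(1)
n  <- 2000
rn <- sprintf("id%05d", sample.int(n))        # unique, shuffled row names
x  <- data.frame(a = rnorm(n), row.names = rn)
y  <- data.frame(b = rnorm(n), row.names = rev(rn))

# Merging on row names (by = 0) adds a Row.names key column.
m <- merge(x, y, by = 0)
print(class(m$Row.names))                     # inspect the class of the key column

# Compare xtfrm() on a plain character vector and on the same vector wrapped in I().
plain <- rn
asis  <- I(rn)
t_plain <- system.time(o_plain <- order(xtfrm(plain)))
t_asis  <- system.time(o_asis  <- order(xtfrm(asis)))
print(rbind(plain = t_plain, asis = t_asis)[, "elapsed"])
```

Increasing n should make any difference between the two timings easier to see; whether a gap shows up at all will depend on whether the installed R already contains a fix for xtfrm.AsIs.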
Hilmar Berger
2024-Jul-16 07:08 UTC
[Rd] I() in merge (was: Re: xftrm is more than 100x slower for AsIs than for character vectors)
Dear all,

actually, it is not clear to me why the Row.names column added in merge is still protected using I(). This seems to stem from a time when R would automatically convert character vectors to factor when they were inserted into a data.frame. However, I can't reproduce this behaviour in current versions of R, even in data.frames generated with stringsAsFactors = T.

Maybe the I() inserted in r39026 can be removed altogether?

Best regards

Hilmar
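The observation about I() can be checked with a short sketch like the one below. It assumes R >= 4.0.0, where stringsAsFactors defaults to FALSE in data.frame(). The cbind() calls are only a hypothetical stand-in for adding a key column with and without I(); they are not copied from the merge source.

```r
# A data frame built with stringsAsFactors = TRUE, as in the case described above.
df <- data.frame(g = c("u", "v", "w"), stringsAsFactors = TRUE)
print(class(df$g))                            # the original column is a factor

# Hypothetical stand-ins: prepend the row names without and with I().
with_plain <- cbind(Row.names = row.names(df), df)
with_asis  <- cbind(Row.names = I(row.names(df)), df)

print(class(with_plain$Row.names))            # stays character under the stated assumption
print(class(with_asis$Row.names))             # carries the "AsIs" class
```

If the unprotected column stays character, as it should under the stated assumption, the I() wrapper is no longer needed to prevent a factor conversion in this situation. Whether other code relies on the AsIs class of Row.names is a separate question that this sketch does not address.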