mwtoews at sfu.ca
2007-Apr-19 03:20 UTC
[Rd] Feature request for 'sprintf' optimization (PR#9621)
Full_Name: Michael Toews
Version: R-devel and 2.4.1
OS: Debian etch and WindowsXP
Submission from: (NULL) (142.58.206.114)

This is a quick demonstration of the present time limitation of 'sprintf' on long vectors, with a suggestion for significant optimization.

First, consider a data.frame with numeric (double) values:

    dat <- data.frame(year=as.numeric(rep(1970:2000,each=365)),
                      yday=as.numeric(1:365))
    nrow(dat)

Consider using 'sprintf' in R with and without casting the columns to integer:

    wocast <- system.time(with(dat,sprintf("%04i-%03i",year,yday)))
    wcast <- system.time(with(dat,sprintf("%04i-%03i",as.integer(year),
                                          as.integer(yday))))
    100*wocast/wcast # as a percent comparison

My results on a Debian VM with R-devel (r41236) have elapsed ratios of 63408%, and on Windows XP with R 2.4.1 of 23300%. A similar but much longer data frame (using 1900:2100 for year; nrow=73365) results in a ratio of 120775%. Certainly, the time taken by the 'sprintf' wrapper depends not only on processor and platform but, more significantly, on the data types of the '...' values passed to it.

The first and simplest suggestion is to document in 'sprintf' that it is significantly faster when supplied with values already in the data type intended by 'fmt', through casting (namely using 'as.integer'). However, to the user it would seem that they have to specify the format twice (e.g., once with '%i' and a second time with 'as.integer()').

A second and more elegant suggestion is for 'sprintf' (or the C code it calls) to parse 'fmt' for the data types, and cast the values from '...' according to those types before continuing with the wrapper call. (I have not looked at the source code, nor am I a good C programmer, so I can't do more than suggest; it is possible there are alternative optimizations in the wrapper, since the processing time is very dependent on the length of the '...' vectors, and it might be evaluating the values repeatedly in a 'for' loop.)

Thanks!
+mt
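For illustration, the second suggestion could be sketched at the R level roughly as follows. This is only a sketch under stated assumptions, not how 'sprintf' is or should be implemented internally: the name 'sprintf_cast' is hypothetical, 'fmt' is assumed to be a single string, and only simple conversion specifications are recognised (no '%n$' or '*' forms). It casts a double argument to integer only when the matching specification is '%d' or '%i' and every value is a whole number within integer range.

```r
# Hypothetical helper (not part of base R): coerce each '...' argument to the
# type implied by its conversion specification in 'fmt', then call sprintf.
sprintf_cast <- function(fmt, ...) {
  args <- list(...)
  # Drop literal "%%" so it is not mistaken for a conversion specification.
  stripped <- gsub("%%", "", fmt, fixed = TRUE)
  # Match simple specs such as "%04i", "%5.2f", "%s".
  # Assumption: no "%n$" positional or "*" width forms are used.
  m <- gregexpr("%[-+ 0#]*[0-9]*(\\.[0-9]+)?[a-zA-Z]", stripped)
  specs <- regmatches(stripped, m)[[1]]
  # The conversion character is the last character of each spec.
  conv <- substring(specs, nchar(specs))
  for (k in seq_along(args)) {
    if (k > length(conv)) break
    x <- args[[k]]
    if (conv[k] %in% c("d", "i") && is.double(x)) {
      # Cast only when every value is NA or a whole number in integer range.
      ok <- is.na(x) |
        (is.finite(x) & x == round(x) & abs(x) <= .Machine$integer.max)
      if (all(ok)) args[[k]] <- as.integer(x)
    }
  }
  do.call(sprintf, c(list(fmt), args))
}

# Same call as in the report, without the explicit as.integer():
dat <- data.frame(year = as.numeric(rep(1970:2000, each = 365)),
                  yday = as.numeric(1:365))
res <- with(dat, sprintf_cast("%04i-%03i", year, yday))
res[1]  # "1970-001"
```

The check for whole values is one vectorised pass per argument, so the user writes the format only once. Whether this recovers the speed of the explicit 'as.integer' call would have to be measured with 'system.time' as above.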