Paul Johnson
2014-Apr-10 14:59 UTC
[R] premature evaluation of symbols. Is that the way to describe this problem?
Dear eveRybody,

In the package rockchalk, I have functions that take regressions and make tables and plots. Sometimes I'll combine a model with various arguments and pass the resulting list around to be executed, usually using do.call. While debugging a function on a particularly large dataset, I noticed that I've caused inconvenience for myself. I get the correct calculations, but the arguments into functions are not symbols when the function call happens. They are fully "written out". So the debugger does not see function(x), it sees function(c(1,3,1,3,45,2,4,2....). Debugging begins with a comprehensive listing of the elements in every object, which freezes Emacs/ESS and generally ruins my quality of life.

I don't know the right words for this, but it seems to me object names are parsed and evaluated before they need to be. Function calls include the evaluated arguments in them, not just the symbols for them.

I have some guesses on fixes, and I wonder what you think. And I wonder if fixing this problem might generally make my functions faster and more efficient because I'm not passing gigantic collections of numbers through the function call.

I know, I don't have all the right words here. I've built a toy example that will illustrate the problem; you tell me the words for it.

## Paul Johnson 2014-04-10
dat <- data.frame(x = rnorm(50), y = rnorm(50))
m1 <- lm(y ~ x, dat)

myRegFit <- function(model, nd) predict(model, nd)

mySpecialFeature <- function(model, ci){
    pargs <- list(model, model.frame(model)[1:3, ])
    res <- do.call(myRegFit, pargs)
    print(res)
}

mySpecialFeature(m1)

debug(myRegFit)

mySpecialFeature(m1)

Note when the debugger enters, it has the whole model's structure "splatted" into the function call.

> dat <- data.frame(x = rnorm(50),y = rnorm(50))
> m1 <- lm(y ~ x, dat)
> myRegFit <-function(model, nd) predict(model, nd)
> mySpecialFeature <- function(model, ci){
+ pargs <- list(model, model.frame(model)[1:3, ])
+ res <- do.call(myRegFit, pargs)
+ print(res)
+ }
> mySpecialFeature (m1)
          1           2           3
-0.04755431  0.35162844 -0.11715522
> debug(myRegFit)
> mySpecialFeature (m1)
debugging in: (function (model, nd)
predict(model, nd))(list(coefficients = c(0.0636305741709566, -0.177786836929453), residuals = c(-0.0152803151982162, -0.885875162659858, -1.23645319405006, -1.77900639571943, -1.9952045397527, 1.38150266407176, -2.27403449262599, 0.0367524776530579, -0.881037818467492, -1.10816713568432, -0.55749829201921, -0.372526253742828, -0.353208893775679, 0.531708523456415, -0.43187124865558, 1.03973431972897, 0.849170115617157, 1.11227803262189, 0.47216440383252, 0.920060697785203, -0.374672861268964, 2.94683565121636, 0.514112041811711, -0.52321362055969, -0.0412387814196237, 0.983863448669766, 0.534230127442599, -0.869960511196742, 1.90586406082412, -1.84705932449576, 0.806425475391075, 1.90939977897903, 0.41030042787483, 0.994503041407507, 0.715719209301158, -0.538096591457249, -0.482411304681239, 0.0323998214753804, 0.551162374882342, -0.618989357027834, 1.08996565055366, -0.697423620816604, 1.38170655971013, 1.55752893685726, -0.0929258405664267, -1.00210610433922, -1.51879925258188, -1.57050250989563, -1.06868502360026, 0.458860605094578 ), effects = c(-0.661274203468574, -1.07255577360914, -1.36123824096605, -1.71875465303308,
-1.8806486154155, 1.38636416232103, -2.29259163100096, 0.153263315278269, -0.950879523052079, -0.963705724647863, -0.62175976245114, -0.423965256680951, -0.350662885659068, 0.469175412149025, -0.400505083448679, 1.03440116252973, 0.878739280288923, 1.16574672001397, 0.358222935858666, 1.00514946836967, -0.316592303881481, 2.8507611924072, 0.573209391002668, -0.393720180068215, 0.0971873363200073, 1.23818281311352, 0.449576129222722, -0.929511151618747, 1.97922180250824, -1.80820009744905, 0.877996855335966, 1.86871623414376, 0.226023354471842, 0.814892815951223, 0.821400980265047, -0.536299037896556, -0.358703204255386, 0.105714598012197, 0.543301738010905, -0.643659132172249, 1.26412624281219, -0.808498804261978, 1.32273956796476, 1.57655585529458, 0.022266343917185, -1.20958321975888, -1.52288584310647, -1.60904879400386, -1.08384090772898, 0.567729611277337), rank = 2L, fitted.values = c(-0.0475543078742839, 0.351628437239565, -0.117155221544445, 0.165041007767092, 0.247859323684996, 0.0805663573239046, 0.0448510221995737, 0.250840726246576, -0.0333621333428176, 0.293467635417987, -0.0248518201238109, -0.00529650906918994, 0.0770350456001708, -0.0222159311803766, 0.120988140963125, 0.0650186744394092, 0.118247568257686, 0.154696293986828, -0.100617877288265, 0.202919505525394, 0.161729772710581, -0.0733692278375946, 0.163280463324874, 0.270640256833884, 0.284263319806951, 0.461009994308065, -0.0559520919143943, -0.0176674192887905, 0.185028727541523, 0.132415672762266, 0.182304379869478, 0.011106443703975, -0.207885424899216, -0.200768100305762, 0.234325514404888, 0.0758935912226656, 0.261817140331961, 0.184963202179859, 0.0611640613395693, 0.0355287514706238, 0.338761314434586, -0.0962465590881959, -0.0167773073486601, 0.102169781012095, 0.248829672417071, -0.243267385378769, 0.0669197905143708, 0.0143659410153998, 0.0500382129484238, 0.239186308642765 ), assign = 0:1, qr = list(qr = c(-7.07106781186548, 0.14142135623731, 0.14142135623731, 0.14142135623731, 
0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 0.14142135623731, 1.18871623033411, 6.0328188078105, -0.18012556382541, 0.0829807810893617, 0.160196640586552, 0.00422063404457838, -0.0290786460517367, 0.162976358560143, -0.102000873004284, 0.202719661631434, -0.0940662616503503, -0.0758338195379766, 0.000928206918060759, -0.0916086841440873, 0.0419079829301299, -0.0102752861369684, 0.0393528032622073, 0.0733358618834786, -0.164706930442335, 0.118296891160581, 0.0798935429820433, -0.139301585451296, 0.0813393331707195, 0.181436499351408, 0.194137995449112, 0.358928189908056, -0.123062676154681, -0.0873678679520487, 0.101616380529727, 0.0525624701654998, 0.0990763283113914, -0.0605404863829379, -0.264718090934247, -0.258082235933901, 0.147578360387876, -0.000136030863872082, 0.173210245091236, 0.101555287798443, -0.0138691440925964, -0.0377702879768747, 0.244949334093302, -0.160631321222164, -0.0865379699066212, 0.0243626389824618, 0.161101347601284, -0.297706548364757, -0.0085027759125684, -0.0575014860888491, -0.0242423560643218, 0.152110333789614), qraux = c(1.14142135623731, 1.25694602752555 ), pivot = 1:2, tol = 1e-07, rank = 2L), df.residual = 48L, xlevels = list(), call = lm(formula = y ~ x, data = dat), terms = 
y ~ x, model = list(y = c(-0.0628346230725001, -0.534246725420292, -1.3536084155945, -1.61396538795233, -1.7473452160677, 1.46206902139567, -2.22918347042642, 0.287593203899634, -0.91439995181031, -0.81469950026633, -0.582350112143021, -0.377822762812018, -0.276173848175508, 0.509492592276038, -0.310883107692455, 1.10475299416838, 0.967417683874843, 1.26697432660872, 0.371546526544255, 1.1229802033106, -0.212943088558383, 2.87346642337877, 0.677392505136584, -0.252573363725805, 0.243024538387328, 1.44487344297783, 0.478278035528205, -0.887627930485533, 2.09089278836564, -1.71464365173349, 0.988729855260553, 1.920506222683, 0.202415002975614, 0.793734941101744, 0.950044723706047, -0.462203000234584, -0.220594164349278, 0.21736302365524, 0.612326436221911, -0.58346060555721, 1.42872696498824, -0.793670179904799, 1.36492925236147, 1.65969871786935, 0.155903831850644, -1.24537348971799, -1.45187946206751, -1.55613656888023, -1.01864681065184, 0.698046913737343), x = c(0.6253830934028, -1.6199054330602, 1.01686828360155, -0.570404622454562, -1.03623391189047, -0.0952589260568722, 0.105629597194727, -1.05300344676195, 0.545556179461482, -1.29276759301495, 0.497688106852802, 0.387695087164958, -0.0753963097646713, 0.482861987051367, -0.322619873230144, -0.00780766614911678, -0.307204937272165, -0.512218572469498, 0.9238504621374, -0.783460315510917, -0.551779874336542, 0.770584619056546, -0.560502064578936, -1.16437013132232, -1.24099595586789, -2.23514534034255, 0.672618221633596, 0.457277911367569, -0.682829817253222, -0.38689646421126, -0.667506142457625, 0.295433179273129, 1.52719967214396, 1.48716676129197, -0.960110113785672, -0.0689759560578443, -1.11474262990373, -0.682461255874912, 0.0138734277181953, 0.158064698071454, -1.54753155529058, 0.899263050180665, 0.452271286830549, -0.216771992273148, -1.0416918453845, 1.7262130585714, -0.0185008991679137, 0.277099441142009, 0.0764531359986249, -0.987450688160168))), list(y = c(-0.0628346230725001, -0.534246725420292, 
-1.3536084155945), x = c(0.6253830934028, -1.6199054330602, 1.01686828360155)))
debug: predict(model, nd)
Browse[2]>

I wish the debug output would look more like R's own lm. Note the contents of "dat" are not splatted into the middle of the function call.

> m1 <- lm(y ~ x, dat)
debugging in: lm(y ~ x, dat)
debug: {
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset",

I've been reading quite a while on this question, testing lots of ideas. The quote function seems to work, but I worry about how the R runtime environment finds all the pieces if I pass them through this way.

mySpecialFeature <- function(model, ci){
    pargs <- list(quote(model), quote(model.frame(model)[1:3, ]))
    res <- do.call(myRegFit, pargs)
    print(res)
}

See, that fixes it:

> mySpecialFeature (m1)
debugging in: (function (model, nd)
predict(model, nd))(model, model.frame(model)[1:3, ])
debug: predict(model, nd)
Browse[2]> c
exiting from: (function (model, nd)
predict(model, nd))(model, model.frame(model)[1:3, ])
          1           2           3
-0.04755431  0.35162844 -0.11715522
>

Are there dangers in this I don't know about?

--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504         Center for Research Methods
University of Kansas              University of Kansas
http://pj.freefaculty.org         http://quant.ku.edu

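The difference between the two versions of pargs can be checked without entering the debugger by recording the call that do.call() builds. The following is only a minimal sketch under stated assumptions: show_call() is a hypothetical helper that is not part of rockchalk or of the post, set.seed(1) is added so the run is repeatable, and the data simply mirror the toy example above.

```r
# Hypothetical helper (not from the original post): returns the call
# exactly as R recorded it when the function was invoked.
show_call <- function(model, nd) sys.call()

set.seed(1)  # assumption: added only so the sketch is reproducible
dat <- data.frame(x = rnorm(50), y = rnorm(50))
m1 <- lm(y ~ x, dat)

# Unquoted: the list holds the fitted model object itself, so the call
# that do.call() constructs carries the whole object as its argument.
eager <- do.call(show_call, list(m1, model.frame(m1)[1:3, ]))

# Quoted: the list holds a symbol and an unevaluated expression, so the
# constructed call keeps the short expressions.
lazy <- do.call(show_call, list(quote(m1), quote(model.frame(m1)[1:3, ])))

# Compare the size of the deparsed calls (what the debugger would print).
n_eager <- nchar(paste(deparse(eager), collapse = ""))
n_lazy  <- nchar(paste(deparse(lazy), collapse = ""))
cat("characters in eager call:", n_eager, "\n")
cat("characters in lazy call: ", n_lazy, "\n")
print(lazy)
```

In the unquoted case the first argument stored in the call is the lm object itself; in the quoted case it is just the name m1, which is why the debugger output stays short.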
peter dalgaard
2014-Apr-10 15:36 UTC
[R] premature evaluation of symbols. Is that the way to describe this problem?
On 10 Apr 2014, at 16:59, Paul Johnson <pauljohn32 at gmail.com> wrote:

> I've been reading quite a while on this question, testing lots of
> ideas. The quote function seems to work, but I worry about how the R
> runtime environment finds all the pieces if I pass them through this
> way.
>
> mySpecialFeature <- function(model, ci){
>     pargs <- list(quote(model), quote(model.frame(model)[1:3, ]))
>     res <- do.call(myRegFit, pargs)
>     print(res)
> }
>
> See, that fixes it:
>
>> mySpecialFeature (m1)
> debugging in: (function (model, nd)
> predict(model, nd))(model, model.frame(model)[1:3, ])
> debug: predict(model, nd)
> Browse[2]> c
> exiting from: (function (model, nd)
> predict(model, nd))(model, model.frame(model)[1:3, ])
>           1           2           3
> -0.04755431  0.35162844 -0.11715522
>>
>
> Are there dangers in this I don't know about?

I don't think so. A similar issue made the rounds quite recently over on R-devel ("Problem with do.call()") and the quote() trick was mentioned as a fix.

The only potential issue that I can see is the fairly obvious one that the quoted arguments are evaluated later than unquoted ones, and possibly not at all due to lazy evaluation. This is of course largely the point of the maneuver, but there is also a slight risk that the object could change or disappear due to side effects when evaluating other arguments. This isn't different from the usual perils of lazy evaluation, though.

-pd

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
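The two caveats mentioned in the reply (a quoted argument may never be evaluated, and it sees whatever the object is at the time it is finally evaluated) can be illustrated with a small sketch. The function names first_only() and use_later() and the variable x are hypothetical, introduced only for this illustration. The sketch relies on the documented default envir = parent.frame() of do.call(), which means quoted symbols are looked up in the frame that called do.call().

```r
# Hypothetical toy function: it never touches its second argument.
first_only <- function(a, b) a

# Unquoted: stop() runs while the list is being built, so the error
# happens before do.call() is even reached.
res_eager <- tryCatch(
  do.call(first_only, list(1, stop("evaluated early"))),
  error = function(e) conditionMessage(e)
)

# Quoted: the stop() expression is passed unevaluated and, because
# first_only() never uses b, it is never evaluated at all.
res_lazy <- do.call(first_only, list(1, quote(stop("evaluated early"))))

# Hypothetical toy function: returns its argument, forcing evaluation.
use_later <- function(v) v

x <- 1
args_value  <- list(x)         # stores the value of x as of now
args_symbol <- list(quote(x))  # stores only the name x
x <- 2                         # the object changes before the call

val_then <- do.call(use_later, args_value)   # value captured earlier
val_now  <- do.call(use_later, args_symbol)  # value looked up at call time

print(res_eager)
print(res_lazy)
print(c(val_then = val_then, val_now = val_now))
```

If the quoted symbol had been removed with rm(x) before the call, the second do.call() would fail with an "object not found" error, which is the "disappear" case described in the reply.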