Kevin B. Hendricks
2006-Jun-02 15:36 UTC
[Rd] Helping out - simple bugs to help familiarize with R design, source, etc
Hi All,

Well, I finally have found the time to get svn working, and I have successfully built my own tuned ATLAS (multi-threaded version) libs and have both the r-devel and r-patched trees building daily on my box.

The problem is I still do not have a good idea of the layout and design of R, and I typically "learn by doing" by trying to fix a bug that hits me. Unfortunately ;-) no bugs have hit me (which indicates how good the current software really is!) other than the small build issue I sent a pseudo-patch for earlier. I tried to look at your bug database, but I found it hard to tell what was really relevant.

So could someone who knows please point me at one or two minor or annoying bugs they want to have tracked down and fixed? Fixing those will force me to become more familiar with the source code and help me learn my way around before jumping in further.

Duncan, sorry about not getting in touch with you last week; I simply have been overwhelmed with trying to clean up and pack up my office, return computer equipment, and the like. I will try again next week once I am back in the office.

Thanks,

Kevin
Kevin B. Hendricks
2006-Jun-02 18:03 UTC
[Rd] Helping out - simple bugs to help familiarize with R design, source, etc
Hi Thomas,

> A key fact here is that NAMED(object) is 0 if the object is not
> (part of) an R variable, 1 if it is (part of) exactly one R
> variable, 2 if it is part of more than one R variable.

> The point is that NAMED=0 or 1 objects can be safely modified, but
> NAMED=2 have to be copied first (with the duplicate() function).

I am not sure if this question makes much sense, since I am unsure of what you mean by "object" here (is that an object-oriented programming idea of "object" or a more general term for some structure), but ...

Can an "object" be an extended "structure" composed of other "objects", and if so, does each of the sub-objects have its own NAMED() attribute? If so, then if any of those sub-objects actually is part of another variable, does that force the NAMED() attribute of the parent (or enclosing) object to have a value of 2 (forcing duplication before modification) for the entire thing?

If so, perhaps the NAMED() attribute of the enclosing object is simply not being properly updated to be the maximum of all of the NAMED() attributes of the sub-objects? Isn't this what the Note: in the bug report was talking about?

This might be fun to look at. But don't hold your breath for any quick fixes! I still have a very steep learning curve to climb before I can be of much use for anything other than build-time issues.

Thanks,

Kevin
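To make the quoted rule concrete, here is a minimal sketch in plain C of "modify in place when NAMED is 0 or 1, copy first when NAMED is 2". It is a toy model only, not R's implementation: the type toy_vec and the functions toy_alloc, toy_bind, toy_duplicate, toy_set, and toy_free are hypothetical names invented for illustration, and the way toy_bind bumps the counter is an assumption about how a binding might be recorded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector. This is NOT R's real SEXPREC layout. */
typedef struct {
    int named;     /* 0, 1, or 2, following the rule quoted above */
    size_t len;
    double *data;
} toy_vec;

/* Allocate a fresh, zero-filled vector that no variable refers to yet. */
toy_vec *toy_alloc(size_t len) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;
    v->len = len;
    v->data = calloc(len ? len : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

void toy_free(toy_vec *v) {
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* Hypothetical analogue of duplicate(): a deep copy that starts at named = 0. */
toy_vec *toy_duplicate(const toy_vec *v) {
    toy_vec *copy = toy_alloc(v->len);
    memcpy(copy->data, v->data, v->len * sizeof *v->data);
    return copy;
}

/* Record one more variable referring to v; the counter saturates at 2. */
void toy_bind(toy_vec *v) {
    if (v->named < 2)
        v->named++;
}

/* Set element i to x. If v may be shared (named == 2), work on a copy.
   Returns the object that was actually modified. */
toy_vec *toy_set(toy_vec *v, size_t i, double x) {
    assert(i < v->len);
    if (v->named == 2)
        v = toy_duplicate(v);
    v->data[i] = x;
    return v;
}
```

With one binding, toy_set writes into the original object and returns it; after a second toy_bind, the same call returns a fresh copy and leaves the original untouched. If NAMED were left too low on a shared object, the in-place branch would be taken and every variable referring to it would see the change.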
Kevin B. Hendricks
2006-Jun-02 21:41 UTC
[Rd] Helping out - simple bugs to help familiarize with R design, source, etc
Hi,

Please forgive me, but I am just learning how to debug in this environment.

Info on Bug 7924: as far as I understand the comment, if someone knew how calling identical(call1,call2) actually helped set NAMED, it might lead them to a solution. Here is my attempt to track down when and how this happens.

From examining compute_identical in main/identical.c, it is clear that it does not change any of the NAMED values (this is also shown by single-stepping through the entire routine). Therefore the "fix" caused by invoking "identical" must have happened either on the way there or on the way back.

Here is the backtrace from the call to identical for the problem case:

(gdb) bt
#0  compute_identical (x=0x9a0130, y=0x99d0f0) at ../../../r-devel/r-devel/R/src/main/identical.c:53
#1  0x00002aaaaab7fede in do_identical (x=0x9a0130, y=0x99d0f0) at ../../../r-devel/r-devel/R/src/main/identical.c:38
#2  0x00002aaaaab95650 in do_internal (call=Variable "call" is not available.
) at ../../../r-devel/r-devel/R/src/main/names.c:1093
#3  0x00002aaaaab6785b in Rf_eval (e=0x92ea98, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:424
#4  0x00002aaaaab697c6 in Rf_applyClosure (call=0x99d2b0, op=0x92d678, arglist=0x999638, rho=0x547858, suppliedenv=0x547890) at ../../../r-devel/r-devel/R/src/main/eval.c:614
#5  0x00002aaaaab676f8 in Rf_eval (e=0x99d2b0, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:455
#6  0x00002aaaaab8639e in Rf_ReplIteration (rho=0x547858, savestack=0, browselevel=0, state=0x7fffffd59a70) at ../../../r-devel/r-devel/R/src/main/main.c:254
#7  0x00002aaaaab86540 in R_ReplConsole (rho=0x547858, savestack=0, browselevel=0) at ../../../r-devel/r-devel/R/src/main/main.c:302
#8  0x00002aaaaab86860 in run_Rmainloop () at ../../../r-devel/r-devel/R/src/main/main.c:915
#9  0x000000000040081d in main (ac=Variable "ac" is not available.
) at ../../../r-devel/r-devel/R/src/main/Rmain.c:33

Now, grepping through all of the files in main for the macro SET_NAMED, many of the uses appear in eval.c. Specifically, some appear in Rf_eval, which is in the backtrace to compute_identical. So I set a breakpoint for Rf_eval and for every line in that routine that invokes SET_NAMED.

Here is the result of the gdb trace (see my comments sprinkled throughout) as the problem case is entered. Please forgive its length.

> call1<- Quote(f(arg[[1]], arg[[1]], arg[[1]]))

Breakpoint 3, Rf_eval (e=0x9a0328, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
307     {
(gdb) c
Continuing.

Breakpoint 3, Rf_eval (e=0x9a0280, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
307     {
(gdb) c
Continuing.

Breakpoint 3, Rf_eval (e=0x775ff8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
307     {
(gdb) c
Continuing.

So we see none of the SET_NAMED breakpoints (in Rf_eval) being hit when call1 is first defined, which makes sense (at least with my limited understanding it does).

Now here we define call2:

> call2 <- Quote(f(arg[[1]]))[c(1,2,2,2)]

Breakpoint 3, Rf_eval (e=0x99ddb8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x99dd10, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x9a06a8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x775ff8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x99dbc0, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0xf7f118, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

Below, in the definition of call2, we see NAMED being set to 2, which, if I understand things, makes sense.

Breakpoint 4, Rf_eval (e=0xf7f118, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:362
362         if (NAMED(tmp) != 2) SET_NAMED(tmp, 2);
(gdb) c
Continuing.
358         /* Make sure constants in expressions are NAMED before being
359            used as values.  Setting NAMED to 2 makes sure weird calls
360            to assignment functions won't modify constants in
361            expressions.  */
362         if (NAMED(tmp) != 2) SET_NAMED(tmp, 2);

Breakpoint 3, Rf_eval (e=0xf7f148, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

And again:

Breakpoint 4, Rf_eval (e=0xf7f148, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:362
362         if (NAMED(tmp) != 2) SET_NAMED(tmp, 2);

Breakpoint 3, Rf_eval (e=0xf7f1d8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

And again:

Breakpoint 4, Rf_eval (e=0xf7f1d8, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:362
362         if (NAMED(tmp) != 2) SET_NAMED(tmp, 2);
(gdb) c
Continuing.

Breakpoint 3, Rf_eval (e=0xf7f208, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

And again, but always via line 362:

Breakpoint 4, Rf_eval (e=0xf7f208, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:362
362         if (NAMED(tmp) != 2) SET_NAMED(tmp, 2);
(gdb) c
Continuing.

Now we run the identical(call1,call2) statement:

> identical(call1,call2)

Breakpoint 3, Rf_eval (e=0x99d2b0, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x7509b0, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x92ea98, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x56ca30, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x99d320, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x9ecb28, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

And finally we see NAMED being set in a different place in the code than in the original definition. So for some reason, when call2 is first defined, the NAMED field is not properly set for PROMSXP objects?

Breakpoint 5, Rf_eval (e=0x56ca30, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:389
389             SET_NAMED(tmp, 2);
(gdb) list
384             else error(_("argument is missing, with no default"));
385         }
386         else if (TYPEOF(tmp) == PROMSXP) {
387             PROTECT(tmp);
388             tmp = eval(tmp, rho);
389             SET_NAMED(tmp, 2);
390             UNPROTECT(1);
391         }
392         else if (!isNull(tmp) && NAMED(tmp) < 1)
393             SET_NAMED(tmp, 1);
(gdb) c
Continuing.

Breakpoint 3, Rf_eval (e=0x92d5d0, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x999670, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:307
Breakpoint 3, Rf_eval (e=0x9eb938, rho=0x547858) at ../../../r-devel/r-devel/R/src/main/eval.c:307

It happens here again:

Breakpoint 5, Rf_eval (e=0x92d5d0, rho=0x999750) at ../../../r-devel/r-devel/R/src/main/eval.c:389
389             SET_NAMED(tmp, 2);
(gdb) list
384             else error(_("argument is missing, with no default"));
385         }
386         else if (TYPEOF(tmp) == PROMSXP) {
387             PROTECT(tmp);
388             tmp = eval(tmp, rho);
389             SET_NAMED(tmp, 2);
390             UNPROTECT(1);
391         }
392         else if (!isNull(tmp) && NAMED(tmp) < 1)
393             SET_NAMED(tmp, 1);
(gdb) c
Continuing.

And then finally compute_identical is reached.

Breakpoint 1, compute_identical (x=0x9a0130, y=0x99d0f0) at ../../../r-devel/r-devel/R/src/main/identical.c:53
53      {

$150 = {sxpinfo = {type = 6, obj = 0, named = 2, gp = 0, mark = 0, debug = 0, trace = 0, fin = 0, gcgen = 0, gccls = 0},
  attrib = 0x508818, gengc_next_node = 0x9a0168, gengc_prev_node = 0x9a00f8,
  u = {primsxp = {offset = 10663048},
    symsxp = {pname = 0xa2b488, value = 0x9a15b0, internal = 0x508818},
    listsxp = {carval = 0xa2b488, cdrval = 0x9a15b0, tagval = 0x508818},
    envsxp = {frame = 0xa2b488, enclos = 0x9a15b0, hashtab = 0x508818},
    closxp = {formals = 0xa2b488, body = 0x9a15b0, env = 0x508818},
    promsxp = {value = 0xa2b488, expr = 0x9a15b0, env = 0x508818}}}

$151 = {sxpinfo = {type = 6, obj = 0, named = 2, gp = 0, mark = 0, debug = 0, trace = 0, fin = 0, gcgen = 0, gccls = 0},
  attrib = 0x508818, gengc_next_node = 0x99d128, gengc_prev_node = 0x99d0b8,
  u = {primsxp = {offset = 10663048},
    symsxp = {pname = 0xa2b488, value = 0x99d0b8, internal = 0x508818},
    listsxp = {carval = 0xa2b488, cdrval = 0x99d0b8, tagval = 0x508818},
    envsxp = {frame = 0xa2b488, enclos = 0x99d0b8, hashtab = 0x508818},
    closxp = {formals = 0xa2b488, body = 0x99d0b8, env = 0x508818},
    promsxp = {value = 0xa2b488, expr = 0x99d0b8, env = 0x508818}}}

I then continued and confirmed that none of the breakpoints for SET_NAMED (at least in Rf_eval) were hit on the way out.

So one possible hypothesis is that Rf_eval, as called when doing identical(call1,call2), actually fixes up NAMED values on the way to compute_identical, and that is why it helps.

I still don't know enough about how R works, but reading include/Rinternals.h makes me think of C++ code written in C (like Xorg's X11 code). I really like the code base; it is clean, well documented, and well organized. Congrats to the developers on that, by the way!

Hope this helps.
If not, I will begin to track down what is going on during the definition of call2 that may be failing to set NAMED values properly for PROMSXP sub-objects.

Kevin
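As a way of reading the branch at eval.c lines 386 to 393 shown in the listing, here is a small self-contained C sketch of its shape. It is a simplified model under stated assumptions, not R's code: toy_obj, toy_type, and toy_eval_symbol_value are hypothetical names, forcing a promise is reduced to following a pointer, and a C NULL stands in for R's NULL object.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object kinds; these only loosely mirror PROMSXP vs. everything else. */
enum toy_type { TOY_VALUE, TOY_PROMISE };

typedef struct toy_obj {
    enum toy_type type;
    int named;               /* 0, 1, or 2 */
    struct toy_obj *value;   /* for TOY_PROMISE: the object the promise yields */
} toy_obj;

/* Mirrors the shape of the branch in the listing:
     promise   -> force it, then mark the result NAMED = 2
     otherwise -> raise NAMED to at least 1 (skipped for the null object) */
toy_obj *toy_eval_symbol_value(toy_obj *tmp) {
    if (tmp == NULL)
        return NULL;                 /* stands in for the isNull() check */
    if (tmp->type == TOY_PROMISE) {
        tmp = tmp->value;            /* stands in for eval(tmp, rho) */
        assert(tmp != NULL);
        tmp->named = 2;
    } else if (tmp->named < 1) {
        tmp->named = 1;
    }
    return tmp;
}
```

In this model, any object reached through a promise comes out with named equal to 2 regardless of its value beforehand, while an object looked up directly is only raised to 1. If function arguments reach identical() as promises, that would be consistent with both dumps above showing named = 2 by the time compute_identical runs, though the sketch does not establish where the value should have been set earlier.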