William Dunlap
2009-Nov-03 22:28 UTC
[Rd] memory misuse in subscript code when rep() is called in odd way
The following odd call to rep() gives somewhat random results:

> rep(1:4, 1:8, each=2)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> rep(1:4, 1:8, each=2)
Error: only 0's may be mixed with negative subscripts
> rep(1:4, 1:8, each=2)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> rep(1:4, 1:8, each=2)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> rep(1:4, 1:8, each=2)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> rep(1:4, 1:8, each=2)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  2 NA NA  2 NA NA  2
> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Under development (unstable)
major          2
minor          11.0
year           2009
month          10
day            20
svn rev        50178
language       R
version.string R version 2.11.0 Under development (unstable) (2009-10-20 r50178)

valgrind says that the C code is using uninitialized data:

> rep(1:4, 1:8, each=2)
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C557D: integerSubscript (subscript.c:408)
==26459==    by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)
==26459==    by 0x80C5FFD: Rf_makeSubscript (subscript.c:613)
==26459==    by 0x80C7368: do_subset_dflt (subset.c:158)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C5567: integerSubscript (subscript.c:409)
==26459==    by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)
==26459==    by 0x80C5FFD: Rf_makeSubscript (subscript.c:613)
==26459==    by 0x80C7368: do_subset_dflt (subset.c:158)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C556E: integerSubscript (subscript.c:411)
==26459==    by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)
==26459==    by 0x80C5FFD: Rf_makeSubscript (subscript.c:613)
==26459==    by 0x80C7368: do_subset_dflt (subset.c:158)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C558F: integerSubscript (subscript.c:415)
==26459==    by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)
==26459==    by 0x80C5FFD: Rf_makeSubscript (subscript.c:613)
==26459==    by 0x80C7368: do_subset_dflt (subset.c:158)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C55C1: integerSubscript (subscript.c:387)
==26459==    by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)
==26459==    by 0x80C5FFD: Rf_makeSubscript (subscript.c:613)
==26459==    by 0x80C7368: do_subset_dflt (subset.c:158)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C60BB: ExtractSubset (subset.c:64)
==26459==    by 0x80C73B9: do_subset_dflt (subset.c:171)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C61F6: ExtractSubset (subset.c:74)
==26459==    by 0x80C73B9: do_subset_dflt (subset.c:171)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
==26459==
==26459== Conditional jump or move depends on uninitialised value(s)
==26459==    at 0x80C61FF: ExtractSubset (subset.c:74)
==26459==    by 0x80C73B9: do_subset_dflt (subset.c:171)
==26459==    by 0x80B4283: do_rep (Rinlinedfuns.h:161)
==26459==    by 0x816491B: Rf_eval (eval.c:464)
==26459==    by 0x805A726: Rf_ReplIteration (main.c:262)
==26459==    by 0x805A95E: R_ReplConsole (main.c:311)
==26459==    by 0x805AFBC: run_Rmainloop (main.c:964)
==26459==    by 0x8058E2B: main (Rmain.c:33)
 [1]  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4
[26]  4  4  4  4  4  4  4  4  4  4  4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> rle(.Last.value)
Run Length Encoding
  lengths: int [1:40] 3 7 11 15 1 1 1 1 1 1 ...
  values : int [1:40] 1 2 3 4 NA NA NA NA NA NA ...

S+ returns the non-NA part of this output:

S+> rep(1:4, 1:8, each=2)
 [1] 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
S+> rle( rep(1:4, 1:8, each=2))
$lengths:
[1]  3  7 11 15

$values:
[1] 1 2 3 4

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Seth Falcon
2009-Nov-04 05:40 UTC
[Rd] memory misuse in subscript code when rep() is called in odd way
Hi,

On 11/3/09 2:28 PM, William Dunlap wrote:
> The following odd call to rep()
> gives somewhat random results:
>
>> rep(1:4, 1:8, each=2)

I've committed a fix for this to R-devel.

I admit that I had to reread the rep man page as I first thought this was not a valid call to rep since times (1:8) is longer than x (1:4), but closer reading of the man page says:

> If times is a vector of the same length as x (after replication
> by each), the result consists of x[1] repeated times[1] times,
> x[2] repeated times[2] times and so on.

So the expected result is the same as rep(rep(1:4, each=2), 1:8).

> valgrind says that the C code is using uninitialized data:
>> rep(1:4, 1:8, each=2)
> ==26459== Conditional jump or move depends on uninitialised value(s)
> ==26459== at 0x80C557D: integerSubscript (subscript.c:408)
> ==26459== by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658)

A little investigation seems to suggest that the problem is originating earlier. Debugging in seq.c:do_rep I see the following:

> rep(1:4, 1:8, each=2)

Breakpoint 1, do_rep (call=0x102de0068, op=<value temporarily unavailable, due to optimizations>, args=<value temporarily unavailable, due to optimizations>, rho=0x1018829f0)
    at /Users/seth/src/R-devel-all/src/main/seq.c:434
434 ans = do_subset_dflt(R_NilValue, R_NilValue, list2(x, ind), rho);
(gdb) p Rf_PrintValue(ind)
 [1]          1          1          1          2          2          2
 [7]          2          2          2          2          3          3
[13]          3          3          3          3          3          3
[19]          3          3          3          4          4          4
[25]          4          4          4          4          4          4
[31]          4          4          4          4          4          4
[37]   44129344          1   44129560          1   44129776          1
[43]   44129992          1   44099592          1   44099808          1
[49]   44100024          1   44100456          1    2724144    3801089
[55] -536870733          0   54857992          1   22275728          1
[61]    2724144          1         34          1   44100744          1
[67]   44100960          1   44101176          1   43652616          1
$2 = void
(gdb) c
Continuing.
Error: only 0's may be mixed with negative subscripts

The patch I applied adjusts how the index vector length is computed when times has length more than one.

+ seth
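
For readers following the thread, the semantics quoted from the man page can be sketched in standalone C. This is only an illustration of the documented behaviour, not the code in seq.c and not the committed patch: the function names rep_result_length and rep_each_times are hypothetical, and plain int arrays stand in for R vectors. The point it tries to show is that when times has one entry per element of the each-expanded x, the result holds exactly sum(times) values, so a buffer sized that way is filled completely and no uninitialised tail is left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of R): length of rep(x, times, each=)
   when times supplies one count per element of the each-expanded x.
   It is simply the sum of the counts. */
size_t rep_result_length(const int *times, size_t ntimes)
{
    size_t total = 0;
    for (size_t i = 0; i < ntimes; i++)
        total += (size_t) times[i];
    return total;
}

/* Hypothetical sketch of rep(rep(x, each = each), times).
   Returns a malloc'd array and stores its length in *nout.
   Returns NULL if the arguments do not fit the documented case
   (ntimes must equal nx * each, counts must be non-negative)
   or if allocation fails. The caller frees the result. */
int *rep_each_times(const int *x, size_t nx, size_t each,
                    const int *times, size_t ntimes, size_t *nout)
{
    if (each == 0 || ntimes != nx * each)
        return NULL;
    for (size_t i = 0; i < ntimes; i++)
        if (times[i] < 0)
            return NULL;

    size_t n = rep_result_length(times, ntimes);
    int *out = malloc((n ? n : 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t k = 0;
    for (size_t i = 0; i < ntimes; i++) {
        /* Element i of the each-expanded vector is x[i / each]. */
        int value = x[i / each];
        for (int j = 0; j < times[i]; j++)
            out[k++] = value;
    }

    /* Every slot was written: the buffer length matches the data. */
    assert(k == n);
    *nout = n;
    return out;
}
```

Under these assumptions, calling rep_each_times with x = {1, 2, 3, 4}, each = 2 and times = {1, ..., 8} yields 36 values with run lengths 3, 7, 11 and 15, which agrees with the S+ output and the rle() result in the original report.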