Martin Maechler
2016-Jun-01 13:07 UTC
[Rd] segfault / crash when asking for large memory via strrep()
We've had this more general topic on R-help, and also on R-devel recently.
There's one case here where I get the feeling R never gets into swapping but aborts more directly, possibly because of a bug we can fix more easily.

Today I've been working (successfully! - not yet committed) on fixing str() for very large strings.

In this process, I've found that

    pc <- function(.) paste(., collapse=".1.2.3.4.5.")
    p <- function(.) strrep(pc(.), 64L)
    p(p(p(p(LETTERS))))

produces a (memory related) segmentation fault (aka "crash") very reproducibly and relatively quickly, both on my Linux (Fedora 22) desktop and on our Windows server.

     *** caught segfault ***
    address 0x7fc52dc89000, cause 'memory not mapped'

    Traceback:
     1: strrep(pc(.), 64L)
     2: p(p(p(p(LETTERS))))
     3: system.time(L2 <- p(p(p(p(LETTERS)))))

In the debugger, the symptoms point to the possibility of a bug just in the C parts of strrep():

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff54d6223 in __strcpy_sse2_unaligned () from /usr/lib64/libc.so.6
    Missing separate debuginfos, use: dnf debuginfo-install
      bzip2-libs-1.0.6-14.fc22.x86_64 libgcc-5.3.1-6.fc22.x86_64
      libgfortran-5.3.1-6.fc22.x86_64 libgomp-5.3.1-6.fc22.x86_64
      libicu-54.1-4.fc22.x86_64 libquadmath-5.3.1-6.fc22.x86_64
      libstdc++-5.3.1-6.fc22.x86_64 ncurses-libs-5.9-18.20150214.fc22.x86_64
      pcre-8.38-4.fc22.x86_64 readline-6.3-5.fc22.x86_64
      xz-libs-5.2.0-2.fc22.x86_64 zlib-1.2.8-7.fc22.x86_64
    (gdb) bt
    #0  0x00007ffff54d6223 in __strcpy_sse2_unaligned () from /usr/lib64/libc.so.6
    #1  0x0000000000457def in do_strrep (call=<optimized out>, op=<optimized out>, args=<optimized out>,
        env=<optimized out>) at ../../../R/src/main/character.c:1658
    #2  0x00000000004d6844 in bcEval (body=body@entry=0xd66840, rho=rho@entry=0x45253b8,
        useCache=useCache@entry=TRUE) at ../../../R/src/main/eval.c:5648
    #3  0x00000000004dd240 in Rf_eval (e=0xd66840, rho=0x45253b8) at ../../../R/src/main/eval.c:616
    #4  0x00000000004dedaf in Rf_applyClosure (call=call@entry=0x45250a8, op=op@entry=0xd668e8,
        arglist=0x45251f8, rho=rho@entry=0x4525000, suppliedvars=0xa57188)
        at ../../../R/src/main/eval.c:1134
    #5  0x00000000004dd3b1 in Rf_eval (e=0x45250a8, rho=0x4525000) at ../../../R/src/main/eval.c:732
    #6  0x00000000004dedaf in Rf_applyClosure (call=call@entry=0x4525718, op=op@entry=0x4524d28,
        arglist=0x4524f90, rho=rho@entry=0xa8ea30, suppliedvars=0xa57188)
        at ../../../R/src/main/eval.c:1134
    #7  0x00000000004dd3b1 in Rf_eval (e=0x4525718, rho=0xa8ea30) at ../../../R/src/main/eval.c:732
    #8  0x00000000004e0cde in do_set (call=0x4525670, op=0xa61358, args=<optimized out>, rho=0xa8ea30)
        at ../../../R/src/main/eval.c:2196
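The sizes involved in the reproduction above can be worked out by hand, and the following is only a sketch of that arithmetic, not code from R. It assumes one byte per character (plain ASCII), and the function names `nested_len` and `low32` are made up for illustration. The collapsed LETTERS string has 26 + 25 * 11 = 301 bytes, and each call of p() multiplies the length by 64, so the fourth level would need 301 * 64^4 = 5049942016 bytes, which is more than 2^31-1. If that count were held in a 32-bit int that wraps around (strictly, signed overflow is undefined in C), only the low 32 bits would survive.

```c
#include <stdint.h>

/* Length in bytes of paste(LETTERS, collapse = ".1.2.3.4.5."):
   26 one-byte letters joined by 25 copies of the 11-byte separator.
   Assumes plain ASCII, i.e. one byte per character. */
static int64_t collapsed_letters_len(void)
{
    const int64_t n_letters = 26, sep_len = 11;
    return n_letters + (n_letters - 1) * sep_len;   /* 26 + 25 * 11 */
}

/* Exact length after `depth` nested calls of p(). After the first call
   the argument is a single string, so pc() leaves it unchanged and each
   level just multiplies the length by 64. (Hypothetical helper.) */
int64_t nested_len(int depth)
{
    int64_t len = collapsed_letters_len();
    for (int i = 0; i < depth; i++)
        len *= 64;
    return len;
}

/* The low 32 bits of a length: what would be left of it if it were
   squeezed into a 32-bit integer. Conversion to an unsigned type is
   well defined in C (reduction modulo 2^32). (Hypothetical helper.) */
uint32_t low32(int64_t len)
{
    return (uint32_t)len;
}
```

Under these assumptions `nested_len(3)` is 78905344, which still fits in an int, while `nested_len(4)` does not, and `low32(nested_len(4))` is 754974720. A buffer of that wrapped size would be far too small for 64 copies of the 78905344-byte input, which would be consistent with the crash showing up inside strcpy.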
luke-tierney at uiowa.edu
2016-Jun-01 14:31 UTC
[Rd] segfault / crash when asking for large memory via strrep()
That would be because the product nc * ni overflows in

    cbuf = buf = CallocCharBuf(nc * ni);

Since we disallow strings with more than 2^31-1 bytes we could test and reject this. It might be more future-proof to change the declaration of

    int j, ni, nc;

to

    R_xlen_t j, ni, nc;

and let the character allocation code reject, but that would create a memory leak since the Free call isn't reached. This is a problem in any case though, as

    SET_STRING_ELT(s, is, markKnown(cbuf, STRING_ELT(x, ix)));

could throw errors for a number of reasons and then the Free() is not reached. It would be better to use R_alloc or register a cleanup function to call Free on a jump.

Best,

luke

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
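The "test and reject" option described in this message can be sketched in plain C. This is not the code in R's character.c: the names `rep_len_fits_int` and `rep_string` are hypothetical, and malloc stands in for CallocCharBuf. The point is only that the check is done by division, so the product nc * ni is never formed in int arithmetic.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if nc * ni is representable as a non-negative int, else 0.
   The comparison uses division so that no overflowing product is ever
   computed. (Hypothetical helper, not R's API.) */
int rep_len_fits_int(int nc, int ni)
{
    if (nc < 0 || ni < 0) return 0;
    if (nc == 0 || ni == 0) return 1;
    return nc <= INT_MAX / ni;
}

/* Returns a newly malloc'ed string holding `ni` copies of `s`, or NULL
   if the result would exceed INT_MAX bytes, if `ni` is negative, or if
   the allocation fails. The caller must free() the result. */
char *rep_string(const char *s, int ni)
{
    size_t len = strlen(s);
    if (len > (size_t)INT_MAX || !rep_len_fits_int((int)len, ni))
        return NULL;                     /* reject before allocating */

    size_t total = len * (size_t)ni;     /* safe: bounded by INT_MAX */
    char *buf = malloc(total + 1);       /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    char *p = buf;
    for (int j = 0; j < ni; j++) {
        memcpy(p, s, len);               /* append one copy */
        p += len;
    }
    *p = '\0';
    return buf;
}
```

With this check, a request such as 78905344 bytes repeated 64 times is refused up front instead of being passed to the allocator with a wrapped size. In R itself the rejection would presumably be an error() call rather than a NULL return.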
luke-tierney at uiowa.edu
2016-Jun-01 18:38 UTC
[Rd] segfault / crash when asking for large memory via strrep()
I've added a size/overflow check before the buffer allocation in R-devel and R-patched.

It would be a good idea sometime to review the use of calloc ... free patterns to make sure the ... can't raise an error or otherwise jump and leave the memory pointer dangling.

Best,

luke
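The concern about calloc ... free patterns can be illustrated in miniature with standard C setjmp/longjmp. This is a toy model under stated assumptions, not R's error machinery: `raise_error`, `pending_buf`, `tracked_alloc`, `tracked_free`, `work` and `run_work` are all hypothetical names. The buffer is registered before the code that may jump, and the error path frees whatever is registered before jumping, so the allocation is released on both the normal and the error exit.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf error_target;      /* stand-in for a top-level error handler */
static void *pending_buf = NULL;  /* buffer registered for cleanup on a jump */
static int live_allocs = 0;       /* allocations not yet freed */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) { free(p); live_allocs--; }
}

/* Error path: run the registered cleanup, then jump past the caller.
   Without the cleanup, the caller's free would simply be skipped. */
static void raise_error(void)
{
    tracked_free(pending_buf);
    pending_buf = NULL;
    longjmp(error_target, 1);
}

/* The "calloc ... free" pattern, where the "..." part may jump. */
static void work(int fail)
{
    char *buf = tracked_alloc(16);
    pending_buf = buf;            /* register before anything can jump */
    if (fail)
        raise_error();            /* does not return */
    pending_buf = NULL;           /* unregister on the normal path */
    tracked_free(buf);
}

/* Runs work() and returns the number of allocations still live. */
int run_work(int fail)
{
    if (setjmp(error_target) != 0)
        return live_allocs;       /* arrived here via raise_error() */
    work(fail);
    return live_allocs;
}
```

In this sketch `run_work(0)` and `run_work(1)` both leave no live allocation. Removing the `tracked_free(pending_buf)` line from `raise_error` would reproduce the leak described in the message. The alternative mentioned earlier in the thread, R_alloc, avoids the bookkeeping altogether because R reclaims that memory itself.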