On 16/07/2017 6:17 AM, Anthony Damico wrote:
> Thank you for taking the time to write this. I set it running last
> night and it's still going. If it doesn't finish by tomorrow, I will
> try to find a site to host the problem file and add that link to the bug
> report, so the archive package can be avoided at least. I'm sorry for
> the bother.

How big is that text file? I wouldn't expect my script to take more than a few minutes even on a huge file.

My script might have a bug...

Duncan Murdoch

> On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>
> > On 15/07/2017 11:33 AM, Anthony Damico wrote:
> >
> > > Hi, I realized that the segfault happens on the text file in a new R
> > > session. So creating the segfault-generating text file requires a
> > > contributed package, but prompting the actual segfault does not.
> > > Pretty sure that means this is a base R bug? Submitted here:
> > > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311
> > > Hopefully I am not doing something remarkably stupid. The text file
> > > itself is 4 GB, so I cannot upload it to Bugzilla, and from the
> > > R_AllocStringBuffer error in the previous message, I think most or
> > > all of it needs to be there to trigger the segfault. Thanks!
> >
> > I don't want to download the big file or install the archive
> > package. Could you run the code below on the bad file? If you're
> > right and it's only nulls that matter, this might allow me to create
> > a file that triggers the bug.
> >
> >     f <- "path/to/bad/file"  # put the filename of the bad file here
> >
> >     con <- file(f, open = "rb")
> >     zeros <- numeric()
> >     count <- 0  # bytes read so far; must be initialized before the loop
> >     repeat {
> >       bytes <- readBin(con, "int", 1000000, size = 1)
> >       # record the 1-based file positions of the NUL bytes in this chunk
> >       zeros <- c(zeros, count + which(bytes == 0))
> >       count <- count + length(bytes)
> >       if (length(bytes) < 1000000) break
> >     }
> >     close(con)
> >     cat("File length=", count, "\n")
> >     cat("Nulls:\n")
> >     zeros
> >
> > Here's some code to recreate a file of the same length with nulls in
> > the same places, and spaces everywhere else:
> >
> >     size <- count
> >     f2 <- tempfile()
> >     con <- file(f2, open = "wb")
> >     count <- 0
> >     while (count < size) {
> >       # number of spaces to write before the next NUL (or up to the end)
> >       nonzeros <- min(c(size - count, 1000000, zeros - 1))
> >       if (nonzeros) {
> >         writeBin(rep(32L, nonzeros), con, size = 1)
> >         count <- count + nonzeros
> >       }
> >       zeros <- zeros - nonzeros  # positions are now relative to count
> >       if (length(zeros) && min(zeros) == 1) {
> >         writeBin(0L, con, size = 1)
> >         count <- count + 1
> >         zeros <- zeros[-1] - 1
> >       }
> >     }
> >     close(con)
> >
> > Duncan Murdoch
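The scan above stores every NUL position and grows `zeros` with `c()`, which copies the whole vector on each pass, and the rebuild loop writes one NUL per iteration. If a file contains a great many NULs, both costs grow quickly, which is one plausible (unconfirmed) reason such a script could run for hours. The sketch below is not Duncan's code: it records runs of NULs as (start, length) pairs, reading raw chunks and merging runs split across chunk boundaries, and writes each run back in blocks. The function names `find_null_runs()` and `write_null_pattern()` are hypothetical.

```r
# Hypothetical helper: scan a file in raw chunks and return runs of NUL bytes
# as 1-based start positions and run lengths, plus the total file size.
find_null_runs <- function(path, chunk = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  run_starts <- list()
  run_lens <- list()
  count <- 0  # bytes read so far (double, so sizes above 2^31 are fine)
  i <- 0
  repeat {
    bytes <- readBin(con, "raw", chunk)
    n <- length(bytes)
    if (n == 0) break
    r <- rle(bytes == as.raw(0))  # alternating runs of NUL / non-NUL bytes
    ends <- cumsum(r$lengths)
    begins <- ends - r$lengths + 1
    i <- i + 1
    run_starts[[i]] <- count + begins[r$values]
    run_lens[[i]] <- r$lengths[r$values]
    count <- count + n
    if (n < chunk) break
  }
  s <- as.numeric(unlist(run_starts))
  l <- as.numeric(unlist(run_lens))
  if (length(s) > 1) {
    # merge runs that were split across a chunk boundary
    adjacent <- c(FALSE, s[-1] == s[-length(s)] + l[-length(l)])
    group <- cumsum(!adjacent)
    l <- as.vector(tapply(l, group, sum))
    s <- s[!adjacent]
  }
  list(size = count, starts = s, lengths = l)
}

# Hypothetical helper: write a file of `size` bytes with NUL runs at the given
# positions and spaces (byte 32) everywhere else, in blocks of at most `chunk`.
write_null_pattern <- function(path, size, starts, lengths, chunk = 1e6) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  pos <- 0  # bytes written so far
  write_fill <- function(n, byte) {
    while (n > 0) {
      k <- min(n, chunk)
      writeBin(rep(as.raw(byte), k), con)
      n <- n - k
    }
  }
  for (j in seq_along(starts)) {
    write_fill(starts[j] - 1 - pos, 32)  # spaces before this run
    write_fill(lengths[j], 0)            # the run of NULs
    pos <- starts[j] - 1 + lengths[j]
  }
  write_fill(size - pos, 32)             # trailing spaces
}
```

Usage, with the same placeholder path `f` as above: `runs <- find_null_runs(f)`, then `write_null_pattern(tempfile(), runs$size, runs$starts, runs$lengths)`. Storing runs only saves space when NULs cluster; for isolated NULs the result is about as large as `zeros`.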
Hi, the text file that prompts the segfault is 4 GB but only 80,937 lines:

> file.info("S:/temp/crash.txt")
                        size isdir mode               mtime               ctime               atime exe
S:/temp/crash.txt 4078192743 FALSE  666 2017-07-15 17:24:35 2017-07-15 17:19:47 2017-07-15 17:19:47  no

On Sun, Jul 16, 2017 at 6:34 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> How big is that text file? I wouldn't expect my script to take more than
> a few minutes even on a huge file.
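The point that the file is huge but has few lines can be made concrete with a little arithmetic on the two reported figures. This is only a sketch that restates those numbers; it does not establish anything about the cause of the crash.

```r
size <- 4078192743  # bytes, as reported by file.info() above
nlines <- 80937     # line count reported above

avg_line <- size / nlines                   # average bytes per line, about 50 KB
exceeds_int <- size > .Machine$integer.max  # largest R integer is 2^31 - 1
exceeds_uint32 <- size >= 2^32              # is it also beyond an unsigned 32-bit range?

cat("Average line length:", round(avg_line, 2), "bytes\n")
cat("Larger than 2^31 - 1:", exceeds_int, "\n")
cat("At least 2^32:", exceeds_uint32, "\n")
```

So the average line is roughly 50 KB, and the file size lies between 2^31 - 1 and 2^32 bytes.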
>>>>> Anthony Damico <ajdamico at gmail.com>
>>>>>     on Sun, 16 Jul 2017 06:40:38 -0400 writes:

    > Hi, the text file that prompts the segfault is 4 GB but only 80,937 lines
In the meantime, communication has continued a bit on the Bugzilla bug tracker
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311), and as you can read
there, the bug is now fixed, thanks in part to an initial patch proposed by
Hannes Mühleisen.

Martin Maechler
ETH Zurich (and R Core)