On 15/07/2017 11:33 AM, Anthony Damico wrote:
> hi, i realized that the segfault happens on the text file in a new R
> session. so, creating the segfault-generating text file requires a
> contributed package, but prompting the actual segfault does not --
> pretty sure that means this is a base R bug? submitted here:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311 hopefully i
> am not doing something remarkably stupid. the text file itself is 4GB
> so cannot upload it to bugzilla, and from the R_AllocStringBuffer error
> in the previous message, i think most or all of it needs to be there to
> trigger the segfault. thanks!

I don't want to download the big file or install the archive package. Could you run the code below on the bad file? If you're right and it's only nulls that matter, this might allow me to create a file that triggers the bug.

f <- # put the filename of the bad file here

con <- file(f, open="rb")
zeros <- numeric()   # 1-based offsets of the null bytes found so far
count <- 0           # number of bytes read so far
repeat {
  # read up to 1e6 single-byte integers per chunk
  bytes <- readBin(con, "int", 1000000, size=1)
  zeros <- c(zeros, count + which(bytes == 0))
  count <- count + length(bytes)
  if (length(bytes) < 1000000) break
}
close(con)
cat("File length=", count, "\n")
cat("Nulls:\n")
zeros

Here's some code to recreate a file of the same length with nulls in the same places, and spaces everywhere else:

size <- count
f2 <- tempfile()
con <- file(f2, open="wb")
count <- 0
while (count < size) {
  # bytes to write before the next null (or before the chunk or file ends)
  nonzeros <- min(c(size - count, 1000000, zeros - 1))
  if (nonzeros) {
    writeBin(rep(32L, nonzeros), con, size = 1)
    count <- count + nonzeros
  }
  # keep the null offsets relative to the current write position
  zeros <- zeros - nonzeros
  if (length(zeros) && min(zeros) == 1) {
    writeBin(0L, con, size = 1)
    count <- count + 1
    zeros <- zeros[-1] - 1
  }
}
close(con)

Duncan Murdoch
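For anyone who wants to try the "same length, nulls in the same places, spaces everywhere else" idea on something small first, here is a minimal sketch. It is not the script above. The helper name `write_null_skeleton` and its `chunk` argument are made up for illustration, and it fills each chunk by direct indexing into a raw vector rather than by decrementing the offsets.

```r
# Hypothetical helper (not part of the thread's script): write `size` bytes to
# `path`, with 0x00 at the 1-based offsets in `zeros` and 0x20 (space) elsewhere.
write_null_skeleton <- function(path, size, zeros, chunk = 1e6) {
  zeros <- sort(unique(zeros))
  con <- file(path, open = "wb")
  on.exit(close(con))
  start <- 1
  while (start <= size) {
    end <- min(start + chunk - 1, size)
    buf <- rep(as.raw(32), end - start + 1)      # a chunk of spaces
    hit <- zeros[zeros >= start & zeros <= end]  # nulls that fall in this chunk
    buf[hit - start + 1] <- as.raw(0)
    writeBin(buf, con)
    start <- end + 1
  }
  invisible(path)
}

# Round trip on a tiny file; chunk = 10 forces several chunks for 25 bytes.
f_small <- tempfile()
nulls <- c(1, 4, 5, 10, 23)
write_null_skeleton(f_small, size = 25, zeros = nulls, chunk = 10)
bytes <- readBin(f_small, "raw", n = 100)
length(bytes)               # 25
which(bytes == as.raw(0))   # 1 4 5 10 23
```

Filtering `zeros` once per chunk is wasteful when there are very many nulls, so treat this as a way to check the idea on small inputs rather than as a replacement for the script.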
thank you for taking the time to write this. i set it running last night and it's still going -- if it doesn't finish by tomorrow, i will try to find a site to host the problem file and add that link to the bug report so the archive package can be avoided at least. i'm sorry for the bother

On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I don't want to download the big file or install the archive package.
> Could you run the code below on the bad file? If you're right and it's
> only nulls that matter, this might allow me to create a file that triggers
> the bug.
On 16/07/2017 6:17 AM, Anthony Damico wrote:
> thank you for taking the time to write this. i set it running last
> night and it's still going -- if it doesn't finish by tomorrow, i will
> try to find a site to host the problem file and add that link to the bug
> report so the archive package can be avoided at least. i'm sorry for
> the bother

How big is that text file? I wouldn't expect my script to take more than a few minutes even on a huge file. My script might have a bug...

Duncan Murdoch
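The thread does not establish why the scan ran so long. One possibility, and it is only a guess, is that the file holds a very large number of nulls: growing `zeros` with `c()` on every chunk copies the whole vector each time, and each stored offset takes 8 bytes. The sketch below is a hypothetical variant (the name `find_nulls` and the `chunk` argument are not from the thread) that reads raw bytes and collects the per-chunk offsets in a list, combining them once at the end.

```r
# Hypothetical helper: return the file length and the 1-based offsets of all
# 0x00 bytes in `path`, reading `chunk` bytes at a time.
find_nulls <- function(path, chunk = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  count <- 0      # bytes read so far (a double, so offsets past 2^31 are fine)
  hits <- list()  # one vector of offsets per chunk that contains nulls
  repeat {
    bytes <- readBin(con, "raw", n = chunk)
    if (length(bytes) == 0) break
    idx <- which(bytes == as.raw(0))
    if (length(idx)) hits[[length(hits) + 1]] <- count + idx
    count <- count + length(bytes)
    if (length(bytes) < chunk) break
  }
  list(size = count, zeros = as.numeric(unlist(hits)))
}

# Small check: 5 bytes with nulls at offsets 1 and 4, read 2 bytes at a time.
f_demo <- tempfile()
writeBin(as.raw(c(0, 32, 32, 0, 32)), f_demo)
res <- find_nulls(f_demo, chunk = 2)
res$size    # 5
res$zeros   # 1 4
```

This does not reduce the memory needed to hold the offsets themselves, so a file that is mostly nulls would still be a problem; in that case recording runs of nulls (start and length) might be a better fit.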