hadley wickham
2004-Jul-16 20:50 UTC
[R] Strange (non-deterministic) problem with strsplit
I'm having an odd problem with strsplit (well, I think it's strsplit that's causing the problem). When I run the code below as follows:

    str(parseFormulaMin(y +x +d ~ b +d +e| a * b))

I expect to get

    List of 3
     $ y: chr "y+x+d"
     $ x: chr "b+d+e"
     $ g: chr "a*b"

But about half the time I get

    List of 3
     $ y: chr "y+x+d"
     $ x: chr "b+d+e"
     $ g: chr "a*[square box]"

(square box not reproduced here because copying and pasting it seems to break my web mail)

Can anyone reproduce the problem and/or suggest any solutions?

    parseFormula <- function(formula) {
      splitvars <- function(x) {
        strsplit(x, "\\+|\\*")[[1]]
      }
      stripwhitespace <- function(x) {
        gsub("\\s", "", x, perl=T)
      }

      vars <- stripwhitespace(as.character(formula)[3])
      varsplit <- strsplit(vars, "|", fixed=TRUE)[[1]]

      parts <- list(
        y = stripwhitespace(as.character(formula)[2]),
        x = varsplit[1],
        g = varsplit[2]
      )
      lapply(parts, splitvars)
    }

Thanks,

Hadley
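For anyone trying to reproduce an intermittent result like this, one option is to call the suspect step many times and count how many distinct results come back. The sketch below is only an illustration: `count_distinct_results` and `strip_ws` are hypothetical names that do not appear in the original post, and `strip_ws` merely mirrors the `gsub(..., perl=TRUE)` call used above.

```r
# Hypothetical helper (not from the original post): call f() n times and
# return the number of distinct results observed.
count_distinct_results <- function(f, n = 100) {
  results <- lapply(seq_len(n), function(i) f())
  length(unique(results))
}

# Mirrors the whitespace-stripping step from the post.
strip_ws <- function(x) gsub("\\s", "", x, perl = TRUE)

# A deterministic function should give exactly one distinct result.
n_distinct <- count_distinct_results(function() strip_ws("b + d + e | a * b"))
print(n_distinct)
```

A count greater than 1 would point at the step being tested rather than at the later `strsplit` calls.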
Henrik Bengtsson
2004-Jul-17 01:59 UTC
[Rd] RE: [R] Strange (non-deterministic) problem with strsplit
[Moving this thread to R-devel instead]

I suspect your "random" results are due to a bug in gsub(). On my R v1.9.0 (Rterm and Rgui) R crashes when I do

    % R --vanilla
    > gsub(" ", "", "abb + c | a*b", perl=TRUE)

Trying

    > gsub(" ", "", "b c + d | a * b", perl=TRUE)

and I'll get NULL. With

    > gsub("\\s", "", "bc + d | a * b", perl=TRUE)

it works as expected. So there is something buggy for sure.

This might have been fixed in R v1.9.1 or its patched version. (I'm still busy recovering from an HDD crash, but yes, I will update to R v1.9.1. BTW, what's the name of the error logger for Windows that is once in a while recommended on this list and that gives more detailed errors than the default Windows one?)

Cheers

Henrik
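If the `perl=TRUE` code path is indeed the culprit, as suspected above, one possible workaround is to strip whitespace with a POSIX character class and the default regex engine instead. This is a minimal sketch under that assumption, and it has not been checked against R v1.9.0; `parse_formula_sketch` is a hypothetical name, and the function returns the unsplit parts so that its output matches the "expected" listing in the original post.

```r
# Hypothetical variant of the posted parser that avoids perl=TRUE.
# Assumption: the intermittent corruption comes from gsub(perl=TRUE).
parse_formula_sketch <- function(formula) {
  # Default regex engine with a POSIX class instead of the Perl-style "\\s".
  strip_ws <- function(x) gsub("[[:space:]]", "", x)

  pieces <- as.character(formula)   # c("~", <lhs>, <rhs>)
  rhs <- strip_ws(pieces[3])
  rhs_split <- strsplit(rhs, "|", fixed = TRUE)[[1]]

  list(
    y = strip_ws(pieces[2]),
    x = rhs_split[1],
    g = rhs_split[2]
  )
}

res <- parse_formula_sketch(y + x + d ~ b + d + e | a * b)
str(res)
```

The `strsplit(..., fixed=TRUE)` call is kept as in the original, since the diagnosis in this thread concerns `gsub` only.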
Duncan Murdoch
2004-Jul-17 13:52 UTC
[Rd] RE: [R] Strange (non-deterministic) problem with strsplit
On Sat, 17 Jul 2004 01:59:17 +0200, "Henrik Bengtsson" <hb@maths.lth.se> wrote:

> BTW, what's the name of the error logger for Windows that is once in a while
> recommended on this list and that gives more detailed errors than the
> default Windows one?)

I think you mean DrMinGW, available from the mingw-utils bundle on www.mingw.org.

I've put together a page of debugging tips (with a Windows emphasis) at

http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR

It's still fairly new, so there are likely important details missing; please pass them on to me, or tell me which advice doesn't work for you.

Duncan Murdoch
Prof Brian Ripley
2004-Jul-26 19:23 UTC
[Rd] RE: [R] Strange (non-deterministic) problem with strsplit
Yes, that is a bug that I found a while back, and now that we have a replacement for the CVS archive, the fix is in 1.9.1 patched. However, it is in gsub(perl=TRUE) only, and Hadley was not using anything like that.

On Sat, 17 Jul 2004, Henrik Bengtsson wrote:

> This might have been fixed in R v1.9.1 or its patched version. (I'm still
> busy to recover from a HDD crash, but, yes, I will update to Rv1.9.1. BTW,
> what's the name of the error logger for Windows that is once in a while
> recommended on this list and that gives more detailed errors than the
> default Windows one?)

Dr. Mingw. See the rw-FAQ Q7.4. (FAQs are always worth consulting.)

--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595