Hi,

I'm attempting to "glm" a formula - something that's not caused problems in the past. I've used formulas of the form

    formula( "dependent-variable~independent-variables" )

where the independent variable string is of the form:

    "indvar1+indvar2+...+indvarN"

Now, however, our independent variable strings are quite long (hundreds of variables) - R dies with an "input buffer overflow" error. I've tried writing out the code to files and sourcing them, as well as building the strings incrementally in R, but these have not worked either. I have come to believe there is a maximum length for char strings - some sort of fundamental limitation. Is there such a max-length and, if so, is there a way I can work with long strings of the sort referenced above?

Thank you,
J. Wilson
On Wed, 12 Jul 2006, jake wilson wrote:

> I'm attempting to "glm" a formula - something that's not caused problems in
> the past. I've used formulas of the form
>
> formula( "dependent-variable~independent-variables" )
>
> where the independent variable string is of the form:
>
> "indvar1+indvar2+...+indvarN"
>
> Now, however, our independent variable strings are quite long (hundreds of
> variables) - R dies with an "input buffer overflow" error. I've tried
> writing out the code to files and sourcing them, as well as building the
> strings incrementally in R, but these have not worked either. I have come
> to believe there is a maximum length for char strings - some sort of
> fundamental limitation. Is there such a max-length and, if so, is there a
> way I can work with long strings of the sort referenced above?
>

How long are the strings, and where does the error occur (traceback() will tell you where)?

With

    fn <- function(n) formula(paste("y",paste("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",1:n,collapse="+",sep=""),sep="~"))

I can run terms(fn(500)) with no problems. This is a 15500 character string, and produces a terms object over a megabyte in size. This suggests that it isn't a string problem, unless you really want formulas larger than this.

    -thomas

Thomas Lumley                 Assoc. Professor, Biostatistics
tlumley at u.washington.edu   University of Washington, Seattle
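The check described above can be reproduced roughly as follows. This is only a sketch: the 31-character name prefix is an assumption inferred from the "15500 characters for 500 terms" figure, and the names prefix, rhs, txt and f are illustrative, not from the original message.

```r
# Assumed: a 31-character variable-name prefix (inferred from 15500 / 500).
prefix <- paste(rep("x", 31), collapse = "")

# Build the right-hand side from 500 generated names, then the full formula text.
rhs <- paste(prefix, 1:500, sep = "", collapse = "+")
txt <- paste("y", rhs, sep = "~")

# Length of the string that gets turned into a formula.
nchar(txt)

# Convert to a formula and count the variables it mentions (response included).
f <- formula(txt)
length(all.vars(f))

# Size of the resulting terms object, for comparison with the figure quoted above.
print(object.size(terms(f)), units = "Kb")
```

If a string built this way converts without complaint, the failure is more likely to come from how the text reaches R (typed or sourced as one long quoted literal) than from the length of the character string itself.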
On Wed, 12 Jul 2006, jake wilson wrote:

> Hi,
>
> I'm attempting to "glm" a formula - something that's not caused problems in
> the past. I've used formulas of the form
>
> formula( "dependent-variable~independent-variables" )
>
> where the independent variable string is of the form:
>
> "indvar1+indvar2+...+indvarN"

Why the quotes? I think that is your problem.

> Now, however, our independent variable strings are quite long (hundreds of
> variables) - R dies with an "input buffer overflow" error.

It is normal to use (y ~ ., data=mydata) to avoid such formulae.

> I've tried writing out the code to files and sourcing them, as well as
> building the strings incrementally in R, but these have not worked
> either. I have come to believe there is a maximum length for char
> strings - some sort of fundamental limitation. Is there such a
> max-length and, if so, is there a way I can work with long strings of
> the sort referenced above?

The limit is 2^31 - 1, not relevant here.

Your message is coming from the parser, and suggests that it is trying to parse a piece of text longer than MAXELTSIZE bytes. The latter depends on the platform (unstated: do see the posting guide) and is often 8192 bytes. So there is a limit on the length of quoted strings which can be input.

However, what is wrong with say

    tmp <- paste(paste("indvar", 1:1000, sep=""), collapse="+")
    tmp <- paste("y ~", tmp)
    form <- eval(parse(text=tmp))

?

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
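Both suggestions above avoid typing a long quoted literal. A minimal sketch follows, using made-up data: the data frame mydata, its size, and the column names are assumptions for illustration only. reformulate() from the stats package is shown as one alternative to eval(parse(text=...)); it was not mentioned in the thread.

```r
set.seed(1)

# Assumed toy data: 50 rows, 5 predictors named indvar1..indvar5, plus response y.
n <- 50
p <- 5
mydata <- as.data.frame(matrix(rnorm(n * p), n, p))
names(mydata) <- paste("indvar", 1:p, sep = "")
mydata$y <- rnorm(n)

# Option 1: the dot stands for "every other column in data",
# so no long formula is ever written out.
fit_dot <- glm(y ~ ., data = mydata)

# Option 2: build the formula from a character vector of names.
vars <- paste("indvar", 1:p, sep = "")
form <- reformulate(vars, response = "y")
fit_form <- glm(form, data = mydata)

# Both fits have an intercept plus one coefficient per predictor.
length(coef(fit_dot))
length(coef(fit_form))
```

The dot form is the shorter of the two when every remaining column is a predictor; building the formula from a vector of names is handier when only a subset of columns should enter the model.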