Hi,

(First, apology for my earlier incorrectly addressed "subscribe" post.)

Can somebody tell me what exactly is going on below. Basically, I am
running into some kind of "string truncation" problem when I try
to get a substring starting past the 8192nd character (see sample
session below). There doesn't appear to be any problem creating the
string, and nchar() reports the correct size as constructed.

# Start of R session:
> tmp <- paste(rep("1",8192),sep="",collapse="")
> nchar(tmp)
[1] 8192
> tmp <- paste(tmp,"23456789",sep="",collapse="")
> nchar(tmp)
[1] 8200
> substring(tmp,8190,8192)
[1] "111"
> substring(tmp,8190,8193)
Warning in substr(x, as.integer(start), as.integer(stop)) :
  a string was truncated in substr()
[1] "111"
>
# End of R session:

Thanks for any assistance.

T.E.Diaz
George Washington University
Washington, DC

For comparison, I run the following in Splus 4.0.

# Start of Splus 4.0 session.
> tmp <- paste(rep("1",8192),sep="",collapse="")
> nchar(tmp)
[1] 8192
>
> tmp <- paste(tmp,"23456789",sep="",collapse="")
> nchar(tmp)
[1] 8200
> substring(tmp,8190,8192)
[1] "111"
> substring(tmp,8190,8193)
# End of Splus 4.0 session.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

"T.E.Diaz" <tediaz at gwis2.circ.gwu.edu> writes:

> [1] 8192
> > tmp <- paste(tmp,"23456789",sep="",collapse="")
> > nchar(tmp)
> [1] 8200
> > substring(tmp,8190,8192)
> [1] "111"
> > substring(tmp,8190,8193)
> Warning in substr(x, as.integer(start), as.integer(stop)) : a string was truncated in substr()
> [1] "111"

Possibly - ;) - related to this:

src/include/Defn.h:#define MAXELTSIZE 8192 /* The largest string size */

There are a couple of fixed-size arrays in the code. We'll want to
eradicate them at some point but it's pretty painful to do. Do you
have a serious application for text strings of more than 8k length?

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

On Tue, 3 Aug 1999, T.E.Diaz wrote:

> Can somebody tell me what exactly is going on below. Basically, I am
> running into some kind of "string truncation" problem when I try
> to get a substring starting past the 8192nd character (see sample
> session below). There doesn't appear to be any problem creating the
> string, and nchar() reports the correct size as constructed.

substr/substring has a buffer size limit of 8192. Indeed, the include file
says

Defn.h:#define MAXELTSIZE 8192 /* The largest string size */

One day this limit may be removed, but for now at least we could document
it. I am not at all clear why one would want to use a single string longer
than 8192 chars: is it possible in your applications to use a vector of
shorter strings instead?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
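
The behaviour both replies point to can be illustrated with a small C sketch. This is not R's actual substr() code, which the thread does not show. The helper name bounded_substr and the clipping rule (any position past MAXELTSIZE is treated as unreachable) are assumptions chosen only because they reproduce the session above, where the request for characters 8190 to 8193 comes back as "111" with a truncation warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXELTSIZE 8192 /* value quoted from src/include/Defn.h in the thread */

/* Hypothetical helper, NOT R's real substr(): copy characters
 * start..stop (1-based, inclusive) of x into buf, which must hold at
 * least MAXELTSIZE + 1 bytes.
 * Assumption: positions past MAXELTSIZE cannot be reached, so stop is
 * clipped to the limit. Returns 1 if the request was truncated, else 0. */
static int bounded_substr(const char *x, size_t start, size_t stop, char *buf)
{
    assert(x != NULL && buf != NULL);

    size_t len = strlen(x);
    int truncated = 0;

    if (stop > MAXELTSIZE) {      /* clip the request to the fixed limit */
        stop = MAXELTSIZE;
        truncated = 1;
    }
    if (stop > len)               /* never read past the end of x */
        stop = len;
    if (start < 1 || start > stop) {
        buf[0] = '\0';            /* empty result for an empty range */
        return truncated;
    }

    size_t n = stop - start + 1;  /* at most MAXELTSIZE characters */
    memcpy(buf, x + (start - 1), n);
    buf[n] = '\0';
    return truncated;
}
```

Under these assumptions, calling bounded_substr on the 8200-character string from the session with the range 8190 to 8192 fills buf with "111" and returns 0, while the range 8190 to 8193 also yields "111" but returns 1, mirroring the warning in the R session.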

Thanks to Prof Ripley and Peter Dalgaard for the "#define MAXELTSIZE 8192"
clarification to my earlier post (attached below).

As to the need for a string of nchar()>8192, I am using it to store the
alphanumeric names of FromNodes and ToNodes in a large generalized network.
The optimization routine is implemented in C (actually translated from
Fortran 77 using F2C, with some modifications), but the network
representation is constructed in R.

(1) My first impulse was to use a pair of contiguous blocks of memory to
store the FromNodes and ToNodes and pass their addresses to the C function
through the .C() call. The C function then writes the solution SolFromNodes
and SolToNodes into another pair of contiguous blocks whose addresses were
also passed in the .C() call. These four blocks are represented in R as four
character "vectors", each of length()=1 (node names are of fixed length).
Inside the C function, I employ pointer arithmetic applied to each single
string to access subsets of characters (the individual nodes).

(2) As suggested below by Prof Ripley, to overcome the 8192 limitation, I
can also use a vector representation of FromNodes, for example, with each
element representing a single node. Inside the C function, I would then
employ "pointer to character pointers" arithmetic to access individual
nodes.

Solution (1) is actually closer to the "array of characters" representation
of FromNodes (etc ...) in the original Fortran77 code, and I was guessing
that it is the more efficient implementation from a process time viewpoint
(I shall know better after experimenting with (2)). We are dealing here with
as many as 10,000 nodes, each 6 characters long. The optimization routine
will be implemented in a simulation function, and a fraction of a second
gain in efficiency in a single replicate would be nice.

T.E.Diaz
George Washington University
Washington, DC

From: Prof Brian D Ripley <ripley at stats.ox.ac.uk>
>
> On Tue, 3 Aug 1999, T.E.Diaz wrote:
>
> > Can somebody tell me what exactly is going on below. Basically, I am
> > running into some kind of "string truncation" problem when I try
> > to get a substring starting past the 8192nd character (see sample
> > session below). There doesn't appear to be any problem creating the
> > string, and nchar() reports the correct size as constructed.
>
> substr/substring has a buffer size limit of 8192. Indeed, the include file
> says
>
> Defn.h:#define MAXELTSIZE 8192 /* The largest string size */
>
> One day this limit may be removed, but for now at least we could document
> it. I am not at all clear why one would want to use a single string longer
> than 8192 chars: is it possible in your applications to use a vector of
> shorter strings instead?

From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
> Possibly - ;) - related to this:
>
> src/include/Defn.h:#define MAXELTSIZE 8192 /* The largest string
> size */
>
> There are a couple of fixed-size arrays in the code. We'll want to
> eradicate them at some point but it's pretty painful to do. Do you
> have a serious application for text strings of more than 8k length?
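
The two layouts described in the message above, (1) one packed string per node set and (2) one string per node, can be sketched on the C side as follows. This is only an illustration under stated assumptions: NODE_LEN, node_from_packed and node_from_vector are hypothetical names, not taken from the poster's code, and the sketch relies on the .C() convention that an R character vector arrives in C as a char **. With 10,000 nodes of 6 characters each, layout (1) needs a single string of 60,000 characters, well past the 8192 limit, whereas layout (2) only needs 6-character elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODE_LEN 6 /* fixed node-name width mentioned in the message */

/* Layout (1): all node names packed back to back in one string with no
 * separators. Copies the name of node i (0-based) into out, which must
 * hold NODE_LEN + 1 bytes. Hypothetical helper for illustration only. */
static void node_from_packed(const char *packed, size_t i, char *out)
{
    assert(packed != NULL && out != NULL);
    memcpy(out, packed + i * NODE_LEN, NODE_LEN); /* pointer arithmetic */
    out[NODE_LEN] = '\0';
}

/* Layout (2): one NUL-terminated string per node, reached through a
 * pointer to character pointers. Returns the name of node i (0-based)
 * without copying. Hypothetical helper for illustration only. */
static const char *node_from_vector(char **names, size_t i)
{
    assert(names != NULL);
    return names[i];
}
```

For example, with the packed string "AAAAAABBBBBBCCCCCC", node_from_packed(packed, 1, out) leaves "BBBBBB" in out, and with a three-element vector holding the same names, node_from_vector(names, 1) returns a pointer to "BBBBBB". Layout (1) costs one multiplication and a copy per lookup, layout (2) one pointer dereference, so any speed difference between them is something to measure rather than assume.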