thr3ads.net - R help - [R] RW 0.64.2 substring() string truncation? [Aug 1999]

T.E.Diaz

1999-Aug-03 03:06 UTC

[R] RW 0.64.2 substring() string truncation?

Hi,

(First, apologies for my earlier, incorrectly addressed "subscribe"
post.)

Can somebody tell me what exactly is going on below? Basically, I am
running into some kind of "string truncation" problem when I try
to get a substring that extends past the 8192nd character (see the sample
session below). There doesn't appear to be any problem creating the
string, and nchar() reports the correct size as constructed.

# Start of R session:
> tmp <- paste(rep("1",8192),sep="",collapse="")
> nchar(tmp)
[1] 8192
> tmp <- paste(tmp,"23456789",sep="",collapse="")
> nchar(tmp)
[1] 8200
> substring(tmp,8190,8192)
[1] "111"
> substring(tmp,8190,8193)
Warning in substr(x, as.integer(start), as.integer(stop)) : a string was truncated in substr()
[1] "111"
# End of R session:

Thanks for any assistance.

T.E.Diaz
George Washington University
Washington, DC

For comparison, I ran the same commands in Splus 4.0:

# Start of Splus 4.0 session.
> tmp <- paste(rep("1",8192),sep="",collapse="")
> nchar(tmp)
[1] 8192
>
> tmp <- paste(tmp,"23456789",sep="",collapse="")
> nchar(tmp)
[1] 8200
> substring(tmp,8190,8192)
[1] "111"
> substring(tmp,8190,8193)
# End of Splus 4.0 session.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard BSA

1999-Aug-03 08:09 UTC

[R] RW 0.64.2 substring() string truncation?

"T.E.Diaz" <tediaz at gwis2.circ.gwu.edu> writes:
> [1] 8192
> > tmp <- paste(tmp,"23456789",sep="",collapse="")
> > nchar(tmp)
> [1] 8200
> > substring(tmp,8190,8192)
> [1] "111"
> > substring(tmp,8190,8193)
> Warning in substr(x, as.integer(start), as.integer(stop)) : a string was truncated in substr()
> [1] "111"
Possibly - ;) - related to this:

src/include/Defn.h:#define MAXELTSIZE 8192 /* The largest string size */

There are a couple of fixed-size arrays in the code. We'll want to
eradicate them at some point but it's pretty painful to do. Do you
have a serious application for text strings of more than 8k length?

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Prof Brian D Ripley

1999-Aug-03 08:22 UTC

[R] RW 0.64.2 substring() string truncation?

On Tue, 3 Aug 1999, T.E.Diaz wrote:
> Can somebody tell me what exactly is going on below. Basically, I am 
> running into some kind of "string truncation" problem when I try 
> to get a substring starting past the 8192nd character (see sample 
> session below). There doesn't appear to be any problem creating the 
> string, and nchar() reports the correct size as constructed.
substr/substring has a buffer size limit of 8192. Indeed, the include file
says

Defn.h:#define MAXELTSIZE 8192 /* The largest string size */

One day this limit may be removed, but for now at least we could document
it.  I am not at all clear why one would want to use a single string longer
than 8192 chars: is it possible in your applications to use a vector of
shorter strings instead?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


T.E.Diaz

1999-Aug-03 13:25 UTC

[R] RW 0.64.2 substring() string truncation?

Thanks to Prof Ripley and Peter Dalgaard for the "#define MAXELTSIZE 
8192" clarification to my earlier post (attached below).

As to the need for a string with nchar() > 8192: I am using it to
store the alphanumeric names of FromNodes and ToNodes in a large
generalized network. The optimization routine is implemented in C
(translated from Fortran 77 using F2C, with some modifications of my
own), but the network representation is constructed in R.

(1) My first impulse was to use a pair of contiguous memory blocks
to store the FromNodes and ToNodes and pass their addresses to the
C function through the .C() call; the function then writes the
solution, SolFromNodes and SolToNodes, into another pair of
contiguous blocks whose addresses are also passed in the .C() call.
These four contiguous blocks are represented in R as four character
"vectors", each of length()=1 (node names are of fixed length).
Inside the C function, I apply pointer arithmetic to each single
string to access subsets of characters (the individual nodes).

(2) As Prof Ripley suggests below, to get around the 8192 limit I
could instead use a vector representation of FromNodes (for
example), with each element holding a single node. Inside the C
function, I would then use "pointer to character pointer"
arithmetic to access individual nodes.

Solution (1) is closer to the "array of characters" representation
of FromNodes (etc.) in the original Fortran 77 code, and I guessed it
would be the more efficient implementation in terms of processing
time (I shall know better after experimenting with (2)). We are
dealing here with as many as 10,000 nodes, each 6 characters long.
The optimization routine will be called inside a simulation
function, so even a fraction of a second saved per replicate would
be nice.

T.E.Diaz
George Washington University
Washington, DC

From: Prof Brian D Ripley <ripley at stats.ox.ac.uk>
> On Tue, 3 Aug 1999, T.E.Diaz wrote:
> 
> > Can somebody tell me what exactly is going on below. Basically, I am 
> > running into some kind of "string truncation" problem when I try
> > to get a substring starting past the 8192nd character (see sample 
> > session below). There doesn't appear to be any problem creating the
> > string, and nchar() reports the correct size as constructed.
> 
> substr/substring has a buffer size limit of 8192. Indeed, the include file
> says
> 
> Defn.h:#define MAXELTSIZE 8192 /* The largest string size */
> 
> One day this limit may be removed, but for now at least we could document
> it.  I am not at all clear why one would want to use a single string longer
> than 8192 chars: is it possible in your applications to use a vector of
> shorter strings instead?
From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
> Possibly - ;) - related to this:
> 
> src/include/Defn.h:#define MAXELTSIZE 8192 /* The largest string size */
> 
> There are a couple of fixed-size arrays in the code. We'll want to
> eradicate them at some point but it's pretty painful to do. Do you
> have a serious application for text strings of more than 8k length?
