thr3ads.net - R devel - [Rd] incorrect output and segfaults from sprintf with %*d (PR#13667) [Apr 2009]

If this information is useful, please help other people find it:
Share via:

waku at idi.ntnu.no

2009-Apr-21 11:05 UTC

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48365
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.110.141)


sprintf has a documented limit on strings included in the output using the
format '%s'.  It appears that there is a limit on the length of strings
included
with, e.g., the format '%d' beyond which surprising things happen
(output
modified for conciseness):

   gregexpr('1', sprintf('%9000d', 1))
   # [1] 9000 9801

   gregexpr('1', sprintf('%9000d', 1))
   # [1]  9000  9801 10602

   gregexpr('1', sprintf('%9000d', 1))
   # [1]  9000  9801 10602 11403

   gregexpr('1', sprintf('%9000d', 1))
   # [1]  9000  9801 10602 11403 12204

   ...

Note that not only more than one '1' is included in the output, but also
that
the same functional expression (no side effects used beyond the interface) gives
different results on each execution.  Analogous behaviour can be observed with
'%nd' where n > 8200.

The actual output above is consistent across separate sessions.

With sufficiently large field width values, R segfaults:

   sprintf('%*d', 10^5, 1)
   # *** caught segfault ***
   # address 0xbfcfc000, cause 'memory not mapped'
   # Segmentation fault


   sessionInfo()
   # R version 2.10.0 Under development (unstable) (2009-04-20 r48365) 
   # i686-pc-linux-gnu

maechler at stat.math.ethz.ch

2009-Apr-22 14:10 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

>>>>> "vQ" == Wacek Kusnierczyk <waku at
idi.ntnu.no>
>>>>>     on Tue, 21 Apr 2009 13:05:11 +0200 (CEST) writes:
    vQ> Full_Name: Wacek Kusnierczyk
    vQ> Version: 2.10.0 r48365
    vQ> OS: Ubuntu 8.04 Linux 32bit
    vQ> Submission from: (NULL) (129.241.110.141)


    vQ> sprintf has a documented limit on strings included in the output
using the
    vQ> format '%s'.  It appears that there is a limit on the length
of strings included
    vQ> with, e.g., the format '%d' beyond which surprising things
happen (output
    vQ> modified for conciseness):

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403 12204

    vQ> ...

    vQ> Note that not only more than one '1' is included in the
output, but also that
    vQ> the same functional expression (no side effects used beyond the
interface) gives
    vQ> different results on each execution.  Analogous behaviour can be
observed with
    vQ> '%nd' where n > 8200.

    vQ> The actual output above is consistent across separate sessions.

    vQ> With sufficiently large field width values, R segfaults:

    vQ> sprintf('%*d', 10^5, 1)
    vQ> # *** caught segfault ***
    vQ> # address 0xbfcfc000, cause 'memory not mapped'
    vQ> # Segmentation fault


Thank you, Wacek.
That's all ``interesting''  ... unfortunately, 

my version of  'man 3 sprintf' contains
>> BUGS
>>        Because sprintf() and vsprintf() assume an arbitrarily
>>        long string, callers must be careful not to overflow the
>>        actual space; this is often impossible to assure. Note
>>        that the length of the strings produced is
>>        locale-dependent and difficult to predict.  Use
>>        snprintf() and vsnprintf() instead (or asprintf() and
vasprintf).
(note the "impossible" part above)       
and we haven't used  snprintf() yet, probably because it
requires the  C99 C standard, and AFAIK, we have only relatively
recently started to more or less rely on C99 in the R sources.
       
More precisely, I see that some windows-only code relies on
snprintf() being available  whereas in at least on non-Windows
section, I read   /* we cannot assume snprintf here */

Now such platform dependency issues and corresponding configure
settings I do typically leave to other R-corers with a much
wider overview about platforms and their compilers and C libraries.
       

BTW,  
1) sprintf("%n %g", 1,1)   also seg.faults

2) Did you have a true use case where  the  8192  limit was an
   undesirable limit?

Martin       

    vQ> sessionInfo()
    vQ> # R version 2.10.0 Under development (unstable) (2009-04-20 r48365) 
    vQ> # i686-pc-linux-gnu

maechler at stat.math.ethz.ch

2009-Apr-23 10:40 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

>>>>> "vQ" == Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>     on Thu, 23 Apr 2009 11:49:54 +0200 writes:
    vQ> maechler at stat.math.ethz.ch wrote:
    >> 
    vQ> sprintf has a documented limit on strings included in the output
using the
    vQ> format '%s'.  It appears that there is a limit on the length
of strings included
    vQ> with, e.g., the format '%d' beyond which surprising things
happen (output
    vQ> modified for conciseness):
    >> 

    vQ> ... and this limit is *not* documented.

well, it is basically (+ a few bytes ?)
the same  8192  limit that *is* documented.

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403 12204
    >> 
    vQ> ...
    >> 
    vQ> Note that not only more than one '1' is included in the
output, but also that
    vQ> the same functional expression (no side effects used beyond the
interface) gives
    vQ> different results on each execution.  Analogous behaviour can be
observed with
    vQ> '%nd' where n > 8200.
    >> 
    vQ> The actual output above is consistent across separate sessions.
    >> 
    vQ> With sufficiently large field width values, R segfaults:
    >> 
    vQ> sprintf('%*d', 10^5, 1)
    vQ> # *** caught segfault ***
    vQ> # address 0xbfcfc000, cause 'memory not mapped'
    vQ> # Segmentation fault
    >> 
    >> 
    >> Thank you, Wacek.
    >> That's all ``interesting''  ... unfortunately, 
    >> 
    >> my version of  'man 3 sprintf' contains
    >> 
    >> 
    >>>> BUGS
    >>>> Because sprintf() and vsprintf() assume an arbitrarily
    >>>> long string, callers must be careful not to overflow the
    >>>> actual space; this is often impossible to assure. Note
    >>>> that the length of the strings produced is
    >>>> locale-dependent and difficult to predict.  Use
    >>>> snprintf() and vsnprintf() instead (or asprintf() and
vasprintf).

    vQ> yes, but this is c documentation, not r documentation.

Of course! ...  and I  *do*  apply it to R's C code [sprintf.c]
and hence am even concurring with you .. 

    vQ> it's applicable
    vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper
for the
    vQ> C function 'sprintf'".  however, in c you use a buffer
and you usually
    vQ> have control over it's capacity, while in r this is a hidden
    vQ> implementational detail, which should not be visible to the user, or
    vQ> should cause an attempt to overflow the buffer to fail more
gracefully
    vQ> than with a segfault.

    vQ> in r, sprintf('%9000d', 1) will produce a confused output
with a count
    vQ> of 1's variable (!) across runs (while sprintf('%*d',
9000, 1) seems to
    vQ> do fine):

    vQ> gregexpr('1', sprintf('%*d', 9000, 1))
    vQ> # [1] 9000

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801 ..., variable across executions

    vQ> on one execution in a series i actually got this:

    vQ> Warning message:
    vQ> In gregexpr("1", sprintf("%9000d", 1)) :
    vQ> input string 1 is invalid in this locale

    vQ> while the very next execution, still in the same session, gave

    vQ> # [1]  9000  9801 10602

    vQ> with sprintf('%*d', 10000, 1) i got segfaults on some
executions but
    vQ> correct output on others, while sprintf('%10000d', 1) is
confused again.



    >> (note the "impossible" part above)       
    >> 

    vQ> yes, but it does also say "must be careful", and it seems
that someone
    vQ> has not been careful enough.

    >> and we haven't used  snprintf() yet, probably because it
    >> requires the  C99 C standard, and AFAIK, we have only relatively
    >> recently started to more or less rely on C99 in the R sources.
    >> 

    vQ> while snprintf would help avoid buffer overflow, it may not be a
    vQ> solution to the issue of confused output.

I think it would / will.  We would be able to give warnings and
errors, by checking the  snprintf()  return codes.


    >> More precisely, I see that some windows-only code relies on
    >> snprintf() being available  whereas in at least on non-Windows
    >> section, I read   /* we cannot assume snprintf here */
    >> 
    >> Now such platform dependency issues and corresponding configure
    >> settings I do typically leave to other R-corers with a much
    >> wider overview about platforms and their compilers and C libraries.
    >> 

    vQ> it looks like src/main/sprintf.c is just buggy, and it's
plausible that
    vQ> the bug could be repaired in a platform-independent manner.

definitely.
In the mean time, I've actually found that what I first said on
the usability of snprintf() in R's code base was only partly correct.
There are other parts of R code where we use  snprintf() for all
platforms, hence we rely on its presence (and correct
implementation!) and so we can and I think should use it in
place of sprintf() in quite a few places inside R's sprintf.c


    >> BTW,  
    >> 1) sprintf("%n %g", 1,1)   also seg.faults
    >> 

    vQ> as do

    vQ> sprintf('%n%g', 1, 1)
    vQ> sprintf('%n%')

    vQ> etc., while

    vQ> sprintf('%q%g', 1, 1)
    vQ> sprintf('%q%')
  
    vQ> work just fine.  strange, because per ?sprintf 'n' is not
recognized as
    vQ> a format specifier, so the output from the first two above should be
as
    vQ> from the last two above, respectively.  (and likewise in the %S case,
    vQ> discussed and bug-reported earlier.)

I have now fixed these bugs at least;
the more subtle  "%<too_large_n>d" ones are different, and
as I said, I'm convinced that a nice & clean fix for those will
start using snprintf().

    >> 2) Did you have a true use case where  the  8192  limit was an
    >> undesirable limit?

    vQ> how does it matter?  

well, we could increase it, if it did matter.
      {{ you *could* have been more polite here, no?
      	 it *was* after all a serious question that I asked! }}

    vQ>  if you set a limit, be sure to consistently enforce
    vQ> it and warn the user on attempts to exceed it.  or write clearly in
the
    vQ> docs that such attempts will cause the output to be silently
truncated.

Sure, I'm not at all disagreeing on that, and if you read this into my
posting, you misunderstand.

Martin

    vQ> examples such as

    vQ> sprintf('%9000d', 1)

    vQ> do not contribute to the reliability of r, and neither to the
user's
    vQ> confidence in it.

    vQ> vQ

Waclaw.Marcin.Kusnierczyk at idi.ntnu.no

2009-Apr-23 13:05 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Martin Maechler wrote:>>>>>> "vQ" == Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>>     on Thu, 23 Apr 2009 11:49:54 +0200 writes:
>>>>>>             
>
>     vQ> maechler at stat.math.ethz.ch wrote:
>     >> 
>     vQ> sprintf has a documented limit on strings included in the output
using the
>     vQ> format '%s'.  It appears that there is a limit on the
length of strings included
>     vQ> with, e.g., the format '%d' beyond which surprising
things happen (output
>     vQ> modified for conciseness):
>     >> 
>
>     vQ> ... and this limit is *not* documented.
>
> well, it is basically (+ a few bytes ?)
> the same  8192  limit that *is* documented.
>   
martin, ?sprintf says:

" There is a limit of 8192 bytes on elements of 'fmt' and also on
     strings included by a '%s' conversion specification."

for me, it's clear that the *elements of fmt* cannot be longer than 8192
bytes, and that each single bit included in the output in place of a %s
cannot be longer than 8192.  nowhere does it say that strings included
in the output in place of a %d, for example, cannot be longer than
8192.  the fact that %s is particularized makes me infer that there is
something specific to %s that does not apply to %d, for example,
otherwise the help would have been formulated differently.  (though
given how r help pages are written, nothing seems unlikely.)

and in fact, the limit does not seem to apply in an obvious way in cases
such as sprintf('%*d', 10000, 1), where the output is correct.  at the
very least, the documentation leaves the user ignorant as to what will
happen if the limit is exceeded.
>     >> my version of  'man 3 sprintf' contains
>     >> 
>     >> 
>     >>>> BUGS
>     >>>> Because sprintf() and vsprintf() assume an arbitrarily
>     >>>> long string, callers must be careful not to overflow
the
>     >>>> actual space; this is often impossible to assure. Note
>     >>>> that the length of the strings produced is
>     >>>> locale-dependent and difficult to predict.  Use
>     >>>> snprintf() and vsnprintf() instead (or asprintf() and
vasprintf).
>
>     vQ> yes, but this is c documentation, not r documentation.
>
> Of course! ...  and I  *do*  apply it to R's C code [sprintf.c]
> and hence am even concurring with you .. 
>
>
>     vQ> while snprintf would help avoid buffer overflow, it may not be a
>     vQ> solution to the issue of confused output.
>
> I think it would / will.  We would be able to give warnings and
> errors, by checking the  snprintf()  return codes.
>   
maybe, i can't judge without carefully examining the code for sprintf.c
(which i am rather unwilling to do, having had a look at a sample).
>
>     >> More precisely, I see that some windows-only code relies on
>     >> snprintf() being available  whereas in at least on non-Windows
>     >> section, I read   /* we cannot assume snprintf here */
>     >> 
>     >> Now such platform dependency issues and corresponding
configure
>     >> settings I do typically leave to other R-corers with a much
>     >> wider overview about platforms and their compilers and C
libraries.
>     >> 
>
>     vQ> it looks like src/main/sprintf.c is just buggy, and it's
plausible that
>     vQ> the bug could be repaired in a platform-independent manner.
>
> definitely.
> In the mean time, I've actually found that what I first said on
> the usability of snprintf() in R's code base was only partly correct.
> There are other parts of R code where we use  snprintf() for all
> platforms, hence we rely on its presence (and correct
> implementation!) and so we can and I think should use it in
> place of sprintf() in quite a few places inside R's sprintf.c
>
>   
would be interesting to see how this improves sprintf.
>     >> BTW,  
>     >> 1) sprintf("%n %g", 1,1)   also seg.faults
>     >> 
>
>     vQ> as do
>
>     vQ> sprintf('%n%g', 1, 1)
>     vQ> sprintf('%n%')
>
>     vQ> etc., while
>
>     vQ> sprintf('%q%g', 1, 1)
>     vQ> sprintf('%q%')
>   
>     vQ> work just fine.  strange, because per ?sprintf 'n' is
not recognized as
>     vQ> a format specifier, so the output from the first two above
should be as
>     vQ> from the last two above, respectively.  (and likewise in the %S
case,
>     vQ> discussed and bug-reported earlier.)
>
> I have now fixed these bugs at least;
>   
great, i'm going to torture the fix soon ;)
> the more subtle  "%<too_large_n>d" ones are different, and
> as I said, I'm convinced that a nice & clean fix for those will
> start using snprintf().
>
>     >> 2) Did you have a true use case where  the  8192  limit was an
>     >> undesirable limit?
>
>     vQ> how does it matter?  
>
> well, we could increase it, if it did matter.
>       {{ you *could* have been more polite here, no?
>   
i don't see how i could be more polite here, i had absolutely no
intention to be impolite and didn't think i were. 

i gave a serious answer by means of a serious question.  increasing an
arbitrary, poorly documented limit of obscure effect is hardly any
solution.  suggesting that a bug is not a bug because some limit is not
likely to be exceeded in practice is not a particularly good idea.

>       	 it *was* after all a serious question that I asked! }}
>
>     vQ>  if you set a limit, be sure to consistently enforce
>     vQ> it and warn the user on attempts to exceed it.  or write clearly
in the
>     vQ> docs that such attempts will cause the output to be silently
truncated.
>
> Sure, I'm not at all disagreeing on that, and if you read this into my
> posting, you misunderstand.
>   
no, i didn't read that into your posting, i'm just referring to the
state of the 'art' in r.

cheers,
vQ

maechler at stat.math.ethz.ch

2009-Apr-23 13:15 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

>>>>> "vQ" == Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>     on Thu, 23 Apr 2009 15:00:29 +0200 writes:
    vQ> Martin Maechler wrote:
    >>>>>>> "vQ" == Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
    >>>>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes:


[......................]
[......................]

    >> >> BTW,  
    >> >> 1) sprintf("%n %g", 1,1)   also seg.faults
    >> >> 
    >> 
    vQ> as do
    >> 
    vQ> sprintf('%n%g', 1, 1)
    vQ> sprintf('%n%')
    >> 
    vQ> etc., while
    >> 
    vQ> sprintf('%q%g', 1, 1)
    vQ> sprintf('%q%')
    >> 
    vQ> work just fine.  strange, because per ?sprintf 'n' is not
recognized as
    vQ> a format specifier, so the output from the first two above should be
as
    vQ> from the last two above, respectively.  (and likewise in the %S case,
    vQ> discussed and bug-reported earlier.)
    >> 
    >> I have now fixed these bugs at least;
    >> 

    vQ> great, i'm going to torture the fix soon ;)

there will be another one, still today, fixing the

      sprintf("%s", tryCatch(stop(), error=identity))

bug {which actually *is* a subtle, too}

    >> the more subtle  "%<too_large_n>d" ones are
different, and
    >> as I said, I'm convinced that a nice & clean fix for those
will
    >> start using snprintf().
    >> 
    >> >> 2) Did you have a true use case where  the  8192  limit
was an
    >> >> undesirable limit?
    >> 
    vQ> how does it matter?  
    >> 
    >> well, we could increase it, if it did matter.
    >> {{ you *could* have been more polite here, no?
    >> 

    vQ> i don't see how i could be more polite here, i had absolutely no
    vQ> intention to be impolite and didn't think i were. 

    vQ> i gave a serious answer by means of a serious question.  increasing
an
    vQ> arbitrary, poorly documented limit of obscure effect is hardly any
    vQ> solution.  suggesting that a bug is not a bug because some limit is
not
    vQ> likely to be exceeded in practice is not a particularly good idea.

But that's exactly what I did  NOT  suggest!!

It was a serious question, *related* to your bug report, but
*NOT* really on the bug report proper.
[It *was* under the "heading" of 'BTW' which I assumed you
knew
 to interpret]
I was seriously asking, if BTW, the limit which is there was
possibly to be increased or not...


    >> it *was* after all a serious question that I asked! }}
    >> 
    vQ> if you set a limit, be sure to consistently enforce
    vQ> it and warn the user on attempts to exceed it.  or write clearly in
the
    vQ> docs that such attempts will cause the output to be silently
truncated.
    >> 
    >> Sure, I'm not at all disagreeing on that, and if you read this
into my
    >> posting, you misunderstand.
    >> 

    vQ> no, i didn't read that into your posting, i'm just referring
to the
    vQ> state of the 'art' in r.

[not so funny...  yet another "very polite" assumption.]

    vQ> cheers,
    vQ> vQ

maechler at stat.math.ethz.ch

2009-Apr-24 10:45 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

>>>>> "MM" == Martin Maechler <maechler at
stat.math.ethz.ch>
>>>>>     on Thu, 23 Apr 2009 12:40:22 +0200 (CEST) writes:
>>>>> "vQ" == Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>     on Thu, 23 Apr 2009 11:49:54 +0200 writes:
    vQ> maechler at stat.math.ethz.ch wrote:
    >>> 
    vQ> sprintf has a documented limit on strings included in the output
using the
    vQ> format '%s'.  It appears that there is a limit on the length
of strings included
    vQ> with, e.g., the format '%d' beyond which surprising things
happen (output
    vQ> modified for conciseness):
    >>> 

    vQ> ... and this limit is *not* documented.

    MM> well, it is basically (+ a few bytes ?)
    MM> the same  8192  limit that *is* documented.

indeed, I was right with that..

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801
    >>> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602
    >>> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403
    >>> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403 12204
    >>> 
    vQ> ...
    >>> 

    vQ> Note that not only more than one '1' is included in the
    vQ> output, but also that the same functional expression (no
    vQ> side effects used beyond the interface) gives different
    vQ> results on each execution.  Analogous behaviour can be
    vQ> observed with '%nd' where n > 8200.

    vQ> The actual output above is consistent across separate sessions.
    >>> 
    vQ> With sufficiently large field width values, R segfaults:
    >>> 
    vQ> sprintf('%*d', 10^5, 1)
    vQ> # *** caught segfault ***
    vQ> # address 0xbfcfc000, cause 'memory not mapped'
    vQ> # Segmentation fault
    >>> 
    >>> 
    >>> Thank you, Wacek.
    >>> That's all ``interesting''  ... unfortunately, 
    >>> 
    >>> my version of  'man 3 sprintf' contains
    >>> 
    >>> 
    >>>>> BUGS
    >>>>> Because sprintf() and vsprintf() assume an arbitrarily
    >>>>> long string, callers must be careful not to overflow
the
    >>>>> actual space; this is often impossible to assure. Note
    >>>>> that the length of the strings produced is
    >>>>> locale-dependent and difficult to predict.  Use
    >>>>> snprintf() and vsnprintf() instead (or asprintf() and
vasprintf).

    vQ> yes, but this is c documentation, not r documentation.

    MM> Of course! ...  and I  *do*  apply it to R's C code [sprintf.c]
    MM> and hence am even concurring with you .. 

    vQ> it's applicable
    vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper
for the
    vQ> C function 'sprintf'".  however, in c you use a buffer
and you usually
    vQ> have control over it's capacity, while in r this is a hidden
    vQ> implementational detail, which should not be visible to the user, or
    vQ> should cause an attempt to overflow the buffer to fail more
gracefully
    vQ> than with a segfault.

    vQ> in r, sprintf('%9000d', 1) will produce a confused output
with a count
    vQ> of 1's variable (!) across runs (while sprintf('%*d',
9000, 1) seems to
    vQ> do fine):

    vQ> gregexpr('1', sprintf('%*d', 9000, 1))
    vQ> # [1] 9000

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801 ..., variable across executions

    vQ> on one execution in a series i actually got this:

    vQ> Warning message:
    vQ> In gregexpr("1", sprintf("%9000d", 1)) :
    vQ> input string 1 is invalid in this locale

    vQ> while the very next execution, still in the same session, gave

    vQ> # [1]  9000  9801 10602

    vQ> with sprintf('%*d', 10000, 1) i got segfaults on some
executions but
    vQ> correct output on others, while sprintf('%10000d', 1) is
confused again.


    >>> (note the "impossible" part above)       
    >>> 

    vQ> yes, but it does also say "must be careful", and it seems
that someone
    vQ> has not been careful enough.

    >>> and we haven't used  snprintf() yet, probably because it
    >>> requires the  C99 C standard, and AFAIK, we have only
relatively
    >>> recently started to more or less rely on C99 in the R sources.
    >>> 

    vQ> while snprintf would help avoid buffer overflow, it may not be a
    vQ> solution to the issue of confused output.

    MM> I think it would / will.  We would be able to give warnings and
    MM> errors, by checking the  snprintf()  return codes.

My current working code gives an error for all the above
examples, e.g.,

 > sprintf('%9999d', 1)
 Error in sprintf("%9999d", 1) : 
   required resulting string length 9999 is > maximal 8191

it passes  'make check-devel' and I am inclined to commit that
code to R-devel (e.g. tomorrow). 

Yes, the documentation will also have to be amended, but apart
from that, would people see a big problem with the "8192" limit
which now is suddenly of greater importance
{{as I said all along;  hence my question to Wacek (and the
  R-develers)  if anybody found that limit too low}}

Of course, one could realloc()ate longer strings when needed
inside R's  sprintf() code,  but I reluctant to do that (render
the code yet more complicated ...) if nobody sees a need.

Martin

    >>> More precisely, I see that some windows-only code relies on
    >>> snprintf() being available  whereas in at least on non-Windows
    >>> section, I read   /* we cannot assume snprintf here */
    >>> 
    >>> Now such platform dependency issues and corresponding configure
    >>> settings I do typically leave to other R-corers with a much
    >>> wider overview about platforms and their compilers and C
libraries.
    >>> 

    vQ> it looks like src/main/sprintf.c is just buggy, and it's
plausible that
    vQ> the bug could be repaired in a platform-independent manner.

    MM> definitely.
    MM> In the mean time, I've actually found that what I first said on
    MM> the usability of snprintf() in R's code base was only partly
correct.
    MM> There are other parts of R code where we use  snprintf() for all
    MM> platforms, hence we rely on its presence (and correct
    MM> implementation!) and so we can and I think should use it in
    MM> place of sprintf() in quite a few places inside R's sprintf.c


    >>> BTW,  
    >>> 1) sprintf("%n %g", 1,1)   also seg.faults
    >>> 

    vQ> as do

    vQ> sprintf('%n%g', 1, 1)
    vQ> sprintf('%n%')

    vQ> etc., while

    vQ> sprintf('%q%g', 1, 1)
    vQ> sprintf('%q%')
  
    vQ> work just fine.  strange, because per ?sprintf 'n' is not
recognized as
    vQ> a format specifier, so the output from the first two above should be
as
    vQ> from the last two above, respectively.  (and likewise in the %S case,
    vQ> discussed and bug-reported earlier.)

    MM> I have now fixed these bugs at least;
    MM> the more subtle  "%<too_large_n>d" ones are
different, and
    MM> as I said, I'm convinced that a nice & clean fix for those
will
    MM> start using snprintf().

    >>> 2) Did you have a true use case where  the  8192  limit was an
    >>> undesirable limit?

    vQ> how does it matter?  

    MM> well, we could increase it, if it did matter.
    MM> {{ you *could* have been more polite here, no?
    MM> it *was* after all a serious question that I asked! }}

    vQ> if you set a limit, be sure to consistently enforce
    vQ> it and warn the user on attempts to exceed it.  or write clearly in
the
    vQ> docs that such attempts will cause the output to be silently
truncated.

    MM> Sure, I'm not at all disagreeing on that, and if you read this
into my
    MM> posting, you misunderstand.

    MM> Martin

    [..........................]

Waclaw.Marcin.Kusnierczyk at idi.ntnu.no

2009-Apr-25 17:40 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Martin Maechler wrote:>
>>>     MM> well, it is basically (+ a few bytes ?)
>>>     MM> the same  8192  limit that *is* documented.
>>>
>>> indeed, I was right with that..
>>>
>>>       
>> hmm, i'd guess this limit is valid for all strings included in the
>> output with any format?  not just %s (and, as it appears,
undocumentedly
>> %d)?
>>     
>
> yes.
>   
so it's perhaps easiest to change

" There is a limit of 8192 bytes on elements of 'fmt' and also on
     strings included by a '%s' conversion specification."

to sth like

" There is a limit of 8192 bytes on elements of 'fmt' and also on
     strings included in the output by any conversion specification."

it's in fact so easy that even i should be able to do it.

[1 minute later...]  i see you've fixed this one, too.

>> btw. (i do know what that means ;)), after your recent fix:
>>
>>    sprintf('%q%s', 1)
>>    # Error in sprintf("%q%s", 1) :
>>    #  use format %f, %e, %g or %a for numeric objects
>>
>>    sprintf('%s', 1)
>>    # [1] "1"
>>
>> you may want to add '%s' (and '%x', and ...) to the
error message.  or
>> perhaps make it say sth like 'invalid format: ...'.  the
problem is not
>> that %q is not applicable to numeric, but that it is not a valid format
>> at all.
>>     
>
> yes.  As a matter of fact,  "%q%s" is dealt with as *one* format
> chunk, since "%q" is
> not a valid format.
> The code I have just committed now gives a longer erro message, that
> should be more helpful.
>   
yes, but

    sptinf('%q%s', 1)


still suggests that one uses %{f,e,g,a} for numerics, while %s is pretty
much valid, too.  you see, in c sprintf(buffer, "%s", 1) is destined
to
cause a segfault, but in r it works -- so the error message is slightly
misleading, as it suggests %s is *not* valid for numerics.
> Thank you for the suggestion!
>   
yo welcum
>   
>> there's also an issue with the additional arguments supplied after
the
>> format:  any superfluous arguments are ignored (this is not documented,
>> as far as i can see),
>>     
>
> I think we should signal an error if there are too many arguments.
>   
agree.  but it might be more complex than it appears:

    sprintf('%3$s', 1, 2, 3)

should *not* complain about too many args, despite just one conversion
spec in the format.  interestingly,

    sprintf('%3$s', , , 3)
    # error: argument is missing, with no default

> Could anyone think of a case where the current behavior is desirable ?
>   
well, one scenario might be that one wants to print a collection of
items with an arbitrary format, supplied by the users, e.g.

    foo = function(fmt) {
       a = ...
       b = ...
       ...
       s = sprintf(fmt, a, b, ...)
       ... }

without having to examine the format to establish which values are
needed.  in the current state, sprintf would use those it would need to
use with a particular format, with no undesirable complaints.

>   
>> but they *are* evaluated nevertheless, e.g.:
>>
>>    sprintf('%d', 0, {print(1)})
>>    # "1"
>>    # [1] "0"
>>
>> it might be a good idea to document this behaviour.
>>     
>
> actually I think it should be changed to be more strict (see above).
>   
strict in which sense?  enforce a constraint on the number of arguments
to that needed by a specific format?  or do you mean evaluation of only
those arguments that are needed in a format?  or both?

what about:

    sprintf('%2$s', {print(1)}, 2)
    # too many arguments?
    # should 1 be printed?

    sprintf('%2$s', , 2)
    # too few arguments?
    # missing value?  (yes, sprintf is .Internal, but...)

> Thank you for the constructive feedback!
>   
not much to thank for...  certainly, it's the first time my feedback is
called 'constructive'.  i'm making progress, am i not?

btw., thank you for the fixes.  i appreciate your efforts a lot.

best,
vQ

maechler at stat.math.ethz.ch

2009-Apr-27 14:50 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

>>>>> "vQ" == Waclaw Marcin Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>     on Sat, 25 Apr 2009 19:40:27 +0200 (CEST) writes:
    vQ> Martin Maechler wrote:
    >> 
    MM> well, it is basically (+ a few bytes ?)  the same 8192
    MM> limit that *is* documented.
    >>>> 
    >>>> indeed, I was right with that..
    >>>> 
    >>>> 
    >>> hmm, i'd guess this limit is valid for all strings
    >>> included in the output with any format?  not just %s
    >>> (and, as it appears, undocumentedly %d)?
    >>> 
    >> 


    >>> btw. (i do know what that means ;)), after your recent
    >>> fix:
    >>> 
    >>> sprintf('%q%s', 1) # Error in sprintf("%q%s",
1) : # use
    >>> format %f, %e, %g or %a for numeric objects
    >>> 
    >>> sprintf('%s', 1) # [1] "1"
    >>> 
    >>> you may want to add '%s' (and '%x', and ...) to
the
    >>> error message.  or perhaps make it say sth like 'invalid
    >>> format: ...'.  the problem is not that %q is not
    >>> applicable to numeric, but that it is not a valid format
    >>> at all.
    >>> 
    >> > yes.  As a matter of fact, "%q%s" is dealt with as *one*    >> format chunk, since "%q" is not a valid format.  The code
    >> I have just committed now gives a longer erro message,
    >> that should be more helpful.
    >> 

    vQ> yes, but

    vQ>     sptinf('%q%s', 1)


    vQ> still suggests that one uses %{f,e,g,a} for numerics,
    vQ> while %s is pretty much valid, too.  you see, in c
    vQ> sprintf(buffer, "%s", 1) is destined to cause a
    vQ> segfault, but in r it works -- so the error message is
    vQ> slightly misleading, as it suggests %s is *not* valid
    vQ> for numerics.

yes, but only the error message somewhat suggests that;
at the moment, I'd like to keep it, since really the user
*should* think of using the number formats, rather than %s
{which just calls  as.character(.)}  for numeric arguments


    >> Thank you for the suggestion!
    >> 

  vQ> yo welcum

    >> 
    >>> there's also an issue with the additional arguments
    >>> supplied after the format: any superfluous arguments are
    >>> ignored (this is not documented, as far as i can see),
    >>> 
    >> 

  MM> I think we should signal an error if there are too many arguments.

  vQ> agree.  but it might be more complex than it appears:

    vQ>     sprintf('%3$s', 1, 2, 3)

    vQ> should *not* complain about too many args, despite just
    vQ> one conversion spec in the format.  

very good point; thanks!

    vQ> Interestingly,

    vQ>     sprintf('%3$s', , , 3) # error: argument is missing,
    vQ> with no default

yes, empty (aka "missing") arguments are not allowed in  sprintf().

    >> Could anyone think of a case where the current behavior
    >> is desirable ?
    >> 

    vQ> well, one scenario might be that one wants to print a collection of
    vQ> items with an arbitrary format, supplied by the users,
    vQ> e.g.

    vQ>     foo = function(fmt) { a = ...  b = ...  ...  s     vQ>
sprintf(fmt, a, b, ...)  ... }

    vQ> without having to examine the format to establish which
    vQ> values are needed.  in the current state, sprintf would
    vQ> use those it would need to use with a particular format,
    vQ> with no undesirable complaints.

ok. you have given good examples which make me revert my
proposal, i.e. continue to not erroring about "too many" arguments. 

    >>> but they *are* evaluated nevertheless, e.g.:
    >>> 
    >>> sprintf('%d', 0, {print(1)}) # "1" # [1]
"0"
    >>> 
    >>> it might be a good idea to document this behaviour.

   MM> actually I think it should be changed to be more strict
   MM> (see above).

as a matter of fact, and the result of many more examples,
I've changed my oppinion and now agree with your original
proposal:

I've just commmited another sprintf() patch which (among more
more important changes) *documents* that all arguments of
sprintf() are evaluated; this actually already entails that
empty / missing arguments are not allowed.

    vq> strict in which sense?  enforce a constraint on the
    vQ> number of arguments to that needed by a specific format?
    vQ> or do you mean evaluation of only those arguments that
    vQ> are needed in a format?  or both?

    vQ> what about:

    vQ>     sprintf('%2$s', {print(1)}, 2) # too many arguments?
    vQ> # should 1 be printed?

    vQ>     sprintf('%2$s', , 2) # too few arguments?  # missing
    vQ> value?  (yes, sprintf is .Internal, but...)


    >> Thank you for the constructive feedback!
    >> 

    vQ> not much to thank for...  certainly, it's the first time my
feedback is
    vQ> called 'constructive'.  i'm making progress, am i not?

indeed!  :-)

Martin Maechler, ETH Zurich

    vQ> btw., thank you for the fixes.  i appreciate your
    vQ> efforts a lot.

    vQ> best, vQ

Waclaw.Marcin.Kusnierczyk at idi.ntnu.no

2009-Apr-27 19:15 UTC

head link

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Martin Maechler wrote:>
>     vQ>     sptinf('%q%s', 1)
>
>
>     vQ> still suggests that one uses %{f,e,g,a} for numerics,
>     vQ> while %s is pretty much valid, too.  you see, in c
>     vQ> sprintf(buffer, "%s", 1) is destined to cause a
>     vQ> segfault, but in r it works -- so the error message is
>     vQ> slightly misleading, as it suggests %s is *not* valid
>     vQ> for numerics.
>
> yes, but only the error message somewhat suggests that;
> at the moment, I'd like to keep it, since really the user
> *should* think of using the number formats, rather than %s
> {which just calls  as.character(.)}  for numeric arguments
>   
then maybe sprintf('%s', 1) should complain about a format-argument
mismatch?  in c, 1 would be taken to be an address at which a string
starts, but in r you do not have pointers, so this interpretation is
impossible.  if the user *should* use number formats for numerics, %s
should not work.  smells lack of design.

>   MM> I think we should signal an error if there are too many arguments.
>
>   vQ> agree.  but it might be more complex than it appears:
>
>     vQ>     sprintf('%3$s', 1, 2, 3)
>
>     vQ> should *not* complain about too many args, despite just
>     vQ> one conversion spec in the format.  
>
> very good point; thanks!
>
>     vQ> Interestingly,
>
>     vQ>     sprintf('%3$s', , , 3) # error: argument is missing,
>     vQ> with no default
>
> yes, empty (aka "missing") arguments are not allowed in 
sprintf().
>   
be nice and document such items, pliz....  even if all internal
functions have this behavour (do they?), ?sprintf does not say that
sprintf is internal, or that the r wrapper calls an internal so that all
arguments are necessarily evaluated.
>     >> Could anyone think of a case where the current behavior
>     >> is desirable ?
>     >> 
>
>     vQ> well, one scenario might be that one wants to print a collection
of
>     vQ> items with an arbitrary format, supplied by the users,
>     vQ> e.g.
>
>     vQ>     foo = function(fmt) { a = ...  b = ...  ...  s >    
vQ> sprintf(fmt, a, b, ...)  ... }
>
>     vQ> without having to examine the format to establish which
>     vQ> values are needed.  in the current state, sprintf would
>     vQ> use those it would need to use with a particular format,
>     vQ> with no undesirable complaints.
>
> ok. you have given good examples which make me revert my
> proposal, i.e. continue to not erroring about "too many"
arguments.
>   
i did not say i supported that view, however.  it was just an example
where *a* developer might wish sprintf did not complain about wrong
number of arguments.  examples to the opposite effect can easily be
given, but that's not what you asked about.

>     >>> but they *are* evaluated nevertheless, e.g.:
>     >>> 
>     >>> sprintf('%d', 0, {print(1)}) # "1" # [1]
"0"
>     >>> 
>     >>> it might be a good idea to document this behaviour.
>
>    MM> actually I think it should be changed to be more strict
>    MM> (see above).
>
> as a matter of fact, and the result of many more examples,
> I've changed my oppinion and now agree with your original
> proposal:
>
> I've just commmited another sprintf() patch which (among more
> more important changes) *documents* that all arguments of
> sprintf() are evaluated; this actually already entails that
> empty / missing arguments are not allowed.
>   
excellent, thanks.

vQ

Maybe Matching Threads

Search for more seemingly similar threads

R devel - Apr 2009 - incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Maybe Matching Threads