waku at idi.ntnu.no
2009-Apr-21 11:05 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
Full_Name: Wacek Kusnierczyk Version: 2.10.0 r48365 OS: Ubuntu 8.04 Linux 32bit Submission from: (NULL) (129.241.110.141) sprintf has a documented limit on strings included in the output using the format '%s'. It appears that there is a limit on the length of strings included with, e.g., the format '%d' beyond which surprising things happen (output modified for conciseness): gregexpr('1', sprintf('%9000d', 1)) # [1] 9000 9801 gregexpr('1', sprintf('%9000d', 1)) # [1] 9000 9801 10602 gregexpr('1', sprintf('%9000d', 1)) # [1] 9000 9801 10602 11403 gregexpr('1', sprintf('%9000d', 1)) # [1] 9000 9801 10602 11403 12204 ... Note that not only more than one '1' is included in the output, but also that the same functional expression (no side effects used beyond the interface) gives different results on each execution. Analogous behaviour can be observed with '%nd' where n > 8200. The actual output above is consistent across separate sessions. With sufficiently large field width values, R segfaults: sprintf('%*d', 10^5, 1) # *** caught segfault *** # address 0xbfcfc000, cause 'memory not mapped' # Segmentation fault sessionInfo() # R version 2.10.0 Under development (unstable) (2009-04-20 r48365) # i686-pc-linux-gnu
maechler at stat.math.ethz.ch
2009-Apr-22 14:10 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
>>>>> "vQ" == Wacek Kusnierczyk <waku at idi.ntnu.no> >>>>> on Tue, 21 Apr 2009 13:05:11 +0200 (CEST) writes:vQ> Full_Name: Wacek Kusnierczyk vQ> Version: 2.10.0 r48365 vQ> OS: Ubuntu 8.04 Linux 32bit vQ> Submission from: (NULL) (129.241.110.141) vQ> sprintf has a documented limit on strings included in the output using the vQ> format '%s'. It appears that there is a limit on the length of strings included vQ> with, e.g., the format '%d' beyond which surprising things happen (output vQ> modified for conciseness): vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 12204 vQ> ... vQ> Note that not only more than one '1' is included in the output, but also that vQ> the same functional expression (no side effects used beyond the interface) gives vQ> different results on each execution. Analogous behaviour can be observed with vQ> '%nd' where n > 8200. vQ> The actual output above is consistent across separate sessions. vQ> With sufficiently large field width values, R segfaults: vQ> sprintf('%*d', 10^5, 1) vQ> # *** caught segfault *** vQ> # address 0xbfcfc000, cause 'memory not mapped' vQ> # Segmentation fault Thank you, Wacek. That's all ``interesting'' ... unfortunately, my version of 'man 3 sprintf' contains>> BUGS >> Because sprintf() and vsprintf() assume an arbitrarily >> long string, callers must be careful not to overflow the >> actual space; this is often impossible to assure. Note >> that the length of the strings produced is >> locale-dependent and difficult to predict. Use >> snprintf() and vsnprintf() instead (or asprintf() and vasprintf).(note the "impossible" part above) and we haven't used snprintf() yet, probably because it requires the C99 C standard, and AFAIK, we have only relatively recently started to more or less rely on C99 in the R sources. More precisely, I see that some windows-only code relies on snprintf() being available whereas in at least on non-Windows section, I read /* we cannot assume snprintf here */ Now such platform dependency issues and corresponding configure settings I do typically leave to other R-corers with a much wider overview about platforms and their compilers and C libraries. BTW, 1) sprintf("%n %g", 1,1) also seg.faults 2) Did you have a true use case where the 8192 limit was an undesirable limit? Martin vQ> sessionInfo() vQ> # R version 2.10.0 Under development (unstable) (2009-04-20 r48365) vQ> # i686-pc-linux-gnu
maechler at stat.math.ethz.ch
2009-Apr-23 10:40 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes:vQ> maechler at stat.math.ethz.ch wrote: >> vQ> sprintf has a documented limit on strings included in the output using the vQ> format '%s'. It appears that there is a limit on the length of strings included vQ> with, e.g., the format '%d' beyond which surprising things happen (output vQ> modified for conciseness): >> vQ> ... and this limit is *not* documented. well, it is basically (+ a few bytes ?) the same 8192 limit that *is* documented. vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 >> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 >> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 >> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 12204 >> vQ> ... >> vQ> Note that not only more than one '1' is included in the output, but also that vQ> the same functional expression (no side effects used beyond the interface) gives vQ> different results on each execution. Analogous behaviour can be observed with vQ> '%nd' where n > 8200. >> vQ> The actual output above is consistent across separate sessions. >> vQ> With sufficiently large field width values, R segfaults: >> vQ> sprintf('%*d', 10^5, 1) vQ> # *** caught segfault *** vQ> # address 0xbfcfc000, cause 'memory not mapped' vQ> # Segmentation fault >> >> >> Thank you, Wacek. >> That's all ``interesting'' ... unfortunately, >> >> my version of 'man 3 sprintf' contains >> >> >>>> BUGS >>>> Because sprintf() and vsprintf() assume an arbitrarily >>>> long string, callers must be careful not to overflow the >>>> actual space; this is often impossible to assure. Note >>>> that the length of the strings produced is >>>> locale-dependent and difficult to predict. Use >>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf). vQ> yes, but this is c documentation, not r documentation. Of course! ... and I *do* apply it to R's C code [sprintf.c] and hence am even concurring with you .. vQ> it's applicable vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper for the vQ> C function 'sprintf'". however, in c you use a buffer and you usually vQ> have control over it's capacity, while in r this is a hidden vQ> implementational detail, which should not be visible to the user, or vQ> should cause an attempt to overflow the buffer to fail more gracefully vQ> than with a segfault. vQ> in r, sprintf('%9000d', 1) will produce a confused output with a count vQ> of 1's variable (!) across runs (while sprintf('%*d', 9000, 1) seems to vQ> do fine): vQ> gregexpr('1', sprintf('%*d', 9000, 1)) vQ> # [1] 9000 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 ..., variable across executions vQ> on one execution in a series i actually got this: vQ> Warning message: vQ> In gregexpr("1", sprintf("%9000d", 1)) : vQ> input string 1 is invalid in this locale vQ> while the very next execution, still in the same session, gave vQ> # [1] 9000 9801 10602 vQ> with sprintf('%*d', 10000, 1) i got segfaults on some executions but vQ> correct output on others, while sprintf('%10000d', 1) is confused again. >> (note the "impossible" part above) >> vQ> yes, but it does also say "must be careful", and it seems that someone vQ> has not been careful enough. >> and we haven't used snprintf() yet, probably because it >> requires the C99 C standard, and AFAIK, we have only relatively >> recently started to more or less rely on C99 in the R sources. >> vQ> while snprintf would help avoid buffer overflow, it may not be a vQ> solution to the issue of confused output. I think it would / will. We would be able to give warnings and errors, by checking the snprintf() return codes. >> More precisely, I see that some windows-only code relies on >> snprintf() being available whereas in at least on non-Windows >> section, I read /* we cannot assume snprintf here */ >> >> Now such platform dependency issues and corresponding configure >> settings I do typically leave to other R-corers with a much >> wider overview about platforms and their compilers and C libraries. >> vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that vQ> the bug could be repaired in a platform-independent manner. definitely. In the mean time, I've actually found that what I first said on the usability of snprintf() in R's code base was only partly correct. There are other parts of R code where we use snprintf() for all platforms, hence we rely on its presence (and correct implementation!) and so we can and I think should use it in place of sprintf() in quite a few places inside R's sprintf.c >> BTW, >> 1) sprintf("%n %g", 1,1) also seg.faults >> vQ> as do vQ> sprintf('%n%g', 1, 1) vQ> sprintf('%n%') vQ> etc., while vQ> sprintf('%q%g', 1, 1) vQ> sprintf('%q%') vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as vQ> a format specifier, so the output from the first two above should be as vQ> from the last two above, respectively. (and likewise in the %S case, vQ> discussed and bug-reported earlier.) I have now fixed these bugs at least; the more subtle "%<too_large_n>d" ones are different, and as I said, I'm convinced that a nice & clean fix for those will start using snprintf(). >> 2) Did you have a true use case where the 8192 limit was an >> undesirable limit? vQ> how does it matter? well, we could increase it, if it did matter. {{ you *could* have been more polite here, no? it *was* after all a serious question that I asked! }} vQ> if you set a limit, be sure to consistently enforce vQ> it and warn the user on attempts to exceed it. or write clearly in the vQ> docs that such attempts will cause the output to be silently truncated. Sure, I'm not at all disagreeing on that, and if you read this into my posting, you misunderstand. Martin vQ> examples such as vQ> sprintf('%9000d', 1) vQ> do not contribute to the reliability of r, and neither to the user's vQ> confidence in it. vQ> vQ
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
2009-Apr-23 13:05 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
Martin Maechler wrote:>>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes: >>>>>> > > vQ> maechler at stat.math.ethz.ch wrote: > >> > vQ> sprintf has a documented limit on strings included in the output using the > vQ> format '%s'. It appears that there is a limit on the length of strings included > vQ> with, e.g., the format '%d' beyond which surprising things happen (output > vQ> modified for conciseness): > >> > > vQ> ... and this limit is *not* documented. > > well, it is basically (+ a few bytes ?) > the same 8192 limit that *is* documented. >martin, ?sprintf says: " There is a limit of 8192 bytes on elements of 'fmt' and also on strings included by a '%s' conversion specification." for me, it's clear that the *elements of fmt* cannot be longer than 8192 bytes, and that each single bit included in the output in place of a %s cannot be longer than 8192. nowhere does it say that strings included in the output in place of a %d, for example, cannot be longer than 8192. the fact that %s is particularized makes me infer that there is something specific to %s that does not apply to %d, for example, otherwise the help would have been formulated differently. (though given how r help pages are written, nothing seems unlikely.) and in fact, the limit does not seem to apply in an obvious way in cases such as sprintf('%*d', 10000, 1), where the output is correct. at the very least, the documentation leaves the user ignorant as to what will happen if the limit is exceeded.> >> my version of 'man 3 sprintf' contains > >> > >> > >>>> BUGS > >>>> Because sprintf() and vsprintf() assume an arbitrarily > >>>> long string, callers must be careful not to overflow the > >>>> actual space; this is often impossible to assure. Note > >>>> that the length of the strings produced is > >>>> locale-dependent and difficult to predict. Use > >>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf). > > vQ> yes, but this is c documentation, not r documentation. > > Of course! ... and I *do* apply it to R's C code [sprintf.c] > and hence am even concurring with you .. > > > vQ> while snprintf would help avoid buffer overflow, it may not be a > vQ> solution to the issue of confused output. > > I think it would / will. We would be able to give warnings and > errors, by checking the snprintf() return codes. >maybe, i can't judge without carefully examining the code for sprintf.c (which i am rather unwilling to do, having had a look at a sample).> > >> More precisely, I see that some windows-only code relies on > >> snprintf() being available whereas in at least on non-Windows > >> section, I read /* we cannot assume snprintf here */ > >> > >> Now such platform dependency issues and corresponding configure > >> settings I do typically leave to other R-corers with a much > >> wider overview about platforms and their compilers and C libraries. > >> > > vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that > vQ> the bug could be repaired in a platform-independent manner. > > definitely. > In the mean time, I've actually found that what I first said on > the usability of snprintf() in R's code base was only partly correct. > There are other parts of R code where we use snprintf() for all > platforms, hence we rely on its presence (and correct > implementation!) and so we can and I think should use it in > place of sprintf() in quite a few places inside R's sprintf.c > >would be interesting to see how this improves sprintf.> >> BTW, > >> 1) sprintf("%n %g", 1,1) also seg.faults > >> > > vQ> as do > > vQ> sprintf('%n%g', 1, 1) > vQ> sprintf('%n%') > > vQ> etc., while > > vQ> sprintf('%q%g', 1, 1) > vQ> sprintf('%q%') > > vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as > vQ> a format specifier, so the output from the first two above should be as > vQ> from the last two above, respectively. (and likewise in the %S case, > vQ> discussed and bug-reported earlier.) > > I have now fixed these bugs at least; >great, i'm going to torture the fix soon ;)> the more subtle "%<too_large_n>d" ones are different, and > as I said, I'm convinced that a nice & clean fix for those will > start using snprintf(). > > >> 2) Did you have a true use case where the 8192 limit was an > >> undesirable limit? > > vQ> how does it matter? > > well, we could increase it, if it did matter. > {{ you *could* have been more polite here, no? >i don't see how i could be more polite here, i had absolutely no intention to be impolite and didn't think i were. i gave a serious answer by means of a serious question. increasing an arbitrary, poorly documented limit of obscure effect is hardly any solution. suggesting that a bug is not a bug because some limit is not likely to be exceeded in practice is not a particularly good idea.> it *was* after all a serious question that I asked! }} > > vQ> if you set a limit, be sure to consistently enforce > vQ> it and warn the user on attempts to exceed it. or write clearly in the > vQ> docs that such attempts will cause the output to be silently truncated. > > Sure, I'm not at all disagreeing on that, and if you read this into my > posting, you misunderstand. >no, i didn't read that into your posting, i'm just referring to the state of the 'art' in r. cheers, vQ
maechler at stat.math.ethz.ch
2009-Apr-23 13:15 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>> on Thu, 23 Apr 2009 15:00:29 +0200 writes:vQ> Martin Maechler wrote: >>>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes: [......................] [......................] >> >> BTW, >> >> 1) sprintf("%n %g", 1,1) also seg.faults >> >> >> vQ> as do >> vQ> sprintf('%n%g', 1, 1) vQ> sprintf('%n%') >> vQ> etc., while >> vQ> sprintf('%q%g', 1, 1) vQ> sprintf('%q%') >> vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as vQ> a format specifier, so the output from the first two above should be as vQ> from the last two above, respectively. (and likewise in the %S case, vQ> discussed and bug-reported earlier.) >> >> I have now fixed these bugs at least; >> vQ> great, i'm going to torture the fix soon ;) there will be another one, still today, fixing the sprintf("%s", tryCatch(stop(), error=identity)) bug {which actually *is* a subtle, too} >> the more subtle "%<too_large_n>d" ones are different, and >> as I said, I'm convinced that a nice & clean fix for those will >> start using snprintf(). >> >> >> 2) Did you have a true use case where the 8192 limit was an >> >> undesirable limit? >> vQ> how does it matter? >> >> well, we could increase it, if it did matter. >> {{ you *could* have been more polite here, no? >> vQ> i don't see how i could be more polite here, i had absolutely no vQ> intention to be impolite and didn't think i were. vQ> i gave a serious answer by means of a serious question. increasing an vQ> arbitrary, poorly documented limit of obscure effect is hardly any vQ> solution. suggesting that a bug is not a bug because some limit is not vQ> likely to be exceeded in practice is not a particularly good idea. But that's exactly what I did NOT suggest!! It was a serious question, *related* to your bug report, but *NOT* really on the bug report proper. [It *was* under the "heading" of 'BTW' which I assumed you knew to interpret] I was seriously asking, if BTW, the limit which is there was possibly to be increased or not... >> it *was* after all a serious question that I asked! }} >> vQ> if you set a limit, be sure to consistently enforce vQ> it and warn the user on attempts to exceed it. or write clearly in the vQ> docs that such attempts will cause the output to be silently truncated. >> >> Sure, I'm not at all disagreeing on that, and if you read this into my >> posting, you misunderstand. >> vQ> no, i didn't read that into your posting, i'm just referring to the vQ> state of the 'art' in r. [not so funny... yet another "very polite" assumption.] vQ> cheers, vQ> vQ
maechler at stat.math.ethz.ch
2009-Apr-24 10:45 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch> >>>>> on Thu, 23 Apr 2009 12:40:22 +0200 (CEST) writes:>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes:vQ> maechler at stat.math.ethz.ch wrote: >>> vQ> sprintf has a documented limit on strings included in the output using the vQ> format '%s'. It appears that there is a limit on the length of strings included vQ> with, e.g., the format '%d' beyond which surprising things happen (output vQ> modified for conciseness): >>> vQ> ... and this limit is *not* documented. MM> well, it is basically (+ a few bytes ?) MM> the same 8192 limit that *is* documented. indeed, I was right with that.. vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 12204 >>> vQ> ... >>> vQ> Note that not only more than one '1' is included in the vQ> output, but also that the same functional expression (no vQ> side effects used beyond the interface) gives different vQ> results on each execution. Analogous behaviour can be vQ> observed with '%nd' where n > 8200. vQ> The actual output above is consistent across separate sessions. >>> vQ> With sufficiently large field width values, R segfaults: >>> vQ> sprintf('%*d', 10^5, 1) vQ> # *** caught segfault *** vQ> # address 0xbfcfc000, cause 'memory not mapped' vQ> # Segmentation fault >>> >>> >>> Thank you, Wacek. >>> That's all ``interesting'' ... unfortunately, >>> >>> my version of 'man 3 sprintf' contains >>> >>> >>>>> BUGS >>>>> Because sprintf() and vsprintf() assume an arbitrarily >>>>> long string, callers must be careful not to overflow the >>>>> actual space; this is often impossible to assure. Note >>>>> that the length of the strings produced is >>>>> locale-dependent and difficult to predict. Use >>>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf). vQ> yes, but this is c documentation, not r documentation. MM> Of course! ... and I *do* apply it to R's C code [sprintf.c] MM> and hence am even concurring with you .. vQ> it's applicable vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper for the vQ> C function 'sprintf'". however, in c you use a buffer and you usually vQ> have control over it's capacity, while in r this is a hidden vQ> implementational detail, which should not be visible to the user, or vQ> should cause an attempt to overflow the buffer to fail more gracefully vQ> than with a segfault. vQ> in r, sprintf('%9000d', 1) will produce a confused output with a count vQ> of 1's variable (!) across runs (while sprintf('%*d', 9000, 1) seems to vQ> do fine): vQ> gregexpr('1', sprintf('%*d', 9000, 1)) vQ> # [1] 9000 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 ..., variable across executions vQ> on one execution in a series i actually got this: vQ> Warning message: vQ> In gregexpr("1", sprintf("%9000d", 1)) : vQ> input string 1 is invalid in this locale vQ> while the very next execution, still in the same session, gave vQ> # [1] 9000 9801 10602 vQ> with sprintf('%*d', 10000, 1) i got segfaults on some executions but vQ> correct output on others, while sprintf('%10000d', 1) is confused again. >>> (note the "impossible" part above) >>> vQ> yes, but it does also say "must be careful", and it seems that someone vQ> has not been careful enough. >>> and we haven't used snprintf() yet, probably because it >>> requires the C99 C standard, and AFAIK, we have only relatively >>> recently started to more or less rely on C99 in the R sources. >>> vQ> while snprintf would help avoid buffer overflow, it may not be a vQ> solution to the issue of confused output. MM> I think it would / will. We would be able to give warnings and MM> errors, by checking the snprintf() return codes. My current working code gives an error for all the above examples, e.g., > sprintf('%9999d', 1) Error in sprintf("%9999d", 1) : required resulting string length 9999 is > maximal 8191 it passes 'make check-devel' and I am inclined to commit that code to R-devel (e.g. tomorrow). Yes, the documentation will also have to be amended, but apart from that, would people see a big problem with the "8192" limit which now is suddenly of greater importance {{as I said all along; hence my question to Wacek (and the R-develers) if anybody found that limit too low}} Of course, one could realloc()ate longer strings when needed inside R's sprintf() code, but I reluctant to do that (render the code yet more complicated ...) if nobody sees a need. Martin >>> More precisely, I see that some windows-only code relies on >>> snprintf() being available whereas in at least on non-Windows >>> section, I read /* we cannot assume snprintf here */ >>> >>> Now such platform dependency issues and corresponding configure >>> settings I do typically leave to other R-corers with a much >>> wider overview about platforms and their compilers and C libraries. >>> vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that vQ> the bug could be repaired in a platform-independent manner. MM> definitely. MM> In the mean time, I've actually found that what I first said on MM> the usability of snprintf() in R's code base was only partly correct. MM> There are other parts of R code where we use snprintf() for all MM> platforms, hence we rely on its presence (and correct MM> implementation!) and so we can and I think should use it in MM> place of sprintf() in quite a few places inside R's sprintf.c >>> BTW, >>> 1) sprintf("%n %g", 1,1) also seg.faults >>> vQ> as do vQ> sprintf('%n%g', 1, 1) vQ> sprintf('%n%') vQ> etc., while vQ> sprintf('%q%g', 1, 1) vQ> sprintf('%q%') vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as vQ> a format specifier, so the output from the first two above should be as vQ> from the last two above, respectively. (and likewise in the %S case, vQ> discussed and bug-reported earlier.) MM> I have now fixed these bugs at least; MM> the more subtle "%<too_large_n>d" ones are different, and MM> as I said, I'm convinced that a nice & clean fix for those will MM> start using snprintf(). >>> 2) Did you have a true use case where the 8192 limit was an >>> undesirable limit? vQ> how does it matter? MM> well, we could increase it, if it did matter. MM> {{ you *could* have been more polite here, no? MM> it *was* after all a serious question that I asked! }} vQ> if you set a limit, be sure to consistently enforce vQ> it and warn the user on attempts to exceed it. or write clearly in the vQ> docs that such attempts will cause the output to be silently truncated. MM> Sure, I'm not at all disagreeing on that, and if you read this into my MM> posting, you misunderstand. MM> Martin [..........................]
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
2009-Apr-25 17:40 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
Martin Maechler wrote:> >>> MM> well, it is basically (+ a few bytes ?) >>> MM> the same 8192 limit that *is* documented. >>> >>> indeed, I was right with that.. >>> >>> >> hmm, i'd guess this limit is valid for all strings included in the >> output with any format? not just %s (and, as it appears, undocumentedly >> %d)? >> > > yes. >so it's perhaps easiest to change " There is a limit of 8192 bytes on elements of 'fmt' and also on strings included by a '%s' conversion specification." to sth like " There is a limit of 8192 bytes on elements of 'fmt' and also on strings included in the output by any conversion specification." it's in fact so easy that even i should be able to do it. [1 minute later...] i see you've fixed this one, too.>> btw. (i do know what that means ;)), after your recent fix: >> >> sprintf('%q%s', 1) >> # Error in sprintf("%q%s", 1) : >> # use format %f, %e, %g or %a for numeric objects >> >> sprintf('%s', 1) >> # [1] "1" >> >> you may want to add '%s' (and '%x', and ...) to the error message. or >> perhaps make it say sth like 'invalid format: ...'. the problem is not >> that %q is not applicable to numeric, but that it is not a valid format >> at all. >> > > yes. As a matter of fact, "%q%s" is dealt with as *one* format > chunk, since "%q" is > not a valid format. > The code I have just committed now gives a longer erro message, that > should be more helpful. >yes, but sptinf('%q%s', 1) still suggests that one uses %{f,e,g,a} for numerics, while %s is pretty much valid, too. you see, in c sprintf(buffer, "%s", 1) is destined to cause a segfault, but in r it works -- so the error message is slightly misleading, as it suggests %s is *not* valid for numerics.> Thank you for the suggestion! >yo welcum> >> there's also an issue with the additional arguments supplied after the >> format: any superfluous arguments are ignored (this is not documented, >> as far as i can see), >> > > I think we should signal an error if there are too many arguments. >agree. but it might be more complex than it appears: sprintf('%3$s', 1, 2, 3) should *not* complain about too many args, despite just one conversion spec in the format. interestingly, sprintf('%3$s', , , 3) # error: argument is missing, with no default> Could anyone think of a case where the current behavior is desirable ? >well, one scenario might be that one wants to print a collection of items with an arbitrary format, supplied by the users, e.g. foo = function(fmt) { a = ... b = ... ... s = sprintf(fmt, a, b, ...) ... } without having to examine the format to establish which values are needed. in the current state, sprintf would use those it would need to use with a particular format, with no undesirable complaints.> >> but they *are* evaluated nevertheless, e.g.: >> >> sprintf('%d', 0, {print(1)}) >> # "1" >> # [1] "0" >> >> it might be a good idea to document this behaviour. >> > > actually I think it should be changed to be more strict (see above). >strict in which sense? enforce a constraint on the number of arguments to that needed by a specific format? or do you mean evaluation of only those arguments that are needed in a format? or both? what about: sprintf('%2$s', {print(1)}, 2) # too many arguments? # should 1 be printed? sprintf('%2$s', , 2) # too few arguments? # missing value? (yes, sprintf is .Internal, but...)> Thank you for the constructive feedback! >not much to thank for... certainly, it's the first time my feedback is called 'constructive'. i'm making progress, am i not? btw., thank you for the fixes. i appreciate your efforts a lot. best, vQ
maechler at stat.math.ethz.ch
2009-Apr-27 14:50 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
>>>>> "vQ" == Waclaw Marcin Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> >>>>> on Sat, 25 Apr 2009 19:40:27 +0200 (CEST) writes:vQ> Martin Maechler wrote: >> MM> well, it is basically (+ a few bytes ?) the same 8192 MM> limit that *is* documented. >>>> >>>> indeed, I was right with that.. >>>> >>>> >>> hmm, i'd guess this limit is valid for all strings >>> included in the output with any format? not just %s >>> (and, as it appears, undocumentedly %d)? >>> >> >>> btw. (i do know what that means ;)), after your recent >>> fix: >>> >>> sprintf('%q%s', 1) # Error in sprintf("%q%s", 1) : # use >>> format %f, %e, %g or %a for numeric objects >>> >>> sprintf('%s', 1) # [1] "1" >>> >>> you may want to add '%s' (and '%x', and ...) to the >>> error message. or perhaps make it say sth like 'invalid >>> format: ...'. the problem is not that %q is not >>> applicable to numeric, but that it is not a valid format >>> at all. >>> >>> yes. As a matter of fact, "%q%s" is dealt with as *one*>> format chunk, since "%q" is not a valid format. The code >> I have just committed now gives a longer erro message, >> that should be more helpful. >> vQ> yes, but vQ> sptinf('%q%s', 1) vQ> still suggests that one uses %{f,e,g,a} for numerics, vQ> while %s is pretty much valid, too. you see, in c vQ> sprintf(buffer, "%s", 1) is destined to cause a vQ> segfault, but in r it works -- so the error message is vQ> slightly misleading, as it suggests %s is *not* valid vQ> for numerics. yes, but only the error message somewhat suggests that; at the moment, I'd like to keep it, since really the user *should* think of using the number formats, rather than %s {which just calls as.character(.)} for numeric arguments >> Thank you for the suggestion! >> vQ> yo welcum >> >>> there's also an issue with the additional arguments >>> supplied after the format: any superfluous arguments are >>> ignored (this is not documented, as far as i can see), >>> >> MM> I think we should signal an error if there are too many arguments. vQ> agree. but it might be more complex than it appears: vQ> sprintf('%3$s', 1, 2, 3) vQ> should *not* complain about too many args, despite just vQ> one conversion spec in the format. very good point; thanks! vQ> Interestingly, vQ> sprintf('%3$s', , , 3) # error: argument is missing, vQ> with no default yes, empty (aka "missing") arguments are not allowed in sprintf(). >> Could anyone think of a case where the current behavior >> is desirable ? >> vQ> well, one scenario might be that one wants to print a collection of vQ> items with an arbitrary format, supplied by the users, vQ> e.g. vQ> foo = function(fmt) { a = ... b = ... ... s vQ> sprintf(fmt, a, b, ...) ... } vQ> without having to examine the format to establish which vQ> values are needed. in the current state, sprintf would vQ> use those it would need to use with a particular format, vQ> with no undesirable complaints. ok. you have given good examples which make me revert my proposal, i.e. continue to not erroring about "too many" arguments. >>> but they *are* evaluated nevertheless, e.g.: >>> >>> sprintf('%d', 0, {print(1)}) # "1" # [1] "0" >>> >>> it might be a good idea to document this behaviour. MM> actually I think it should be changed to be more strict MM> (see above). as a matter of fact, and the result of many more examples, I've changed my oppinion and now agree with your original proposal: I've just commmited another sprintf() patch which (among more more important changes) *documents* that all arguments of sprintf() are evaluated; this actually already entails that empty / missing arguments are not allowed. vq> strict in which sense? enforce a constraint on the vQ> number of arguments to that needed by a specific format? vQ> or do you mean evaluation of only those arguments that vQ> are needed in a format? or both? vQ> what about: vQ> sprintf('%2$s', {print(1)}, 2) # too many arguments? vQ> # should 1 be printed? vQ> sprintf('%2$s', , 2) # too few arguments? # missing vQ> value? (yes, sprintf is .Internal, but...) >> Thank you for the constructive feedback! >> vQ> not much to thank for... certainly, it's the first time my feedback is vQ> called 'constructive'. i'm making progress, am i not? indeed! :-) Martin Maechler, ETH Zurich vQ> btw., thank you for the fixes. i appreciate your vQ> efforts a lot. vQ> best, vQ
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
2009-Apr-27 19:15 UTC
[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
Martin Maechler wrote:> > vQ> sptinf('%q%s', 1) > > > vQ> still suggests that one uses %{f,e,g,a} for numerics, > vQ> while %s is pretty much valid, too. you see, in c > vQ> sprintf(buffer, "%s", 1) is destined to cause a > vQ> segfault, but in r it works -- so the error message is > vQ> slightly misleading, as it suggests %s is *not* valid > vQ> for numerics. > > yes, but only the error message somewhat suggests that; > at the moment, I'd like to keep it, since really the user > *should* think of using the number formats, rather than %s > {which just calls as.character(.)} for numeric arguments >then maybe sprintf('%s', 1) should complain about a format-argument mismatch? in c, 1 would be taken to be an address at which a string starts, but in r you do not have pointers, so this interpretation is impossible. if the user *should* use number formats for numerics, %s should not work. smells lack of design.> MM> I think we should signal an error if there are too many arguments. > > vQ> agree. but it might be more complex than it appears: > > vQ> sprintf('%3$s', 1, 2, 3) > > vQ> should *not* complain about too many args, despite just > vQ> one conversion spec in the format. > > very good point; thanks! > > vQ> Interestingly, > > vQ> sprintf('%3$s', , , 3) # error: argument is missing, > vQ> with no default > > yes, empty (aka "missing") arguments are not allowed in sprintf(). >be nice and document such items, pliz.... even if all internal functions have this behavour (do they?), ?sprintf does not say that sprintf is internal, or that the r wrapper calls an internal so that all arguments are necessarily evaluated.> >> Could anyone think of a case where the current behavior > >> is desirable ? > >> > > vQ> well, one scenario might be that one wants to print a collection of > vQ> items with an arbitrary format, supplied by the users, > vQ> e.g. > > vQ> foo = function(fmt) { a = ... b = ... ... s > vQ> sprintf(fmt, a, b, ...) ... } > > vQ> without having to examine the format to establish which > vQ> values are needed. in the current state, sprintf would > vQ> use those it would need to use with a particular format, > vQ> with no undesirable complaints. > > ok. you have given good examples which make me revert my > proposal, i.e. continue to not erroring about "too many" arguments. >i did not say i supported that view, however. it was just an example where *a* developer might wish sprintf did not complain about wrong number of arguments. examples to the opposite effect can easily be given, but that's not what you asked about.> >>> but they *are* evaluated nevertheless, e.g.: > >>> > >>> sprintf('%d', 0, {print(1)}) # "1" # [1] "0" > >>> > >>> it might be a good idea to document this behaviour. > > MM> actually I think it should be changed to be more strict > MM> (see above). > > as a matter of fact, and the result of many more examples, > I've changed my oppinion and now agree with your original > proposal: > > I've just commmited another sprintf() patch which (among more > more important changes) *documents* that all arguments of > sprintf() are evaluated; this actually already entails that > empty / missing arguments are not allowed. >excellent, thanks. vQ