wdunlap at tibco.com
2008-Nov-14 18:25 UTC
[Rd] (PR#13283) R crashes on sprintf with bad format specification
> From: r-devel-bounces at r-project.org=20 > [mailto:r-devel-bounces at r-project.org] On Behalf Of Prof Brian Ripley > Sent: Friday, November 14, 2008 2:25 AM > To: Duncan Murdoch > Cc: R-bugs at r-project.org; ocheyett at bonddesk.com;=20 > r-devel at stat.math.ethz.ch > Subject: Re: [Rd] (PR#13283) R crashes on sprintf with bad=20 > format specification >=20 > As R's sprintf is a wrapper for the OS's sprintf, misuse does=20 > run the risk of crashing from OS, and when it does the error=20 > will come from the implementation of sprintf (which for R for=20 > Windows is the Trio library). > One could argue that the OS service should not segfault on=20 > incorrect input, but this one often does. >=20 > The place to add checks is in the R wrapper code (presumably=20 > in the C component). We do have checks there, but it is hard=20 > to imagine just what errors users will make, and this one has=20 > slipped through the checks.The following change recognizes 'S' as a format type (but doesn't do anything with it) and thus avoids the crash. =20 --- sprintf.c (revision 46944) +++ sprintf.c (working copy) @@ -103,7 +103,7 @@ else { /* recognise selected types from Table B-1 of K&R */ /* This is MBCS-OK, as we are in a format spec */ - chunk =3D strcspn(formatString + cur + 1, "aAdisfeEgGxX%") + 2; + chunk =3D strcspn(formatString + cur + 1, "aAdisSfeEgGxX%") + 2; if (cur + chunk > n) error(_("unrecognised format at end of string")); The resulting error messages are not terribly direct, but nicer than the crash > sprintf("A %S %S %S XYZ", 1, 1, 1) Error in sprintf("A %S %S %S XYZ", 1, 1, 1) : use format %f, %e, %g or %a for numeric objects > sprintf("A %S %S %S XYZ", "foo", "bar", "foo") Error in sprintf("A %S %S %S XYZ", "foo", "bar", "foo") : use format %s for character objects One could add 'S' to some switch cases below there to have it convert the corresponding argument to a wide character string or say directly that %S is not allowed. This piece of code finds the conversion type by looking for this first of a bunch of known conversion characters after the %, which doesn't find lots of illegal formats and causes some surprises. E.g., because 'p' is not in that list we get: > sprintf("%p", 3) Error in sprintf("%p", 3) : unrecognised format at end of string > sprintf("%p is a pointer!", 3) [1] "0x3 is a pointer!" > sprintf("%p %o", 17L, 17L) [1] "0x4 o"> Here the issue is the use of %S, an unknown format for=20 > numbers, followed by 'X'. There is a check for a suitable=20 > format, but it is not strict enough. And it is non-trivial=20 > to write a printf format parser just to check for user error=20 > (and potentially it will slow down correct uses:=20 > speed in sprintf is important in some applications). >=20 > So I am not sure this is something that should be R's=20 > responsibility to > fix: maybe just add a warning to the help page? >=20 >=20 > On Thu, 13 Nov 2008, Duncan Murdoch wrote: >=20 > > On 12/11/2008 8:30 PM, ocheyett at bonddesk.com wrote: > >> Full_Name: Oren Cheyette > >> Version: 2.7.2 > >> OS: Win XP > >> Submission from: (NULL) (64.161.123.194) > >>=20 > >>=20 > >> Enter the following at the R command prompt: > >>> sprintf("A %S %S %S XYZ", 1, 1, 1); >=20 > R is not C, and the empty command at the end of that line (after the=20 > semicolon) is not relevant. >=20 > >> Note the erroneous capitalized %S instead of %s and the=20 > numeric inputs=20 > >> instead > >> of strings. With strings there's no crash - R reports bad format > >> specifications. > > > > 2.7.2 is obsolete, but I can confirm a crash on Windows=20 > with a recent=20 > > R-devel. > > > > The last few entries in the stack dump at the time of the=20 > crash are shown=20 > > below; these make it look as though the problem is in the=20 > Trio library, so it=20 > > may be hard to fix. > > > > Duncan Murdoch > > > > > > Rgui.exe caused an Access Violation at location 6c913fb3 in=20 > module R.dll=20 > > Reading from location 00000001. > > > > Registers: > > eax=3D7fffffff ebx=3D00000000 ecx=3D00e17b21 edx=3D00000001=20 > esi=3D00e1c83b edi=3D0000000a > > eip=3D6c913fb3 esp=3D00e17944 ebp=3D00e17b3c iopl=3D0 nv up=20 > ei pl nz na po nc > > cs=3D001b ss=3D0023 ds=3D0023 es=3D0023 fs=3D003b gs=3D0000 efl=3D00000206 > > > > Call stack: > > 6C913FB3 R.dll:6C913FB3 TrioFormatProcess trio.c:2724 > > > > ... > > while (length > 0) > > { > >> size =3D TrioWriteWideStringCharacter(self,=20 > *wstring++, flags,=20 > > length); > > if (size =3D=3D 0) > > break; /* while */ > > ... > > > > 6C916592 R.dll:6C916592 trio_vsprintf trio.c:3771 > > > > ... > > return status; > > > >> status =3D TrioFormatProcess(&data, format, parameters); > > if (data.error !=3D 0) > > { > > ... > > > > 6C911F62 R.dll:6C911F62 sprintf compat.c:46 > > > > ... > > va_end(ap); > > return res; > >> } > > > > > > ... > > > > 6C889F1E R.dll:6C889F1E do_sprintf sprintf.c:297 > > > > ... > > sprintf(bit, fmtp, " NaN"); > > else > >> sprintf(bit, fmtp, "NaN"); > > } else if (x =3D=3D R_PosInf) { > > if (strcspn(fmtp, "+") < strlen(fmtp)) > > ... > > > > ______________________________________________ > > R-devel at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > >=20 > --=20 > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 >=20 > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >=20