Hello,

We have identified functions in the LLVM sources that a static code analyzer flags as a "security vulnerability"[1][2]. Some work has already been done to address a few of them on Linux (e.g. snprintf). We are attempting to solve this issue comprehensively across all platforms. Most of the functions identified manipulate strings, and memcpy is the most commonly used of these insecure functions. The following table lists these functions and their recommended secure alternatives.

Recommended alternatives:

Function    Windows     Unix/Mac OS
memcpy      memcpy_s    -
sprintf     sprintf_s   snprintf
sscanf      sscanf_s    -
_alloca     _malloca    -
strcat      strcat_s    strlcat
strcpy      strcpy_s    strlcpy
strtok      strtok_s    -

The proposal is to add secure versions of these functions. They will be implemented in the LLVM Support module and used by all other LLVM modules. Their interface will be platform independent, while their implementation will be platform specific (like the Mutex class in the Support module). Where a platform does not support the functionality natively, we are writing our own implementation. For example, the secure version of memcpy will be llvm::memcpy_secure.

Some secure functions require additional data to be passed (such as buffer sizes), and that information has to be added at every call site. In some cases this requires passing an extra size_t argument through. Hence, this change would not be a simple one-to-one function refactoring. The attached patch illustrates how one instance of memcpy would be modified.

Is this proposal of interest to the LLVM community? Can you also comment on whether the specified approach is a good way to address this issue?
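The patch itself was scrubbed from the archive, so the following is only a minimal sketch, under our own assumptions, of the kind of wrapper described above: a copy function that takes the destination capacity as an extra argument and reports failure instead of copying. The namespace `sketch`, the `CopyStatus` enum and the exact signature are hypothetical; they are not the contents of llvm_secure_function.patch or an existing LLVM API.

```cpp
#include <cstddef>
#include <cstring>

namespace sketch {

// Hypothetical status codes; not an existing LLVM type.
enum class CopyStatus { Ok, NullPointer, DestinationTooSmall };

// Hypothetical wrapper in the spirit of the proposed llvm::memcpy_secure:
// the caller must pass the destination capacity (DestSize), and the copy
// is refused, leaving Dest untouched, if Count bytes would not fit.
inline CopyStatus memcpy_secure(void *Dest, std::size_t DestSize,
                                const void *Src, std::size_t Count) {
  if (Count == 0)
    return CopyStatus::Ok;                  // nothing to copy
  if (Dest == nullptr || Src == nullptr)
    return CopyStatus::NullPointer;
  if (Count > DestSize)
    return CopyStatus::DestinationTooSmall; // would overflow Dest
  std::memcpy(Dest, Src, Count);            // bounds already checked
  return CopyStatus::Ok;
}

} // namespace sketch
```

Every call site then has to supply `DestSize` and inspect the returned status, which is both the extra plumbing the proposal mentions and the return-code check Sean notes is easy to omit.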
References:
[1] http://msdn.microsoft.com/en-us/library/ms235384(v=vs.80).aspx
[2] https://developer.apple.com/library/mac/#documentation/Security/Conceptual/SecureCodingGuide/Articles/BufferOverflows.html

Attachment: llvm_secure_function.patch (6350 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120919/7de4688a/attachment.obj>
I generally disagree with the approach. char* strings are generally not recommended in LLVM, and this kind of string manipulation shouldn't be done with the primitive C library functions. The Programmer's Manual lists the preferred types to use for strings [1], and all of them keep track of their length. There are also safe routines for creating and formatting strings, such as raw_ostream, which is used pervasively in LLVM. The example routine in your patch should probably just use raw_string_ostream or raw_svector_ostream instead of relying on C-style string routines. That way, correctness is enforced by the compiler instead of by manually laboring over these things (like checking the return code, which your patch doesn't do...).

In other words, there are completely safe alternatives to these functions for almost all cases. One particular use case, which usually pertains to memcpy, is when performance is a significant concern: the author "knows what they are doing" and isn't willing to sacrifice performance by calling into some "secure" version when they have other assurances that the target buffer has sufficient space. The performance difference can be significant, since memcpy is usually turned into a compiler builtin that the compiler recognizes and optimizes specially, whereas the suggested approach results in a regular call into an "llvm::*_secure" wrapper, which then calls into the OS-provided general-purpose "secure" version.

I think it would be useful if you used the output of your static analyzer to provide a list of the places where C-style string manipulation is being done, so that these places can be migrated to modern, safe LLVM interfaces for these operations.
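To illustrate the alternative Sean describes, here is a minimal sketch of formatting into a length-tracking stream instead of a fixed `char` buffer. It uses `std::ostringstream` as a stand-in for `raw_string_ostream` so that it builds without LLVM, and the helper name `toHexString` is hypothetical; it is not the real `APFloat::convertToHexString`.

```cpp
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats a 64-bit value as "0x..." without any
// fixed-size char buffer. The stream grows its storage as needed, so
// there is no destination length for the caller to get wrong.
// (std::ostringstream stands in for LLVM's raw_string_ostream here.)
std::string toHexString(std::uint64_t Value) {
  std::ostringstream OS;
  OS << "0x" << std::hex << Value; // lowercase hex digits by default
  return OS.str();
}
```

In LLVM code the same pattern would write into a `raw_string_ostream` (backed by a `std::string`) or a `raw_svector_ostream` (backed by a `SmallVector<char>` such as `SmallString`); in each case the destination grows as needed, so there is no capacity argument to get wrong.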
[1] http://llvm.org/docs/ProgrammersManual.html#ds_string

--Sean Silva
Martinez, Javier E wrote:
> Is this proposal of interest to the LLVM community? Can you also comment
> if the approach specified is good to address this issue?

Personally, I'm not particularly interested in blanket replacement of memcpy with memcpy_s in the hopes that it might close a security hole. I am very interested in fixing any actual bugs. If it's easier to fix real bugs by aggressively using this additional layer, then that may well be the way to go, but before I agree to that, I've got a ton of questions to answer first.

What's the current error rate? How often are we seeing bugs in llvm that would be fixed if only we were calling "secure" functions?

What's the impact of calling the secure function? On Release builds and on Debug builds? On size and performance?

Why not rely on platforms to secure these functions? For instance, Linux and Darwin both have FORTIFY_SOURCE, and I'm too ignorant of Windows to know what the equivalent is there. What about existing tools like valgrind or ASAN?

What happens if memcpy_secure does detect an insecure memcpy? It's considered very rude for LLVM to terminate on the spot since it's often used as a library, so how do we handle the error? By calling llvm::report_fatal_error and hoping we don't recurse? What if it's a debug build and we'd like to see where the code went wrong?

How do you plan to enforce that the insecure functions aren't called?

Nick
Hi Nick,

Thanks for taking the time to review the proposal. I'd like to stress that the purpose of the changes in the proposal is not to improve performance or to fix existing bugs. The purpose is to catch instances where a buffer overrun would happen in specific function calls. Given the appropriate input, a buffer overrun could lead to undefined behavior or a crash.

> What's the current error rate? How often are we seeing bugs in llvm that
> would be fixed if only we were calling "secure" functions?

[JM] Zero. The motivation is not to fix existing bugs but potential hidden ones.

> What's the impact of calling the secure function? On Release builds and on
> Debug builds? On size and performance?

[JM] I haven't compared the performance of the secure functions, and performance is not mentioned much on the pages I've looked at. However, I found a performance comparison between strcpy and strcpy_s at [1]. There appears to be a performance penalty for using strcpy_s. In a real case it's very likely that the penalty is hidden by bottlenecks elsewhere. If the performance degradation is noticeable, the proposal can be modified; for example, the *_secure versions could do only parameter validation and size checks.

> Why not rely on platforms to secure these functions? For instance, Linux and
> Darwin both have FORTIFY_SOURCE, and I'm too ignorant of Windows to know
> what the equivalent is there. What about existing tools like valgrind or ASAN?

[JM] I'm not too familiar with Linux specifics and the tools you mentioned, but from what I could gather FORTIFY_SOURCE doesn't cover all buffer overflow cases: it only covers cases where the buffer size is known at compile time. I don't know of a FORTIFY_SOURCE equivalent for Windows. The proposed solution would work on both Windows and Linux and cover more general cases of buffer overflow.

> What happens if memcpy_secure does detect an insecure memcpy? It's
> considered very rude for LLVM to terminate on the spot since it's often used
> as a library, so how do we handle the error? By calling
> llvm::report_fatal_error and hoping we don't recurse? What if it's a debug
> build and we'd like to see where the code went wrong?

[JM] We use LLVM inside a DLL, so I completely sympathize with you about terminating execution on the spot. Although the patch doesn't address this, I think calling llvm::report_fatal_error() when a secure function fails is a good idea: instead of crashing, LLVM can exit gracefully or, if an error handler is installed, fail in a controlled way. On Windows, the secure functions allow an error handler to be called when parameter validation fails, and that custom handler can directly call report_fatal_error().

> How do you plan to enforce that the insecure functions aren't called?

[JM] How about adding a section to the LLVM Programmer's Manual about the use of the secure functions? I can provide a blurb for it.

[1] http://codeketchup.blogspot.com/2012/02/sprintf-strcpy-strncpy-strcpys-what-is.html

Thanks,
Javier

-----Original Message-----
From: Nick Lewycky [mailto:nicholas at mxc.ca]
Sent: Wednesday, September 19, 2012 2:11 AM
To: Martinez, Javier E
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions
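Nick's question about what happens on failure, and Javier's answer of routing it to a handler, can be made concrete with a small sketch. Everything here is hypothetical (`CopyErrorHandler`, `setCopyErrorHandler` and `checkedCopy` are not LLVM or CRT APIs): the check reports through an installable hook and returns `false` rather than terminating, leaving the policy to the embedding application. In LLVM such a hook might forward to `llvm::report_fatal_error`, as Javier suggests, which is exactly where Nick's recursion concern would need an answer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical handler type: receives a message and the sizes involved.
using CopyErrorHandler = void (*)(const char *Msg, std::size_t DestSize,
                                  std::size_t Count);

namespace {
// No handler by default: failures are only reported via the return value.
CopyErrorHandler CurrentHandler = nullptr;
} // namespace

// Installs a handler and returns the previous one so it can be restored.
CopyErrorHandler setCopyErrorHandler(CopyErrorHandler H) {
  CopyErrorHandler Old = CurrentHandler;
  CurrentHandler = H;
  return Old;
}

// Returns true on success. If Count exceeds DestSize, nothing is copied,
// the installed handler (if any) is notified, and false is returned
// instead of aborting the process. Pointers are assumed non-null; only
// the size check is performed.
bool checkedCopy(void *Dest, std::size_t DestSize, const void *Src,
                 std::size_t Count) {
  if (Count > DestSize) {
    if (CurrentHandler)
      CurrentHandler("checkedCopy: destination too small", DestSize, Count);
    return false;
  }
  if (Count != 0)
    std::memcpy(Dest, Src, Count);
  return true;
}
```

The global hook is not thread-safe as written; a real implementation would need at least an atomic pointer, and a debug build could install a handler that breaks into the debugger instead.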
Hi Sean,

Thanks for the valuable feedback. I agree with you that the containers available in LLVM are preferable to char buffers, but I want to point out that the proposal doesn't add any new uses of char buffers; it merely works with existing ones. Changing existing uses of char buffers to other objects is beyond the scope of this proposal. It makes more sense to do that when changes are made to the code that uses string manipulation functions, as it could require larger design changes.

I'm unsure of the performance impact of using the secure functions and how to balance it against the benefit of improving code quality. If the proposal gets support, I can gather performance data to determine whether there is a performance hit and whether it's acceptable.

Hoping that authors "know what they are doing" is not enough. If that were the case, there wouldn't be bugs to fix and code to review.

I don't have the output of the static analyzer at hand, but I will provide it in a follow-up email.

Thanks,
Javier

-----Original Message-----
From: Sean Silva [mailto:silvas at purdue.edu]
Sent: Tuesday, September 18, 2012 6:25 PM
To: Martinez, Javier E
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions
On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:
> We have identified functions in LLVM sources using a static code analyzer
> which are marked as a “security vulnerability”[1][2]. There has been work
> already done to address some of them for Linux (e.g. snprintf). We are
> attempting to solve this issue in a comprehensive fashion across all
> platforms. Most of the functions identified are for manipulating strings.
> Memcpy is the most commonly used of all these unsecure methods. The
> following table lists all these functions are their recommended secure
> alternatives.

I am strongly opposed to using the *_s functions. The issue is that they are no more "secure" than the original functions. One can still pass the destination buffer length incorrectly, especially if it is not known at compile time and has to be computed.

I agree with Sean that we should move the code using C strings to LLVM's safe data types.

And one more thing: it is interesting that the "unsafe" APFloat::convertToHexString (from your patch) is not used anywhere.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if (j){printf("%d\n",i);}}}
/*Dmitri Gribenko <gribozavr at gmail.com>*/
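Dmitri's point is that an explicit length argument is only as reliable as the code that computes it. As a minimal sketch of one possible mitigation (our assumption, not something proposed in the thread), a template can deduce the destination capacity from an array type, so that particular argument can no longer be passed wrongly. The helper name `copyIntoArray` is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the destination capacity N is deduced from the
// array type at compile time, so the caller cannot pass a wrong value
// for it. Count (the number of source bytes) is still caller-supplied.
template <std::size_t N>
bool copyIntoArray(char (&Dest)[N], const char *Src, std::size_t Count) {
  if (Count > N)
    return false;                 // would overflow Dest: refuse to copy
  std::memcpy(Dest, Src, Count);
  return true;
}
```

This only helps while the destination is still an array in scope; once it decays to `char *` (for example when passed through a function parameter), the size must be passed by hand again, which is precisely the case Dmitri describes. The source length `Count` also remains the caller's responsibility.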
On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> I am strongly opposed to using *_s functions. The issue is that they
> are no more "secure" than original functions. One can still pass the
> destination buffer length incorrectly, especially if it is not known
> at compile time and should be computed.
>
> I agree with Sean that we should move the code using C strings to LLVM
> safe data types.

I agree.

> And one more thing: it is interesting that the "unsafe"
> APFloat::convertToHexString (from your patch) is not used anywhere.

Zap it! Oh wait, is it used by Clang or something else?

-Chris
On Wed, Sep 19, 2012 at 12:00:50AM +0000, Martinez, Javier E wrote:
> We have identified functions in LLVM sources using a static code
> analyzer which are marked as a "security vulnerability"[1][2].
>
> Recommended alternatives:
>
> Functions Windows Unix/Mac OS
>
> Memcpy memcpy_s -
...

Please file bug reports for your tool. memcpy operates on explicitly bounded objects, unlike e.g. strcat/strcpy. Marking them as deprecated is just as buggy. Of the rest of your list, strtok has some issues, but it is generally safe to use too. The replacements are not an improvement at all. The first time I saw Annex K of C11, I was thinking: "Who pushed this crap into the standard, Microsoft?"

Joerg
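One widely cited issue with `strtok`, and plausibly what Joerg has in mind, is that it keeps its position in hidden static state, so it is not reentrant: nested or interleaved tokenization silently breaks. The POSIX `strtok_r` (and MSVC's `strtok_s`, which has the same shape) takes that state as an explicit argument. A minimal sketch, assuming a POSIX platform; the helper `keysOf` is hypothetical:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Extracts the keys from "k1=v1,k2=v2,..." using two interleaved
// tokenizers. With plain strtok the inner call would overwrite the outer
// loop's hidden position; strtok_r keeps each position in its own pointer.
std::vector<std::string> keysOf(const std::string &Input) {
  std::vector<char> Buf(Input.begin(), Input.end());
  Buf.push_back('\0'); // strtok_r needs a writable, NUL-terminated string
  std::vector<std::string> Keys;
  char *OuterSave = nullptr;
  for (char *Pair = strtok_r(Buf.data(), ",", &OuterSave); Pair != nullptr;
       Pair = strtok_r(nullptr, ",", &OuterSave)) {
    char *InnerSave = nullptr;
    char *Key = strtok_r(Pair, "=", &InnerSave); // independent state
    if (Key != nullptr)
      Keys.push_back(Key);
  }
  return Keys;
}
```

Note that neither function is a bounds-checking replacement; the fix for `strtok` is about hidden state, which is a different problem from the buffer overruns the proposal targets.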