>From the responses it's pretty clear that the preference is to avoid using C string functions altogether. I've attached at list of calls in Clang/LLVM. The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the call based on the location of the source buffer. If there are no objections I'll prepare a patch that replaces the string manipulation functions an appropriate string object.The proposal comments have largely centered on the string functions. Do people feel the same way about memcpy_s? What about those of you building LLVM on Windows with Visual Studio? I know both functions can be used the wrong way but at least the "secure" version makes one think about the value passed for destination size. Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with such a high number of uses there are likely a few that are not. I'm willing to plow through all the calls to check the parameters while making the change to the memcpy_secure version from the proposal. Thanks, Javier From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Richard Smith Sent: Thursday, September 20, 2012 3:09 PM To: Chris Lattner Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Handling of unsafe functions On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at apple.com<mailto:clattner at apple.com>> wrote: On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com<mailto:gribozavr at gmail.com>> wrote:> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E > <javier.e.martinez at intel.com<mailto:javier.e.martinez at intel.com>> wrote: >> We have identified functions in LLVM sources using a static code analyzer >> which are marked as a "security vulnerability"[1][2]. There has been work >> already done to address some of them for Linux (e.g. snprintf). We are >> attempting to solve this issue in a comprehensive fashion across all >> platforms. Most of the functions identified are for manipulating strings. >> Memcpy is the most commonly used of all these unsecure methods. The >> following table lists all these functions are their recommended secure >> alternatives. > > I am strongly opposed to using *_s functions. The issue is that they > are no more "secure" than original functions. One can still pass the > destination buffer length incorrectly, especially if it is not known > at compile time and should be computed. > > I agree with Sean that we should move the code using C strings to LLVM > safe data types.I agree.> > And one more thing: it is interesting that the "unsafe" > APFloat::convertToHexString (from your patch) is not used anywhere.Zap it! Oh wait, is it used by Clang or something else? Clang doesn't use it, but LLDB does, with an arbitrary-seeming 256-character buffer. Perhaps we should change it to take an llvm::SmallVectorImpl<char>, rather than relying on a character buffer "which must be of sufficient size". -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120921/f99504c8/attachment.html> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: CStringFunctionsUse.txt URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120921/f99504c8/attachment.txt>
On Fri, Sep 21, 2012 at 6:52 AM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:> From the responses it’s pretty clear that the preference is to avoid using C > string functions altogether. I’ve attached at list of calls in Clang/LLVM. > The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the > call based on the location of the source buffer. If there are no objections > I’ll prepare a patch that replaces the string manipulation functions an > appropriate string object.Thank you for the list. Fixed the Clang non-platform specific bits in r164554. Didn't dare to fix other occurrences in Clang because they are #ifdefed for Windows and I don't have a Windows machine to test. Dmitri -- main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
> I’ll prepare a patch that replaces the string manipulation functions an > appropriate string object.Please break the patch up into focused chunks, one per logical change. We try to keep all LLVM development as incremental as possible [1]. I recommend fixing a single logical occurrence (such as fixing APFloat::convertToHexString()) and then mailing the patch to llvm-commits. It's likely that the feedback you get from the review of the first patch will help inform future patches and simplify later patch reviews. [1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development> I know both functions can be used the wrong way but at least the “secure” > version makes one think about the value passed for destination size. Very > likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with > such a high number of uses there are likely a few that are not. I’m willing > to plow through all the calls to check the parameters while making the > change to the memcpy_secure version from the proposal.My inclination is that it will be redundant. For example, consider this usage, grabbed randomly from the source code: char *Buf = static_cast<char *>(Allocate(Directory.size())); memcpy(Buf, Directory.data(), Directory.size()); it's not that clear to me that just adding an extra argument of Directory.size() buys anything. If anything, it seems like it introduces _another_ place to make an error. I went through a couple other random uses of memcpy() in the LLVM tree and none of them seemed like they would benefit.> [...] I’m willing > to plow through all the calls to check the parameters while making the > change to the memcpy_secure version from the proposal.By all means, please take a look at them. However, it might be overall more beneficial to go through and report "sketchy" uses, where the copy is not obviously correct, then discuss on the list how to ensure the code is safe. Once all of the uses are audited, you can then just write a little script using something like git's "pickaxe" functionality (see `-S` option to git-diff(1), and other commands that take "diff" options) to send you an email every time a new use of memcpy gets introduced into the codebase. For example, you could have a daily cron job that runs `git log -p -S'memcpy(' origin/master@{yesterday}..origin/master` and if there is any output then it mails that output to you. -- Sean Silva On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:> From the responses it’s pretty clear that the preference is to avoid using C > string functions altogether. I’ve attached at list of calls in Clang/LLVM. > The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the > call based on the location of the source buffer. If there are no objections > I’ll prepare a patch that replaces the string manipulation functions an > appropriate string object. > > > > The proposal comments have largely centered on the string functions. Do > people feel the same way about memcpy_s? What about those of you building > LLVM on Windows with Visual Studio? > > > > I know both functions can be used the wrong way but at least the “secure” > version makes one think about the value passed for destination size. Very > likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with > such a high number of uses there are likely a few that are not. I’m willing > to plow through all the calls to check the parameters while making the > change to the memcpy_secure version from the proposal. > > > > Thanks, > > Javier > > > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Richard Smith > Sent: Thursday, September 20, 2012 3:09 PM > To: Chris Lattner > > > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Handling of unsafe functions > > > > On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at apple.com> wrote: > > > On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote: > >> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E >> <javier.e.martinez at intel.com> wrote: >>> We have identified functions in LLVM sources using a static code analyzer >>> which are marked as a “security vulnerability”[1][2]. There has been work >>> already done to address some of them for Linux (e.g. snprintf). We are >>> attempting to solve this issue in a comprehensive fashion across all >>> platforms. Most of the functions identified are for manipulating strings. >>> Memcpy is the most commonly used of all these unsecure methods. The >>> following table lists all these functions are their recommended secure >>> alternatives. >> >> I am strongly opposed to using *_s functions. The issue is that they >> are no more "secure" than original functions. One can still pass the >> destination buffer length incorrectly, especially if it is not known >> at compile time and should be computed. >> >> I agree with Sean that we should move the code using C strings to LLVM >> safe data types. > > I agree. > > >> >> And one more thing: it is interesting that the "unsafe" >> APFloat::convertToHexString (from your patch) is not used anywhere. > > Zap it! Oh wait, is it used by Clang or something else? > > > > Clang doesn't use it, but LLDB does, with an arbitrary-seeming 256-character > buffer. Perhaps we should change it to take an llvm::SmallVectorImpl<char>, > rather than relying on a character buffer "which must be of sufficient > size". > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Sean, Thanks for the suggestion on the patch submission. I started with the first patch to address a couple of calls to C string functions in Errno.cpp. If that goes well I'll prepare a patch for each function occurrence.> I went through a couple other random uses of memcpy() in the LLVM tree and none of them seemed like they would benefit.I understand that not all uses of memcpy will clearly benefit from a replacement to the proposed memcpy_secure. For people using LLVM under Windows using Visual Studio there's an additional benefit; LLVM can be compiled without deprecating the functions marked as unsecure. When this is a requirement the alternative is to make the changes locally which becomes a problem when upgrading LLVM versions. I still think that abstracting the implementation of memcpy facilitates the work for users in Windows without having a significant impacting users in other platforms. Thanks, Javier -----Original Message----- From: Sean Silva [mailto:silvas at purdue.edu] Sent: Monday, September 24, 2012 3:37 PM To: Martinez, Javier E Cc: Richard Smith; Chris Lattner; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Handling of unsafe functions> I'll prepare a patch that replaces the string manipulation functions > an appropriate string object.Please break the patch up into focused chunks, one per logical change. We try to keep all LLVM development as incremental as possible [1]. I recommend fixing a single logical occurrence (such as fixing APFloat::convertToHexString()) and then mailing the patch to llvm-commits. It's likely that the feedback you get from the review of the first patch will help inform future patches and simplify later patch reviews. [1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development> I know both functions can be used the wrong way but at least the "secure" > version makes one think about the value passed for destination size. > Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct > but with such a high number of uses there are likely a few that are > not. I'm willing to plow through all the calls to check the parameters > while making the change to the memcpy_secure version from the proposal.My inclination is that it will be redundant. For example, consider this usage, grabbed randomly from the source code: char *Buf = static_cast<char *>(Allocate(Directory.size())); memcpy(Buf, Directory.data(), Directory.size()); it's not that clear to me that just adding an extra argument of Directory.size() buys anything. If anything, it seems like it introduces _another_ place to make an error. I went through a couple other random uses of memcpy() in the LLVM tree and none of them seemed like they would benefit.> [...] I'm willing > to plow through all the calls to check the parameters while making the > change to the memcpy_secure version from the proposal.By all means, please take a look at them. However, it might be overall more beneficial to go through and report "sketchy" uses, where the copy is not obviously correct, then discuss on the list how to ensure the code is safe. Once all of the uses are audited, you can then just write a little script using something like git's "pickaxe" functionality (see `-S` option to git-diff(1), and other commands that take "diff" options) to send you an email every time a new use of memcpy gets introduced into the codebase. For example, you could have a daily cron job that runs `git log -p -S'memcpy(' origin/master@{yesterday}..origin/master` and if there is any output then it mails that output to you. -- Sean Silva On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:> From the responses it's pretty clear that the preference is to avoid > using C string functions altogether. I've attached at list of calls in Clang/LLVM. > The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace > the call based on the location of the source buffer. If there are no > objections I'll prepare a patch that replaces the string manipulation > functions an appropriate string object. > > > > The proposal comments have largely centered on the string functions. > Do people feel the same way about memcpy_s? What about those of you > building LLVM on Windows with Visual Studio? > > > > I know both functions can be used the wrong way but at least the "secure" > version makes one think about the value passed for destination size. > Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct > but with such a high number of uses there are likely a few that are > not. I'm willing to plow through all the calls to check the parameters > while making the change to the memcpy_secure version from the proposal. > > > > Thanks, > > Javier > > > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Richard Smith > Sent: Thursday, September 20, 2012 3:09 PM > To: Chris Lattner > > > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Handling of unsafe functions > > > > On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at apple.com> wrote: > > > On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote: > >> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E >> <javier.e.martinez at intel.com> wrote: >>> We have identified functions in LLVM sources using a static code >>> analyzer which are marked as a "security vulnerability"[1][2]. There >>> has been work already done to address some of them for Linux (e.g. >>> snprintf). We are attempting to solve this issue in a comprehensive >>> fashion across all platforms. Most of the functions identified are for manipulating strings. >>> Memcpy is the most commonly used of all these unsecure methods. The >>> following table lists all these functions are their recommended >>> secure alternatives. >> >> I am strongly opposed to using *_s functions. The issue is that they >> are no more "secure" than original functions. One can still pass the >> destination buffer length incorrectly, especially if it is not known >> at compile time and should be computed. >> >> I agree with Sean that we should move the code using C strings to >> LLVM safe data types. > > I agree. > > >> >> And one more thing: it is interesting that the "unsafe" >> APFloat::convertToHexString (from your patch) is not used anywhere. > > Zap it! Oh wait, is it used by Clang or something else? > > > > Clang doesn't use it, but LLDB does, with an arbitrary-seeming > 256-character buffer. Perhaps we should change it to take an > llvm::SmallVectorImpl<char>, rather than relying on a character buffer > "which must be of sufficient size". > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On 09/21/2012 05:52 AM, Martinez, Javier E wrote:> The proposal comments have largely centered on the string functions. Do > people feel the same way about memcpy_s? What about those of you > building LLVM on Windows with Visual Studio?Is memcmp_s (or a variant thereof) a win in practice? It covers the case pretty well where you try to copy a dynamically sized buffer to the start of a statically sized one. I don't want to say that it doesn't happen, but it seems to be rather rare, and I expect most calls to memcpy_s would use the same expression for both sizes. -- Florian Weimer / Red Hat Product Security Team
Florian, You have a valid point, the majority of the times memcpy is used correctly and the destination buffer size is redundant information. I think we also agree that there are cases where the correct use of memcpy is not clear cut and the 4-parameter version adds value. I've modified the original proposal to strike the middle of the road. Instead of only providing a 4-parameter version of memcpy_secure now a 3-parameter version also exists. The latter maps to memcpy or memcpy_s depending on what's available as determined by the CMake scripts. The 4-parameter version replaces those calls to memcpy where safe use can't be determined. All other calls will be replaced to the 3-parameter version. This solution avoids passing redundant information, covers the situation where additional checking is of value and is platform independent (except for the implementation of memcpy_secure). The attached patch demonstrates the idea. Thanks, Javier -----Original Message----- From: Florian Weimer [mailto:fweimer at redhat.com] Sent: Thursday, September 27, 2012 5:26 AM To: Martinez, Javier E Cc: Richard Smith; Chris Lattner; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Handling of unsafe functions On 09/21/2012 05:52 AM, Martinez, Javier E wrote:> The proposal comments have largely centered on the string functions. > Do people feel the same way about memcpy_s? What about those of you > building LLVM on Windows with Visual Studio?Is memcmp_s (or a variant thereof) a win in practice? It covers the case pretty well where you try to copy a dynamically sized buffer to the start of a statically sized one. I don't want to say that it doesn't happen, but it seems to be rather rare, and I expect most calls to memcpy_s would use the same expression for both sizes. -- Florian Weimer / Red Hat Product Security Team -------------- next part -------------- A non-text attachment was scrubbed... Name: memcpy_patch.patch Type: application/octet-stream Size: 5337 bytes Desc: memcpy_patch.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121003/361da83d/attachment.obj>