thr3ads.net - llvm dev - [LLVMdev] Handling of unsafe functions [Sep 2012]

If this information is useful, please help other people find it:
Share via:

Martinez, Javier E

2012-Sep-21 03:52 UTC

[LLVMdev] Handling of unsafe functions

>From the responses it's pretty clear that the preference is to avoid
using C string functions altogether. I've attached at list of calls in
Clang/LLVM. The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to
replace the call based on the location of the source buffer. If there are no
objections I'll prepare a patch that replaces the string manipulation
functions an appropriate  string object.
The proposal comments have largely centered on the string functions. Do people
feel the same way about memcpy_s? What about those of you building LLVM on
Windows with Visual Studio?

I know both functions can be used the wrong way but at least the
"secure" version makes one think about the value passed for
destination size. Very likely most of the 1691 uses of memcpy in Clang/LLVM are
correct but with such a high number of uses there are likely a few that are not.
I'm willing to plow through all the calls to check the parameters while
making the change to the memcpy_secure version from the proposal.

Thanks,
Javier

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Richard Smith
Sent: Thursday, September 20, 2012 3:09 PM
To: Chris Lattner
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions

On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at
apple.com<mailto:clattner at apple.com>> wrote:

On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at
gmail.com<mailto:gribozavr at gmail.com>> wrote:
> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E
> <javier.e.martinez at intel.com<mailto:javier.e.martinez at
intel.com>> wrote:
>> We have identified functions in LLVM sources using a static code
analyzer
>> which are marked as a "security vulnerability"[1][2]. There
has been work
>> already done to address some of them for Linux (e.g. snprintf). We are
>> attempting to solve this issue in a comprehensive fashion across all
>> platforms. Most of the functions identified are for manipulating
strings.
>> Memcpy is the most commonly used of all these unsecure methods. The
>> following table lists all these functions are their recommended secure
>> alternatives.
>
> I am strongly opposed to using *_s functions.  The issue is that they
> are no more "secure" than original functions.  One can still pass
the
> destination buffer length incorrectly, especially if it is not known
> at compile time and should be computed.
>
> I agree with Sean that we should move the code using C strings to LLVM
> safe data types.I agree.
>
> And one more thing: it is interesting that the "unsafe"
> APFloat::convertToHexString (from your patch) is not used anywhere.Zap it!  Oh wait, is it used by Clang or something else?

Clang doesn't use it, but LLDB does, with an arbitrary-seeming 256-character
buffer. Perhaps we should change it to take an
llvm::SmallVectorImpl<char>, rather than relying on a character buffer
"which must be of sufficient size".
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120921/f99504c8/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: CStringFunctionsUse.txt
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120921/f99504c8/attachment.txt>

Dmitri Gribenko

2012-Sep-24 21:14 UTC

head link

[LLVMdev] Handling of unsafe functions

On Fri, Sep 21, 2012 at 6:52 AM, Martinez, Javier E
<javier.e.martinez at intel.com> wrote:> From the responses it’s pretty clear that the preference is to avoid using
C
> string functions altogether. I’ve attached at list of calls in Clang/LLVM.
> The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the
> call based on the location of the source buffer. If there are no objections
> I’ll prepare a patch that replaces the string manipulation functions an
> appropriate  string object.
Thank you for the list.  Fixed the Clang non-platform specific bits in
r164554.  Didn't dare to fix other occurrences in Clang because they
are #ifdefed for Windows and I don't have a Windows machine to test.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at
gmail.com>*/

Sean Silva

2012-Sep-24 22:37 UTC

head link

[LLVMdev] Handling of unsafe functions

> I’ll prepare a patch that replaces the string manipulation functions an
> appropriate  string object.
Please break the patch up into focused chunks, one per logical change.
We try to keep all LLVM development as incremental as possible [1]. I
recommend fixing a single logical occurrence (such as fixing
APFloat::convertToHexString()) and then mailing the patch to
llvm-commits. It's likely that the feedback you get from the review of
the first patch will help inform future patches and simplify later
patch reviews.

[1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development
> I know both functions can be used the wrong way but at least the “secure”
> version makes one think about the value passed for destination size. Very
> likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with
> such a high number of uses there are likely a few that are not. I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.
My inclination is that it will be redundant. For example, consider
this usage, grabbed randomly from the source code:

      char *Buf = static_cast<char *>(Allocate(Directory.size()));
      memcpy(Buf, Directory.data(), Directory.size());

it's not that clear to me that just adding an extra argument of
Directory.size() buys anything. If anything, it seems like it
introduces _another_ place to make an error.

I went through a couple other random uses of memcpy() in the LLVM tree
and none of them seemed like they would benefit.
> [...] I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.
By all means, please take a look at them. However, it might be overall
more beneficial to go through and report "sketchy" uses, where the
copy is not obviously correct, then discuss on the list how to ensure
the code is safe. Once all of the uses are audited, you can then just
write a little script using something like git's "pickaxe"
functionality (see `-S` option to git-diff(1), and other commands that
take "diff" options) to send you an email every time a new use of
memcpy gets introduced into the codebase. For example, you could have
a daily cron job that runs `git log -p -S'memcpy('
origin/master@{yesterday}..origin/master` and if there is any output
then it mails that output to you.

-- Sean Silva





On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E
<javier.e.martinez at intel.com> wrote:> From the responses it’s pretty clear that the preference is to avoid using
C
> string functions altogether. I’ve attached at list of calls in Clang/LLVM.
> The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the
> call based on the location of the source buffer. If there are no objections
> I’ll prepare a patch that replaces the string manipulation functions an
> appropriate  string object.
>
>
>
> The proposal comments have largely centered on the string functions. Do
> people feel the same way about memcpy_s? What about those of you building
> LLVM on Windows with Visual Studio?
>
>
>
> I know both functions can be used the wrong way but at least the “secure”
> version makes one think about the value passed for destination size. Very
> likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with
> such a high number of uses there are likely a few that are not. I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.
>
>
>
> Thanks,
>
> Javier
>
>
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu] On
> Behalf Of Richard Smith
> Sent: Thursday, September 20, 2012 3:09 PM
> To: Chris Lattner
>
>
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Handling of unsafe functions
>
>
>
> On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at
apple.com> wrote:
>
>
> On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com>
wrote:
>
>> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E
>> <javier.e.martinez at intel.com> wrote:
>>> We have identified functions in LLVM sources using a static code
analyzer
>>> which are marked as a “security vulnerability”[1][2]. There has
been work
>>> already done to address some of them for Linux (e.g. snprintf). We
are
>>> attempting to solve this issue in a comprehensive fashion across
all
>>> platforms. Most of the functions identified are for manipulating
strings.
>>> Memcpy is the most commonly used of all these unsecure methods. The
>>> following table lists all these functions are their recommended
secure
>>> alternatives.
>>
>> I am strongly opposed to using *_s functions.  The issue is that they
>> are no more "secure" than original functions.  One can still
pass the
>> destination buffer length incorrectly, especially if it is not known
>> at compile time and should be computed.
>>
>> I agree with Sean that we should move the code using C strings to LLVM
>> safe data types.
>
> I agree.
>
>
>>
>> And one more thing: it is interesting that the "unsafe"
>> APFloat::convertToHexString (from your patch) is not used anywhere.
>
> Zap it!  Oh wait, is it used by Clang or something else?
>
>
>
> Clang doesn't use it, but LLDB does, with an arbitrary-seeming
256-character
> buffer. Perhaps we should change it to take an
llvm::SmallVectorImpl<char>,
> rather than relying on a character buffer "which must be of sufficient
> size".
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Martinez, Javier E

2012-Sep-27 07:40 UTC

head link

[LLVMdev] Handling of unsafe functions

Sean,

Thanks for the suggestion on the patch submission. I started with the first
patch to address a couple of calls to C string functions in Errno.cpp. If that
goes well I'll prepare a patch for each function occurrence.
> I went through a couple other random uses of memcpy() in the LLVM tree and
none of them seemed like they would benefit.I understand that not all uses of memcpy will clearly benefit from a replacement
to the proposed memcpy_secure. For people using LLVM under Windows using Visual
Studio there's an additional benefit; LLVM can be compiled without
deprecating the functions marked as unsecure. When this is a requirement the
alternative is to make the changes locally which becomes a problem when
upgrading LLVM versions. I still think that abstracting the implementation of
memcpy facilitates the work for users in Windows without having a significant
impacting users in other platforms.

Thanks,
Javier

-----Original Message-----
From: Sean Silva [mailto:silvas at purdue.edu] 
Sent: Monday, September 24, 2012 3:37 PM
To: Martinez, Javier E
Cc: Richard Smith; Chris Lattner; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions
> I'll prepare a patch that replaces the string manipulation functions 
> an appropriate  string object.
Please break the patch up into focused chunks, one per logical change.
We try to keep all LLVM development as incremental as possible [1]. I recommend
fixing a single logical occurrence (such as fixing
APFloat::convertToHexString()) and then mailing the patch to llvm-commits.
It's likely that the feedback you get from the review of the first patch
will help inform future patches and simplify later patch reviews.

[1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development
> I know both functions can be used the wrong way but at least the
"secure"
> version makes one think about the value passed for destination size. 
> Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct 
> but with such a high number of uses there are likely a few that are 
> not. I'm willing to plow through all the calls to check the parameters 
> while making the change to the memcpy_secure version from the proposal.
My inclination is that it will be redundant. For example, consider this usage,
grabbed randomly from the source code:

      char *Buf = static_cast<char *>(Allocate(Directory.size()));
      memcpy(Buf, Directory.data(), Directory.size());

it's not that clear to me that just adding an extra argument of
Directory.size() buys anything. If anything, it seems like it introduces
_another_ place to make an error.

I went through a couple other random uses of memcpy() in the LLVM tree and none
of them seemed like they would benefit.
> [...] I'm willing
> to plow through all the calls to check the parameters while making the 
> change to the memcpy_secure version from the proposal.
By all means, please take a look at them. However, it might be overall more
beneficial to go through and report "sketchy" uses, where the copy is
not obviously correct, then discuss on the list how to ensure the code is safe.
Once all of the uses are audited, you can then just write a little script using
something like git's "pickaxe"
functionality (see `-S` option to git-diff(1), and other commands that take
"diff" options) to send you an email every time a new use of memcpy
gets introduced into the codebase. For example, you could have a daily cron job
that runs `git log -p -S'memcpy('
origin/master@{yesterday}..origin/master` and if there is any output then it
mails that output to you.

-- Sean Silva





On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E <javier.e.martinez at
intel.com> wrote:> From the responses it's pretty clear that the preference is to avoid 
> using C string functions altogether. I've attached at list of calls in
Clang/LLVM.
> The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace 
> the call based on the location of the source buffer. If there are no 
> objections I'll prepare a patch that replaces the string manipulation 
> functions an appropriate  string object.
>
>
>
> The proposal comments have largely centered on the string functions. 
> Do people feel the same way about memcpy_s? What about those of you 
> building LLVM on Windows with Visual Studio?
>
>
>
> I know both functions can be used the wrong way but at least the
"secure"
> version makes one think about the value passed for destination size. 
> Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct 
> but with such a high number of uses there are likely a few that are 
> not. I'm willing to plow through all the calls to check the parameters 
> while making the change to the memcpy_secure version from the proposal.
>
>
>
> Thanks,
>
> Javier
>
>
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu]
> On Behalf Of Richard Smith
> Sent: Thursday, September 20, 2012 3:09 PM
> To: Chris Lattner
>
>
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Handling of unsafe functions
>
>
>
> On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at
apple.com> wrote:
>
>
> On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com>
wrote:
>
>> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E 
>> <javier.e.martinez at intel.com> wrote:
>>> We have identified functions in LLVM sources using a static code 
>>> analyzer which are marked as a "security
vulnerability"[1][2]. There
>>> has been work already done to address some of them for Linux (e.g. 
>>> snprintf). We are attempting to solve this issue in a comprehensive
>>> fashion across all platforms. Most of the functions identified are
for manipulating strings.
>>> Memcpy is the most commonly used of all these unsecure methods. The
>>> following table lists all these functions are their recommended 
>>> secure alternatives.
>>
>> I am strongly opposed to using *_s functions.  The issue is that they 
>> are no more "secure" than original functions.  One can still
pass the
>> destination buffer length incorrectly, especially if it is not known 
>> at compile time and should be computed.
>>
>> I agree with Sean that we should move the code using C strings to 
>> LLVM safe data types.
>
> I agree.
>
>
>>
>> And one more thing: it is interesting that the "unsafe"
>> APFloat::convertToHexString (from your patch) is not used anywhere.
>
> Zap it!  Oh wait, is it used by Clang or something else?
>
>
>
> Clang doesn't use it, but LLDB does, with an arbitrary-seeming 
> 256-character buffer. Perhaps we should change it to take an 
> llvm::SmallVectorImpl<char>, rather than relying on a character
buffer
> "which must be of sufficient size".
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Florian Weimer

2012-Sep-27 12:26 UTC

head link

[LLVMdev] Handling of unsafe functions

On 09/21/2012 05:52 AM, Martinez, Javier E wrote:
> The proposal comments have largely centered on the string functions. Do
> people feel the same way about memcpy_s? What about those of you
> building LLVM on Windows with Visual Studio?
Is memcmp_s (or a variant thereof) a win in practice?  It covers the 
case pretty well where you try to copy a dynamically sized buffer to the 
start of a statically sized one.  I don't want to say that it doesn't 
happen, but it seems to be rather rare, and I expect most calls to 
memcpy_s would use the same expression for both sizes.

-- 
Florian Weimer / Red Hat Product Security Team

Martinez, Javier E

2012-Oct-03 06:30 UTC

head link

[LLVMdev] Handling of unsafe functions

Florian,

You have a valid point, the majority of the times memcpy is used correctly and
the destination buffer size is redundant information. I think we also agree that
there are cases where the correct use of memcpy is not clear cut and the
4-parameter version adds value.

I've modified the original proposal to strike the middle of the road.
Instead of only providing a 4-parameter version of memcpy_secure now a
3-parameter version also exists. The latter maps to memcpy or memcpy_s depending
on what's available as determined by the CMake scripts. The 4-parameter
version replaces those calls to memcpy where safe use can't be determined.
All other calls will be replaced to the 3-parameter version.

This solution avoids passing redundant information, covers the situation where
additional checking is of value and is platform independent (except for the
implementation of memcpy_secure). The attached patch demonstrates the idea.

Thanks,
Javier

-----Original Message-----
From: Florian Weimer [mailto:fweimer at redhat.com] 
Sent: Thursday, September 27, 2012 5:26 AM
To: Martinez, Javier E
Cc: Richard Smith; Chris Lattner; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions

On 09/21/2012 05:52 AM, Martinez, Javier E wrote:
> The proposal comments have largely centered on the string functions. 
> Do people feel the same way about memcpy_s? What about those of you 
> building LLVM on Windows with Visual Studio?
Is memcmp_s (or a variant thereof) a win in practice?  It covers the case pretty
well where you try to copy a dynamically sized buffer to the start of a
statically sized one.  I don't want to say that it doesn't happen, but
it seems to be rather rare, and I expect most calls to memcpy_s would use the
same expression for both sizes.

-- 
Florian Weimer / Red Hat Product Security Team
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memcpy_patch.patch
Type: application/octet-stream
Size: 5337 bytes
Desc: memcpy_patch.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20121003/361da83d/attachment.obj>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Sep 2012 - [LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

[LLVMdev] Handling of unsafe functions

Seemingly Similar Threads