thr3ads.net - llvm dev - [LLVMdev] Handling of unsafe functions [Sep 2012]

Martinez, Javier E

2012-Sep-19 00:00 UTC

[LLVMdev] Handling of unsafe functions

Hello,



Using a static code analyzer, we have identified functions in the LLVM sources
that are marked as a "security vulnerability"[1][2]. Some work has already been
done to address a few of them on Linux (e.g. snprintf). We are attempting to
solve this issue comprehensively across all platforms. Most of the functions
identified manipulate strings. memcpy is the most commonly used of these
insecure functions. The following table lists these functions and their
recommended secure alternatives.



Recommended alternatives:

Function    Windows      Unix/Mac OS
memcpy      memcpy_s     -
sprintf     sprintf_s    snprintf
sscanf      sscanf_s     -
_alloca     _malloca     -
strcat      strcat_s     strlcat
strcpy      strcpy_s     strlcpy
strtok      strtok_s     -



The proposal is to add secure versions of these functions. They will be
implemented in the LLVM Support module and used by all other LLVM modules. The
interface will be platform independent while the implementation will be
platform specific (like the Mutex class in the Support module). Where the
platform does not support the functionality natively, we are writing our own
implementation. For example, the secure counterpart of memcpy will look like
llvm::memcpy_secure.
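
A minimal sketch of what such a wrapper could look like follows. The patch
attachment was scrubbed from the archive, so the namespace, signature, and
error codes below are assumptions for illustration and not the proposed
llvm::memcpy_secure itself. This portable fallback only checks its arguments;
a Windows build could presumably forward to memcpy_s instead.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>

namespace sketch {

// Hypothetical stand-in for the proposed llvm::memcpy_secure.
// Returns 0 on success, or an errno-style code when the arguments are
// invalid or the destination is too small for the requested copy.
inline int memcpy_secure(void *dest, std::size_t destSize,
                         const void *src, std::size_t count) {
  if (count == 0)
    return 0;        // nothing to copy
  if (dest == nullptr || src == nullptr)
    return EINVAL;   // invalid pointer argument
  if (count > destSize)
    return ERANGE;   // the copy would overrun the destination
  std::memcpy(dest, src, count);
  return 0;
}

} // namespace sketch
```

The caller is expected to check the return value; a copy that would overrun
the destination is refused and leaves the buffer untouched.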



Some secure functions require additional data (such as buffer sizes) to be
passed in. That information has to be added at every call site. In some cases
this requires passing an extra size_t argument through intermediate functions.
Hence, this change is not a simple one-to-one function replacement. The
attached patch illustrates how one instance of memcpy would be modified.
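
To illustrate the kind of signature change this implies, here is a small
sketch. The function writeHexByte is hypothetical and is not taken from the
patch; it only shows how a destination capacity has to be threaded through to
the code that actually writes into the buffer.

```cpp
#include <cstddef>

namespace sketch {

// Before: the callee has to trust that dst has room for three bytes.
//   void writeHexByte(char *dst, unsigned char value);
//
// After: the caller passes the capacity through, so the callee can check it.
inline bool writeHexByte(char *dst, std::size_t dstSize, unsigned char value) {
  static const char digits[] = "0123456789abcdef";
  if (dst == nullptr || dstSize < 3)
    return false;    // need two hex digits plus the terminating NUL
  dst[0] = digits[value >> 4];
  dst[1] = digits[value & 0x0f];
  dst[2] = '\0';
  return true;
}

} // namespace sketch
```

Every caller of the old signature now has to know, and pass along, the size of
the buffer it owns, which is why the change spreads beyond the call itself.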



Is this proposal of interest to the LLVM community? Can you also comment if the
approach specified is good to address this issue?



References:

[1] http://msdn.microsoft.com/en-us/library/ms235384(v=vs.80).aspx

[2] https://developer.apple.com/library/mac/#documentation/Security/Conceptual/SecureCodingGuide/Articles/BufferOverflows.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_secure_function.patch
Type: application/octet-stream
Size: 6350 bytes
Desc: llvm_secure_function.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120919/7de4688a/attachment.obj>

Sean Silva

2012-Sep-19 01:25 UTC

[LLVMdev] Handling of unsafe functions

I generally disagree with the approach.

Generally char* strings aren't recommended for use in LLVM and this
kind of string manipulation in LLVM shouldn't be done with the
primitive C library functions. The Programmer's Manual gives the
preferred types to use for strings [1] and all of them keep track of
length. There are also safe routines for creating and formatting
strings, such as raw_ostream which is used pervasively in LLVM.

The example routine in your patch probably should just use
raw_string_ostream or raw_svector_ostream, instead of relying on
C-style string routines. That way, the correctness is enforced by the
compiler, instead of manually laboring over these things (like
checking the return code, which your patch doesn't do...).
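
A minimal sketch of the stream-based style described above. To keep it
compilable without LLVM headers, std::ostringstream is used here purely as a
stand-in for raw_string_ostream; the function name formatElement is
hypothetical.

```cpp
#include <sstream>
#include <string>

namespace sketch {

// Builds "name[index]" without any fixed-size char buffer. The stream grows
// its own storage, so there is no destination length to get wrong and no
// return code to forget to check.
inline std::string formatElement(const std::string &name, unsigned index) {
  std::ostringstream os;
  os << name << '[' << index << ']';
  return os.str();
}

} // namespace sketch
```

For example, formatElement("vec", 12) yields the string "vec[12]" regardless
of how long the name is.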

In other words, there are completely safe alternatives for these
functions for almost all cases.

One particular use case, usually involving memcpy, is when performance
is a significant concern: the author "knows what they are doing" and
isn't willing to sacrifice performance by calling into some "secure"
version when they have other assurances that the target buffer has
sufficient space. The performance difference can be significant, since
memcpy is usually turned into a compiler builtin that the compiler
recognizes and optimizes specially, whereas the suggested approach
results in a regular call into an "llvm::*_secure" wrapper, which then
calls into the OS-provided general-purpose "secure" version.

I think that it would be useful if you used the output of your static
analyzer to provide a list of the places where C-style string
manipulation is being done, so that these places can be migrated to
using modern, safe LLVM interfaces for these operations.

[1] http://llvm.org/docs/ProgrammersManual.html#ds_string

--Sean Silva

Nick Lewycky

2012-Sep-19 09:10 UTC

[LLVMdev] Handling of unsafe functions

Martinez, Javier E wrote:
> Hello,
> [...]
> Is this proposal of interest to the LLVM community? Can you also comment
> if the approach specified is good to address this issue?
Personally, I'm not particularly interested in blanket replacement of 
memcpy with memcpy_s in the hopes that it might close a security hole. I 
am very interested in fixing any actual bugs. If it's easier to fix real 
bugs by aggressively using this additional layer, then that may well be 
the way to go, but before I agree to that, I've got a ton of questions 
to answer first.

What's the current error rate? How often are we seeing bugs in llvm that 
would be fixed if only we were calling "secure" functions?

What's the impact of calling the secure function? On Release builds and 
on Debug builds? On size and performance?

Why not rely on platforms to secure these functions? For instance, Linux 
and Darwin both have FORTIFY_SOURCE, and I'm too ignorant of Windows to 
know what the equivalent is there. What about existing tools like 
valgrind or ASAN?

What happens if memcpy_secure does detect an insecure memcpy? It's 
considered very rude for LLVM to terminate on the spot since it's often 
used as a library, so how do we handle the error? By calling 
llvm::report_fatal_error and hoping we don't recurse? What if it's a 
debug build and we'd like to see where the code went wrong?

How do you plan to enforce that the insecure functions aren't called?

Nick

Martinez, Javier E

2012-Sep-20 07:54 UTC

[LLVMdev] Handling of unsafe functions

Hi Nick,

Thanks for taking the time to review the proposal. I'd like to stress that the
purpose of the proposed changes is not to improve performance or to fix
existing bugs. The purpose is to catch instances where a buffer overrun would
happen in specific function calls. With the appropriate input, such an overrun
could lead to undefined behavior or a crash.

> What's the current error rate? How often are we seeing bugs in llvm that
> would be fixed if only we were calling "secure" functions?
[JM] Zero. The motivation is not to fix existing bugs but to catch potential
hidden ones.

> What's the impact of calling the secure function? On Release builds and on
> Debug builds? On size and performance?
[JM] I haven't compared the performance of the secure functions myself, and
performance is not mentioned much on the pages I've looked at. However, I
found a performance comparison between strcpy and strcpy_s at [1]. There
appears to be a performance penalty for using strcpy_s. In a real case it's
very likely that the penalty is hidden by bottlenecks elsewhere. If the
performance degradation turns out to be noticeable, the proposal can be
modified. For example, the *_secure version could do only parameter validation
and size checks.

> Why not rely on platforms to secure these functions? For instance, Linux and
> Darwin both have FORTIFY_SOURCE, and I'm too ignorant of Windows to know
> what the equivalent is there. What about existing tools like valgrind or ASAN?
[JM] I'm not too familiar with Linux specifics or the tools you mentioned,
but from what I could gather FORTIFY_SOURCE doesn't cover all buffer
overflow cases. The cases it covers are exactly the ones I described above,
where the size is known at compile time. I don't know of a FORTIFY_SOURCE
equivalent for Windows. The proposed solution would work on both Windows and
Linux and cover more general cases of buffer overflows.

> What happens if memcpy_secure does detect an insecure memcpy? It's
> considered very rude for LLVM to terminate on the spot since it's often used
> as a library, so how do we handle the error? By calling
> llvm::report_fatal_error and hoping we don't recurse? What if it's a debug
> build and we'd like to see where the code went wrong?
[JM] We use LLVM inside a DLL, so I completely sympathize with you about
terminating execution on the spot. Although the patch doesn't address this, I
think calling llvm::report_fatal_error() when a secure function fails is a good
idea because, instead of crashing, LLVM can then exit gracefully or, if an
error handler is available, handle the situation in a controlled way. On
Windows the secure functions allow an error handler to be called if parameter
validation fails. The custom error handler can directly call
report_fatal_error().
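
One way to picture the handler idea is the following portable sketch. All
names here (OverrunHandler, setOverrunHandler, copyOrReport) are hypothetical
and are not part of the patch or of LLVM. In an LLVM build, the installed
handler might forward to llvm::report_fatal_error(); this sketch leaves that
decision to the client.

```cpp
#include <cstddef>
#include <cstring>

namespace sketch {

// Callback invoked when a checked copy is refused.
using OverrunHandler = void (*)(const char *what);

// Storage for the currently installed handler (none by default).
inline OverrunHandler &currentHandler() {
  static OverrunHandler handler = nullptr;
  return handler;
}

// Installs a new handler and returns the previous one.
inline OverrunHandler setOverrunHandler(OverrunHandler h) {
  OverrunHandler old = currentHandler();
  currentHandler() = h;
  return old;
}

// Copies count bytes if they fit; otherwise reports to the handler (if any)
// and returns false without touching the destination.
inline bool copyOrReport(void *dest, std::size_t destSize,
                         const void *src, std::size_t count) {
  if (dest == nullptr || src == nullptr || count > destSize) {
    if (OverrunHandler h = currentHandler())
      h("copyOrReport: invalid arguments or destination too small");
    return false;
  }
  std::memcpy(dest, src, count);
  return true;
}

} // namespace sketch
```

Because the library only reports and returns, a host application that embeds
it keeps control over whether a refused copy is fatal, logged, or recovered
from.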

> How do you plan to enforce that the insecure functions aren't called?
[JM] How about adding a section on the use of secure functions to the LLVM
Programmer's Manual? I can provide a blurb for it.

[1] http://codeketchup.blogspot.com/2012/02/sprintf-strcpy-strncpy-strcpys-what-is.html

Thanks,
Javier

Martinez, Javier E

2012-Sep-20 07:55 UTC

[LLVMdev] Handling of unsafe functions

Hi Sean,

Thanks for the valuable feedback. I agree that the containers available in
LLVM are preferable to char buffers, but I want to point out that the proposal
doesn't add any new uses of char buffers; it merely works with existing ones.
Changing existing uses of char buffers to other objects is beyond the scope of
this proposal. It makes more sense to do that when the code that uses string
manipulation functions is being changed anyway, as it could entail larger
design changes.

I'm unsure of the performance impact of using the secure functions and how
to balance it against the benefit of improved code quality. If the proposal
gets support, I can gather performance data to determine whether there is a
performance hit and whether it's acceptable. Hoping that authors
"know what they are doing" is not enough. If that were the case, there
wouldn't be bugs to fix or code to review.

I don't have the output of the static analyzer at hand but will provide it
in a follow-up email.

Thanks,
Javier

Dmitri Gribenko

2012-Sep-20 10:01 UTC

[LLVMdev] Handling of unsafe functions

On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E
<javier.e.martinez at intel.com> wrote:
> We have identified functions in LLVM sources using a static code analyzer
> which are marked as a “security vulnerability”[1][2]. There has been work
> already done to address some of them for Linux (e.g. snprintf). We are
> attempting to solve this issue in a comprehensive fashion across all
> platforms. Most of the functions identified are for manipulating strings.
> Memcpy is the most commonly used of all these unsecure methods. The
> following table lists all these functions are their recommended secure
> alternatives.
I am strongly opposed to using *_s functions.  The issue is that they
are no more "secure" than original functions.  One can still pass the
destination buffer length incorrectly, especially if it is not known
at compile time and should be computed.
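
The point can be shown with a small sketch. The helper checkedCopy and the
Record layout are hypothetical; the check is written in the style of the *_s
functions. The logical destination is a 4-byte field inside a 16-byte block,
and the caller passes the wrong capacity. The check passes and the neighbouring
bytes are overwritten anyway.

```cpp
#include <cstddef>
#include <cstring>

namespace sketch {

// Bounds-checked copy in the style of the *_s functions.
inline bool checkedCopy(char *dest, std::size_t destSize,
                        const char *src, std::size_t count) {
  if (count > destSize)
    return false;
  std::memcpy(dest, src, count);
  return true;
}

// One 16-byte block: bytes [0,4) are the logical destination field,
// bytes [4,16) stand for unrelated neighbouring data.
struct Record {
  char storage[16];
};
constexpr std::size_t FieldSize = 4;

// Bug: the caller passes the size of the whole block instead of FieldSize,
// so the check cannot notice that the field is being overrun.
inline bool buggyFill(Record &r, const char *src, std::size_t count) {
  return checkedCopy(r.storage, sizeof r.storage, src, count);
}

} // namespace sketch
```

Copying 8 bytes through buggyFill succeeds and clobbers bytes 4 through 7,
which lie outside the 4-byte field. The check is only as good as the size the
caller supplies.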

I agree with Sean that we should move the code using C strings to LLVM
safe data types.

And one more thing: it is interesting that the "unsafe"
APFloat::convertToHexString (from your patch) is not used anywhere.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at
gmail.com>*/

Chris Lattner

2012-Sep-20 17:13 UTC

[LLVMdev] Handling of unsafe functions

On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E
> <javier.e.martinez at intel.com> wrote:
>> We have identified functions in LLVM sources using a static code analyzer
>> which are marked as a “security vulnerability”[1][2]. There has been work
>> already done to address some of them for Linux (e.g. snprintf). We are
>> attempting to solve this issue in a comprehensive fashion across all
>> platforms. Most of the functions identified are for manipulating strings.
>> Memcpy is the most commonly used of all these unsecure methods. The
>> following table lists all these functions are their recommended secure
>> alternatives.
>
> I am strongly opposed to using *_s functions.  The issue is that they
> are no more "secure" than original functions.  One can still pass the
> destination buffer length incorrectly, especially if it is not known
> at compile time and should be computed.
>
> I agree with Sean that we should move the code using C strings to LLVM
> safe data types.

I agree.
> 
> And one more thing: it is interesting that the "unsafe"
> APFloat::convertToHexString (from your patch) is not used anywhere.
Zap it!  Oh wait, is it used by Clang or something else?

-Chris

Joerg Sonnenberger

2012-Sep-27 10:26 UTC

[LLVMdev] Handling of unsafe functions

On Wed, Sep 19, 2012 at 12:00:50AM +0000, Martinez, Javier E wrote:
> We have identified functions in LLVM sources using a static code
> analyzer which are marked as a "security vulnerability"[1][2].
> 
> Recommended alternatives:
> 
> Functions    Windows        Unix/Mac OS
> 
> Memcpy     memcpy_s      -...

Please file bug reports for your tool. memcpy operates on explicitly
bounded objects, unlike e.g. strcat/strcpy. Marking it as deprecated
is just as buggy. From the rest of your list, strtok has some issues,
but it is generally safe to use too. The replacements are not an
improvement at all. The first time I saw Annex K (?) of C11, I
thought "Who pushed this crap into the standard, Microsoft?".

Joerg
