On Thu, Jun 13, 2013 at 03:13:37PM -0700, Sean Silva wrote:
> Could you maybe give some example use cases?

A use case I am interested in is to take a large application and use this instrumentation as a tool to help monitor how data flows from its inputs (sources) to its outputs (sinks). This has applications from a privacy/security perspective in that one can audit how a sensitive data item is used within a program and ensure it isn't exiting the program anywhere it shouldn't be.

An ASPLOS paper from a few years ago discusses this problem and a solution based on dynamic binary instrumentation using QEMU:

http://www.cs.ucsb.edu/~sherwood/pubs/ASPLOS-08-systemtomography.pdf

Among other things, I hope to address a number of deficiencies of the tool described by that paper, in terms of efficiency (the other sanitizer tools have shown that compiler-based instrumentation can be much more efficient than binary instrumentation), and also in terms of accuracy (unlike the system described in that paper, we track data accurately through join points using union labels).

There are other applications outside of security. For example, one could use this instrumentation pass (or a variant of it) to tag opposite-endian integers in memory, and check that no opposite-endian integer is loaded or otherwise used directly without first going through a conversion.

> Also, "sanitizer" may not be the best name for this, since it doesn't
> really sanitize anything.

As Reid mentioned, a goal is to build sanitizer-like tools on top of this instrumentation. Not only that, but one of the things that an application can do is turn on its own sources and sinks in response to the instrumentation being enabled (via the __has_feature macro). So really, -fsanitize=dataflow would be the flag that turns on data-flow sanitization for an application designed for it. And should the component of the compiler that allows this data-flow sanitization be named any differently?

Thanks,
--
Peter
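The source-to-sink auditing idea and the "union labels" at join points can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the proposed instrumentation: the names `Tagged`, `source`, and `sink_allows` are hypothetical, labels are modeled as sets of source names, and a join point is modeled as an addition whose result carries the union of both operand labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """Toy model: an integer paired with the set of source labels it depends on."""
    value: int
    label: frozenset = frozenset()

    def __add__(self, other: "Tagged") -> "Tagged":
        # Join point: the result depends on both operands, so it carries
        # the union of their labels (the "union label").
        return Tagged(self.value + other.value, self.label | other.label)


def source(value: int, name: str) -> Tagged:
    """Hypothetical source: attach a single base label to an input value."""
    return Tagged(value, frozenset({name}))


def sink_allows(data: Tagged, forbidden: str) -> bool:
    """Hypothetical sink check: True if the data carries no forbidden label."""
    return forbidden not in data.label


password = source(1234, "password")
counter = source(1, "counter")

mixed = password + counter   # depends on both sources
plain = counter + counter    # depends on the counter only

print(sorted(mixed.label), sink_allows(mixed, "password"))
print(sorted(plain.label), sink_allows(plain, "password"))
```

In this model the audit at a sink is just a membership test on the label set; how a real implementation represents and merges labels in shadow memory is outside what this sketch shows.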
It is interesting. I can see some use cases for such a tool. To me, a source-level implementation is not as accurate as binary translation. For instance, it is hard to check the taint of return addresses, since there is no concept of a return instruction at the source level. The stack does not appear until later in compilation. For a security mechanism, return addresses need to be protected.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
This tool isn't for stack protection; there are other tools for that.

In general the tool isn't currently focused on defending against adversaries. It would be trivial to write a program that accesses shadow memory directly in order to produce incorrect results, not to mention "tag scrubbers", which use control flow to remove tags (see section 6 of the ASPLOS paper).

On Fri, Jun 14, 2013 at 01:23:22PM -0700, Bin Tzeng wrote:
> It is interesting. I can see some use cases with such a tool. To me,
> source-level implementation is not as accurate as binary translation.
> For instance, it is hard to check the taint for return addresses
> since there is no concept of return instructions on source level.
> The stack does not appear until later.
> For a security mechanism, return addresses need to be protected.

--
Peter
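The "tag scrubber" remark can be made concrete with a toy model. This is a hedged sketch, not code from the tool or the paper: `tagged`, `copy_direct`, and `copy_via_branch` are hypothetical names, and the tracker is assumed to follow explicit data flow only, so the dropped label in the second function is modeled by hand.

```python
def tagged(value, *labels):
    """Toy model: pair a value with the set of labels attached to it."""
    return (value, frozenset(labels))


def copy_direct(t):
    # Explicit data flow: the output is computed from the input value,
    # so the labels travel with it.
    value, labels = t
    return (value, labels)


def copy_via_branch(t):
    # Implicit (control) flow: the input is only ever used as a branch
    # condition, and the output is assembled from untagged constants.
    # A tracker that propagates labels along data flow alone (the
    # assumption of this sketch) therefore sees no labels on the result.
    value, _labels = t
    out = 0
    for bit in range(8):
        if value & (1 << bit):
            out |= 1 << bit
    return (out, frozenset())


secret = tagged(0xA5, "secret")
print(copy_direct(secret))
print(copy_via_branch(secret))
```

Both functions reproduce the same 8-bit value, but only the first keeps the label, which is why a data-flow-only tracker cannot be relied on against code written to evade it.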
On Fri, Jun 14, 2013 at 10:43 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> So really, -fsanitize=dataflow would be the flag that turns on
> data-flow sanitization for an application designed for it. And should
> the component of the compiler that allows this data-flow sanitization
> be named any differently?

Excellent point. I agree with your reasoning.

-- Sean Silva
15.06.2013, 00:53, "Bin Tzeng" <bintzeng at gmail.com>:
> It is interesting. I can see some use cases with such a tool. To me, source-level implementation
> is not as accurate as binary translation. For instance, it is hard to check the taint for return addresses
> since there is no concept of return instructions on source level.

Well, on many architectures there is no concept of a return instruction at the ISA level either :)

--
Regards,
Konstantin