Kostya Serebryany
2012-Jun-26 10:43 UTC
[LLVMdev] Proposed Enhancement to AddressSanitizer: Initialization Order
+llvmdev, -llvm-dev

On Tue, Jun 26, 2012 at 2:28 PM, Kostya Serebryany <kcc at google.com> wrote:
> Hi Reid,
>
> On Tue, Jun 26, 2012 at 4:30 AM, Reid Watson <reidw at google.com> wrote:
>> Hello,
>>
>> I'm starting work on a project to detect initialization order problems
>> in C++ files using AddressSanitizer.
>> The extension in question will hopefully result in AddressSanitizer
>> being able to detect initializers which read an undefined value from a
>> static or global variable defined in another TU.
>> I'm currently working on this as a patch to AddressSanitizer, but I'm
>> open to suggestions as to the proper way to implement this extension.
>>
>> One of the simplest examples is the following program. Its output is
>> unspecified, and it's fairly easy to observe this behavior.
>>
>> When compiled as:
>> $ clang++ file_1.cpp file_2.cpp main.cpp
>> $ ./a.out
>> x: 2
>> y: 1
>>
>> However, when compiled as:
>> $ clang++ file_2.cpp file_1.cpp main.cpp
>> $ ./a.out
>> x: 1
>> y: 2
>>
>> // file_1.cpp
>> extern int y;
>> int x = y + 1;
>>
>> // file_2.cpp
>> extern int x;
>> int y = x + 1;
>>
>> // main.cpp
>> #include <iostream>
>> extern int x, y;
>>
>> int main() {
>>   std::cout << "x: " << x << std::endl;
>>   std::cout << "y: " << y << std::endl;
>> }
>>
>> Here's a sketch of the detection algorithm:
>> For each TU:
>> 1. Before the TU's initializers run, conditionally poison the
>>    global variable shadow memory:
>>    - Each global variable is poisoned, unless it was defined in that TU.
>>    - Additional information is added to struct __asan_global to
>>      identify which TU a global was defined in.
>
> This could be tricky.
> First, we don't want to poison the linker-initialized globals because they
> are always initialized regardless of the TU order.
>
> Second, consider we have 3 TUs, t1, t2, and t3, each of which has a global
> (g1, g2 and g3) with an initializer.
> When we are running initializers in t2, we need to poison g1 and g3, but
> so far we have seen only g1.
> I don't know any good and portable way to get g3.
>
> One solution is to run the binary twice: once with the default order of TU
> initializers, and a second time with the reversed order (not sure if that's
> easy).
>
> We probably don't want to do all that when initializing globals in a
> dlopen-ed library, or in any situation where we have multiple threads.
>
>> 2. Instrument all reads and writes in global initializers.
>
> This has been fixed today (thanks Nick!)
>
>> 3. After each TU's initializers run, unpoison the shadow memory
>>    for all global variables.
>
> Once we know which globals we need to poison, unpoisoning them is trivial.
>
>> Note that once main has started running, AddressSanitizer will run
>> normally. This will result in AddressSanitizer catching all
>> reads/writes to global variables defined in other TUs.
>> We run all of AddressSanitizer after initialization because we cannot
>> know prior to the completion of initialization which functions will be
>> called from initializers.
>>
>> I'd welcome any feedback on this proposal!
>
> Sounds cool, make it happen!
>
> --kcc
>
>> All the best,
>> Reid
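Reid's three steps can be modelled in a few lines of standalone C++. This is a toy simulation under stated assumptions, not ASan's runtime: the `Global` record (with hypothetical `module` and `dynamically_initialized` fields standing in for the extra data proposed for `struct __asan_global`), the `InitOrderChecker` class and the `RunInitializers` driver are all made up for illustration, and a set of poisoned indices replaces real shadow memory. It also assumes every TU's globals are known before any initializer runs, which is exactly the part flagged as tricky in the reply.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical global descriptor, extended with the defining TU as step 1
// suggests adding to struct __asan_global.
struct Global {
  std::string name;
  int module;                    // index of the TU that defines the global
  bool dynamically_initialized;  // false models a linker-initialized global
  int value;
};

// Hypothetical toy model of steps 1-3: a set of poisoned indices stands in
// for shadow memory, and Read()/Write() stand in for instrumented accesses.
class InitOrderChecker {
 public:
  explicit InitOrderChecker(std::vector<Global>& globals) : globals_(globals) {}

  // Step 1: before TU `module` runs its initializers, poison every
  // dynamically initialized global defined in another TU.
  void PoisonForModule(int module) {
    for (std::size_t i = 0; i < globals_.size(); ++i)
      if (globals_[i].module != module && globals_[i].dynamically_initialized)
        poisoned_.insert(i);
  }

  // Step 3: after the TU's initializers finish, unpoison all globals.
  void UnpoisonAll() { poisoned_.clear(); }

  // Step 2: instrumented accesses; touching a poisoned global is reported.
  int Read(std::size_t i) {
    Check(i);
    return globals_[i].value;
  }
  void Write(std::size_t i, int v) {
    Check(i);
    globals_[i].value = v;
  }

  const std::vector<std::string>& reports() const { return reports_; }

 private:
  void Check(std::size_t i) {
    assert(i < globals_.size());  // the toy index must name a known global
    if (poisoned_.count(i)) reports_.push_back(globals_[i].name);
  }

  std::vector<Global>& globals_;
  std::set<std::size_t> poisoned_;
  std::vector<std::string> reports_;
};

using TuInitializer = std::function<void(InitOrderChecker&)>;

// Hypothetical driver: runs each TU's initializer in `order`, bracketed by
// steps 1 and 3, and returns the names of globals accessed while poisoned.
std::vector<std::string> RunInitializers(std::vector<Global>& globals,
                                         const std::vector<TuInitializer>& inits,
                                         const std::vector<int>& order) {
  InitOrderChecker checker(globals);
  for (int module : order) {
    checker.PoisonForModule(module);
    inits[module](checker);
    checker.UnpoisonAll();
  }
  return checker.reports();
}
```

Applied to the x/y example, with the two TU initializers written as `c.Write(0, c.Read(1) + 1)` and `c.Write(1, c.Read(0) + 1)`, either order reports one cross-TU read per initializer. Marking a global as not dynamically initialized keeps it out of the poisoned set, mirroring the point about linker-initialized globals.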
Kostya Serebryany
2012-Jun-26 11:15 UTC
On Tue, Jun 26, 2012 at 2:43 PM, Kostya Serebryany <kcc at google.com> wrote:
> [...]
>> Second, consider we have 3 TUs, t1, t2, and t3, each of which has a
>> global (g1, g2 and g3) with an initializer.
>> When we are running initializers in t2, we need to poison g1 and g3, but
>> so far we have seen only g1.
>> I don't know any good and portable way to get g3.
>>
>> One solution is to run the binary twice: once with the default order of
>> TU initializers, and a second time with the reversed order (not sure if
>> that's easy).

Or it might be a bit simpler...
Currently, asan creates an unnamed linker-initialized global array
describing all instrumented globals in a given module.

% cat glob.cc
int foo();
int bar();
int AAA = foo();
int BBB = bar();
% clang -O2 -faddress-sanitizer -S -o - -emit-llvm glob.cc
...
@2 = private global [2 x { i64, i64, i64, i64 }]
  [{ i64, i64, i64, i64 } { i64 ptrtoint ({ i32, [60 x i8] }* @AAA to i64), i64 4, i64 64, i64 ptrtoint ([14 x i8]* @0 to i64) },
   { i64, i64, i64, i64 } { i64 ptrtoint ({ i32, [60 x i8] }* @BBB to i64), i64 4, i64 64, i64 ptrtoint ([14 x i8]* @1 to i64) }]
...

If we make this array discoverable by other modules (using appending
linkage?), the problem is solved.

--kcc
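The appending-linkage idea can be illustrated by simulating it in plain C++ rather than in IR. Assumptions: the `GlobalDescriptor` field names are illustrative labels for the four i64 values in the dump (the `{ i32, [60 x i8] }` type of `@AAA` is consistent with reading 4 and 64 as the object size and the size including the redzone), `module_id` and `has_dynamic_init` are hypothetical additions, and `LinkAppendingArrays` only imitates what a linker would do with appending-linkage arrays. What it shows is that once every module's array is visible, the runtime can poison g3 while t2 runs even though t3's initializers have not run yet.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the four i64 fields in the IR dump (address, size, size including
// the redzone, pointer to the name), plus two hypothetical fields that this
// sketch assumes an init-order checker would need.
struct GlobalDescriptor {
  std::uint64_t beg;
  std::uint64_t size;
  std::uint64_t size_with_redzone;
  std::string name;       // the IR stores a pointer to a string constant
  int module_id;          // hypothetical: the TU that defines the global
  bool has_dynamic_init;  // hypothetical: false for linker-initialized ones
};

// Hypothetical stand-in for appending linkage: the linker concatenates every
// module's descriptor array into a single array visible to all modules.
std::vector<GlobalDescriptor> LinkAppendingArrays(
    const std::vector<std::vector<GlobalDescriptor>>& per_module) {
  std::vector<GlobalDescriptor> all;
  for (const auto& module_array : per_module) {
    for (const auto& g : module_array) {
      assert(g.size_with_redzone >= g.size);  // a redzone only adds bytes
      all.push_back(g);
    }
  }
  return all;
}

// With the combined array, the globals to poison before running module
// `current`'s initializers include those of modules whose initializers have
// not run yet (g3 in the three-TU example).
std::vector<std::string> GlobalsToPoison(
    const std::vector<GlobalDescriptor>& all, int current) {
  std::vector<std::string> names;
  for (const auto& g : all)
    if (g.module_id != current && g.has_dynamic_init) names.push_back(g.name);
  return names;
}
```

For the three-TU case, feeding in one array per module (g1 in t1, g2 in t2, g3 in t3) and asking for the globals to poison while t2 runs yields g1 and g3; a descriptor flagged as linker-initialized is left out.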
Alexander Potapenko
2012-Jun-26 11:25 UTC
On Tue, Jun 26, 2012 at 2:43 PM, Kostya Serebryany <kcc at google.com> wrote:
> [...]
>> One solution is to run the binary twice: once with the default order of TU
>> initializers, and a second time with the reversed order (not sure if that's
>> easy).

As we've discussed offline, it may be easy to wrap and re-implement
__libc_global_ctors (which essentially iterates over __CTOR_LIST__ and calls
the ctors for each module in link order). We can then shuffle the ctors into
any order we want, e.g. explicitly ask for the reverse order. Other means of
changing the ctor order may require relinking the binary.
We'll just need to associate each pointer in __CTOR_LIST__ with the
corresponding per-module structure that describes the globals, poison all
the globals in a given module after its ctor has been called, and unpoison
all the globals after __libc_global_ctors is done.
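The scheme above can be modelled without touching libc. This sketch is a simulation, not a wrapper around the real `__libc_global_ctors`: the hypothetical `CtorEntry` pairs each module's constructor with the names of the globals it defines (standing in for the per-module globals structure), the hypothetical `Shadow` struct plays the role of shadow memory, and `RunCtorsWithPoisoning` stands in for the re-implemented ctor loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical pairing of one ctor-list entry with the per-module
// description of the globals that module defines (reduced here to names).
struct CtorEntry {
  std::function<void()> ctor;
  std::vector<std::string> globals;
};

// Hypothetical toy shadow state; Access() stands in for an instrumented
// load or store and records any access to a poisoned global.
struct Shadow {
  std::set<std::string> poisoned;
  std::vector<std::string> reports;

  void Access(const std::string& name) {
    if (poisoned.count(name)) reports.push_back(name);
  }
};

// Hypothetical stand-in for a re-implemented ctor loop: run the ctors in
// link order or in reverse, poison a module's globals right after its ctor
// returns, and unpoison everything once all ctors have run.
void RunCtorsWithPoisoning(const std::vector<CtorEntry>& ctor_list,
                           bool reverse_order, Shadow& shadow) {
  assert(shadow.poisoned.empty());  // nothing may be poisoned on entry
  std::vector<std::size_t> order(ctor_list.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  if (reverse_order) std::reverse(order.begin(), order.end());

  for (std::size_t i : order) {
    ctor_list[i].ctor();
    for (const std::string& g : ctor_list[i].globals) shadow.poisoned.insert(g);
  }
  shadow.poisoned.clear();
}
```

With the x/y example from the start of the thread, running in list order reports the read of `x` by file_2's initializer, and running in reverse order reports the read of `y` by file_1's initializer. Because a module's globals are only poisoned after its own ctor has run, a report means a later ctor read a global that an earlier module had already initialized, i.e. the program works only because of the current order. A read of a not-yet-initialized global goes unreported in that run, which is why a second run in the opposite order is useful.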