Manman Ren via llvm-dev
2020-Oct-12 15:52 UTC
[llvm-dev] [RFC] Analysis and runtime check of objc_direct/objc_non_runtime_protocol
Currently diagnostics related to objc_direct/objc_non_runtime_protocol are done at each compilation unit. Even with these diagnostics, we will see issues at runtime - Due to lack of global scope for our diagnostics. We can catch these issues with a global analysis. - Due to usages of dynamic APIs that require the metadata. We can catch these issues with runtime checks. Builds with runtime check enabled will execute extra code at runtime to catch issues. Global Analysis #1 We will get a runtime error if a method is marked as direct but some callsites see the method as not direct. One example is with private and public headers: // public.h: @interface A @end // private.h: @interface A (Private) - (void)foo; @end // impl.mm: #include “public.h” @implementation A - (void)foo __attribute__((objc_direct)) {} @end // test.mm: #include “private.h” void test(A *a) { [a foo]; } → When compiling test.mm, foo is not treated as direct, and IR codegen will generate a call to bjc_msgSend. When compiling impl.mm, foo is direct and is removed from the ObjC metadata. We should emit an error message from the global analysis if there exists an objc_msgSend callsite that targets a direct method. With this global analysis, we can claim the invariant that within the link unit, all callsites for a direct method are converted to direct calls. What information do we need for the global analysis? List of direct methods (class-name selector-name) List of objc_msgSend callsites from IRGen (static-receiver-type selector-name) Global Analysis #2 We want to make sure a direct method is the single definition along the class hierarchy globally. This is checked for each TU via clang diagnostics. Within a single TU, we have the following diagnostics: - can't override a method that is declared direct by a superclass - methods that override superclass methods cannot be direct These diagnostics guarantee that if a method is direct, there is no override along the class hierarchy when compiling each source file. If we have a private header and a public header for a class, and we extend the class using the public header, current clang diagnostics will not report an issue. With this global analysis, we can claim the invariant that a direct method is the single definition along the class hierarchy globally. This will reduce runtime issues caused by overrides. What information do we need for the global analysis? List of direct methods (class-name selector-name) Class hierarchy information (class-name base-class-name) Global Analysis #3 A class that declares conformance to a protocol does not need to see the protocol definition in order to compile successful. Thus we can end up with a situation as shown below: // proto.h __attribute__((objc_non_runtime_protocol)) @protocol Static - (void) doThing; @end // source.m @protocol Static; @interface Something : Root<Static> - (void) doThing; @end @implementation Something - (void) doThing { ... } @end Compiling source.m will generate a reference to the protocol metadata for Static even though it was defined as non-runtime. This will emit a warning at compile time but a developer might choose to ignore it and face link errors. This will end up with a linker error for a missing symbol. A potentially better warning can be given to developers via global analysis saying protocol "Static" was defined to be non-runtime in file "SomeFileThatIncludedProtoDotH.m" to and dynamic in "source.m". This example problem can also expand to protocol inheritance hierarchies. Given an arbitrary hierarchy of runtime and non-runtime protocols and forward declarations of protocols, the compiler could potentially emit the wrong references given misleading forward decls. This can also be much more accurately diagnosed via global analysis. Runtime Check The goal of runtime checks is to catch the usages of dynamic APIs targeting direct methods at runtime, e.g, respondsToSelector, performSelector, etc. The idea is to intercept Objc runtime’s invocation of resolveInstanceMethod and resolveClassMethod , which is responsible for resolving methods that aren’t recorded in the metadata, and check if the unresolved method is a direct method. It requires we collect the list of direct method names, and the names of the classes where they’re defined. There is another category of runtime issues which runtime check can potentially catch: the direct call calls the incorrect direct method due to 1. usage of class_addMethod 2. random type casting at the call site This can be an incremental improvement outside of this RFC. Implementation Options We can potentially implement the analysis/runtime check with and without LTO. It is better to implement runtime check without LTO as we are expecting developers to use runtime-check build locally. LTO will slow down developer’s working cycle. For analysis, it is not clear which option is better. noLTO Implementation We create a special section containing those information for each TU. This section for the link unit will have the combined information. We then post-process this information to emit error messages. For each direct method, we create a struct containing the method name and the name of the class where it’s defined, e.g. // struct _objc_direct_method { // char *class_name; // char *method_name // } Then each class will have a list of _objc_direct_method structs, which can be put in a special section, e.g constant [4 x %struct._objc_direct_method][//the list of the structs],section "__DATA,__direct_method" This special section for the link unit will have the combined information for all TUs in this link unit. At runtime, the intercepted resolveInstanceMethod and resolveClassMethod invocation can then read and check against this section. If it finds the unresolved method is a direct method, it emits an error message of this direct method cannot be used dynamically at runtime. ThinLTO Implementation We add information in ModuleSummaryIndex and a ThinLTO analysis pass to analyze the combined Summary and emit error messages. We will upload patches for noLTO/runtime-check if there is no negative feedback about the RFC! Feedback/suggestions are welcome! Thanks, Manman -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201012/cb40382c/attachment-0001.html>
Steven Wu via llvm-dev
2020-Oct-13 15:36 UTC
[llvm-dev] [RFC] Analysis and runtime check of objc_direct/objc_non_runtime_protocol
> On Oct 12, 2020, at 8:52 AM, Manman Ren via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Currently diagnostics related to objc_direct/objc_non_runtime_protocol are done at each compilation unit. Even with these diagnostics, we will see issues at runtime > Due to lack of global scope for our diagnostics. We can catch these issues with a global analysis. > Due to usages of dynamic APIs that require the metadata. We can catch these issues with runtime checks. Builds with runtime check enabled will execute extra code at runtime to catch issues. > Global Analysis #1 > > We will get a runtime error if a method is marked as direct but some callsites see the method as not direct. One example is with private and public headers: > // public.h: > @interface A > @end > > // private.h: > @interface A (Private) > - (void)foo; > @end > > // impl.mm <http://impl.mm/>: > #include “public.h” > @implementation A > - (void)foo __attribute__((objc_direct)) {} > @end > > > // test.mm <http://test.mm/>: > #include “private.h” > void test(A *a) { [a foo]; } > → When compiling test.mm <http://test.mm/>, foo is not treated as direct, and IR codegen will generate a call to bjc_msgSend. When compiling impl.mm <http://impl.mm/>, foo is direct and is removed from the ObjC metadata. > > We should emit an error message from the global analysis if there exists an objc_msgSend callsite that targets a direct method. With this global analysis, we can claim the invariant that within the link unit, all callsites for a direct method are converted to direct calls. > What information do we need for the global analysis? > > List of direct methods (class-name selector-name) > List of objc_msgSend callsites from IRGen (static-receiver-type selector-name) > Global Analysis #2 > > We want to make sure a direct method is the single definition along the class hierarchy globally. This is checked for each TU via clang diagnostics. > > Within a single TU, we have the following diagnostics: > can't override a method that is declared direct by a superclass > methods that override superclass methods cannot be direct > These diagnostics guarantee that if a method is direct, there is no override along the class hierarchy when compiling each source file. > > If we have a private header and a public header for a class, and we extend the class using the public header, current clang diagnostics will not report an issue. > > With this global analysis, we can claim the invariant that a direct method is the single definition along the class hierarchy globally. This will reduce runtime issues caused by overrides. > What information do we need for the global analysis? > > List of direct methods (class-name selector-name) > Class hierarchy information (class-name base-class-name) > Global Analysis #3 > > A class that declares conformance to a protocol does not need to see the protocol definition in order to compile successful. Thus we can end up with a situation as shown below: > > // proto.h > __attribute__((objc_non_runtime_protocol)) > @protocol Static > - (void) doThing; > @end > > // source.m > @protocol Static; > @interface Something : Root<Static> > - (void) doThing; > @end > @implementation Something > - (void) doThing { ... } > @end > Compiling source.m will generate a reference to the protocol metadata for Static even though it was defined as non-runtime. This will emit a warning at compile time but a developer might choose to ignore it and face link errors. This will end up with a linker error for a missing symbol. > > A potentially better warning can be given to developers via global analysis saying protocol "Static" was defined to be non-runtime in file "SomeFileThatIncludedProtoDotH.m" to and dynamic in "source.m". > > This example problem can also expand to protocol inheritance hierarchies. Given an arbitrary hierarchy of runtime and non-runtime protocols and forward declarations of protocols, the compiler could potentially emit the wrong references given misleading forward decls. This can also be much more accurately diagnosed via global analysis. > Runtime Check > > The goal of runtime checks is to catch the usages of dynamic APIs targeting direct methods at runtime, e.g, respondsToSelector, performSelector, etc. The idea is to intercept Objc runtime’s invocation of resolveInstanceMethod and resolveClassMethod , which is responsible for resolving methods that aren’t recorded in the metadata, and check if the unresolved method is a direct method. It requires we collect the list of direct method names, and the names of the classes where they’re defined. > There is another category of runtime issues which runtime check can potentially catch: the direct call calls the incorrect direct method due to > usage of class_addMethod > random type casting at the call site > This can be an incremental improvement outside of this RFC. > Implementation Options > > We can potentially implement the analysis/runtime check with and without LTO. It is better to implement runtime check without LTO as we are expecting developers to use runtime-check build locally. LTO will slow down developer’s working cycle. For analysis, it is not clear which option is better. > noLTO Implementation > > We create a special section containing those information for each TU. This section for the link unit will have the combined information. We then post-process this information to emit error messages. > For each direct method, we create a struct containing the method name and the name of the class where it’s defined, e.g. > // struct _objc_direct_method { > // char *class_name; > // char *method_name > // } > Then each class will have a list of _objc_direct_method structs, which can be put in a special section, e.g > constant [4 x %struct._objc_direct_method][//the list of the structs],section "__DATA,__direct_method" > This special section for the link unit will have the combined information for all TUs in this link unit. At runtime, the intercepted resolveInstanceMethod and resolveClassMethod invocation can then read and check against this section. If it finds the unresolved method is a direct method, it emits an error message of this direct method cannot be used dynamically at runtime. > ThinLTO Implementation > > We add information in ModuleSummaryIndex and a ThinLTO analysis pass to analyze the combined Summary and emit error messages. > > > We will upload patches for noLTO/runtime-check if there is no negative feedback about the RFC! Feedback/suggestions are welcome!Some of my thought about runtime check options. * LTO/ThinLTO option doesn't sound attractive. This is because it doesn't catch violations across link unit, which is probably more worrisome than within the link unit. * The noLTO implementation is paying a cost of code size to enable runtime check. The struct you proposed is only marginally better than a normal `method64_t` type. What if you just emit `method64_t` struct for direct methods, which `imp` points to nullptr or a function that calls abort with proper error message? That should result in crash when a message is sent to direct method and it requires no or minimal objc runtime change. Steven> > Thanks, > Manman > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201013/126a7fba/attachment.html>
Manman Ren via llvm-dev
2020-Oct-14 00:29 UTC
[llvm-dev] [RFC] Analysis and runtime check of objc_direct/objc_non_runtime_protocol
Hi Steven, Thanks for the suggestion! In our current proposal, we will swizzle resolveInstanceMethod to a runtime check version that does additional check against the list of direct methods. The code size overhead should be minimal, one per annotated class. Your proposal is pretty interesting. We will get an error message that shows a certain direct method is the issue, and we can probably grab the selector name somehow. It may not be easy to get the class name for the direct method. Sharon: what do you think? Manman On Tue, Oct 13, 2020 at 8:36 AM Steven Wu <stevenwu at apple.com> wrote:> > > On Oct 12, 2020, at 8:52 AM, Manman Ren via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Currently diagnostics related to objc_direct/objc_non_runtime_protocol are > done at each compilation unit. Even with these diagnostics, we will see > issues at runtime > > - Due to lack of global scope for our diagnostics. We can catch these > issues with a global analysis. > - Due to usages of dynamic APIs that require the metadata. We can > catch these issues with runtime checks. Builds with runtime check enabled > will execute extra code at runtime to catch issues. > > Global Analysis #1 We will get a runtime error if a method is marked as > direct but some callsites see the method as not direct. One example is with > private and public headers: > > // public.h: > @interface A > @end > > // private.h: > @interface A (Private) > - (void)foo; > @end > > // impl.mm: > #include “public.h” > @implementation A > - (void)foo __attribute__((objc_direct)) {} > @end > > > // test.mm: > #include “private.h” > void test(A *a) { [a foo]; } > > → When compiling test.mm, foo is not treated as direct, and IR codegen > will generate a call to bjc_msgSend. When compiling impl.mm, foo is > direct and is removed from the ObjC metadata. > > We should emit an error message from the global analysis if there exists > an objc_msgSend callsite that targets a direct method. With this global > analysis, we can claim the invariant that within the link unit, all > callsites for a direct method are converted to direct calls. > What information do we need for the global analysis? List of direct > methods (class-name selector-name) > List of objc_msgSend callsites from IRGen (static-receiver-type > selector-name) > Global Analysis #2 We want to make sure a direct method is the single > definition along the class hierarchy globally. This is checked for each TU > via clang diagnostics. > > Within a single TU, we have the following diagnostics: > > - can't override a method that is declared direct by a superclass > > > - methods that override superclass methods cannot be direct > > These diagnostics guarantee that if a method is direct, there is no > override along the class hierarchy when compiling each source file. > > If we have a private header and a public header for a class, and we extend > the class using the public header, current clang diagnostics will not > report an issue. > > With this global analysis, we can claim the invariant that a direct method > is the single definition along the class hierarchy globally. This will > reduce runtime issues caused by overrides. > What information do we need for the global analysis? List of direct > methods (class-name selector-name) > Class hierarchy information (class-name base-class-name) > Global Analysis #3 A class that declares conformance to a protocol does > not need to see the protocol definition in order to compile successful. > Thus we can end up with a situation as shown below: > > // proto.h > __attribute__((objc_non_runtime_protocol)) > @protocol Static > - (void) doThing; > @end > > // source.m > @protocol Static; > @interface Something : Root<Static> > - (void) doThing; > @end > @implementation Something > - (void) doThing { ... } > @end > > Compiling source.m will generate a reference to the protocol metadata for > Static even though it was defined as non-runtime. This will emit a > warning at compile time but a developer might choose to ignore it and face > link errors. This will end up with a linker error for a missing symbol. > > A potentially better warning can be given to developers via global > analysis saying protocol "Static" was defined to be non-runtime in file > "SomeFileThatIncludedProtoDotH.m" to and dynamic in "source.m". > > This example problem can also expand to protocol inheritance hierarchies. > Given an arbitrary hierarchy of runtime and non-runtime protocols and > forward declarations of protocols, the compiler could potentially emit the > wrong references given misleading forward decls. This can also be much more > accurately diagnosed via global analysis. > Runtime Check The goal of runtime checks is to catch the usages of > dynamic APIs targeting direct methods at runtime, e.g, respondsToSelector, > performSelector, etc. The idea is to intercept Objc runtime’s invocation > of resolveInstanceMethod and resolveClassMethod , which is responsible > for resolving methods that aren’t recorded in the metadata, and check if > the unresolved method is a direct method. It requires we collect the list > of direct method names, and the names of the classes where they’re defined. > There is another category of runtime issues which runtime check can > potentially catch: the direct call calls the incorrect direct method due to > > 1. usage of class_addMethod > 2. random type casting at the call site > > This can be an incremental improvement outside of this RFC. > Implementation Options We can potentially implement the analysis/runtime > check with and without LTO. It is better to implement runtime check without > LTO as we are expecting developers to use runtime-check build locally. LTO > will slow down developer’s working cycle. For analysis, it is not clear > which option is better. > noLTO Implementation We create a special section containing those > information for each TU. This section for the link unit will have the > combined information. We then post-process this information to emit error > messages. > For each direct method, we create a struct containing the method name and > the name of the class where it’s defined, e.g. > > // struct _objc_direct_method { > // char *class_name; > // char *method_name > // } > > Then each class will have a list of _objc_direct_method structs, which > can be put in a special section, e.g > > constant [4 x %struct._objc_direct_method][//the list of the structs],section "__DATA,__direct_method" > > This special section for the link unit will have the combined information > for all TUs in this link unit. At runtime, the intercepted > resolveInstanceMethod and resolveClassMethod invocation can then read and > check against this section. If it finds the unresolved method is a direct > method, it emits an error message of this direct method cannot be used > dynamically at runtime. > ThinLTO Implementation We add information in ModuleSummaryIndex and a > ThinLTO analysis pass to analyze the combined Summary and emit error > messages. > > > We will upload patches for noLTO/runtime-check if there is no negative > feedback about the RFC! Feedback/suggestions are welcome! > > > Some of my thought about runtime check options. > > * LTO/ThinLTO option doesn't sound attractive. This is because it doesn't > catch violations across link unit, which is probably more worrisome than > within the link unit. > * The noLTO implementation is paying a cost of code size to enable runtime > check. The struct you proposed is only marginally better than a normal > `method64_t` type. What if you just emit `method64_t` struct for direct > methods, which `imp` points to nullptr or a function that calls abort with > proper error message? That should result in crash when a message is sent to > direct method and it requires no or minimal objc runtime change. > > Steven > > > > Thanks, > Manman > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201013/b9587ea7/attachment-0001.html>