Evan Cheng
2012-May-08 19:35 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:

> On 05/08/2012 05:13 AM, Evan Cheng wrote:
>> Sorry Tobias, I'm not in favor of this change. From what I can tell, this enables some features which can be implemented via other means. It adds all kinds of complexity to LLVM and I'm also highly concerned about bitcode that can embed illegal (or worse, malicious) code using this feature.
>
> Hi Evan,
>
> there is no need to force this change in. I am rather trying to understand the shortcomings of my approach and look for possible better solutions.

Hi Tobias,

When you are proposing a significant extension to LLVM, the burden is on the person proposing the change to convince folks that there is a significant advantage to LLVM developers / users who rely on LLVM mainline.

> That's why I was asking you where you see the possibility of illegal/malicious code? You did not really explain it yet and I would be more than happy to understand such a problem. From my point of view embedded and host module code are both compiled at the same time and are both checked by the LLVM bitcode verifier. How could this introduce any malicious code that could not be introduced by normal LLVM-IR?

You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk, and that again raises the bar for acceptance.

> In terms of the complexity. The only alternative proposal I have heard of was making LLVM-IR multi-module aware or adding multi-module support to all LLVM-IR tools. Both of these changes are way more complex than the codegen intrinsic. Actually, they are so complex that I doubt that they can be implemented any time soon. What is the simpler approach you are talking about?

We don't need multi-module either. The system you are designing should be able to handle multiple bitcode files with multiple modules. I don't claim to know the specifics of your projects. But it seems to me you want to add this new complexity to LLVM to simplify your tools (single .o rather than multiple). Given how specific your need is, it's just not appropriate for LLVM mainline.

Evan

> Maybe I completely missed the point, but if there were a good alternative there would be no need to discuss. I would happily go ahead and implement said alternative. Even if there is none I would keep quiet, once I understand the concerns that block this proposal. For now, I don't think I have understood the concerns yet.
>
> Cheers
> Tobi
Yabin Hu
2012-May-09 02:39 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Hi Evan,

Thanks for your time.

> You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk, and that again raises the bar for acceptance.

I think the embedded string, in the form of a "string", is never harmful. When the module is loaded, the string may not be verified, just like any other "normal" string in the module. When we temporarily transform the embedded string into a piece of "code", the intrinsic verifies the loading process just as we do when loading a module from a file. So I don't think this adds additional security problems.

> We don't need multi-module either. The system you are designing should be able to handle multiple bitcode files with multiple modules. I don't claim to know the specifics of your projects. But it seems to me you want to add this new complexity to LLVM to simplify your tools (single .o rather than multiple).

As Tobias explained before, if any LLVM-based compiler wants to add support for generating code for a heterogeneous platform (e.g. CPU+GPU) or to employ an optimization pass like ours, this intrinsic helps. Such compilers needn't revise their compiler driver much and needn't add a new linker step between the host bitcode file and the device bitcode file (or some others). What they need to do is prepare the device code as a string of LLVM IR and add a call to the intrinsic. We believe there is no need to change all these external tools if we can avoid it by adding a new, lightweight intrinsic.

Thanks,
Yabin
Evan Cheng
2012-May-09 05:42 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Please understand you are proposing a feature by which *everyone* will be affected.

The feature you are proposing introduces significant complexity to LLVM. The only real benefit is that it reduces changes to existing tools for heterogeneous platforms. That's not a strong argument.

Adding a string to embed an LLVM module means the bitcode file cannot be easily pre-checked. You are leaving the runtime checking to specific implementations. That makes LLVM bitcode inherently less safe. That's a no-no.

I'm sorry for taking a hard line on this. But the decision is clearly a no unless Chris says otherwise.

Evan
Tobias Grosser
2012-May-09 09:12 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On 05/08/2012 09:35 PM, Evan Cheng wrote:
>
> On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:
>
>> On 05/08/2012 05:13 AM, Evan Cheng wrote:
>>> Sorry Tobias, I'm not in favor of this change. From what I can tell, this enables some features which can be implemented via other means. It adds all kinds of complexity to LLVM and I'm also highly concerned about bitcode that can embed illegal (or worse, malicious) code using this feature.
>>
>> Hi Evan,
>>
>> there is no need to force this change in. I am rather trying to understand the shortcomings of my approach and look for possible better solutions.
>
> Hi Tobias,
>
> When you are proposing a significant extension to LLVM, the burden is on the person proposing the change to convince folks that there is a significant advantage to LLVM developers / users who rely on LLVM mainline.

Hi Evan,

thanks for replying. I understand that the burden is on me. I have no intention of pushing a patch to LLVM where people strongly disagree. Still, to be able to propose something more sensible, we need feedback pointing out the problems of this patch and what would be a better solution. We have not gotten this kind of feedback yet, probably because we did not describe the patch and our requirements well enough.

>> That's why I was asking you where you see the possibility of illegal/malicious code? You did not really explain it yet and I would be more than happy to understand such a problem. From my point of view embedded and host module code are both compiled at the same time and are both checked by the LLVM bitcode verifier. How could this introduce any malicious code that could not be introduced by normal LLVM-IR?
>
> You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk, and that again raises the bar for acceptance.

What do you mean by verified? How is normal LLVM-IR verified?
What do you mean by ensuring an embedded string is safe? How do you ensure normal LLVM-IR is safe?

The only existing kind of verification I am aware of is the '-verify' pass that checks an LLVM-IR module. This pass is run over the embedded module at the same time as target code is generated for the host function. If the verification fails, no target code is generated and an empty string is returned. If target code is generated, it is stored back in memory. It can obviously be executed through a function pointer, but this is no different from executing code that is stored in memory through other means.

I am kind of surprised security is a concern here. If we really want to do a proper risk analysis, we should first define the security guarantees LLVM gives. I am kind of surprised such security guarantees exist. To me, securely verifying LLVM-IR is difficult for reasons other than this intrinsic. Google PNaCl does, for good reasons, not rely on LLVM to provide security guarantees.

Still, if this is a concern we could make this intrinsic a target option that is disabled by default.

>> In terms of the complexity. The only alternative proposal I have heard of was making LLVM-IR multi-module aware or adding multi-module support to all LLVM-IR tools. Both of these changes are way more complex than the codegen intrinsic. Actually, they are so complex that I doubt that they can be implemented any time soon. What is the simpler approach you are talking about?
>
> We don't need multi-module either. The system you are designing should be able to handle multiple bitcode files with multiple modules. I don't claim to know the specifics of your projects. But it seems to me you want to add this new complexity to LLVM to simplify your tools (single .o rather than multiple). Given how specific your need is, it's just not appropriate for LLVM mainline.

I can follow this argument. We do not want to include project-specific features in LLVM if such features are not needed by a broader audience. Such features should rather be implemented outside of LLVM. This has worked especially well, as LLVM provides features that make such external implementations possible, and in some rare cases (calling conventions) it includes project-specific patches to allow such projects to use a vanilla LLVM installation.

This intrinsic was proposed in the very same light. I think it would be nice to _significantly_ facilitate the development of optimizers that target GPGPU accelerators. Yabin's project would use this, but I am convinced a wider audience could use it too. A design that handles multiple bitcode files does not seem like a good option. It would require large changes to all projects that want to use such an optimizer, it does not work with a vanilla clang installation, and it causes further problems with JIT compilation.

It seems we were not able to convince enough people that such an extension is useful. I think this was partially because we did not explain our project and the codegen intrinsic well enough, and also because there is no actual use case yet. For now, further discussion seems pointless.

Thanks for your comments!

Cheers
Tobi
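
The verify-then-generate behaviour described in this message (run the verifier over the embedded module, emit target code only on success, otherwise return an empty string) can be modelled with a minimal Python sketch. This is only a toy model of the control flow, not the patch itself: `codegen_embedded`, `toy_verify` and `toy_emit` are hypothetical names, and the toy verifier is a simple text check standing in for the real '-verify' pass.

```python
from typing import Callable


def codegen_embedded(ir_text: str,
                     verify: Callable[[str], bool],
                     emit: Callable[[str], str]) -> str:
    """Toy model: verify the embedded IR first, emit code only on success."""
    if not verify(ir_text):
        # Verification failed: no target code is generated.
        return ""
    return emit(ir_text)


def toy_verify(ir_text: str) -> bool:
    """Hypothetical stand-in for a verifier (NOT the LLVM '-verify' pass).

    Accepts only text that starts with 'define ' and ends with '}'.
    """
    stripped = ir_text.strip()
    return stripped.startswith("define ") and stripped.endswith("}")


def toy_emit(ir_text: str) -> str:
    """Hypothetical stand-in for a backend; returns a placeholder string."""
    return f"; target code for {len(ir_text)} characters of IR"
```

Calling `codegen_embedded("not valid", toy_verify, toy_emit)` yields an empty string, while a string such as `"define void @kernel() {\n  ret void\n}"` yields the placeholder produced by `toy_emit`. Passing the verifier and backend in as parameters keeps the sketch independent of any particular LLVM API.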
Evan Cheng
2012-May-09 21:15 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On May 9, 2012, at 2:12 AM, Tobias Grosser wrote:
>
>>> That's why I was asking you where you see the possibility of illegal/malicious code? You did not really explain it yet and I would be more than happy to understand such a problem. From my point of view embedded and host module code are both compiled at the same time and are both checked by the LLVM bitcode verifier. How could this introduce any malicious code that could not be introduced by normal LLVM-IR?
>>
>> You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk, and that again raises the bar for acceptance.
>
> What do you mean by verified? How is normal LLVM-IR verified?
> What do you mean by ensuring an embedded string is safe? How do you ensure normal LLVM-IR is safe?
>
> The only existing kind of verification I am aware of is the '-verify' pass that checks an LLVM-IR module. This pass is run over the embedded module at the same time as target code is generated for the host function. If the verification fails, no target code is generated and an empty string is returned. If target code is generated, it is stored back in memory. It can obviously be executed through a function pointer, but this is no different from executing code that is stored in memory through other means.
>
> I am kind of surprised security is a concern here. If we really want to do a proper risk analysis, we should first define the security guarantees LLVM gives. I am kind of surprised such security guarantees exist. To me, securely verifying LLVM-IR is difficult for reasons other than this intrinsic. Google PNaCl does, for good reasons, not rely on LLVM to provide security guarantees.
>
> Still, if this is a concern we could make this intrinsic a target option that is disabled by default.

You are missing the point. Don't think in terms of existing implementations. Don't think in terms of clang or other static compilers. There are plenty of systems out there which use LLVM. There can be plenty of different ways to verify / check LLVM IR. We don't know about them.

An LLVM bitcode module as it is today is a representation of some program. It has semantics that are clearly defined by its instructions and data. Now you want to embed some other programs in strings. That makes the IR inherently harder to understand; it's more risky by definition.

Of course systems which use LLVM can solve this problem. But it's a big fundamental change, and I (and other people on this thread) have pointed out that the benefits are just not worth it.

Evan
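
The pre-checking concern raised in this message can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of any real LLVM tool: `ToyModule`, `FORBIDDEN`, `load_time_check` and `string_aware_check` are all hypothetical, and a module is reduced to a list of called symbol names plus a list of string constants.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyModule:
    """Hypothetical stand-in for an IR module: callee names plus string constants."""
    calls: List[str] = field(default_factory=list)
    strings: List[str] = field(default_factory=list)


# Hypothetical policy: symbols that some external checker refuses to allow.
FORBIDDEN = {"system"}


def load_time_check(module: ToyModule) -> bool:
    """Checker that looks only at the module's calls, ignoring string contents."""
    return not any(callee in FORBIDDEN for callee in module.calls)


def string_aware_check(module: ToyModule) -> bool:
    """Checker extended to also search string constants for forbidden names."""
    if not load_time_check(module):
        return False
    return not any(name in text
                   for text in module.strings
                   for name in FORBIDDEN)
```

In this model, a host module whose only call is to the intrinsic but whose string constant contains a call to `system` passes `load_time_check` and fails `string_aware_check`. The sketch only shows that a checker written before embedded code existed would have to be extended to look inside strings; whether that cost outweighs the benefit is the question debated in the thread.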