thr3ads.net - llvm dev - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation [May 2012]

Evan Cheng

2012-May-08 19:35 UTC

[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:
> On 05/08/2012 05:13 AM, Evan Cheng wrote:
>> Sorry Tobias, I'm not in favor of this change. From what I can tell,
>> this enables some features which can be implemented via other means. It
>> adds all kinds of complexity to LLVM and I'm also highly concerned about
>> bitcode that can embed illegal (or worse, malicious) code using this
>> feature.
>
> Hi Evan,
>
> there is no need to force this change in. I am rather trying to understand
> the shortcomings of my approach and look for possible better solutions.

Hi Tobias,

When you are proposing a significant extension to LLVM, the burden is on the
person who is proposing the change to convince folks there is a significant
advantage to LLVM developers / users who rely on LLVM mainline.
> 
> That's why I was asking you where you see the possibility of
> illegal/malicious code? You did not really explain it yet and I would
> be more than happy to understand such a problem. From my point of view
> embedded and host module code are both compiled at the same time and are
> both checked by the LLVM bitcode verifier. How could this introduce any
> malicious code that could not be introduced by normal LLVM-IR?

You're adding a feature that embeds code inside a module. When the module is
loaded, is the string going to be verified? How are users of LLVM IR able to
ensure the embedded string is safe? I am not saying it cannot be done. This
feature just increases the risk and that again raises the bar for acceptance.
> 
> In terms of complexity: the only alternative proposal I have heard of
> was making LLVM-IR multi-module aware or adding multi-module support to all
> LLVM-IR tools. Both of these changes are way more complex than the codegen
> intrinsic. Actually, they are so complex that I doubt that they can be
> implemented any time soon. What is the simpler approach you are talking
> about?

We don't need multi-module either. The system you are designing should be
able to handle multiple bitcode files with multiple modules. I don't claim
to know the specifics of your projects. But it seems to me you want to add
this new complexity to LLVM to simplify your tools (single .o rather than
multiple). Given how specific your need is, it's just not appropriate for
LLVM mainline.

Evan

> 
> Maybe I completely missed the point, but if there were a good alternative
> there would be no need to discuss. I would happily go ahead and implement
> said alternative. Even if there is none I would keep quiet, once I
> understand the concerns that block this proposal. For now, I don't think I
> have understood the concerns yet.
>
> Cheers
> Tobi

Yabin Hu

2012-May-09 02:39 UTC

Hi Evan,

Thanks for your time.

> You're adding a feature that embeds code inside a module. When the module
> is loaded, is the string going to be verified? How are users of LLVM IR able
> to ensure the embedded string is safe? I am not saying it cannot be done.
> This feature just increases the risk and that again raises the bar for
> acceptance.

I think the embedded string, in the form of a "string", is never harmful.
When the module is loaded, the string may not be verified, just like any
other "normal" string in the module. When we temporarily transform the
embedded string into a piece of "code", the intrinsic verifies the loading
process just as we do when loading a module from a file. So I don't think
this adds additional security problems.

> We don't need multi-module either. The system you are designing should be
> able to handle multiple bitcode files with multiple modules. I don't claim
> to know the specifics of your projects. But it seems to me you want to add
> this new complexity to LLVM to simplify your tools (single .o rather than
> multiple).

As Tobias explained before, if any LLVM-based compiler wants to add a
feature of generating code for a heterogeneous platform (e.g. CPU+GPU) or
employ an optimization pass like ours, this intrinsic helps. They needn't
revise their compiler driver too much and needn't add a new linker between
the host bitcode file and the device bitcode file (or some others). What
they need to do is prepare the device code as a string of LLVM IR and add a
call to the intrinsic. We believe that there is no need for all these
external tools to be changed if we can avoid this by adding a new and
lightweight intrinsic.

Thanks,
Yabin

Evan Cheng

2012-May-09 05:42 UTC

Please understand you are proposing a feature that will affect *everyone*.
The feature you are proposing introduces significant complexity to LLVM.
The only real benefit is that it reduces changes to existing tools for
heterogeneous platforms. That's not a strong argument.

Adding a string to embed an LLVM module means the bitcode file cannot be
easily pre-checked. You are leaving the runtime checking to specific
implementations. That makes LLVM bitcode inherently less safe. That's a
no-no.
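
As a rough illustration of the pre-checking concern, here is a toy model, assuming a hypothetical loader that accepts a module only if every instruction uses an allowed opcode. The module layout, `ALLOWED_OPCODES`, and `precheck` are invented for this sketch and are not LLVM APIs. A check that walks instructions never looks inside string constants, so a program embedded as a string passes unexamined.

```python
# Toy model only: nothing here is an LLVM API.
ALLOWED_OPCODES = {"add", "call", "ret"}  # hypothetical loader policy


def precheck(module: dict) -> bool:
    """Accept the module only if every instruction opcode is allowed.

    Only the instruction lists are inspected; string constants are
    treated as opaque data, as a typical instruction walker would.
    """
    return all(
        opcode in ALLOWED_OPCODES
        for body in module["functions"].values()
        for opcode in body
    )


# Host module whose own instructions are harmless, but which carries
# another program as a plain string constant.
host = {
    "functions": {"main": ["call", "ret"]},
    "strings": {"kernel": "define void @k() { syscall ret }"},
}

# The disallowed opcode inside the string is never seen by the check.
print(precheck(host))
```

To cover the embedded program, a checker of this kind would also have to know which strings hold code and parse each of them.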

I'm sorry for taking a hard line on this. But the decision is clearly a no
unless Chris says otherwise.

Evan

On May 8, 2012, at 7:39 PM, Yabin Hu <yabin.hwu at gmail.com> wrote:
> Hi Evan,
>
> Thanks for your time.
>
>> You're adding a feature that embeds code inside a module. When the module
>> is loaded, is the string going to be verified? How are users of LLVM IR
>> able to ensure the embedded string is safe? I am not saying it cannot be
>> done. This feature just increases the risk and that again raises the bar
>> for acceptance.
>
> I think the embedded string, in the form of a "string", is never harmful.
> When the module is loaded, the string may not be verified, just like any
> other "normal" string in the module. When we temporarily transform the
> embedded string into a piece of "code", the intrinsic verifies the loading
> process just as we do when loading a module from a file. So I don't think
> this adds additional security problems.
>
>> We don't need multi-module either. The system you are designing should be
>> able to handle multiple bitcode files with multiple modules. I don't claim
>> to know the specifics of your projects. But it seems to me you want to add
>> this new complexity to LLVM to simplify your tools (single .o rather than
>> multiple).
>
> As Tobias explained before, if any LLVM-based compiler wants to add a
> feature of generating code for a heterogeneous platform (e.g. CPU+GPU) or
> employ an optimization pass like ours, this intrinsic helps. They needn't
> revise their compiler driver too much and needn't add a new linker between
> the host bitcode file and the device bitcode file (or some others). What
> they need to do is prepare the device code as a string of LLVM IR and add a
> call to the intrinsic. We believe that there is no need for all these
> external tools to be changed if we can avoid this by adding a new and
> lightweight intrinsic.
>
> Thanks,
> Yabin

Tobias Grosser

2012-May-09 09:12 UTC

On 05/08/2012 09:35 PM, Evan Cheng wrote:
> On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:
>
>> On 05/08/2012 05:13 AM, Evan Cheng wrote:
>>> Sorry Tobias, I'm not in favor of this change. From what I can tell,
>>> this enables some features which can be implemented via other means. It
>>> adds all kinds of complexity to LLVM and I'm also highly concerned about
>>> bitcode that can embed illegal (or worse, malicious) code using this
>>> feature.
>>
>> Hi Evan,
>>
>> there is no need to force this change in. I am rather trying to
>> understand the shortcomings of my approach and look for possible better
>> solutions.
>
> Hi Tobias,
>
> When you are proposing a significant extension to LLVM, the burden is on
> the person who is proposing the change to convince folks there is a
> significant advantage to LLVM developers / users who rely on LLVM mainline.

Hi Evan,

thanks for replying. I understand that the burden is on me. I have no
intention of pushing a patch into LLVM where people strongly disagree.

Still, to be able to propose something more sensible we need to get
feedback pointing out the problems of this patch and what would be a
better solution. We have not received this kind of feedback yet, probably
because we did not describe the patch and our requirements well enough.

>> That's why I was asking you where you see the possibility of
>> illegal/malicious code? You did not really explain it yet and I would
>> be more than happy to understand such a problem. From my point of view
>> embedded and host module code are both compiled at the same time and are
>> both checked by the LLVM bitcode verifier. How could this introduce any
>> malicious code that could not be introduced by normal LLVM-IR?
>
> You're adding a feature that embeds code inside a module. When the module
> is loaded, is the string going to be verified? How are users of LLVM IR
> able to ensure the embedded string is safe? I am not saying it cannot be
> done. This feature just increases the risk and that again raises the bar
> for acceptance.

What do you mean by verified? How is normal LLVM-IR verified?
What do you mean by ensuring an embedded string is safe? How do you
ensure normal LLVM-IR is safe?

The only existing kind of verification I am aware of is the '-verify'
pass that checks an LLVM-IR module. This pass is run over the embedded
module at the same time as target code is generated for the host
function. In case the verification fails, no target code is generated
and an empty string is returned. In case target code is generated, it is
stored back in memory. It can obviously be executed through a function
pointer, but this is no different from executing code that is stored
in memory through other means.
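
The contract described in the paragraph above (verify the embedded module, return an empty string on failure, otherwise return the generated target code) can be sketched as a toy model. This is not the patch's implementation and does not call LLVM: `codegen_embedded`, `toy_verify`, and `toy_emit` are hypothetical names, and the "verifier" is a deliberately crude stand-in.

```python
from typing import Callable


def codegen_embedded(ir_text: str,
                     verify: Callable[[str], bool],
                     emit: Callable[[str], str]) -> str:
    """Model of the described contract for an embedded IR string.

    If verification fails, no target code is produced and the result
    is the empty string; otherwise the emitted code is returned.
    """
    if not verify(ir_text):
        return ""
    return emit(ir_text)


def toy_verify(ir_text: str) -> bool:
    # Hypothetical stand-in for a real verifier: require at least one
    # 'define' and at least as many 'ret' tokens as 'define' tokens.
    defines = ir_text.count("define")
    return defines > 0 and ir_text.count("ret") >= defines


def toy_emit(ir_text: str) -> str:
    # Hypothetical stand-in for a backend: just report what was seen.
    return "; target code for %d function(s)" % ir_text.count("define")


good = codegen_embedded("define void @k() { ret void }", toy_verify, toy_emit)
bad = codegen_embedded("define void @k() {", toy_verify, toy_emit)
```

In this model a caller distinguishes success from failure only by checking whether the returned string is empty.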

I am kind of surprised security is a concern here. If we really want to
do a proper risk analysis, we should first define the security
guarantees LLVM gives. I am kind of surprised such security guarantees
exist. To me, securely verifying LLVM-IR is difficult for reasons other
than this intrinsic. Google PNaCl does, for good reasons, not rely on
LLVM to provide security guarantees.

Still, if this is a concern we could make this intrinsic a target option
that is disabled by default.

>> In terms of complexity: the only alternative proposal I have heard of
>> was making LLVM-IR multi-module aware or adding multi-module support to
>> all LLVM-IR tools. Both of these changes are way more complex than the
>> codegen intrinsic. Actually, they are so complex that I doubt that they
>> can be implemented any time soon. What is the simpler approach you are
>> talking about?
>
> We don't need multi-module either. The system you are designing should be
> able to handle multiple bitcode files with multiple modules. I don't claim
> to know the specifics of your projects. But it seems to me you want to add
> this new complexity to LLVM to simplify your tools (single .o rather than
> multiple). Given how specific your need is, it's just not appropriate for
> LLVM mainline.

I can follow this argument. We do not want to include project-specific
features in LLVM if such features are not needed by a broader audience.
Such features should rather be implemented outside of LLVM.
This has worked especially well, as LLVM provides features to make such
external implementations possible and in some rare cases (calling
conventions) it includes project-specific patches to allow such projects
to use a vanilla LLVM installation.

This intrinsic was proposed in the very same light. I think it would be 
nice to _significantly_ facilitate the development of optimizers that 
target GPGPU accelerators. Yabin's project would use this, but I am 
convinced a wider audience could use this.

A design that handles multiple bitcode files does not seem like a good
option. It would require large changes to all projects that want to use
such an optimizer, does not work with a vanilla clang installation, and
causes further problems with JIT compilation.

It seems we were not able to convince enough people that such an 
extension is useful. I think this was partially because we did not 
explain our project and the codegen intrinsic well enough and also 
because there is no actual use case yet. For now, further discussions 
seem pointless. Thanks for your comments!

Cheers
Tobi

Evan Cheng

2012-May-09 21:15 UTC

On May 9, 2012, at 2:12 AM, Tobias Grosser wrote:
>>> That's why I was asking you where you see the possibility of
>>> illegal/malicious code? You did not really explain it yet and I would
>>> be more than happy to understand such a problem. From my point of view
>>> embedded and host module code are both compiled at the same time and are
>>> both checked by the LLVM bitcode verifier. How could this introduce any
>>> malicious code that could not be introduced by normal LLVM-IR?
>>
>> You're adding a feature that embeds code inside a module. When the module
>> is loaded, is the string going to be verified? How are users of LLVM IR
>> able to ensure the embedded string is safe? I am not saying it cannot be
>> done. This feature just increases the risk and that again raises the bar
>> for acceptance.
>
> What do you mean by verified? How is normal LLVM-IR verified?
> What do you mean by ensuring an embedded string is safe? How do you ensure
> normal LLVM-IR is safe?
>
> The only existing kind of verification I am aware of is the '-verify'
> pass that checks an LLVM-IR module. This pass is run over the embedded
> module at the same time as target code is generated for the host
> function. In case the verification fails, no target code is generated and
> an empty string is returned. In case target code is generated, it is
> stored back in memory. It can obviously be executed through a function
> pointer, but this is no different from executing code that is stored
> in memory through other means.
>
> I am kind of surprised security is a concern here. If we really want to do
> a proper risk analysis, we should first define the security guarantees LLVM
> gives. I am kind of surprised such security guarantees exist. To me,
> securely verifying LLVM-IR is difficult for reasons other than this
> intrinsic. Google PNaCl does, for good reasons, not rely on LLVM to provide
> security guarantees.
>
> Still, if this is a concern we could make this intrinsic a target option
> that is disabled by default.

You are missing the point. Don't think in terms of existing implementations.
Don't think in terms of clang or other static compilers. There are plenty of
systems out there which use LLVM. There can be plenty of different ways to
verify / check LLVM IR. We don't know about them.

An LLVM bitcode module as it is today is a representation of some program. It
has semantics that are clearly defined by its instructions and data. Now you
want to embed some other programs in strings. That makes the IR inherently
harder to understand; it's more risky by definition. Of course systems which
use LLVM can solve this problem. But it's a big fundamental change and I (and
other people on this thread) have pointed out the benefits are just not worth
it.

Evan
