thr3ads.net - llvm dev - [LLVMdev] Symbol folding with MC [Apr 2011]


Borja Ferrer

2011-Apr-26 13:30 UTC

[LLVMdev] Symbol folding with MC

Hello, I have some questions about folding operations on symbols during the
instruction print stage with MC. At the moment I'm working with global
symbols, but I guess other symbol types should be equivalent.

My first question is: how can I negate the address of a symbol?

Consider this piece of code:
char g_var[80];
char foo(int a) { return g_var[a]; }

This gets compiled into something like (in pseudo asm):
addi a, g_var
load retreg, a

But I don't have an add-with-immediate instruction, so I have to do the
following:
subi a, -g_var // negate g_var addr
load retreg, a

One solution I thought of is passing a target flag indicating that a negation
is needed when lowering the MachineInstr into an MCInst, and adding an MCExpr
to negate the symbol. But I want to know if there's a better way to do this,
instead of delaying it to the MCInst lowering stage.

The other question is how to fold single and complex operations on symbols.
Say we have something like:

unsigned int g_var[80];
unsigned int foo() { return (unsigned int)&g_var[0] & 0x1234; }

Currently this moves the g_var address into a register and then performs the
and operation, but I want this to be done at compilation time, so that we get
something like:

move retreg, (g_var & 0x1234)

Without touching anything else, only additions get folded, but this could be
extended to other operations such as or, xor, shifts, etc. A more complex case
would be combining several operations in a single statement. So my question
is how to achieve this. One idea is to use a pseudo instruction that takes an
operand depending on the operation to fold, expand this pseudo instruction
into the real move instruction while setting a target flag that identifies the
operation, and then, in the MCInst lowering stage, create an MCExpr based on
these flags. The problem with this approach is that it can't handle more than
one operation per statement.

Thanks

Jim Grosbach

2011-Apr-26 17:18 UTC

Hello,

On Apr 26, 2011, at 6:30 AM, Borja Ferrer wrote:
> Hello, I have some questions regarding folding operations with symbols
> during the instruction print stage with MC. At the moment I'm working with
> global symbols but i guess that other symbol types should be equivalent.
> 
> My first question is how can i negate the address of a symbol?
> 
> Consider this piece of code:
> char g_var[80];
> char foo(int a) { return g_var[a]; }
> 
> this gets compiles into something like (in pseudo asm):
> addi a, g_var
> load retreg, a
> 
> but i dont have an add with immediate instruction so i have to do the
> following
> subi a, -g_var // negate g_var addr
> load retreg, a
> 
> A solution I thought could be passing a target flag indicating that a
> negation is needed when lowering the machineinstr into a MCInst, and adding a
> MCExpr to negate the symbol. But I want to know if there's a better way to
> do this, instead of delaying it to the stage of MCInst lowering.
> 
These sorts of constraints are normally enforced prior to lowering to MC.
Doing them directly as part of instruction selection as much as possible is good
(the ARM target has examples of this for using ADD/SUB immediate instructions).
For example, don't express in the target .td file(s) that you have an
add-immediate instruction if you actually don't, but do add patterns for the
operation using the subtract-immediate instruction. For symbolic immediate
references, you're correct that the expression on the operand will include
the negation.

MC is designed such that it should always represent legal instructions, and only
legal instructions. That includes things like register operands being legal for
the instruction, immediates being in range, etc. There's (currently) no
verification pass for those constraints, but that's the idea, so waiting
until after MC lowering to check for and transform the instructions is not
preferable and is likely to break if/when we add such a verification pass.

If your target has properties that make it impossible to do this at instruction
selection time, I would suggest a late machine function pass that will scan for
and transform the instructions as necessary. This would all be at the
MachineInstr level before lowering to MC.
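
As a rough illustration of the "scan and transform" idea, here is a minimal, self-contained C++ sketch. It does not use LLVM's MachineFunctionPass or MachineInstr classes; `ToyInstr`, `Opcode` and `rewriteAddToSub` are hypothetical names invented for this example, and the immediates are plain integers rather than symbol references.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; NOT LLVM's MachineInstr.
enum class Opcode { ADDI, SUBI, LOAD };

struct ToyInstr {
  Opcode op;
  std::string reg;
  std::int32_t imm;  // ignored for LOAD
};

// Scan the function body and rewrite every ADDI into a SUBI with the
// negated immediate. Returns the number of instructions rewritten.
// Assumption: immediates are never INT32_MIN, so negation cannot overflow.
inline int rewriteAddToSub(std::vector<ToyInstr> &body) {
  int changed = 0;
  for (ToyInstr &mi : body) {
    if (mi.op != Opcode::ADDI)
      continue;
    mi.op = Opcode::SUBI;
    mi.imm = -mi.imm;
    ++changed;
  }
  return changed;
}
```

In a real backend the same loop would walk basic blocks and rebuild operands, but the shape (find the illegal form, replace it with the legal one before MC lowering) would be similar.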

> The other questions is how to fold single and complex operations on
> symbols, say we have something like:
> 
> unsigned int g_var[80];
> unsigned int foo() { return (unsigned int)&g_var[0] & 0x1234; }
> 
> Currently this moves the g_var address into a register and then performs
> the and operation, but i want this to be done at compilation time, so we have
> something like:
> 
> move retreg, (g_var & 0x1234)
> 
For many targets this isn't legal, as the object file format used can't
represent those sorts of expressions in a relocation. It sounds like your
situation is different, though.
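
To make the relocation point concrete, here is a small sketch assuming the common "symbol plus addend" relocation form, where the final value is computed as address(symbol) + addend. `ToyReloc`, `BinOp` and `foldIntoReloc` are made-up names for this example and do not correspond to any LLVM or object-file API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy relocation record: the final value is address(symbol) + addend.
struct ToyReloc {
  std::string symbol;
  std::int64_t addend;
};

enum class BinOp { Add, Sub, And, Or, Xor };

// Try to express "symbol <op> constant" as one symbol+addend relocation.
// Returns true and fills `out` on success. Only Add and Sub fit this form;
// the bitwise operations have nowhere to go, so they are rejected.
inline bool foldIntoReloc(const std::string &sym, BinOp op, std::int64_t c,
                          ToyReloc &out) {
  switch (op) {
  case BinOp::Add:
    out = ToyReloc{sym, c};
    return true;
  case BinOp::Sub:
    out = ToyReloc{sym, -c};
    return true;
  default:
    return false;
  }
}
```

Under this model `g_var + 4` becomes the pair (g_var, 4), while `g_var & 0x1234` cannot be encoded at all, which is consistent with only additions being folded by default.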
> Without touching anything else only additions get folded, but this could be
> expanded into other operations like or, xor, shifts, etc.. A more complex case
> would be combining operations in a single statement. So my question is how to
> achieve this. As an idea I've thought of using a pseudo instruction that
> takes an operand depending of the instruction to fold, then expand this pseudo
> instr into the real move instruction by setting a target flag depending on the
> operation to fold, and in the MCInst lower stage create a MCExpr depending on
> these flags, but this has the problem that it can't handle more than one
> operation per statement.

A custom lowering or a target DAG combine would likely be your best bet.

Regards,
  Jim

Borja Ferrer

2011-Apr-26 20:27 UTC

Hello Jim, thanks for the reply.

For normal additions with immediates I've done the same as ARM does,
basically transforming add(x, imm) nodes into sub(x, -imm) with a pattern in
the .td file like this:
def : Pat<(add DLDREGS:$src1, imm:$src2),
              (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>;
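
The arithmetic behind this pattern can be checked with a small C++ sketch. It assumes 16-bit wraparound registers (inferred only from the name imm16_neg_XFORM); `negateImm16`, `add16` and `sub16` are hypothetical helpers written for this example, not part of any backend.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement negation of a 16-bit immediate, i.e. what a transform
// like imm16_neg_XFORM is assumed to compute here.
inline std::uint16_t negateImm16(std::uint16_t imm) {
  return static_cast<std::uint16_t>(0u - imm);
}

// Toy 16-bit ALU operations with wraparound.
inline std::uint16_t add16(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a + b);
}

inline std::uint16_t sub16(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a - b);
}
```

For any x and imm, `sub16(x, negateImm16(imm))` gives the same result as `add16(x, imm)`, which is why the subtract-immediate instruction can stand in for the missing add-immediate.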

Now, the typical pattern concerning additions with global addresses looks
like this: (taken from x86)
def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr:$src2)),
              (ADD32ri GR32:$src1, tglobaladdr:$src2)>;

But I can't write that since I don't have an add-with-immediate instruction, and doing:

def : Pat<(add DREGS:$src, (Wrapper tglobaladdr:$src2)),
              (SUBIWRdK DREGS:$src, tglobaladdr:$src2)>;
is wrong because the tglobaladdr has to be negated somehow. I don't
understand how I should negate the symbol reference using patterns, if it's
even possible. The obvious hack is adding a "-" character when lowering the
symbol reference into text.

Regarding my second question: as you mentioned, my situation is different. All
symbols have static addresses and no relocations are performed, so it should
be safe to fold immediate operations into the symbol reference. My problem is
that I don't know how to fold an arbitrary expression on a global (initially
in the form of a DAG) into something that can later be translated into an MC
expression. It's unusual because the operations are performed inside the
operand of an instruction, and since any arbitrary expression has to be
supported, you can't cover all combinations of operations with custom
instructions. So how should I proceed here using custom lowering or target
DAG combines?
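
To illustrate what "an arbitrary expression in an operand" could look like, here is a toy C++ expression tree. It is only loosely analogous to an operand expression and is not LLVM's MCExpr API; `Expr`, `constant`, `symbol`, `unary`, `binary`, `evaluate` and `render` are all names invented for this sketch, and symbol addresses are assumed to be known statically.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Toy expression tree over constants and symbols.
struct Expr {
  enum Kind { Const, Symbol, Neg, And, Or, Xor, Add } kind;
  std::uint32_t value = 0;         // used by Const
  std::string name;                // used by Symbol
  std::shared_ptr<Expr> lhs, rhs;  // Neg uses lhs only
};
using ExprPtr = std::shared_ptr<Expr>;

inline ExprPtr constant(std::uint32_t v) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Const;
  e->value = v;
  return e;
}

inline ExprPtr symbol(const std::string &n) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Symbol;
  e->name = n;
  return e;
}

inline ExprPtr unary(Expr::Kind k, ExprPtr a) {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  e->lhs = a;
  return e;
}

inline ExprPtr binary(Expr::Kind k, ExprPtr a, ExprPtr b) {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  e->lhs = a;
  e->rhs = b;
  return e;
}

// Evaluate with statically known symbol addresses (32-bit wraparound).
inline std::uint32_t evaluate(const Expr &e,
                              const std::map<std::string, std::uint32_t> &addrs) {
  switch (e.kind) {
  case Expr::Const:  return e.value;
  case Expr::Symbol: return addrs.at(e.name);
  case Expr::Neg:    return 0u - evaluate(*e.lhs, addrs);
  case Expr::And:    return evaluate(*e.lhs, addrs) & evaluate(*e.rhs, addrs);
  case Expr::Or:     return evaluate(*e.lhs, addrs) | evaluate(*e.rhs, addrs);
  case Expr::Xor:    return evaluate(*e.lhs, addrs) ^ evaluate(*e.rhs, addrs);
  case Expr::Add:    return evaluate(*e.lhs, addrs) + evaluate(*e.rhs, addrs);
  }
  return 0;  // not reached for valid kinds
}

// Render as operand text; constants are printed in decimal.
inline std::string render(const Expr &e) {
  switch (e.kind) {
  case Expr::Const:  return std::to_string(e.value);
  case Expr::Symbol: return e.name;
  case Expr::Neg:    return "-" + render(*e.lhs);
  default:           break;
  }
  const char *op = e.kind == Expr::And ? " & "
                 : e.kind == Expr::Or  ? " | "
                 : e.kind == Expr::Xor ? " ^ "
                                       : " + ";
  return "(" + render(*e.lhs) + op + render(*e.rhs) + ")";
}
```

Because nodes nest, several operations per statement compose naturally, which a single per-operation target flag cannot do. For example, `binary(Expr::And, symbol("g_var"), constant(0x1234))` renders as `(g_var & 4660)`, and `unary(Expr::Neg, symbol("g_var"))` renders as `-g_var`.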

Thanks
