Jonas Paulsson via llvm-dev
2018-May-01 13:15 UTC
[llvm-dev] IPRA and conditionally reserved registers
Hi Kit,

I see you have been working on IPRA (https://reviews.llvm.org/D45308), and would therefore like to bring up an issue with it that I am looking into on SystemZ (see https://reviews.llvm.org/D46232).

I first realized that %r14, the return address register, must be saved and restored with IPRA enabled, since otherwise the function can't return. It is a callee-saved register, so without IPRA it always gets saved. If that save is omitted under IPRA and the function makes no calls itself, we need a second check to add it back.

A related issue (the topic of this mail) is the frame pointer (%r11). If the caller uses an FP, %r11 becomes reserved and is expected never to be allocated. But if the callee does not have an FP, it is free to allocate %r11. So the Collector / Propagate passes transform the regmask on the call to express that %r11 is clobbered, but the problem is that the register allocator does not care about %r11 in the caller, since it is reserved there. This currently seems to be unhandled, and this is what I would like to ask about.

My first idea was to let the callee always save/restore %r11, since it may be reserved in some caller. As Uli pointed out, that is very conservative, and it also seems to me not to be in agreement with IPRA, where the save/restore is generally done by the caller as much as possible.

So the question is how this should be handled in the caller. I would like to see RegUsageInfoPropagate compare the unmodified regmask with the updated one, and then make sure that any register that is reserved in the current function and clobbered by the call as a result of IPRA (per the updated regmask) is copied to and from a virtual register around that call. This is not being done. Am I missing something here?

So, in short: should these registers be saved/restored in the caller, and if so, how should this be done?

/Jonas

Attached is a test case where this happens on SystemZ.
bin/llc -mcpu=z13 -enable-ipra ./tc_ipra_fp.ll -o out.s

-------------- next part --------------
%0 = type { [3 x i64] }

; Function Attrs: norecurse nounwind
declare dso_local fastcc signext i32 @foo(i16*, i32 signext) unnamed_addr

; Function Attrs: norecurse nounwind
define internal fastcc void @fun1(i16*, i16* nocapture) unnamed_addr #0 {
  %3 = load i16, i16* undef, align 2
  %4 = shl i16 %3, 4
  %5 = tail call fastcc signext i32 @foo(i16* nonnull %0, i32 signext 5)
  %6 = or i16 0, %4
  %7 = or i16 %6, 0
  store i16 %7, i16* undef, align 2
  %8 = getelementptr inbounds i16, i16* %0, i64 5
  %9 = load i16, i16* %8, align 2
  store i16 %9, i16* %1, align 2
  ret void
}

; Function Attrs: nounwind
define fastcc void @fun0(i8* nocapture readonly, i16* nocapture, i32 signext) unnamed_addr {
  %4 = alloca i8, i64 undef, align 8
  call fastcc void @fun1(i16* nonnull undef, i16* %1)
  ret void
}

attributes #0 = { norecurse nounwind "no-frame-pointer-elim"="false" }
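
The regmask comparison proposed in the mail could be sketched roughly as follows. This is a stand-alone illustration, not LLVM code: the names `RegMask`, `isPreserved`, and `reservedRegsNewlyClobbered` are hypothetical, and so is the use of 11 as the register number for %r11. The only convention borrowed from LLVM is that a set bit in a register mask means the register is preserved across the call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone model of a call's register mask (not the LLVM API).
// As in LLVM regmasks, a set bit means "preserved across the call".
using RegMask = std::vector<uint32_t>;

// Returns true if Reg is marked as preserved in Mask.
bool isPreserved(const RegMask &Mask, unsigned Reg) {
  return (Mask[Reg / 32] >> (Reg % 32)) & 1u;
}

// Returns the registers that are reserved in the caller, were preserved
// according to the unmodified calling-convention mask, and are clobbered
// according to the IPRA-updated mask. These are the registers the caller
// would have to save and restore around the call.
std::vector<unsigned>
reservedRegsNewlyClobbered(const RegMask &OrigMask, const RegMask &IPRAMask,
                           const std::vector<unsigned> &ReservedRegs) {
  assert(OrigMask.size() == IPRAMask.size() && "masks must cover the same registers");
  std::vector<unsigned> Result;
  for (unsigned Reg : ReservedRegs) {
    assert(Reg / 32 < OrigMask.size() && "register number out of range");
    if (isPreserved(OrigMask, Reg) && !isPreserved(IPRAMask, Reg))
      Result.push_back(Reg);
  }
  return Result;
}
```

In the attached test case the interesting register would be %r11: reserved in the caller because it has a frame pointer, but marked clobbered in the updated regmask because the callee is free to allocate it.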