On Jul 26, 2009, at 3:42 AM, Sanjiv Gupta wrote:

>> What happens with functions that are called both inside and outside
>> ISR context? Do you have to codegen two copies of those?
>
> Yes. That's precisely what we are trying to achieve in llvm-ld.
> But the problems don't end there, as llvm-ld doesn't have any idea of
> libcalls (they're generated in llc) and they could also be called from
> both places.

If you have to generate two copies of the function with different entrypoints, the *front-end* should handle the duplication. This is just like C++ constructors.

One really old patch that apple guys experimented in the past was a "slow and fast call" attribute, which you could stick on function declarations. If you added it to a function, the frontend would generate an entry point with a standard calling convention as well as one with a faster in-register ABI. Direct calls would use the fast entry point, but if you took the address, you'd get the address of the normal one.

All of this was handled by the front-end, and works fine. I think the patch eventually got ripped out of the compiler for other reasons though.

-Chris
Alireza.Moshtaghi at microchip.com
2009-Jul-27 17:03 UTC
[LLVMdev] LLVM and Interrupt Service Routines.
> If you have to generate two copies of the function with different
> entrypoints, the *front-end* should handle the duplication. This is
> just like C++ constructors.
>
> One really old patch that apple guys experimented in the past was a
> "slow and fast call" attribute, which you could stick on function
> declarations. If you added it to a function, the frontend would
> generate an entry point with a standard calling convention as well as
> one with a faster in-register ABI. Direct calls would use the fast
> entry point, but if you took the address, you'd get the address of the
> normal one.
>
> All of this was handled by the front-end, and works fine. I think the
> patch eventually got ripped out of the compiler for other reasons
> though.

I see 2 problems with this approach:

1 - Front end does not know which functions need to be cloned. The cloned functions are the ones that are called both from ISR thread and main thread; and this information is not available until all compilation units are merged. We collect this information in a pass at the end of llvm-ld.

2 - Based on what you propose, for example the ISR thread would need to make indirect call to cloned function; and main thread make direct call to original function. Apart from the performance issue on PIC16 in relation to indirect function calls, if the main thread already makes indirect call to the function, then the same function would be called from both threads, and the original problem is still there.

A.
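The whole-program information described in point 1 can be sketched as a reachability computation over the merged call graph. This is only an illustration under stated assumptions, not the actual llvm-ld pass: the call graph is a hypothetical dictionary of direct calls, and the function names (`main`, `timer_isr`, `scale`, `mul16`, and so on) are invented for the example. Indirect calls and libcalls introduced later in llc would not appear in such a graph.

```python
from collections import deque


def reachable(call_graph, root):
    """Return every function reachable from root through direct calls."""
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def functions_to_clone(call_graph, main_root, isr_root):
    """Functions called from both the main thread and the ISR thread."""
    from_main = reachable(call_graph, main_root) - {main_root}
    from_isr = reachable(call_graph, isr_root) - {isr_root}
    return from_main & from_isr


# Hypothetical call graph, available only after all compilation units
# have been merged. Only direct calls are modelled here.
call_graph = {
    "main": ["init", "scale"],
    "timer_isr": ["read_adc", "scale"],
    "scale": ["mul16"],
    "init": [],
    "read_adc": [],
    "mul16": [],
}

shared = functions_to_clone(call_graph, "main", "timer_isr")
print(sorted(shared))
```

In this toy graph, `scale` and its callee `mul16` are reachable from both roots, so they are the candidates for cloning. No single compilation unit could see that on its own.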
Alireza.Moshtaghi at microchip.com wrote:

>> If you have to generate two copies of the function with different
>> entrypoints, the *front-end* should handle the duplication. This is
>> just like C++ constructors.
>>
>> One really old patch that apple guys experimented in the past was a
>> "slow and fast call" attribute, which you could stick on function
>> declarations. If you added it to a function, the frontend would
>> generate an entry point with a standard calling convention as well as
>> one with a faster in-register ABI. Direct calls would use the fast
>> entry point, but if you took the address, you'd get the address of the
>> normal one.
>>
>> All of this was handled by the front-end, and works fine. I think the
>> patch eventually got ripped out of the compiler for other reasons
>> though.
>
> I see 2 problems with this approach:
> 1 - Front end does not know which functions need to be cloned. The
> cloned functions are the ones that are called both from ISR thread and
> main thread; and this information is not available until all compilation
> units are merged. We collect this information in a pass at the end of
> llvm-ld.
> 2- Based on what you propose, for example the ISR thread would need to
> make indirect call to cloned function; and main thread make direct call
> to original function. Apart from the performance issue on PIC16 in
> relation to indirect function calls, if the main thread already makes
> indirect call to the function, then the same function would be called
> from both threads, and the original problem is still there
>
> A.

And the third problem, which I mentioned earlier, is about intrinsics (libcalls), because the front-end won't have any idea of those.

- Sanjiv

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Sun, Jul 26, 2009 at 1:39 PM, Chris Lattner <clattner at apple.com> wrote:

> One really old patch that apple guys experimented in the past was a
> "slow and fast call" attribute, which you could stick on function
> declarations. If you added it to a function, the frontend would
> generate an entry point with a standard calling convention as well as
> one with a faster in-register ABI. Direct calls would use the fast
> entry point, but if you took the address, you'd get the address of the
> normal one.
>
> All of this was handled by the front-end, and works fine. I think the
> patch eventually got ripped out of the compiler for other reasons
> though.

This is a very old idea indeed. At one point I hacked up an LLVM/ppc build to use an r2 based calling convention (similar to AIX or Mac OS 9) and then shove some glue to form externalized entry points that were compatible with OS X calling conventions. I wish I could say I came up with that idea, but it is entirely based on the CFM-68k Mac OS calling conventions for indirect code <http://devworld.apple.com/documentation/mac/runtimehtml/RTArch-33.html#HEADING33-0>.

I did it purely in the backend, but I built all functions using the internal convention and generated slow stubs for all of them. I then just mangled the entry point names to use the internal entry points if they were in my compilation unit. Since the stub was only ~4 instructions per entry point on PPC, that was not a big deal on something like a G5.

PIC16s have a lot less RAM to work with and it probably requires substantially different code, not just a small stub. Conceivably one could generate ISR and non-ISR versions of every function and then track which ones you need and throw out the rest, but that seems pretty heavy handed.

Louis
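The "generate both versions, keep what you need" idea from the last paragraph can be modelled as context-tagged reachability. This is a rough sketch under assumptions, not anything PIC16-specific: each function conceptually exists as a main-context copy and an ISR-context copy, and only the copies reachable from the matching root survive. The call graph, the function names, and the `.isr` suffix are all hypothetical, and indirect calls are ignored.

```python
def keep_needed_versions(call_graph, main_root, isr_root):
    """Return the (function, context) copies that are actually reachable.

    Conceptually every function has a "main" copy and an "isr" copy.
    A callee inherits the context of its caller, so a copy is kept only
    when it is reachable from the root of the same context.
    """
    kept = set()
    worklist = [(main_root, "main"), (isr_root, "isr")]
    while worklist:
        version = worklist.pop()
        if version in kept:
            continue
        kept.add(version)
        func, context = version
        for callee in call_graph.get(func, ()):
            worklist.append((callee, context))
    return kept


def symbol_name(func, context):
    """Hypothetical mangling: ISR-context copies get a '.isr' suffix."""
    return func if context == "main" else func + ".isr"


# Hypothetical whole-program call graph (direct calls only).
call_graph = {
    "main": ["init", "scale"],
    "timer_isr": ["read_adc", "scale"],
    "scale": ["mul16"],
    "init": [],
    "read_adc": [],
    "mul16": [],
}

kept = keep_needed_versions(call_graph, "main", "timer_isr")
symbols = {symbol_name(func, context) for func, context in kept}
print(len(symbols), "of", 2 * len(call_graph), "possible copies kept")
```

Here only `scale` and `mul16` end up with two surviving copies. Every other function keeps just the copy for the context it is actually called from.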