Alireza.Moshtaghi at microchip.com
2009-Aug-24 19:12 UTC
[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]
Up front I apologize for the lengthy email. Since our patch was not accepted, Chris asked us to follow up with this issue on llvm-dev. For the sake of completeness, let me give a bit of background and the problems that we are facing. I tried to capture as much as possible here so Please do give us feedback. Don't take your stack for granted.... PIC16 has very poor pointer handling, hence stack access are very limited and inefficient; so we do static analysis of function frame size and statically allocate the function frame in memory (including locals, arguments and spills); then, to save memory, we overlay the frame of functions that are not on the same call branch. Consequently, functions are non reentrant. We can sacrifice recursion (because of other limitations in pic16 architecture) but still reentrancy is needed because ISR thread can call same function that is also called from main thread; examples are stdlib functions and intrinsic for math (div/mul/float/etc) Here is what we did - but was rejected and we would like to know which part (if any) we can keep and which part we must change: 1- Root ISR function is defined by user. Since the ISR is located at certain address in program memory, we used the section attribute of function to encode that the function needs to be at a section with fixed address (our in-house linker takes care of the details). We did not add new interrupt attribute because it seemed redundant because we can encode whatever is needed in the section attribute in the front-end. 2- Added two passes at the end of llvm-ld. First pass does call graph analysis to distinguish ISR thread from main thread and add few more pieces of info to section attribute of the function to indicate whether it is called from main thread or ISR thread; if called from both, the function has to be cloned. To clone, code and data must be cloned. Not having stack, and accessing local variables through their symbolic reference, we need to also clone the local variables and modify the cloned function to use different naming convention for local variables (there are other ways to do this also but in practice, that requires two different frameIndexes for one function one for locals and the other for args and spills). So after this pass, the main thread and ISR thread are completely disjoint; this takes care of stdlib functions as well because they are defined in .bc or .a library files (there is a caveat on function pointers that I'm going to explain later). The second pass does simple function coloring to identify which function frames can be overlaid. Color info is also added to section attribute. 3- Code generation is mostly similar for main vs ISR threads except for function interface of root ISR; but the most important problem is the intrinsic functions for non-native operations such as mul/div and floating point. So while generating code for these calls, we use different naming conventions depending on whether we are generating code for a function on ISR-thread or main-thread (this information is in the section attribute of the function). Two versions of intrinsics exist in the pic16 specific library and the linker will use according to naming convention. 4- There is a bit of cleanup left to be done in AsmPrinter and we are good to go. 5- Handling of function pointers is not implemented yet. But since we don't have dynamic linking, through alias analysis we should be able to find targets of each pointer and when taking address of these functions, depending on whether it is in isr or main thread, we can resolve the problem efficiently. This can be done in a separate pass; the only thing that we have to disallow is calling same function pointer from both the ISR and main threads. 6- Handling of varargs requires that a separate frame be created for arguments for each call site with varargs. This is going to be handled while codegen and by the overlay algorithm to save memory. If you have come this far, thank you for being interested in my brain dysfunction. And I am really interested to know your input. -Ali> >> We should discuss this on llvmdev, I think it came up before butthere> >> was no conclusive plan that was proposed. > >> > > The approach that we thought for PIC16 can be described in a > > single line as below. > > > > "Keep the functions called from ISR and main separate by cloning the > > shared ones". > > > > > >> In short, I think that this sort of thing should be modeled as an > >> attribute on function definitions, not declarations. I don't > >> understand why you need to clone entire call graphs, but this isbest> >> discussed on llvmdev. > >> > > It isn't cloning the entire call graph. It's cloning only the shared > > ones as mentioned above. > > > > The information that a function is ISR is encoded in the sectionstring> > for the lack of a better attribute. > > > > Plus we also wanted to encode some information to inform codegenabout> > whether we are codegenerating an interrupt line function. Again the > > section string is used to encode the same. Probably annotationsmight be> > a better place but that does not work currently (and also, accessing > > them slower). This information is required so that the codegen canemit> > calls to appropriate intrinsics. > > > > So, IMO, the approach is simple, but probably the implementationwasn't> > clean. > > > > Ideas welcome. > >
Hi Ali, Thanks for bringing this up. You're definitely under very tight design constraints from the hardware. I can certainly sympathize. I think two design elements are being conflated here, and it would be worthwhile splitting them out. For correctness, you need to make sure any routines called from an ISR don't clobber equivalent routines called from mainline code. For efficiency, you want to overlay the static stack frames of functions as much as possible when you can prove those frames do not interfere. I believe you can solve these problems orthogonally with a bit of fiddling. Parallel call trees for the ISR and the mainline have the following primary components: * Constructing the parallel trees themselves such that there's no overlap * Keeping things straight in the presence of user-written assembly * Indirect calls (function pointers) To straightforwardly create parallel call trees, one for the mainline and one for the ISR, I believe it will be best to create two versions of every function that's compiled very early in the process, before any lowering of frames to static addresses or anything of that sort is done. The ISR-tree functions should be flagged as such and have their names mangled to reflect. Any call instructions in the ISR-tree function should be modified to call the equivalent ISR-tree function of the callee. Thus, with no further analysis, we have disjoint call trees for mainline code and ISR code. We also have a bunch of extra functions laying around, but they are unreachable, so existing optimizations should get rid of them for us. When it comes time to allocate static memory space for the call frames, we simply have to do it twice. Once for the mainline functions, and once for the ISR functions. We can assert that nothing can overlap between the trees since we know that the trees are disjoint except via the ISR itself. Thus, we don't overlay anything from one tree with anything from the other tree. Varargs and should come along for the ride without any special handling since the functions are distinct well before those are lowered to reference static buffers. User assembly throws a big wrench into the works, but it's not irreconcilable. I suggest requiring the user to flag assembly functions as being either for mainline or for ISR code. Doing so via a naming convention would likely tie in the easiest with the function cloning done above. With the convention being sufficiently magic (read: not legal C identifiers so as to avoid collisions), the assembler can issue diagnostics if a call instruction is seen that references a function not in the proper tree. That is, the disjointness of the trees will be enforceable even in user assembly with a bit of cooperation from the assembler. Function pointers are where things get fun. To do these, we need to determine at run time whether we need to call the ISR or the mainline version of a function, and since the same pointer can be dereferenced from both trees, and we're not guaranteed the pointer assignment will take place in the same tree, we can't rely on static analysis to figure out which of the two versions to reference. We need to defer 'til run time, and do so in a way that's not too horribly inefficient. We would likewise prefer not to have function pointers become unmanageably large, since they already essentially need to carry around a pointer to the function itself and a pointer to the argument frame for that function. Doubling that is not reasonable. If we use descriptor handles, however, we can reduce the load of what we're passing around at runtime to a single value. We build a list of functions whose addresses are taken, and create a descriptor table for them. Eventually, the descriptor handle will be relocated to the actual address of the proper half of the descriptor (since the invocation point knows at compile time which tree it's in). Conceptually, something like this. Depending on what level of program memory access you have (sorry, hazy memory about what the enhanced PIC16 was or wasn't going to do about that), a trickier actual implementation may be better. struct fptr_descriptor { void (*main_tree_fn)(); void *main_tree_args; void (*isr_tree_fn)(); void *isr_tree_args; }; Each translation unit builds up a table of these, with function pointers being a relocation indicating the appropriate slot in the table. For global unreachable function removal to work, we just need to consider all functions in this table as being called (both versions). Good analysis should enable even better results, but the simple solution should still be pretty good. All of this will work without any consideration of optimizing call frame allocation. That can be done completely separately without interfering (sorry for the pun) with any of these bits, including the function pointers, since we've kept the complete separation of ISR and mainline trees. To summarize, make two copies of each function very early and keep the ISR and mainline trees separate all the way through. Unreachable function removal will get rid of any copies that aren't needed. Function pointers become a descriptor table that reference both copies, and the runtime invokes the proper bit based on which tree the call point is in. Best regards, Jim On Aug 24, 2009, at 12:12 PM, Alireza.Moshtaghi at microchip.com wrote:> > Up front I apologize for the lengthy email. Since our patch was not > accepted, Chris asked us to follow up with this issue on llvm-dev. For > the sake of completeness, let me give a bit of background and the > problems that we are facing. I tried to capture as much as possible > here > so Please do give us feedback. > Don't take your stack for granted.... PIC16 has very poor pointer > handling, hence stack access are very limited and inefficient; so we > do > static analysis of function frame size and statically allocate the > function frame in memory (including locals, arguments and spills); > then, > to save memory, we overlay the frame of functions that are not on the > same call branch. > Consequently, functions are non reentrant. We can sacrifice recursion > (because of other limitations in pic16 architecture) but still > reentrancy is needed because ISR thread can call same function that is > also called from main thread; examples are stdlib functions and > intrinsic for math (div/mul/float/etc) > Here is what we did - but was rejected and we would like to know which > part (if any) we can keep and which part we must change: > 1- Root ISR function is defined by user. Since the ISR is located at > certain address in program memory, we used the section attribute of > function to encode that the function needs to be at a section with > fixed > address (our in-house linker takes care of the details). We did not > add > new interrupt attribute because it seemed redundant because we can > encode whatever is needed in the section attribute in the front-end. > 2- Added two passes at the end of llvm-ld. First pass does call graph > analysis to distinguish ISR thread from main thread and add few more > pieces of info to section attribute of the function to indicate > whether > it is called from main thread or ISR thread; if called from both, the > function has to be cloned. To clone, code and data must be cloned. Not > having stack, and accessing local variables through their symbolic > reference, we need to also clone the local variables and modify the > cloned function to use different naming convention for local variables > (there are other ways to do this also but in practice, that requires > two > different frameIndexes for one function one for locals and the other > for > args and spills). So after this pass, the main thread and ISR thread > are > completely disjoint; this takes care of stdlib functions as well > because > they are defined in .bc or .a library files (there is a caveat on > function pointers that I'm going to explain later). The second pass > does > simple function coloring to identify which function frames can be > overlaid. Color info is also added to section attribute. > 3- Code generation is mostly similar for main vs ISR threads except > for > function interface of root ISR; but the most important problem is the > intrinsic functions for non-native operations such as mul/div and > floating point. So while generating code for these calls, we use > different naming conventions depending on whether we are generating > code > for a function on ISR-thread or main-thread (this information is in > the > section attribute of the function). Two versions of intrinsics exist > in > the pic16 specific library and the linker will use according to naming > convention. > 4- There is a bit of cleanup left to be done in AsmPrinter and we are > good to go. > 5- Handling of function pointers is not implemented yet. But since we > don't have dynamic linking, through alias analysis we should be able > to > find targets of each pointer and when taking address of these > functions, > depending on whether it is in isr or main thread, we can resolve the > problem efficiently. This can be done in a separate pass; the only > thing > that we have to disallow is calling same function pointer from both > the > ISR and main threads. > 6- Handling of varargs requires that a separate frame be created for > arguments for each call site with varargs. This is going to be handled > while codegen and by the overlay algorithm to save memory. > > If you have come this far, thank you for being interested in my brain > dysfunction. And I am really interested to know your input. > > -Ali > > >>>> We should discuss this on llvmdev, I think it came up before but > there >>>> was no conclusive plan that was proposed. >>>> >>> The approach that we thought for PIC16 can be described in a >>> single line as below. >>> >>> "Keep the functions called from ISR and main separate by cloning the >>> shared ones". >>> >>> >>>> In short, I think that this sort of thing should be modeled as an >>>> attribute on function definitions, not declarations. I don't >>>> understand why you need to clone entire call graphs, but this is > best >>>> discussed on llvmdev. >>>> >>> It isn't cloning the entire call graph. It's cloning only the shared >>> ones as mentioned above. >>> >>> The information that a function is ISR is encoded in the section > string >>> for the lack of a better attribute. >>> >>> Plus we also wanted to encode some information to inform codegen > about >>> whether we are codegenerating an interrupt line function. Again the >>> section string is used to encode the same. Probably annotations > might be >>> a better place but that does not work currently (and also, accessing >>> them slower). This information is required so that the codegen can > emit >>> calls to appropriate intrinsics. >>> >>> So, IMO, the approach is simple, but probably the implementation > wasn't >>> clean. >>> >>> Ideas welcome. >>> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> Function pointers are where things get fun. To do these, we need to > determine at run time whether we need to call the ISR or the mainline > version of a functionThis sounds convenient but it may well be overkill. On a PIC-class platform we can probably consider it to be a design flaw if the programmer doesn't know whether a function pointer will be dereferenced from interrupt context or not. This suggests that for any function whose address is taken, there could be a required annotation such as ISR_ONLY or NONISR_ONLY. The compiler could use this to do the right thing without any heroic static analysis or dynamic binding. John
Jim Grosbach wrote:> Hi Ali, > > Thanks for bringing this up. You're definitely under very tight design > constraints from the hardware. I can certainly sympathize.Jim, First of all, thank you very much for understanding every detail of the problem at our hand and coming up with a solution that addresses every aspect of it. IMO, given the constraints, this is probably the best design we can implement. In this email, my queries/observations mostly relate to the implementation details.> To straightforwardly create parallel call trees, one for the mainline > and one for the ISR, I believe it will be best to create two versions > of every function that's compiled very early in the process, before > any lowering of frames to static addresses or anything of that sort is > done.The ISR-tree functions should be flagged as such and have their > names mangled to reflect. Any call instructions in the ISR-tree > function should be modified to call the equivalent ISR-tree function > of the callee.In our earlier patch, we tried to do the same thing by writing an llvm-ld pass. There we could clone the shared functions, their arguments frames and local frames to construct parallel call trees. But now it looks that clang will be a better place to do the same. For that, clang will need the appropriate TargetInfo interfaces to allow precisely the following things: 1. Allow targets to emit an ISR version of a function. 2. Allow targets to detect that they are emitting code for an ISR version, so that call sites/goto sites can be modified. And I assume that the DebugInfo for the renamed functions will automatically fall in place. My first choice would be clang to do it but was just curious to know if there are some inherent limitations with the llvm-ld pass approach?> When it comes time to allocate static memory space for the call > frames, we simply have to do it twice. Once for the mainline > functions, and once for the ISR functions. We can assert that nothing > can overlap between the trees since we know that the trees are > disjoint except via the ISR itself. Thus, we don't overlay anything > from one tree with anything from the other tree.True. To overlay within these two separate trees we used an llvm-ld pass again. Though we overlaid only the leaf nodes in that pass but I think using a runOnSCC we can extend this approach to overlay sections in such a way that a function and any of its predecessor have separate memory areas. Does that approach sound ok?> User assembly throws a big wrench into the works, but it's not > irreconcilable. I suggest requiring the user to flag assembly > functions as being either for mainline or for ISR code. Doing so via a > naming convention would likely tie in the easiest with the function > cloning done above. With the convention being sufficiently magic > (read: not legal C identifiers so as to avoid collisions), the > assembler can issue diagnostics if a call instruction is seen that > references a function not in the proper tree. That is, the > disjointness of the trees will be enforceable even in user assembly > with a bit of cooperation from the assembler.We used this in the AsmPrinter to rename intrinsics called from ISR code. As, codgen currently ties the libcall names in ISelLowering constructors and then doesn't allow changing the name dynamically. So we'll still need to handle the same in AsmPrinter. Thanks - Sanjiv