thr3ads.net - llvm dev - [LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...] [Aug 2009]

If this information is useful, please help other people find it:
Share via:

Alireza.Moshtaghi at microchip.com

2009-Aug-24 19:12 UTC

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

Up front I apologize for the lengthy email. Since our patch was not
accepted, Chris asked us to follow up with this issue on llvm-dev. For
the sake of completeness, let me give a bit of background and the
problems that we are facing. I tried to capture as much as possible here
so Please do give us feedback. 
Don't take your stack for granted.... PIC16 has very poor pointer
handling, hence stack access are very limited and inefficient; so we do
static analysis of function frame size and statically allocate the
function frame in memory (including locals, arguments and spills); then,
to save memory, we overlay the frame of functions that are not on the
same call branch. 
Consequently, functions are non reentrant. We can sacrifice recursion
(because of other limitations in pic16 architecture) but still
reentrancy is needed because ISR thread can call same function that is
also called from main thread; examples are stdlib functions and
intrinsic for math (div/mul/float/etc)
Here is what we did - but was rejected and we would like to know which
part (if any) we can keep and which part we must change:
1- Root ISR function is defined by user. Since the ISR is located at
certain address in program memory, we used the section attribute of
function to encode that the function needs to be at a section with fixed
address (our in-house linker takes care of the details). We did not add
new interrupt attribute because it seemed redundant because we can
encode whatever is needed in the section attribute in the front-end.
2- Added two passes at the end of llvm-ld. First pass does call graph
analysis to distinguish ISR thread from main thread and add few more
pieces of info to section attribute of the function to indicate whether
it is called from main thread or ISR thread; if called from both, the
function has to be cloned. To clone, code and data must be cloned. Not
having stack, and accessing local variables through their symbolic
reference, we need to also clone the local variables and modify the
cloned function to use different naming convention for local variables
(there are other ways to do this also but in practice, that requires two
different frameIndexes for one function one for locals and the other for
args and spills). So after this pass, the main thread and ISR thread are
completely disjoint; this takes care of stdlib functions as well because
they are defined in .bc or .a library files (there is a caveat on
function pointers that I'm going to explain later). The second pass does
simple function coloring to identify which function frames can be
overlaid. Color info is also added to section attribute.
3- Code generation is mostly similar for main vs ISR threads except for
function interface of root ISR; but the most important problem is the
intrinsic functions for non-native operations such as mul/div and
floating point. So while generating code for these calls, we use
different naming conventions depending on whether we are generating code
for a function on ISR-thread or main-thread (this information is in the
section attribute of the function). Two versions of intrinsics exist in
the pic16 specific library and the linker will use according to naming
convention.
4- There is a bit of cleanup left to be done in AsmPrinter and we are
good to go.
5- Handling of function pointers is not implemented yet. But since we
don't have dynamic linking, through alias analysis we should be able to
find targets of each pointer and when taking address of these functions,
depending on whether it is in isr or main thread, we can resolve the
problem efficiently. This can be done in a separate pass; the only thing
that we have to disallow is calling same function pointer from both the
ISR and main threads.
6- Handling of varargs requires that a separate frame be created for
arguments for each call site with varargs. This is going to be handled
while codegen and by the overlay algorithm to save memory.

If you have come this far, thank you for being interested in my brain
dysfunction. And I am really interested to know your input.

-Ali

> >> We should discuss this on llvmdev, I think it came up before but
there> >> was no conclusive plan that was proposed.
> >>
> > The approach that we thought for PIC16 can be described in a
> > single line as below.
> >
> > "Keep the functions called from ISR and main separate by cloning
the
> > shared ones".
> >
> >
> >> In short, I think that this sort of thing should be modeled as an
> >> attribute on function definitions, not declarations.  I don't
> >> understand why you need to clone entire call graphs, but this is
best> >> discussed on llvmdev.
> >>
> > It isn't cloning the entire call graph. It's cloning only the
shared
> > ones as mentioned above.
> >
> > The information that a function is ISR is encoded in the section
string> > for the lack of a better attribute.
> >
> > Plus we also wanted to encode some information to inform codegen
about> > whether we are codegenerating an interrupt line function. Again the
> > section string is used to encode the same. Probably annotations
might be> > a better place but that does not work currently (and also, accessing
> > them slower). This information is required so that the codegen can
emit> > calls to appropriate intrinsics.
> >
> > So, IMO, the approach is simple, but probably the implementation
wasn't> > clean.
> >
> > Ideas welcome.
> >

Jim Grosbach

2009-Aug-25 04:24 UTC

head link

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

Hi Ali,

Thanks for bringing this up. You're definitely under very tight design  
constraints from the hardware. I can certainly sympathize.

I think two design elements are being conflated here, and it would be  
worthwhile splitting them out. For correctness, you need to make sure  
any routines called from an ISR don't clobber equivalent routines  
called from mainline code. For efficiency, you want to overlay the  
static stack frames of functions as much as possible when you can  
prove those frames do not interfere. I believe you can solve these  
problems orthogonally with a bit of fiddling.

Parallel call trees for the ISR and the mainline have the following  
primary components:
*  Constructing the parallel trees themselves such that there's no  
overlap
*  Keeping things straight in the presence of user-written assembly
*  Indirect calls (function pointers)

To straightforwardly create parallel call trees, one for the mainline  
and one for the ISR, I believe it will be best to create two versions  
of every function that's compiled very early in the process, before  
any lowering of frames to static addresses or anything of that sort is  
done. The ISR-tree functions should be flagged as such and have their  
names mangled to reflect. Any call instructions in the ISR-tree  
function should be modified to call the equivalent ISR-tree function  
of the callee. Thus, with no further analysis, we have disjoint call  
trees for mainline code and ISR code. We also have a bunch of extra  
functions laying around, but they are unreachable, so existing  
optimizations should get rid of them for us.

When it comes time to allocate static memory space for the call  
frames, we simply have to do it twice. Once for the mainline  
functions, and once for the ISR functions. We can assert that nothing  
can overlap between the trees since we know that the trees are  
disjoint except via the ISR itself. Thus, we don't overlay anything  
from one tree with anything from the other tree. Varargs and should  
come along for the ride without any special handling since the  
functions are distinct well before those are lowered to reference  
static buffers.

User assembly throws a big wrench into the works, but it's not  
irreconcilable. I suggest requiring the user to flag assembly  
functions as being either for mainline or for ISR code. Doing so via a  
naming convention would likely tie in the easiest with the function  
cloning done above. With the convention being sufficiently magic  
(read: not legal C identifiers so as to avoid collisions), the  
assembler can issue diagnostics if a call instruction is seen that  
references a function not in the proper tree. That is, the  
disjointness of the trees will be enforceable even in user assembly  
with a bit of cooperation from the assembler.

Function pointers are where things get fun. To do these, we need to  
determine at run time whether we need to call the ISR or the mainline  
version of a function, and since the same pointer can be dereferenced  
from both trees, and we're not guaranteed the pointer assignment will  
take place in the same tree, we can't rely on static analysis to  
figure out which of the two versions to reference. We need to defer  
'til run time, and do so in a way that's not too horribly inefficient.  
We would likewise prefer not to have function pointers become  
unmanageably large, since they already essentially need to carry  
around a pointer to the function itself and a pointer to the argument  
frame for that function. Doubling that is not reasonable. If we use  
descriptor handles, however, we can reduce the load of what we're  
passing around at runtime to a single value. We build a list of  
functions whose addresses are taken, and create a descriptor table for  
them. Eventually, the descriptor handle will be relocated to the  
actual address of the proper half of the descriptor (since the  
invocation point knows at compile time which tree it's in).

Conceptually, something like this. Depending on what level of program  
memory access you have (sorry, hazy memory about what the enhanced  
PIC16 was or wasn't going to do about that), a trickier actual  
implementation may be better.
struct fptr_descriptor {
   void (*main_tree_fn)();
   void *main_tree_args;
   void (*isr_tree_fn)();
   void *isr_tree_args;
};

Each translation unit builds up a table of these, with function  
pointers being a relocation indicating the appropriate slot in the  
table. For global unreachable function removal to work, we just need  
to consider all functions in this table as being called (both  
versions). Good analysis should enable even better results, but the  
simple solution should still be pretty good.

All of this will work without any consideration of optimizing call  
frame allocation. That can be done completely separately without  
interfering (sorry for the pun) with any of these bits, including the  
function pointers, since we've kept the complete separation of ISR and  
mainline trees.

To summarize, make two copies of each function very early and keep the  
ISR and mainline trees separate all the way through. Unreachable  
function removal will get rid of any copies that aren't needed.  
Function pointers become a descriptor table that reference both  
copies, and the runtime invokes the proper bit based on which tree the  
call point is in.

Best regards,

   Jim

On Aug 24, 2009, at 12:12 PM, Alireza.Moshtaghi at microchip.com wrote:
>
> Up front I apologize for the lengthy email. Since our patch was not
> accepted, Chris asked us to follow up with this issue on llvm-dev. For
> the sake of completeness, let me give a bit of background and the
> problems that we are facing. I tried to capture as much as possible  
> here
> so Please do give us feedback.
> Don't take your stack for granted.... PIC16 has very poor pointer
> handling, hence stack access are very limited and inefficient; so we  
> do
> static analysis of function frame size and statically allocate the
> function frame in memory (including locals, arguments and spills);  
> then,
> to save memory, we overlay the frame of functions that are not on the
> same call branch.
> Consequently, functions are non reentrant. We can sacrifice recursion
> (because of other limitations in pic16 architecture) but still
> reentrancy is needed because ISR thread can call same function that is
> also called from main thread; examples are stdlib functions and
> intrinsic for math (div/mul/float/etc)
> Here is what we did - but was rejected and we would like to know which
> part (if any) we can keep and which part we must change:
> 1- Root ISR function is defined by user. Since the ISR is located at
> certain address in program memory, we used the section attribute of
> function to encode that the function needs to be at a section with  
> fixed
> address (our in-house linker takes care of the details). We did not  
> add
> new interrupt attribute because it seemed redundant because we can
> encode whatever is needed in the section attribute in the front-end.
> 2- Added two passes at the end of llvm-ld. First pass does call graph
> analysis to distinguish ISR thread from main thread and add few more
> pieces of info to section attribute of the function to indicate  
> whether
> it is called from main thread or ISR thread; if called from both, the
> function has to be cloned. To clone, code and data must be cloned. Not
> having stack, and accessing local variables through their symbolic
> reference, we need to also clone the local variables and modify the
> cloned function to use different naming convention for local variables
> (there are other ways to do this also but in practice, that requires  
> two
> different frameIndexes for one function one for locals and the other  
> for
> args and spills). So after this pass, the main thread and ISR thread  
> are
> completely disjoint; this takes care of stdlib functions as well  
> because
> they are defined in .bc or .a library files (there is a caveat on
> function pointers that I'm going to explain later). The second pass  
> does
> simple function coloring to identify which function frames can be
> overlaid. Color info is also added to section attribute.
> 3- Code generation is mostly similar for main vs ISR threads except  
> for
> function interface of root ISR; but the most important problem is the
> intrinsic functions for non-native operations such as mul/div and
> floating point. So while generating code for these calls, we use
> different naming conventions depending on whether we are generating  
> code
> for a function on ISR-thread or main-thread (this information is in  
> the
> section attribute of the function). Two versions of intrinsics exist  
> in
> the pic16 specific library and the linker will use according to naming
> convention.
> 4- There is a bit of cleanup left to be done in AsmPrinter and we are
> good to go.
> 5- Handling of function pointers is not implemented yet. But since we
> don't have dynamic linking, through alias analysis we should be able  
> to
> find targets of each pointer and when taking address of these  
> functions,
> depending on whether it is in isr or main thread, we can resolve the
> problem efficiently. This can be done in a separate pass; the only  
> thing
> that we have to disallow is calling same function pointer from both  
> the
> ISR and main threads.
> 6- Handling of varargs requires that a separate frame be created for
> arguments for each call site with varargs. This is going to be handled
> while codegen and by the overlay algorithm to save memory.
>
> If you have come this far, thank you for being interested in my brain
> dysfunction. And I am really interested to know your input.
>
> -Ali
>
>
>>>> We should discuss this on llvmdev, I think it came up before
but
> there
>>>> was no conclusive plan that was proposed.
>>>>
>>> The approach that we thought for PIC16 can be described in a
>>> single line as below.
>>>
>>> "Keep the functions called from ISR and main separate by
cloning the
>>> shared ones".
>>>
>>>
>>>> In short, I think that this sort of thing should be modeled as
an
>>>> attribute on function definitions, not declarations.  I
don't
>>>> understand why you need to clone entire call graphs, but this
is
> best
>>>> discussed on llvmdev.
>>>>
>>> It isn't cloning the entire call graph. It's cloning only
the shared
>>> ones as mentioned above.
>>>
>>> The information that a function is ISR is encoded in the section
> string
>>> for the lack of a better attribute.
>>>
>>> Plus we also wanted to encode some information to inform codegen
> about
>>> whether we are codegenerating an interrupt line function. Again the
>>> section string is used to encode the same. Probably annotations
> might be
>>> a better place but that does not work currently (and also,
accessing
>>> them slower). This information is required so that the codegen can
> emit
>>> calls to appropriate intrinsics.
>>>
>>> So, IMO, the approach is simple, but probably the implementation
> wasn't
>>> clean.
>>>
>>> Ideas welcome.
>>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

John Regehr

2009-Aug-25 15:59 UTC

head link

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

> Function pointers are where things get fun. To do these, we need to
> determine at run time whether we need to call the ISR or the mainline
> version of a function
This sounds convenient but it may well be overkill.

On a PIC-class platform we can probably consider it to be a design flaw if 
the programmer doesn't know whether a function pointer will be 
dereferenced from interrupt context or not.  This suggests that for any 
function whose address is taken, there could be a required annotation such 
as ISR_ONLY or NONISR_ONLY.  The compiler could use this to do the right 
thing without any heroic static analysis or dynamic binding.

John

Sanjiv Gupta

2009-Aug-26 18:40 UTC

head link

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

Jim Grosbach wrote:> Hi Ali,
>
> Thanks for bringing this up. You're definitely under very tight design 
> constraints from the hardware. I can certainly sympathize.Jim,
First of all, thank you very much for understanding every detail of the 
problem at our hand and coming up with a solution that addresses every 
aspect of it. IMO, given the constraints, this is probably the best 
design we can implement. In this email, my queries/observations mostly 
relate to the implementation details.> To straightforwardly create parallel call trees, one for the mainline 
> and one for the ISR, I believe it will be best to create two versions 
> of every function that's compiled very early in the process, before 
> any lowering of frames to static addresses or anything of that sort is 
> done.The ISR-tree functions should be flagged as such and have their 
> names mangled to reflect. Any call instructions in the ISR-tree 
> function should be modified to call the equivalent ISR-tree function 
> of the callee. In our earlier patch, we tried to do the same thing by writing an 
llvm-ld pass. There we could clone the shared functions, their arguments 
frames and local frames to construct parallel call trees. But now it 
looks that clang will be a better place to do the same. For that, clang 
will need the appropriate TargetInfo interfaces to allow precisely the 
following things:
1. Allow targets to emit an ISR version of a function.
2. Allow targets to detect that they are emitting code for an ISR 
version, so that call sites/goto sites can be modified.
And I assume that the DebugInfo for the renamed functions will 
automatically fall in place.

My first choice would be clang to do it but was just curious to know if 
there are some inherent limitations with the llvm-ld pass
approach?> When it comes time to allocate static memory space for the call 
> frames, we simply have to do it twice. Once for the mainline 
> functions, and once for the ISR functions. We can assert that nothing 
> can overlap between the trees since we know that the trees are 
> disjoint except via the ISR itself. Thus, we don't overlay anything 
> from one tree with anything from the other tree. True. To overlay within these two separate trees we used an llvm-ld pass 
again. Though we overlaid only the leaf nodes in that pass but I think 
using a runOnSCC we can extend this approach to overlay sections in such 
a way that a function and any of its predecessor have separate memory 
areas. Does that approach sound ok?> User assembly throws a big wrench into the works, but it's not 
> irreconcilable. I suggest requiring the user to flag assembly 
> functions as being either for mainline or for ISR code. Doing so via a 
> naming convention would likely tie in the easiest with the function 
> cloning done above. With the convention being sufficiently magic 
> (read: not legal C identifiers so as to avoid collisions), the 
> assembler can issue diagnostics if a call instruction is seen that 
> references a function not in the proper tree. That is, the 
> disjointness of the trees will be enforceable even in user assembly 
> with a bit of cooperation from the assembler.We used this in the AsmPrinter to rename intrinsics called from ISR 
code. As, codgen currently ties the libcall names in ISelLowering 
constructors and then doesn't allow changing the name dynamically. So 
we'll still need to handle the same in AsmPrinter.

Thanks
- Sanjiv

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Aug 2009 - [LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

Possibly Parallel Threads