Michael Stellmann via llvm-dev
2018-Jul-22 07:26 UTC
[llvm-dev] Finding scratch register after function call
Thanks Bruce, and as elaborate as ever. Again, I'm surprised by your very thorough Z80 knowledge, given that you said you only did a little on the ZX81 in the eighties :D

OK, understood. I was first thinking about doing something like this for small frames:

1. push bc              # 1 byte; 11 cycles - part of call frame cleanup: save scratch register
+-----begin call-related
2. ld <rr>,stack-param
3. push <rr>
   ... more code to load non-stack params into registers
4. call ...
5. pop bc               # 1 byte; 10 cycles - call frame cleanup: restore stack pointer (value in BC is not used)
+-----end call-related
6. pop bc               # part of call frame cleanup: restore scratch reg's value

The stack cleanup would insert line 5, and would also have to insert lines 1 and 6, summing up to 3 bytes of instructions. Maybe the outer two could be eliminated in a late optimization pass, when register usage is known.

But then again, looking at your math and the *total* memory and cycles, including setup and tear-down, convinced me to drop my complex idea of saving BC and using it for cleanup, or of sacrificing the calling convention. The complexity just doesn't justify the gains.

Instead, I'm going for easy-to-implement solutions: for small call frames with only 1 or 2 params on the stack, two "inc sp" (1 byte, 6 cycles per instruction) per parameter can be used, and your "big stack frame" suggestion for larger ones.

This also allows keeping a "beneficial" param and return value calling convention: I want to assign the first 3 (HL + DE + BC, or at least 2) function params to registers, so the stack cleanup is only required for functions with more than 3 parameters, or for vararg funcs. And only functions with more than 5 params will need the "big stack frame" cleanup. Those cases are rare (or at least can easily be avoided by a developer) and, knowing the mechanics, shouldn't be used in time-critical inner loops anyway.
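As a rough illustration of the cleanup choice described above, here is a small Python sketch. It is only a model for checking the arithmetic: the function and constant names are hypothetical, the byte and cycle figures are the ones quoted in the message, and it assumes every stack parameter is a single 16-bit push and that the call is not a vararg call. The cost of the "big stack frame" sequence is left open because the message does not spell it out.

```python
# Illustrative model only: the names and structure are assumptions,
# not code from any existing Z80 backend.

REG_PARAMS = 3        # first three params go in HL, DE, BC (per the message)
SMALL_FRAME_MAX = 2   # 1 or 2 stack params are cleaned up with "inc sp"
PARAM_BYTES = 2       # assumption: every stack param is one 16-bit push

# (bytes, cycles) figures as quoted in the message
PUSH_BC = (1, 11)
POP_BC = (1, 10)
INC_SP = (1, 6)


def total(instrs):
    """Sum (bytes, cycles) over a list of instruction costs."""
    return (sum(b for b, _ in instrs), sum(c for _, c in instrs))


def cleanup_plan(num_params):
    """Pick the caller-side cleanup for a non-vararg call with num_params params."""
    stack_params = max(0, num_params - REG_PARAMS)
    if stack_params == 0:
        return ("none", (0, 0))
    if stack_params <= SMALL_FRAME_MAX:
        # One "inc sp" per byte that was pushed.
        return ("inc sp", total([INC_SP] * (PARAM_BYTES * stack_params)))
    # Cost depends on the sequence chosen, which the message does not spell out.
    return ("big frame", None)


# The dropped idea for one stack param: save BC, pop into BC, restore BC.
bc_scratch_cost = total([PUSH_BC, POP_BC, POP_BC])
```

Under these figures, a call with one stack parameter costs 2 bytes and 12 cycles to clean up with two "inc sp", against 3 bytes and 31 cycles for the save/pop/restore sequence around BC (before any late pass removes the outer push/pop pair).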
Being able to keep HL for the return value allows very efficient nested function calls of the form "Func1(Func2(nnn));", as register shuffling can be avoided: the result of Func2 can be passed directly to Func1.

Thanks for pointing me in the right direction again!

Michael

Oh, and BTW, I'm planning to do the backend primarily for the MSX, my first computer in 1984. Just for the fun of it, I have now started writing a small game for it after 25+ years of absence, and was wondering what 30+ years of compiler technology would be able to achieve on such a simple (but challenging, as in "not-always-straightforward") CPU ;-)
Bruce Hoult via llvm-dev
2018-Jul-22 07:42 UTC
[llvm-dev] Finding scratch register after function call
I had a quick look at some reference material this time :-) I also did some work on a DEC Rainbow and a Kaypro CP/M luggable a few years later. The compilers of the time were just awful!

It should be possible to get llvm to produce very good code for the Z80 -- in larger functions probably better than any human would put the effort in to achieve. In particular, what the human regards as the "same variable" (but a different SSA value) might live in different registers at different times.
Michael Stellmann via llvm-dev
2018-Jul-22 07:58 UTC
[llvm-dev] Finding scratch register after function call
> It should be possible to get llvm to produce very good code for the Z80...

Yes, I was thinking that too. These techniques didn't exist back then, so I'm really looking forward to the point where the first regular C sources can be compiled, and to seeing the magic happen live :)