Well, the stack pointer is a single byte, so pushing things on there doesn't work terribly well.

Assuming I pass by reference, that's 128 values in total, at most, before it wraps around and silently clobbers itself. It means single byte values will be incredibly inefficient... Tricky stuff.

I'm lucky on the C64, since it's rare to exit back to the kernel from machine language apps (I never did it when I was a kid, at least), so if I destroy the kernel's stack, no one will ever know! Mwahaha!

With regard to code layout, ideally everything would get inlined, since I have gobs of memory compared to everything else. I wouldn't need to worry as much about the stack as long as real values don't get stored there.

> On Jul 2, 2014, at 9:44 PM, Bruce Hoult <bruce at hoult.org> wrote:
>
> I've considered doing this as well :-) As an exercise to learn the LLVM back end as much as anything.
>
> It probably makes sense to allocate 8 or 16 pairs of zero page locations as virtual 16 bit registers and make 32 bit operations available only as library routines/intrinsics.
>
> What would be *really* helpful would be if LLVM had a way to detect that certain functions and sets of functions (probably >90% of the program) are NOT recursive and statically allocate fixed zero page and/or high memory locations for their local variables.
>
> If you have the call DAG, you can turn that into a total ordering such that if A transitively calls B then the locations of A's locals will always be at higher addresses than B's locals (or vice versa).
>
> This will probably be a bit bigger than the maximum dynamic depth of a stack implementation, but I think usually not a lot. And it lets you use absolute addressing instead of slow (zp),y, and also lets you avoid saving and restoring simulated callee-save registers.
>
>
>> On Thu, Jul 3, 2014 at 1:23 PM, Edwin Amsler <edwinguy at gmail.com> wrote:
>> Hey there!
>>
>> I've started to embark on a path to try and create a backend for a 39 year old CPU with only an accumulator, two index registers, and a 256 byte stack. It does have a bank of 256 bytes before the stack that are pretty quick though.
>>
>> Really, if I can get an assembler out of `llc`, that'll be success enough for me. Clang would be better, but I think that might be crazy talk.
>>
>> I've been doing lots of research so far, but from the experts, how feasible does this sound?
>>
>> I've also been banging my head against the wall trying to figure out what all the classes for different instruction types do. Is there a nicely documented index? Is it in source somewhere, or should I start one?
>>
>> Thanks,
>>
>> Edwin
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
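
The static-allocation idea in the quoted message above (order the call DAG so that a caller's locals always sit above its callees' locals) can be sketched roughly as follows. This is only an illustration in Python, not anything LLVM provides: `allocate_locals`, the `calls` and `sizes` tables, and the example function names are all made up here. The sketch also lets functions that are never active at the same time share addresses, which is an assumption that goes beyond what the message states.

```python
def allocate_locals(calls, sizes, base=0):
    """Give each non-recursive function a fixed start address for its locals.

    calls: hypothetical call graph, {function: [direct callees]}
    sizes: hypothetical frame sizes, {function: bytes of locals}
    A function is placed just above the highest address used by any of its
    callees, so a caller's locals are always above those of everything it
    (transitively) calls.
    """
    addr = {}
    in_progress = set()

    def place(fn):
        if fn in addr:
            return addr[fn]
        if fn in in_progress:
            # A cycle in the call graph means recursion: static slots won't do.
            raise ValueError(f"{fn} is recursive and needs a real stack")
        in_progress.add(fn)
        start = base
        for callee in calls.get(fn, ()):
            start = max(start, place(callee) + sizes[callee])
        in_progress.discard(fn)
        addr[fn] = start
        return start

    for fn in sizes:
        place(fn)
    return addr


# Made-up example program: main calls draw and input, draw calls plot.
example_calls = {"main": ["draw", "input"], "draw": ["plot"]}
example_sizes = {"main": 4, "draw": 6, "input": 2, "plot": 3}
example_layout = allocate_locals(example_calls, example_sizes)
```

With these example tables, `plot` and `input` both start at offset 0, `draw` at 3 and `main` at 9, so 13 bytes cover the whole program.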
On Fri, Jul 4, 2014 at 12:02 PM, Edwin Amsler <edwinguy at gmail.com> wrote:

> Well, the stack pointer is a single byte, so pushing things on there
> doesn't work terribly well.
>
> Assuming I pass by reference, that's 128 values absolutely total before it
> wraps around and silently clobbers itself. It means single byte values will
> be incredibly inefficient... Tricky stuff.
>

You absolutely don't want anything on the hardware stack except function return addresses and possibly very temporary storage, e.g. PHA (push A); do something that destroys A; PLA (pull A). Or you could use a ZP temp for that. STA ZP; LDA ZP is, I think, a cycle or two faster; PHA/PLA is two bytes smaller ... size usually wins.

The "C" local variables stack absolutely needs to be somewhere else, bigger, and using a pair of ZP locations as the stack pointer (SP). You can't index off the hardware stack pointer, for a start.

As mentioned before, if possible you'd want to statically allocate as many local vars as possible, as LDA $nnnn is a byte smaller and twice as fast (4 vs 8) as LDY #nn; LDA (SP),Y. (You'll sometimes be able to use INY or DEY instead of the load .. or just reuse the last value. But still...)

> With regard to code layout, ideally everything would get inlined since I
> have gobs of memory compared to everything else. I wouldn't need to worry
> as much about the stack as long as real values don't get stored there.
>

I actually think that the ideal 6502 compiler would output actual 6502 code (mostly) only for leaf functions, and everything else should be compiled to some kind of byte code. The 6502 was a very very cheap chip to build hardware wise, but the code is BULKY. Even when operating on 8 bit values it's worse than, say, Thumb2, due to the lack of registers. On 16 or 32 bit values it's diabolical if everything is done inline.

Wozniak's "Sweet 16" is still not a terrible design for this, but I think a bit more thought can come up with something better.
The Sweet16 interpreter is pretty small though (under 512 bytes, I think?), which is pretty important.

http://www.6502.org/source/interpreters/sweet16.htm

The criteria for whether to use native code or bytecode for a given function are pretty similar to those for the inlining decision. And a decent, compact bytecode solution with a small interpreter could be reused on other 8 bit CPUs.

Some of which are still in active use today, and so even commercially important, e.g. 8051, AVR, and PIC.

Erm .. are we boring the rest of llvmdev yet?
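
To put numbers on the size and speed trade-offs discussed above, here is a small Python tally using the documented NMOS 6502 byte and cycle counts for the instructions mentioned. The `COST` table and the `cost` helper are just an illustration. One caveat: LDA (zp),Y takes an extra cycle when the indexing crosses a page boundary, so the stack-relative load costs 7 cycles in the common case and 8 in the worst case.

```python
# (bytes, cycles) per instruction on an NMOS 6502.
COST = {
    "PHA": (1, 3),
    "PLA": (1, 4),
    "STA zp": (2, 3),
    "LDA zp": (2, 3),
    "LDA abs": (3, 4),
    "LDY #imm": (2, 2),
    "LDA (zp),Y": (2, 5),  # +1 cycle if the index crosses a page boundary
    "INY": (1, 2),
    "DEY": (1, 2),
}


def cost(sequence):
    """Return (total bytes, total cycles) for a list of instruction names."""
    size = sum(COST[op][0] for op in sequence)
    cycles = sum(COST[op][1] for op in sequence)
    return size, cycles


# Saving and restoring A: hardware stack versus a zero page temp.
save_on_stack = cost(["PHA", "PLA"])
save_in_zp = cost(["STA zp", "LDA zp"])

# Loading a local: statically allocated versus via a zero page stack pointer.
load_static = cost(["LDA abs"])
load_via_sp = cost(["LDY #imm", "LDA (zp),Y"])

for name, (size, cycles) in [
    ("PHA / PLA", save_on_stack),
    ("STA zp / LDA zp", save_in_zp),
    ("LDA abs", load_static),
    ("LDY #imm / LDA (zp),Y", load_via_sp),
]:
    print(f"{name:24} {size} bytes, {cycles} cycles")
```

So the zero page temp saves one cycle over PHA/PLA at the price of two bytes, and the absolute load is one byte shorter and three to four cycles faster than going through the software stack pointer.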
I suppose that once you've got a 6502 working, adding support for a 4510 shouldn't be too difficult.... (http://c65gs.blogspot.com.au/)

On Fri, Jul 4, 2014 at 10:38 AM, Bruce Hoult <bruce at hoult.org> wrote:

> On Fri, Jul 4, 2014 at 12:02 PM, Edwin Amsler <edwinguy at gmail.com> wrote:
>
>> Well, the stack pointer is a single byte, so pushing things on there
>> doesn't work terribly well.
>>
>> Assuming I pass by reference, that's 128 values absolutely total before
>> it wraps around and silently clobbers itself. It means single byte values
>> will be incredibly inefficient... Tricky stuff.
>>
>
> You absolutely don't want anything on the hardware stack except function
> return addresses and possibly very temporary storage, e.g. PHA (push A); do
> something that destroys A; PLA (pull A). Or you could use a ZP temp for
> that. STA ZP; LDA ZP is, I think, a cycle or two faster; PHA/PLA is two
> bytes smaller ... size usually wins.
>
> The "C" local variables stack absolutely needs to be somewhere else,
> bigger, and using a pair of ZP locations as the stack pointer (SP). You
> can't index off the hardware stack pointer, for a start.
>
> As mentioned before, if possible you'd want to statically allocate as many
> local vars as possible, as LDA $nnnn is a byte smaller and twice as fast (4
> vs 8) as LDY #nn; LDA (SP),Y. (You'll sometimes be able to use INY or DEY
> instead of the load .. or just reuse the last value. But still...)
>
>
>> With regard to code layout, ideally everything would get inlined since I
>> have gobs of memory compared to everything else. I wouldn't need to worry
>> as much about the stack as long as real values don't get stored there.
>>
>
> I actually think that the ideal 6502 compiler would output actual 6502
> code (mostly) only for leaf functions, and everything else should be
> compiled to some kind of byte code. The 6502 was a very very cheap chip to
> build hardware wise, but the code is BULKY.
> Even when operating on 8 bit values it's worse than, say, Thumb2, due to
> the lack of registers. On 16 or 32 bit values it's diabolical if
> everything is done inline.
>
> Wozniak's "Sweet 16" is still not a terrible design for this, but I think
> a bit more thought can come up with something better. The Sweet16
> interpreter is pretty small though (under 512 bytes, I think?), which is
> pretty important.
>
> http://www.6502.org/source/interpreters/sweet16.htm
>
> The criteria for whether to use native code or bytecode for a given
> function are pretty similar to those for the inlining decision. And a
> decent, compact bytecode solution with a small interpreter could be
> reused on other 8 bit CPUs.
>
> Some of which are still in active use today, and so even commercially
> important, e.g. 8051, AVR, and PIC.
>
> Erm .. are we boring the rest of llvmdev yet?