On Fri, 27 Aug 2004 02:20:34 -0500 (CDT) Chris Lattner <sabre at nondot.org> wrote:> On Thu, 26 Aug 2004, Jeff Cohen wrote: > > > Also, the store into the arrays generates two x86 machine > > instructions: > > > > lea %ECX, DWORD PTR [%ESP + 16] > > mov DWORD PTR [%ECX + <element offset>], %EAX > > > > These can be combined into a single instruction. I'm tempted to > > work on this one myself :) > > Yup, there are several things that can be improved in the X86 > instruction selector. If you submit a patch to implement this, we > would be happy to apply it. Continuing improvements in the code > generator should eventually make this kind of thing fall out > automatically, but for now they must be implemented manually. > > -ChrisI succumbed to temptation and made the improvement. Diffs are attached for X86ISelSimple.cpp and X86InstrBuilder.h. I determined that the reason two instructions are generated in the first place, instead of being folded immediately into one, is because locals do not have a physical offset assigned at that time. There is a peephole optimization pass after the stack frame is finalized, but the problem with folding them there is that the physical registers have also been assigned, so register CX (in this case) would be wasted. So I generalized the concept of a "full address". There were four variables, base reg, scale, index reg and displacement, that were being passed around. I converted them into a single struct and added a new type field that allows the base reg to instead be a frame index. The extra lea instruction is now folded immediately, and the code for processing store instructions shrinks quite nicely. This should be generalizable even further to handle global variables. I didn't go that far, because it appeared there may be other work required. It wasn't clear a displacement to a global was supported. Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: X86ISelSimple.cpp.diff Type: application/octet-stream Size: 21366 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: X86InstrBuilder.h.diff Type: application/octet-stream Size: 2211 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/c7756315/attachment-0001.obj>
Jeff, Chris isn't likely to respond to this for a while as he's on vacation. I'll take a look at it and will commit it if it looks good. Since code gen isn't my specialty, could you increase my comfort level a little by giving me some examples of the test results you got when testing your patches? Ideally, I'd like to see some of the test/Programs/MultiSource programs working and generating correct code with this path. Thanks, Reid. On Sat, 2004-08-28 at 21:21, Jeff Cohen wrote:> On Fri, 27 Aug 2004 02:20:34 -0500 (CDT) > Chris Lattner <sabre at nondot.org> wrote: > > > On Thu, 26 Aug 2004, Jeff Cohen wrote: > > > > > Also, the store into the arrays generates two x86 machine > > > instructions: > > > > > > lea %ECX, DWORD PTR [%ESP + 16] > > > mov DWORD PTR [%ECX + <element offset>], %EAX > > > > > > These can be combined into a single instruction. I'm tempted to > > > work on this one myself :) > > > > Yup, there are several things that can be improved in the X86 > > instruction selector. If you submit a patch to implement this, we > > would be happy to apply it. Continuing improvements in the code > > generator should eventually make this kind of thing fall out > > automatically, but for now they must be implemented manually. > > > > -Chris > > I succumbed to temptation and made the improvement. Diffs are attached > for X86ISelSimple.cpp and X86InstrBuilder.h. > > I determined that the reason two instructions are generated in the first > place, instead of being folded immediately into one, is because locals > do not have a physical offset assigned at that time. There is a > peephole optimization pass after the stack frame is finalized, but the > problem with folding them there is that the physical registers have also > been assigned, so register CX (in this case) would be wasted. > > So I generalized the concept of a "full address". There were four > variables, base reg, scale, index reg and displacement, that were being > passed around. I converted them into a single struct and added a new > type field that allows the base reg to instead be a frame index. The > extra lea instruction is now folded immediately, and the code for > processing store instructions shrinks quite nicely. > > This should be generalizable even further to handle global variables. I > didn't go that far, because it appeared there may be other work > required. It wasn't clear a displacement to a global was supported. > > Jeff > > ______________________________________________________________________ > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20040828/0ad538e3/attachment.sig>
Fair enough... The following tests under MultiSource fail: Benchmarks/Olden/power Benchmarks/OptimizerEval Benchmarks/Ptrdist/ks Benchmarks/MallocBench/perl Applications/sgefa However, they also fail in the exact same way without my change. OptimizerEval appears to be non-deterministic; it produces different output each time it runs. Everything else passes. I also compared a few *.s files generated by the tests, with and without my change, to see if everything is as I expected. I tried running SingleSource also, and ran into a few problems. Some of the tests failed because they cannot include alloca.h. This may be a linux dependency. Anyway, again I saw the same results with or without my change. I cannot run qmtest at all. I don't seem to have it or any way of building it. There's no rush to commit this. If you don't feel comfortable doing so, it can wait until Chris gets back. And as it's my first submission, you shouldn't feel comfortable :) Jeff On Sat, 28 Aug 2004 22:20:57 -0700 Reid Spencer <reid at x10sys.com> wrote:> Jeff, > > Chris isn't likely to respond to this for a while as he's on vacation. > I'll take a look at it and will commit it if it looks good. Since code > gen isn't my specialty, could you increase my comfort level a little by > giving me some examples of the test results you got when testing your > patches? Ideally, I'd like to see some of the test/Programs/MultiSource > programs working and generating correct code with this path. > > Thanks, > > Reid.
On Sat, 28 Aug 2004, Jeff Cohen wrote:> I succumbed to temptation and made the improvement. Diffs are attached > for X86ISelSimple.cpp and X86InstrBuilder.h.The patches look great, that's a very elegant way to address the problem, and also simplified some code, thanks!> I determined that the reason two instructions are generated in the first > place, instead of being folded immediately into one, is because locals > do not have a physical offset assigned at that time. There is a > peephole optimization pass after the stack frame is finalized, but the > problem with folding them there is that the physical registers have also > been assigned, so register CX (in this case) would be wasted.Yup.> So I generalized the concept of a "full address". There were four > variables, base reg, scale, index reg and displacement, that were being > passed around. I converted them into a single struct and added a new > type field that allows the base reg to instead be a frame index. The > extra lea instruction is now folded immediately, and the code for > processing store instructions shrinks quite nicely.Sounds good.> This should be generalizable even further to handle global variables. I > didn't go that far, because it appeared there may be other work > required. It wasn't clear a displacement to a global was supported.That would be cool as well. A displacement from a global should be supported. Look at the code generated for: int G; void foo() { G = 4; } ... and you'll see what I mean. :) -Chris -- http://llvm.org/ http://nondot.org/sabre/