Hi Daniel, Thank you for your detailed reply, and thank you for working on GVNPRE. I’d more than happy to test/evaluate it with our benchmark once it is ready. Please let me know if you need any help. Thanks, Taewook From: Daniel Berlin <dberlin at dberlin.org> Date: Tuesday, April 4, 2017 at 6:13 PM To: Taewook Oh <twoh at fb.com> Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] [NewGVN] Plan for GVNPRE? Hi Taewook, I have prototypes, and they work well enough that i was convinced it's not a big deal to implement. If you are interested in working on it, let me know. Otherwise, i'll get to it. I'm just knocking down the rest of the generated code perf differences i can find that matter between gvn and newgvn. we are fairly close (IMHO) at this point. Though i wrote GCC's current GVN-PRE, i am unlikely to take exactly the same approach for LLVM (iterative dataflow) I am also still testing out various non-"lifetime optimal" approaches. I am busy making the current GVN a complete one, which will eliminate: 1. the need for full redundancy elimination 2. issues that pop up where we discover we could phi values together (this is really a case of #1). That will leave *literally* partial redundancies as the only thing that needs to be eliminated, simplifying our task. The only source of incompleteness in NewGVN is the inability to consider phi of ops and op of phis the same thing (and the related phi + op and op + phi) Here is an example (pull from a real program where it makes a significant perf difference) : ; <label>:9: ; preds = %7, %4 %10 = phi i64 [ %0, %4 ], [ %11, %7 ] %11 = add nsw i64 %10, -1 %12 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* @a, i64 0, i64 0), align 16, !tbaa !2 %13 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* @b, i64 0, i64 0), align 16, !tbaa !2 %14 = mul nsw i64 %13, %12 %15 = icmp eq i64 %14, 0 br i1 %15, label %7, label %16 ; <label>: 17 %18 = phi i64 [ %26, %17 ], [ %13, %16 ] %19 = phi i64 [ %24, %17 ], [ %12, %16 ] %20 = phi i64 [ %22, %17 ], [ 0, %16 ] %21 = mul nsw i64 %18, %19 store i64 2, i64* %2, align 8, !tbaa !2 %22 = add nuw nsw i64 %20, 1 %23 = getelementptr inbounds [100 x i64], [100 x i64]* @a, i64 0, i64 %22 %24 = load i64, i64* %23, align 8, !tbaa !2 %25 = getelementptr inbounds [100 x i64], [100 x i64]* @b, i64 0, i64 %22 %26 = load i64, i64* %25, align 8, !tbaa !2 %27 = mul nsw i64 %26, %24 %28 = icmp eq i64 %22, %27 br i1 %28, label %6, label %17 The multiply at 18, 19 is completely redundant. It is a phi of the computations in 9 and 17. Once done (almost!), newgvn will actually detect that a simple phi inserted in 17 would have the same value as %21. The work is https://bugs.llvm.org//show_bug.cgi?id=31868<https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.llvm.org__show-5Fbug.cgi-3Fid-3D31868&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=bOw6bI0rowF-LG3wUx222kr4j4lEwZCz1sGk-DR39Ts&e=> Wwhen we see op of phis, we use fake instructions to treat it like we saw ops in the predecessor blocks, and see whether the fake phi we build ends up with real instructions as for the operands. if it does, we insert it. Note this computation is similar to the one SSAPRE does to determine whether anything is redundant. Once you do this, i believe all initial partial redundancies (IE not second order effects) in the program will be of a form of a fake phi with one real leader, one fake., recursively [1] Then, again, you perform a similar sparse dataflow that SSAPRE does to figure out which will, if inserted, make other computations redundant. IE c = a + b if (foo) b = 50 d = a + b this will end up as d = op + phi WE will convert to phi of ops form, phi(a + 50, a +b) One real (a+b), one fake (a+50). full availability will show up as real,real: int x, c, y; x = 3; if (c) x = 2; y = x + 1; return y; phi(2 + 1, 3 + 1) etc The only practical difference will be the computation of what we want to insert (in FRE, we would insert only fully-available phis, in PRE, we insert instructions to make the not-fully-available ones real) On Tue, Apr 4, 2017 at 3:04 PM, Taewook Oh via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hello, In some of our internal benchmarks, I observe that LLVM performs worse than GCC because LLVM fails to perform PRE when GCC can. I hope this problem goes away when NewGVN equipped with PRE, and wonder if anyone has an idea about the status of PRE on top of NewGVN. Thanks! Best, Taewook _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=Vsgb6wKgVyG2pYh24S1iO6teXM5YmLuMw8Gh_-rPbvI&e=> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170405/40e2e563/attachment-0001.html>
Of course As a heads up, it's likely to be a while (ie 3-6 months or maybe longer) if you leave it to me. I try to stay out of the critical path, because my real day job is managing folks who work on llvm (and other things), not working on llvm, and that has a lot of interrupts :) This is why i try to hand work off if it's going to become important to someone. On Tue, Apr 4, 2017 at 9:29 PM, Taewook Oh <twoh at fb.com> wrote:> Hi Daniel, > > > > Thank you for your detailed reply, and thank you for working on GVNPRE. > I’d more than happy to test/evaluate it with our benchmark once it is > ready. Please let me know if you need any help. > > > > Thanks, > > Taewook > > > > *From: *Daniel Berlin <dberlin at dberlin.org> > *Date: *Tuesday, April 4, 2017 at 6:13 PM > *To: *Taewook Oh <twoh at fb.com> > *Cc: *"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> > *Subject: *Re: [llvm-dev] [NewGVN] Plan for GVNPRE? > > > > Hi Taewook, > > I have prototypes, and they work well enough that i was convinced it's not > a big deal to implement. > > If you are interested in working on it, let me know. > > Otherwise, i'll get to it. > > I'm just knocking down the rest of the generated code perf differences i > can find that matter between gvn and newgvn. > > we are fairly close (IMHO) at this point. > > > > Though i wrote GCC's current GVN-PRE, i am unlikely to take exactly the > same approach for LLVM (iterative dataflow) > > I am also still testing out various non-"lifetime optimal" approaches. > > > > I am busy making the current GVN a complete one, which will eliminate: > 1. the need for full redundancy elimination > > 2. issues that pop up where we discover we could phi values together (this > is really a case of #1). > > > > That will leave *literally* partial redundancies as the only thing that > needs to be eliminated, simplifying our task. > > > > > > The only source of incompleteness in NewGVN is the inability to consider > phi of ops and op of phis the same thing (and the related phi + op and op + > phi) > > Here is an example (pull from a real program where it makes a significant > perf difference) > > : > > ; <label>:9: ; preds = %7, %4 > > %10 = phi i64 [ %0, %4 ], [ %11, %7 ] > > %11 = add nsw i64 %10, -1 > > %12 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* > @a, i64 0, i64 0), align 16, !tbaa !2 > > %13 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* > @b, i64 0, i64 0), align 16, !tbaa !2 > > %14 = mul nsw i64 %13, %12 > > %15 = icmp eq i64 %14, 0 > > br i1 %15, label %7, label %16 > > > > ; <label>: 17 > > %18 = phi i64 [ %26, %17 ], [ %13, %16 ] > > %19 = phi i64 [ %24, %17 ], [ %12, %16 ] > > %20 = phi i64 [ %22, %17 ], [ 0, %16 ] > > %21 = mul nsw i64 %18, %19 > > store i64 2, i64* %2, align 8, !tbaa !2 > > %22 = add nuw nsw i64 %20, 1 > > %23 = getelementptr inbounds [100 x i64], [100 x i64]* @a, i64 0, i64 > %22 > > %24 = load i64, i64* %23, align 8, !tbaa !2 > > %25 = getelementptr inbounds [100 x i64], [100 x i64]* @b, i64 0, i64 > %22 > > %26 = load i64, i64* %25, align 8, !tbaa !2 > > %27 = mul nsw i64 %26, %24 > > %28 = icmp eq i64 %22, %27 > > br i1 %28, label %6, label %17 > > > > The multiply at 18, 19 is completely redundant. > > It is a phi of the computations in 9 and 17. > > > > Once done (almost!), newgvn will actually detect that a simple phi > inserted in 17 would have the same value as %21. > > The work is https://bugs.llvm.org//show_bug.cgi?id=31868 > <https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.llvm.org__show-5Fbug.cgi-3Fid-3D31868&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=bOw6bI0rowF-LG3wUx222kr4j4lEwZCz1sGk-DR39Ts&e=> > > > > Wwhen we see op of phis, we use fake instructions to treat it like we saw > ops in the predecessor blocks, and see whether the fake phi we build ends > up with real instructions as for the operands. if it does, we insert it. > > Note this computation is similar to the one SSAPRE does to determine > whether anything is redundant. > > > > Once you do this, i believe all initial partial redundancies (IE not > second order effects) in the program will be of a form of a fake phi with > one real leader, one fake., recursively [1] > > Then, again, you perform a similar sparse dataflow that SSAPRE does to > figure out which will, if inserted, make other computations redundant. > > > > IE c = a + b > > if (foo) > > b = 50 > > d = a + b > > > > this will end up as d = op + phi > > > > WE will convert to phi of ops form, phi(a + 50, a +b) > > One real (a+b), one fake (a+50). > > full availability will show up as real,real: > > int x, c, y; > > x = 3; > > if (c) > > x = 2; > > y = x + 1; > > return y; > > > > phi(2 + 1, 3 + 1) > > > > etc > > > > The only practical difference will be the computation of what we want to > insert (in FRE, we would insert only fully-available phis, in PRE, we > insert instructions to make the not-fully-available ones real) > > > > > > > > > > On Tue, Apr 4, 2017 at 3:04 PM, Taewook Oh via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hello, > > > > In some of our internal benchmarks, I observe that LLVM performs worse > than GCC because LLVM fails to perform PRE when GCC can. I hope this > problem goes away when NewGVN equipped with PRE, and wonder if anyone has > an idea about the status of PRE on top of NewGVN. Thanks! > > > > Best, > > Taewook > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=Vsgb6wKgVyG2pYh24S1iO6teXM5YmLuMw8Gh_-rPbvI&e=> > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170404/72a18a6d/attachment.html>
Hi Daniel, Got it. If that's the case, can I implement it under the guidance of your insights/prototype? I think I can spend more time on implementation. Thanks, Taewook ________________________________ From: Daniel Berlin <dberlin at dberlin.org> Sent: Tuesday, April 4, 2017 9:41:30 PM To: Taewook Oh Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [NewGVN] Plan for GVNPRE? Of course As a heads up, it's likely to be a while (ie 3-6 months or maybe longer) if you leave it to me. I try to stay out of the critical path, because my real day job is managing folks who work on llvm (and other things), not working on llvm, and that has a lot of interrupts :) This is why i try to hand work off if it's going to become important to someone. On Tue, Apr 4, 2017 at 9:29 PM, Taewook Oh <twoh at fb.com<mailto:twoh at fb.com>> wrote: Hi Daniel, Thank you for your detailed reply, and thank you for working on GVNPRE. I’d more than happy to test/evaluate it with our benchmark once it is ready. Please let me know if you need any help. Thanks, Taewook From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at dberlin.org>> Date: Tuesday, April 4, 2017 at 6:13 PM To: Taewook Oh <twoh at fb.com<mailto:twoh at fb.com>> Cc: "llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> Subject: Re: [llvm-dev] [NewGVN] Plan for GVNPRE? Hi Taewook, I have prototypes, and they work well enough that i was convinced it's not a big deal to implement. If you are interested in working on it, let me know. Otherwise, i'll get to it. I'm just knocking down the rest of the generated code perf differences i can find that matter between gvn and newgvn. we are fairly close (IMHO) at this point. Though i wrote GCC's current GVN-PRE, i am unlikely to take exactly the same approach for LLVM (iterative dataflow) I am also still testing out various non-"lifetime optimal" approaches. I am busy making the current GVN a complete one, which will eliminate: 1. the need for full redundancy elimination 2. issues that pop up where we discover we could phi values together (this is really a case of #1). That will leave *literally* partial redundancies as the only thing that needs to be eliminated, simplifying our task. The only source of incompleteness in NewGVN is the inability to consider phi of ops and op of phis the same thing (and the related phi + op and op + phi) Here is an example (pull from a real program where it makes a significant perf difference) : ; <label>:9: ; preds = %7, %4 %10 = phi i64 [ %0, %4 ], [ %11, %7 ] %11 = add nsw i64 %10, -1 %12 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* @a, i64 0, i64 0), align 16, !tbaa !2 %13 = load i64, i64* getelementptr inbounds ([100 x i64], [100 x i64]* @b, i64 0, i64 0), align 16, !tbaa !2 %14 = mul nsw i64 %13, %12 %15 = icmp eq i64 %14, 0 br i1 %15, label %7, label %16 ; <label>: 17 %18 = phi i64 [ %26, %17 ], [ %13, %16 ] %19 = phi i64 [ %24, %17 ], [ %12, %16 ] %20 = phi i64 [ %22, %17 ], [ 0, %16 ] %21 = mul nsw i64 %18, %19 store i64 2, i64* %2, align 8, !tbaa !2 %22 = add nuw nsw i64 %20, 1 %23 = getelementptr inbounds [100 x i64], [100 x i64]* @a, i64 0, i64 %22 %24 = load i64, i64* %23, align 8, !tbaa !2 %25 = getelementptr inbounds [100 x i64], [100 x i64]* @b, i64 0, i64 %22 %26 = load i64, i64* %25, align 8, !tbaa !2 %27 = mul nsw i64 %26, %24 %28 = icmp eq i64 %22, %27 br i1 %28, label %6, label %17 The multiply at 18, 19 is completely redundant. It is a phi of the computations in 9 and 17. Once done (almost!), newgvn will actually detect that a simple phi inserted in 17 would have the same value as %21. The work is https://bugs.llvm.org//show_bug.cgi?id=31868<https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.llvm.org__show-5Fbug.cgi-3Fid-3D31868&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=bOw6bI0rowF-LG3wUx222kr4j4lEwZCz1sGk-DR39Ts&e=> Wwhen we see op of phis, we use fake instructions to treat it like we saw ops in the predecessor blocks, and see whether the fake phi we build ends up with real instructions as for the operands. if it does, we insert it. Note this computation is similar to the one SSAPRE does to determine whether anything is redundant. Once you do this, i believe all initial partial redundancies (IE not second order effects) in the program will be of a form of a fake phi with one real leader, one fake., recursively [1] Then, again, you perform a similar sparse dataflow that SSAPRE does to figure out which will, if inserted, make other computations redundant. IE c = a + b if (foo) b = 50 d = a + b this will end up as d = op + phi WE will convert to phi of ops form, phi(a + 50, a +b) One real (a+b), one fake (a+50). full availability will show up as real,real: int x, c, y; x = 3; if (c) x = 2; y = x + 1; return y; phi(2 + 1, 3 + 1) etc The only practical difference will be the computation of what we want to insert (in FRE, we would insert only fully-available phis, in PRE, we insert instructions to make the not-fully-available ones real) On Tue, Apr 4, 2017 at 3:04 PM, Taewook Oh via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hello, In some of our internal benchmarks, I observe that LLVM performs worse than GCC because LLVM fails to perform PRE when GCC can. I hope this problem goes away when NewGVN equipped with PRE, and wonder if anyone has an idea about the status of PRE on top of NewGVN. Thanks! Best, Taewook _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=kOsLCgQzH7N8ptZ7diJD9g&m=Sv5T08BLc94DspNTrXv67YwHnXVNmVllN9p8B7fZgTs&s=Vsgb6wKgVyG2pYh24S1iO6teXM5YmLuMw8Gh_-rPbvI&e=> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170405/7bfe8f29/attachment.html>