Mehdi Amini via llvm-dev
2017-Jan-14 18:53 UTC
[llvm-dev] RFC: Building GlobalISel by default
> On Jan 14, 2017, at 4:57 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On 14 January 2017 at 01:54, Quentin Colombet via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Now, four backends (if I am counting right: X86, ARM, AArch64, AMDGPU) are working on bringing-up GlobalISel, I’d like to switch the default of the LLVM_BUILD_GLOBAL_ISEL variable in CMake, such that the framework gets built by default. > > Hi Quentin, > > I'm not extremely confident in a full switch right now, mainly due to > Michael's concerns. The ARM port is in its initial stage and there > could still be many refactorings and destructive changes. I'm not > foreseeing a cross between the isel tests, but every failure makes the > bots red, and if Global ISel incurs in more failures due to its stage > in the development cycle, we'll start seeing the bots more red than > we'd like. Right now, it's already very likely that you'll receive bot > emails on commits, I don't want to increase that.I don’t see why having global ISel is supposed to increase the rate. `make check` will runs on the committer machine with GlobalISel enabled. The good question from Michael is IMO "if my (non-GlobalISel) change breaks GlobalISel, how likely is it to be a bug in my code vs. a bug in GlobalISel?” But that’s totally unrelated to the bots, since it’ll fails on the committer machine as well.> > One alternative is to create a buildbot that turns it on by default, > build all back-ends that use GlobalISel and then let it running for a > while. I don't think we're self-hosting on any architecture right now, > nor we're passing the test-suites, so the bot will probably just be a > simple "make check-all", which is more than enough for the time being.This is in place for months: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel/> > As soon as we can self-host and pass the test-suite, I think we can > enable it by default.I don’t understand why it is an interesting property to be able to pass the test-suite to have `make check` testing it.> > Some comments inline... > > >> * Upsides * >> >> For people developing on GlobalISel, it will: >> - Simplify the CMake command to type :) >> - Build/Test GlobalISel on all the LLVM bots > > This is already covered by the people developing Global ISel on the > different architectures. As an experimental feature, I'm ok with the > round time of "ARM developers picking up x86 code breaking their > stuff". > > >> For people not developing on GlobalISel, it will: >> - Test that GlobalISel still works with your changes (make check will test that for you) > > A separate buildbot will do that for you. > > >> - Allow you to play with it! > > The cost of adding one CMake option is really small. > > >> * Downsides * >> >> For people developing on GlobalISel, it will: >> - Leave the status as it is now, meaning that mainly only people working on GlobalISel look at the failures of GlobalISel specific bots > > If we already have those, I don't see why we need more. We're not > self-hosting not passing the test-suite. the "check-all" tests should > not matter which platform they run.The `check-all` are *not* testing GlobalISel right now because GlobalISel is not *built* on the bots. This is *exactly* the point of the proposal: have `make` building the global isel code on every platform, and have `make check` testing the lit-tests for GlobalISel on every platform. This is *not* about enabling GlobalISel *by default* on any target AFAIK. — Mehdi> > >> For people not developing for GlobalISel, it will: >> - Increase the compile time since the GlobalISel framework and the related target specific parts will have to be built >> - Increase the size of the binary (depending on what backend you pull) >> - Require the setting of an additional CMake variable to disable it (-DLLVM_BUILD_GLOBAL_ISEL=OFF) > > This has a higher impact on slow bots, such as ARM and MIPS, and it's > not a trivial amount of time and space. > > On previous occasions (for example integrated-asm, lld, vecorizer), we > have used the "self-host + test-suite passing" model and it worked > well. Meaning, before it passes those two, it should have special > buildbots, after, it can be turned on by default. > > To move to the new technology we need an additional step, which is to > fix most remaining bugs, which we have done for the vectorizer and > integrated-asm, but not lld, and that's why the two first are not only > built by default, but enabled by default, while the latter isn't. > > I want Global ISel to be a success as much as you do, but I'd rather > go through a smooth path, even if it takes a bit longer. > > cheers, > --renato > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Renato Golin via llvm-dev
2017-Jan-14 19:32 UTC
[llvm-dev] RFC: Building GlobalISel by default
On 14 January 2017 at 18:53, Mehdi Amini <mehdi.amini at apple.com> wrote:> I don’t see why having global ISel is supposed to increase the rate. `make check` will runs on the committer machine with GlobalISel enabled.You assume the environments are the same, so the same bugs will be caught on an x86 host or on an ARM host. If that was true, than most of the breakages we have would just not happen. To make matters worse, if a breakage is unknown to the committer (ie, due to some GISel idiosyncrasy), the committer will not worry about it and will leave the bot red, which increases the load on bot maintainers.> But that’s totally unrelated to the bots, since it’ll fails on the committer machine as well.If that's unrelated, than why is it a plus that we have all other buildbots building it instead of just the one Apple has it already?> This is *not* about enabling GlobalISel *by default* on any target AFAIK.I got that. That is a second step, for later. My point is that, if a feature is not good enough to be used anywhere, then it can remain as experimental, since the cost is just adding a flag to enable the back-end. I can easily do that on one of my bots, just for that purpose, but I wouldn't feel comfortable doing that to all my bots at once (or having to use special flags to remove experimental features). --renato
Mehdi Amini via llvm-dev
2017-Jan-14 20:03 UTC
[llvm-dev] RFC: Building GlobalISel by default
> On Jan 14, 2017, at 11:32 AM, Renato Golin <renato.golin at linaro.org> wrote: > > On 14 January 2017 at 18:53, Mehdi Amini <mehdi.amini at apple.com> wrote: >> I don’t see why having global ISel is supposed to increase the rate. `make check` will runs on the committer machine with GlobalISel enabled. > > You assume the environments are the same, so the same bugs will be > caught on an x86 host or on an ARM host. If that was true, than most > of the breakages we have would just not happen. > > To make matters worse, if a breakage is unknown to the committer (ie, > due to some GISel idiosyncrasy), the committer will not worry about it > and will leave the bot red, which increases the load on bot > maintainers. > > >> But that’s totally unrelated to the bots, since it’ll fails on the committer machine as well. > > If that's unrelated, than why is it a plus that we have all other > buildbots building it instead of just the one Apple has it already?Because everyone on their local machine will get the failure before pushing their patch. And it’s not just the people working on GlobalISel that will have to cleanup behind.> > >> This is *not* about enabling GlobalISel *by default* on any target AFAIK. > > I got that. That is a second step, for later. My point is that, if a > feature is not good enough to be used anywhere, then it can remain as > experimental, since the cost is just adding a flag to enable the > back-end.This is a very similar situation to any experimental backend: we turn them on by default when they are stable enough to not disturb the other features in LLVM. There has never be a constraint about “being able to bootstrap” or “run the test-suite”, I don’t see any reason to apply such criteria to GlobalISel, that seems totally arbitrary to me.> > I can easily do that on one of my bots, just for that purpose, but I > wouldn't feel comfortable doing that to all my bots at once (or having > to use special flags to remove experimental features).With this state of mind, Green dragon would build X86 and ARM and disable all the other targets, because... why should we care about these? They only make our build less “stable” and “take time”. — Mehdi
Michael Kuperstein via llvm-dev
2017-Jan-15 04:24 UTC
[llvm-dev] RFC: Building GlobalISel by default
On 14 January 2017 at 10:53, Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > On Jan 14, 2017, at 4:57 AM, Renato Golin via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > On 14 January 2017 at 01:54, Quentin Colombet via llvm-dev > > <llvm-dev at lists.llvm.org> wrote: > >> Now, four backends (if I am counting right: X86, ARM, AArch64, AMDGPU) > are working on bringing-up GlobalISel, I’d like to switch the default of > the LLVM_BUILD_GLOBAL_ISEL variable in CMake, such that the framework gets > built by default. > > > > Hi Quentin, > > > > I'm not extremely confident in a full switch right now, mainly due to > > Michael's concerns. The ARM port is in its initial stage and there > > could still be many refactorings and destructive changes. I'm not > > foreseeing a cross between the isel tests, but every failure makes the > > bots red, and if Global ISel incurs in more failures due to its stage > > in the development cycle, we'll start seeing the bots more red than > > we'd like. Right now, it's already very likely that you'll receive bot > > emails on commits, I don't want to increase that. > > I don’t see why having global ISel is supposed to increase the rate. `make > check` will runs on the committer machine with GlobalISel enabled. > > The good question from Michael is IMO "if my (non-GlobalISel) change > breaks GlobalISel, how likely is it to be a bug in my code vs. a bug in > GlobalISel?” > > But that’s totally unrelated to the bots, since it’ll fails on the > committer machine as well. > >To emphasize this - I don't think the bots should be a blocker, the issue that bothers me is local developer workflow. Regarding "we shouldn't enable it because it will make the bots slower" - well, yes, but that's just postponing the inevitable. We will enable GlobalISel eventually, and there will probably be a very long time-frame during which both are enabled concurrently. If we expect to have some magic solution to make overall testing much faster in the near future, it may make sense to delay until it's implemented. But I'm not aware of anything like that. As to it making the bots "redder" because of failures in the newer GlobalISel ports - if devs don't see a lot of local breakage, then the effect on the bots shouldn't be big either. And if they do, I think it's a non-starter anyway. Still, as a strawman proposal - would it make sense to add LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL, or something of that sort? The more mature ports will build under LLVM_BUILD_GLOBAL_ISEL (which will be enabled by default), the less mature ones only under LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL (which won't be), and moving a port from GLOBAL_ISEL_EXPERIMENTAL to GLOBAL_ISEL would be equivalent to marking a regular backend non-experimental. We can start with just AArch64 in the non-experimental category, and move the others as they become more stable. Or is this too much complexity for what we gain? Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170114/da8f2558/attachment.html>
Renato Golin via llvm-dev
2017-Jan-15 13:14 UTC
[llvm-dev] RFC: Building GlobalISel by default
On 15 January 2017 at 04:24, Michael Kuperstein <michael.kuperstein at gmail.com> wrote:> Regarding "we shouldn't enable it because it will make the bots slower" - > well, yes, but that's just postponing the inevitable. We will enable > GlobalISel eventually, and there will probably be a very long time-frame > during which both are enabled concurrently.No magic solutions expected. I just don't want to crank all bots up to 11 on day one. We start with the fast bots, when they're happy, we move on until we're confident that the whole thing is stable, then we make it build by default. There are two problems here: 1. Environmental issues. We do find more of those than we'd hope. Different compilers, linkers, libraries that produce not only crashes, but unstable builds and unpredictable or irreproducible test failures. The environment on our bots are wildly different (on purpose) and dealing with different unpredictable issues all at once is something that I'm not willing to take on lightly. Due to the nature of buildbots and some of our hardware and the fact that virtually all ARM/AArch64 bots are ours, this will be something that Linaro will pay the price in full. 2. Experimental nature. If the past serves as a guide, experimental features at the core of LLVM will be largely independent until they start to merge, when they can become a huge issue and be in merging state for months, if not years, or worse, never be merged. Right now, we're not at that state, we're still largely independent, so your worries regarding the development tests (which also hit buildbots at a different pace because of the environment issues above), will be lower for now, but will increase massively soon enough. There is the argument that having the bots building means we'll have *less* teething issues and the people building GlobalISel (which also includes Linaro) will have a lot less work. But there's a cost to be paid, and that cost is a lot higher on the ARM/AArch64 side than x86, just because most developers use an x86 as their main machine.> Still, as a strawman proposal - would it make sense to add > LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL, or something of that sort? The more > mature ports will build under LLVM_BUILD_GLOBAL_ISEL (which will be enabled > by default), the less mature ones only under > LLVM_BUILD_GLOBAL_ISEL_EXPERIMENTAL (which won't be), and moving a port from > GLOBAL_ISEL_EXPERIMENTAL to GLOBAL_ISEL would be equivalent to marking a > regular backend non-experimental. We can start with just AArch64 in the > non-experimental category, and move the others as they become more stable. > Or is this too much complexity for what we gain?It is. We already separate that by not building any other back-end than ARM and AArch64. So I think we can do that without the added complexity. I'm not trying to block the move, I'm just trying to make it smother and raise awareness that the cost we pay for any large move is not well distributed. For the past few months I have reverted patches that only broke the ARM bots and it was clear on most of those instances which patch was to blame, but people didn't bother and after 8, 12 sometimes 24 hours, I reverted. This is not a nice place to be in, and given that resurgence lately, I am honestly afraid adding more things will make it worse. A red bot can't catch new errors. A slow red bot can go on red for weeks (and we have seen it numerous times) because once you fix bug #1, #2 has made #3 appear and not be warned. This is not everyone, and I do appreciate all the people that helped us and worried about the breakages, but it's a trend that is increasing, not decreasing. Our team is too small to cope with further increases, so if the community is willing to do its part, I think we can make it work. But it has to be slow and steady. cheers, --renato