On 19 July 2016 at 22:35, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Claiming that it “will be *a lot* less” burden than now is easy, but I don’t see any obvious fact to back this up.
> What is the current maintenance requirement of SVN/Git? Can someone who knows provide some facts?

I'll let Anton tell his side, and Tanya talk about the real costs, but
here are some facts I know:

Our ARM/AArch64 buildbots fail around 2-3 times a month with SVN
errors. Sometimes it's only the fast ones, sometimes all of them
(depends on how long it takes to fix). Sometimes the fix is just
"wait", sometimes Anton has to actively fix it (he also has to work,
sleep, eat, etc).

In the past, we were hit by web spiders that completely ignored the
robots.txt file. Anton has made that better, but it can escalate if
the spiders realise we blocked them. There are ways to work around it,
but not without accidentally blocking innocent people (mostly in China).

The cost of the AWS servers is ~$5k / year. It's not *only* for SVN,
but also for web servers and hosting packages. Recently we turned off
the deb hosting because of budget (our server and bandwidth couldn't
cope with it).

So, while $5k/year might not look like much, it's enough to pay for a lot
of students to go to LLVM events who couldn't otherwise go. It's
also nowhere near what we would like if we were to host a robust
repository with the features that GitHub can provide. Mainly
bandwidth, storage, stability and support.

Given the AWS costs that I've seen at Linaro, we'd have to *at least*
double that money to host a dedicated machine with enough bandwidth to
have repositories, binaries, videos etc., not counting paying someone
to actively maintain it, if we want to compare one to one with what
GitHub provides for free.

I will make no attempt at estimating Anton's time, or Tanya's or
anyone else's, but I believe they (and their companies/universities)
would very much rather they work on actual compiler stuff. I'm sure
that, if we add in the human cost, it'll far outweigh the infrastructure
costs, even if we double/triple our current spending.

On the other hand, as Tim has shown, a web-service with a JSON file
will be running some web server which is lighter and cheaper to deliver
than a normal web page (less content, less bandwidth, less
storage, less I/O), and could serve hundreds, if not thousands, of
queries per second with a small AWS image.

The web-hooks would be set up once and hosted by GitHub, so zero
additional work from our side, as well as all the forking, branching,
merging, SVN interface (which we can't easily get if we move to local
Git).

The level of failure in the web-services will be lower (lower load,
less probability of barfing) and even if they do fail, it will only affect
the services that use them (buildbots, LNT, bisect), not any other
developer.

Moreover, our side of the web-service can fail catastrophically and
need a wipe and restart, and *none* of our commit history would be
affected. On the other hand, if the SVN fails catastrophically today,
I don't know if we have a good backup policy, which means commits
could be lost. GitHub may not provide guarantees, but they do have
proper backup policies.

All in all, it may not look like much, but running a decent and stable web
service with so much at stake is *not* a simple task, and we shouldn't
take it for granted.

cheers,
--renato
> On Jul 19, 2016, at 2:59 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 19 July 2016 at 22:35, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> Claiming that it “will be *a lot* less” burden than now is easy, but I don’t see any obvious fact to back this up.
>> What is the current maintenance requirement of SVN/Git? Can someone who knows provide some facts?
>
> I'll let Anton tell his side, and Tanya talk about the real costs, but
> here are some facts I know:
>
> Our ARM/AArch64 buildbots fail around 2-3 times a month with SVN
> errors. Sometimes it's only the fast ones, sometimes all of them
> (depends on how long it takes to fix).

That’s relevant data.

> Sometimes the fix is just
> "wait", sometimes Anton has to actively fix it (he also has to work,
> sleep, eat, etc).
>
> In the past, we were hit by web spiders that completely ignored the
> robots.txt file. Anton has made that better, but it can escalate if
> the spiders realise we blocked them. There are ways to work around it,
> but not without accidentally blocking innocent people (mostly in China).

That’s not relevant: this is about the WWW server, which does not have to be related to hosting the repos.

>
> The cost of the AWS servers is ~$5k / year. It's not *only* for SVN,
> but also for web servers and hosting packages. Recently we turned off
> the deb hosting because of budget (our server and bandwidth couldn't
> cope with it).

Same.

>
> So, while $5k/year might not look like much, it's enough to pay for a lot
> of students to go to LLVM events who couldn't otherwise go.

Moving the SVN repo does not solve hosting videos, Debian packages, etc.
I suspect most of the bandwidth does not come from `svn up` or `git pull`.

> It's
> also nowhere near what we would like if we were to host a robust
> repository with the features that GitHub can provide.

Like… proper hooks?

> Mainly
> bandwidth, storage, stability and support.
>
> Given the AWS costs that I've seen at Linaro, we'd have to *at least*
> double that money to host a dedicated machine with enough bandwidth to
> have repositories, binaries, videos etc., not counting paying someone
> to actively maintain it, if we want to compare one to one with what
> GitHub provides for free.

You’re again conflating svn/git and hosting “binaries and videos”. I don’t think we ever planned to host these on GitHub?

> I will make no attempt at estimating Anton's time, or Tanya's or
> anyone else's, but I believe they (and their companies/universities)
> would very much rather they work on actual compiler stuff. I'm sure
> that, if we add in the human cost, it'll far outweigh the infrastructure
> costs, even if we double/triple our current spending.

Possibly, I don’t know, but that’s exactly why I asked for first-hand data on the subject (i.e. from Anton and/or Tanya) about hosting the git/SVN repos themselves, instead of hand-wavy “I believe” discussions.

>
> On the other hand, as Tim has shown, a web-service with a JSON file
> will be running some web server which is lighter and cheaper to deliver
> than a normal web page (less content, less bandwidth, less
> storage, less I/O), and could serve hundreds, if not thousands, of
> queries per second with a small AWS image.
>
> The web-hooks would be set up once and hosted by GitHub, so zero
> additional work from our side, as well as all the forking, branching,
> merging, SVN interface (which we can't easily get if we move to local
> Git).
>
> The level of failure in the web-services will be lower (lower load,
> less probability of barfing) and even if they do fail, it will only affect
> the services that use them (buildbots, LNT, bisect), not any other
> developer.
>
> Moreover, our side of the web-service can fail catastrophically and
> need a wipe and restart, and *none* of our commit history would be
> affected. On the other hand, if the SVN fails catastrophically today,
> I don't know if we have a good backup policy, which means commits
> could be lost. GitHub may not provide guarantees, but they do have
> proper backup policies.
>
> All in all, it may not look like much, but running a decent and stable web
> service with so much at stake is *not* a simple task, and we shouldn't
> take it for granted.

Sure, “running a decent and stable web service is not a simple task”, that’s what I’m saying.

-- Mehdi
On 19 July 2016 at 23:16, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> In the past, we were hit by web spiders that completely ignored the
>> robots.txt file. Anton has made that better, but it can escalate if
>> the spiders realise we blocked them. There are ways to work around it,
>> but not without accidentally blocking innocent people (mostly in China).
>
> That’s not relevant: this is about the WWW server, which does not have to be related to hosting the repos.

No, this is about hosting the SVN server. The SVN view was disabled
for months this year before we could really see what was going on.

> Moving the SVN repo does not solve hosting videos, Debian packages, etc.
> I suspect most of the bandwidth does not come from `svn up` or `git pull`.

They share the same bandwidth, and sometimes the same server. It is
relevant. One thing making SVN slow was the number of Debian packages
being downloaded from the same place.

> Like… proper hooks?

If we can work around it, and it seems we can, this is not such a big issue.

> You’re again conflating svn/git and hosting “binaries and videos”. I don’t think we ever planned to host these on GitHub?

No, but they all share bandwidth. We moved videos to YouTube to
offload the bandwidth, and moving the code to GitHub follows the same
mindset.

> Possibly, I don’t know, but that’s exactly why I asked for first-hand data on the subject (i.e. from Anton and/or Tanya) about hosting the git/SVN repos themselves, instead of hand-wavy “I believe” discussions.

Bear in mind that I gave you facts (bandwidth problems, turned off SVN
services, constant breakdowns, expertise in handling traffic, backup
solutions).

I also made you aware that the human cost is not *just* Tanya and
Anton, but also me and everyone else who maintains buildbots, external
mirrors, etc., and it *is* larger than the hardware costs. You just
don't see it because we're all volunteers.

Branding them as "hand-wavy I believe" is *not* appropriate.

cheers,
--renato
On 2016-07-20 00:16, Mehdi Amini via llvm-dev wrote:
> Sure, “running a decent and stable web service is not a simple task”, that’s what I’m saying.

Perhaps Travis CI could be used. They support triggering builds using an API [1].

[1] https://docs.travis-ci.com/user/triggering-builds

--
/Jacob Carlborg
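A minimal sketch of what triggering a build through that API might look like, assuming the Travis CI API v3 "requests" endpoint described in [1]. The repository slug and token below are hypothetical placeholders, and the request is only built here, not sent:

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Travis CI API v3 host. The repository slug and
# token used in the example at the bottom are hypothetical placeholders.
TRAVIS_API = "https://api.travis-ci.org"


def build_trigger_request(repo_slug, token, branch="master"):
    """Build (but do not send) a POST request asking Travis CI to build a branch."""
    # The "owner/name" slug goes into the URL path, so "/" must become "%2F".
    encoded_slug = urllib.parse.quote(repo_slug, safe="")
    url = f"{TRAVIS_API}/repo/{encoded_slug}/requests"

    # Request body: which branch to build.
    body = json.dumps({"request": {"branch": branch}}).encode("utf-8")

    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Travis-API-Version": "3",
        "Authorization": f"token {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_trigger_request("example-org/example-repo", "PLACEHOLDER_TOKEN")
    print(req.get_method(), req.full_url)
    # Actually sending it would be urllib.request.urlopen(req), which needs
    # a valid token and network access.
```

A commit hook could call something like this once per push, so the service on our side would only need to forward the notification rather than host any repository data.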