On Mon, Dec 29, 2014 at 8:04 PM, Warren Young <wyml at etr-usa.com> wrote:
>>> the world where you design, build, and deploy The System is disappearing fast.
>>
>> Sure, if you don't care if you lose data, you can skip those steps.
>
> How did you jump from incremental feature roll-outs to data loss? There is no necessary connection there.

No, it's not necessary for either code interfaces or data structures to change in backward-incompatible ways. But the people who push one kind of change aren't likely to care about the other either.

> In fact, I'd say you have a bigger risk of data loss when moving between two systems released years apart than two systems released a month apart. That's a huge software market in its own right: legacy data conversion.

I'm not really arguing about the timing of changes, I'm concerned about the cost of unnecessary user interface changes, code interface breakage, and data incompatibility, regardless of when it happens. RHEL's reason for existence is that it mostly shields users from that within a major release. That doesn't make it better when it happens when you are forced to move to the next one.

> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.

Are you offering to do it for free?

> You test before rolling something to production, and you run backups so that if all else fails, you can roll back to the prior version.

That's fine if you have one machine and can afford to shut down while you make something work. Most businesses aren't like that.

> None of this is revolutionary. It's just what you do, every day.

And it is time consuming and expensive.

>> when it breaks it's not the developer answering
>> the phones if anyone answers at all.
>
> Tech support calls shouldn't go straight to the developers under any development model, short of sole proprietorship, and not even then, if you can get away with it.
> There needs to be at least one layer of buffering in there: train up the secretary to some basic level of cluefulness, do everything via email, or even hire some dedicated support staff.
>
> It simply costs too much to break a developer out of flow to allow a customer to ring a bell on a developer's desk at will.

Beg your pardon? How about not breaking the things that trigger the calls in the first place - or taking some responsibility for it. Do you think other people have nothing better to do?

> Since we're contrasting with waterfall development processes that may last many years, but not decades, I'd say the error has already been made if you're still working with a waterfall-based methodology today.

We never change more than half of a load-balanced set of servers at once. So all changes have to be compatible when running concurrently, or worth rolling out a whole replacement farm.

>> some stuff can't be.
>
> Very little software must be developed in waterfall fashion.

If you run continuous services you either have to be able to run new/old concurrently or completely duplicate your server farm as you roll out incompatible clients.

> Last time I checked, this sort of software only accounted for about ~5% of all software produced, and that fraction is likely dropping, with the moves toward cloud services, open source software, subscription software, and subsidized software.
>
> The vast majority of software developed is in-house stuff, where the developers and the users *can* enter into an agile delivery cycle.

OK, but they have to not break existing interfaces when they do that. And that's not the case with OS upgrades.

>> If you are, say, adding up dollars, how many times do you want that
>> functionality to change?
>
> I'm not sure what you're asking.

I'm asking if computer science has advanced to the point where adding up a total needs new functionality, or if you would like the same total for the same numbers that you would have gotten last year.
Or more to the point: if the same program ran correctly last year, wouldn't it be nice if it still ran the same way this year, in spite of the OS upgrade you need to do because of the security bugs that keep getting shipped while developers spend their time making arbitrary changes to user interfaces.

> Compare a rolling release model like that of Cygwin or Ubuntu (not LTS). Something might break every few months, which sounds bad until you consider that the alternative is for *everything* to break at the same time, every 3-7 years.

When your system requires extensive testing, the fewer times it breaks, the better. Never would be nice...

>>> I don't mean that glibly. I mean you have made a fundamental mistake if your system breaks badly enough due to an OS change that you can't fix it within an iteration or two of your normal development process. The most likely mistake is staffing your team entirely with people who have never been through a platform shift before.
>>
>> Please quantify that. How much should a business expect to spend per
>> person to re-train their operations staff to keep their systems
>> working across a required OS update? Not to add functionality. To
>> keep something that was working running the way it was?
>
> If you hire competent people, you pay zero extra to do this, because this is the job they have been hired to do.

That's nonsense for any complex system. There are always _many_ different OS versions in play and many different development groups that only understand a subset, and every new change they need to know about costs time and risks mistakes.

> That's pretty much what IT/custom development is: coping with churn.

And it is expensive. Unnecessarily so, in my opinion.

>> How many customers for your service did you keep running non-stop
>> across those transitions?
>
> Most of our customers are K-12 schools, so we're not talking about a 24/7 system to begin with.
> K-12 runs maybe 9 hours a day (7am - 4pm), 5 days a week, 9 months out of the year. That gives us many upgrade windows.

That's a very different scenario than a farm of data servers that have to be available 24/7.

> We rarely change out hardware or the OS at a particular site. We generally run it until it falls over, dead.
>
> This means we're still building binaries for EL3.

I have a few of those, but I don't believe that is a sane thing to recommend.

> This also means our software must *remain* broadly portable. When we talk about porting to EL7, we don't mean that it stops working on EL6 and earlier. We might have some graceful feature degradation where the older OS simply can't do something the newer one can, but we don't just chop off an old OS because a new one came out.

You'd probably be better off in java if you aren't already.

>>> Everyone's moaning about systemd... at least it's looking to be a real de facto standard going forward.
>>
>> What you expect to pay to re-train operations staff -just- for this
>> change, -just- to keep things working the same..
>
> You ask that as if you think you have a no-cost option in the question of how to address the churn.

I ask it as if I think that software developers could make changes without breaking existing interfaces. And yes, I do think they could if they cared about anyone who built on those interfaces.

>> We've got lots of stuff that will drop into Windows server versions
>> spanning well over a 10 year range.
>
> Yes, well, Linux has always had a problem with ABI stability. Apparently the industry doesn't really care about this, evidenced by the fizzling of LSB, and the current attacks on the work at freedesktop.org. Apparently we'd all rather be fractious than learn to get along well enough that we can nail down some real standards.

Well, that has done a great job of keeping Microsoft in business.

> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
> We've got SELinux and SystemD, they've got UAC, SxS DLLs, API deprecation, and tools that shuffle positions on every release. (Where did they move the IPv4 configuration dialog this time?!)
>
> We get worked up here about things like the loss of 32-bit support, but over in MS land, they get API-of-the-year. JET, ODBC, OLE DB, or ADO? Win32, .NET desktop, Silverlight, or Metro? GDI, WinG, DirectX, Windows Forms or XAML? On and on, and that's just if you stay within the MSDN walls.

Yes, there are changes - and sometimes mysterious breakage. But an outright abandonment of an existing interface that breaks previously working code is pretty rare (and I don't like it when they do it either...).

>> Were you paying attention when Microsoft wanted to make XP obsolete?
>> There is a lot of it still running.
>
> Were you paying attention when Target's XP-based POS terminals all got pwned?
>
> Stability and compatibility are not universal goods.

Well, some things you have to get right in the first place - and then stability is good.

> Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt. Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.

And conversely, they felt it was worth _not_ doing for a very, very long time. So can the rest of us wait until we have Google's resources?

>> And why do you think it is a good thing
>> for this to be a hard problem or for every individual user to be
>> forced to solve it himself?
>
> I never said it was a good thing. I'm just reporting some observations from the field.

Maybe I misunderstood - I thought you were defending the status quo - and the Fedora developers that bring it to us.

--
Les Mikesell
lesmikesell at gmail.com
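[The "compatible when running concurrently" constraint Les raises for load-balanced farms usually comes down to this: every reader must accept both the old and the new wire/data format during the transition. A rough Python sketch of that discipline, with invented field names ("amount", "amount_cents") standing in for a real record format, and tying in the "adding up dollars" example:]

```python
# Sketch of running v1 and v2 code concurrently behind a load balancer:
# a v2 reader that still accepts records written by v1 peers, so the
# same numbers produce the same total regardless of which node wrote them.
# Field names here are hypothetical, purely for illustration.

def read_total(record: dict) -> int:
    """Return a total in integer cents, accepting both formats.

    v1 peers send {"amount": 12.34}   (dollars, float).
    v2 peers send {"amount_cents": 1234} (integer cents).
    """
    if "amount_cents" in record:           # new format
        return record["amount_cents"]
    # old format: convert dollars to cents, rounding to the nearest cent
    return round(record["amount"] * 100)

def total(records) -> int:
    """Sum mixed-version records: same answer this year as last year."""
    return sum(read_total(r) for r in records)
```

[The point of the sketch is only that backward compatibility is a property of the readers, not the writers; once every node runs a tolerant reader, the writers can be upgraded half a farm at a time.]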
On Dec 29, 2014, at 10:07 PM, Les Mikesell <lesmikesell at gmail.com> wrote:

> it's not necessary for either code interfaces or data structures
> to change in backward-incompatible ways.

You keep talking about the cost of coping with change, but apparently you believe maintaining legacy interfaces is cost-free. Take it from a software developer: it isn't.

People praise Microsoft for maintaining ancient interfaces, and attribute their success to it, but it's really the other way around: their success pays for the army of software developers it takes to keep a handle on the complexity that results from piling 20-30 years of change on top of the same base.

Even having mobilized that army, a huge amount of the problems with Windows come directly as a result of choosing to maintain such a huge legacy of backwards compatibility. Just one example: By default, anyone can write to the root of the C: drive on Windows. Why? Because DOS and Win16 allowed it, so a huge amount of software was written to expect that they could do it, too. Hence, the root of your Windows box's filesystem is purposely left insecure.

Most organizations cannot afford to create the equivalents of WOW64, which basically emulates Win32 on top of Win64. (Or *its* predecessor, WOW, which emulates Win16 on top of Win32.) That isn't trivial to do, especially at the level Microsoft does it, where a whole lot of clever low-level code is employed to allow WOW64 code to run nearly as fast as native Win64 code.

Meanwhile over in the Linux world, we have a whole lot of the code being written by unpaid volunteers, and a lot of the rest is being written by developers employed by organizations that do not enjoy a legal means for forcing their customers to pay for each and every seat of the software their developers created.

Result? We cannot afford to maintain every interface created during the quarter century of Linux's existence. Every now and then, we have to throw some ballast overboard.
I'm not saying that CentOS should be killed off, and all its users be forced to pay for RHEL licenses. I'm saying that one of the trade-offs of using a free OS is that you have to pick up some of the slack on your end.

>> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.
>
> Are you offering to do it for free?

This is one of the things my employer pays me to do. This is what I'm telling you: the job description is, "Cope with change."

> I'm asking if computer science has advanced to the point where adding
> up a total needs new functionality, or if you would like the same
> total for the same numbers that you would have gotten last year.

Mathematics doesn't change. The business and technology worlds do. Your example is a non sequitur.

>>> How many customers for your service did you keep running non-stop
>>> across those transitions?
>>
>> Most of our customers are K-12 schools, so we're not talking about a 24/7 system to begin with.
>
> That's a very different scenario than a farm of data servers that have
> to be available 24/7.

How many single computers have to be up 24/7? I mean really.

If you have any form of cluster, from old-school shared-everything style to new-style shared-nothing style, you can partition it and upgrade individual nodes.

If your system isn't in use across the world, you must have windows of low or zero usage where upgrades can happen. If your system *is* in use across the world, you likely have it partitioned across continents anyway.

The days of the critical single mainframe computer are fading fast. We're going to get to a point where it makes as much sense to talk about 100% uptime for single computers as it does to talk about hard drives that never fail.

>> We rarely change out hardware or the OS at a particular site. We generally run it until it falls over, dead.
>> This means we're still building binaries for EL3.
>
> I have a few of those, but I don't believe that is a sane thing to recommend.

It depends on the market. A lot of Linux boxes are basically appliances. When was the last time you upgraded the OS on your home router? I don't mean flashing new firmware, which is rare enough already; I mean upgrading it to a truly different OS.

Okay, so that's embedded Linux; it doesn't seem remarkable that such systems never change, once deployed. The thing is, there really isn't a narrow, bright line between "embedded" and the rest of the Linux world. It's a wide, gray line, covering a huge amount of the Linux world.

>> This also means our software must *remain* broadly portable.
>
> You'd probably be better off in java if you aren't already.

If you actually had a basis for making such a sweeping prescription like that, 90% of software written would be written in Java. There's a pile of good reasons why software continues to be written in other languages, either on top of other runtimes or on the bare metal.

No, don't argue. I don't want to start a Java flame war here. Just take it from a software developer: Java is not a universal, unalloyed good.

>>>> Everyone's moaning about systemd... at least it's looking to be a real de facto standard going forward.
>>>
>>> What you expect to pay to re-train operations staff -just- for this
>>> change, -just- to keep things working the same..
>>
>> You ask that as if you think you have a no-cost option in the question of how to address the churn.
>
> I ask it as if I think that software developers could make changes
> without breaking existing interfaces. And yes, I do think they could
> if they cared about anyone who built on those interfaces.

Legacy code isn't free to keep around. Take systemd. You can go two ways here:

1. sysvinit should also be supported as a first-class citizen in EL7. If that's your point, then just because the sysvinit code was already written doesn't mean there isn't a cost to continuing to maintain and package it.

2. sysvinit should never have been replaced. If that's your position, you're free to switch to a sysvinit based OS, or fork EL6. What, sounds like work? Too costly? That must be because it isn't free to keep maintaining old code.

>> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
>
> Yes, there are changes - and sometimes mysterious breakage. But an
> outright abandonment of an existing interface that breaks previously
> working code is pretty rare

Yes, well, that's one of the things you can do when you've got a near-monopoly on PC OSes, which allows you to employ 128,000 people. [1] When you only employ 6,500 [2] and a huge chunk of your customer base doesn't pay you for the use of the software you write, you necessarily have to do business differently.

[1] http://en.wikipedia.org/wiki/Microsoft
[2] http://en.wikipedia.org/wiki/Red_Hat

>>> Were you paying attention when Microsoft wanted to make XP obsolete?
>>> There is a lot of it still running.
>>
>> Were you paying attention when Target's XP-based POS terminals all got pwned?
>>
>> Stability and compatibility are not universal goods.
>
> Well, some things you have to get right in the first place - and then
> stability is good.

Security changes, too. 10 years ago, 2FA was something you only saw in high-security environments. Today, I have two different 2FA apps on the phone in my pocket. That phone is protected by a biometric system, which protects access to a trapdoor secure data store. My *phone* does this. The phone I had 10 years ago would let you hook a serial cable up and suck its entire contents out without even asking you for a password.

>> Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt.
>> Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.
>
> And conversely, they felt it was worth _not_ doing for a very, very
> long time. So can the rest of us wait until we have Google's
> resources?

You're never going to have Google's resources. Therefore, you will never have the *option* to roll your own custom OS.

So, cope with change.
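[The "roll your own; it isn't hard" schema-versioning approach Warren mentions can be sketched in a few lines: an ordered list of migrations, with the current version stored in the database itself. This minimal Python/SQLite sketch uses invented table names; real systems (Liquibase, Alembic, etc.) add locking, rollback, and checksums on top of the same idea:]

```python
import sqlite3

# Minimal roll-your-own schema-version manager: an ordered list of
# migration statements, with the applied version recorded in the
# database via SQLite's user_version pragma. Table/column names are
# invented for illustration.

MIGRATIONS = [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)",
    "ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'",
]

def migrate(db: sqlite3.Connection) -> int:
    """Apply any migrations newer than the stored schema version.

    Safe to run on every deploy: already-applied steps are skipped.
    Returns the schema version after migration.
    """
    current = db.execute("PRAGMA user_version").fetchone()[0]
    for version, stmt in enumerate(MIGRATIONS, start=1):
        if version > current:
            db.execute(stmt)
            # PRAGMA does not accept bound parameters; version is an int
            db.execute(f"PRAGMA user_version = {version}")
    db.commit()
    return db.execute("PRAGMA user_version").fetchone()[0]
```

[Whether this counts as "not hard" or as "work you pay a professional staff to do" is, of course, exactly the disagreement in this thread.]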
On Dec 31, 2014, at 11:00 AM, m.roth at 5-cent.us wrote:

> Warren Young wrote:
>>
>> How many single computers have to be up 24/7?
>
> A hundred or more, here, individual servers, 24x7.

I'm more interested in a percentage than absolute values. And I'm only interested in boxes that simply cannot go down for a bit of maintenance every now and then.

As counterexamples, DNS, NTP, and SMTP servers are out, because these protocols were explicitly designed to cope with short temporary outages.

> Home directory servers,
> backup servers, compute nodes, some of which have jobs that run for days,
> or a week or two, and that's not counting the clusters that do the same...
> and mostly dump the data to home or project directories.

That's all possible to work around.

Home servers: SAN design points the way.

Backup servers: Ditto if you mean home directory mirrors. If you mean hot failover nodes in a cluster, I already pointed out that clusters let you upgrade via temporary partitioning.

Compute nodes: I didn't ask how many boxes you have that share the same 9/5/180 usage pattern of our customers. I asked how many you have that must run 24/7/365 or Bad Things happen. When a job that's been running for 2 weeks finishes, there's your maintenance window. Take it if you need it, let it go if you don't.
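[The partition-and-upgrade approach described above, and Les's "never more than half the pool at once" rule from earlier in the thread, amount to a batching loop with a health gate between batches. A Python sketch, where the node names and the `upgrade`/`healthy` callables are placeholders the operator would supply:]

```python
# Sketch of a rolling cluster upgrade: take down at most a fixed
# fraction of the pool at a time, and verify each batch is healthy
# before touching the next one. All names here are placeholders.

def rolling_upgrade(nodes, upgrade, healthy, max_down_fraction=0.5):
    """Upgrade nodes in batches no larger than max_down_fraction of the pool.

    'upgrade' and 'healthy' are operator-supplied callables.
    Raises RuntimeError (halting the rollout, leaving remaining nodes
    untouched) if an upgraded node fails its health check.
    Returns the list of successfully upgraded nodes.
    """
    batch_size = max(1, int(len(nodes) * max_down_fraction))
    upgraded = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:        # drain and upgrade this batch
            upgrade(node)
        for node in batch:        # health gate before the next batch
            if not healthy(node):
                raise RuntimeError(f"{node} failed health check; halting rollout")
        upgraded.extend(batch)
    return upgraded
```

[Note this only works if old and new nodes can serve side by side during the rollout, which is precisely the backward-compatibility requirement being argued about in this thread.]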
On Wed, Dec 31, 2014 at 11:03 AM, Warren Young <wyml at etr-usa.com> wrote:
> On Dec 29, 2014, at 10:07 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>
>> it's not necessary for either code interfaces or data structures
>> to change in backward-incompatible ways.
>
> You keep talking about the cost of coping with change, but apparently you believe maintaining legacy interfaces is cost-free.
>
> Take it from a software developer: it isn't.

OK, but should one developer make an extra effort, or the bazillion people affected by it?

> People praise Microsoft for maintaining ancient interfaces, and attribute their success to it, but it's really the other way around: their success pays for the army of software developers it takes to keep a handle on the complexity that results from piling 20-30 years of change on top of the same base.

That's what it takes to build and keep a user base.

> Most organizations cannot afford to create the equivalents of WOW64, which basically emulates Win32 on top of Win64. (Or *its* predecessor, WOW, which emulates Win16 on top of Win32.) That isn't trivial to do, especially at the level Microsoft does it, where a whole lot of clever low-level code is employed to allow WOW64 code to run nearly as fast as native Win64 code.

It's hard to the extent that you made bad choices in interfaces in the first place. Microsoft's job was hard. But Unix SysV, which Linux basically emulates, wasn't so bad. Maybe a few size definitions could have been better.

> Result? We cannot afford to maintain every interface created during the quarter century of Linux's existence. Every now and then, we have to throw some ballast overboard.

And the user base that depended on them.

>>> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.
>>
>> Are you offering to do it for free?
>
> This is one of the things my employer pays me to do.
> This is what I'm telling you: the job description is, "Cope with change."

So either it "isn't hard", or "you need a trained, experienced, professional staff to do it". Big difference. Which is it?

>> I'm asking if computer science has advanced to the point where adding
>> up a total needs new functionality, or if you would like the same
>> total for the same numbers that you would have gotten last year.
>
> Mathematics doesn't change. The business and technology worlds do. Your example is a non sequitur.

If you are embedding business logic in your library interfaces, something is wrong. I'm talking about things that are shipped in the distribution and the commands to manage them. The underlying jobs they do were pretty well established long ago.

> How many single computers have to be up 24/7? I mean really.

All of our customer-facing services - and most internal infrastructure. Admittedly, not individual boxes - but who wants to have systems running concurrently with major differences in code base and operations/maintenance procedures?

> If you have any form of cluster, from old-school shared-everything style to new-style shared-nothing style, you can partition it and upgrade individual nodes.

Yes, everything is redundant. But when changes are not backwards compatible it makes piecemeal updates way harder than they should be. Take something simple like the dhcp server in the distro. It allows for redundant servers - but the versions are not compatible. How do you manage that by individual node upgrades when they won't fail over to each other?

> If your system isn't in use across the world, you must have windows of low or zero usage where upgrades can happen. If your system *is* in use across the world, you likely have it partitioned across continents anyway.

How nice for you...

>>> This means we're still building binaries for EL3.
>>
>> I have a few of those, but I don't believe that is a sane thing to recommend.
>
> It depends on the market.
> A lot of Linux boxes are basically appliances. When was the last time you upgraded the OS on your home router? I don't mean flashing new firmware, which is rare enough already; I mean upgrading it to a truly different OS.
>
> Okay, so that's embedded Linux; it doesn't seem remarkable that such systems never change, once deployed.

Which sort of points out that the wild and crazy changes in the mainstream distributions weren't all that necessary either...

>>> This also means our software must *remain* broadly portable.
>>
>> You'd probably be better off in java if you aren't already.
>
> If you actually had a basis for making such a sweeping prescription like that, 90% of software written would be written in Java.

I do. We have a broad mix of languages, some with requirements that force it, some just for historical reasons and the team that maintains it. The Java stuff has been much less problematic in porting across systems - or running the same code concurrently under different OS's/versions at once. I don't think the C++ guys have even figured out a sane way to use a standard boost version on 2 different Linux's, even doing separate builds for them.

> There's a pile of good reasons why software continues to be written in other languages, either on top of other runtimes or on the bare metal.

Maybe. I think there's a bigger pile of not-so-good reasons that things aren't done portably. Java isn't the only way to be portable, but you don't see much on the scale of elasticsearch, jenkins or opennms done cross-platform in other languages.

> No, don't argue. I don't want to start a Java flame war here. Just take it from a software developer, Java is not a universal, unalloyed good.

The syntax is cumbersome - but there are things like groovy or jruby that run on top of it. And there's a lot of start-up overhead, but that doesn't matter much to long-running servers.

> Take systemd. You can go two ways here:
>
> 1. sysvinit should also be supported as a first-class citizen in EL7. If that's your point, then just because the sysvinit code was already written doesn't mean there isn't a cost to continuing to maintain and package it.
>
> 2. sysvinit should never have been replaced. If that's your position, you're free to switch to a sysvinit based OS, or fork EL6. What, sounds like work? Too costly? That must be because it isn't free to keep maintaining old code.

Yes, I'm forced to deal with #1. That doesn't keep me from wishing that whatever code change had been done had kept backwards compatibility in the user interface commands and init scripts department.

>>> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
>>
>> Yes, there are changes - and sometimes mysterious breakage. But an
>> outright abandonment of an existing interface that breaks previously
>> working code is pretty rare
>
> Yes, well, that's one of the things you can do when you've got a near-monopoly on PC OSes, which allows you to employ 128,000 people. [1]

And you only get that with code that keeps users instead of driving them away.

>> And conversely, they felt it was worth _not_ doing for a very, very
>> long time. So can the rest of us wait until we have Google's
>> resources?
>
> You're never going to have Google's resources. Therefore, you will never have the *option* to roll your own custom OS.
>
> So, cope with change.

What Google does points out how unsuitable the distro really is. I just don't see why it has to stay that way.

--
Les Mikesell
lesmikesell at gmail.com
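[On the "backwards compatibility in the user interface commands" wish: EL7 does in fact ship a small shim of this kind, in that /sbin/service forwards most invocations to systemctl. The real shim is a shell script; the toy Python sketch below only illustrates the shape of such a translation layer, and is not the actual implementation:]

```python
# Toy sketch of a legacy-command compatibility shim: translate
# "service NAME ACTION" invocations into the equivalent systemctl
# argument vector. Illustrative only; the real EL7 shim is a shell
# script and handles more cases (chkconfig fallback, env scrubbing).

LEGACY_ACTIONS = {"start", "stop", "restart", "reload", "status"}

def translate(argv):
    """Map ['NAME', 'ACTION'] to a systemctl argv.

    Raises ValueError for invocations the shim cannot forward.
    """
    if len(argv) != 2 or argv[1] not in LEGACY_ACTIONS:
        raise ValueError(f"unsupported invocation: service {' '.join(argv)}")
    name, action = argv
    unit = name if name.endswith(".service") else name + ".service"
    return ["systemctl", action, unit]
```

[Such shims keep old muscle memory and old scripts working, at exactly the ongoing maintenance cost Warren describes: someone has to keep the translation layer correct as the new tool evolves.]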