On Mon, Dec 29, 2014 at 8:04 PM, Warren Young <wyml at etr-usa.com> wrote:
>>> the world where you design, build, and deploy The System is disappearing fast.
>>
>> Sure, if you don't care if you lose data, you can skip those steps.
>
> How did you jump from incremental feature roll-outs to data loss? There is no necessary connection there.

No, it's not necessary for either code interfaces or data structures to change in backward-incompatible ways. But the people who push one kind of change aren't likely to care about the other either.

> In fact, I'd say you have a bigger risk of data loss when moving between two systems released years apart than two systems released a month apart. That's a huge software market in its own right: legacy data conversion.

I'm not really arguing about the timing of changes, I'm concerned about the cost of unnecessary user interface changes, code interface breakage, and data incompatibility, regardless of when it happens. RHEL's reason for existence is that it mostly shields users from that within a major release. That doesn't make it better when it happens when you are forced to move to the next one.

> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.

Are you offering to do it for free?

> You test before rolling something to production, and you run backups so that if all else fails, you can roll back to the prior version.

That's fine if you have one machine and can afford to shut down while you make something work. Most businesses aren't like that.

> None of this is revolutionary. It's just what you do, every day.

And it is time consuming and expensive.

>> when it breaks it's not the developer answering
>> the phones if anyone answers at all.
>
> Tech support calls shouldn't go straight to the developers under any development model, short of sole proprietorship, and not even then, if you can get away with it.
> There needs to be at least one layer of buffering in there: train up the secretary to some basic level of cluefulness, do everything via email, or even hire some dedicated support staff.
>
> It simply costs too much to break a developer out of flow to allow a customer to ring a bell on a developer's desk at will.

Beg your pardon? How about not breaking the things that trigger the calls in the first place - or taking some responsibility for it. Do you think other people have nothing better to do?

> Since we're contrasting with waterfall development processes that may last many years, but not decades, I'd say the error has already been made if you're still working with a waterfall-based methodology today.

We never change more than half of a load-balanced set of servers at once. So all changes have to be compatible when running concurrently, or worth rolling out a whole replacement farm.

>> some stuff can't be.
>
> Very little software must be developed in waterfall fashion.

If you run continuous services you either have to be able to run new/old concurrently or completely duplicate your server farm as you roll out incompatible clients.

> Last time I checked, this sort of software only accounted for about ~5% of all software produced, and that fraction is likely dropping, with the moves toward cloud services, open source software, subscription software, and subsidized software.
>
> The vast majority of software developed is in-house stuff, where the developers and the users *can* enter into an agile delivery cycle.

OK, but they have to not break existing interfaces when they do that. And that's not the case with OS upgrades.

>> If you are, say, adding up dollars, how many times do you want that
>> functionality to change?
>
> I'm not sure what you're asking.

I'm asking if computer science has advanced to the point where adding up a total needs new functionality, or if you would like the same total for the same numbers that you would have gotten last year.
Or more to the point: if the same program ran correctly last year, wouldn't it be nice if it still ran the same way this year, in spite of the OS upgrade you need to do because of the security bugs that keep getting shipped while developers spend their time making arbitrary changes to user interfaces.

> Compare a rolling release model like that of Cygwin or Ubuntu (not LTS). Something might break every few months, which sounds bad until you consider that the alternative is for *everything* to break at the same time, every 3-7 years.

When your system requires extensive testing, the fewer times it breaks, the better. Never would be nice...

>>> I don't mean that glibly. I mean you have made a fundamental mistake if your system breaks badly enough due to an OS change that you can't fix it within an iteration or two of your normal development process. The most likely mistake is staffing your team entirely with people who have never been through a platform shift before.
>>
>> Please quantify that. How much should a business expect to spend per
>> person to re-train their operations staff to keep their systems
>> working across a required OS update? Not to add functionality. To
>> keep something that was working running the way it was?
>
> If you hire competent people, you pay zero extra to do this, because this is the job they have been hired to do.

That's nonsense for any complex system. There are always _many_ different OS versions in play and many different development groups that only understand a subset, and every new change they need to know about costs time and risks mistakes.

> That's pretty much what IT/custom development is: coping with churn.

And it is expensive. Unnecessarily so, in my opinion.

>> How many customers for your service did you keep running non-stop
>> across those transitions?
>
> Most of our customers are K-12 schools, so we're not talking about a 24/7 system to begin with.
> K-12 runs maybe 9 hours a day (7am - 4pm), 5 days a week, 9 months out of the year. That gives us many upgrade windows.

That's a very different scenario than a farm of data servers that have to be available 24/7.

> We rarely change out hardware or the OS at a particular site. We generally run it until it falls over, dead.
>
> This means we're still building binaries for EL3.

I have a few of those, but I don't believe that is a sane thing to recommend.

> This also means our software must *remain* broadly portable. When we talk about porting to EL7, we don't mean that it stops working on EL6 and earlier. We might have some graceful feature degradation where the older OS simply can't do something the newer one can, but we don't just chop off an old OS because a new one came out.

You'd probably be better off in java if you aren't already.

>>> Everyone's moaning about systemd... at least it's looking to be a real de facto standard going forward.
>>
>> What you expect to pay to re-train operations staff -just- for this
>> change, -just- to keep things working the same..
>
> You ask that as if you think you have a no-cost option in the question of how to address the churn.

I ask it as if I think that software developers could make changes without breaking existing interfaces. And yes, I do think they could if they cared about anyone who built on those interfaces.

>> We've got lots of stuff that will drop into Windows server versions
>> spanning well over a 10 year range.
>
> Yes, well, Linux has always had a problem with ABI stability. Apparently the industry doesn't really care about this, evidenced by the fizzling of LSB, and the current attacks on the work at freedesktop.org. Apparently we'd all rather be fractious than learn to get along well enough that we can nail down some real standards.

Well, that has done a great job of keeping Microsoft in business.

> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
> We've got SELinux and SystemD, they've got UAC, SxS DLLs, API deprecation, and tools that shuffle positions on every release. (Where did they move the IPv4 configuration dialog this time?!)
>
> We get worked up here about things like the loss of 32-bit support, but over in MS land, they get API-of-the-year. JET, ODBC, OLE DB, or ADO? Win32, .NET desktop, Silverlight, or Metro? GDI, WinG, DirectX, Windows Forms or XAML? On and on, and that's just if you stay within the MSDN walls.

Yes, there are changes - and sometimes mysterious breakage. But an outright abandonment of an existing interface that breaks previously working code is pretty rare (and I don't like it when they do it either...).

>> Were you paying attention when Microsoft wanted to make XP obsolete?
>> There is a lot of it still running.
>
> Were you paying attention when Target's XP-based POS terminals all got pwned?
>
> Stability and compatibility are not universal goods.

Well, some things you have to get right in the first place - and then stability is good.

> Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt. Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.

And conversely, they felt it was worth _not_ doing for a very, very long time. So can the rest of us wait until we have Google's resources?

>> And why do you think it is a good thing
>> for this to be a hard problem or for every individual user to be
>> forced to solve it himself?
>
> I never said it was a good thing. I'm just reporting some observations from the field.

Maybe I misunderstood - I thought you were defending the status quo - and the Fedora developers that bring it to us.

--
Les Mikesell
lesmikesell at gmail.com
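[The "compatible when running concurrently" constraint Les raises for load-balanced farms usually comes down to this: every reader must accept both the old and the new wire/data format during the transition. A rough Python sketch of that discipline, with invented field names ("amount", "amount_cents") standing in for a real record format, and tying in the "adding up dollars" example:]

```python
# Sketch of running v1 and v2 code concurrently behind a load balancer:
# a v2 reader that still accepts records written by v1 peers, so the
# same numbers produce the same total regardless of which node wrote them.
# Field names here are hypothetical, purely for illustration.

def read_total(record: dict) -> int:
    """Return a total in integer cents, accepting both formats.

    v1 peers send {"amount": 12.34}   (dollars, float).
    v2 peers send {"amount_cents": 1234} (integer cents).
    """
    if "amount_cents" in record:           # new format
        return record["amount_cents"]
    # old format: convert dollars to cents, rounding to the nearest cent
    return round(record["amount"] * 100)

def total(records) -> int:
    """Sum mixed-version records: same answer this year as last year."""
    return sum(read_total(r) for r in records)
```

[The point of the sketch is only that backward compatibility is a property of the readers, not the writers; once every node runs a tolerant reader, the writers can be upgraded half a farm at a time.]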
On Dec 29, 2014, at 10:07 PM, Les Mikesell <lesmikesell at gmail.com> wrote:

> it's not necessary for either code interfaces or data structures
> to change in backward-incompatible ways.

You keep talking about the cost of coping with change, but apparently you believe maintaining legacy interfaces is cost-free. Take it from a software developer: it isn't.

People praise Microsoft for maintaining ancient interfaces, and attribute their success to it, but it's really the other way around: their success pays for the army of software developers it takes to keep a handle on the complexity that results from piling 20-30 years of change on top of the same base.

Even having mobilized that army, a huge amount of the problems with Windows come directly as a result of choosing to maintain such a huge legacy of backwards compatibility. Just one example: By default, anyone can write to the root of the C: drive on Windows. Why? Because DOS and Win16 allowed it, so a huge amount of software was written to expect that they could do it, too. Hence, the root of your Windows box's filesystem is purposely left insecure.

Most organizations cannot afford to create the equivalents of WOW64, which basically emulates Win32 on top of Win64. (Or *its* predecessor, WOW, which emulates Win16 on top of Win32.) That isn't trivial to do, especially at the level Microsoft does it, where a whole lot of clever low-level code is employed to allow WOW64 code to run nearly as fast as native Win64 code.

Meanwhile over in the Linux world, we have a whole lot of the code being written by unpaid volunteers, and a lot of the rest is being written by developers employed by organizations that do not enjoy a legal means for forcing their customers to pay for each and every seat of the software their developers created.

Result? We cannot afford to maintain every interface created during the quarter century of Linux's existence. Every now and then, we have to throw some ballast overboard.
I'm not saying that CentOS should be killed off, and all its users be forced to pay for RHEL licenses. I'm saying that one of the trade-offs of using a free OS is that you have to pick up some of the slack on your end.

>> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.
>
> Are you offering to do it for free?

This is one of the things my employer pays me to do. This is what I'm telling you: the job description is, "Cope with change."

> I'm asking if computer science has advanced to the point where adding
> up a total needs new functionality, or if you would like the same
> total for the same numbers that you would have gotten last year.

Mathematics doesn't change. The business and technology worlds do. Your example is a non sequitur.

>>> How many customers for your service did you keep running non-stop
>>> across those transitions?
>>
>> Most of our customers are K-12 schools, so we're not talking about a 24/7 system to begin with.
>
> That's a very different scenario than a farm of data servers that have
> to be available 24/7.

How many single computers have to be up 24/7? I mean really.

If you have any form of cluster, from old-school shared-everything style to new-style shared-nothing style, you can partition it and upgrade individual nodes.

If your system isn't in use across the world, you must have windows of low or zero usage where upgrades can happen. If your system *is* in use across the world, you likely have it partitioned across continents anyway.

The days of the critical single mainframe computer are fading fast. We're going to get to a point where it makes as much sense to talk about 100% uptime for single computers as it does to talk about hard drives that never fail.

>> We rarely change out hardware or the OS at a particular site. We generally run it until it falls over, dead.
>> This means we're still building binaries for EL3.
>
> I have a few of those, but I don't believe that is a sane thing to recommend.

It depends on the market. A lot of Linux boxes are basically appliances. When was the last time you upgraded the OS on your home router? I don't mean flashing new firmware, which is rare enough already; I mean upgrading it to a truly different OS.

Okay, so that's embedded Linux; it doesn't seem remarkable that such systems never change, once deployed. The thing is, there really isn't a narrow, bright line between "embedded" and the rest of the Linux world. It's a wide, gray line, covering a huge amount of the Linux world.

>> This also means our software must *remain* broadly portable.
>
> You'd probably be better off in java if you aren't already.

If you actually had a basis for making such a sweeping prescription like that, 90% of software written would be written in Java. There's a pile of good reasons why software continues to be written in other languages, either on top of other runtimes or on the bare metal.

No, don't argue. I don't want to start a Java flame war here. Just take it from a software developer: Java is not a universal, unalloyed good.

>>>> Everyone's moaning about systemd... at least it's looking to be a real de facto standard going forward.
>>>
>>> What you expect to pay to re-train operations staff -just- for this
>>> change, -just- to keep things working the same..
>>
>> You ask that as if you think you have a no-cost option in the question of how to address the churn.
>
> I ask it as if I think that software developers could make changes
> without breaking existing interfaces. And yes, I do think they could
> if they cared about anyone who built on those interfaces.

Legacy code isn't free to keep around. Take systemd. You can go two ways here:

1. sysvinit should also be supported as a first-class citizen in EL7. If that's your point, then just because the sysvinit code was already written doesn't mean there isn't a cost to continuing to maintain and package it.

2. sysvinit should never have been replaced. If that's your position, you're free to switch to a sysvinit based OS, or fork EL6. What, sounds like work? Too costly? That must be because it isn't free to keep maintaining old code.

>> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
>
> Yes, there are changes - and sometimes mysterious breakage. But an
> outright abandonment of an existing interface that breaks previously
> working code is pretty rare

Yes, well, that's one of the things you can do when you've got a near-monopoly on PC OSes, which allows you to employ 128,000 people. [1] When you only employ 6,500 [2] and a huge chunk of your customer base doesn't pay you for the use of the software you write, you necessarily have to do business differently.

[1] http://en.wikipedia.org/wiki/Microsoft
[2] http://en.wikipedia.org/wiki/Red_Hat

>>> Were you paying attention when Microsoft wanted to make XP obsolete?
>>> There is a lot of it still running.
>>
>> Were you paying attention when Target's XP-based POS terminals all got pwned?
>>
>> Stability and compatibility are not universal goods.
>
> Well, some things you have to get right in the first place - and then
> stability is good.

Security changes, too. 10 years ago, 2FA was something you only saw in high-security environments. Today, I have two different 2FA apps on the phone in my pocket. That phone is protected by a biometric system, which protects access to a trapdoor secure data store. My *phone* does this. The phone I had 10 years ago would let you hook a serial cable up and suck its entire contents out without even asking you for a password.

>> Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt.
>> Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.
>
> And conversely, they felt it was worth _not_ doing for a very, very
> long time. So can the rest of us wait until we have Google's
> resources?

You're never going to have Google's resources. Therefore, you will never have the *option* to roll your own custom OS.

So, cope with change.
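[The "roll your own; it isn't hard" schema-versioning approach Warren mentions can be sketched in a few lines: an ordered list of migrations, with the current version stored in the database itself. This minimal Python/SQLite sketch uses invented table names; real systems (Liquibase, Alembic, etc.) add locking, rollback, and checksums on top of the same idea:]

```python
import sqlite3

# Minimal roll-your-own schema-version manager: an ordered list of
# migration statements, with the applied version recorded in the
# database via SQLite's user_version pragma. Table/column names are
# invented for illustration.

MIGRATIONS = [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)",
    "ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'",
]

def migrate(db: sqlite3.Connection) -> int:
    """Apply any migrations newer than the stored schema version.

    Safe to run on every deploy: already-applied steps are skipped.
    Returns the schema version after migration.
    """
    current = db.execute("PRAGMA user_version").fetchone()[0]
    for version, stmt in enumerate(MIGRATIONS, start=1):
        if version > current:
            db.execute(stmt)
            # PRAGMA does not accept bound parameters; version is an int
            db.execute(f"PRAGMA user_version = {version}")
    db.commit()
    return db.execute("PRAGMA user_version").fetchone()[0]
```

[Whether this counts as "not hard" or as "work you pay a professional staff to do" is, of course, exactly the disagreement in this thread.]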
On Dec 31, 2014, at 11:00 AM, m.roth at 5-cent.us wrote:

> Warren Young wrote:
>>
>> How many single computers have to be up 24/7?
>
> A hundred or more, here, individual servers, 24x7.

I'm more interested in a percentage than absolute values. And I'm only interested in boxes that simply cannot go down for a bit of maintenance every now and then.

As counterexamples, DNS, NTP, and SMTP servers are out, because these protocols were explicitly designed to cope with short temporary outages.

> Home directory servers,
> backup servers, compute nodes, some of which have jobs that run for days,
> or a week or two, and that's not counting the clusters that do the same...
> and mostly dump the data to home or project directories.

That's all possible to work around.

Home servers: SAN design points the way.

Backup servers: Ditto if you mean home directory mirrors. If you mean hot failover nodes in a cluster, I already pointed out that clusters let you upgrade via temporary partitioning.

Compute nodes: I didn't ask how many boxes you have that share the same 9/5/180 usage pattern of our customers. I asked how many you have that must run 24/7/365 or Bad Things happen. When a job that's been running for 2 weeks finishes, there's your maintenance window. Take it if you need it, let it go if you don't.
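[The partition-and-upgrade approach described above, and Les's "never more than half the pool at once" rule from earlier in the thread, amount to a batching loop with a health gate between batches. A Python sketch, where the node names and the `upgrade`/`healthy` callables are placeholders the operator would supply:]

```python
# Sketch of a rolling cluster upgrade: take down at most a fixed
# fraction of the pool at a time, and verify each batch is healthy
# before touching the next one. All names here are placeholders.

def rolling_upgrade(nodes, upgrade, healthy, max_down_fraction=0.5):
    """Upgrade nodes in batches no larger than max_down_fraction of the pool.

    'upgrade' and 'healthy' are operator-supplied callables.
    Raises RuntimeError (halting the rollout, leaving remaining nodes
    untouched) if an upgraded node fails its health check.
    Returns the list of successfully upgraded nodes.
    """
    batch_size = max(1, int(len(nodes) * max_down_fraction))
    upgraded = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:        # drain and upgrade this batch
            upgrade(node)
        for node in batch:        # health gate before the next batch
            if not healthy(node):
                raise RuntimeError(f"{node} failed health check; halting rollout")
        upgraded.extend(batch)
    return upgraded
```

[Note this only works if old and new nodes can serve side by side during the rollout, which is precisely the backward-compatibility requirement being argued about in this thread.]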
On Wed, Dec 31, 2014 at 11:03 AM, Warren Young <wyml at etr-usa.com> wrote:
> On Dec 29, 2014, at 10:07 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>
>> it's not necessary for either code interfaces or data structures
>> to change in backward-incompatible ways.
>
> You keep talking about the cost of coping with change, but apparently you believe maintaining legacy interfaces is cost-free.
>
> Take it from a software developer: it isn't.

OK, but should one developer make an extra effort, or the bazillion people affected by it?

> People praise Microsoft for maintaining ancient interfaces, and attribute their success to it, but it's really the other way around: their success pays for the army of software developers it takes to keep a handle on the complexity that results from piling 20-30 years of change on top of the same base.

That's what it takes to build and keep a user base.

> Most organizations cannot afford to create the equivalents of WOW64, which basically emulates Win32 on top of Win64. (Or *its* predecessor, WOW, which emulates Win16 on top of Win32.) That isn't trivial to do, especially at the level Microsoft does it, where a whole lot of clever low-level code is employed to allow WOW64 code to run nearly as fast as native Win64 code.

It's hard to the extent that you made bad choices in interfaces in the first place. Microsoft's job was hard. But Unix SysV, which Linux basically emulates, wasn't so bad. Maybe a few size definitions could have been better.

> Result? We cannot afford to maintain every interface created during the quarter century of Linux's existence. Every now and then, we have to throw some ballast overboard.

And the user base that depended on them.

>>> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions. Or, roll your own; it isn't hard.
>>
>> Are you offering to do it for free?
>
> This is one of the things my employer pays me to do.
> This is what I'm telling you: the job description is, "Cope with change."

So either it "isn't hard", or "you need a trained, experienced, professional staff to do it". Big difference. Which is it?

>> I'm asking if computer science has advanced to the point where adding
>> up a total needs new functionality, or if you would like the same
>> total for the same numbers that you would have gotten last year.
>
> Mathematics doesn't change. The business and technology worlds do. Your example is a non sequitur.

If you are embedding business logic in your library interfaces, something is wrong. I'm talking about things that are shipped in the distribution and the commands to manage them. The underlying jobs they do were pretty well established long ago.

> How many single computers have to be up 24/7? I mean really.

All of our customer-facing services - and most internal infrastructure. Admittedly, not individual boxes - but who wants to have systems running concurrently with major differences in code base and operations/maintenance procedures?

> If you have any form of cluster, from old-school shared-everything style to new-style shared-nothing style, you can partition it and upgrade individual nodes.

Yes, everything is redundant. But when changes are not backwards compatible it makes piecemeal updates way harder than they should be. Take something simple like the dhcp server in the distro. It allows for redundant servers - but the versions are not compatible. How do you manage that by individual node upgrades when they won't fail over to each other?

> If your system isn't in use across the world, you must have windows of low or zero usage where upgrades can happen. If your system *is* in use across the world, you likely have it partitioned across continents anyway.

How nice for you...

>>> This means we're still building binaries for EL3.
>>
>> I have a few of those, but I don't believe that is a sane thing to recommend.
>
> It depends on the market.
> A lot of Linux boxes are basically appliances. When was the last time you upgraded the OS on your home router? I don't mean flashing new firmware, which is rare enough already; I mean upgrading it to a truly different OS.
>
> Okay, so that's embedded Linux; it doesn't seem remarkable that such systems never change, once deployed.

Which sort of points out that the wild and crazy changes in the mainstream distributions weren't all that necessary either...

>>> This also means our software must *remain* broadly portable.
>>
>> You'd probably be better off in java if you aren't already.
>
> If you actually had a basis for making such a sweeping prescription like that, 90% of software written would be written in Java.

I do. We have a broad mix of languages, some with requirements that force it, some just for historical reasons and the team that maintains it. The Java stuff has been much less problematic in porting across systems - or running the same code concurrently under different OS's/versions at once. I don't think the C++ guys have even figured out a sane way to use a standard boost version on 2 different Linux's, even doing separate builds for them.

> There's a pile of good reasons why software continues to be written in other languages, either on top of other runtimes or on the bare metal.

Maybe. I think there's a bigger pile of not-so-good reasons that things aren't done portably. Java isn't the only way to be portable, but you don't see much on the scale of elasticsearch, jenkins or opennms done cross-platform in other languages.

> No, don't argue. I don't want to start a Java flame war here. Just take it from a software developer, Java is not a universal, unalloyed good.

The syntax is cumbersome - but there are things like groovy or jruby that run on top of it. And there's a lot of start-up overhead, but that doesn't matter much to long-running servers.

> Take systemd. You can go two ways here:
>
> 1. sysvinit should also be supported as a first-class citizen in EL7. If that's your point, then just because the sysvinit code was already written doesn't mean there isn't a cost to continuing to maintain and package it.
>
> 2. sysvinit should never have been replaced. If that's your position, you're free to switch to a sysvinit based OS, or fork EL6. What, sounds like work? Too costly? That must be because it isn't free to keep maintaining old code.

Yes, I'm forced to deal with #1. That doesn't keep me from wishing that whatever code change had been done had kept backwards compatibility in the user interface commands and init scripts department.

>>> I've never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.
>>
>> Yes, there are changes - and sometimes mysterious breakage. But an
>> outright abandonment of an existing interface that breaks previously
>> working code is pretty rare
>
> Yes, well, that's one of the things you can do when you've got a near-monopoly on PC OSes, which allows you to employ 128,000 people. [1]

And you only get that with code that keeps users instead of driving them away.

>> And conversely, they felt it was worth _not_ doing for a very, very
>> long time. So can the rest of us wait until we have Google's
>> resources?
>
> You're never going to have Google's resources. Therefore, you will never have the *option* to roll your own custom OS.
>
> So, cope with change.

What Google does points out how unsuitable the distro really is. I just don't see why it has to stay that way.

--
Les Mikesell
lesmikesell at gmail.com
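[On the "backwards compatibility in the user interface commands" wish: EL7 does in fact ship a small shim of this kind, in that /sbin/service forwards most invocations to systemctl. The real shim is a shell script; the toy Python sketch below only illustrates the shape of such a translation layer, and is not the actual implementation:]

```python
# Toy sketch of a legacy-command compatibility shim: translate
# "service NAME ACTION" invocations into the equivalent systemctl
# argument vector. Illustrative only; the real EL7 shim is a shell
# script and handles more cases (chkconfig fallback, env scrubbing).

LEGACY_ACTIONS = {"start", "stop", "restart", "reload", "status"}

def translate(argv):
    """Map ['NAME', 'ACTION'] to a systemctl argv.

    Raises ValueError for invocations the shim cannot forward.
    """
    if len(argv) != 2 or argv[1] not in LEGACY_ACTIONS:
        raise ValueError(f"unsupported invocation: service {' '.join(argv)}")
    name, action = argv
    unit = name if name.endswith(".service") else name + ".service"
    return ["systemctl", action, unit]
```

[Such shims keep old muscle memory and old scripts working, at exactly the ongoing maintenance cost Warren describes: someone has to keep the translation layer correct as the new tool evolves.]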