Hello Valeri,

let's think about what an HPC cluster is for. Second, one should always
ask where security is to be applied. Then one can come to the following
decisions:

- The firewall is placed in front of the cluster.
- After you have found a safe base for this, you freeze it.
- We have an rsync mirror of CentOS and EPEL on the head node. From here,
  we can always reinstall a node (tftp / http).
- To relieve the load on the internal network, I create rpm packages that
  are installed on the nodes.

All this happened about 3 years ago. CentOS 1511 was established as a
stable variant for the environment. It was one of my many different tasks
in my work.

The physicists and mathematicians who compute there need long run times.

I decided on CentOS because:

- free
- Maintained until the year 2024,
  longer than the cluster will live.

My way in the beginning was hard, because I had to learn everything from
scratch and I'm no longer the youngest, but my instinct proved right.

Sincerely

Andy
> let's think about what an HPC cluster is for.
> Second, one should always ask where security is to be applied.

+1

You have to assess your environment and weigh up the benefits of uptime
vs security. Sometimes the security issue that is fixed in a new kernel is
inconsequential in your environment; sometimes the external security on
your network is such that the attack surface is tiny. You make a judgement
based on your needs.

> The physicists and mathematicians who compute there need long run times.

Yes. I too run HPC clusters and I have had uptimes of over 1000 days -
clusters that are turned on when they are delivered and turned off when
they are obsolete. It is crucial for long running calculations that you
have a stable OS - you have never seen wrath like that of a computational
scientist whose 200 day calculation has just failed because you needed
to reboot the node it was running on.

> I decided on CentOS because:
>
> - free
> - Maintained until the year 2024,
>   longer than the cluster will live.

And stability ...

P.
On Sun, Jul 16, 2017 at 06:02:15PM +0100, Pete Biggs wrote:
> > The physicists and mathematicians who compute there need long run times.
>
> Yes. I too run HPC clusters and I have had uptimes of over 1000 days -
> clusters that are turned on when they are delivered and turned off when
> they are obsolete. It is crucial for long running calculations that you
> have a stable OS - you have never seen wrath like that of a computational
> scientist whose 200 day calculation has just failed because you needed
> to reboot the node it was running on.

I too was an HPC admin, and I knew people who believed the above, and
their clusters were compromised.

You're running a service where the weakest link is the researchers who
use your cluster -- they're able to run code on your nodes, so local
exploits are possible. They often have poor security practices (they
share passwords and reuse them across multiple accounts).

Also, if your researchers can't write code that performs checkpoints,
they're going to be awfully unhappy when a bug in their code makes it
segfault 199 days into a 200 day run.

Scheduled downtime and rolling cluster upgrades are a necessity of HPC
cluster administration. I do wish that the ksplice/kpatch stuff was
available in CentOS.

--
Jonathan Billings <billings at negate.org>
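
To make the checkpointing point concrete, here is a minimal sketch in
Python of a long-running loop that saves its state periodically and
resumes from the last save after a reboot or crash. Nothing in it comes
from the thread: the file layout, the JSON state, the function names and
the `stop_after` parameter (which only simulates an interruption) are all
assumptions made for illustration. A real simulation would save its own
arrays and counters instead of a running sum.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Sum 0..n_steps-1, resuming from the checkpoint at `path` if present.

    `stop_after` is a hypothetical knob that simulates a node reboot after
    that many steps within this call; the function then returns None.
    """
    state = load_checkpoint(path)
    done_this_call = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_call >= stop_after:
            return None  # interrupted; unsaved progress is lost
        state["total"] += state["step"]
        state["step"] += 1
        done_this_call += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

With something like this in place, an interrupted job loses at most
`checkpoint_every` steps of work rather than the whole run. Writing to a
temporary file and renaming it means a crash during the save cannot leave
a half-written checkpoint behind.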
On 07/16/2017 12:30 PM, Andreas Benzler wrote:
> - The firewall is placed in front of the cluster.
> - After you have found a safe base for this, you freeze it.

Sorry, but this statement really irks me. Why do you think a firewall is
the ONLY part that needs to provide security? That's the way I read this
statement - that it doesn't matter anywhere else. In addition, the
majority of attacks and compromises come from INSIDE the firewall - e.g.
the "wannacry" and similar attacks are all distributed via email,
executed on a local workstation, and they propagate from there - your
external firewall is not even hit before your servers/cluster are
scanned.

Another aspect here is all the other stuff outside the kernel. Even if
you do "yum update" frequently, if you don't restart, there are several
daemons and features of your system that don't get patched - the code
is in memory and changing the disk has no effect at all.

Bottom line is, I would not be proud of triple digit single server
uptimes. It simply tells me I can find lots of ways in - not that
you're running a rock solid setup.

--
Regards,
Peter Larsen
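
The point about daemons still running old code after "yum update" can be
checked on a running system. The following is a rough sketch in Python,
assuming a Linux /proc filesystem: when a package update replaces a
library or binary on disk, processes that still have the old file mapped
show it in /proc/<pid>/maps with a " (deleted)" suffix. The function
names are made up for this note, and the check is only a heuristic (other
deleted executable mappings, such as temporary files used by JIT
compilers, will match too). On CentOS the packaged tool for this job is
`needs-restarting` from yum-utils.

```python
import os
import re

# A /proc/<pid>/maps line has the fields:
#   address perms offset dev inode pathname
# We keep lines that are mapped executable and whose file was deleted.
_DELETED = re.compile(
    r"^\S+\s+\S*x\S*\s+\S+\s+\S+\s+\d+\s+(/.*) \(deleted\)$"
)


def deleted_mappings(maps_text):
    """Return the sorted deleted files mapped executable in maps_text."""
    found = set()
    for line in maps_text.splitlines():
        match = _DELETED.match(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def scan_proc(proc_root="/proc"):
    """Map pid -> list of deleted executable mappings, for readable pids."""
    results = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "maps")) as handle:
                stale = deleted_mappings(handle.read())
        except OSError:
            continue  # process exited, or we lack permission
        if stale:
            results[int(entry)] = stale
    return results
```

Run as root, `scan_proc()` returns a dictionary of process IDs and the
stale files each one still has mapped. Any daemon that shows up there is
running code that no longer matches what is on disk and needs a restart
(or the node needs a reboot) before the update takes effect for it.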
On Thu, July 20, 2017 8:07 am, Peter Larsen wrote:
> On 07/16/2017 12:30 PM, Andreas Benzler wrote:
>> - The firewall is placed in front of the cluster.
>> - After you have found a safe base for this, you freeze it.
>
> Sorry, but this statement really irks me. Why do you think a firewall is
> the ONLY part that needs to provide security? That's the way I read this
> statement - that it doesn't matter anywhere else. In addition, the
> majority of attacks and compromises come from INSIDE the firewall - e.g.
> the "wannacry" and similar attacks are all distributed via email,
> executed on a local workstation, and they propagate from there - your
> external firewall is not even hit before your servers/cluster are
> scanned.

I will second that. I personally run servers under the assumption that
the bad guys are already inside. That doesn't negate other measures such
as a firewall, brute force attack protection etc. But I've seen bad guys
attempting to elevate privileges (unsuccessfully) twice during the last
decade and a half or more. Both times I thanked myself for taking
appropriate security measures.

I am really unimpressed by how Microsoft's misconception of the "safe
internal network" has spread so widely through the Linux community, which
is (or should be) much more intelligent. There is nothing safe on the
network for me if:

1. there is at least one computer on this network which is not installed
   and maintained by me (assuming all machines I maintain are secured
   appropriately; include here sysadmins who do the same)

2. there is at least one user other than me (and my fellow sysadmins, who
   are as security aware as I hopefully am)

In other words: if you are a sysadmin, paranoia is one of the words in
your job description. I really find it difficult to get people to take
this to heart (except sysadmins who _had_ an incident, and had to sweep
up after it, and had to tell their users that the machine/cluster they
administer was hacked and why).

I hope this helps someone.

Valeri

> Another aspect here is all the other stuff outside the kernel. Even if
> you do "yum update" frequently, if you don't restart, there are several
> daemons and features of your system that don't get patched - the code
> is in memory and changing the disk has no effect at all.
>
> Bottom line is, I would not be proud of triple digit single server
> uptimes. It simply tells me I can find lots of ways in - not that
> you're running a rock solid setup.
>
> --
> Regards, Peter Larsen
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++