Hi All,

After upgrading to 7.2, I'm getting an immediate kernel panic on boot.

Dropping back to 3.10.0-229.20.1.el7.x86_64 and the system boots fine.

How can I go about diagnosing the problem here?

Thanks,

Duncan
On 12/03/2015 10:29 AM, Duncan Brown wrote:

initramfs is missing...
Check whether /boot/initramfs-{kernelversion}.img is actually there; if not, do a "yum reinstall kernel-{version}" and it should be OK!

> Hi All
>
> After upgrading to 7.2, I'm getting an immediate kernel panic on boot
>
> Dropping back to 3.10.0-229.20.1.el7.x86_64 and the system boots fine
>
> How can I go about diagnosing the problem here?
>
> thanks
>
> Duncan
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos

--
Bernard Lheureux
Mailing list manager: ML, TechML, LinuxML
http://www.bbsoft4.org/Mailinglists.htm
MailTo:root at bbsoft4.org
http://www.bbsoft4.org/
http://www.portalinux.org/
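The initramfs check suggested above can be scripted for every installed kernel at once. This is a minimal sketch, assuming the usual CentOS naming of vmlinuz-<version> and initramfs-<version>.img in the same directory; the function name missing_initramfs is made up for illustration.

```shell
# Print the version of every kernel in a boot directory that has no
# non-empty initramfs image next to it (hypothetical helper).
#   $1 = boot directory, defaults to /boot
missing_initramfs() {
    boot_dir="${1:-/boot}"
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue        # the glob matched nothing
        version="${kernel##*/vmlinuz-}"     # strip directory and prefix
        [ -s "$boot_dir/initramfs-$version.img" ] || echo "$version"
    done
}
```

Calling missing_initramfs with no argument checks /boot. Any version it prints is a candidate for the "yum reinstall kernel-{version}" step; no output means every kernel has an image, which says nothing about whether the image contents are good.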
On 03/12/2015 09:40, Bernard Lheureux wrote:
> On 12/03/2015 10:29 AM, Duncan Brown wrote:
> initramfs is missing...
> check if /boot/initramfs-{kernelversion}.img is correctly there, if
> not do a "yum reinstall kernel-{version}" and it should be ok !
>
>> Hi All
>>
>> After upgrading to 7.2, I'm getting an immediate kernel panic on boot
>>
>> Dropping back to 3.10.0-229.20.1.el7.x86_64 and the system boots fine
>>
>> How can I go about diagnosing the problem here?
>>
>> thanks
>>
>> Duncan
>>
>

No joy unfortunately, the correct initramfs is there.

I tried reinstalling just in case, but no change.

Thanks for the reply,

Duncan
On 03/12/15 09:40, Bernard Lheureux wrote:
> On 12/03/2015 10:29 AM, Duncan Brown wrote:
> initramfs is missing...
> check if /boot/initramfs-{kernelversion}.img is correctly there, if not
> do a "yum reinstall kernel-{version}" and it should be ok !

You might also want to check that there is enough disk space for the initrd to be built and stored in the right place.

--
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc
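One way to check the disk-space point above is to compare the free space on the filesystem holding /boot with the size of an existing initramfs image. This is a minimal sketch; the function names free_kib and has_room are illustrative, and any threshold passed to them is an assumption rather than a figure from this thread.

```shell
# Free space, in KiB, on the filesystem that holds the given path.
# "df -P -k" selects the POSIX output format, where column 4 of the
# second line is the available space in 1024-byte units.
free_kib() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $2 KiB are free on the filesystem holding $1.
has_room() {
    [ "$(free_kib "$1")" -ge "$2" ]
}
```

For example, has_room /boot 51200 || echo "/boot is nearly full" warns below 50 MiB; that figure is only a guess, so compare it with the sizes shown by ls -l /boot/initramfs-*.img on the machine in question.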
On Thu, Dec 03, 2015 at 09:29:21AM +0000, Duncan Brown wrote:
>
> Hi All
>
> After upgrading to 7.2, I'm getting an immediate kernel panic on boot
>
> Dropping back to 3.10.0-229.20.1.el7.x86_64 and the system boots fine
>
> How can I go about diagnosing the problem here?

It'd probably help if you could give us more details on the kernel panic.

Can you see where it is panicking? Does it happen during the kernel/initrd stage or later during boot?

If it is panicking later in boot, I suggest installing the kdump service; you might be able to capture a kernel dump, which makes debugging these things a lot easier. Otherwise, I suggest trying to capture the panic message some other way.

--
Jonathan Billings <billings at negate.org>
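For the kdump suggestion above: on CentOS 7 the kdump service ships in the kexec-tools package and needs memory reserved through a crashkernel= kernel parameter. This is a minimal sketch of checking for that parameter; the helper name has_crashkernel is hypothetical, and kdump cannot capture a panic that happens before the service has started.

```shell
# Succeed if a kernel command line (passed as one string) reserves
# memory for a crash kernel. Hypothetical helper for illustration.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical setup on CentOS 7, run as root. The steps are left as comments
# so that pasting this block changes nothing on the system:
#   yum install kexec-tools
#   systemctl enable kdump
#   systemctl start kdump
#   has_crashkernel "$(cat /proc/cmdline)" || echo "no crashkernel= set"
```

If the parameter is missing, it has to be added to the kernel command line and the machine rebooted before kdump can start.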
On 03/12/2015 13:33, Jonathan Billings wrote:
> On Thu, Dec 03, 2015 at 09:29:21AM +0000, Duncan Brown wrote:
>> Hi All
>>
>> After upgrading to 7.2, I'm getting an immediate kernel panic on boot
>>
>> Dropping back to 3.10.0-229.20.1.el7.x86_64 and the system boots fine
>>
>> How can I go about diagnosing the problem here?
> It'd probably help if you could give us more details on the kernel
> panic.
>
> Can you see where it is panicking? Does it happen during the
> kernel/initrd stage or later during boot?
>
> I suggest installing the kdump service if it is panicking later in
> boot, you might be able to capture a kernel dump which makes debugging
> these things a lot easier. Otherwise, I suggest trying to capture the
> panic message some other way.
>

The last message before it is "switching to clocksource hpet".

Then the panic scrolls by.

I've no idea if that counts as later or not.
On 12/03/2015 04:24 PM, Phelps, Matthew wrote:
> ... ton of work that we have to do for each new release, and we have
> depended in the past on the versions matching the RHEL ones. Now, they
> don't, and that's wrong.

I would respectfully disagree here: in my opinion, relying on any distribution's minor version number in the first place is what is wrong. (And I think that holds regardless of which distribution we're talking about.)

I honestly wish Red Hat had stuck to the 'XupdateY' format that they started with, as that is more correct. The update rollup number is not a minor version number in the strict sense of the word, at least IMO.

Heh, I am waiting to see if the differences between RHEL 7.2 and RHEL 7.3 will be as large as the differences were between RHL 7.2 and RHL 7.3 back in the day...
On 12/03/2015 11:01 AM, m.roth at 5-cent.us wrote:
> Sorry, you seem to not have dealt with enough managers who only really
> know Windows, or other divisions (esp. ones that are 95% Windows) who
> require documentation, etc.... I can live with the x.y.yymm, but not
> showing the relation to upstream is annoying.

So tell the Win-centric managers that this is 'CentOS 7 Service Pack 2' or 'CentOS 7 Update Rollup for 11/2015' and they will understand what you mean (and it is an accurate comparison, and was what upstream did once upon a time by calling it 'Version X update Y'). Heh, the latest Windows 10 build is actually referred to as the '1511' version.

What's really annoying is the thought that we're going to have the same gripes on the mailing list every six to seven months or so.
On Thu, Dec 3, 2015 at 5:05 PM, Lamar Owen <lowen at pari.edu> wrote:
> On 12/03/2015 11:01 AM, m.roth at 5-cent.us wrote:
>
>> Sorry, you seem to not have dealt with enough managers who only really
>> know Windows, or other divisions (esp. ones that are 95% Windows) who
>> require documentation, etc.... I can live with the x.y.yymm, but not
>> showing the relation to upstream is annoying.
>>
> So tell the Win-centric managers that this is 'CentOS 7 Service Pack 2' or
> 'CentOS 7 Update Rollup for 11/2015' and they will understand what you mean
> (and it is an accurate comparison, and was what upstream did once upon a
> time by calling it 'Version X update Y.' Heh, the latest Windows 10 build
> is actually referred to as the '1511' version.
>
> What's really annoying is the thought that we're going to have the same
> gripes on the mailing list every six to seven months or so.
>
>

Maybe that's because of a bad decision that affects a lot of users in ways that were never imagined.

The reason I gripe about it every new RHEL release is because I want CentOS to change back.

The people who actually have to deal with the ramifications of this decision were not involved in it. There was never a call for feedback on this list. We are not developers, and don't have time to read the developer lists where this decision was made.

How can we possibly lobby to change it back? We can't use IRC (where a lot of the CentOS folks seem to think they can be "available") because we're in an *enterprise* environment that forbids it. We aren't developers. We're not on the board. And don't ask us to "get involved"; we don't have time!

I have hundreds of machines, our own private copy of the mirrors, and lots of postinstall scripts. The "version number" is important to maintaining this environment, especially in a mixed version and distro environment.

OK, I'm done griping.

Until the next RHEL release, that is :)

--
Matt Phelps
System Administrator, Computation Facility
Harvard - Smithsonian Center for Astrophysics
mphelps at cfa.harvard.edu, http://www.cfa.harvard.edu
On 12/3/2015 2:05 PM, Lamar Owen wrote:
> Heh, the latest Windows 10 build is actually referred to as the '1511'
> version.

yet it returns...

    C:\> ver
    Microsoft Windows [Version 10.0.10586]

go figger.

--
john r pierce, recycling bits in santa cruz
On Thu, December 3, 2015 4:05 pm, Lamar Owen wrote:
> On 12/03/2015 11:01 AM, m.roth at 5-cent.us wrote:
>> Sorry, you seem to not have dealt with enough managers who only really
>> know Windows, or other divisions (esp. ones that are 95% Windows) who
>> require documentation, etc.... I can live with the x.y.yymm, but not
>> showing the relation to upstream is annoying.
> So tell the Win-centric managers that this is 'CentOS 7 Service Pack 2'
> or 'CentOS 7 Update Rollup for 11/2015' and they will understand what
> you mean (and it is an accurate comparison, and was what upstream did
> once upon a time by calling it 'Version X update Y.' Heh, the latest
> Windows 10 build is actually referred to as the '1511' version.
>

This scared me to death. For a split second I thought "What, are my CentOS 7 workstations Windows 10 already?" What a nightmarish thought! Luckily just my wild imagination ;-)

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
On 12/04/2015 08:02 AM, mark wrote:
> No, *you* don't understand what we're saying: pre-systemd, if the o/p
> saw that one stmt before the panic, they could look at what the system
> was doing *sequentially*, and so have an idea what it was failing on.
> With systemd's parallelism, we have no clue, other than what it's
> done, and no idea what's happening that's failing.
>

It has never been true that a kernel panic was necessarily caused by the immediately preceding step in a sequential init. I ran into one instance (back in 4.x days, incidentally, where 'x' was 1 or 2) where a panic was caused by the tg3 driver, but it wasn't triggered until a variable number of packets had passed through the interface, and it didn't happen very often. When it did happen, it was almost always during ssh startup. But the root cause was the tg3 driver module, not sshd.

So having the ssh startup as the last line before the panic was actually a hindrance rather than a help in that case; I would have been looking for an sshd problem that didn't actually exist. I don't think that's an isolated instance, either. You need the module information from the panic more than information on what was started immediately prior to the panic.

This was fixed without me having to file a bug report, incidentally, so there is no BZ # that I recall to point you to, and a quick search of bugzilla doesn't show one for that particular issue. I ended up seeing that it was a tg3 problem after setting up a serial console and grabbing the panic output from that. By the time I got to that point, the next update rollup for CentOS 4 was coming down, and that was the end of that problem.

I keep thinking I'll track down the panic I saw a few months ago with CentOS 7 and gkrellm on my hardware, but by the time I get enough 'round toits' to do the troubleshooting the kernel has been updated, and I have to wait on the debuginfo... lather, rinse, repeat.

Eventually I'll get my timing right and see what is (or maybe is not) happening.
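The serial-console capture described above relies on the kernel's console= boot parameter. This is a minimal sketch of building those arguments. The helper name, the ttyS0 device and the 115200 speed are assumptions for illustration, and the grubby lines apply to CentOS 7 rather than to the CentOS 4 system in the story.

```shell
# Build kernel arguments that send console output to both the local
# screen and a serial port (hypothetical helper).
#   $1 = serial device, e.g. ttyS0    $2 = baud rate, e.g. 115200
serial_console_args() {
    printf 'console=tty0 console=%s,%s\n' "$1" "$2"
}

# On CentOS 7 the arguments could be applied with grubby, as root.
# Left as comments so that pasting this block changes nothing:
#   grubby --update-kernel=ALL --args="$(serial_console_args ttyS0 115200)"
# Removing "rhgb quiet" also keeps boot messages visible on screen:
#   grubby --update-kernel=ALL --remove-args="rhgb quiet"
```

With a second machine logging the other end of the serial line, the whole panic, including the module list, is kept instead of scrolling off the screen.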