We have Xeon IA32 dual-processor servers running CentOS 3.5 in an HPC batch-only compute grid configuration. We have yum update running automatically, with default updates being applied weekly. Because the workload consists of long-running jobs, the servers tend to stay up without a reboot for very long periods.

Recently, yum installed an updated kernel, 2.4.21-32.0.1.ELsmp. When we got around to rebooting, we found that some of the machines were running the uniprocessor kernel 2.4.21-32.0.1.EL, showing only a single CPU. The grub.conf file had been modified in the usual push-down manner, but the default kernel had been set to #2 instead of #0. Bizarrely, some of the systems DID boot the upgraded SMP kernel as expected.

Here is the grub.conf from an affected server:

    default=2
    timeout=10
    splashimage=(hd0,0)/grub/splash.xpm.gz
    title CentOS (2.4.21-32.0.1.ELsmp)
            root (hd0,0)
            kernel /vmlinuz-2.4.21-32.0.1.ELsmp ro root=LABEL=/
            initrd /initrd-2.4.21-32.0.1.ELsmp.img
    title CentOS (2.4.21-32.0.1.EL)
            root (hd0,0)
            kernel /vmlinuz-2.4.21-32.0.1.EL ro root=LABEL=/
            initrd /initrd-2.4.21-32.0.1.EL.img
    title CentOS-3 (2.4.21-32.ELsmp)
            root (hd0,0)
            kernel /vmlinuz-2.4.21-32.ELsmp ro root=LABEL=/
            initrd /initrd-2.4.21-32.ELsmp.img
    title CentOS-3-up (2.4.21-32.EL)
            root (hd0,0)
            kernel /vmlinuz-2.4.21-32.EL ro root=LABEL=/
            initrd /initrd-2.4.21-32.EL.img

Changing the default back to 0 has no effect; it still boots the 2.4.21-32.0.1.EL kernel and not the required SMP one. However, if we use the interactive GRUB boot menu and select the correct kernel by hand, it boots SMP fine, with both processors and all memory available. I tried the obvious ploy of removing the last three kernel entries from grub.conf and setting default=0, but it still manages to boot the 2.4.21-32.0.1.EL UP kernel, even though that kernel is no longer in the menu list.
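A quick way to check which stanza a given default= value selects: GRUB legacy counts title entries from 0, so default=2 in the file above picks the third entry, the older 2.4.21-32.ELsmp kernel, not the new SMP one. The sketch below is a minimal Python parser for that check, assuming only the simple "default=N" (or "default N") and "title ..." forms shown above; it ignores "default saved" and every other directive, and the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helpers: only "default=N"/"default N" and "title ..." lines are understood.
DEFAULT_RE = re.compile(r"default[\s=]+(\d+)$")
TITLE_RE = re.compile(r"title[\s=]+(.*)$")


def parse_grub_conf(text: str) -> Tuple[int, List[str]]:
    """Return (default index, menu titles in file order) from GRUB legacy config text."""
    default = 0  # GRUB legacy boots entry 0 when no default line is present
    titles: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = DEFAULT_RE.match(line)
        if m:
            default = int(m.group(1))
            continue
        m = TITLE_RE.match(line)
        if m:
            titles.append(m.group(1).strip())
    return default, titles


def default_title(text: str) -> Optional[str]:
    """Title of the stanza GRUB boots by default (entries are counted from 0)."""
    default, titles = parse_grub_conf(text)
    return titles[default] if default < len(titles) else None


# Abbreviated copy of the affected grub.conf (kernel/initrd lines omitted).
SAMPLE = """default=2
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title CentOS (2.4.21-32.0.1.ELsmp)
        root (hd0,0)
title CentOS (2.4.21-32.0.1.EL)
        root (hd0,0)
title CentOS-3 (2.4.21-32.ELsmp)
        root (hd0,0)
title CentOS-3-up (2.4.21-32.EL)
        root (hd0,0)
"""

print(default_title(SAMPLE))  # CentOS-3 (2.4.21-32.ELsmp)
```

Note that this only tells you what GRUB would do with the file; it cannot tell you whether GRUB is actually the loader reading it.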
We think we will disable automatic yum kernel updates in future, but meanwhile, does anyone have suggestions or experiences to share on this, apart from a complete reinstall of each affected node?

Les Oswald

-------------- next part --------------
A non-text attachment was scrubbed...
Name: L.Oswald.vcf
Type: text/x-vcard
Size: 354 bytes
Desc: not available
URL: <http://lists.centos.org/pipermail/centos/attachments/20050920/f66d8a94/attachment.vcf>
Dr R L Oswald wrote:
> Recently, yum installed an updated kernel 2.4.21-32.0.1.ELsmp; when we
> got around to rebooting, we found that some of the machines were running
> the uniprocessor kernel 2.4.21-32.0.1.EL, showing only a single cpu.
> The grub.conf file had been modified in the usual pushdown manner but
> the default kernel had been set at #2 instead of #0.

I had a very similar thing happen to me on a RHEL3 box when I applied Update 5. I didn't notice it for a few days, so I just assumed I had made a mistake, but your post inclines me to believe it's a bug in the upgrade scripts.

-jim
Good day Les,

First off, I tried replying off-list, but your mailbox is not responding to direct mail. My apologies to the list for an off-topic reply...

I'm sorry I can't help with your question, but since you appear to be running a setup similar to mine, with dual Xeons, might I ask what software you are running on the machines? I'm looking for someone with a handle on the WRF or MM5 numerical models who has gotten either or both of them to compile with the Intel compiler or the Portland Group compiler.

I have not switched to yum for automatic software updates, but am staying with up2date as it came on installation. I'm very new to CentOS and have not done much in the way of updating except what comes down the tubes from RH. So far, I've not seen any kernel updates...

Regards,
Sam
--
Snowman
hey,

Dr R L Oswald wrote:
> Recently, yum installed an updated kernel 2.4.21-32.0.1.ELsmp; when we
> got around to rebooting, we found that some of the machines were running
> the uniprocessor kernel 2.4.21-32.0.1.EL, showing only a single cpu.

An easy workaround is to just yum erase the UP kernel package; that way the system can only come back up with an SMP kernel.

> The grub.conf file had been modified in the usual pushdown manner but
> the default kernel had been set at #2 instead of #0.

Do you have a sample from 'before' the update? Also, what version of mkinitrd do you have installed on these machines? Was it updated at the same time as the kernel? And what does /etc/redhat-release say?

> Here is the grub.conf from an affected server:
> default=2
> timeout=10
> splashimage=(hd0,0)/grub/splash.xpm.gz
> Changing the default back to 0 has no effect, it still boots the
> 2.4.21-32.0.1.EL kernel and not the required SMP one. However, if we use

Disable the splashimage and reboot the machine with default=0: which kernel version is highlighted as the default?

> I tried the obvious ploy of removing the last three kernel entries in
> grub.conf & setting default=0 but it still manages to boot the
> 2.4.21-32.0.1.EL UP kernel even though it is no longer in the kernel
> menu list.

Are you sure the grub.conf you are editing is indeed the one being used? (It should be the /boot/grub/grub.conf file.) What does 'parted <bootdev> print' say?

> We think we will disable automatic yum kernel updates in future, but
> meanwhile, has anyone any suggestions or experiences to share on this
> apart from a complete re-install of each affected node?

I would suggest you provide some more info, and also try reinstalling grub. At the very least, grub should then pick up changes made to the /boot/grub/grub.conf file.

fwiw, I've tried to reproduce this issue here on a CentOS3/i386 SMP machine [1] and am unable to do so. The kernel update installs and sets up grub.conf fine.
- K

[1] CentOS 3.4 install and yum update from there.
Mystery solved!

Karanbir Singh gave a clue by suggesting disabling the splash screen, at which point the magic word "LILO" flashed up briefly on the screen.

It appears that the machines which were affected all had lilo installed but not configured as the bootloader. These machines are all configured identically using a kickstart script, and this script does not enable lilo as the bootloader. How they actually came to have lilo installed is a bit of a mystery, as all the rest plainly do not have it. However, when the kernel update was installed by yum, it looked for a bootloader and picked lilo instead of grub on machines with lilo installed. From the log files of affected systems:

    Kernel Updated/Installed, checking for bootloader
    Lilo found - adding kernel to lilo and making it the default

Thanks for the help,
Les
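Rather than relying on a glimpse of the boot screen, one way to confirm which loader owns the boot sector is to inspect the first 512 bytes of the boot disk. The Python sketch below is a heuristic, assuming that a GRUB legacy stage1 contains the string "GRUB" and a LILO first stage contains "LILO"; that holds for the versions I know of but is not guaranteed. The device name /dev/hda is also an assumption, and LILO may have been written to a partition boot sector rather than the MBR, in which case read that partition device instead.

```python
# Heuristic boot-sector check (assumption: GRUB legacy stage1 embeds "GRUB",
# LILO's first stage embeds "LILO"; neither is guaranteed for every version).
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of any bootable sector


def identify_boot_sector(sector: bytes) -> str:
    """Guess which boot loader's first stage occupies a 512-byte boot sector."""
    if len(sector) < 512 or sector[510:512] != BOOT_SIGNATURE:
        return "no valid boot signature"
    code = sector[:510]  # everything before the signature
    if b"LILO" in code:
        return "LILO"
    if b"GRUB" in code:
        return "GRUB"
    return "unknown"


def read_boot_sector(device: str) -> bytes:
    """Read the first 512 bytes of a block device (needs root)."""
    with open(device, "rb") as dev:
        return dev.read(512)


# Usage on an affected node (device name is an assumption, e.g. /dev/hda or /dev/sda):
#   print(identify_boot_sector(read_boot_sector("/dev/hda")))
```

If this reports LILO on a node that was kickstarted for GRUB, running grub-install on that disk is one way to put GRUB back in charge.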
Dr R L Oswald wrote:
> Mystery Solved!
> Karanbir Singh gave a clue by suggesting the disabling of the splash
> screen at which point the magic word "LILO" flashed up briefly on the
> screen.
>
> It appears that the machines which were affected all had lilo installed
> but not configured as bootloader. These machines are all configured

If the word LILO popped up, then it is set up to be the bootloader. That would also explain why changes to grub.conf were having no effect.

> identically using a kickstart script. This script does not have lilo
> enabled in the bootloader. How they actually came to have lilo installed
> is a bit of a mystery as all the rest plainly do not have it. However

Try "rpm -qa --last"; that should give you an idea of when the packages were installed.

> when the kernel update was installed by yum, it looked for a bootloader
> & picked on lilo instead of grub in machines with lilo installed.
>
> In the log files of affected systems:
>
> Kernel Updated/Installed, checking for bootloader
> Lilo found - adding kernel to lilo and making it the default

If you are interested in the process, look at /sbin/new-kernel-pkg; that's the script called to install or remove the configs for a kernel package.

- K
--
Karanbir Singh : http://www.karan.org/ : 2522219 at icq
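To follow up the "rpm -qa --last" suggestion across many nodes, the output can be filtered down to the packages of interest. This is a minimal Python sketch, assuming each output line is the package name-version-release followed by whitespace and a date, and that versions begin with a digit; the function name install_dates is hypothetical.

```python
import re
from typing import Dict, Iterable


def install_dates(rpm_last_output: str, names: Iterable[str]) -> Dict[str, str]:
    """Map package names to the install date shown by `rpm -qa --last`.

    Assumes each line is "<name>-<version>-<release>" followed by whitespace and a
    date, and that versions start with a digit (so "kernel" does not match
    "kernel-smp-..."). Output is newest first, so the first hit per name is kept.
    """
    patterns = {name: re.compile(re.escape(name) + r"-\d") for name in names}
    found: Dict[str, str] = {}
    for line in rpm_last_output.splitlines():
        parts = line.split(None, 1)  # package string, then the date text
        if len(parts) != 2:
            continue
        package, date = parts
        for name, pattern in patterns.items():
            if name not in found and pattern.match(package):
                found[name] = date.strip()
    return found


# Usage (hypothetical; run on each node):
#   import subprocess
#   out = subprocess.run(["rpm", "-qa", "--last"], stdout=subprocess.PIPE,
#                        universal_newlines=True).stdout
#   print(install_dates(out, ["lilo", "grub", "kernel-smp", "kernel"]))
```

Comparing the lilo date with the dates of other packages on the same node may hint at which install or update pulled it in.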