FYI, a follow-up on my stability issues with RHEL4-U2 on the Dell 2850 server...

It turns out that the system apparently wasn't locking up or crashing, just the console video was getting scrambled, making access at the console impossible. I was able to ssh in from the network and everything appeared to be running. I was also able to duplicate the problem on a second identical server by swapping the drives.

I loaded the latest ATI video driver from Dell's site (dated October xx, 2005), and that appears to have solved the problem.
So, it appears that the ATI video driver that comes with RHEL4-U2 has some compatibility problems with the on-board ATI video on the Dell 2850 server.

I'll be able to start the real testing now...

--Peter

Sunil Mushran wrote:
> Sanjeet,
>
> Have you encountered/heard anything like this? This could
> be a hardware/setup issue with X on the 2850. But that's
> a guess.
>
> Sunil
>
> ------------------------------------------------------------------------
>
> Subject: Re: [Ocfs2-users] OCFS2 Installation woes
> From: Peter Sylvester <peters@mitre.org>
> Date: Mon, 17 Oct 2005 18:40:30 -0400
> To: Sunil Mushran <Sunil.Mushran@oracle.com>
> CC: ocfs2-users@oss.oracle.com
>
>
> I seem to have some system stability problems...
>
> After installing RHEL4 Update-2 (on Dell 2850 server, dual CPU, RAID,
> 4GB) and OCFS2 1.0.7 and getting a volume formatted and mounted I left
> for the weekend. The server was locked up Monday AM when I came in -
> no response to ping, "no video sync" reported by the KVM switch.
>
> I attempted to reboot and sometimes would boot, but most often would
> go through all the motions, up to the HAL loading, then instead of the
> X-Windows login screen would get some random colors at the top of the
> screen.
> Nothing seems to get reported to the /var/log/messages file.
>
> I tried disabling the OCFS2 service and mount, but still have
> stability problems at boot with X.
> I disabled X (set inittab to use run level 3) and I can boot with or
> without OCFS2.
> But, without OCFS2 I can run the "startx" and have the X-Windows
> environment come up.
> If OCFS2 is running (loaded + mounted) and I run startx I get the
> crash with the colors at the top of the screen.
>
> Suggestions?
>
> thanks,
> --Peter
>
>
> Sunil Mushran wrote:
>> /etc/ocfs2/cluster.conf holds list of nodes in the cluster.
>> /etc/sysconfig/o2cb holds the "default" cluster name which
>> is picked by the o2cb service.
>>
>> The latter is updated during:
>>     /etc/init.d/o2cb configure
>>
>> Appears the cluster name in the two do not match.
>>
>> What difference did you notice?
>>
>> Peter Sylvester wrote:
>>
>>> Here is some feedback on my OCFS2 installation...
>>>
>>> You were correct about selinux. I was able to disable it by
>>> specifying "SELINUX=disabled" in /etc/selinux/config and rebooting.
>>> I was then able to run the console tool correctly.
>>>
>>> I did stumble on another issue, though. It appears that the console
>>> tool hard codes the cluster name as "ocfs2", but the cluster service
>>> seems to be looking for cluster "racdb", and errors out when
>>> attempting to bring online. I hand edited the cluster.conf file and
>>> changed the cluster name to "racdb" and everything then appeared to
>>> work.
>>>
>>> Also, I noticed that the cluster.conf configuration is a bit
>>> different from that shown in the User's Guide documentation.
>>>
>>> I was able to mkfs.ocfs2 and mount the partition and it appears to
>>> work. I'll plan on doing some testing next week to stress it out a
>>> bit.
>>>
>>> --Peter
>>>
>>> Sunil Mushran wrote:
>>>
>>>> You appear to have selinux enabled which is missing policies for
>>>> configfs/ocfs2.
>>>> We are still investigating the issue, but one quick solution would
>>>> be to disable selinux. :)
>>>>
>>>> Peter Sylvester wrote:
>>>>
>>>>> I've got a fresh RHEL AS 4-U2 installation on a Dell PE2850 server.
>>>>>
>>>>> I downloaded and installed the latest RPMs:
>>>>>     ocfs2-2.6.9-22.ELsmp-1.0.7-1.i686.rpm
>>>>>     ocfs2-tools-1.0.2-1.i386.rpm
>>>>>     ocfs2console-1.0.2-1.i386.rpm
>>>>>
>>>>> I was able to start the console, but when I try to run
>>>>> cluster->configure_nodes, I get the following error message:
>>>>>     Could not start cluster stack. This must be resolved before any
>>>>>     OCFS2 filesystem can be mounted.
>>>>>
>>>>> I tried to start the service via the command line:
>>>>>     /etc/init.d/o2cb load
>>>>>
>>>>> And got the error:
>>>>>     Mounting configfs filesystem at /config: mount: block device
>>>>>     configfs is write-protected, mounting read-only
>>>>>     mount: cannot mount block device configfs read-only
>>>>>     Unable to mount configfs filesystem
>>>>>     Failed
>>>>>
>>>>> I tried mounting /config myself via the following, which worked
>>>>> (not sure if that's how it *should* be done though):
>>>>>     mount -t configfs none /config
>>>>>
>>>>> Now o2cb load puts out the following message:
>>>>>     Loading module "ocfs2_nodemanager": Unable to load module
>>>>>     "ocfs2_nodemanager"
>>>>>     Failed
>>>>>
>>>>> And an o2cb status shows:
>>>>>     Module "configfs": Loaded
>>>>>     Filesystem "configfs": Mounted
>>>>>     Module "ocfs2_nodemanager": Not loaded
>>>>>     Module "ocfs2_dlm": Not loaded
>>>>>     Module "ocfs2_dlmfs": Not loaded
>>>>>     Filesystem "ocfs2_dlmfs": Not mounted
>>>>>
>>>>> I tried creating a /etc/ocfs2/cluster.conf myself (using vi), but
>>>>> it did not seem to have any effect.
>>>>> I tried a fresh reboot but that did not seem to change anything.
>>>>> Also note that all commands were run as root, and I am just trying
>>>>> to get a single node with local disk working.
>>>>>
>>>>> There were some error messages reported in /var/log/messages, of
>>>>> the form:
>>>>>     Oct 13 17:02:56 dblinux1 kernel: SELinux: initialized (dev
>>>>>     configfs, type configfs), not configured for labeling
>>>>>     Oct 13 17:02:56 dblinux1 kernel: audit(1129237376.191:5): avc:
>>>>>     denied { mount } for pid=14922 comm="mount" name="/"
>>>>>     dev=configfs ino=70286 scontext=root:system_r:initrc_t
>>>>>     tcontext=system_u:object_r:unlabeled_t tclass=filesystem
>>>>>
>>>>> Also have some errors of form:
>>>>>     Oct 13 18:03:49 dblinux1 dbus: Can't send to audit system:
>>>>>     USER_AVC pid=2587 uid=81 loginuid=-1 message=avc: denied {
>>>>>     send_msg } for scontext=user_u:system_r:unconfined_t
>>>>>     tcontext=user_u:system_r:initrc_t tclass=dbus
>>>>>
>>>>> And this one:
>>>>>     Oct 13 17:46:36 dblinux1 kernel: OCFS2 Node Manager 1.0.7 Wed Oct
>>>>>     12 13:18:42 PDT 2005 (build 6cb35edfedddf6b4d606b95f2579cb39)
>>>>>     Oct 13 17:46:36 dblinux1 kernel: audit(1129239996.953:8): avc:
>>>>>     denied { mount } for pid=3903 comm="modprobe" name="/"
>>>>>     dev=configfs ino=10949 scontext=root:system_r:initrc_t
>>>>>     tcontext=system_u:object_r:unlabeled_t tclass=filesystem
>>>>>     Oct 13 17:46:36 dblinux1 kernel: nodemanager: Registration
>>>>>     returned -13
>>>>>     Oct 13 17:46:36 dblinux1 modprobe: FATAL: Error inserting
>>>>>     ocfs2_nodemanager
>>>>>     (/lib/modules/2.6.9-22.ELsmp/kernel/fs/ocfs2/ocfs2_nodemanager.ko):
>>>>>     Permission denied
>>>>>
>>>>> Any clues as to what I should try from here?
>>>>>
>>>>> thanks,
>>>>> --Peter
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users@oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
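
For anyone reading this thread with the same SELinux denials in /var/log/messages, here is a minimal sketch (not from the original posters) that pulls the denied permission, the command, and the target class out of an "avc: denied" line. The helper name parse_avc and both regular expressions are assumptions made for illustration; the sample line is copied from the log excerpt quoted in this thread.

```python
import re

# Hypothetical helper, not part of any OCFS2 or SELinux tool: it only
# picks apart the text of an "avc: denied" line as seen in this thread.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+(?P<rest>.*)"
)
# key=value pairs; values are either double-quoted or a run of non-spaces.
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Return a dict of fields from an AVC denial line, or None if no match."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    fields = {"perms": match.group("perms").split()}
    for key, value in KV_RE.findall(match.group("rest")):
        fields[key] = value.strip('"')
    return fields


# Sample line taken from the log excerpt in this thread.
sample = (
    'Oct 13 17:02:56 dblinux1 kernel: audit(1129237376.191:5): avc: '
    'denied { mount } for pid=14922 comm="mount" name="/" '
    'dev=configfs ino=70286 scontext=root:system_r:initrc_t '
    'tcontext=system_u:object_r:unlabeled_t tclass=filesystem'
)

info = parse_avc(sample)
print(info["perms"], info["comm"], info["dev"], info["tclass"])
```

Run over the whole log, something like this would make it quicker to see that the denials here all target configfs, which fits Sunil's diagnosis of missing policies for configfs/ocfs2.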
Thanks for the update. :)
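
As a footnote to the cluster-name mismatch discussed earlier in the thread (console tool writing "ocfs2" while the o2cb service looked for "racdb"), here is a rough sketch of how the two files could be compared. It rests on assumptions the thread does not confirm: that cluster.conf has a "cluster:" stanza with an indented "name = ..." line, and that /etc/sysconfig/o2cb stores the default cluster as O2CB_BOOTCLUSTER=<name>. The function names and the sample IP address are hypothetical; the check works on text, so nothing is read from a real system.

```python
import re

# Assumed layouts (not confirmed by the thread):
#   /etc/ocfs2/cluster.conf : stanzas "node:" / "cluster:" with "key = value" lines
#   /etc/sysconfig/o2cb     : shell-style O2CB_BOOTCLUSTER=<name>


def cluster_conf_name(text):
    """Return the name in the 'cluster:' stanza of cluster.conf text, or None."""
    in_cluster = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:".
            in_cluster = (line == "cluster:")
        elif in_cluster and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "name":
                return value.strip()
    return None


def o2cb_boot_cluster(text):
    """Return the O2CB_BOOTCLUSTER value from sysconfig text, or None."""
    match = re.search(
        r'^\s*O2CB_BOOTCLUSTER\s*=\s*"?([^"\s#]+)"?', text, re.MULTILINE
    )
    return match.group(1) if match else None


def names_match(conf_text, sysconfig_text):
    """True only if both names were found and are identical."""
    conf_name = cluster_conf_name(conf_text)
    boot_name = o2cb_boot_cluster(sysconfig_text)
    return conf_name is not None and conf_name == boot_name


# Illustrative contents mirroring the mismatch Peter described.
SAMPLE_CLUSTER_CONF = """\
node:
\tip_port = 7777
\tip_address = 192.168.1.10
\tnumber = 0
\tname = dblinux1
\tcluster = ocfs2

cluster:
\tnode_count = 1
\tname = ocfs2
"""

SAMPLE_SYSCONFIG = """\
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=racdb
"""

print(cluster_conf_name(SAMPLE_CLUSTER_CONF))
print(o2cb_boot_cluster(SAMPLE_SYSCONFIG))
print(names_match(SAMPLE_CLUSTER_CONF, SAMPLE_SYSCONFIG))
```

On a real node the two strings would come from reading the files themselves. If the names differ, either rerunning /etc/init.d/o2cb configure or editing cluster.conf by hand (as Peter did) should bring them back in line.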