thr3ads.net - Ocfs devel - [Ocfs-devel] Re: URGENT: OCFS2 hang

Sunil Mushran

2006-Aug-09 18:57 UTC

[Ocfs-devel] Re: URGENT: OCFS2 hang - 32 node cluster POC

Run:
# top
# vmstat 1
# iostat -x /dev/emcpowerb 1

You can save the output of the latter two to a file. For top, just monitor
CPU usage and see whether any process is hogging all of it.
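
A minimal sketch of saving the vmstat and iostat output to files, assuming a local (non-OCFS2) log directory so the logs survive a hang of the shared volume. The capture helper, the LOGDIR path and the "start" argument are hypothetical names chosen for illustration, not part of any OCFS2 tooling.

```shell
#!/bin/sh
# Hypothetical log directory; keep it on a local disk, not on /ocfs2.
LOGDIR="${LOGDIR:-/tmp/ocfs2-hang-logs}"
mkdir -p "$LOGDIR"

# Run a command in the background and append its output to
# $LOGDIR/<name>.log, preceded by a start timestamp.
capture() {
    name="$1"
    shift
    { date; "$@"; } >> "$LOGDIR/$name.log" 2>&1 &
}

# Start the two samplers only when invoked as: ./capture.sh start
if [ "${1:-}" = "start" ]; then
    capture vmstat vmstat 1
    capture iostat iostat -x /dev/emcpowerb 1
fi
```

Start this shortly before the large copy; after the reboot, the last lines of each log show what the node was doing just before it froze.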

Colin Laird wrote:
> and the fstab settings:
>
> # This file is edited by fstab-sync - see 'man fstab-sync' for details
> /dev/VolGroup00/LogVol01 /                       ext3    defaults        1 1
> LABEL=/boot             /boot                   ext3    defaults        1 2
> none                    /dev/pts                devpts  gid=5,mode=620  0 0
> none                    /dev/shm                tmpfs   defaults        0 0
> /dev/VolGroup00/LogVol02 /home                   ext3    defaults        1 2
> none                    /proc                   proc    defaults        0 0
> none                    /sys                    sysfs   defaults        0 0
> /dev/VolGroup00/LogVol00 swap                    swap    defaults        0 0
> /dev/emcpowerb          /ocfs2                  ocfs2   _netdev         0 0
> /dev/hda                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
> /dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0
>
> We are not storing the voting disk and cluster reg for RAC in here.
>
> Thanks
>
>
> Colin Laird wrote:
>> Hi,
>>
>> We are in the middle of a very large bid (Centrelink, Australia) with 
>> time at a premium, so PLEASE HELP.  We have been experiencing 
>> machine hangs whenever we do large copies (5-18G) into OCFS2, either 
>> from ftp or local disk.  The whole machine just freezes and we need 
>> to turn it off and on.  We now cannot get the data available for the POC 
>> across the nodes!
>>
>> The setup is:
>>
>> 32 clustered Dell 6850 nodes running RHEL4 U3 - Linux 
>> c2.au.oracle.com 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> We have the following ocfs2 packages installed:
>> ocfs2-2.6.9-34.ELsmp-1.2.3-1
>> ocfs2-2.6.9-34.EL-1.2.3-1
>> ocfs2-tools-debuginfo-1.2.1-1
>> ocfs2-2.6.9-34.ELlargesmp-1.2.3-1
>> ocfs2console-1.2.1-1
>> ocfs2-tools-1.2.1-1
>>
>> We have elevator=deadline set as per instructions too.
>>
>> We are currently looking for a log to see if we can find anything.  
>> The system and ftp logs show nothing.
>>
>> Can anyone provide any pointers?  Have we missed applying anything?
>>
>> Thanks,
>>
>> -- 
>> Colin Laird
>> Principal Solutions Consultant
>>
>> Oracle New Zealand Ltd
>> Level 10
>> Todd Building
>> 93-97 Customhouse Quay
>> Wellington
>> New Zealand
>>
>> main: +64 4 978 5400
>> ddi:  +64 4 978 5423
>> mob:  +64 21 617 025
>> fax:  +64 4 978 5401 
>
> -- 
> Colin Laird
> Principal Solutions Consultant
>
> Oracle New Zealand Ltd
> Level 10
> Todd Building
> 93-97 Customhouse Quay
> Wellington
> New Zealand
>
> main: +64 4 978 5400
> ddi:  +64 4 978 5423
> mob:  +64 21 617 025
> fax:  +64 4 978 5401

Wim Coekaerts

2006-Aug-09 19:24 UTC

[Ocfs-devel] Re: URGENT: OCFS2 hang - 32 node cluster POC

alt-sysrq-t should still work w/ netdump configured
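
For alt-sysrq-t to do anything during the hang, the magic SysRq key has to be enabled beforehand. A rough sketch follows, assuming root access and the standard /proc interface of 2.6 kernels; the function names and the PROC override are hypothetical and exist only to keep the sketch self-contained.

```shell
#!/bin/sh
# PROC is a hypothetical override used only to make this sketch testable;
# on a real system it is /proc.
PROC="${PROC:-/proc}"

# Enable all magic SysRq functions (needs root). Do this before the hang.
enable_sysrq() {
    echo 1 > "$PROC/sys/kernel/sysrq"
}

# Software equivalent of alt-sysrq-t: dump every task's state and stack
# to the kernel log. Only usable while a shell still responds.
dump_tasks() {
    echo t > "$PROC/sysrq-trigger"
}
```

Running enable_sysrq on each node ahead of the next large copy means the key combination on the console still has a chance of producing a task dump once the box stops responding to logins.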

On Thu, Aug 10, 2006 at 12:22:39PM +1000, Colin Laird wrote:
> The problem is that during the hang you can't get on to the box; it's 
> completely dead.
> 
> Something we have found is that the heartbeat is set to 7; on the test 
> cluster, which has worked fine, it is at 61.  We are setting this value to 
> 61 across the cluster.
