I have set up an OCFS system using two Linux nodes connected using FireWire. Details:

# uname -a
Linux testrac2 2.4.21-9.0.1.ELorafw1 #1 Tue Mar 2 14:42:46 PST 2004 i686 i686 i386 GNU/Linux

# cat /etc/issue
Red Hat Enterprise Linux ES release 3 (Taroon Update 1)

# rpm -qa |grep ocfs
ocfs-tools-1.1.2-1
ocfs-2.4.21-EL-1.0.12-1
ocfs-support-1.1.2-1

Oracle version: 9.2.0.5
Cluster manager version: 9.2.0.4.0.48

Everything appears to be fine: the cluster manager can be started on both nodes and remains running (no crashes have been seen), and the database starts correctly and performs as a RAC database. However, I have noticed the following messages appearing at random in the system message files.

On Node1:

# tail -f /var/log/messages
Jul 14 08:58:31 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 08:58:31 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 09:18:15 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 09:18:15 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 10:58:04 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 10:58:04 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 11:57:20 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 11:57:40 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 12:36:58 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 12:37:17 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 13:36:25 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 13:36:45 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 14:35:24 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 14:35:43 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)

On Node2:

# tail -f /var/log/messages
Jul 14 08:58:13 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 08:58:32 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 09:17:56 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 09:18:16 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 10:57:46 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 10:58:05 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 11:57:39 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 11:57:39 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 12:37:17 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 12:37:17 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:36:44 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:36:44 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:56:26 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:56:26 testrac2 kernel: Write (10) 00 00 00 7b 80 00 00 08 00
Jul 14 14:35:43 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 14:35:43 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00

These messages seem to imply that the cluster manager is constantly reconfiguring itself. I was previously on 9.2.0.4 (CM 9.2.0.2.0.47) and have since upgraded, but this did not resolve the situation.
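The timestamps suggest the two sets of messages are related: each sbp2 abort on one node coincides, to within a second or two, with the other node re-adding it, roughly 20 seconds after having removed it, as if an I/O hangs on the FireWire bus long enough for the heartbeat to be missed. A quick way to see this side by side on each node is just to pull both message types out of the syslog (plain grep, nothing OCFS-specific):

# List only the sbp2 aborts and OCFS membership changes;
# the syslog is already in time order, so the pairing is easy to spot
grep -E "aborting sbp2 command|ocfs: (Removing|Adding)" /var/log/messages

# Rough count of how often the sbp2 aborts are happening
grep -c "aborting sbp2 command" /var/log/messages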
If the database attempts to write to disk during the brief time that either of these nodes has been removed from the cluster, errors are written to the alert log and the database hangs or falls over. Strangely, I have another test system set up using FireWire which is almost identical except that it utilises raw disks and NOT OCFS, and there have been no issues with the cluster manager on it. Whilst I realise that this is a FireWire install and thus is not supported by Oracle, I was wondering if anyone else has seen this type of behaviour? Your comments would be most appreciated.
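P.S. One thing I have not ruled out is the sbp2 module configuration. For a FireWire disk shared between two nodes, both nodes need to stay logged in to the sbp2 bridge at the same time, which (as I understand the oss.oracle.com FireWire notes) means something along these lines in /etc/modules.conf on each node. I am quoting the option names from memory, so please treat this as an assumption rather than gospel:

# /etc/modules.conf -- sketch only, option names from memory:
# allow both nodes to remain logged in to the sbp2 bridge at once,
# and serialise I/O to the device
options sbp2 sbp2_exclusive_login=0 sbp2_serialize_io=1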
Chris Robertson
2004-Jul-14 12:22 UTC
[Ocfs-users] Cluster Manager Issue on OCFS Firewire ?
We saw a lot of similar FireWire-related errors on our test setup as well. It turned out to be a combination of cabling and host port problems. After trying different cable/port combinations, all of our problems went away.

HTH,
Chris
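P.S. A simple way to tell whether a given cable/port combination is actually better is to count the aborts per day on each node and compare before and after the swap (generic grep/awk, nothing OCFS-specific):

# Count sbp2 aborts per day on this node -- compare the numbers
# before and after changing the cable or host port
grep "aborting sbp2 command" /var/log/messages | awk '{print $1, $2}' | sort | uniq -c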
-----Original Message-----
From: Darren Scott
Sent: Wednesday, July 14, 2004 8:31 AM
To: ocfs-users@oss.oracle.com
Subject: [Ocfs-users] Cluster Manager Issue on OCFS Firewire ?

_______________________________________________
Ocfs-users mailing list
Ocfs-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs-users