I have set up an OCFS system using two Linux nodes connected using FireWire. Details:

# uname -a
Linux testrac2 2.4.21-9.0.1.ELorafw1 #1 Tue Mar 2 14:42:46 PST 2004 i686 i686 i386 GNU/Linux

# cat /etc/issue
Red Hat Enterprise Linux ES release 3 (Taroon Update 1)

# rpm -qa |grep ocfs
ocfs-tools-1.1.2-1
ocfs-2.4.21-EL-1.0.12-1
ocfs-support-1.1.2-1

Oracle version: 9.2.0.5
Cluster manager version: 9.2.0.4.0.48

Everything appears to be fine: the cluster manager can be started on both nodes and remains running (no crashes have been seen), and the database starts correctly and performs as a RAC database. However, I have noticed the following messages appearing at random in the system message files.

On Node1:

# tail -f /var/log/messages
Jul 14 08:58:31 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 08:58:31 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 09:18:15 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 09:18:15 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 10:58:04 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 10:58:04 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 11:57:20 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 11:57:40 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 12:36:58 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 12:37:17 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 13:36:25 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 13:36:45 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 14:35:24 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 14:35:43 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)

On Node2:

# tail -f /var/log/messages
Jul 14 08:58:13 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 08:58:32 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 09:17:56 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 09:18:16 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 10:57:46 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 10:58:05 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 11:57:39 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 11:57:39 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 12:37:17 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 12:37:17 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:36:44 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:36:44 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:56:26 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:56:26 testrac2 kernel: Write (10) 00 00 00 7b 80 00 00 08 00
Jul 14 14:35:43 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 14:35:43 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00

These messages seem to imply that the cluster manager is constantly reconfiguring itself. I was previously on 9.2.0.4 (CM 9.2.0.2.0.47) and have since upgraded, but this did not resolve the situation.
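The timestamps suggest the two sets of messages are related: each sbp2 abort on one node coincides, to within a second or two, with the other node re-adding it, roughly 20 seconds after having removed it, as if an I/O hangs on the FireWire bus long enough for the heartbeat to be missed. A quick way to see this side by side on each node is just to pull both message types out of the syslog (plain grep, nothing OCFS-specific):

# List only the sbp2 aborts and OCFS membership changes;
# the syslog is already in time order, so the pairing is easy to spot
grep -E "aborting sbp2 command|ocfs: (Removing|Adding)" /var/log/messages

# Rough count of how often the sbp2 aborts are happening
grep -c "aborting sbp2 command" /var/log/messages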
If the database attempts to write to disk during the brief time that either of these nodes has been removed from the cluster, errors are written to the alert log and the database hangs or falls over. Strangely, I have another test system set up using FireWire which is almost identical except that it utilises raw disks and NOT OCFS, and there have been no issues with the cluster manager on it. Whilst I realise that this is a FireWire install and thus is not supported by Oracle, I was wondering if anyone else has seen this type of behaviour? Your comments would be most appreciated.
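P.S. One thing I have not ruled out is the sbp2 module configuration. For a FireWire disk shared between two nodes, both nodes need to stay logged in to the sbp2 bridge at the same time, which (as I understand the oss.oracle.com FireWire notes) means something along these lines in /etc/modules.conf on each node. I am quoting the option names from memory, so please treat this as an assumption rather than gospel:

# /etc/modules.conf -- sketch only, option names from memory:
# allow both nodes to remain logged in to the sbp2 bridge at once,
# and serialise I/O to the device
options sbp2 sbp2_exclusive_login=0 sbp2_serialize_io=1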
Chris Robertson
2004-Jul-14 12:22 UTC
[Ocfs-users] Cluster Manager Issue on OCFS Firewire ?
We saw a lot of similar FireWire-related errors on our test setup as well. It turned out to be a combination of cabling and host port problems. After trying different cable/port combinations, all of our problems went away.

HTH,
Chris
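P.S. A simple way to tell whether a given cable/port combination is actually better is to count the aborts per day on each node and compare before and after the swap (generic grep/awk, nothing OCFS-specific):

# Count sbp2 aborts per day on this node -- compare the numbers
# before and after changing the cable or host port
grep "aborting sbp2 command" /var/log/messages | awk '{print $1, $2}' | sort | uniq -c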
-----Original Message-----
From: Darren Scott
Sent: Wednesday, July 14, 2004 8:31 AM
To: ocfs-users@oss.oracle.com
Subject: [Ocfs-users] Cluster Manager Issue on OCFS Firewire ?

_______________________________________________
Ocfs-users mailing list
Ocfs-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs-users