Vladan Gunjic
2006-Aug-01 04:22 UTC
AW: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature
I'm using ocfs2 and all modules from Suse (SLES9), nothing self-compiled.
Here are the details:

* 32-bit machine (writing to the ocfs2 partition/LUN, and where the corruption was reported):
  Kernel: 2.6.5-7.257-bigsmp #1 SMP i686 i386 GNU/Linux
  OCFS2 rpms: ocfs2console-1.2.1-4.2
              ocfs2-tools-1.2.1-4.2
  o2cb_ctl -V: o2cb_ctl version 1.2.1
  /etc/init.d/o2cb status:
    Module "configfs": Loaded
    Filesystem "configfs": Mounted
    Module "ocfs2_nodemanager": Loaded
    Module "ocfs2_dlm": Loaded
    Module "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking cluster dbrac: Online
    Checking heartbeat: Active
  /etc/init.d/ocfs2 status:
    Configured OCFS2 mountpoints: /mnt/emcpowera1 mnt/emcpowere1
    Active OCFS2 mountpoints: /mnt/emcpowera1 /mnt/emcpowere1

* 2 identical 64-bit machines (that are supposed to use the data after the 32->64 bit conversion):
  Kernel: 2.6.5-7.257-smp #1 SMP x86_64 GNU/Linux
  OCFS2 rpms: ocfs2console-1.2.1-4.2
              ocfs2-tools-1.2.1-4.2
  o2cb_ctl -V: o2cb_ctl version 1.2.1
  /etc/init.d/o2cb status:
    Module "configfs": Loaded
    Filesystem "configfs": Mounted
    Module "ocfs2_nodemanager": Loaded
    Module "ocfs2_dlm": Loaded
    Module "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking cluster dbrac: Online
    Checking heartbeat: Active
  /etc/init.d/ocfs2 status:
    Configured OCFS2 mountpoints: /mnt/emcpowerd1
    Active OCFS2 mountpoints: /mnt/emcpowerd1
  (the other two 64-bit machines have the other LUN from the 32-bit machine mounted)

modinfo on all 5 machines:

1. (32-bit)
   license: GPL
   author: Oracle
   version: 1.2.1-SLES AC2C92855997647E2A862F0
   description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
   depends: ocfs2_nodemanager,ocfs2_dlm,jbd
   supported: yes
   vermagic: 2.6.5-7.257-bigsmp SMP PENTIUMII REGPARM gcc-3.3

========== next 2 machines are mounting the LUN that was corrupted (will be one Oracle RAC):
2. (64-bit)
   license: GPL
   author: Oracle
   version: 1.2.1-SLES AC2C92855997647E2A862F0
   description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
   depends: ocfs2_nodemanager,ocfs2_dlm,jbd
   supported: yes
   vermagic: 2.6.5-7.257-smp SMP gcc-3.3

3. (64-bit)
   license: GPL
   author: Oracle
   version: 1.2.1-SLES AC2C92855997647E2A862F0
   description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
   depends: ocfs2_nodemanager,ocfs2_dlm,jbd
   supported: yes
   vermagic: 2.6.5-7.257-smp SMP gcc-3.3

========== next 2 machines are mounting the LUN that was NOT corrupted (will be another Oracle RAC):
4. (64-bit)
   license: GPL
   author: Oracle
   version: 1.1.8-SLES E9BF6AA66857FAE88EF441B
   description: OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles)
   depends: ocfs2_nodemanager,ocfs2_dlm,jbd
   supported: yes
   vermagic: 2.6.5-7.252-smp SMP gcc-3.3

5. (64-bit)
   license: GPL
   author: Oracle
   version: 1.1.8-SLES E9BF6AA66857FAE88EF441B
   description: OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles)
   depends: ocfs2_nodemanager,ocfs2_dlm,jbd
   supported: yes
   vermagic: 2.6.5-7.252-smp SMP gcc-3.3

Additionally, I noticed last night, when I briefly disabled the network on all of those machines, that after the network was restored the last two machines (older ocfs2 version) were confused and didn't rejoin the cluster until they were rebooted.

So I guess the first step is to update the last two to ocfs2 version 1.2.1?
Although they were not directly involved in the corruption, maybe they were involved indirectly?

Thanks,
Vladan

-----Original Message-----
From: Sunil Mushran [mailto:Sunil.Mushran@oracle.com]
Sent: Tuesday, 1 August 2006 04:29
To: Vladan Gunjic
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature

What version of ocfs2 is on the nodes? Do modinfo ocfs2 on all nodes.

The version of OCFS2 shipped with SLES9 SP3 varies with kernel.
Are you using the modules shipped by suse or building them yourself?
Sunil Mushran
2006-Aug-01 20:02 UTC
AW: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature
The ocfs2 version should be the same on all the nodes. Mixing nodes with 1.1.8 and 1.2.1 will cause problems. We fixed a lot of issues in 1.2.1. I'll write more once I have reread your previous email.

Vladan Gunjic wrote:
> I'm using ocfs2 and all modules from Suse (SLES9), nothing self-compiled.
> Here are the details:
>
> * 32-bit machine (writing to the ocfs2 partition/LUN, and where the corruption was reported):
>   Kernel: 2.6.5-7.257-bigsmp #1 SMP i686 i386 GNU/Linux
>   OCFS2 rpms: ocfs2console-1.2.1-4.2
>               ocfs2-tools-1.2.1-4.2
>   o2cb_ctl -V: o2cb_ctl version 1.2.1
>   /etc/init.d/o2cb status:
>     Module "configfs": Loaded
>     Filesystem "configfs": Mounted
>     Module "ocfs2_nodemanager": Loaded
>     Module "ocfs2_dlm": Loaded
>     Module "ocfs2_dlmfs": Loaded
>     Filesystem "ocfs2_dlmfs": Mounted
>     Checking cluster dbrac: Online
>     Checking heartbeat: Active
>   /etc/init.d/ocfs2 status:
>     Configured OCFS2 mountpoints: /mnt/emcpowera1 mnt/emcpowere1
>     Active OCFS2 mountpoints: /mnt/emcpowera1 /mnt/emcpowere1
>
> * 2 identical 64-bit machines (that are supposed to use the data after the 32->64 bit conversion):
>   Kernel: 2.6.5-7.257-smp #1 SMP x86_64 GNU/Linux
>   OCFS2 rpms: ocfs2console-1.2.1-4.2
>               ocfs2-tools-1.2.1-4.2
>   o2cb_ctl -V: o2cb_ctl version 1.2.1
>   /etc/init.d/o2cb status:
>     Module "configfs": Loaded
>     Filesystem "configfs": Mounted
>     Module "ocfs2_nodemanager": Loaded
>     Module "ocfs2_dlm": Loaded
>     Module "ocfs2_dlmfs": Loaded
>     Filesystem "ocfs2_dlmfs": Mounted
>     Checking cluster dbrac: Online
>     Checking heartbeat: Active
>   /etc/init.d/ocfs2 status:
>     Configured OCFS2 mountpoints: /mnt/emcpowerd1
>     Active OCFS2 mountpoints: /mnt/emcpowerd1
>   (the other two 64-bit machines have the other LUN from the 32-bit machine mounted)
>
> modinfo on all 5 machines:
>
> 1. (32-bit)
>    license: GPL
>    author: Oracle
>    version: 1.2.1-SLES AC2C92855997647E2A862F0
>    description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
>    depends: ocfs2_nodemanager,ocfs2_dlm,jbd
>    supported: yes
>    vermagic: 2.6.5-7.257-bigsmp SMP PENTIUMII REGPARM gcc-3.3
>
> ========== next 2 machines are mounting the LUN that was corrupted (will be one Oracle RAC):
> 2. (64-bit)
>    license: GPL
>    author: Oracle
>    version: 1.2.1-SLES AC2C92855997647E2A862F0
>    description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
>    depends: ocfs2_nodemanager,ocfs2_dlm,jbd
>    supported: yes
>    vermagic: 2.6.5-7.257-smp SMP gcc-3.3
>
> 3. (64-bit)
>    license: GPL
>    author: Oracle
>    version: 1.2.1-SLES AC2C92855997647E2A862F0
>    description: OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles)
>    depends: ocfs2_nodemanager,ocfs2_dlm,jbd
>    supported: yes
>    vermagic: 2.6.5-7.257-smp SMP gcc-3.3
>
> ========== next 2 machines are mounting the LUN that was NOT corrupted (will be another Oracle RAC):
> 4. (64-bit)
>    license: GPL
>    author: Oracle
>    version: 1.1.8-SLES E9BF6AA66857FAE88EF441B
>    description: OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles)
>    depends: ocfs2_nodemanager,ocfs2_dlm,jbd
>    supported: yes
>    vermagic: 2.6.5-7.252-smp SMP gcc-3.3
>
> 5. (64-bit)
>    license: GPL
>    author: Oracle
>    version: 1.1.8-SLES E9BF6AA66857FAE88EF441B
>    description: OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles)
>    depends: ocfs2_nodemanager,ocfs2_dlm,jbd
>    supported: yes
>    vermagic: 2.6.5-7.252-smp SMP gcc-3.3
>
> Additionally, I noticed last night, when I briefly disabled the network on all of those machines, that after the network was restored the last two machines (older ocfs2 version) were confused and didn't rejoin the cluster until they were rebooted.
>
> So I guess the first step is to update the last two to ocfs2 version 1.2.1?
> Although they were not directly involved in the corruption, maybe they were involved indirectly?
>
> Thanks,
> Vladan
>
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran@oracle.com]
> Sent: Tuesday, 1 August 2006 04:29
> To: Vladan Gunjic
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature
>
> What version of ocfs2 is on the nodes? Do modinfo ocfs2 on all nodes.
>
> The version of OCFS2 shipped with SLES9 SP3 varies with kernel.
> Are you using the modules shipped by suse or building them yourself?
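
The check discussed in this thread (run modinfo ocfs2 on every node and confirm the version is the same everywhere) can be sketched as a small script. This is only an illustration, not an official tool: the node names node1 to node5 and the helper names are hypothetical, and the sample text is the "version" and "vermagic" lines from the modinfo output posted above. In practice the text for each node would have to be collected by hand or over ssh.

```python
from collections import defaultdict


def parse_modinfo(text):
    """Parse modinfo-style 'key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing
        # colons (such as timestamps) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def group_by_version(outputs):
    """Map each ocfs2 release string to the nodes reporting it."""
    groups = defaultdict(list)
    for node, text in outputs.items():
        version = parse_modinfo(text).get("version", "")
        # Keep the release part ("1.2.1-SLES") and drop the checksum.
        release = version.split()[0] if version else "unknown"
        groups[release].append(node)
    return dict(groups)


def is_consistent(groups):
    """True only when every node reports the same release."""
    return len(groups) == 1


# Sample data taken from the modinfo output in this thread.
# The node names are hypothetical labels for machines 1 to 5.
MODINFO_121 = (
    "version: 1.2.1-SLES AC2C92855997647E2A862F0\n"
    "vermagic: 2.6.5-7.257-smp SMP gcc-3.3\n"
)
MODINFO_118 = (
    "version: 1.1.8-SLES E9BF6AA66857FAE88EF441B\n"
    "vermagic: 2.6.5-7.252-smp SMP gcc-3.3\n"
)
SAMPLE_OUTPUTS = {
    "node1": MODINFO_121,
    "node2": MODINFO_121,
    "node3": MODINFO_121,
    "node4": MODINFO_118,
    "node5": MODINFO_118,
}

groups = group_by_version(SAMPLE_OUTPUTS)
for release, nodes in sorted(groups.items()):
    print(release, "->", ", ".join(nodes))
if not is_consistent(groups):
    print("WARNING: mixed ocfs2 versions in the cluster")
```

With the sample data, the script groups machines 1 to 3 under 1.2.1-SLES and machines 4 and 5 under 1.1.8-SLES, which is the mix that needs to be resolved by upgrading the last two nodes.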