Michael Ulbrich
2017-Sep-14 18:58 UTC
[Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?
Hi again,

I made some progress with debugging the situation. To recap, there are 2 ocfs2 file systems:

/dev/drbd0 -> lvm -> RAID1 from 2 x 600 GB SAS disks
/dev/drbd1 -> lvm -> RAID1 from 2 x 6 TB NL (Near-Line) SAS disks

This is configured identically on 2 DELL R530 servers (nodes 1 + 2 as hypervisors). Disks are connected via PERC H730 mini (Linux kernel driver: megaraid_sas ver. 06.811.02.00-rc1). drbd has a private GigE link for replication traffic. Both hypervisors run 3 virtual machines each.

/dev/drbd0 works as expected as long as it is allocated on the 600 GB RAID1. If it is moved to the large 6 TB RAID1 device its behaviour becomes identical to that of /dev/drbd1.

As described in my previous post there is an unusual slot (?) numbering which prevents the mount of the ocfs2 file system /dev/drbd1 on node 8. As a quick fix we could swap node numbers 1 <-> 8 in cluster.conf. But this does not address the underlying problem, as we will soon see.

In deliberately reformatted form the list of nodes looks as follows:

node (number = 8, name = h1a)  - Hypervisor
node (number = 2, name = h1b)  - Hypervisor
node (number = 3, name = web1) - Guest 1 on h1a
node (number = 4, name = db1)  - Guest 2 on h1a
node (number = 5, name = srv1) - Guest 3 on h1a
node (number = 6, name = web2) - Guest 4 on h1b
node (number = 7, name = db2)  - Guest 5 on h1b
node (number = 1, name = srv2) - Guest 6 on h1b

Now node 8 is the first (hypervisor) node to mount /dev/drbd1, which leads to
('watch -d -n 1 "echo \"hb\" | debugfs.ocfs2 -n /dev/drbd1"'):

hb
node: node seq generation checksum
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Node 2 is the second (hypervisor) node to mount:

hb
node: node seq generation checksum
16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Again we see the strange "* 8" or "shift left 3" relationship between the columns "node:" and "node".

Now the guests are brought up and mount the file system in the order 3, 5, 6, 1 (I don't have the actual seq / gen values, so from memory):

hb
node: node seq generation checksum
1: 1 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
3: 3 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
5: 5 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
6: 6 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Please note that the virtual machines get assigned the corresponding "node:" = "node" values as expected.

Now we went a step further and enabled tracing: "debugfs.ocfs2 -l HEARTBEAT allow". This periodically logs messages from the heartbeat threads of the individual file systems.
For the file system /dev/drbd1 we get on the hypervisors:

(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 1 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 2 gen 0x98be08e71122efed cksum 0x33a84ac0 seq 1505346907 last 1505346907 changed 1 equal 0
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 3 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 5 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 6 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 8 gen 0x551934cc4ba0b1bf cksum 0xf606e2be seq 1505346907 last 1505346907 changed 1 equal 0

We only see the hypervisors heartbeating in slots 2 and 8 although 4 additional guests have also mounted the same file system.

Tracing the ocfs2 heartbeat on one of the guests (web1) gives the following:

(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 1 gen 0xd1f96dee2509bc73 cksum 0x1dc10931 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 2 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 3 gen 0x5d8c200c0113510f cksum 0xbfc95a14 seq 1505371590 last 1505371590 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 5 gen 0x39a8da3bae49161b cksum 0x49b4a110 seq 1505371588 last 1505371588 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 6 gen 0xc00a0ba3931ad15 cksum 0x92625e99 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 8 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674

Here we have the mirrored situation: only the guests are seen heartbeating, in slots 1, 3, 5 and 6. No trace of the hypervisors in slots 2 (16) and 8 (64) ... Uhm, well ... trying to wrap my head around this ... :-)

Ok, so let's dig a little deeper. Watching the init of a mount operation with HEARTBEAT tracing enabled gives different results depending on whether the same file system is mounted on a hypervisor:

(mount.ocfs2,28264,0):o2hb_init_region_params:1729 hr_start_block = 273, hr_blocks = 255
(mount.ocfs2,28264,0):o2hb_init_region_params:1731 hr_block_bytes = 4096, *hr_block_bits = 12*
(mount.ocfs2,28264,0):o2hb_init_region_params:1732 hr_timeout_ms = 2000
(mount.ocfs2,28264,0):o2hb_init_region_params:1733 dead threshold = 31
(mount.ocfs2,28264,0):o2hb_map_slot_data:1764 Going to require 255 pages to cover 255 blocks at 1 blocks per page
(o2hb-C27AC49D2B,28265,10):o2hb_thread:1221 hb thread running ...
or on a guest:

(mount.ocfs2,3505,1):o2hb_init_region_params:1729 hr_start_block = 2184, hr_blocks = 255
(mount.ocfs2,3505,1):o2hb_init_region_params:1731 hr_block_bytes = 512, *hr_block_bits = 9*
(mount.ocfs2,3505,1):o2hb_init_region_params:1732 hr_timeout_ms = 2000
(mount.ocfs2,3505,1):o2hb_init_region_params:1733 dead threshold = 31
(mount.ocfs2,3505,1):o2hb_map_slot_data:1764 Going to require 32 pages to cover 255 blocks at 8 blocks per page
(o2hb-C27AC49D2B,3506,0):o2hb_thread:1221 hb thread running

So on the hypervisors we have -> hr_block_bytes = 4096, hr_block_bits = 12
and on the virtual machines  -> hr_block_bytes = 512, hr_block_bits = 9

There, by the way, we find the factor of 8 in the differing block sizes and the 3-bit difference in hr_block_bits. Is this ok?

From my limited understanding I would have expected that the nodes mounting the shared file system would share the heartbeat system file and, inside of that, a common data structure (heartbeat region). But then the nodes should have a common understanding of the size of this structure, right? Here it looks as if the hypervisors are interacting with a "large" hr structure while the guests do the same on a "small" heartbeat region. That would explain why heartbeat threads from hypervisors and guests do not "see" each other, as described above (a small sketch of the resulting on-disk offsets follows below as a P.S.). Or are the heartbeat regions in-memory structures of different size which get translated into a common "disk" structure when written to the hb system file?

It would be great if you could give me a little guidance here. If it actually is a bug I'm willing to work on this further. If I'm heading in the wrong direction just give me a short note and perhaps a hint at what's wrong with my setup or which ocfs2 version might fix this problem.

Thanks a lot + Regards from Berlin ... Michael U.
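P.S.: To convince myself of the above I put the numbers from the two o2hb_init_region_params traces into a tiny stand-alone program. This is only my own illustration of the suspected arithmetic, not ocfs2 code, and it assumes that node N heartbeats in block (hr_start_block + N) of its own view of the device:

/* offsets.c - my own illustration, NOT ocfs2 source.
 * Numbers taken from the traces above:
 *   hypervisor view: hr_start_block = 273,  block size 4096
 *   guest view:      hr_start_block = 2184, block size 512
 * Assumption: node N heartbeats in block (hr_start_block + N)
 * of its own view of the device.
 */
#include <stdio.h>

static unsigned long long slot_byte(unsigned long long start_block,
                                    unsigned block_bytes, unsigned node)
{
    return (start_block + node) * (unsigned long long)block_bytes;
}

int main(void)
{
    /* Both views start at the same byte: 273 * 4096 == 2184 * 512. */
    for (unsigned node = 1; node <= 8; node++)
        printf("node %u: hypervisor slot @ byte %llu, guest slot @ byte %llu\n",
               node,
               slot_byte(273, 4096, node),
               slot_byte(2184, 512, node));

    /* Guest node 8's 512-byte slot starts at byte 1122304, which is
     * exactly where hypervisor node 1's 4096-byte slot starts. */
    return 0;
}

Under that assumption the guest slots for nodes 1-7 never coincide with any hypervisor slot, which would explain why neither side sees the other heartbeating, while guest node 8 and hypervisor node 1 write to the very same bytes - matching the o2hb_check_own_slot conflict between nodes 1 and 8.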
On 09/13/2017 09:54 AM, Michael Ulbrich wrote:
> Hi all,
>
> we've a small (?) problem with a 2-node cluster on Debian 8:
>
> Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
> x86_64 GNU/Linux
>
> ocfs2-tools 1.6.4-3
>
> Two ocfs2 filesystems (drbd0 600 GB w/ 8 slots and drbd1 6 TB w/ 6
> slots) are created on top of drbd w/ 4k block and cluster size,
> 'max_features' enabled.
>
> cluster.conf assigns sequential node numbers 1 - 8. Nodes 1, 2 are the
> hypervisors. Nodes 3, 4, 5 are VMs on node 1. Nodes 6, 7, 8 the
> corresponding VMs on node 2.
>
> VMs all run Debian 8 as well:
>
> Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64
> GNU/Linux
>
> When mounting drbd0 in order of increasing node numbers and concurrently
> watching the 'hb' output from debugfs.ocfs2 we get a clean slot map (?):
>
> hb
> node: node seq generation checksum
> 1: 1 0000000059b8d94a fa60f0d8423590d9 edec9643
> 2: 2 0000000059b8d94c aca059df4670f467 994e3458
> 3: 3 0000000059b8d949 f03dc9ba8f27582c d4473fc2
> 4: 4 0000000059b8d94b df5bbdb756e757f8 12a198eb
> 5: 5 0000000059b8d94a 1af81d94a7cb681b 91fba906
> 6: 6 0000000059b8d94b 104538f30cdb35fa 8713e798
> 7: 7 0000000059b8d94b 195658c9fb8ca7f9 5e54edf6
> 8: 8 0000000059b8d949 dc6bfb46b9cf1ac3 de7a8757
>
> Device drbd1 in contrast yields the following table after mounting on
> nodes 1, 2:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059b8d9ba 73a63eb550a33095 f4e074d1
> 16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
>
> Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059b8da3b 9443b4b209b16175 f2cc87ec
> 5: 5 0000000059b8da3c 4b742f709377466f 3ac41cf3
> 6: 6 0000000059b8da3b d96e2de0a55514f6 335a4d90
> 8: 1 0000000059b8da3c 73a63eb550a33095 2312c1c4
> 16: 2 0000000059b8da3d 5c7504c05637983e 659571a1
>
> The problem arises when trying to mount node 8 since its slot is already
> occupied by node 1:
>
> kern.log node 1:
>
> (o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)
>
> kern.log node 8:
>
> ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
> (o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)
>
> This can be "fixed" by exchanging node numbers 1 and 8 in cluster.conf.
> Then node 8 will be assigned slot 8, node 2 stays in slot 16, 3 to 7 as
> expected. There is no node 16 configured so there's no conflict. But
> since we experience some other so far not explainable instabilities with
> this ocfs2 device / system during operation further down the road we
> decided to take care of and try to fix this issue first.
>
> Somehow the failure reminds of bit shift or masking problems:
>
> 1 << 3 = 8
> 2 << 3 = 16
>
> But then again - what do I know ...
>
> Tried so far:
>
> A. Create offending file system with 8 slots instead of 6 -> same issue.
> B. Set features to 'default' (disables feature 'extended-slotmap') ->
> same issue.
>
> We'd very much appreciate any comments on this. Has anything similar
> ever been experienced before? Are we completely missing something
> important here?
>
> If there's a fix already out for this any pointers (src files / commits)
> to where to look would be greatly appreciated.
>
> Thanks in advance + Best regards ... Michael U.
Michael Ulbrich
2017-Sep-18 15:43 UTC
[Ocfs2-users] Mixed mounts w/ different physical block sizes (long post)
Hi again,

chatting with a helpful person on the #ocfs2 IRC channel this morning I got encouraged to cross-post to ocfs2-devel. For historic background and further details please see my two previous posts to ocfs2-users from last week, which are unanswered so far. According to my current state of inspection I changed the topic from "Node 8 doesn't mount / Wrong slot map assignment" to the current "Mixed mounts ...".

Here we go:

I've learnt that large hard disks in increasing number come formatted with a 4k physical block size. Now I've created an ocfs2 shared file system on top of drbd on a RAID1 of two 6 TB disks with such a 4k physical block size. File system creation was done on a hypervisor which actually saw the device as having a 4k physical sector size. I'm using the default o2cb cluster stack. Version is ocfs2 1.6.4 on stock Debian 8.

A node (numbered "1" in cluster.conf) which mounts this device with 4k phys. blocks leads to a strange "times 8" numbering when checking heartbeat debug info with 'echo "hb" | debugfs.ocfs2 -n /dev/drbd1':

hb
node: node seq generation checksum
8: 1 0000000059bfd253 00bfa1b63f30e494 c518c55a

I'm not sure why the first 2 columns are named "node:" and "node" but assume the first, "node:", is an index into some internal data structure (slot map? heartbeat region?) while the second, "node", shows the actual node number as given in cluster.conf.

Now a second node mounts the shared file system, again as a 4k block device:

hb
node: node seq generation checksum
8: 1 0000000059bfd36a 00bfa1b63f30e494 d4f79d63
16: 2 0000000059bfd369 7acf8521da342228 4b8cd74d

As it actually happened in my setup of a two node cluster with 2 hypervisors and 3 virtual machines on top of each (8 nodes in total), when mounting the fs on the first virtual machine with node number 3 we get:

hb
node: node seq generation checksum
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018

Uhm, ... wait ... 3 ??

Mounting on further VMs (nodes 4, 5, 6 and 7) leads to:

hb
node: node seq generation checksum
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
5: 5 0000000059bfd414 529a98c758325d5b 60080c42
6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018

Up to this point I did not notice any error or warning in the machines' console or kernel logs. And then, trying to mount on node 8, finally there's an error:

kern.log node 1:

(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log node 8:

ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

(actual seq and generation are not from the above hb debug dump)

Now we have a conflict on slot 8. When I encountered this error for the first time, I didn't know about heartbeat debug info, slot maps or heartbeat regions and had no idea what might have gone wrong, so I started experimenting and found a "solution" by swapping nodes 1 <-> 8 in cluster.conf.
This leads to the following layout of the heartbeat region (?):

hb
node: node seq generation checksum
1: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
5: 5 0000000059bfd414 529a98c758325d5b 60080c42
6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
16: 2 0000000059bfd413 7acf8521da342228 cd48c018
64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Voila - all 8 nodes mounted, problem solved - let's continue with getting this cluster ready for production ...

As it turned out, this was in no way a stable configuration: after a few weeks spurious reboots (fencing peer) started to happen (drbd losing its replication connection, all kinds of weird kernel oopses and panics from drbd and ocfs2). Reboots were usually preceded by bursts of errors like:

Sep 11 00:01:27 web1 kernel: [ 9697.644436] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b635), ondisk(3:0x743493e99d19e721, 0x59b5b633)
Sep 11 00:03:43 web1 kernel: [ 9833.918668] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b6bd), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
Sep 11 00:03:45 web1 kernel: [ 9835.920551] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b6bf), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
Sep 11 00:09:10 web1 kernel: [10160.576453] (o2hb-10254DCA50,515,0):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b804), ondisk(3:0x743493e99d19e721, 0x59b5b802)

In the end the ocfs2 filesystem had to be rebuilt to get rid of the errors. It went ok for a while before the same symptoms of fs corruption came back again.

To make a long story short: we found out that the virtual machines did not see the disk device as having 4k sectors but standard 512 byte blocks. So we had what I coined a "mixed mount" of the same ocfs2 file system: 2 nodes mounted with 4k phys. block size, the other 6 nodes mounted with 512 byte block size.

Configuring the VMs with:

<blockio logical_block_size='4096' physical_block_size='4096'/>

leads to a heartbeat slot map:

hb
node: node seq generation checksum
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018
24: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
32: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
40: 5 0000000059bfd414 529a98c758325d5b 60080c42
48: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
56: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Operation is stable so far. No 'Heartbeat sequence mismatch' errors. The "times 8" values in the "node:" column are still strange, but this may be a purely aesthetic issue.

Browsing the code of heartbeat.c I'm not sure if such a "mixed mount" is *supposed* to work and we just triggered a minor bug that can easily be fixed - or if such a scenario is a definite no-no and should seriously be avoided. In the latter case an error message and cancelling of an inappropriate mount operation would be very helpful (a rough sketch of what I mean follows at the end of this mail).

Anyway, it would be greatly appreciated to hear a knowledgeable opinion from the members of the ocfs2-devel list on this topic - any takers?

Thanks in advance + Best regards ... Michael
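P.S.: To illustrate the kind of mount-time guard I have in mind, here is a rough, hypothetical sketch in kernel style. This is NOT a patch against the actual ocfs2 sources: bdev_logical_block_size() is a real kernel helper, but the function name, the idea of an on-disk ondisk_block_bytes field and the error text are all made up for illustration.

/* Hypothetical mount-time check - illustrative sketch only.
 * Idea: if the block size the heartbeat region was created with
 * were recorded on disk, a mounting node could compare it against
 * what its own block device reports and refuse a "mixed mount"
 * instead of silently corrupting the region.
 */
#include <linux/blkdev.h>
#include <linux/printk.h>

static int o2hb_validate_block_size(struct block_device *bdev,
				    u32 ondisk_block_bytes)
{
	u32 local = bdev_logical_block_size(bdev);

	if (local != ondisk_block_bytes) {
		printk(KERN_ERR "o2hb: heartbeat region uses %u-byte "
		       "blocks but this node sees %u-byte blocks; "
		       "refusing mount to avoid a mixed mount\n",
		       ondisk_block_bytes, local);
		return -EINVAL;	/* cancel the mount */
	}
	return 0;
}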
Changwei Ge
2017-Sep-19 03:32 UTC
[Ocfs2-devel] Mixed mounts w/ different physical block sizes (long post)
Hi Michael,

On 2017/9/18 23:45, Michael Ulbrich wrote:
> Hi again,
>
> chatting with a helpful person on #ocfs2 IRC channel this morning I got
> encouraged to cross-post to ocfs2-devel. For historic background and
> further details pls. see my two previous posts to ocfs2-users from last
> week which are unanswered so far.
>
> According to my current state of inspection I changed the topic from
>
> "Node 8 doesn't mount / Wrong slot map assignment" to the current "Mixed
> mounts ..."
>
> Here we go:
>
> I've learnt that large hard disks in increasing number come formatted w/
> 4k physical block size.
>
> Now I've created an ocfs2 shared file system on top of drbd on a RAID1
> of two 6 TB disks with such 4k physical block size. File system creation
> was done on a hypervisor which actually saw the device as having 4k
> physical sector size.
>
> I'm using the default o2cb cluster stack. Version is ocfs2 1.6.4 on
> stock Debian 8.
>
> A node (numbered "1" in cluster.conf) which mounts this device with 4k
> phys. blocks leads to a strange "times 8" numbering when checking
> heartbeat debug info with 'echo "hb" | debugfs.ocfs2 -n /dev/drbd1':
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd253 00bfa1b63f30e494 c518c55a
>
> I'm not sure why the first 2 columns are named "node:" and "node" but
> assume the first "node:" is an index into some internal data structure
> (slot map ?, heartbeat region ?) while the second "node" column shows
> the actual node number as given in cluster.conf
>
> Now a second node mounts the shared file system again as 4k block device:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd36a 00bfa1b63f30e494 d4f79d63
> 16: 2 0000000059bfd369 7acf8521da342228 4b8cd74d
>
> As it actually happened in my setup of a two node cluster with 2
> hypervisors and 3 virtual machines on top of each (8 nodes in total),
> when mounting the fs on the first virtual machine with node number 3 we get:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
>
> Uhm, ... wait ... 3 ??
>
> Mounting on further VMs (nodes 4, 5, 6 and 7) leads to:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 5: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
>
> Up to this point I did not notice any error or warning in the machines'
> console or kernel logs.
>
> And then trying to mount on node 8 finally there's an error:
>
> kern.log node 1:
>
> (o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)
>
> kern.log node 8:
>
> ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
> (o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)
>
> (actual seq and generation are not from above hb debug dump)
>
> Now we have a conflict on slot 8.
>
> When I encountered this error for the first time, I didn't know about
> heartbeat debug info, slot maps or heartbeat regions and had no idea
> what might have gone wrong so I started experimenting and found a
> "solution" by swapping nodes 1 <-> 8 in cluster.conf. This leads to the
> following layout of the heartbeat region (?):
>
> hb
> node: node seq generation checksum
> 1: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 5: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
> 64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1
>
> Voila - all 8 nodes mounted, problem solved - let's continue with
> getting this cluster ready for production ...
>
> As it turned out this was in no way a stable configuration in that after
> few weeks spurious reboots (fencing peer) started to happen (drbd losing
> its replication connection, all kinds of weird kernel oopses and panics
> from drbd and ocfs2). Reboots were usually preceded by burst of errors like:
>
> Sep 11 00:01:27 web1 kernel: [ 9697.644436]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b635), ondisk(3:0x743493e99d19e721, 0x59b5b633)
> Sep 11 00:03:43 web1 kernel: [ 9833.918668]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b6bd), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
> Sep 11 00:03:45 web1 kernel: [ 9835.920551]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b6bf), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
> Sep 11 00:09:10 web1 kernel: [10160.576453]
> (o2hb-10254DCA50,515,0):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b804), ondisk(3:0x743493e99d19e721, 0x59b5b802)
>
> In the end the ocfs2 filesystem had to be rebuilt to get rid of the
> errors. It went ok for a while before the same symptoms of fs corruption
> came back again.
>
> To make a long story short: we found out that the virtual machines did
> not see the disk device having 4k sectors but the standard 512 byte
> blocks. So we had what I coined a "mixed mount" of the same ocfs2 file
> system: 2 nodes mounted with 4k phys. block size the other 6 nodes
> mounted w/ 512 byte block size.
>
> Configuring the VMs with:
>
> <blockio logical_block_size='4096' physical_block_size='4096'/>
>
> leads to a heartbeat slot map:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
> 24: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 32: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 40: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 48: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 56: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Could you please also provide information about the *slot_map* - just type "slotmap" in the debugfs.ocfs2 tool. This will be helpful for analysing your case.

Please also paste the output generated by:

cat /sys/kernel/config/cluster/<your cluster name>/heartbeat/<file system UUID>

so we can see how your cluster is configured. Files like block_bytes, blocks and start_block are preferred.
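If it helps, a small helper like the following can collect those values in one go. This is only a quick convenience sketch, not an official tool - it just reads the configfs attributes named above from a region directory given on the command line:

/* hbinfo.c - convenience sketch, not an official ocfs2 tool.
 * Reads the block_bytes, blocks and start_block attributes from a
 * heartbeat region configfs directory, e.g.
 * /sys/kernel/config/cluster/<cluster>/heartbeat/<UUID>.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *attrs[] = { "block_bytes", "blocks", "start_block" };
    char path[512], buf[64];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <heartbeat region configfs dir>\n",
                argv[0]);
        return 1;
    }
    for (unsigned i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        snprintf(path, sizeof(path), "%s/%s", argv[1], attrs[i]);
        FILE *f = fopen(path, "r");
        if (!f || !fgets(buf, sizeof(buf), f)) {
            fprintf(stderr, "cannot read %s\n", path);
            if (f)
                fclose(f);
            continue;
        }
        printf("%s = %s", attrs[i], buf);  /* buf keeps its newline */
        fclose(f);
    }
    return 0;
}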
> Operation is stable so far. No 'Heartbeat sequence mismatch' errors.
> Still strange the "times 8" values in column "node:" but this may be a
> purely aesthetical issue.

I suppose this is because debugfs.ocfs2 *assumes* that block devices are all formatted with 512-byte blocks. Perhaps we can improve this (see the sketch at the end of this mail).

> Browsing the code of heartbeat.c I'm not sure if such a "mixed mount" is
> *supposed* to work and it's just a minor bug we triggered that can
> easily be fixed - or if such a scenario is a definite no-no and should
> seriously be avoided. In the latter case an error message and cancelling
> of an inappropriate mount operation would be very helpful.
>
> Anyway, it would be greatly appreciated to hear a knowledgeable opinion
> from the members of the ocfs2-devel list on this topic - any takers?
>
> Thanks in advance + Best regards ... Michael
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
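To make the debugfs.ocfs2 point concrete: if the tool derives the "node:" index by counting fixed 512-byte sectors, a region written with 4096-byte blocks will show node N at index N * 8; dividing by the region's actual block size instead would make both columns agree. A minimal stand-alone demonstration of the arithmetic (my own sketch, not the actual debugfs.ocfs2 source):

/* index_demo.c - illustration of the suspected display bug, NOT the
 * actual debugfs.ocfs2 source.  Shows how the same byte offset maps
 * to different slot indices depending on the assumed block size.
 */
#include <stdio.h>

static unsigned slot_index(unsigned long long byte_off,
                           unsigned region_block_bytes)
{
    return (unsigned)(byte_off / region_block_bytes);
}

int main(void)
{
    unsigned long long byte_off = 1 * 4096; /* node 1's 4k hb block */

    printf("hard-coded 512:  index %u\n", slot_index(byte_off, 512));
    printf("region bs 4096:  index %u\n", slot_index(byte_off, 4096));
    /* prints 8 vs 1 - the "times 8" column from the hb dumps above */
    return 0;
}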