Michael Ulbrich
2017-Sep-14 18:58 UTC
[Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?
Hi again,

I made some progress with debugging the situation. To recap, there are 2 ocfs2 file systems:

/dev/drbd0 -> lvm -> RAID1 from 2 x 600 GB SAS disks
/dev/drbd1 -> lvm -> RAID1 from 2 x 6 TB NL (Near-Line) SAS disks

This is configured identically on 2 DELL R530 servers (nodes 1 + 2 as hypervisors). Disks are connected via PERC H730 mini (Linux kernel driver: megaraid_sas ver. 06.811.02.00-rc1). drbd has a private GigE link for replication traffic. Both hypervisors run 3 virtual machines each.

/dev/drbd0 works as expected as long as it is allocated on the 600 GB RAID1. If it is moved to the large 6 TB RAID1 device its behaviour becomes identical to that of /dev/drbd1.

As described in my previous post there is an unusual slot (?) numbering which prevents the mount of the ocfs2 file system /dev/drbd1 on node 8. As a quick fix we could swap node numbers 1 <-> 8 in cluster.conf. But this does not address the underlying problem, as we will soon see.

In deliberately reformatted form the list of nodes looks as follows:

node (number = 8, name = h1a)  - Hypervisor
node (number = 2, name = h1b)  - Hypervisor
node (number = 3, name = web1) - Guest 1 on h1a
node (number = 4, name = db1)  - Guest 2 on h1a
node (number = 5, name = srv1) - Guest 3 on h1a
node (number = 6, name = web2) - Guest 4 on h1b
node (number = 7, name = db2)  - Guest 5 on h1b
node (number = 1, name = srv2) - Guest 6 on h1b

Now node 8 is the first (hypervisor) node to mount /dev/drbd1, which leads to
('watch -d -n 1 "echo \"hb\" | debugfs.ocfs2 -n /dev/drbd1"'):

hb
node: node seq generation checksum
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Node 2 is the second (hypervisor) node to mount:

hb
node: node seq generation checksum
16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Again we see the strange "* 8" or "shift left 3" relationship between the columns "node:" and "node".

Now the guests are brought up and mount the file system in the order 3, 5, 6, 1 (I don't have the actual seq / gen values, so from memory):

hb
node: node seq generation checksum
1: 1 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
3: 3 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
5: 5 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
6: 6 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxxxxxxx
16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
64: 8 0000000059b8d9ba 73a63eb550a33095 f4e074d1

Please note that the virtual machines get assigned the corresponding "node:" = "node" values as expected.

Now we went a step further and enabled tracing: "debugfs.ocfs2 -l HEARTBEAT allow". This periodically logs messages from the heartbeat threads of the individual file systems.
For the file system /dev/drbd1 we get on the hypervisors:

(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 1 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 2 gen 0x98be08e71122efed cksum 0x33a84ac0 seq 1505346907 last 1505346907 changed 1 equal 0
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 3 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 5 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 6 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 8 gen 0x551934cc4ba0b1bf cksum 0xf606e2be seq 1505346907 last 1505346907 changed 1 equal 0

We only see the hypervisors heartbeating in slots 2 and 8 although 4 additional guests have also mounted the same file system.

Tracing the ocfs2 heartbeat on one of the guests (web1) gives the following:

(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 1 gen 0xd1f96dee2509bc73 cksum 0x1dc10931 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 2 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 3 gen 0x5d8c200c0113510f cksum 0xbfc95a14 seq 1505371590 last 1505371590 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 5 gen 0x39a8da3bae49161b cksum 0x49b4a110 seq 1505371588 last 1505371588 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 6 gen 0xc00a0ba3931ad15 cksum 0x92625e99 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 8 gen 0x0 cksum 0x0 seq 0 last 0 changed 0 equal 13674

Here we have the mirrored situation: only the guests are seen heartbeating, in slots 1, 3, 5 and 6. No trace of the hypervisors in slots 2 (16) and 8 (64) ... Uhm, well ... trying to wrap my head around this ... :-)

Ok, so let's dig a little deeper. Watching the init of a mount operation with HEARTBEAT tracing enabled gives different results depending on whether the same file system is mounted on a hypervisor:

(mount.ocfs2,28264,0):o2hb_init_region_params:1729 hr_start_block = 273, hr_blocks = 255
(mount.ocfs2,28264,0):o2hb_init_region_params:1731 hr_block_bytes = 4096, *hr_block_bits = 12*
(mount.ocfs2,28264,0):o2hb_init_region_params:1732 hr_timeout_ms = 2000
(mount.ocfs2,28264,0):o2hb_init_region_params:1733 dead threshold = 31
(mount.ocfs2,28264,0):o2hb_map_slot_data:1764 Going to require 255 pages to cover 255 blocks at 1 blocks per page
(o2hb-C27AC49D2B,28265,10):o2hb_thread:1221 hb thread running ...
or on a guest:

(mount.ocfs2,3505,1):o2hb_init_region_params:1729 hr_start_block = 2184, hr_blocks = 255
(mount.ocfs2,3505,1):o2hb_init_region_params:1731 hr_block_bytes = 512, *hr_block_bits = 9*
(mount.ocfs2,3505,1):o2hb_init_region_params:1732 hr_timeout_ms = 2000
(mount.ocfs2,3505,1):o2hb_init_region_params:1733 dead threshold = 31
(mount.ocfs2,3505,1):o2hb_map_slot_data:1764 Going to require 32 pages to cover 255 blocks at 8 blocks per page
(o2hb-C27AC49D2B,3506,0):o2hb_thread:1221 hb thread running

So on the hypervisors we have -> hr_block_bytes = 4096, hr_block_bits = 12
and on the virtual machines  -> hr_block_bytes = 512, hr_block_bits = 9

There, by the way, we find the factor of 8 in the differing block sizes and the 3-bit difference in hr_block_bits. Is this ok?

From my limited understanding I would have expected that the nodes mounting the shared file system would share the heartbeat system file and, inside of that, a common data structure (heartbeat region). But then the nodes should have a common understanding of the size of this structure, right? Here it looks as if the hypervisors are interacting with a "large" hr structure while the guests do the same on a "small" heartbeat region. That would explain why heartbeat threads from hypervisors and guests do not "see" each other, as described above (a small sketch of the resulting on-disk offsets follows below as a P.S.). Or are the heartbeat regions in-memory structures of different size which get translated into a common "disk" structure when written to the hb system file?

It would be great if you could give me a little guidance here. If it actually is a bug I'm willing to work on this further. If I'm heading in the wrong direction just give me a short note and perhaps a hint at what's wrong with my setup or which ocfs2 version might fix this problem.

Thanks a lot + Regards from Berlin ... Michael U.
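P.S.: To convince myself of the above I put the numbers from the two o2hb_init_region_params traces into a tiny stand-alone program. This is only my own illustration of the suspected arithmetic, not ocfs2 code, and it assumes that node N heartbeats in block (hr_start_block + N) of its own view of the device:

/* offsets.c - my own illustration, NOT ocfs2 source.
 * Numbers taken from the traces above:
 *   hypervisor view: hr_start_block = 273,  block size 4096
 *   guest view:      hr_start_block = 2184, block size 512
 * Assumption: node N heartbeats in block (hr_start_block + N)
 * of its own view of the device.
 */
#include <stdio.h>

static unsigned long long slot_byte(unsigned long long start_block,
                                    unsigned block_bytes, unsigned node)
{
    return (start_block + node) * (unsigned long long)block_bytes;
}

int main(void)
{
    /* Both views start at the same byte: 273 * 4096 == 2184 * 512. */
    for (unsigned node = 1; node <= 8; node++)
        printf("node %u: hypervisor slot @ byte %llu, guest slot @ byte %llu\n",
               node,
               slot_byte(273, 4096, node),
               slot_byte(2184, 512, node));

    /* Guest node 8's 512-byte slot starts at byte 1122304, which is
     * exactly where hypervisor node 1's 4096-byte slot starts. */
    return 0;
}

Under that assumption the guest slots for nodes 1-7 never coincide with any hypervisor slot, which would explain why neither side sees the other heartbeating, while guest node 8 and hypervisor node 1 write to the very same bytes - matching the o2hb_check_own_slot conflict between nodes 1 and 8.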
On 09/13/2017 09:54 AM, Michael Ulbrich wrote:
> Hi all,
>
> we've a small (?) problem with a 2-node cluster on Debian 8:
>
> Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
> x86_64 GNU/Linux
>
> ocfs2-tools 1.6.4-3
>
> Two ocfs2 filesystems (drbd0 600 GB w/ 8 slots and drbd1 6 TB w/ 6
> slots) are created on top of drbd w/ 4k block and cluster size,
> 'max_features' enabled.
>
> cluster.conf assigns sequential node numbers 1 - 8. Nodes 1, 2 are the
> hypervisors. Nodes 3, 4, 5 are VMs on node 1. Nodes 6, 7, 8 the
> corresponding VMs on node 2.
>
> VMs all run Debian 8 as well:
>
> Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64
> GNU/Linux
>
> When mounting drbd0 in order of increasing node numbers and concurrently
> watching the 'hb' output from debugfs.ocfs2 we get a clean slot map (?):
>
> hb
> node: node seq generation checksum
> 1: 1 0000000059b8d94a fa60f0d8423590d9 edec9643
> 2: 2 0000000059b8d94c aca059df4670f467 994e3458
> 3: 3 0000000059b8d949 f03dc9ba8f27582c d4473fc2
> 4: 4 0000000059b8d94b df5bbdb756e757f8 12a198eb
> 5: 5 0000000059b8d94a 1af81d94a7cb681b 91fba906
> 6: 6 0000000059b8d94b 104538f30cdb35fa 8713e798
> 7: 7 0000000059b8d94b 195658c9fb8ca7f9 5e54edf6
> 8: 8 0000000059b8d949 dc6bfb46b9cf1ac3 de7a8757
>
> Device drbd1 in contrast yields the following table after mounting on
> nodes 1, 2:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059b8d9ba 73a63eb550a33095 f4e074d1
> 16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
>
> Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059b8da3b 9443b4b209b16175 f2cc87ec
> 5: 5 0000000059b8da3c 4b742f709377466f 3ac41cf3
> 6: 6 0000000059b8da3b d96e2de0a55514f6 335a4d90
> 8: 1 0000000059b8da3c 73a63eb550a33095 2312c1c4
> 16: 2 0000000059b8da3d 5c7504c05637983e 659571a1
>
> The problem arises when trying to mount node 8 since its slot is already
> occupied by node 1:
>
> kern.log node 1:
>
> (o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)
>
> kern.log node 8:
>
> ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
> (o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)
>
> This can be "fixed" by exchanging node numbers 1 and 8 in cluster.conf.
> Then node 8 will be assigned slot 8, node 2 stays in slot 16, 3 to 7 as
> expected. There is no node 16 configured so there's no conflict. But
> since we experience some other so far not explainable instabilities with
> this ocfs2 device / system during operation further down the road we
> decided to take care of and try to fix this issue first.
>
> Somehow the failure reminds of bit shift or masking problems:
>
> 1 << 3 = 8
> 2 << 3 = 16
>
> But then again - what do I know ...
>
> Tried so far:
>
> A. Create offending file system with 8 slots instead of 6 -> same issue.
> B. Set features to 'default' (disables feature 'extended-slotmap') ->
> same issue.
>
> We'd very much appreciate any comments on this. Has anything similar
> ever been experienced before? Are we completely missing something
> important here?
>
> If there's a fix already out for this any pointers (src files / commits)
> to where to look would be greatly appreciated.
>
> Thanks in advance + Best regards ... Michael U.
Michael Ulbrich
2017-Sep-18 15:43 UTC
[Ocfs2-users] Mixed mounts w/ different physical block sizes (long post)
Hi again,

chatting with a helpful person on the #ocfs2 IRC channel this morning I got encouraged to cross-post to ocfs2-devel. For historic background and further details please see my two previous posts to ocfs2-users from last week, which are unanswered so far. According to my current state of inspection I changed the topic from "Node 8 doesn't mount / Wrong slot map assignment" to the current "Mixed mounts ...".

Here we go:

I've learnt that large hard disks in increasing number come formatted with a 4k physical block size. Now I've created an ocfs2 shared file system on top of drbd on a RAID1 of two 6 TB disks with such a 4k physical block size. File system creation was done on a hypervisor which actually saw the device as having a 4k physical sector size. I'm using the default o2cb cluster stack. Version is ocfs2 1.6.4 on stock Debian 8.

A node (numbered "1" in cluster.conf) which mounts this device with 4k phys. blocks leads to a strange "times 8" numbering when checking heartbeat debug info with 'echo "hb" | debugfs.ocfs2 -n /dev/drbd1':

hb
node: node seq generation checksum
8: 1 0000000059bfd253 00bfa1b63f30e494 c518c55a

I'm not sure why the first 2 columns are named "node:" and "node" but assume the first, "node:", is an index into some internal data structure (slot map? heartbeat region?) while the second, "node", shows the actual node number as given in cluster.conf.

Now a second node mounts the shared file system, again as a 4k block device:

hb
node: node seq generation checksum
8: 1 0000000059bfd36a 00bfa1b63f30e494 d4f79d63
16: 2 0000000059bfd369 7acf8521da342228 4b8cd74d

As it actually happened in my setup of a two node cluster with 2 hypervisors and 3 virtual machines on top of each (8 nodes in total), when mounting the fs on the first virtual machine with node number 3 we get:

hb
node: node seq generation checksum
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018

Uhm, ... wait ... 3 ??

Mounting on further VMs (nodes 4, 5, 6 and 7) leads to:

hb
node: node seq generation checksum
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
5: 5 0000000059bfd414 529a98c758325d5b 60080c42
6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018

Up to this point I did not notice any error or warning in the machines' console or kernel logs. And then, trying to mount on node 8, finally there's an error:

kern.log node 1:

(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log node 8:

ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

(actual seq and generation are not from the above hb debug dump)

Now we have a conflict on slot 8. When I encountered this error for the first time, I didn't know about heartbeat debug info, slot maps or heartbeat regions and had no idea what might have gone wrong, so I started experimenting and found a "solution" by swapping nodes 1 <-> 8 in cluster.conf.
This leads to the following layout of the heartbeat region (?):

hb
node: node seq generation checksum
1: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
5: 5 0000000059bfd414 529a98c758325d5b 60080c42
6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
16: 2 0000000059bfd413 7acf8521da342228 cd48c018
64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Voila - all 8 nodes mounted, problem solved - let's continue with getting this cluster ready for production ...

As it turned out, this was in no way a stable configuration: after a few weeks spurious reboots (fencing peer) started to happen (drbd losing its replication connection, all kinds of weird kernel oopses and panics from drbd and ocfs2). Reboots were usually preceded by bursts of errors like:

Sep 11 00:01:27 web1 kernel: [ 9697.644436] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b635), ondisk(3:0x743493e99d19e721, 0x59b5b633)
Sep 11 00:03:43 web1 kernel: [ 9833.918668] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b6bd), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
Sep 11 00:03:45 web1 kernel: [ 9835.920551] (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b6bf), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
Sep 11 00:09:10 web1 kernel: [10160.576453] (o2hb-10254DCA50,515,0):o2hb_check_own_slot:582 ERROR: Heartbeat sequence mismatch on device (vdc): expected(3:0x743493e99d19e721, 0x59b5b804), ondisk(3:0x743493e99d19e721, 0x59b5b802)

In the end the ocfs2 filesystem had to be rebuilt to get rid of the errors. It went ok for a while before the same symptoms of fs corruption came back again.

To make a long story short: we found out that the virtual machines did not see the disk device as having 4k sectors but standard 512 byte blocks. So we had what I coined a "mixed mount" of the same ocfs2 file system: 2 nodes mounted with 4k phys. block size, the other 6 nodes mounted with 512 byte block size.

Configuring the VMs with:

<blockio logical_block_size='4096' physical_block_size='4096'/>

leads to a heartbeat slot map:

hb
node: node seq generation checksum
8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
16: 2 0000000059bfd413 7acf8521da342228 cd48c018
24: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
32: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
40: 5 0000000059bfd414 529a98c758325d5b 60080c42
48: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
56: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Operation is stable so far. No 'Heartbeat sequence mismatch' errors. The "times 8" values in the "node:" column are still strange, but this may be a purely aesthetic issue.

Browsing the code of heartbeat.c I'm not sure if such a "mixed mount" is *supposed* to work and we just triggered a minor bug that can easily be fixed - or if such a scenario is a definite no-no and should seriously be avoided. In the latter case an error message and cancelling of an inappropriate mount operation would be very helpful (a rough sketch of what I mean follows at the end of this mail).

Anyway, it would be greatly appreciated to hear a knowledgeable opinion from the members of the ocfs2-devel list on this topic - any takers?

Thanks in advance + Best regards ... Michael
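P.S.: To illustrate the kind of mount-time guard I have in mind, here is a rough, hypothetical sketch in kernel style. This is NOT a patch against the actual ocfs2 sources: bdev_logical_block_size() is a real kernel helper, but the function name, the idea of an on-disk ondisk_block_bytes field and the error text are all made up for illustration.

/* Hypothetical mount-time check - illustrative sketch only.
 * Idea: if the block size the heartbeat region was created with
 * were recorded on disk, a mounting node could compare it against
 * what its own block device reports and refuse a "mixed mount"
 * instead of silently corrupting the region.
 */
#include <linux/blkdev.h>
#include <linux/printk.h>

static int o2hb_validate_block_size(struct block_device *bdev,
				    u32 ondisk_block_bytes)
{
	u32 local = bdev_logical_block_size(bdev);

	if (local != ondisk_block_bytes) {
		printk(KERN_ERR "o2hb: heartbeat region uses %u-byte "
		       "blocks but this node sees %u-byte blocks; "
		       "refusing mount to avoid a mixed mount\n",
		       ondisk_block_bytes, local);
		return -EINVAL;	/* cancel the mount */
	}
	return 0;
}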
Changwei Ge
2017-Sep-19 03:32 UTC
[Ocfs2-devel] Mixed mounts w/ different physical block sizes (long post)
Hi Michael,

On 2017/9/18 23:45, Michael Ulbrich wrote:
> Hi again,
>
> chatting with a helpful person on #ocfs2 IRC channel this morning I got
> encouraged to cross-post to ocfs2-devel. For historic background and
> further details pls. see my two previous posts to ocfs2-users from last
> week which are unanswered so far.
>
> According to my current state of inspection I changed the topic from
>
> "Node 8 doesn't mount / Wrong slot map assignment" to the current "Mixed
> mounts ..."
>
> Here we go:
>
> I've learnt that large hard disks in increasing number come formatted w/
> 4k physical block size.
>
> Now I've created an ocfs2 shared file system on top of drbd on a RAID1
> of two 6 TB disks with such 4k physical block size. File system creation
> was done on a hypervisor which actually saw the device as having 4k
> physical sector size.
>
> I'm using the default o2cb cluster stack. Version is ocfs2 1.6.4 on
> stock Debian 8.
>
> A node (numbered "1" in cluster.conf) which mounts this device with 4k
> phys. blocks leads to a strange "times 8" numbering when checking
> heartbeat debug info with 'echo "hb" | debugfs.ocfs2 -n /dev/drbd1':
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd253 00bfa1b63f30e494 c518c55a
>
> I'm not sure why the first 2 columns are named "node:" and "node" but
> assume the first "node:" is an index into some internal data structure
> (slot map ?, heartbeat region ?) while the second "node" column shows
> the actual node number as given in cluster.conf
>
> Now a second node mounts the shared file system again as 4k block device:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd36a 00bfa1b63f30e494 d4f79d63
> 16: 2 0000000059bfd369 7acf8521da342228 4b8cd74d
>
> As it actually happened in my setup of a two node cluster with 2
> hypervisors and 3 virtual machines on top of each (8 nodes in total),
> when mounting the fs on the first virtual machine with node number 3 we get:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
>
> Uhm, ... wait ... 3 ??
>
> Mounting on further VMs (nodes 4, 5, 6 and 7) leads to:
>
> hb
> node: node seq generation checksum
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 5: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
>
> Up to this point I did not notice any error or warning in the machines'
> console or kernel logs.
>
> And then trying to mount on node 8 finally there's an error:
>
> kern.log node 1:
>
> (o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)
>
> kern.log node 8:
>
> ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
> (o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
> heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
> 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)
>
> (actual seq and generation are not from above hb debug dump)
>
> Now we have a conflict on slot 8.
>
> When I encountered this error for the first time, I didn't know about
> heartbeat debug info, slot maps or heartbeat regions and had no idea
> what might have gone wrong so I started experimenting and found a
> "solution" by swapping nodes 1 <-> 8 in cluster.conf. This leads to the
> following layout of the heartbeat region (?):
>
> hb
> node: node seq generation checksum
> 1: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 3: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 4: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 5: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 6: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 7: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
> 64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1
>
> Voila - all 8 nodes mounted, problem solved - let's continue with
> getting this cluster ready for production ...
>
> As it turned out this was in no way a stable configuration in that after
> few weeks spurious reboots (fencing peer) started to happen (drbd losing
> its replication connection, all kinds of weird kernel oopses and panics
> from drbd and ocfs2). Reboots were usually preceded by burst of errors like:
>
> Sep 11 00:01:27 web1 kernel: [ 9697.644436]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b635), ondisk(3:0x743493e99d19e721, 0x59b5b633)
> Sep 11 00:03:43 web1 kernel: [ 9833.918668]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b6bd), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
> Sep 11 00:03:45 web1 kernel: [ 9835.920551]
> (o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b6bf), ondisk(3:0x743493e99d19e721, 0x59b5b6bb)
> Sep 11 00:09:10 web1 kernel: [10160.576453]
> (o2hb-10254DCA50,515,0):o2hb_check_own_slot:582 ERROR: Heartbeat
> sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
> 0x59b5b804), ondisk(3:0x743493e99d19e721, 0x59b5b802)
>
> In the end the ocfs2 filesystem had to be rebuilt to get rid of the
> errors. It went ok for a while before the same symptoms of fs corruption
> came back again.
>
> To make a long story short: we found out that the virtual machines did
> not see the disk device having 4k sectors but the standard 512 byte
> blocks. So we had what I coined a "mixed mount" of the same ocfs2 file
> system: 2 nodes mounted with 4k phys. block size the other 6 nodes
> mounted w/ 512 byte block size.
>
> Configuring the VMs with:
>
> <blockio logical_block_size='4096' physical_block_size='4096'/>
>
> leads to a heartbeat slot map:
>
> hb
> node: node seq generation checksum
> 8: 1 0000000059bfd412 00bfa1b63f30e494 e782d86e
> 16: 2 0000000059bfd413 7acf8521da342228 cd48c018
> 24: 3 0000000059bfd413 59eb77b4db07884b 87a5057d
> 32: 4 0000000059bfd413 debf95d5ff50dc10 3839c791
> 40: 5 0000000059bfd414 529a98c758325d5b 60080c42
> 48: 6 0000000059bfd412 14acfb487fa8c8b8 f54cef9d
> 56: 7 0000000059bfd413 4d2d36de0b0d6b2e 3f1ad275
> 64: 8 0000000059bfd413 73a63eb550a33095 f4e074d1

Could you please also provide information about the *slot_map* - just type "slotmap" in the debugfs.ocfs2 tool. This will be helpful for analysing your case.

Please also paste the output generated by:

cat /sys/kernel/config/cluster/<your cluster name>/heartbeat/<file system UUID>

so we can see how your cluster is configured. Files like block_bytes, blocks and start_block are preferred.
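If it helps, a small helper like the following can collect those values in one go. This is only a quick convenience sketch, not an official tool - it just reads the configfs attributes named above from a region directory given on the command line:

/* hbinfo.c - convenience sketch, not an official ocfs2 tool.
 * Reads the block_bytes, blocks and start_block attributes from a
 * heartbeat region configfs directory, e.g.
 * /sys/kernel/config/cluster/<cluster>/heartbeat/<UUID>.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *attrs[] = { "block_bytes", "blocks", "start_block" };
    char path[512], buf[64];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <heartbeat region configfs dir>\n",
                argv[0]);
        return 1;
    }
    for (unsigned i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        snprintf(path, sizeof(path), "%s/%s", argv[1], attrs[i]);
        FILE *f = fopen(path, "r");
        if (!f || !fgets(buf, sizeof(buf), f)) {
            fprintf(stderr, "cannot read %s\n", path);
            if (f)
                fclose(f);
            continue;
        }
        printf("%s = %s", attrs[i], buf);  /* buf keeps its newline */
        fclose(f);
    }
    return 0;
}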
> Operation is stable so far. No 'Heartbeat sequence mismatch' errors.
> Still strange the "times 8" values in column "node:" but this may be a
> purely aesthetical issue.

I suppose this is because debugfs.ocfs2 *assumes* that block devices are all formatted with 512-byte blocks. Perhaps we can improve this (see the sketch at the end of this mail).

> Browsing the code of heartbeat.c I'm not sure if such a "mixed mount" is
> *supposed* to work and it's just a minor bug we triggered that can
> easily be fixed - or if such a scenario is a definite no-no and should
> seriously be avoided. In the latter case an error message and cancelling
> of an inappropriate mount operation would be very helpful.
>
> Anyway, it would be greatly appreciated to hear a knowledgeable opinion
> from the members of the ocfs2-devel list on this topic - any takers?
>
> Thanks in advance + Best regards ... Michael
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
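To make the debugfs.ocfs2 point concrete: if the tool derives the "node:" index by counting fixed 512-byte sectors, a region written with 4096-byte blocks will show node N at index N * 8; dividing by the region's actual block size instead would make both columns agree. A minimal stand-alone demonstration of the arithmetic (my own sketch, not the actual debugfs.ocfs2 source):

/* index_demo.c - illustration of the suspected display bug, NOT the
 * actual debugfs.ocfs2 source.  Shows how the same byte offset maps
 * to different slot indices depending on the assumed block size.
 */
#include <stdio.h>

static unsigned slot_index(unsigned long long byte_off,
                           unsigned region_block_bytes)
{
    return (unsigned)(byte_off / region_block_bytes);
}

int main(void)
{
    unsigned long long byte_off = 1 * 4096; /* node 1's 4k hb block */

    printf("hard-coded 512:  index %u\n", slot_index(byte_off, 512));
    printf("region bs 4096:  index %u\n", slot_index(byte_off, 4096));
    /* prints 8 vs 1 - the "times 8" column from the hb dumps above */
    return 0;
}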