Erik Jacobson
2020-Mar-30 01:01 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Thank you for replying!! Responses below...

I have attached the volume definition (I meant to before).
I have attached a couple of logs from one of the leaders.

> That's odd.
> As far as I know, the clients are accessing one of the gluster nodes that serves as NFS server and then syncs data across the peers, right?

Correct, although in this case, with a 1x3, all of them should have local copies. Our first reports came in from 3x3 (9 server) systems, but we have been able to duplicate it on 1x3 in house, thankfully. This is a huge step forward, as I had no reproducer previously.

> What happens when the virtual IP(s) are failed over to the other gluster node? Is the issue resolved?

While we do use CTDB for managing the IP aliases, I don't start the test until the IP is stabilized. I put all 76 nodes on one IP alias to make the load more similar to what we have in the field.

I think it is important to point out that if I reduce the load, all is well. For example, if the test were just booting -- where the initial reports were seen -- just 1 or 2 nodes out of 1,000 would have an issue each cycle. They all boot the same way and are all using the same IP alias for NFS in my test case. So I think the split-brain messages are maybe a symptom of some sort of timeout??? (making stuff up here).

> Also, what kind of load balancing are you using?

[I moved this question up because the answer below has too much output]

We are doing very simple, manual balancing. As we add compute nodes to the cluster, a couple of racks are assigned to IP alias #1, the next couple to IP alias #2, and so on. I'm happy to not have the complexity of a real load balancer right now.

> Do you get any split brain entries via 'gluster volume heal <VOL> info' ?

I ran three trials of 'gluster volume heal ... info'.

Trial 1 - with all 3 servers up and while running the load:

[root at leader2 ~]# gluster volume heal cm_shared info
Brick 172.23.0.4:/data/brick_cm_shared
Status: Connected
Number of entries: 0

Brick 172.23.0.5:/data/brick_cm_shared
Status: Connected
Number of entries: 0

Brick 172.23.0.6:/data/brick_cm_shared
Status: Connected
Number of entries: 0

Trial 2 - with 1 server down (glusterd stopped on that server) and without running any test yet -- I see this. Let me explain though: outside the error path, I am using RW NFS filesystem image blobs on this same volume for the writable areas of the nodes. In the field, we duplicate the problem while using TMPFS for that writable area. I am happy to re-do the test with RO NFS and TMPFS for the writable area, and my GUESS is that the healing messages would then go away. Would that help? If you look at the heal count -- 76 -- it equals the node count, i.e. the number of writable XFS image files, one per node.

[root at leader2 ~]# gluster volume heal cm_shared info
Brick 172.23.0.4:/data/brick_cm_shared
Status: Transport endpoint is not connected
Number of entries: -

Brick 172.23.0.5:/data/brick_cm_shared
<gfid:b9412b45-d380-4789-a335-af5af33bde24>
<gfid:80ea53ba-a960-402b-9c6c-1cc62b2c59b3>
<gfid:1f10c050-7c50-4044-abc5-0a980ac6af79>
<gfid:8847f8a4-5509-463d-ac49-836bf921858c>
<gfid:a35fef6b-9174-495f-a661-d9837a1243ac>
<gfid:782dd55f-d85d-4f5e-b76f-8dd562356a59>
<gfid:5ea92161-c91a-4d51-877c-a3362966e850>
<gfid:57e5c49d-36c9-4a70-afd5-34ffbddb7da5>
Status: Connected
Number of entries: 8

Brick 172.23.0.6:/data/brick_cm_shared
<gfid:b9412b45-d380-4789-a335-af5af33bde24>
<gfid:80ea53ba-a960-402b-9c6c-1cc62b2c59b3>
<gfid:1f10c050-7c50-4044-abc5-0a980ac6af79>
<gfid:8847f8a4-5509-463d-ac49-836bf921858c>
<gfid:a35fef6b-9174-495f-a661-d9837a1243ac>
<gfid:782dd55f-d85d-4f5e-b76f-8dd562356a59>
<gfid:5ea92161-c91a-4d51-877c-a3362966e850>
<gfid:57e5c49d-36c9-4a70-afd5-34ffbddb7da5>
Status: Connected
Number of entries: 8

Trial 3 - ran the heal command around the time the split-brain errors were being reported:

[root at leader2 glusterfs]# gluster volume heal cm_shared info
Brick 172.23.0.4:/data/brick_cm_shared
Status: Transport endpoint is not connected
Number of entries: -

Brick 172.23.0.5:/data/brick_cm_shared
<gfid:80ea53ba-a960-402b-9c6c-1cc62b2c59b3>
<gfid:b9412b45-d380-4789-a335-af5af33bde24>
<gfid:08aff8a9-2818-44d6-a67d-d08c7894c496>
<gfid:8847f8a4-5509-463d-ac49-836bf921858c>
<gfid:57e5c49d-36c9-4a70-afd5-34ffbddb7da5>
<gfid:cd896244-f7e9-41ad-8510-d1fe5d0bf836>
<gfid:611fa1e0-dc0d-4ddc-9273-6035e51e1acf>
<gfid:686581b2-7515-4d0a-a1c8-369f01f60ecd>
<gfid:875e893b-f2ed-4805-95fd-6955ea310757>
<gfid:eb4203eb-06a4-4577-bddb-ba400d5cc7c7>
<gfid:4dd86ddd-aca3-403f-87eb-03a9c8116993>
<gfid:70c90d83-9fb7-4e8e-ac1b-592c4d2b1df8>
<gfid:de9de454-a8f4-4c3f-b8b8-b28b0c444e31>
<gfid:c44b7d98-f83b-4498-aa43-168ce4e35d52>
<gfid:61fde2e7-1898-4e5b-8b7f-f9702b595d3a>
<gfid:e44fd656-62a6-4c06-bafc-66de0ec99022>
<gfid:04aa47b5-52fa-47d0-9b5f-a39bc95eb1fe>
<gfid:6357f8f6-aa5b-40b8-a0f4-6c3366ff4fc2>
<gfid:19728e57-2cc9-4c3a-bb45-e72bc59f3e60>
<gfid:6e1fd334-43a7-4410-b3ef-6566d41d8574>
<gfid:d3b423da-484f-44a6-91d9-365e313bb2ef>
<gfid:da5215c1-565d-4419-beec-db50791de4c4>
<gfid:ff8348dc-8acc-40d5-a0ed-f9b3b5ba61ae>
<gfid:54523a5e-ccd7-4464-806e-3897f297b749>
<gfid:7bf00945-7b9a-46bb-8c73-bc233c644ca5>
<gfid:67ac7750-0b3c-4f88-aa8f-222183d39690>
<gfid:4f4da7fa-819d-45a4-bdb9-a81374b6df86>
<gfid:1b69ff6c-1dcc-4a9b-8c54-d4146cdfdd6c>
<gfid:e3bfb26e-7987-45cb-8824-99b353846c12>
<gfid:18b777df-72dc-4a57-a8a3-f22b54ceac3e>
<gfid:b6994926-5788-492b-8224-3a02100be9a2>
<gfid:434bb8e9-75a7-4670-960f-fefa6893da68>
<gfid:d4c4bc62-705b-405d-a4b4-941f8e55e5d2>
<gfid:f9d3580b-7b24-4061-819d-d62978fd35d0>
<gfid:14f73281-21c9-4830-9a39-1eb6998eb955>
<gfid:26d87a63-8318-4b6c-9093-4817cacc76ef>
<gfid:ff38c782-b28c-46cc-8a6b-e93a2c4d504f>
<gfid:7dbd1e30-c4e0-4b19-8f0d-d4ef9199b89f>
<gfid:1f10c050-7c50-4044-abc5-0a980ac6af79>
<gfid:23edaf65-7f90-47a7-bc1b-cccaf6648543>
<gfid:cf46ac85-50a8-4660-8a2f-564e4825f93e>
<gfid:27dbd511-cf7d-4fa8-bd98-dd2006a0a06b>
<gfid:76661586-1cc6-421d-b0ad-081c105b6532>
<gfid:4db3cac8-1fdb-4b52-9647-dd3979907773>
<gfid:3ffbc798-7733-4ef6-a253-3dc5259c20aa>
<gfid:ea2af645-29ef-4911-9dfd-0409ae1df5a5>
<gfid:a35fef6b-9174-495f-a661-d9837a1243ac>
<gfid:5ea92161-c91a-4d51-877c-a3362966e850>
<gfid:782dd55f-d85d-4f5e-b76f-8dd562356a59>
<gfid:036e02e5-1062-4b48-a090-6e363454aac5>
<gfid:8312c8bd-18c7-449e-8482-16320f3ee8e9>
<gfid:15cfabab-e8df-4cad-b883-b80861ee5775>
<gfid:f804bfed-b17a-4abf-a104-26b01569609b>
<gfid:77253670-1791-4910-9f0d-38c2b1ec0f17>
<gfid:ca502545-5ca2-4db6-baf4-b2eb0e4176f6>
<gfid:964ca255-b2e2-45e4-bb86-51d3e8a4c3f4>
<gfid:7bcfaddd-a65c-41f5-919b-8fb8b501f735>
<gfid:f884e860-6d3e-4597-9249-da0fc17c456f>
<gfid:5960eb89-3ca1-4d9e-8c13-16f0ee4822e3>
<gfid:361517df-19a8-4e43-b601-7822f7e36ef8>
<gfid:09e8541b-a150-41da-aff8-3764c33635ba>
<gfid:b30f6bdb-e439-44c5-bd4c-143486439091>
<gfid:ae983848-3ba9-4f72-ab0c-d309f96d2678>
<gfid:9cddb5cd-a721-4d63-9522-7546a9c01303>
<gfid:91c1b906-14a5-4fe1-8103-91d14134706b>
<gfid:55cc28b7-80f1-428e-9334-5a0742cce1c6>
<gfid:8183219a-6d4a-4369-82dc-2233e2eba656>
<gfid:6cfacb2b-e247-488b-adde-e00dfc0c25f8>
<gfid:5184933d-6470-47dc-a010-6f7cb5661160>
<gfid:66e6842c-fe87-4797-8a01-a9b0a4124cde>
<gfid:55884d32-2e3f-42ba-a173-6c9362e331e2>
<gfid:47b9316a-7896-4efd-8704-3acdde6f2cb8>
<gfid:3a3013e2-06dd-41c7-b759-bd9e945c9743>
<gfid:3dc0b834-6f3e-409a-9549-f015f3b66af1>
<gfid:c9fc97ce-f8bc-42b2-b428-37953c172a30>
<gfid:a94ac3ba-9777-4844-892b-5526c00f2f7b>
Status: Connected
Number of entries: 76

Brick 172.23.0.6:/data/brick_cm_shared
<gfid:9cddb5cd-a721-4d63-9522-7546a9c01303>
<gfid:4f4da7fa-819d-45a4-bdb9-a81374b6df86>
<gfid:91c1b906-14a5-4fe1-8103-91d14134706b>
<gfid:55cc28b7-80f1-428e-9334-5a0742cce1c6>
<gfid:18b777df-72dc-4a57-a8a3-f22b54ceac3e>
<gfid:b6994926-5788-492b-8224-3a02100be9a2>
<gfid:6cfacb2b-e247-488b-adde-e00dfc0c25f8>
<gfid:5184933d-6470-47dc-a010-6f7cb5661160>
<gfid:d4c4bc62-705b-405d-a4b4-941f8e55e5d2>
<gfid:f9d3580b-7b24-4061-819d-d62978fd35d0>
<gfid:14f73281-21c9-4830-9a39-1eb6998eb955>
<gfid:26d87a63-8318-4b6c-9093-4817cacc76ef>
<gfid:ff38c782-b28c-46cc-8a6b-e93a2c4d504f>
<gfid:7dbd1e30-c4e0-4b19-8f0d-d4ef9199b89f>
<gfid:1f10c050-7c50-4044-abc5-0a980ac6af79>
<gfid:23edaf65-7f90-47a7-bc1b-cccaf6648543>
<gfid:cf46ac85-50a8-4660-8a2f-564e4825f93e>
<gfid:27dbd511-cf7d-4fa8-bd98-dd2006a0a06b>
<gfid:76661586-1cc6-421d-b0ad-081c105b6532>
<gfid:4db3cac8-1fdb-4b52-9647-dd3979907773>
<gfid:3ffbc798-7733-4ef6-a253-3dc5259c20aa>
<gfid:ea2af645-29ef-4911-9dfd-0409ae1df5a5>
<gfid:80ea53ba-a960-402b-9c6c-1cc62b2c59b3>
<gfid:b9412b45-d380-4789-a335-af5af33bde24>
<gfid:a35fef6b-9174-495f-a661-d9837a1243ac>
<gfid:08aff8a9-2818-44d6-a67d-d08c7894c496>
<gfid:5ea92161-c91a-4d51-877c-a3362966e850>
<gfid:8847f8a4-5509-463d-ac49-836bf921858c>
<gfid:782dd55f-d85d-4f5e-b76f-8dd562356a59>
<gfid:57e5c49d-36c9-4a70-afd5-34ffbddb7da5>
<gfid:cd896244-f7e9-41ad-8510-d1fe5d0bf836>
<gfid:036e02e5-1062-4b48-a090-6e363454aac5>
<gfid:611fa1e0-dc0d-4ddc-9273-6035e51e1acf>
<gfid:8312c8bd-18c7-449e-8482-16320f3ee8e9>
<gfid:686581b2-7515-4d0a-a1c8-369f01f60ecd>
<gfid:15cfabab-e8df-4cad-b883-b80861ee5775>
<gfid:875e893b-f2ed-4805-95fd-6955ea310757>
<gfid:eb4203eb-06a4-4577-bddb-ba400d5cc7c7>
<gfid:f804bfed-b17a-4abf-a104-26b01569609b>
<gfid:4dd86ddd-aca3-403f-87eb-03a9c8116993>
<gfid:77253670-1791-4910-9f0d-38c2b1ec0f17>
<gfid:70c90d83-9fb7-4e8e-ac1b-592c4d2b1df8>
<gfid:de9de454-a8f4-4c3f-b8b8-b28b0c444e31>
<gfid:ca502545-5ca2-4db6-baf4-b2eb0e4176f6>
<gfid:c44b7d98-f83b-4498-aa43-168ce4e35d52>
<gfid:964ca255-b2e2-45e4-bb86-51d3e8a4c3f4>
<gfid:61fde2e7-1898-4e5b-8b7f-f9702b595d3a>
<gfid:7bcfaddd-a65c-41f5-919b-8fb8b501f735>
<gfid:e44fd656-62a6-4c06-bafc-66de0ec99022>
<gfid:04aa47b5-52fa-47d0-9b5f-a39bc95eb1fe>
<gfid:f884e860-6d3e-4597-9249-da0fc17c456f>
<gfid:6357f8f6-aa5b-40b8-a0f4-6c3366ff4fc2>
<gfid:19728e57-2cc9-4c3a-bb45-e72bc59f3e60>
<gfid:5960eb89-3ca1-4d9e-8c13-16f0ee4822e3>
<gfid:6e1fd334-43a7-4410-b3ef-6566d41d8574>
<gfid:361517df-19a8-4e43-b601-7822f7e36ef8>
<gfid:d3b423da-484f-44a6-91d9-365e313bb2ef>
<gfid:09e8541b-a150-41da-aff8-3764c33635ba>
<gfid:da5215c1-565d-4419-beec-db50791de4c4>
<gfid:ff8348dc-8acc-40d5-a0ed-f9b3b5ba61ae>
<gfid:b30f6bdb-e439-44c5-bd4c-143486439091>
<gfid:54523a5e-ccd7-4464-806e-3897f297b749>
<gfid:ae983848-3ba9-4f72-ab0c-d309f96d2678>
<gfid:7bf00945-7b9a-46bb-8c73-bc233c644ca5>
<gfid:67ac7750-0b3c-4f88-aa8f-222183d39690>
<gfid:1b69ff6c-1dcc-4a9b-8c54-d4146cdfdd6c>
<gfid:e3bfb26e-7987-45cb-8824-99b353846c12>
<gfid:8183219a-6d4a-4369-82dc-2233e2eba656>
<gfid:434bb8e9-75a7-4670-960f-fefa6893da68>
<gfid:66e6842c-fe87-4797-8a01-a9b0a4124cde>
<gfid:55884d32-2e3f-42ba-a173-6c9362e331e2>
<gfid:47b9316a-7896-4efd-8704-3acdde6f2cb8>
<gfid:3a3013e2-06dd-41c7-b759-bd9e945c9743>
<gfid:3dc0b834-6f3e-409a-9549-f015f3b66af1>
<gfid:c9fc97ce-f8bc-42b2-b428-37953c172a30>
<gfid:a94ac3ba-9777-4844-892b-5526c00f2f7b>
Status: Connected
Number of entries: 76
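(For reference, a minimal sketch of watching the heal backlog while the load test runs. It only relies on the plain 'heal info' output shown above and counts gfid-style entries across all bricks; it is an illustration, not part of the actual test setup.)

# Sample the pending heal entry count every 10 seconds on a gluster server.
while true; do
    count=$(gluster volume heal cm_shared info | grep -c '^<gfid:')
    echo "$(date '+%F %T')  pending heal entries: ${count}"
    sleep 10
done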
-------------- next part --------------
Volume Name: cm_shared
Type: Replicate
Volume ID: f6175f56-8422-4056-9891-f9ba84756b87
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.23.0.4:/data/brick_cm_shared
Brick2: 172.23.0.5:/data/brick_cm_shared
Brick3: 172.23.0.6:/data/brick_cm_shared
Options Reconfigured:
nfs.event-threads: 3
config.brick-threads: 16
config.client-threads: 16
performance.iot-pass-through: false
config.global-threading: off
performance.client-io-threads: on
nfs.disable: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
cluster.lookup-optimize: on
client.event-threads: 32
server.event-threads: 32
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 1000000
performance.io-thread-count: 32
performance.cache-size: 8GB
performance.parallel-readdir: on
cluster.lookup-unhashed: auto
performance.flush-behind: on
performance.aggregate-size: 2048KB
performance.write-behind-trickling-writes: off
transport.listen-backlog: 16384
performance.write-behind-window-size: 1024MB
server.outstanding-rpc-limit: 1024
nfs.outstanding-rpc-limit: 1024
nfs.acl: on
storage.max-hardlinks: 0
performance.cache-refresh-timeout: 60
performance.md-cache-statfs: off
performance.nfs.io-cache: on
nfs.mount-rmtab: /-
nfs.nlm: off
nfs.export-volumes: on
nfs.export-dirs: on
nfs.exports-auth-enable: on
nfs.auth-refresh-interval-sec: 360
nfs.auth-cache-ttl-sec: 360
cluster.favorite-child-policy: none
nfs.mem-factor: 15
cluster.choose-local: true
network.ping-timeout: 42
cluster.read-hash-mode: 1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfs.log.xz
Type: application/x-xz
Size: 4992 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200329/8d6d1423/attachment.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd.log.xz
Type: application/x-xz
Size: 4080 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200329/8d6d1423/attachment-0001.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd.log.xz
Type: application/x-xz
Size: 4244 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200329/8d6d1423/attachment-0002.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data-brick_cm_shared.log.xz
Type: application/x-xz
Size: 11376 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200329/8d6d1423/attachment-0003.xz>
-------------- next part --------------
Status of volume: cm_shared
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.23.0.5:/data/brick_cm_shared      49153     0          Y       50199
Brick 172.23.0.6:/data/brick_cm_shared      49153     0          Y       59380
Self-heal Daemon on localhost               N/A       N/A        Y       10817
NFS Server on localhost                     2049      0          Y       10775
Self-heal Daemon on 172.23.0.5              N/A       N/A        Y       16645
NFS Server on 172.23.0.5                    2049      0          Y       16603

Task Status of Volume cm_shared
------------------------------------------------------------------------------
There are no active volume tasks
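(For reference, individual entries from the 'Options Reconfigured' list above can be read back with the stock gluster CLI; a minimal sketch follows. The option names are taken from that listing; the commented 'set' line is illustration only, not a recommendation.)

# Read back a few options that matter when one replica is down.
gluster volume get cm_shared network.ping-timeout
gluster volume get cm_shared cluster.favorite-child-policy
gluster volume get cm_shared cluster.choose-local

# Changing an option takes effect on the live volume, e.g.:
# gluster volume set cm_shared network.ping-timeout 42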
Strahil Nikolov
2020-Mar-30 16:25 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On March 30, 2020 4:01:06 AM GMT+03:00, Erik Jacobson <erik.jacobson at hpe.com> wrote:
> [...]
> Status: Connected
> Number of entries: 76

Hi Erik,

Sadly I didn't have the time to take a look in your logs, but I would like to ask you whether you have statistics of the network bandwidth usage.

Could it be possible that the gNFS server is starved for bandwidth and fails to reach all bricks, leading to 'split-brain' errors?

Best Regards,
Strahil Nikolov
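(For reference, a minimal sketch of one way to capture per-interface bandwidth on the gNFS server during a test run, assuming the sysstat package is installed; the interface name "eth0" and the output path are placeholders.)

# Record per-interface throughput every 5 seconds for the duration of a test.
sar -n DEV 5 > /tmp/net-during-test.log &
SAR_PID=$!
# ... run the boot/load test here ...
kill "$SAR_PID"
# Then inspect rxkB/s and txkB/s for the interface carrying the NFS IP alias:
grep -E 'IFACE|eth0' /tmp/net-during-test.log | less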