I am using a dozen ODROID HC2 ARM systems, each with a single HD/brick, running Ubuntu 18 and GlusterFS 7.2 installed from the Gluster PPA.

I can create a dispersed volume and use it. If one of the cluster members ducks out, say gluster12 reboots, it shows as connected in the peer list when it comes back online, but in the output of

gluster volume heal <volname> info summary

it shows up as

Brick gluster12:/exports/sda/brick1/disp1
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

Trying to force a full heal doesn't fix it. The cluster member otherwise works, and heals other non-disperse volumes, even while showing up as disconnected for the dispersed volume.

I have attached a terminal log of the volume creation and diagnostic output. Could this be an ARM-specific problem?

I tested a similar setup on x86 virtual machines, and they were able to heal a dispersed volume with no problem. One thing I see in the ARM logs that I don't see in the x86 logs is lots of this:

[2020-03-01 03:54:45.856769] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d3c4cf3-e09c-4b9a-87d3-cdfc4f49b692
[2020-03-01 03:54:45.910203] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d806805-81e4-47ee-a331-1808b34949bf
[2020-03-01 03:54:45.932734] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-disp1-client-11: changing port to 49152 (from 0)
[2020-03-01 03:54:45.956803] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid d5768bad-7409-40f4-af98-4aef391d7ae4
[2020-03-01 03:54:46.000102] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 216f5583-e1b4-49cf-bef9-8cd34617beaf
[2020-03-01 03:54:46.044184] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 1b610b49-2d69-4ee6-a440-5d3edd6693d1
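Since the peer shows as connected while heal info reports "Transport endpoint is not connected" for its brick, the first thing worth confirming is whether the brick process on gluster12 actually came back after the reboot. A minimal sketch of that check (standard gluster CLI plus generic shell; nothing here is specific to this setup):

# From any node: the Online column shows whether each brick process is running
gluster volume status disp1

# On gluster12 itself: is a glusterfsd process serving the disp1 brick?
pgrep -af glusterfsd | grep disp1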
-------------- next part --------------
root@gluster01:~# gluster peer status
Number of Peers: 11

Hostname: gluster02
Uuid: bed38dda-279e-4ee2-9c35-4bf2976b93bf
State: Peer in Cluster (Connected)

Hostname: gluster03
Uuid: 662bf82c-3097-4259-9674-ec4081f3fc08
State: Peer in Cluster (Connected)

Hostname: gluster04
Uuid: 4b6e1594-75b5-43a7-88d1-44e17077c805
State: Peer in Cluster (Connected)

Hostname: gluster05
Uuid: 601882c1-5c05-4b1f-839c-f497ad1b1e70
State: Peer in Cluster (Connected)

Hostname: gluster06
Uuid: 5c37e57c-c0e6-412c-ac21-a42eaf6d0426
State: Peer in Cluster (Connected)

Hostname: gluster07
Uuid: f85ba854-0136-4e0e-ba59-d28dff76d58c
State: Peer in Cluster (Connected)

Hostname: gluster08
Uuid: b8d2908d-b747-4b34-87c5-360011923b1f
State: Peer in Cluster (Connected)

Hostname: gluster09
Uuid: f4f3b416-ca8a-4d3f-a309-51f639f32665
State: Peer in Cluster (Connected)

Hostname: gluster10
Uuid: d3dc64f6-1a41-44af-90a9-64bf792b8b80
State: Peer in Cluster (Connected)

Hostname: gluster11
Uuid: b80cfaee-0343-4b0d-b068-415993149969
State: Peer in Cluster (Connected)

Hostname: gluster12
Uuid: c5934246-48ab-419e-9aff-e20d9af27b18
State: Peer in Cluster (Connected)

root@gluster01:~# gluster volume create disp1 disperse 12 gluster01:/exports/sda/brick1/disp1 gluster02:/exports/sda/brick1/disp1 gluster03:/exports/sda/brick1/disp1 gluster04:/exports/sda/brick1/disp1 gluster05:/exports/sda/brick1/disp1 gluster06:/exports/sda/brick1/disp1 gluster07:/exports/sda/brick1/disp1 gluster08:/exports/sda/brick1/disp1 gluster09:/exports/sda/brick1/disp1 gluster10:/exports/sda/brick1/disp1 gluster11:/exports/sda/brick1/disp1 gluster12:/exports/sda/brick1/disp1
The optimal redundancy for this configuration is 4. Do you want to create the volume with this value ? (y/n) y
volume create: disp1: success: please start the volume to access data
root@gluster01:~# gluster volume start disp1
volume start: disp1: success
root@gluster01:~# gluster volume heal disp1 enable
Enable heal on volume disp1 has been successful
root@gluster01:~# gluster volume info disp1

Volume Name: disp1
Type: Disperse
Volume ID: 9c4070e5-e0b8-46ca-a783-96bd240247d1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: gluster01:/exports/sda/brick1/disp1
Brick2: gluster02:/exports/sda/brick1/disp1
Brick3: gluster03:/exports/sda/brick1/disp1
Brick4: gluster04:/exports/sda/brick1/disp1
Brick5: gluster05:/exports/sda/brick1/disp1
Brick6: gluster06:/exports/sda/brick1/disp1
Brick7: gluster07:/exports/sda/brick1/disp1
Brick8: gluster08:/exports/sda/brick1/disp1
Brick9: gluster09:/exports/sda/brick1/disp1
Brick10: gluster10:/exports/sda/brick1/disp1
Brick11: gluster11:/exports/sda/brick1/disp1
Brick12: gluster12:/exports/sda/brick1/disp1
Options Reconfigured:
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on

(CLIENT MOUNTS VOLUME AND BEGINS WRITING FILES)
(GLUSTER12 IS REBOOTED DURING)

root@gluster01:~# gluster volume info disp1

Volume Name: disp1
Type: Disperse
Volume ID: 9c4070e5-e0b8-46ca-a783-96bd240247d1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: gluster01:/exports/sda/brick1/disp1
Brick2: gluster02:/exports/sda/brick1/disp1
Brick3: gluster03:/exports/sda/brick1/disp1
Brick4: gluster04:/exports/sda/brick1/disp1
Brick5: gluster05:/exports/sda/brick1/disp1
Brick6: gluster06:/exports/sda/brick1/disp1
Brick7: gluster07:/exports/sda/brick1/disp1
Brick8: gluster08:/exports/sda/brick1/disp1
Brick9: gluster09:/exports/sda/brick1/disp1
Brick10: gluster10:/exports/sda/brick1/disp1
Brick11: gluster11:/exports/sda/brick1/disp1
Brick12: gluster12:/exports/sda/brick1/disp1
Options Reconfigured:
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on

root@gluster01:~# gluster peer status
Number of Peers: 11

Hostname: gluster02
Uuid: bed38dda-279e-4ee2-9c35-4bf2976b93bf
State: Peer in Cluster (Connected)

Hostname: gluster03
Uuid: 662bf82c-3097-4259-9674-ec4081f3fc08
State: Peer in Cluster (Connected)

Hostname: gluster04
Uuid: 4b6e1594-75b5-43a7-88d1-44e17077c805
State: Peer in Cluster (Connected)

Hostname: gluster05
Uuid: 601882c1-5c05-4b1f-839c-f497ad1b1e70
State: Peer in Cluster (Connected)

Hostname: gluster06
Uuid: 5c37e57c-c0e6-412c-ac21-a42eaf6d0426
State: Peer in Cluster (Connected)

Hostname: gluster07
Uuid: f85ba854-0136-4e0e-ba59-d28dff76d58c
State: Peer in Cluster (Connected)

Hostname: gluster08
Uuid: b8d2908d-b747-4b34-87c5-360011923b1f
State: Peer in Cluster (Connected)

Hostname: gluster09
Uuid: f4f3b416-ca8a-4d3f-a309-51f639f32665
State: Peer in Cluster (Connected)

Hostname: gluster10
Uuid: d3dc64f6-1a41-44af-90a9-64bf792b8b80
State: Peer in Cluster (Connected)

Hostname: gluster11
Uuid: b80cfaee-0343-4b0d-b068-415993149969
State: Peer in Cluster (Connected)

Hostname: gluster12
Uuid: c5934246-48ab-419e-9aff-e20d9af27b18
State: Peer in Cluster (Connected)

root@gluster01:~# gluster volume heal disp1 info summary
Brick gluster01:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster02:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster03:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster04:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster05:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster06:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster07:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster08:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster09:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster10:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster11:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 306
Number of entries in heal pending: 306
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster12:/exports/sda/brick1/disp1
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

root@gluster01:~# gluster volume heal disp1 full
Launching heal operation to perform full self heal on volume disp1 has been successful
Use heal info commands to check status.
root@gluster01:~# gluster volume heal disp1 info summary
Brick gluster01:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster02:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster03:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster04:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster05:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster06:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster07:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster08:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster09:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster10:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster11:/exports/sda/brick1/disp1
Status: Connected
Total Number of entries: 293
Number of entries in heal pending: 293
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gluster12:/exports/sda/brick1/disp1
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -

root@gluster01:~# gluster volume status
Status of volume: disp1
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster01:/exports/sda/brick1/disp1    49152     0          Y       3931
Brick gluster02:/exports/sda/brick1/disp1    49152     0          Y       2755
Brick gluster03:/exports/sda/brick1/disp1    49152     0          Y       2787
Brick gluster04:/exports/sda/brick1/disp1    49152     0          Y       2780
Brick gluster05:/exports/sda/brick1/disp1    49152     0          Y       2764
Brick gluster06:/exports/sda/brick1/disp1    49152     0          Y       2760
Brick gluster07:/exports/sda/brick1/disp1    49152     0          Y       2740
Brick gluster08:/exports/sda/brick1/disp1    49152     0          Y       2729
Brick gluster09:/exports/sda/brick1/disp1    49152     0          Y       2772
Brick gluster10:/exports/sda/brick1/disp1    49152     0          Y       2791
Brick gluster11:/exports/sda/brick1/disp1    49152     0          Y       2026
Brick gluster12:/exports/sda/brick1/disp1    N/A       N/A        N       N/A
Self-heal Daemon on localhost                N/A       N/A        Y       3952
Self-heal Daemon on gluster03                N/A       N/A        Y       2808
Self-heal Daemon on gluster02                N/A       N/A        Y       2776
Self-heal Daemon on gluster06                N/A       N/A        Y       2781
Self-heal Daemon on gluster07                N/A       N/A        Y       2761
Self-heal Daemon on gluster05                N/A       N/A        Y       2785
Self-heal Daemon on gluster08                N/A       N/A        Y       2750
Self-heal Daemon on gluster04                N/A       N/A        Y       2801
Self-heal Daemon on gluster09                N/A       N/A        Y       2793
Self-heal Daemon on gluster11                N/A       N/A        Y       2047
Self-heal Daemon on gluster10                N/A       N/A        Y       2812
Self-heal Daemon on gluster12                N/A       N/A        Y       542

Task Status of Volume disp1
------------------------------------------------------------------------------
There are no active volume tasks
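The volume status above shows the gluster12 brick with Online "N" and no TCP port, i.e. the brick process never came back after the reboot even though glusterd and the self-heal daemon on gluster12 did. In that situation a force start is normally enough to respawn just the missing brick process so healing can proceed; a sketch, run from any node (standard gluster CLI):

# Respawns bricks that are not running; bricks that are already up are left alone
gluster volume start disp1 force

# The gluster12 brick should now show Online "Y" with a TCP port assigned
gluster volume status disp1

# Healing of the pending entries should then make progress
gluster volume heal disp1 info summary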
On March 1, 2020 6:08:31 AM GMT+02:00, Fox <foxxz.net at gmail.com> wrote:
> I am using a dozen ODROID HC2 ARM systems each with a single HD/brick.
> [...]

Hi,

Are you sure that the gluster bricks on this node are up and running?
What is the output of 'gluster volume status' on this system?

Best Regards,
Strahil Nikolov
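If the brick does turn out to be down, the reason it failed to start after the reboot is usually recorded on gluster12 itself. A couple of places worth checking (default log locations; the brick log filename is an assumption based on the usual convention of the brick path with '/' replaced by '-'):

# glusterd's log often notes why a brick could not be (re)started
tail -n 100 /var/log/glusterfs/glusterd.log

# The brick's own log (assumed filename, derived from the brick path)
tail -n 100 /var/log/glusterfs/bricks/exports-sda-brick1-disp1.log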