Atin Mukherjee
2016-Dec-14 04:06 UTC
[Gluster-users] Pre Validation failed when adding bricks
From gl4.dump file:

glusterd.peer4.hostname=gl5
glusterd.peer4.port=0
glusterd.peer4.state=3
glusterd.peer4.quorum-action=0
glusterd.peer4.quorum-contrib=2
glusterd.peer4.detaching=0
glusterd.peer4.locked=0
glusterd.peer4.rpc.peername
glusterd.peer4.rpc.connected=0   <===== this indicates that gl5 is not
connected with gl4, so the add-brick command failed as it was supposed to
in this case
glusterd.peer4.rpc.total-bytes-read=0
glusterd.peer4.rpc.total-bytes-written=0
glusterd.peer4.rpc.ping_msgs_sent=0
glusterd.peer4.rpc.msgs_sent=0

The same holds true for gl6 as per this dump, so the issue is with the gl4 node.

Now, in gl4's glusterd log I see repeated entries of the following:

[2016-12-13 16:35:31.438462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:33.440155] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:34.441639] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:36.454546] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:37.456062] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5

The above indicates that gl4 is not able to resolve the DNS names of gl5 &
gl6, whereas gl5 & gl6 could resolve gl4. Please check your DNS
configuration and see if there are any incorrect entries there. From our
side, what we need to check is why peer status didn't show both gl5 & gl6
as disconnected.

On Wed, Dec 14, 2016 at 12:44 AM, Cedric Lemarchand <yipikai7 at gmail.com> wrote:

> Thanks Atin, the files you asked for: https://we.tl/XrOvFhffGq
>
> On 13 Dec 2016, at 19:08, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> Thanks, we will get back on this. In the meantime, can you please also
> share the glusterd statedump file from both the nodes?
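[Editor's note: the repeated "DNS resolution failed on host gl5/gl6" errors above can be cross-checked from a shell on gl4 before touching gluster at all. A minimal sketch using `getent`; the `check_peers` helper name and the host list are mine, taken from this thread, so adapt them to your cluster.]

```shell
#!/bin/sh
# Report, for each peer hostname, whether this node can resolve it --
# mirroring the lookup glusterd performs before connecting to a peer.
check_peers() {
    for h in "$@"; do
        if getent hosts "$h" >/dev/null 2>&1; then
            echo "OK $h"
        else
            echo "FAIL $h"   # matches the "DNS resolution failed" symptom
        fi
    done
}

# From gl4, every FAIL line is a hostname to fix in DNS or /etc/hosts:
# check_peers gl1 gl2 gl3 gl5 gl6
```

On a healthy node every peer should print OK; in the situation described above, running this on gl4 would be expected to print FAIL for gl5 and gl6.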
> The way to take a statedump is 'kill -SIGUSR1 $(pidof glusterd)' and the
> file can be found in the /var/run/gluster directory.
>
> On Tue, 13 Dec 2016 at 22:11, Cedric Lemarchand <yipikai7 at gmail.com> wrote:
>
>> 1. sorry, 3.9.0-1
>> 2. no, it does nothing
>> 3. here they are, from gl1 to gl6: https://we.tl/EPaMs6geoR
>>
>> On 13 Dec 2016, at 16:49, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>> And 3. In case 2 doesn't work, please provide the glusterd log files from
>> gl1 & gl5
>>
>> On Tue, Dec 13, 2016 at 9:16 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>> 1. Could you mention which gluster version you are running?
>> 2. Does restarting the glusterd instance on gl1 & gl5 solve the issue
>> (after removing the volume-id xattr from the bricks)?
>>
>> On Tue, Dec 13, 2016 at 8:56 PM, Cedric Lemarchand <yipikai7 at gmail.com> wrote:
>>
>> Hello,
>>
>> When I try to add 3 bricks to a working cluster composed of 3 nodes / 3
>> bricks in dispersed mode 2+1, it fails like this:
>>
>> root at gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
>> volume add-brick: failed: Pre Validation failed on gl4. Host gl5 not connected
>>
>> However all peers are connected and there are no networking issues:
>>
>> root at gl1:~# gluster peer status
>> Number of Peers: 5
>>
>> Hostname: gl2
>> Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl3
>> Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl4
>> Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl5
>> Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl6
>> Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
>> State: Peer in Cluster (Connected)
>>
>> When I try a second time, the error is different:
>>
>> root at gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
>> volume add-brick: failed: Pre Validation failed on gl5. /data/br1 is already part of a volume
>> Pre Validation failed on gl6. /data/br1 is already part of a volume
>> Pre Validation failed on gl4. /data/br1 is already part of a volume
>>
>> It seems the previous try, even though it failed, has created the gluster
>> attributes on the file system, as shown by attr on gl4/5/6:
>>
>> Attribute "glusterfs.volume-id" has a 16 byte value for /data/br1
>>
>> I already purged gluster and reformatted the bricks on gl4/5/6, but the
>> issue persists. Any ideas? Did I miss something?
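[Editor's note: the "already part of a volume" pre-validation failures come from the trusted.glusterfs.volume-id xattr that the first, failed attempt stamped on each new brick. A hedged cleanup sketch; the `clean_brick` helper name is mine. It deletes gluster metadata, so run it as root only on bricks you intend to re-add, and restart glusterd afterwards as suggested earlier in the thread.]

```shell
#!/bin/sh
# Strip the GlusterFS identity metadata a failed add-brick leaves behind,
# so the directory passes pre-validation on the next attempt.
clean_brick() {
    brick="$1"
    # Remove the identity xattrs if present (needs the attr package);
    # errors for absent xattrs or paths are ignored.
    setfattr -x trusted.glusterfs.volume-id "$brick" 2>/dev/null || true
    setfattr -x trusted.gfid "$brick" 2>/dev/null || true
    # Drop gluster's internal bookkeeping directory, if it was created.
    rm -rf "$brick/.glusterfs"
    echo "cleaned $brick"
}

# Example (as root, on each of gl4/gl5/gl6):
# clean_brick /data/br1
```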
>> Some information:
>>
>> root at gl1:~# gluster volume info
>>
>> Volume Name: vol1
>> Type: Disperse
>> Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gl1:/data/br1
>> Brick2: gl2:/data/br1
>> Brick3: gl3:/data/br1
>> Options Reconfigured:
>> features.scrub-freq: hourly
>> features.scrub: Inactive
>> features.bitrot: off
>> cluster.disperse-self-heal-daemon: enable
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>>
>> I have the following error:
>>
>> root at gl1:~# gluster volume status
>> Status of volume: vol1
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick gl1:/data/br1                         49152     0          Y       23403
>> Brick gl2:/data/br1                         49152     0          Y       14545
>> Brick gl3:/data/br1                         49152     0          Y       11348
>> Self-heal Daemon on localhost               N/A       N/A        Y       24766
>> Self-heal Daemon on gl4                     N/A       N/A        Y       1087
>> Self-heal Daemon on gl5                     N/A       N/A        Y       1080
>> Self-heal Daemon on gl3                     N/A       N/A        Y       12321
>> Self-heal Daemon on gl2                     N/A       N/A        Y       15496
>> Self-heal Daemon on gl6                     N/A       N/A        Y       1091
>>
>> Task Status of Volume vol1
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> root at gl1:~# gluster volume info
>>
>> Volume Name: vol1
>> Type: Disperse
>> Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gl1:/data/br1
>> Brick2: gl2:/data/br1
>> Brick3: gl3:/data/br1
>> Options Reconfigured:
>> features.scrub-freq: hourly
>> features.scrub: Inactive
>> features.bitrot: off
>> cluster.disperse-self-heal-daemon: enable
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>>
>> root at gl1:~# gluster peer status
>> Number of Peers: 5
>>
>> Hostname: gl2
>> Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl3
>> Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl4
>> Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl5
>> Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gl6
>> Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
>> State: Peer in Cluster (Connected)
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users

--
~ Atin (atinm)
Atin Mukherjee
2016-Dec-14 04:10 UTC
[Gluster-users] Pre Validation failed when adding bricks
On Wed, Dec 14, 2016 at 9:36 AM, Atin Mukherjee <amukherj at redhat.com> wrote:

> From gl4.dump file:
>
> glusterd.peer4.hostname=gl5
> [...]
> glusterd.peer4.rpc.connected=0  <===== this indicates that gl5 is not
> connected with gl4, so the add-brick command failed as it was supposed to
> in this case
> [...]
>
> The above indicates that gl4 is not able to resolve the DNS names of gl5 &
> gl6, whereas gl5 & gl6 could resolve gl4. Please check your DNS
> configuration and see if there are any incorrect entries there. From our
> side, what we need to check is why peer status didn't show both gl5 & gl6
> as disconnected.
Can you run gluster peer status from gl4 and see whether both gl5 & gl6 are
shown as disconnected? If so, that is expected: since gl5 & gl6 were
connected to every node apart from gl4, peer status on all the other nodes
would show them as connected, and that's the expected behaviour. Please do
confirm.

> [...]

--
~ Atin (atinm)
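[Editor's note: the statedump recipe from this thread — 'kill -SIGUSR1 $(pidof glusterd)', then read the dump under /var/run/gluster — produces exactly the glusterd.peerN.* lines analysed above. A small sketch for pulling out disconnected peers from such a dump; `disconnected_peers` is my naming, and the awk assumes unqualified hostnames without dots, as in this cluster.]

```shell
#!/bin/sh
# Print "peerN hostname" for every peer block in a glusterd statedump
# whose management RPC is down (rpc.connected=0).
disconnected_peers() {
    awk -F'[.=]' '
        /^glusterd\.peer[0-9]+\.hostname=/         { host[$2] = $NF }
        /^glusterd\.peer[0-9]+\.rpc\.connected=0$/ { print $2, host[$2] }
    ' "$1"
}

# Example (as root on the affected node):
#   kill -SIGUSR1 "$(pidof glusterd)"      # writes the dump file
#   disconnected_peers /var/run/gluster/<dump-file>
```

Against the gl4.dump quoted in this thread, this would be expected to list the peer entries for gl5 and gl6.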