On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
> <cuculovic at mdpi.com> wrote:
>
>> Ah, damn! I found the issue. On the storage server, the storage2
>> IP address was wrong, I inverted two digits in the /etc/hosts
>> file, sorry for that :(
>>
>> I was able to add the brick now, I started the heal, but still no
>> data transfer visible.

1. Are the files getting created on the new brick though?
2. Can you provide the output of `getfattr -d -m . -e hex
   /data/data-cluster` on both bricks?
3. Is it possible to attach gdb to the self-heal daemon on the original
   (old) brick and get a backtrace?

       gdb -p <pid of self-heal daemon on the original brick>
       thread apply all bt   --> share this output
       quit gdb

-Ravi

> @Ravi/Pranith - can you help here?
>
>> By doing gluster volume status, I have
>>
>> Status of volume: storage
>> Gluster process                     TCP Port  RDMA Port  Online  Pid
>> ---------------------------------------------------------------------
>> Brick storage2:/data/data-cluster   49152     0          Y       23101
>> Brick storage:/data/data-cluster    49152     0          Y       30773
>> Self-heal Daemon on localhost       N/A       N/A        Y       30050
>> Self-heal Daemon on storage         N/A       N/A        Y       30792
>>
>> Any idea?
>>
>> On storage I have:
>> Number of Peers: 1
>>
>> Hostname: 195.65.194.217
>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>> State: Peer in Cluster (Connected)
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>
>>> Can you resend the attachment as zip? I am unable to extract the
>>> content. We shouldn't have a 0-byte info file. What does gluster
>>> peer status output say?
>>>
>>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>> <cuculovic at mdpi.com> wrote:
>>>
>>>> I hope you received my last email Atin, thank you!
>>>>
>>>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Atin Mukherjee <amukherj at redhat.com>
>>>>> Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>> Subject: Re: [Gluster-users] Replica brick not working
>>>>> To: Ravishankar N <ravishankar at redhat.com>
>>>>> Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>, Pranith Kumar
>>>>> Karampuri <pkarampu at redhat.com>, gluster-users
>>>>> <gluster-users at gluster.org>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>>> <ravishankar at redhat.com> wrote:
>>>>>
>>>>>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>>>
>>>>>>> From the log snippet:
>>>>>>>
>>>>>>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>>>>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>>>>>>> 0-management: Received add brick req
>>>>>>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>>>>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>>>>>>> 0-management: replica-count is 2
>>>>>>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>>>>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>>>>>>> 0-management:
>>>>>>>
>>>>>>> The last log entry indicates that we hit the code path in
>>>>>>> gd_addbr_validate_replica_count ():
>>>>>>>
>>>>>>>     if (replica_count == volinfo->replica_count) {
>>>>>>>             if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>>>>                     ret = 1;
>>>>>>>                     goto out;
>>>>>>>             }
>>>>>>>     }
>>>>>>
>>>>>> It seems unlikely that this snippet was hit, because we print
>>>>>> the E [MSGID: 106291] in the above message only if ret == -1.
>>>>>> gd_addbr_validate_replica_count() returns -1 and yet does not
>>>>>> populate err_str only when volinfo->type doesn't match any of
>>>>>> the known volume types, so volinfo->type is corrupted perhaps?
>>>>>
>>>>> You are right, I missed that ret is set to 1 in the above
>>>>> snippet.
>>>>>
>>>>> @Milos - Can you please provide us the volume info file from
>>>>> /var/lib/glusterd/vols/<volname>/ from all three nodes to
>>>>> continue the analysis?
>>>>>
>>>>>> -Ravi
>>>>>>
>>>>>>> @Pranith, Ravi - Milos was trying to convert a dist (1 x 1)
>>>>>>> volume to a replicate (1 x 2) using add-brick and hit this
>>>>>>> issue where add-brick failed. The cluster is operating with
>>>>>>> 3.7.6. Could you help on what scenario this code path can be
>>>>>>> hit? One straightforward issue I see here is the missing
>>>>>>> err_str in this path.
>>>>>
>>>>> --
>>>>> ~ Atin (atinm)
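The root cause above was a transposed digit in /etc/hosts, so before
any add-brick operation it is worth confirming that every node
resolves every peer to the same address. A minimal sanity check,
assuming the hostnames storage and storage2 used in this thread:

    # Run on both nodes; the answers must match everywhere:
    getent hosts storage storage2

    # Peer state as glusterd sees it; hostname, UUID and
    # "Peer in Cluster (Connected)" should agree on both nodes:
    gluster peer status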
Miloš Čučulović - MDPI
2016-Dec-08 15:32 UTC
[Gluster-users] Fwd: Replica brick not working
1. No, atm the old server (storage2) volume is mounted on some other
servers, so all files are created there. If I check the new brick,
there are no files.

2. On the storage2 server (old brick):

getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382

On the storage server (new brick):

getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382

3.

Thread 8 (Thread 0x7fad832dd700 (LWP 30057)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad88834f3e in __afr_shd_healer_wait () from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#2  0x00007fad88834fad in afr_shd_healer_wait () from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#3  0x00007fad88835aa0 in afr_shd_index_healer () from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#4  0x00007fad8df4270a in start_thread (arg=0x7fad832dd700) at pthread_create.c:333
#5  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fad83ade700 (LWP 30056)):
#0  0x00007fad8dc78e23 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e808a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad83ade700) at pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7fad894a5700 (LWP 30055)):
#0  0x00007fad8dc78e23 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e808a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad894a5700) at pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7fad8a342700 (LWP 30054)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad8e7ecd98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8e7ed970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8a342700) at pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7fad8ab43700 (LWP 30053)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad8e7ecd98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8e7ed970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8ab43700) at pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fad8b344700 (LWP 30052)):
#0  do_sigwait (sig=0x7fad8b343e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __sigwait (set=<optimized out>, sig=0x7fad8b343e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x00000000004080bf in glusterfs_sigwaiter ()
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8b344700) at pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fad8bb45700 (LWP 30051)):
#0  0x00007fad8df4bc6d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e7ca744 in gf_timer_proc () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad8bb45700) at pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fad8ec66780 (LWP 30050)):
#0  0x00007fad8df439dd in pthread_join (threadid=140383309420288, thread_return=0x0) at pthread_join.c:90
#1  0x00007fad8e808eeb in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x0000000000405501 in main ()

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 08.12.2016 16:17, Ravishankar N wrote:
> 1. Are the files getting created on the new brick though?
> 2. Can you provide the output of `getfattr -d -m . -e hex
>    /data/data-cluster` on both bricks?
> 3. Is it possible to attach gdb to the self-heal daemon on the original
>    (old) brick and get a backtrace?
> [...]
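Note that the getfattr output above shows only the gfid, dht and
volume-id keys on the brick root: there are no
trusted.afr.<volname>-client-* pending-heal markers, which matches the
backtrace showing the healer threads idle in __afr_shd_healer_wait. A
way to check whether anything is actually queued for self-heal,
assuming the brick path from this thread:

    # AFR records pending heals as trusted.afr.<volname>-client-* keys:
    getfattr -d -m 'trusted.afr' -e hex /data/data-cluster

    # The self-heal daemon crawls this index; if it is empty, the
    # daemon has nothing to do:
    ls /data/data-cluster/.glusterfs/indices/xattrop | head

An empty index plus missing trusted.afr keys would explain a heal that
"starts" but never transfers any data.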
Miloš Čučulović - MDPI
2016-Dec-08 16:14 UTC
[Gluster-users] Fwd: Replica brick not working
I was able to fix the sync by rsync-ing all the directories, and the
heal then started.

The next problem :) As soon as there are files on the new brick, the
gluster mount starts serving that brick to clients too, but the new
brick is not ready yet (the sync is not done), so clients see missing
files.

I temporarily removed the new brick. Now I am running a manual rsync
and will add the brick again; I hope this will work.

What mechanism is managing this issue? I guess there is something
built in to make a replica brick available only once the data is
completely synced.

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 08.12.2016 16:17, Ravishankar N wrote:
> 1. Are the files getting created on the new brick though?
> 2. Can you provide the output of `getfattr -d -m . -e hex
>    /data/data-cluster` on both bricks?
> 3. Is it possible to attach gdb to the self-heal daemon on the original
>    (old) brick and get a backtrace?
> [...]
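Rsync-ing files straight into a brick bypasses Gluster's bookkeeping:
no gfid hard links are created under .glusterfs and no pending-changelog
(trusted.afr.*) xattrs are set, so AFR has no record of which brick
holds the good copy. The usual approach after adding an empty brick to
a replica is to let a full heal populate it; AFR should then serve
reads from the source brick while the new one catches up. A sketch of
that flow, assuming the volume name storage from this thread (gluster
3.7 CLI):

    # Queue every file on the existing brick for replication:
    gluster volume heal storage full

    # Watch progress without touching the bricks directly:
    gluster volume heal storage info
    gluster volume heal storage statistics heal-count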