Additionally, the brick log file of that same brick would be required. Please
check whether the brick process went down or crashed. Doing a volume start
force should resolve the issue.

On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav <gyadav at redhat.com> wrote:

> Please send me the logs as well, i.e. glusterd.log and cmd_history.log.
>
> On Wed, Sep 13, 2017 at 1:45 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>
>> On 13/09/17 06:21, Gaurav Yadav wrote:
>>
>>> Please provide the output of gluster volume info, gluster volume status
>>> and gluster peer status.
>>>
>>> Apart from the above info, please provide glusterd logs and cmd_history.log.
>>>
>>> Thanks
>>> Gaurav
>>>
>>> On Tue, Sep 12, 2017 at 2:22 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>>>
>>>> hi everyone
>>>>
>>>> I have a 3-peer cluster with all vols in replica mode, 9 vols.
>>>> What I see, unfortunately, is one brick failing in one vol; when it
>>>> happens it is always the same vol on the same brick.
>>>> The command: gluster vol status $vol - shows the brick as not online.
>>>> Restarting glusterd with systemctl does not help; only a system reboot
>>>> seems to help, until it happens again the next time.
>>>>
>>>> How to troubleshoot this weird misbehaviour?
>>>> many thanks, L.
>>
>> hi, here:
>>
>> $ gluster vol info C-DATA
>>
>> Volume Name: C-DATA
>> Type: Replicate
>> Volume ID: 18ffba73-532e-4a4d-84da-fceea52f8c2e
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Brick2: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Brick3: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Options Reconfigured:
>> performance.md-cache-timeout: 600
>> performance.cache-invalidation: on
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> performance.io-thread-count: 64
>> performance.cache-size: 128MB
>> cluster.self-heal-daemon: enable
>> features.quota-deem-statfs: on
>> changelog.changelog: on
>> geo-replication.ignore-pid-check: on
>> geo-replication.indexing: on
>> features.inode-quota: on
>> features.quota: on
>> performance.readdir-ahead: on
>> nfs.disable: on
>> transport.address-family: inet
>> performance.cache-samba-metadata: on
>>
>>
>> $ gluster vol status C-DATA
>> Status of volume: C-DATA
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
>> TERs/0GLUSTER-C-DATA                        N/A       N/A        N       N/A
>> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
>> STERs/0GLUSTER-C-DATA                       49152     0          Y       9376
>> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
>> TERs/0GLUSTER-C-DATA                        49152     0          Y       8638
>> Self-heal Daemon on localhost               N/A       N/A        Y       387879
>> Quota Daemon on localhost                   N/A       N/A        Y       387891
>> Self-heal Daemon on rider.private.ccnr.ceb.
>> private.cam.ac.uk                           N/A       N/A        Y       16439
>> Quota Daemon on rider.private.ccnr.ceb.priv
>> ate.cam.ac.uk                               N/A       N/A        Y       16451
>> Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       7708
>> Quota Daemon on 10.5.6.32                   N/A       N/A        Y       8623
>> Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       20549
>> Quota Daemon on 10.5.6.17                   N/A       N/A        Y       9337
>>
>> Task Status of Volume C-DATA
>> ------------------------------------------------------------------------------
>> There are no active volume tasks

--
--Atin
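For the offline brick above this boils down to something like the following on
10.5.6.49 (a sketch only: the brick log file name is derived from the brick
path, and the glusterd log may be called etc-glusterfs-glusterd.vol.log on
older releases, so adjust the paths to what is actually present on the node):

# did the brick process crash or get stopped?
$ grep -E 'signal received|shutting down' \
      /var/log/glusterfs/bricks/__.aLocalStorages-0-0-GLUSTERs-0GLUSTER-C-DATA.log | tail

# the logs requested earlier in the thread
$ less /var/log/glusterfs/glusterd.log
$ less /var/log/glusterfs/cmd_history.log

# respawn only the missing brick process; running bricks are left alone
$ gluster volume start C-DATA force
$ gluster volume status C-DATA

start force only spawns processes that are not already running, so the two
healthy bricks should stay untouched while the offline one is brought back.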
These symptoms appear to be the same as I've recorded in this post:
http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html

On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee <atin.mukherjee83 at gmail.com> wrote:

> Additionally, the brick log file of that same brick would be required.
> Please check whether the brick process went down or crashed. Doing a
> volume start force should resolve the issue.
On 13/09/17 20:47, Ben Werthmann wrote:
> These symptoms appear to be the same as I've recorded in this post:
> http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html
>
> On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee <atin.mukherjee83 at gmail.com> wrote:
>
>> Additionally, the brick log file of that same brick would be required.
>> Please check whether the brick process went down or crashed. Doing a
>> volume start force should resolve the issue.

When I do a vol start force, I see this among the log lines:

[2017-09-28 16:00:55.120726] I [MSGID: 106568]
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
glustershd daemon running in pid: 308300
[2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv] 0-glustershd:
readv on /var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket failed
(No data available)
[2017-09-28 16:00:56.122687] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd
service is stopped

Funnily (or not), a week later I now see:

$ gluster vol status CYTO-DATA
Status of volume: CYTO-DATA
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49161     0          Y       1743719
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-CYTO-DATA                    49152     0          Y       20438
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49152     0          Y       5607
Self-heal Daemon on localhost               N/A       N/A        Y       41106
Quota Daemon on localhost                   N/A       N/A        Y       41117
Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       19088
Quota Daemon on 10.5.6.17                   N/A       N/A        Y       19097
Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       1832978
Quota Daemon on 10.5.6.32                   N/A       N/A        Y       1832987
Self-heal Daemon on 10.5.6.49               N/A       N/A        Y       320291
Quota Daemon on 10.5.6.49                   N/A       N/A        Y       320303

Task Status of Volume CYTO-DATA
------------------------------------------------------------------------------
There are no active volume tasks


$ gluster vol heal CYTO-DATA info
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
....
....
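A plausible next step for the "Transport endpoint is not connected" that heal
info reports while volume status claims every brick is online is to confirm
that the first brick's port is actually reachable and that the self-heal
daemon has reconnected. A sketch, assuming the port shown in the status output
above and a default /var/log/glusterfs layout:

# detailed status for the brick heal info complains about
$ gluster volume status CYTO-DATA 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA detail

# from another peer: is the brick port reported above reachable at all?
$ nc -vz 10.5.6.49 49161

# does the self-heal daemon log show it failing to connect to that brick?
$ tail -n 50 /var/log/glusterfs/glustershd.log

# start force respawns glustershd as well (as the stop/start messages above
# show), after which heal info can be re-checked
$ gluster volume start CYTO-DATA force
$ gluster volume heal CYTO-DATA info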