Sahina Bose
2016-Sep-29 11:48 UTC
[Gluster-users] [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem
Yes, this is a GlusterFS problem. Adding gluster-users ML.

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <davide at billymob.com> wrote:

> Hello
>
> Maybe this is more glusterfs than ovirt related, but since oVirt integrates
> Gluster management and I'm experiencing the problem in an oVirt cluster,
> I'm writing here.
>
> The problem is simple: I have a data domain mapped on a replica 3
> arbiter 1 Gluster volume with 6 bricks, like this:
>
> Status of volume: data_ssd
> Gluster process                                           TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick vm01.storage.billy:/gluster/ssd/data/brick          49153     0          Y       19298
> Brick vm02.storage.billy:/gluster/ssd/data/brick          49153     0          Y       6146
> Brick vm03.storage.billy:/gluster/ssd/data/arbiter_brick  49153     0          Y       6552
> Brick vm03.storage.billy:/gluster/ssd/data/brick          49154     0          Y       6559
> Brick vm04.storage.billy:/gluster/ssd/data/brick          49152     0          Y       6077
> Brick vm02.storage.billy:/gluster/ssd/data/arbiter_brick  49154     0          Y       6153
> Self-heal Daemon on localhost                             N/A       N/A        Y       30746
> Self-heal Daemon on vm01.storage.billy                    N/A       N/A        Y       196058
> Self-heal Daemon on vm03.storage.billy                    N/A       N/A        Y       23205
> Self-heal Daemon on vm04.storage.billy                    N/A       N/A        Y       8246
>
> Now, I've put the vm04 host into maintenance from oVirt, ticking the "Stop
> gluster" checkbox, and oVirt didn't complain about anything. But when I
> tried to run a new VM, it complained about a "storage I/O problem", while
> the data storage domain status was always UP.
>
> Looking in the gluster logs I can see this:
>
> [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
> [2016-09-29 11:02:28.127374] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.128130] W [MSGID: 108027] [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read subvols for (null)
> [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output error)
> [2016-09-29 11:02:28.130824] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
> The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)" repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29 11:02:28.517744]
> [2016-09-29 11:02:28.518607] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
>
> Now, how is it possible to have a split-brain if I stopped just ONE server,
> which had just ONE of the six bricks, and it was cleanly shut down with
> maintenance mode from oVirt?
>
> I created the volume originally this way:
>
> # gluster volume create data_ssd replica 3 arbiter 1 \
>     vm01.storage.billy:/gluster/ssd/data/brick \
>     vm02.storage.billy:/gluster/ssd/data/brick \
>     vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
>     vm03.storage.billy:/gluster/ssd/data/brick \
>     vm04.storage.billy:/gluster/ssd/data/brick \
>     vm02.storage.billy:/gluster/ssd/data/arbiter_brick
> # gluster volume set data_ssd group virt
> # gluster volume set data_ssd storage.owner-uid 36 && gluster volume set data_ssd storage.owner-gid 36
> # gluster volume start data_ssd
>
> --
> Davide Ferrari
> Senior Systems Engineer
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
Ravishankar N
2016-Sep-29 12:16 UTC
[Gluster-users] [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem
On 09/29/2016 05:18 PM, Sahina Bose wrote:
> Yes, this is a GlusterFS problem. Adding gluster-users ML.
>
> On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <davide at billymob.com> wrote:
>
>> [...]
>>
>> Now, I've put the vm04 host into maintenance from oVirt, ticking the "Stop
>> gluster" checkbox, and oVirt didn't complain about anything. But when I
>> tried to run a new VM, it complained about a "storage I/O problem", while
>> the data storage domain status was always UP.
>>
>> Looking in the gluster logs I can see this:
>>
>> [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
>> [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
>> [...]

Does `gluster volume heal data_ssd info split-brain` report that the file is
in split-brain, with vm04 still being down?

If yes, could you provide the extended attributes of this gfid from all 3
bricks:

getfattr -d -m . -e hex /path/to/brick/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

If no, then I'm guessing that it is not in actual split-brain (hence the
'Possible split-brain' message).
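For reference, on a brick each file is also reachable through its gfid hard link under the brick's .glusterfs directory, so the extended attributes requested above could be gathered with something like the following (brick paths taken from the volume status output earlier; the .glusterfs layout is the usual gfid-link location and is an assumption about this setup):

    # on vm03 (data brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
    # on vm02 (arbiter brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/arbiter_brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
    # on vm04, once it is back up (data brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

The trusted.afr.data_ssd-client-* values in that output are the pending changelogs that show which brick 'blames' which.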
If the node you brought down contains the only good copy of the file (i.e.
the other data brick and the arbiter are up, and the arbiter 'blames' that
other brick), all I/O is failed with EIO to prevent the file from getting
into an actual split-brain. The heals will happen when the good node comes
back up, and I/O should be allowed again in that case.

-Ravi
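Once vm04 and its brick are back online, a rough way to watch the recovery, assuming the standard gluster CLI (the self-heal daemon should pick the entries up on its own), would be:

    # gluster volume status data_ssd        # confirm the vm04 brick is online again
    # gluster volume heal data_ssd          # optionally kick off an index heal
    # gluster volume heal data_ssd info     # pending entries should drain to zero per brick

I/O on the affected gfid should start succeeding again as soon as the good copy is reachable, with the heal bringing the stale brick back up to date in the background.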