Sahina Bose
2016-Sep-29 11:48 UTC
[Gluster-users] [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem
Yes, this is a GlusterFS problem. Adding gluster-users ML.

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <davide at billymob.com> wrote:

> Hello
>
> Maybe this is more glusterfs than ovirt related, but since oVirt integrates
> Gluster management and I'm experiencing the problem in an oVirt cluster,
> I'm writing here.
>
> The problem is simple: I have a data domain mapped on a replica 3
> arbiter 1 Gluster volume with 6 bricks, like this:
>
> Status of volume: data_ssd
> Gluster process                                           TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick vm01.storage.billy:/gluster/ssd/data/brick          49153     0          Y       19298
> Brick vm02.storage.billy:/gluster/ssd/data/brick          49153     0          Y       6146
> Brick vm03.storage.billy:/gluster/ssd/data/arbiter_brick  49153     0          Y       6552
> Brick vm03.storage.billy:/gluster/ssd/data/brick          49154     0          Y       6559
> Brick vm04.storage.billy:/gluster/ssd/data/brick          49152     0          Y       6077
> Brick vm02.storage.billy:/gluster/ssd/data/arbiter_brick  49154     0          Y       6153
> Self-heal Daemon on localhost                             N/A       N/A        Y       30746
> Self-heal Daemon on vm01.storage.billy                    N/A       N/A        Y       196058
> Self-heal Daemon on vm03.storage.billy                    N/A       N/A        Y       23205
> Self-heal Daemon on vm04.storage.billy                    N/A       N/A        Y       8246
>
> Now, I've put the vm04 host into maintenance from oVirt, ticking the "Stop
> gluster" checkbox, and oVirt didn't complain about anything. But when I
> tried to run a new VM, it complained about a "storage I/O problem", while
> the data storage domain status was always UP.
>
> Looking in the gluster logs I can see this:
>
> [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
> [2016-09-29 11:02:28.127374] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.128130] W [MSGID: 108027] [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read subvols for (null)
> [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output error)
> [2016-09-29 11:02:28.130824] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
> The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)" repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29 11:02:28.517744]
> [2016-09-29 11:02:28.518607] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
>
> Now, how is it possible to have a split-brain if I stopped just ONE server,
> which had just ONE of the six bricks, and it was cleanly shut down with
> maintenance mode from oVirt?
>
> I created the volume originally this way:
>
> # gluster volume create data_ssd replica 3 arbiter 1 \
>     vm01.storage.billy:/gluster/ssd/data/brick \
>     vm02.storage.billy:/gluster/ssd/data/brick \
>     vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
>     vm03.storage.billy:/gluster/ssd/data/brick \
>     vm04.storage.billy:/gluster/ssd/data/brick \
>     vm02.storage.billy:/gluster/ssd/data/arbiter_brick
> # gluster volume set data_ssd group virt
> # gluster volume set data_ssd storage.owner-uid 36 && gluster volume set data_ssd storage.owner-gid 36
> # gluster volume start data_ssd
>
> --
> Davide Ferrari
> Senior Systems Engineer
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
Ravishankar N
2016-Sep-29 12:16 UTC
[Gluster-users] [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem
On 09/29/2016 05:18 PM, Sahina Bose wrote:
> Yes, this is a GlusterFS problem. Adding gluster-users ML.
>
> On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <davide at billymob.com> wrote:
>
>> [...]
>>
>> Now, I've put the vm04 host into maintenance from oVirt, ticking the "Stop
>> gluster" checkbox, and oVirt didn't complain about anything. But when I
>> tried to run a new VM, it complained about a "storage I/O problem", while
>> the data storage domain status was always UP.
>>
>> Looking in the gluster logs I can see this:
>>
>> [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
>> [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
>> [...]

Does `gluster volume heal data_ssd info split-brain` report that the file is
in split-brain, with vm04 still being down?

If yes, could you provide the extended attributes of this gfid from all 3
bricks:

getfattr -d -m . -e hex /path/to/brick/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

If no, then I'm guessing that it is not in actual split-brain (hence the
'Possible split-brain' message).
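For reference, on a brick each file is also reachable through its gfid hard link under the brick's .glusterfs directory, so the extended attributes requested above could be gathered with something like the following (brick paths taken from the volume status output earlier; the .glusterfs layout is the usual gfid-link location and is an assumption about this setup):

    # on vm03 (data brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
    # on vm02 (arbiter brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/arbiter_brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
    # on vm04, once it is back up (data brick of replicate-1)
    getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

The trusted.afr.data_ssd-client-* values in that output are the pending changelogs that show which brick 'blames' which.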
If the node you brought down contains the only good copy of the file (i.e.
the other data brick and the arbiter are up, and the arbiter 'blames' that
other brick), all I/O is failed with EIO to prevent the file from getting
into an actual split-brain. The heals will happen when the good node comes
back up, and I/O should be allowed again in that case.

-Ravi
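Once vm04 and its brick are back online, a rough way to watch the recovery, assuming the standard gluster CLI (the self-heal daemon should pick the entries up on its own), would be:

    # gluster volume status data_ssd        # confirm the vm04 brick is online again
    # gluster volume heal data_ssd          # optionally kick off an index heal
    # gluster volume heal data_ssd info     # pending entries should drain to zero per brick

I/O on the affected gfid should start succeeding again as soon as the good copy is reachable, with the heal bringing the stale brick back up to date in the background.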