Hello,

I'm running into some serious problems with Gluster + CTDB and Samba. What I have:

A two-node replicated Gluster cluster, set up to share volumes over Samba according to this guide:
https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf

When we edit or copy files on the volume via SMB (from a Windows client, through a Samba file share), this inevitably leads to a split-brain scenario. For example:

gluster> volume heal fl-webroot info
Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/
<gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5>
<gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8>
Number of entries: 2

Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/
/LandingPage_Saturn_Production/images
/LandingPage_Saturn_Production
/LandingPage_Saturn_Production/Services/v2
/LandingPage_Saturn_Production/images/country/be
/LandingPage_Saturn_Production/bin
/LandingPage_Saturn_Production/Services
/LandingPage_Saturn_Production/images/generic
/LandingPage_Saturn_Production/aspnet_client/system_web
/LandingPage_Saturn_Production/images/country
/LandingPage_Saturn_Production/Scripts
/LandingPage_Saturn_Production/aspnet_client
/LandingPage_Saturn_Production/images/country/fr
Number of entries: 12

Sometimes self-heal works, sometimes it doesn't:

[2014-08-06 19:32:17.986790] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0: entry self heal failed, on /LandingPage_Saturn_Production/Services/v2
[2014-08-06 19:32:18.008330] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-fl-webroot-client-0: remote operation failed: No such file or directory. Path: <gfid:a89d7a07-2e3d-41ee-adcc-cb2fba3d2282> (a89d7a07-2e3d-41ee-adcc-cb2fba3d2282)
[2014-08-06 19:32:18.024057] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-fl-webroot-replicate-0: gfid or missing entry self heal is started, metadata self heal is successfully completed, backgroung data self heal is successfully completed, data self heal from fl-webroot-client-1 to sinks fl-webroot-client-0, with 0 bytes on fl-webroot-client-0, 168 bytes on fl-webroot-client-1, data - Pending matrix: [ [ 0 0 ] [ 1 0 ] ] metadata self heal from source fl-webroot-client-1 to fl-webroot-client-0, metadata - Pending matrix: [ [ 0 0 ] [ 2 0 ] ], on /LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx

*More seriously, some files are simply missing on one of the nodes, without any error in the logs and without the files ever showing up in gluster volume heal $volume info.*
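For reference, this is how I can pull the raw AFR changelog xattrs from both bricks for one of the affected directories, to tell a real split-brain apart from a plain heal backlog (a sketch; the path is taken from the heal output above, and the trusted.afr.fl-webroot-client-* keys follow GlusterFS's standard <volume>-client-<N> naming):

# run on both ankh and morpork, directly against the brick path;
# dumps all xattrs, including the trusted.afr.* pending changelogs
getfattr -d -m . -e hex \
  /export/glu/web/flash/webroot/LandingPage_Saturn_Production/Services/v2

# entries gluster itself has flagged as split-brain
gluster volume heal fl-webroot info split-brain

If both bricks show non-zero pending counters accusing each other for the same entry, it's a genuine split-brain rather than a heal that simply hasn't caught up yet.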
Of course I can provide any log file necessary.

--
Tiemen Ruiten
Systems Engineer
R&D Media
Tiemen Ruiten
2014-Aug-06 20:10 UTC
[Gluster-users] frequent split-brain with Gluster + Samba + Win client
Sorry, I seem to have messed up the subject.

I should add: I'm mounting these volumes through GlusterFS FUSE, not the Samba VFS plugin.
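Concretely, each node mounts the volume locally with the FUSE client and Samba simply exports the mountpoint, along these lines (a sketch; the mountpoint and share name here are placeholders, not our exact configuration):

# GlusterFS FUSE mount on each Samba/CTDB node
mount -t glusterfs localhost:/fl-webroot /mnt/fl-webroot

# smb.conf share on top of the FUSE mountpoint (placeholder name)
[webroot]
    path = /mnt/fl-webroot
    read only = no

So all SMB traffic reaches Gluster through the FUSE client; the glusterfs VFS module is not loaded anywhere.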
On 06-08-14 21:47, Tiemen Ruiten wrote:
> [snip]

--
Tiemen Ruiten
Systems Engineer
R&D Media