Jan Wrona
2017-Jul-18 08:48 UTC
[Gluster-users] Sporadic Bus error on mmap() on FUSE mount
Hi,

I need to use rrdtool on top of a Gluster FUSE mount; rrdtool uses
memory-mapped file I/O extensively (I know I can recompile rrdtool with
mmap() disabled, but that is just a workaround). I have three FUSE mount
points on three different servers. On one of them the command "rrdtool
create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
RRA:AVERAGE:0.5:1:24" works fine; on the other two servers the command is
killed and a Bus error is reported. With every Bus error, the following
two lines appear in the mount log:

[2017-07-18 08:30:22.470770] E [MSGID: 108008]
[afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
observed. [Input/output error]
[2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)

I'm not sure about the current state of mmap() on FUSE and Gluster, but
it's strange that it works only on a certain mount of the same volume.

version: glusterfs 3.10.3

[root@dc1]# gluster volume info flow

Volume Name: flow
Type: Distributed-Replicate
Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Options Reconfigured:
performance.parallel-readdir: on
performance.client-io-threads: on
cluster.nufa: enable
network.ping-timeout: 10
transport.address-family: inet
nfs.disable: true

[root@dc1]# gluster volume status flow
Status of volume: flow
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dc1.liberouter.org:/data/glusterfs/fl
ow/brick1/safety_dir                        49155     0          Y       26441
Brick dc2.liberouter.org:/data/glusterfs/fl
ow/brick2/safety_dir                        49155     0          Y       26110
Brick dc2.liberouter.org:/data/glusterfs/fl
ow/brick1/safety_dir                        49156     0          Y       26129
Brick dc3.liberouter.org:/data/glusterfs/fl
ow/brick2/safety_dir                        49152     0          Y       8703
Brick dc3.liberouter.org:/data/glusterfs/fl
ow/brick1/safety_dir                        49153     0          Y       8722
Brick dc1.liberouter.org:/data/glusterfs/fl
ow/brick2/safety_dir                        49156     0          Y       26460
Self-heal Daemon on localhost               N/A       N/A        Y       26493
Self-heal Daemon on dc2.liberouter.org      N/A       N/A        Y       26151
Self-heal Daemon on dc3.liberouter.org      N/A       N/A        Y       8744

Task Status of Volume flow
------------------------------------------------------------------------------
There are no active volume tasks
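For reference, whether AFR really considers the file split-brain can be
checked from any of the servers with the standard heal command; a minimal
sketch (the volume name and brick path are taken from the output above,
test.rrd is just the test file created by the rrdtool command, adjust the
path to wherever the file actually lands on the brick):

    # list files/gfids that AFR currently flags as split-brain
    gluster volume heal flow info split-brain

    # inspect the AFR changelog xattrs of the file directly on one brick
    getfattr -d -m . -e hex /data/glusterfs/flow/brick1/safety_dir/test.rrd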
Niels de Vos
2017-Jul-18 10:17 UTC
[Gluster-users] Sporadic Bus error on mmap() on FUSE mount
On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
> Hi,
>
> I need to use rrdtool on top of a Gluster FUSE mount; rrdtool uses
> memory-mapped file I/O extensively (I know I can recompile rrdtool with
> mmap() disabled, but that is just a workaround). I have three FUSE mount
> points on three different servers. On one of them the command "rrdtool
> create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
> RRA:AVERAGE:0.5:1:24" works fine; on the other two servers the command is
> killed and a Bus error is reported. With every Bus error, the following
> two lines appear in the mount log:
>
> [2017-07-18 08:30:22.470770] E [MSGID: 108008]
> [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
> Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
> observed. [Input/output error]
> [2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
> 0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
>
> I'm not sure about the current state of mmap() on FUSE and Gluster, but
> it's strange that it works only on a certain mount of the same volume.

This can be caused when an mmap()'d region is not written, for example when
trying to read or write a part of the mmap()'d region that lies after the
end of the file; such an access raises SIGBUS, which is the Bus error you
see. I've seen issues like this before (long ago), and that got fixed in
the write-behind xlator.

Could you disable the performance.write-behind option for the volume and
try to reproduce the problem? If the issue is in write-behind, disabling
it should prevent the issue.

If this helps, please file a bug with an strace of the application and a
tcpdump capture that contains the GlusterFS traffic from start to end when
the problem is observed.

https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind

HTH,
Niels
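Spelled out, the suggestion above would look roughly like this (a sketch;
the volume name and brick port range are taken from this thread, the output
file names are placeholders):

    # turn write-behind off for the volume (it can be re-enabled the same way)
    gluster volume set flow performance.write-behind off

    # reproduce under strace ...
    strace -f -o rrdtool.strace rrdtool create test.rrd --start 920804400 \
        DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24

    # ... while capturing the GlusterFS traffic on the client
    # (24007 is the glusterd management port, 49152-49160 covers the brick
    # ports shown in the volume status)
    tcpdump -i any -w gluster.pcap portrange 49152-49160 or port 24007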
Jan Wrona
2017-Jul-18 11:55 UTC
[Gluster-users] Sporadic Bus error on mmap() on FUSE mount
On 18.7.2017 12:17, Niels de Vos wrote:
> This can be caused when an mmap()'d region is not written, for example when
> trying to read or write a part of the mmap()'d region that lies after the
> end of the file; such an access raises SIGBUS, which is the Bus error you
> see. I've seen issues like this before (long ago), and that got fixed in
> the write-behind xlator.
>
> Could you disable the performance.write-behind option for the volume and
> try to reproduce the problem? If the issue is in write-behind, disabling
> it should prevent the issue.
>
> If this helps, please file a bug with an strace of the application and a
> tcpdump capture that contains the GlusterFS traffic from start to end when
> the problem is observed.

I've disabled performance.write-behind, unmounted, stopped and started the
volume, then mounted it again, but it had no effect. After that I've been
successively disabling/enabling options and xlators, and I've found that
the problem is related to the cluster.nufa option. When the NUFA translator
is disabled, rrdtool works fine on all mounts; when it is enabled again,
the problem shows up again.
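For anyone who wants to reproduce this, the toggle that makes the difference
is just the following (a sketch of the steps described above; a remount of
the clients may be needed before they pick up the changed graph):

    # with NUFA enabled (the configuration shown earlier), the create is
    # killed with a Bus error on two of the three mounts
    gluster volume set flow cluster.nufa enable
    rrdtool create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24

    # with NUFA disabled, the same command succeeds on all mounts
    gluster volume set flow cluster.nufa disable
    rrdtool create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24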