Matt Waymack
2019-Jan-07 18:44 UTC
[Gluster-users] [External] Re: Input/output error on FUSE log
Yes, all volumes use sharding.

From: Davide Obbi <davide.obbi at booking.com>
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack <mwaymack at nsgdv.com>
Cc: Raghavendra Gowdappa <rgowdapp at redhat.com>; gluster-users at gluster.org List <gluster-users at gluster.org>
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Are all the volumes configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack <mwaymack at nsgdv.com> wrote:

I think that I can rule out the network, as I have multiple volumes on the same nodes and not all volumes are affected. Additionally, access via SMB using samba-vfs-glusterfs is not affected, even on the same volumes. This seems to affect only the FUSE clients.

From: Davide Obbi <davide.obbi at booking.com>
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa <rgowdapp at redhat.com>
Cc: Matt Waymack <mwaymack at nsgdv.com>; gluster-users at gluster.org List <gluster-users at gluster.org>
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some checks: "(Input/output error)" seems to be returned by the operating system. This happens, for instance, when trying to access a file system on a device that is not available, so I would check the network connectivity from client to servers and from server to server during the reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaymack at nsgdv.com> wrote:

Hi all,

I'm having a problem writing to our volume.
When writing files larger than about 2GB, I get an intermittent issue where the write fails and returns an Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snippet of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:

* Are all subvolumes of DHT up, and is the client connected to them? Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again, and attach the client log file.

I spoke a bit early. dht_writev doesn't search for the hashed subvolume, as it has already been looked up in lookup. So these messages look to be from a different issue, not the writev failure.

This is intermittent for most files, but eventually, if a file is large enough, it will not write. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The file sizes are different as well, so it's not as if they all reach the same size and then stop. I've ruled out a free space issue: our files at their largest are only a few hundred GB, and we have tens of terabytes free on each brick. We are also sharding at 1GB.

I'm not sure where to go from here, as the error seems vague and I can only see it in the client log. I'm not seeing these errors on the nodes themselves. This is also seen if I mount the volume via FUSE on any of the nodes, and it is only reflected in the FUSE log.
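
When a client log is long, it can help to see which files the failed writes belong to. The following is a minimal sketch, not part of the original report: it counts `WRITE => -1` entries per gfid in log text shaped like the excerpt above. The function name `failed_writes_by_gfid` and the regular expression are illustrative assumptions based only on the quoted lines, not on any documented log format.

```python
import re
from collections import Counter

# Assumption: failed FUSE writes look like the excerpt in this thread, i.e.
# "... WRITE => -1 gfid=<36-character uuid> ...". \s+ tolerates wrapped lines.
WRITE_FAIL = re.compile(r"WRITE => -1\s+gfid=([0-9a-f-]{36})")


def failed_writes_by_gfid(log_text):
    """Count failed WRITE callbacks per gfid in glusterfs client log text."""
    counts = Counter()
    for match in WRITE_FAIL.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


# Three WRITE entries copied from the excerpt (FLUSH entries carry no gfid).
sample = (
    "[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] "
    "0-glusterfs-fuse: 51040978: WRITE => -1 "
    "gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)\n"
    "[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] "
    "0-glusterfs-fuse: 51041266: WRITE => -1 "
    "gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)\n"
    "[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] "
    "0-glusterfs-fuse: 51041548: WRITE => -1 "
    "gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)\n"
)

for gfid, count in failed_writes_by_gfid(sample).most_common():
    print(gfid, count)
```

In the excerpt, two of the three failed writes share one gfid, so the tally only shows that a single file can fail repeatedly; it says nothing about the cause.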
Here is the volume info:

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/exp/b5/gv1
Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
Brick16: tpc-glus1:/exp/b6/gv1
Brick17: tpc-glus3:/exp/b6/gv1
Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
Brick19: tpc-glus1:/exp/b7/gv1
Brick20: tpc-glus3:/exp/b7/gv1
Brick21: tpc-arbiter2:/exp/b7/gv1 (arbiter)
Brick22: tpc-glus1:/exp/b8/gv1
Brick23: tpc-glus3:/exp/b8/gv1
Brick24: tpc-arbiter2:/exp/b8/gv1 (arbiter)
Options Reconfigured:
performance.cache-samba-metadata: on
performance.cache-invalidation: off
features.shard-block-size: 1000MB
features.shard: on
transport.address-family: inet
nfs.disable: on
cluster.lookup-optimize: on

I'm a bit stumped on this, any help is appreciated. Thank you!

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Davide Obbi
Senior System Administrator
Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207031558
[Booking.com] <https://www.booking.com/>
Empowering people to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
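
The volume options list features.shard-block-size: 1000MB, while the report describes files of a few hundred GB and failures beyond about 2GB. As a rough aid for relating file offsets to shard blocks, here is a small arithmetic sketch. It assumes that "MB" in the option means 2**20 bytes and that block numbering starts at 0; both are assumptions for illustration rather than statements from the thread, and the helper names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: "MB" in the volume option means 2**20 bytes


def shard_index(offset, block_size):
    """Return the zero-based block number that holds the byte at `offset`."""
    if offset < 0 or block_size <= 0:
        raise ValueError("offset must be >= 0 and block_size must be > 0")
    return offset // block_size


def shard_count(file_size, block_size):
    """Return how many block-sized pieces are needed for `file_size` bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return -(-file_size // block_size)  # ceiling division


block = 1000 * MIB        # features.shard-block-size: 1000MB
two_gib = 2 * 1024 * MIB  # the "about 2GB" mark mentioned in the report
large_file = 300 * 1024 * MIB  # hypothetical 300 GiB file ("a few hundred GB")

print(shard_index(two_gib, block))
print(shard_count(large_file, block))
```

Under these assumptions a write at the 2 GiB mark lands in block index 2, and a 300 GiB file spans 308 blocks. This is only bookkeeping and does not by itself explain the Input/output error.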
Davide Obbi
2019-Jan-07 18:46 UTC
[Gluster-users] [External] Re: Input/output error on FUSE log
I guess you already tried unmounting, stop/start, and mounting again?

On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack <mwaymack at nsgdv.com> wrote:

> Yes, all volumes use sharding.
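
For the three checks requested earlier in the thread (subvolume status, extended attributes of the parent directory on every brick, and TRACE client logging), the command lines can be generated from the brick list. This is a hedged sketch that only builds strings and executes nothing. Running getfattr over ssh on each brick host is an assumption about how the bricks are reached, and the directory name "incoming" is a hypothetical placeholder, since the thread does not name the affected directory.

```python
import shlex


def diagnostic_commands(volume, bricks, parent_dir):
    """Build shell command lines for the requested checks (nothing is run)."""
    commands = [f"gluster volume status {shlex.quote(volume)}"]
    for brick in bricks:
        host, path = brick.split(":", 1)
        target = f"{path.rstrip('/')}/{parent_dir.strip('/')}"
        # Assumption: brick hosts are reachable over ssh with getfattr installed.
        commands.append(
            f"ssh {shlex.quote(host)} getfattr -d -m . -e hex {shlex.quote(target)}"
        )
    commands.append(
        f"gluster volume set {shlex.quote(volume)} "
        "diagnostics.client-log-level TRACE"
    )
    return commands


# First two bricks from the volume info; "incoming" is a hypothetical directory.
example_bricks = ["tpc-glus4:/exp/b1/gv1", "tpc-glus2:/exp/b1/gv1"]
for line in diagnostic_commands("gv1", example_bricks, "incoming"):
    print(line)
```

Passing all 24 bricks from the volume info yields one getfattr line per brick, which makes it easier to compare the attributes across replicas by hand.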