Andrew McGill
2008-Nov-11 08:59 UTC
[Gluster-users] returning EBADFD / no proper reply from server, returning ENOTCONN
Greetings glusterfs users,

I have the errors below in /var/log/glusterfs.log. It's not clear, but I'm guessing that this is simply a network error which was handled adequately by the software -- but it is truly not obvious.

* Were these network errors handled by AFR?

* Without AFR, would the application have seen a filesystem error (e.g. "Transport endpoint not connected")? (How about if the network error was on the namespace brick?)

* Is there a recommended action for errors in the error log, or some other way of ensuring the integrity of the filesystem (like glusterfsck ...)?

The volume is defined as ...

volume u100-node6
  type protocol/client
  option transport-type tcp/client
  option transport-timeout 10sec
  option remote-host node6
  option remote-subvolume u100-node6
  option username dkpaa
  option password XXXXXXXXASDBH
end-volume

volume afr4
  type cluster/afr
  subvolumes u100-node7 u100-node6
end-volume

volume unify0
  type cluster/unify
  subvolumes afr0 afr1 afr2 afr3 afr4
  option namespace u25-node4
  option rr.limits.min-free-disk 5%
  option scheduler rr
end-volume

Log file says:

2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 01:57:01 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x860d208
2008-11-11 01:57:01 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 02:36:14 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 02:36:14 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x89308a8
2008-11-11 02:36:14 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 03:06:34 E [client-protocol.c:1238:client_flush] u100-node6: : returning EBADFD
2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz child=u100-node6) op_ret=-1 op_errno=77

On the server side, it sees the client going away:

2008-11-11 01:57:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1001)
2008-11-11 01:57:01 E [server-protocol.c:186:generic_reply] server: transport_writev failed

2008-11-11 02:36:14 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1021)
2008-11-11 02:36:14 E [server-protocol.c:186:generic_reply] server: transport_writev failed

2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:999)
2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1020)
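
The numeric op_errno values in the AFR lines above correspond to the names printed by the client-protocol lines just before them (107 follows "returning ENOTCONN", 77 follows "returning EBADFD"). As a rough aid for reading such a log, the following Python sketch counts AFR callback errors per child and prints the symbolic errno name. The regular expression is inferred from the three AFR lines in this post only, not from any documented log format, and the helper names (errno_name, summarize) are made up for illustration. Errno numbers are platform-specific, so the names are only meaningful when the script runs on the same kind of system that wrote the log.

```python
import errno
import re
from collections import Counter

# Field layout inferred from the afr.c lines quoted in this post (an assumption,
# not a documented format).
AFR_LINE = re.compile(
    r"\[afr\.c:\d+:(?P<func>\w+)\]\s+(?P<volume>\S+):\s+"
    r"\(path=(?P<path>\S+)\s+child=(?P<child>\S+)\)\s+"
    r"op_ret=(?P<ret>-?\d+)\s+op_errno=(?P<errno>\d+)"
)


def errno_name(code):
    """Return this platform's symbolic name for an errno number, if it has one."""
    return errno.errorcode.get(code, "errno %d" % code)


def summarize(lines):
    """Count AFR callback errors, keyed by (child, callback function, errno)."""
    counts = Counter()
    for line in lines:
        match = AFR_LINE.search(line)
        if match:
            key = (match.group("child"), match.group("func"), int(match.group("errno")))
            counts[key] += 1
    return counts


# The three AFR lines from the client log above, copied verbatim.
SAMPLE = [
    "2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: "
    "(path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat"
    "/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) "
    "op_ret=-1 op_errno=107",
    "2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: "
    "(path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat"
    "/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) "
    "op_ret=-1 op_errno=107",
    "2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: "
    "(path=/backup5/intelligence.local/rdiff-backup-data"
    "/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz child=u100-node6) "
    "op_ret=-1 op_errno=77",
]

if __name__ == "__main__":
    for (child, func, code), n in sorted(summarize(SAMPLE).items()):
        print("%s %s %s x%d" % (child, func, errno_name(code), n))
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize(open("/var/log/glusterfs.log")). Lines that do not match the pattern, such as the call_bail entries, are simply skipped.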
Krishna Srinivas
2008-Nov-11 10:12 UTC
[Gluster-users] returning EBADFD / no proper reply from server, returning ENOTCONN
Hi Andrew,

On Tue, Nov 11, 2008 at 2:29 PM, Andrew McGill <list2008 at lunch.za.net> wrote:
> Greetings glusterfs users,
>
> I have the errors below in /var/log/glusterfs.log. It's not clear, but I'm
> guessing that this is simply a network error which was handled adequately by
> the software -- but it is truly not obvious.
>
> * Were these network errors handled by AFR?

Yes. If your application did not see any error, then AFR handled the network errors.

> * Without AFR, would the application have seen a filesystem error
> (e.g. "Transport endpoint not connected")?

Yes.

> (How about if the network error was on the namespace brick?)

If the namespace (NS) volume is not AFR'ed and it gets disconnected, glusterfs will not work.

> * Is there a recommended action for errors in the error log, or some other
> way of ensuring the integrity of the filesystem (like glusterfsck ...)?

When the disconnected server comes back up and the file is next opened, the outdated copy will automatically be replaced by the newer copy. So things will get auto healed.

Krishna
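
Krishna's answer says the heal happens when a file is opened after the disconnected server comes back. One way to act on that, rather than waiting for the application to touch each file, is to walk the mounted filesystem and open every file once. The Python sketch below does only that. It is an illustration under assumptions: the function name and the mount point are hypothetical, and reading one byte after the open is an extra precaution that the reply does not say is required.

```python
import os


def open_every_file(root):
    """Open each regular file under root once; return (opened count, failures).

    Per the reply above, opening a file is what lets AFR replace an outdated
    copy. This function does not verify that any healing actually took place.
    """
    opened = 0
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    handle.read(1)  # assumption: a small read, in case open alone is not enough
                opened += 1
            except OSError as exc:
                # Record the errno (e.g. ENOTCONN) so failures can be reviewed later.
                failed.append((path, exc.errno))
    return opened, failed


if __name__ == "__main__":
    # "/mnt/glusterfs" is a placeholder mount point, not taken from the thread.
    count, errors = open_every_file("/mnt/glusterfs")
    print("opened %d files, %d failures" % (count, len(errors)))
```

Any paths returned in the failure list are worth comparing against the client log, since an open that fails here would also have failed for the application.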