thr3ads.net - Gluster users - [Gluster-users] returning EBADFD / no proper reply from server, returning ENOTCONN [Nov 2008]

Andrew McGill

2008-Nov-11 08:59 UTC

[Gluster-users] returning EBADFD / no proper reply from server, returning ENOTCONN

Greetings glusterfs users,

I have the errors below in /var/log/glusterfs.log.  It's not clear, but I'm
guessing that this is simply a network error which was handled adequately by
the software -- but it is truly not obvious.

 * Were these network errors handled by AFR?

 * Without AFR, would the application have seen a filesystem error
(e.g. "Transport endpoint not connected")?  (How about if the network error
was on the namespace brick?)

 * Is there a recommended action for errors in the error log - or some other
way of ensuring the integrity of the filesystem (like glusterfsck ...)?


The volume is defined as ...

volume u100-node6
  type protocol/client
  option transport-type tcp/client
  option transport-timeout 10sec
  option remote-host node6
  option remote-subvolume u100-node6
  option username dkpaa
  option password XXXXXXXXASDBH
end-volume

volume afr4
  type cluster/afr
  subvolumes u100-node7 u100-node6
end-volume

volume unify0
  type cluster/unify
  subvolumes afr0 afr1 afr2 afr3 afr4
  option namespace u25-node4
  option rr.limits.min-free-disk 5%
  option scheduler rr
end-volume

Log file says:

2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 01:57:01 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x860d208
2008-11-11 01:57:01 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 02:36:14 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 02:36:14 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x89308a8
2008-11-11 02:36:14 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 03:06:34 E [client-protocol.c:1238:client_flush] u100-node6: : returning EBADFD
2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz child=u100-node6) op_ret=-1 op_errno=77
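
The op_errno values in the AFR lines are plain errno numbers: the log pairs 107 with "returning ENOTCONN" and 77 with "returning EBADFD", which matches Linux numbering. A minimal sketch for pulling the failing child and error name out of such lines follows. It assumes each log entry has been re-joined onto a single line; the function name, the regular expression and the two-entry errno table are illustrative and not part of glusterfs.

```python
import re

# Assumption: Linux errno numbering, limited to the two values seen in this log.
LINUX_ERRNO = {77: "EBADFD", 107: "ENOTCONN"}

# Matches AFR callback errors such as:
#   ... E [afr.c:2393:afr_writev_cbk] afr4: (path=... child=...) op_ret=-1 op_errno=107
AFR_LINE = re.compile(
    r"\[afr\.c:\d+:(?P<func>\w+)\]\s+(?P<volume>\S+):\s+"
    r"\(path=(?P<path>\S+)\s+child=(?P<child>[^)]+)\)\s+"
    r"op_ret=(?P<op_ret>-?\d+)\s+op_errno=(?P<op_errno>\d+)"
)


def parse_afr_error(line):
    """Return a dict describing an AFR error line, or None if the line does not match."""
    match = AFR_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["op_ret"] = int(info["op_ret"])
    info["op_errno"] = int(info["op_errno"])
    # Fall back to "UNKNOWN" for numbers outside the small table above.
    info["errno_name"] = LINUX_ERRNO.get(info["op_errno"], "UNKNOWN")
    return info


SAMPLE = (
    "2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: "
    "(path=/backup5/intelligence.local/rdiff-backup-data/"
    "mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz "
    "child=u100-node6) op_ret=-1 op_errno=77"
)

if __name__ == "__main__":
    info = parse_afr_error(SAMPLE)
    print(info["volume"], info["child"], info["func"], info["errno_name"])
```

Grouping the parsed entries by child and path gives a list of files that failed on one replica and may need to be re-opened once that server is reachable again.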


The server side sees the client going away:

2008-11-11 01:57:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1001)
2008-11-11 01:57:01 E [server-protocol.c:186:generic_reply] server: transport_writev failed

2008-11-11 02:36:14 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1021)
2008-11-11 02:36:14 E [server-protocol.c:186:generic_reply] server: transport_writev failed

2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:999)
2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1020)

Krishna Srinivas

2008-Nov-11 10:12 UTC

Hi Andrew,

On Tue, Nov 11, 2008 at 2:29 PM, Andrew McGill <list2008 at lunch.za.net> wrote:
> Greetings glusterfs users,
>
> I have the errors below in /var/log/glusterfs.log.  It's not clear, but I'm
> guessing that this is simply a network error which was handled adequately by
> the software -- but it is truly not obvious.
>
>  * Were these network errors handled by AFR?

Yes. If your application did not see any error, then AFR handled the
network errors.

>  * Without AFR, would the application have seen a filesystem error
> (e.g. "Transport endpoint not connected")?

Yes.

> (How about if the network error was on the namespace brick?)

If the namespace (NS) is not AFR'ed and it gets disconnected, glusterfs
will not work.

>  * Is there a recommended action for errors in the error log - or some other
> way of ensuring the integrity of the filesystem (like glusterfsck ...)?

When the disconnected server comes back up and the file gets opened,
the outdated file will automatically be replaced by the newer copy, so
things will get auto-healed.

Krishna
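
Since the reply says healing happens when a file is opened, one way to avoid waiting for the application to touch each file is to open every file under the mount point once the server is back. The following is a minimal sketch under that assumption only. The function name is hypothetical, the mount point is whatever path the volume is mounted on, and nothing here is a glusterfs API.

```python
import os


def open_all_files(root, nbytes=1):
    """Open each regular file under root and read a little from it.

    Returns (opened_count, failures), where failures is a list of
    (path, errno) pairs for files that could not be opened or read.
    """
    opened = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Opening the file is the step the reply says triggers the heal;
                # the small read only makes sure the open is really exercised.
                with open(path, "rb") as handle:
                    handle.read(nbytes)
                opened += 1
            except OSError as exc:
                failures.append((path, exc.errno))
    return opened, failures
```

Calling it on the client mount point (for example `open_all_files("/mnt/glusterfs")`, where the path is a placeholder) returns the number of files opened plus any that still fail, which can be compared against the errors in the log.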