Andrew McGill
2009-Mar-14 17:12 UTC
[Gluster-users] Error report: glusterfs2.0rc4 abend -- "readv failed (Bad address) "
Hello,

I upgraded from glusterfs-1.3.12.tar.gz to glusterfs-2.0.0rc4.tar.gz because I could not complete an rdiff-backup without inexplicable errors. My efforts have been rewarded with a crash, for which some logs are displayed below.

The backend is unify, with multiple afr subvolumes of two ext3 volumes each.

Here is how the client side of glusterfs died:

2009-03-14 13:32:15 W [afr-self-heal-data.c:798:afr_sh_data_fix] afr4: Picking favorite child u100-rs1 as authentic source to resolve conflicting data of /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_metadata.2009-03-14T08:00:17+02:00.snapshot.gz
2009-03-14 13:32:15 W [afr-self-heal-data.c:646:afr_sh_data_open_cbk] afr4: sourcing file /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_metadata.2009-03-14T08:00:17+02:00.snapshot.gz from u100-rs1 to other sinks
2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dcc1: readv failed (Bad address)
2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dcc1: read (Bad address) in state 3 (192.168.227.65:6996)
2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dcc1: forced unwinding frame type(1) op(READ)
2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dr1: readv failed (Bad address)
2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dr1: read (Bad address) in state 3 (192.168.227.31:6996)
2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dr1: forced unwinding frame type(1) op(READ)
2009-03-14 13:32:22 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 5998294: READ => -1 (Transport endpoint is not connected)
2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-dr2: readv failed (Bad address)
2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-dr2: read (Bad address) in state 3 (192.168.227.32:6996)
2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-dr2: forced unwinding frame type(1) op(READ)
2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-rs3: readv failed (Bad address)
2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-rs3: read (Bad address) in state 3 (192.168.227.59:6996)
2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-rs3: forced unwinding frame type(1) op(READ)
2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006118: READ => -1 (Transport endpoint is not connected)
2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006119: READ => -1 (Transport endpoint is not connected)
2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006120: READ => -1 (Transport endpoint is not connected)
2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006121: READ => -1 (Transport endpoint is not connected)
pending frames:
?
patchset: cb602a1d7d41587c24379cb2636961ab91446f86 +
signal received: 6
configuration details:argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.0rc4
[0x381420]
/lib/libc.so.6(abort+0x101)[0xb86451]
/usr/lib/glusterfs/2.0.0rc4/xlator/mount/fuse.so[0x54b9a8]
/lib/libpthread.so.0[0xd302db]
/lib/libc.so.6(clone+0x5e)[0xc2912e]
---------

On the server side, the following messages don't enlighten me, but they do remind me that there was another client, still running version 1.3, connecting. It looks like the server just noticed that the client died.
2009-03-14 13:30:03 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1023). possible mismatch of transport-type between server and client volumes, or version mismatch
2009-03-14 13:30:03 N [server-protocol.c:8048:notify] server: 192.168.227.167:1023 disconnected
2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server: socket header signature does not match :O (42.6c.6f)
2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1023). possible mismatch of transport-type between server and client volumes, or version mismatch
2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server: 192.168.227.167:1023 disconnected
2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed (Connection reset by peer)
2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read (Connection reset by peer) in state 1 (192.168.227.5:1020)
2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server: 192.168.227.5:1020 disconnected
2009-03-14 13:32:22 N [server-protocol.c:7295:mop_setvolume] server: accepted client from 192.168.227.5:1020
2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server: 192.168.227.5:1017 disconnected
2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server: 192.168.227.5:1020 disconnected
2009-03-14 13:35:48 N [server-helpers.c:515:server_connection_destroy] server: destroyed connection of backup5.foo.com-23205-2009/03/14-07:10:52:777008-u50-dcc1

On another server brick, the 25GB volume u50-dr1-raw was full (it should have been 50GB like its peer). As I recall, the free space of the second volume of an AFR pair does not get checked (a bug, IMHO). It logged the following, which could have led to the client-side failure a few minutes later (the clocks are in sync):

2009-03-14 13:30:23 W [posix.c:773:posix_mkdir] u50-dr1-raw: mkdir of /backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1: No space left on device
2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109657: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109658: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184942: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184943: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184947: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184949: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109660: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109661: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184952: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184953: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:2774:server_stub_resume] server: 1109663: XATTROP (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
2009-03-14 13:30:23 E [server-protocol.c:2868:server_stub_resume] server: 1109665: RMDIR (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server: socket header signature does not match :O (42.6c.6f)
2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1022). possible mismatch of transport-type between server and client volumes, or version mismatch
2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server: 192.168.227.167:1022 disconnected
2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed (Connection reset by peer)
2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read (Connection reset by peer) in state 1 (192.168.227.5:1016)
2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server: 192.168.227.5:1016 disconnected
2009-03-14 13:32:23 N [server-protocol.c:7295:mop_setvolume] server: accepted client from 192.168.227.5:1016

I may have to move the backup in question off glusterfs (if I can just find the space somewhere), since it has taken four days to realise that the backing up is not just slow, but faulty. (Of course, if I can't fix it, I'll win a trip to the data center to install a new machine to replace the system.)
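For reference, the client side is roughly the shape sketched below. Where volume names and addresses appear in the logs above (u50-dr1, u50-dr2, afr4, 192.168.227.31, ...) they are real; the scheduler, the namespace volume name and anything else not visible in the logs is illustrative rather than a copy of the production volfile:

  # one protocol/client volume per exported brick (default port 6996)
  volume u50-dr1
    type protocol/client
    option transport-type tcp
    option remote-host 192.168.227.31
    option remote-subvolume u50-dr1
  end-volume

  volume u50-dr2
    type protocol/client
    option transport-type tcp
    option remote-host 192.168.227.32
    option remote-subvolume u50-dr2
  end-volume

  # each afr volume mirrors one pair of bricks
  volume afr1
    type cluster/afr
    subvolumes u50-dr1 u50-dr2
  end-volume

  # afr2, afr3 and afr4 are defined the same way over their own brick pairs;
  # the "Picking favorite child" message above implies option favorite-child
  # is set on at least afr4

  # unify spreads files across the afr pairs and needs a small namespace volume
  volume unify0
    type cluster/unify
    # scheduler choice is illustrative
    option scheduler rr
    # 'afr-ns' stands in for the real namespace volume (not shown)
    option namespace afr-ns
    subvolumes afr1 afr2 afr3 afr4
  end-volume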
Andrew McGill
2009-Mar-18 13:27 UTC
[Gluster-users] Error report: glusterfs2.0rc4 abend -- "readv failed (Bad address) "
Replying to myself, since nobody else did: I'm outta here for now.

Whether through network errors, software errors, or plain old stupidity, I can no longer maintain a stable glusterfs mount. Due to the batch nature of the application I'm running (rdiff-backup), this is a fatal failure, and I've now decommissioned my failing glusterfs installation.

It's a pity, though - although it was horribly slow because of inappropriate hardware, it did work (until it flaked out fatally). For decommissioning, it turns out that having your data stored without a special metadata format is a good design choice.

Currently glusterfs needs some work in terms of recovering from network and server errors. (I think the developers should be made to run it over 10Mbps ethernet hubs for a month.) As it stands, glusterfs is a high-capacity solution (given appropriate hardware), but not a high-availability one.

On Saturday 14 March 2009 19:12:03 Andrew McGill wrote:
> Hello,
>
> I upgraded from glusterfs-1.3.12.tar.gz to glusterfs-2.0.0rc4.tar.gz
> because I could not complete an rdiff-backup without inexplicable errors.
> My efforts have been rewarded with a crash, for which some logs are
> displayed below.
> [...]
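PS: for anyone facing the same migration - because the bricks hold ordinary files (glusterfs keeps its bookkeeping in extended attributes), getting the data out is just a copy off the backend ext3 volumes, one replica per afr pair. A sketch, with placeholder export paths and destination rather than my real layout:

  # one rsync per afr pair, using whichever replica of the pair is complete;
  # -H keeps rdiff-backup's hard links intact
  rsync -aH /export/brick1/backup5/ newhost:/backup5/
  rsync -aH /export/brick3/backup5/ newhost:/backup5/
  # ...and so on for the remaining pairs; skip the unify namespace volume,
  # which only holds empty placeholder files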
Anand Avati
2009-Mar-18 14:18 UTC
[Gluster-users] Error report: glusterfs2.0rc4 abend -- "readv failed (Bad address) "
Do you have a core dump which can be inspected? Do you still see the error after taking the 1.3 client off?

Avati

On Sat, Mar 14, 2009 at 10:42 PM, Andrew McGill <list2008 at lunch.za.net> wrote:
> Hello,
>
> I upgraded from glusterfs-1.3.12.tar.gz to glusterfs-2.0.0rc4.tar.gz because I
> could not complete an rdiff-backup without inexplicable errors. My efforts
> have been rewarded with a crash, for which some logs are displayed below.
> [...]
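If no core file was written, it should be possible to capture one on the next crash by raising the core limit before mounting and then opening the core in gdb. A rough sketch - the volfile path, mount point and binary location below are typical defaults, not necessarily your actual setup:

  # allow the client process to dump core; -N keeps glusterfs in the
  # foreground so the core file lands in the current directory
  ulimit -c unlimited
  glusterfs -N -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs

  # after the next abort, load the core and collect every thread's stack
  # (the file may be named core or core.<pid>, depending on kernel.core_pattern)
  gdb /usr/sbin/glusterfs ./core
  (gdb) thread apply all bt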
Keith Freedman
2009-Mar-19 02:56 UTC
[Gluster-users] Error report: glusterfs2.0rc4 abend -- "readv failed (Bad address) "
Andrew,

I just want to throw something out there, since I think you're being rather harsh toward the gluster community.

I've found that the developers are very dedicated and very motivated to solve problems and debug the product. However, if you're not paying them for support, your right to complain about response times and about prioritizing YOUR specific issues diminishes severely. If you felt you weren't getting proper responses from the gluster-users list, it's likely that none of us had an easy answer for you, or that no one was really interested in solving that problem because they didn't see a parallel to their own problems (such is the nature of open-source and community support groups).

If you need to use something in a mission-critical production environment, for heaven's sake PAY FOR SUPPORT! You can't operate mission-critical applications on an "ask anyone around and hope they can help me in time" basis.

The developers are very responsive on the list, but they also have other work to do, they get to take time off, and the one(s) who can solve your specific issue may not be checking the list every minute of every day. (Personally, I'm glad, because they should be working on the product, not sitting around waiting for people to submit problems to them.)

Again: pay for support and you'll get dedicated resources to solve your problem; don't pay, and we as a community will be happy to help when and if we can.

my .02,
Keith

At 06:27 AM 3/18/2009, Andrew McGill wrote:
> Replying to myself, since nobody else did: I'm outta here for now. Whether
> through network errors, software errors, or plain old stupidity, I can no
> longer maintain a stable glusterfs mount.
> [...]