Giovanni Toraldo
2010-Dec-24 15:44 UTC
[Gluster-users] node crashing on 4 replicated-distributed cluster
Hi,

I'm running into trouble after a few minutes of glusterfs operation.

I set up a 4-node replica 4 storage volume, with 2 bricks on every server:

# gluster volume create vms replica 4 transport tcp \
    192.168.7.1:/srv/vol1 192.168.7.2:/srv/vol1 192.168.7.3:/srv/vol1 \
    192.168.7.4:/srv/vol1 192.168.7.1:/srv/vol2 192.168.7.2:/srv/vol2 \
    192.168.7.3:/srv/vol2 192.168.7.4:/srv/vol2

I started copying files with rsync from node1, and after a few minutes the network traffic stalled.

Inspecting the brick logs on node4, I found many entries like these:

[2010-12-24 15:58:50.247688] C [rpcsvc.c:1118:rpcsvc_notify] rpcsvc: got MAP_XID event, which should have not come
[2010-12-24 15:58:50.264731] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed
[2010-12-24 15:58:50.264835] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1001
[2010-12-24 15:58:50.279233] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018
[2010-12-24 15:59:02.100081] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed
[2010-12-24 15:59:02.100160] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1018
[2010-12-24 15:59:02.181278] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018

In nfs.log on node1 there are many entries like the following (the operation varies):

[2010-12-24 15:58:49.263361] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x77) [0x7fabdcf5bd17] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7fabdcf5b4ae] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fabdcf5b40e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-12-24 15:58:49.150707

Do you have any idea what is going on?

Thanks.

-- 
Giovanni Toraldo
http://www.libersoft.it/
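To get a quick overview of how often each kind of message shows up in a brick log like the one quoted above, something along these lines may help. It is only a minimal sketch: the regular expression is inferred from the few log lines in this post (timestamp, one-letter level, file:line:function, domain, message), and the names LOG_LINE, summarize and SAMPLE are made up for illustration, not part of glusterfs.

```python
import re
from collections import Counter

# Layout inferred from the lines quoted in the post (an assumption, not an
# official format description):
#   [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>[^:]+): (?P<msg>.*)"
)


def summarize(lines):
    """Count log entries per (level, function); skip lines that do not match."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match:
            counts[(match["level"], match["func"])] += 1
    return counts


# A few of the brick log lines from the post, used as sample input.
SAMPLE = [
    "[2010-12-24 15:58:50.247688] C [rpcsvc.c:1118:rpcsvc_notify] rpcsvc: got MAP_XID event, which should have not come",
    "[2010-12-24 15:58:50.264731] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed",
    "[2010-12-24 15:58:50.264835] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1001",
    "[2010-12-24 15:59:02.100081] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed",
]

if __name__ == "__main__":
    for (level, func), n in summarize(SAMPLE).most_common():
        print(level, func, n)
```

On a real system the lines would come from the brick log file on the affected node rather than from SAMPLE; the exact path depends on the installation.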
Giovanni Toraldo
2010-Dec-29 16:01 UTC
[Gluster-users] node crashing on 4 replicated-distributed cluster
On 24/12/2010 16:44, Giovanni Toraldo wrote:
> Hi,
>
> I've got troubles after few minutes of glusterfs operations.
>
> I setup a 4-node replica 4 storage, with 2 bricks on every server:
> # gluster volume create vms replica 4 transport tcp
> 192.168.7.1:/srv/vol1 192.168.7.2:/srv/vol1 192.168.7.3:/srv/vol1
> 192.168.7.4:/srv/vol1 192.168.7.1:/srv/vol2 192.168.7.2:/srv/vol2
> 192.168.7.3:/srv/vol2 192.168.7.4:/srv/vol2

This seems to be a bug in glusterfs. I tried configuring a replica 3 volume instead:

* first excluding only the 192.168.7.1 vol* bricks
* then excluding only the 192.168.7.3 vol* bricks

Both times I had no problems.

Today I retried a new replica 4 configuration including all 4 servers and got the same crash as before.

Is anyone using the latest glusterfs 3.1.1 with replica 4 successfully?
Even some sort of acknowledgement from the devs would be nice.

Thanks.

-- 
Giovanni Toraldo
http://www.libersoft.it/
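For readers trying to picture the layout created by the quoted command: as far as I know, gluster forms replica sets from consecutive bricks in the order they are given on the command line, so 8 bricks with replica 4 give 2 replica sets of 4, with files distributed across the two sets. The sketch below only illustrates that grouping. The function name replica_sets and the BRICKS list are invented for the example, and the replica 3 brick order is an assumption, since the post does not show those commands.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets of `replica` consecutive entries."""
    if replica <= 0 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a positive multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


SERVERS = ["192.168.7.1", "192.168.7.2", "192.168.7.3", "192.168.7.4"]

# Same order as in the quoted command: vol1 on every server, then vol2.
BRICKS = [f"{server}:/srv/{vol}" for vol in ("vol1", "vol2") for server in SERVERS]

if __name__ == "__main__":
    # Replica 4 over all servers, as in the original post.
    for index, group in enumerate(replica_sets(BRICKS, 4)):
        print("replica set", index, group)

    # Replica 3 without 192.168.7.1. The brick order here is assumed,
    # because the post does not include the replica 3 commands.
    reduced = [b for b in BRICKS if not b.startswith("192.168.7.1:")]
    for index, group in enumerate(replica_sets(reduced, 3)):
        print("replica set", index, group)
```

With the order used in the post, each replica set contains exactly one brick from every server, so every file ends up on all four nodes.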