Hi gluster mongers,

I have run into a critical problem under 2.0.3, and I would like to
know if it has been reported (or fixed) already before making a
detailed bug report.

----

Low down:

Two machines, RHEL 5.3, fuse 2.7.4, each running a single brick server.

Clients on the same machines, AFR with writebehind & local
read-subvolume enabled.

Clients run with --disable-direct-io-mode.

Activity:

Each client has several open fd's, including a xen image.

Kill glusterfsd on one machine.

Open fd's are still being written to.

umount & mount the underlying FS.

Restart glusterfsd.

Weirdness:

On the same machine, client log entries:
... forced unwinding frame type(1) ...
... disconnected ... connected.

Server log entries:

[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[fd.c:326:gf_fd_fdptr_get] fd: invalid argument
[server-protocol.c:4108:server_flush] invalid argument: state->fd
[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[posix.c:1712:posix_writev] export: writev failed on
fd=0x2aaaac0040c0: Bad file descriptor
...
[server-protocol.c:3956:server_writev] invalid argument: state->fd
[server-protocol.c:4062:server_fsync] invalid argument: state->fd
...
[fd.c:282:gf_fd_put] fd: invalid argument
...REPEATS AD NAUSEAM...

Really, Really Weird:

The fd's seem to have been confused somehow, as data from the xen
images began to appear in other open files.

This occurred on the underlying FS on the broken server side only.

----

To restore sanity:

umount both sides.
Kill both glusterfsd's.
Delete the corrupted files from the broken server's underlying FS.
Restart servers, then clients.

The deleted files auto-repair successfully upon access and normality
returns.

----

Thanks for reading this far!
Has anyone experienced this or something similar?
Comments/feedback much appreciated, further info available on request.

Jeff.
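For anyone trying to judge how widespread the fd errors are in their own server log, a small tally per function may help. The following is only a sketch: it assumes each entry contains a "[file:line:function] message" fragment like the excerpts above, ignores whatever precedes the bracket, and treats "invalid argument" and "Bad file descriptor" as the messages of interest. The names LOG_RE, count_fd_errors and SAMPLE are made up for illustration and are not part of GlusterFS.

```python
import re
from collections import Counter

# Matches the "[file:line:function] message" fragment seen in the excerpts.
# Assumption: any timestamp or log-level prefix before the bracket is ignored.
LOG_RE = re.compile(
    r"\[(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\]\s*(?P<msg>.*)"
)

# Message fragments taken from the report; extend as needed.
FD_ERROR_MARKERS = ("invalid argument", "Bad file descriptor")


def count_fd_errors(lines):
    """Return a Counter of fd-related error entries keyed by function name."""
    counts = Counter()
    for raw in lines:
        match = LOG_RE.search(raw)
        if match is None:
            continue  # not a "[file:line:function]" entry
        if any(marker in match.group("msg") for marker in FD_ERROR_MARKERS):
            counts[match.group("func")] += 1
    return counts


# The server log entries from the report, with the wrapped line rejoined.
SAMPLE = [
    "[server-protocol.c:3903:server_readv] invalid argument: state->fd",
    "[fd.c:326:gf_fd_fdptr_get] fd: invalid argument",
    "[server-protocol.c:4108:server_flush] invalid argument: state->fd",
    "[server-protocol.c:3903:server_readv] invalid argument: state->fd",
    "[posix.c:1712:posix_writev] export: writev failed on "
    "fd=0x2aaaac0040c0: Bad file descriptor",
    "[server-protocol.c:3956:server_writev] invalid argument: state->fd",
    "[server-protocol.c:4062:server_fsync] invalid argument: state->fd",
    "[fd.c:282:gf_fd_put] fd: invalid argument",
]

if __name__ == "__main__":
    for func, n in count_fd_errors(SAMPLE).most_common():
        print(f"{func}: {n}")
```

To run it against a real log, pass an open file object instead of SAMPLE; the log path depends on how glusterfsd was started and is not assumed here.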
Can you confirm if http://patches.gluster.com/patch/893/ fixes this
problem?

Thanks,
Avati

On Thu, Aug 6, 2009 at 6:54 PM, Jeff Evans<jeffe at tricab.com> wrote:
> Hi gluster mongers,
>
> I have run into a critical problem under 2.0.3, and I would like to
> know if it has been reported (or fixed) already before making a
> detailed bug report.
>
> [...]
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
I too have noticed corruption problems in the 2.0.x series under
replicate when servers go down, though I have not had the time to
isolate a scenario that reproduces them reliably. The latest version we
are running is 2.0.4, so the problem still exists there.

On 7 Aug 2009, at 02:54, Jeff Evans wrote:
> Hi gluster mongers,
>
> I have run into a critical problem under 2.0.3, and I would like to
> know if it has been reported (or fixed) already before making a
> detailed bug report.
>
> [...]