Jim Phillips
2008-Oct-08  20:03 UTC
[Gluster-users] GlusterFS spewing errors with Xen tap:aio block driver
Greetings,

I'm trying to get to the bottom of a problem I'm having combining Xen and GlusterFS. I've googled this extensively with no success.

Every once in a while (more frequently than I'd like in a production environment), the GlusterFS client will start spewing errors into glusterfs.log similar to:

2008-10-08 16:49:21 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:21 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231064: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231241: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231243: WRITE => -1 (77)

It will literally fill the disk up in a matter of hours if we don't catch it early enough.

This seems to happen only around disk images opened by the Xen tap:aio block driver, and it goes away if I "xm destroy" the right virtual instance or "xm migrate" it to another Xen server. The DomU itself shows a disk error in dmesg and remounts the disk read-only. How much functionality remains in the DomU varies: sometimes I can cleanly shut it down, sometimes I can't even log in to run the shutdown command.

I'm going to cross-post this to both the Xen and GlusterFS user mailing lists, so I hope to get a response from one side or the other.
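Since the errors can fill the disk within hours, one way to catch them early is to tally the EBADFD lines per brick. What follows is only a minimal sketch in Python, assuming the log lines keep exactly the format quoted above and that brick names such as brick-gridcpu02 contain no whitespace. The function name count_ebadfd and the embedded sample are illustrative and not part of GlusterFS. (On Linux x86/amd64, errno 77 is EBADFD, which matches the "(77)" in the fuse-bridge lines.)

```python
import re
from collections import Counter

# Matches the client-side error lines quoted in the post, e.g.
# 2008-10-08 16:49:21 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
EBADFD_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) E "
    r"\[client-protocol\.c:\d+:client_writev\] "
    r"(?P<brick>[\w-]+): : returning EBADFD"
)


def count_ebadfd(lines):
    """Return a Counter mapping brick name -> number of EBADFD write errors."""
    counts = Counter()
    for line in lines:
        match = EBADFD_LINE.match(line.strip())
        if match:
            counts[match.group("brick")] += 1
    return counts


# Sample taken from the log excerpt in the post (illustrative only).
SAMPLE = """\
2008-10-08 16:49:21 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:21 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231064: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231241: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231243: WRITE => -1 (77)
"""

if __name__ == "__main__":
    for brick, n in count_ebadfd(SAMPLE.splitlines()).items():
        print(f"{brick}: {n} EBADFD write errors")
```

To run it against a live log, pass an open file object to count_ebadfd instead of the sample. The actual location of glusterfs.log depends on the installation and is not given here.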
Specs are:

Hardware: Dell PowerEdge (1955 I think), with PERC3 SCSI disks in RAID1
OS: Ubuntu 8.04, amd64, Kernel 2.6.24-19-xen
GlusterFS 1.3.10, 1.3.10-0ubuntu1~hardy2 from https://launchpad.net/~neil-aldur/+ppa-packages
Xen 3.2.0, 3.2.0-0ubuntu10, from Ubuntu

GlusterFS client configuration:

# file: /etc/glusterfs/glusterfs-client.vol
volume brick-gridfs01
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridfs01
  option remote-port 6997
  option remote-subvolume brick
end-volume

volume brick-gridcpu01
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridcpu01
  option remote-port 6997
  option remote-subvolume brick
end-volume

volume brick-gridcpu02
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridcpu02
  option remote-port 6997
  option remote-subvolume brick
end-volume

volume brick-gridcpu03
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridcpu03
  option remote-port 6997
  option remote-subvolume brick
end-volume

volume brick-gridcpu04
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridcpu04
  option remote-port 6997
  option remote-subvolume brick
end-volume

volume namespace-gridfs01
  type protocol/client
  option transport-type tcp/client
  option remote-host atl1gridfs01
  option remote-port 6997
  option remote-subvolume brick-ns
end-volume

volume unify0
  type cluster/unify
  option scheduler alu
  option alu.limits.min-free-disk 5%
  option alu.limits.max-open-files 10000
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 2GB # Kick in if the discrepancy in disk-usage between volumes is more than 2GB
  option alu.disk-usage.exit-threshold 60MB # Don't stop writing to the least-used volume until the discrepancy is 1988MB
  option alu.open-files-usage.entry-threshold 1024 # Kick in if the discrepancy in open files is 1024
  option alu.open-files-usage.exit-threshold 32 # Don't stop until 992 files have been written the least-used volume
  # option alu.read-usage.entry-threshold 20% # Kick in when the read-usage discrepancy is 20%
  # option alu.read-usage.exit-threshold 4% # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
  # option alu.write-usage.entry-threshold 20% # Kick in when the write-usage discrepancy is 20%
  # option alu.write-usage.exit-threshold 4% # Don't stop until the discrepancy has been reduced to 16%
  # option alu.disk-speed-usage.entry-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
  # option alu.disk-speed-usage.exit-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
  option alu.stat-refresh.interval 10sec # Refresh the statistics used for decision-making every 10 seconds
  # option alu.stat-refresh.num-file-create 10
  option namespace namespace-gridfs01
  subvolumes brick-gridfs01 brick-gridcpu01 brick-gridcpu02 brick-gridcpu03 brick-gridcpu04
end-volume

GlusterFS Server Config:

# file: /etc/glusterfs/glusterfs-server.vol
volume posix
  type storage/posix
  option directory /opt/gridfs/export
end-volume

volume plocks
  type features/posix-locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 4
  subvolumes plocks
end-volume

volume brick-ns
  type storage/posix
  option directory /opt/gridfs/namespace
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6997
  option auth.ip.brick.allow *
  option auth.ip.brick-ns.allow *
  subvolumes brick brick-ns
end-volume

Jim Phillips
jim at ergophobia.org