Hugo Cisneiros (Eitch)
2010-Jun-22 15:40 UTC
[Gluster-users] FUSE I/O Lockup while accessing glusterfs shares
Hi,

We have a web server installation that is using GlusterFS. There are 3 GlusterFS servers and 4 GlusterFS clients accessing one share. We're having a problem where, at random, the httpd processes begin to hang while waiting for I/O. It starts with the following kernel call trace:

kernel: INFO: task httpd.worker:18974 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: httpd.worker  D ffffffff80150462     0 18974   4818  18975 18972 (NOTLB)
kernel: ffff8108c0ee7c78 0000000000000086 ffff8101f3b2b000 ffff810116d0f200
kernel: ffff81037392c6d8 000000000000000a ffff8106b73d57a0 ffff810116dd4100
kernel: 00004f0baea8a21e 00000000000026f5 ffff8106b73d5988 0000000300000000
kernel: Call Trace:
kernel: [<ffffffff88443cda>] :fuse:fuse_dentry_revalidate+0x13e/0x1c1
kernel: [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
kernel: [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
kernel: [<ffffffff8000cec2>] do_lookup+0x90/0x1e6
kernel: [<ffffffff80009bb2>] __link_path_walk+0x3a6/0xf42
kernel: [<ffffffff8000e9e2>] link_path_walk+0x42/0xb2
kernel: [<ffffffff8000ccb2>] do_path_lookup+0x275/0x2f1
kernel: [<ffffffff8001280e>] getname+0x15b/0x1c2
kernel: [<ffffffff80023876>] __user_walk_fd+0x37/0x4c
kernel: [<ffffffff80028846>] vfs_stat_fd+0x1b/0x4a
kernel: [<ffffffff8000c5fe>] _atomic_dec_and_lock+0x39/0x57
kernel: [<ffffffff800235a8>] sys_newstat+0x19/0x31
kernel: [<ffffffff80012b54>] __fput+0x191/0x1bd
kernel: [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
kernel: [<ffffffff80023b84>] filp_close+0x5c/0x64
kernel: [<ffffffff8001dfa6>] sys_close+0x88/0xbd
kernel: [<ffffffff8005d116>] system_call+0x7e/0x83

Looking at the GlusterFS client logs, the following entries begin to show up before the call trace appears:

[fuse-bridge.c:585:fuse_lookup] glusterfs-fuse: 89023783: LOOKUP 46912563008752/2010 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024196: GETATTR 46912563008752 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024203: GETATTR 46912563008752 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024207: GETATTR 46912563008752 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024210: GETATTR 46912563008752 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024213: GETATTR 46912563008752 (fuse_loc_fill() failed)
[fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024218: GETATTR 46912563008752 (fuse_loc_fill() failed)

The same message repeats 1116 times. Talking with a friend, we noticed that the 4 web servers are reading *and* writing very concurrently, and because of this there are many lookups, reads and unlinks returning "no such file or directory" in the log, like these:

[fuse-bridge.c:859:fuse_fd_cbk] glusterfs-fuse: 88969079: OPEN() /cache/70ea5f0d4df01648be4f70d9813f24b7.meta => -1 (No such file or directory)
[fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 88969080: UNLINK() /cache/70ea5f0d4df01648be4f70d9813f24b7.meta => -1 (No such file or directory)
[fuse-bridge.c:491:fuse_entry_cbk] glusterfs-fuse: LOOKUP(/cache/index.html) inode (ptr=0x2aaab313f9f0, ino=1528908, gen=5480759143103581058) found conflict (ptr=0x2aaadebc8690, ino=1528908, gen=5480759143103581058)

After the kernel call trace appears, the FUSE filesystem becomes very unresponsive and unstable. Anyone who tries to read the mounted directory randomly gets their process locked (in uninterruptible I/O wait) and it hangs forever.
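For reference, this is roughly how one can confirm which processes are stuck in D state and dump every task's stack to the kernel log (not the exact commands we ran, just the idea; run as root, and sysrq has to be enabled):

# list processes currently in uninterruptible sleep (D state)
ps axo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'

# enable sysrq, then dump the state and stack of every task to the kernel log
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
dmesg | tail -n 200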
We know that GlusterFS may not be made for this kind of highly concurrent writing and reading, so we're working on an alternative for this workload (it's a disk cache, and we're migrating it to something like memcached). But maybe this error helps to find something unexpected going on... :)

Here's the client configuration:

volume vol01
  type protocol/client
  option transport-type tcp
  option remote-host server01.domain
  option remote-subvolume vol
end-volume

volume vol02
  type protocol/client
  option transport-type tcp
  option remote-host server02.domain
  option remote-subvolume vol
end-volume

volume vol03
  type protocol/client
  option transport-type tcp
  option remote-host server03.domain
  option remote-subvolume vol
end-volume

volume vol-mirror
  type cluster/replicate
  subvolumes vol01 vol02 vol03
end-volume

volume vol-iocache
  type performance/io-cache
  option cache-size 128MB
  option cache-timeout 2
  subvolumes vol-mirror
end-volume

volume vol-quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes vol-iocache
end-volume

volume vol-writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes vol-quickread
end-volume

volume vol-statprefetch
  type performance/stat-prefetch
  subvolumes vol-writebehind
end-volume
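If it helps with debugging, we could also remount the client with debug logging to get more context around the fuse_loc_fill() failures. A rough sketch, assuming the mount is started directly from the volfile (the volfile, log file and mount point paths below are placeholders, not the real ones):

umount /var/www/share
glusterfs --volfile=/etc/glusterfs/client.vol \
          --log-file=/var/log/glusterfs/client.log \
          --log-level=DEBUG \
          /var/www/share

Thanks,

--
[]'s
Hugo
www.devin.com.br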
What distribution, kernel and GlusterFS versions are you using?

I've had problems with some kernels between 2.6.31 and 2.6.32-r10 (-r10 is a Gentoo release number) with GlusterFS 3.0.4 inside a KVM guest and KVM versions < 0.12.4. Kernel 2.6.30 or the latest 2.6.32-r11 together with KVM 0.12.4 seem to work fine now. We also had some NFSv4 problems with the kernels mentioned above; 2.6.30 and 2.6.32-r11 don't have this problem. In our case the cause was a race condition in KVM's virtio-net driver. Maybe you have a problem with your network (driver) too.

Some weeks ago we moved a directory used by a Drupal cache module from GlusterFS to NFSv4, since we had a lot of problems with it. There was also heavy writing and reading in this directory. By now I'm not sure anymore whether that was a kernel or a GlusterFS problem; I've never tested it again with GlusterFS. Some other directories that are mostly read from are working fine.
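To gather those quickly on each client, something like this should do (the release file name varies by distribution):

uname -r              # kernel version
cat /etc/*release     # distribution
glusterfs --version   # GlusterFS version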
- Robert

On 06/22/10 17:40, Hugo Cisneiros (Eitch) wrote:
> Hi,
>
> We have a web server installation that is using GlusterFS. There are 3
> GlusterFS servers and 4 GlusterFS clients accessing one share.
> [...]