Jim Leonard
2009-Jul-03 20:06 UTC
[zfs-discuss] Hangs when transferring ftp off of a ZFS filesystem (truss included)
I'm having a pretty serious issue with 2009.06 with simple operations that used to work fine on nv_79. The problem I'm trying to solve right now is that FTP transfers from a ZFS filesystem, using proftpd as the server, pause for over ten minutes with no discernible cause. When the transfer hangs, this is the truss output:

811: write(1, " 1 5 0 O p e n i n g ".., 83) = 83
811: close(11) Err#9 EBADF
811: sigaction(SIGURG, 0x080479E0, 0x00000000) = 0
811: sigaction(SIGURG, 0x00000000, 0x08047A00) = 0
811: sigaction(SIGURG, 0x080479A0, 0x00000000) = 0
811: brk(0x08188B08) = 0
811: brk(0x08194B08) = 0
811: fcntl(7, F_SETLKW64, 0x080D3520) = 0
811: llseek(7, 16, SEEK_SET) = 16
811: write(7, " +03\0\0 aEA\0\0FEFF\0\0".., 520) = 520
811: llseek(7, 16, SEEK_SET) = 16
811: fcntl(7, F_SETLKW64, 0x080D3520) = 0
811: setsockopt(13, tcp, TCP_CORK, 0x08047AC4, 4, SOV_DEFAULT) = 0
811: fcntl(13, F_GETFL) = 2
811: sendfilev64(1, 13, 0x08047A30, 1, 0x08047A24) (sleeping...)

...and then it takes at least 10 minutes to do something. Then another transfer starts, and it happens again. Only a few megabytes get transferred between pauses.

How can I begin to troubleshoot this? Is there something specific I should be looking at? Despite sendfilev64, is this even a ZFS issue?

--
This message posted from opensolaris.org
Eric Schrock
2009-Jul-03 20:33 UTC
On 07/03/09 13:06, Jim Leonard wrote:
> I'm having a pretty serious issue with 200906 with simple operations that used to work fine on nv_79. The problem I'm trying to solve right now are FTP transfers from a ZFS filesystem using proftpd as a server that pause for over ten minutes with no discernible cause. When the transfer hangs, this is the truss output:
>
> 811: write(1, " 1 5 0 O p e n i n g ".., 83) = 83
> 811: close(11) Err#9 EBADF
> 811: sigaction(SIGURG, 0x080479E0, 0x00000000) = 0
> 811: sigaction(SIGURG, 0x00000000, 0x08047A00) = 0
> 811: sigaction(SIGURG, 0x080479A0, 0x00000000) = 0
> 811: brk(0x08188B08) = 0
> 811: brk(0x08194B08) = 0
> 811: fcntl(7, F_SETLKW64, 0x080D3520) = 0
> 811: llseek(7, 16, SEEK_SET) = 16
> 811: write(7, " +03\0\0 aEA\0\0FEFF\0\0".., 520) = 520
> 811: llseek(7, 16, SEEK_SET) = 16
> 811: fcntl(7, F_SETLKW64, 0x080D3520) = 0
> 811: setsockopt(13, tcp, TCP_CORK, 0x08047AC4, 4, SOV_DEFAULT) = 0
> 811: fcntl(13, F_GETFL) = 2
> 811: sendfilev64(1, 13, 0x08047A30, 1, 0x08047A24) (sleeping...)
>
> ...and then takes at least 10 minutes to do something. Then another transfer starts, and it happens again. Only a few meg gets transferred between pauses.
>
> How can I begin to troubleshoot this? Is there something specific I should be looking at? Despite sendfilev64, is this even a ZFS issue?

This is probably:

6837719 TCP tx might hang when tcp_cork option is set

Fixed in build 115. This is a generic networking bug and doesn't have anything to do with ZFS. If you build proftpd with TCP_CORK off you won't have this problem.

- Eric

--
Eric Schrock, Fishworks http://blogs.sun.com/eschrock
Jim Leonard
2009-Jul-03 21:16 UTC
> This is probably:
>
> 6837719 TCP tx might hang when tcp_cork option is set
>
> Fixed in build 115. This is a generic networking bug and doesn't have
> anything to do with ZFS. If you build proftp with TCP_CORK off you
> won't have this problem.

Wow, that was it, thanks! What in the truss (or behavior) led you to that conclusion?
Eric Schrock
2009-Jul-03 21:22 UTC
On 07/03/09 14:16, Jim Leonard wrote:
>> This is probably:
>>
>> 6837719 TCP tx might hang when tcp_cork option is set
>>
>> Fixed in build 115. This is a generic networking bug and doesn't have
>> anything to do with ZFS. If you build proftp with TCP_CORK off you
>> won't have this problem.
>
> Wow, that was it, thanks! What in the truss (or behavior) led you to that conclusion?

This sneaky line:

811: setsockopt(13, tcp, TCP_CORK, 0x08047AC4, 4, SOV_DEFAULT) = 0
                         ^^^^^^^^

As well as the fact that you are using proftpd. The sight of TCP_CORK still triggers some deep fight or flight reaction in my animal brain, after having watched others debug the original problem.

- Eric

--
Eric Schrock, Fishworks http://blogs.sun.com/eschrock
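The check Eric describes, spotting a setsockopt() call that names TCP_CORK in a truss trace, could be automated roughly as follows. This is only a sketch: the function name `line_sets_tcp_cork` is hypothetical, and it relies on plain substring matching against the truss line format shown in this thread rather than on any real parsing of truss output.

```c
#include <string.h>

/*
 * Return 1 if a line of truss output looks like a setsockopt() call that
 * names TCP_CORK, 0 otherwise.
 *
 * Assumption: the line has the shape seen in this thread, e.g.
 *   "811: setsockopt(13, tcp, TCP_CORK, 0x08047AC4, 4, SOV_DEFAULT) = 0"
 * Only substring matching is done; this is not a truss parser.
 */
static int line_sets_tcp_cork(const char *line)
{
    /* Find the start of the call so that TCP_CORK is only matched
     * inside (or after) the setsockopt( text on the same line. */
    const char *call = strstr(line, "setsockopt(");
    if (call == NULL)
        return 0;
    return strstr(call, "TCP_CORK") != NULL;
}
```

Feeding each line of a saved trace to this function would flag the setsockopt line above and ignore the fcntl and sendfilev64 lines around it.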
Bob Friesenhahn
2009-Jul-03 21:42 UTC
On Fri, 3 Jul 2009, Eric Schrock wrote:
> This sneaky line:
>
> 811: setsockopt(13, tcp, TCP_CORK, 0x08047AC4, 4, SOV_DEFAULT) = 0
>                          ^^^^^^^^
>
> As well as the fact that you are using proftpd. The sight of TCP_CORK still
> triggers some deep fight or flight reaction in my animal brain, after having
> watched others debug the original problem.

I had never heard of the TCP_CORK socket option before. There is an excellent summary at "http://www.baus.net/on-tcp_cork". The description mentions that misusing TCP_CORK could cause a socket hang at the end of the transfer, or if the application waits for a response while the kernel is still waiting for more data from the application. It is necessary to remove TCP_CORK before writing the final data, and if the application guesses wrong, the connection will hang.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
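The cork and uncork discipline Bob describes might look something like the sketch below. It assumes a platform that defines TCP_CORK in <netinet/tcp.h> (Linux does), and the helper names `set_cork`, `write_all`, and `send_corked` are hypothetical, not taken from proftpd. Note that this variant clears the cork right after the last write instead of just before it; on Linux, clearing the option sends any queued partial frames, so either ordering avoids leaving data stuck behind the cork.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK on a connected TCP socket. */
static int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Write the whole buffer, retrying on short writes.
 * Any write() error, including EINTR, is reported as failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: send a header and a body as one corked burst.
 * The cork is removed on every path after it was set, so a failed
 * write cannot leave the socket corked with data still queued.
 */
static int send_corked(int fd, const char *hdr, size_t hlen,
                       const char *body, size_t blen)
{
    int rc = -1;

    if (set_cork(fd, 1) < 0)
        return -1;

    if (write_all(fd, hdr, hlen) == 0 && write_all(fd, body, blen) == 0)
        rc = 0;

    /* Uncork unconditionally; this is what pushes out the final
     * partial frame. */
    if (set_cork(fd, 0) < 0)
        rc = -1;

    return rc;
}
```

The point of funnelling everything through one helper is that the uncork cannot be forgotten on an error path, which is the "guesses wrong" case described above.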
Eric Schrock
2009-Jul-03 21:46 UTC
On 07/03/09 14:42, Bob Friesenhahn wrote:
>
> I had never heard of the TCP_CORK socket option before. There is an
> excellent summary at "http://www.baus.net/on-tcp_cork". The description
> includes mention that mis-using TCP_CORK could cause a socket hang at
> the end of the transfer or if the application waits for a response while
> the kernel is still waiting for more data from the application. It is
> necessary to remove TCP_CORK before writing the final data and if the
> application guesses wrong, the connection will hang.
>

Yep, it's definitely tricky to get right. In this case, though, proftpd isn't actually at fault (and presumably does the right thing). It was ultimately a kernel bug.

- Eric

--
Eric Schrock, Fishworks http://blogs.sun.com/eschrock