I was copying /home from an old server (narawntapu) to a new one (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags ro,intr. On narawntapu /home was simply located on an SSD, but on aldan I created a ZFS filesystem for it. The copying was started thus: root at aldan:/home (435) cp -Rpn /mnt/* . for a while this was proceeding at a decent clip with cp making newnfsreq-uests: load: 0.78 cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S -> ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S 100% load: 1.23 cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S -> ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S 100% ZFS on the destination compressing and writing stuff out and the traffic between the two ranging from 30 to 50Mb/s (according to systat), but then something happened and the cp-process is now hung: load: 0.55 cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k load: 0.50 cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k load: 0.22 cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k There is nothing in the logs on the new system, but the old one has a number of entries like: Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb 0xfffff80086231930: Listen queue overflow: 8 already in queue awaiting acceptance (62 occurrences) Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb 0xfffff80086231930: Listen queue overflow: 8 already in queue awaiting acceptance (50 occurrences) Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb 0xfffff80086231930: Listen queue overflow: 8 already in queue awaiting acceptance (59 occurrences) Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb 0xfffff80086231930: Listen queue overflow: 8 already in queue awaiting acceptance (57 occurrences) Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb 0xfffff80086231930: Listen queue overflow: 8 already in queue awaiting acceptance (68 occurrences) Both systems are largely idle now. I'm not in a hurry -- is anybody interested in investigating it in situ? What is "fifoor" -- does this point to a trouble in the ZFS, the NFS-client, or the NFS-server? Both systems run FreeBSD/amd64 of recent 10.x-vintage. Thanks! -mi
Mikhail T. wrote:> I was copying /home from an old server (narawntapu) to a new one > (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags > ro,intr. On narawntapu /home was simply located on an SSD, but on aldan > I created a ZFS filesystem for it. > > The copying was started thus: > > root at aldan:/home (435) cp -Rpn /mnt/* . > > for a while this was proceeding at a decent clip with cp making > newnfsreq-uests: > > load: 0.78 cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > 100% > load: 1.23 cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > 100% > > ZFS on the destination compressing and writing stuff out and the traffic > between the two ranging from 30 to 50Mb/s (according to systat), but > then something happened and the cp-process is now hung: > > load: 0.55 cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k > load: 0.50 cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k > load: 0.22 cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k >Doing `ps axHl` will show you what the ``cp`` process is stuck on (WCHAN). If it is down inside ZFS, then I suspect it is ZFS resource related. If it is stuck somewhere in NFS or the kernel RPC, then I`d suspect a net driver issue: - The number 1 issue for net drivers vs NFS is TSO, so disabling TSO is the first thing to try (if the processes aren`t stuck inside zfs). (In the machine that is sending data, since it is a transmit segment limit problem. If I understood what you were doing, that would be the NFS server, but I`d disable it on both server and client.) - If that doesn`t fix it, try rsize=32768,wsize=32768 mount options for the NFS mount. These TSO issues are slowly getting resolved, but some drivers may still be broken, especially if you aren`t running head. (For example, only a very recent em(4) driver is fixed.) You can also do things like `netstat -m` to look for mbuf cluster exhaustion and look at the stats for your net driver (usually a sysctl). Good luck with it, rick> There is nothing in the logs on the new system, but the old one has a > number of entries like: > > Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb > 0xfffff80086231930: Listen queue overflow: 8 already in queue > awaiting acceptance (62 occurrences) > Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb > 0xfffff80086231930: Listen queue overflow: 8 already in queue > awaiting acceptance (50 occurrences) > Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb > 0xfffff80086231930: Listen queue overflow: 8 already in queue > awaiting acceptance (59 occurrences) > Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb > 0xfffff80086231930: Listen queue overflow: 8 already in queue > awaiting acceptance (57 occurrences) > Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb > 0xfffff80086231930: Listen queue overflow: 8 already in queue > awaiting acceptance (68 occurrences) > > Both systems are largely idle now. I'm not in a hurry -- is anybody > interested in investigating it in situ? What is "fifoor" -- does this > point to a trouble in the ZFS, the NFS-client, or the NFS-server? Both > systems run FreeBSD/amd64 of recent 10.x-vintage. > > Thanks! > > -mi > > _______________________________________________ > freebsd-stable at freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org" >
On Sat, Nov 28, 2015 at 10:42:28AM -0500, Mikhail T. wrote:> I was copying /home from an old server (narawntapu) to a new one > (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags > ro,intr. On narawntapu /home was simply located on an SSD, but on aldan > I created a ZFS filesystem for it.> The copying was started thus:> root at aldan:/home (435) cp -Rpn /mnt/* .> for a while this was proceeding at a decent clip with cp making > newnfsreq-uests:> load: 0.78 cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > 100% > load: 1.23 cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > 100%> ZFS on the destination compressing and writing stuff out and the traffic > between the two ranging from 30 to 50Mb/s (according to systat), but > then something happened and the cp-process is now hung:> load: 0.55 cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k > load: 0.50 cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k > load: 0.22 cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300kThis normally means that the process is opening a fifo for reading and is waiting for a writer. Although cp -R will normally copy a fifo by calling mkfifo at the destination, it may open one if a regular file is replaced with a fifo between the time it reads the directory and it copies that file. This is not that unlikely if large directory trees are copied during that time. On the other hand, cp without -R/-r/-l/-s will always open a fifo. You can make cp continue by opening the fifo (which you'll need to find first, for example by checking what has been copied already) for writing, like : >/path/to/some/fifo. It will be replaced with an empty file at the destination. -- Jilles Tjoelker
On Sat, Nov 28, 2015 at 10:42:28AM -0500, Mikhail T. wrote:> I was copying /home from an old server (narawntapu) to a new one > (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags > ro,intr. On narawntapu /home was simply located on an SSD, but on aldan > I created a ZFS filesystem for it. > > The copying was started thus: > > root at aldan:/home (435) cp -Rpn /mnt/* . > > for a while this was proceeding at a decent clip with cp making > newnfsreq-uests: > > load: 0.78 cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S > 100% > load: 1.23 cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k > /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > -> > ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S > 100% > > ZFS on the destination compressing and writing stuff out and the traffic > between the two ranging from 30 to 50Mb/s (according to systat), but > then something happened and the cp-process is now hung: > > load: 0.55 cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k > load: 0.50 cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k > load: 0.22 cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k ># grep -r fifoor /usr/src/ /usr/src/sys/fs/fifofs/fifo_vnops.c: PDROP | PCATCH | PSOCK, "fifoor", 0); May be cp try copy fifo's content, by incorrectly detect special file?