thr3ads.net - freebsd stable - cp from NFS to ZFS hung in "fifoor" [Nov 2015]

If this information is useful, please help other people find it:
Share via:

Mikhail T.

2015-Nov-28 15:42 UTC

cp from NFS to ZFS hung in "fifoor"

I was copying /home from an old server (narawntapu) to a new one
(aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
I created a ZFS filesystem for it.

The copying was started thus:

    root at aldan:/home (435) cp -Rpn /mnt/* .

for a while this was proceeding at a decent clip with cp making
newnfsreq-uests:

    load: 0.78  cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
   
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
    ->
   
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
    100%
    load: 1.23  cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
   
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
    ->
   
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
    100%

ZFS on the destination compressing and writing stuff out and the traffic
between the two ranging from 30 to 50Mb/s (according to systat), but
then something happened and the cp-process is now hung:

    load: 0.55  cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
    load: 0.50  cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
    load: 0.22  cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k

There is nothing in the logs on the new system, but the old one has a
number of entries like:

    Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb
    0xfffff80086231930: Listen queue overflow: 8 already in queue
    awaiting acceptance (62 occurrences)
    Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb
    0xfffff80086231930: Listen queue overflow: 8 already in queue
    awaiting acceptance (50 occurrences)
    Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb
    0xfffff80086231930: Listen queue overflow: 8 already in queue
    awaiting acceptance (59 occurrences)
    Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb
    0xfffff80086231930: Listen queue overflow: 8 already in queue
    awaiting acceptance (57 occurrences)
    Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb
    0xfffff80086231930: Listen queue overflow: 8 already in queue
    awaiting acceptance (68 occurrences)

Both systems are largely idle now. I'm not in a hurry -- is anybody
interested in investigating it in situ? What is "fifoor" -- does this
point to a trouble in the ZFS, the NFS-client, or the NFS-server? Both
systems run FreeBSD/amd64 of recent 10.x-vintage.

Thanks!

    -mi

Rick Macklem

2015-Nov-28 21:53 UTC

head link

cp from NFS to ZFS hung in "fifoor"

Mikhail T. wrote:> I was copying /home from an old server (narawntapu) to a new one
> (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
> ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
> I created a ZFS filesystem for it.
> 
> The copying was started thus:
> 
>     root at aldan:/home (435) cp -Rpn /mnt/* .
> 
> for a while this was proceeding at a decent clip with cp making
> newnfsreq-uests:
> 
>     load: 0.78  cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     100%
>     load: 1.23  cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     100%
> 
> ZFS on the destination compressing and writing stuff out and the traffic
> between the two ranging from 30 to 50Mb/s (according to systat), but
> then something happened and the cp-process is now hung:
> 
>     load: 0.55  cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
>     load: 0.50  cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
>     load: 0.22  cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k
> Doing `ps axHl` will show you what the ``cp`` process is stuck on (WCHAN).
If it is down inside ZFS, then I suspect it is ZFS resource related. If
it is stuck somewhere in NFS or the kernel RPC, then I`d suspect a net
driver issue:
- The number 1 issue for net drivers vs NFS is TSO, so disabling TSO is
  the first thing to try (if the processes aren`t stuck inside zfs).
  (In the machine that is sending data, since it is a transmit segment limit
   problem. If I understood what you were doing, that would be the NFS server,
   but I`d disable it on both server and client.)
  - If that doesn`t fix it, try rsize=32768,wsize=32768 mount options for the
    NFS mount.

These TSO issues are slowly getting resolved, but some drivers may still be
broken, especially if you aren`t running head. (For example, only a very recent
em(4) driver is fixed.)

You can also do things like `netstat -m` to look for mbuf cluster exhaustion
and look at the stats for your net driver (usually a sysctl).

Good luck with it, rick
> There is nothing in the logs on the new system, but the old one has a
> number of entries like:
> 
>     Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (62 occurrences)
>     Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (50 occurrences)
>     Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (59 occurrences)
>     Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (57 occurrences)
>     Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (68 occurrences)
> 
> Both systems are largely idle now. I'm not in a hurry -- is anybody
> interested in investigating it in situ? What is "fifoor" -- does
this
> point to a trouble in the ZFS, the NFS-client, or the NFS-server? Both
> systems run FreeBSD/amd64 of recent 10.x-vintage.
> 
> Thanks!
> 
>     -mi
> 
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at
freebsd.org"
>

Jilles Tjoelker

2015-Nov-28 22:41 UTC

head link

cp from NFS to ZFS hung in "fifoor"

On Sat, Nov 28, 2015 at 10:42:28AM -0500, Mikhail T.
wrote:> I was copying /home from an old server (narawntapu) to a new one
> (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
> ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
> I created a ZFS filesystem for it.
> The copying was started thus:
>     root at aldan:/home (435) cp -Rpn /mnt/* .
> for a while this was proceeding at a decent clip with cp making
> newnfsreq-uests:
>     load: 0.78  cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     100%
>     load: 1.23  cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     100%
> ZFS on the destination compressing and writing stuff out and the traffic
> between the two ranging from 30 to 50Mb/s (according to systat), but
> then something happened and the cp-process is now hung:
>     load: 0.55  cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
>     load: 0.50  cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
>     load: 0.22  cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k
This normally means that the process is opening a fifo for reading and
is waiting for a writer. Although cp -R will normally copy a fifo by
calling mkfifo at the destination, it may open one if a regular file is
replaced with a fifo between the time it reads the directory and it
copies that file. This is not that unlikely if large directory trees are
copied during that time.

On the other hand, cp without -R/-r/-l/-s will always open a fifo.

You can make cp continue by opening the fifo (which you'll need to find
first, for example by checking what has been copied already) for
writing, like : >/path/to/some/fifo. It will be replaced with an empty
file at the destination.

-- 
Jilles Tjoelker

Slawa Olhovchenkov

2015-Nov-29 12:00 UTC

head link

cp from NFS to ZFS hung in "fifoor"

On Sat, Nov 28, 2015 at 10:42:28AM -0500, Mikhail T. wrote:
> I was copying /home from an old server (narawntapu) to a new one
> (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
> ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
> I created a ZFS filesystem for it.
> 
> The copying was started thus:
> 
>     root at aldan:/home (435) cp -Rpn /mnt/* .
> 
> for a while this was proceeding at a decent clip with cp making
> newnfsreq-uests:
> 
>     load: 0.78  cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     100%
>     load: 1.23  cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
>    
/mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     ->
>    
./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     100%
> 
> ZFS on the destination compressing and writing stuff out and the traffic
> between the two ranging from 30 to 50Mb/s (according to systat), but
> then something happened and the cp-process is now hung:
> 
>     load: 0.55  cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
>     load: 0.50  cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
>     load: 0.22  cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k
> 
# grep -r fifoor /usr/src/
/usr/src/sys/fs/fifofs/fifo_vnops.c:              PDROP | PCATCH | PSOCK,
"fifoor", 0);

May be cp try copy fifo's content, by incorrectly detect special file?

freebsd stable - Nov 2015 - cp from NFS to ZFS hung in "fifoor"

cp from NFS to ZFS hung in "fifoor"

cp from NFS to ZFS hung in "fifoor"

cp from NFS to ZFS hung in "fifoor"

cp from NFS to ZFS hung in "fifoor"