cp from NFS to ZFS hung in "fifoor"

Sat Nov 28 21:53:41 UTC 2015

Mikhail T. wrote:
> I was copying /home from an old server (narawntapu) to a new one
> (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
> ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
> I created a ZFS filesystem for it.
> 
> The copying was started thus:
> 
>     root@aldan:/home (435) cp -Rpn /mnt/* .
> 
> For a while this was proceeding at a decent clip with cp making
> newnfsreq-uests:
> 
>     load: 0.78  cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
>     /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     ->
>     ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
>     100%
>     load: 1.23  cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
>     /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     ->
>     ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
>     100%
> 
> ZFS on the destination was compressing and writing stuff out, and the
> traffic between the two ranged from 30 to 50Mb/s (according to systat),
> but then something happened and the cp process is now hung:
> 
>     load: 0.55  cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
>     load: 0.50  cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
>     load: 0.22  cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k
> 
Doing `ps axHl` will show you what the cp process is stuck on (the WCHAN
column). If it is down inside ZFS, then I suspect it is ZFS resource related.
If it is stuck somewhere in NFS or the kernel RPC, then I'd suspect a net
driver issue:
- The number 1 issue for net drivers vs NFS is TSO, so disabling TSO is
  the first thing to try (if the processes aren't stuck inside ZFS).
  (Do this on the machine that is sending data, since it is a transmit
  segment limit problem. If I understood what you were doing, that would be
  the NFS server, but I'd disable it on both server and client.)
- If that doesn't fix it, try the rsize=32768,wsize=32768 mount options for
  the NFS mount.
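
The checks above could be scripted roughly as follows. This is only a
sketch: wchan_of is a made-up helper name, em0 is a placeholder for
whatever interface the machines actually use, and the awk code assumes the
ps header labels the wait channel column MWCHAN (or WCHAN).

```shell
#!/bin/sh
# wchan_of PID: read `ps axHl` output on stdin and print the wait channel
# of every thread belonging to PID. The columns are located by header
# name, so the exact column order does not matter.
wchan_of() {
    awk -v pid="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") pidcol = i
                if ($i == "MWCHAN" || $i == "WCHAN") wcol = i
            }
            next
        }
        pidcol && wcol && $pidcol == pid { print $wcol }
    '
}

# Typical use on the NFS client (38711 is the cp PID from the report):
#   ps axHl | wchan_of 38711
#
# If the thread is stuck in NFS/RPC rather than ZFS, the next steps would
# be (em0 is an assumed interface name, adjust to the real one):
#   ifconfig em0 -tso
#   mount -t nfs -o ro,intr,rsize=32768,wsize=32768 narawntapu:/home /mnt
```

The helper only filters text, so it can also be run against a saved copy of
the ps output.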

These TSO issues are slowly getting resolved, but some drivers may still be
broken, especially if you aren't running head. (For example, only a very recent
em(4) driver is fixed.)

You can also do things like `netstat -m` to look for mbuf cluster exhaustion
and look at the stats for your net driver (usually a sysctl).
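
For example, a small filter like the one below could flag exhaustion in the
`netstat -m` output. It is a sketch that assumes the "denied" lines start
with slash-separated counters (as in "0/0/0 requests for mbufs denied");
mbuf_denied is a made-up name, and dev.em.0 is only where an em(4)
interface would keep its stats.

```shell
#!/bin/sh
# mbuf_denied: read `netstat -m` output on stdin and print every "denied"
# line whose leading a/b/c counters contain a non-zero value.
mbuf_denied() {
    awk '/denied/ {
        n = split($1, c, "/")
        for (i = 1; i <= n; i++) {
            if (c[i] + 0 > 0) { print; break }
        }
    }'
}

# Typical use:
#   netstat -m | mbuf_denied
# Driver statistics for an assumed em(4) interface, unit 0:
#   sysctl dev.em.0
```

No output from the filter means no denied requests were recorded.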

Good luck with it, rick

> There is nothing in the logs on the new system, but the old one has a
> number of entries like:
> 
>     Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (62 occurrences)
>     Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (50 occurrences)
>     Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (59 occurrences)
>     Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (57 occurrences)
>     Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb
>     0xfffff80086231930: Listen queue overflow: 8 already in queue
>     awaiting acceptance (68 occurrences)
> 
> Both systems are largely idle now. I'm not in a hurry -- is anybody
> interested in investigating it in situ? What is "fifoor" -- does this
> point to trouble in ZFS, the NFS client, or the NFS server? Both
> systems run FreeBSD/amd64 of recent 10.x vintage.
> 
> Thanks!
> 
>     -mi
> 
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>