cp from NFS to ZFS hung in "fifoor"
Rick Macklem
rmacklem at uoguelph.ca
Sat Nov 28 21:53:41 UTC 2015
Mikhail T. wrote:
> I was copying /home from an old server (narawntapu) to a new one
> (aldan). The narawntapu:/home is mounted on aldan as /mnt with flags
> ro,intr. On narawntapu /home was simply located on an SSD, but on aldan
> I created a ZFS filesystem for it.
>
> The copying was started thus:
>
> root at aldan:/home (435) cp -Rpn /mnt/* .
>
> for a while this was proceeding at a decent clip with cp making
> newnfsreq-uests:
>
> load: 0.78 cmd: cp 38711 [newnfsreq] 802.84r 1.57u 140.63s 20% 10768k
> /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
> ->
> ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/sent/cur/1219621413.32392.hd8cl:2,S
> 100%
> load: 1.23 cmd: cp 38711 [newnfsreq] 874.19r 1.66u 154.74s 17% 4576k
> /mnt/mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
> ->
> ./mi/.kde/share/apps/kmail/dimap/.42838394.directory/ML/cur/1219595347.32392.rMDFf:2,S
> 100%
>
> ZFS on the destination compressing and writing stuff out and the traffic
> between the two ranging from 30 to 50Mb/s (according to systat), but
> then something happened and the cp-process is now hung:
>
> load: 0.55 cmd: cp 38711 [fifoor] 1107.67r 2.09u 194.12s 0% 3300k
> load: 0.50 cmd: cp 38711 [fifoor] 1112.66r 2.09u 194.12s 0% 3300k
> load: 0.22 cmd: cp 38711 [fifoor] 1642.37r 2.09u 194.12s 0% 3300k
>
Doing `ps axHl` will show you what the ``cp`` process is stuck on (WCHAN).
If it is down inside ZFS, then I suspect it is ZFS resource related. If
it is stuck somewhere in NFS or the kernel RPC, then I`d suspect a net
driver issue:
- The number 1 issue for net drivers vs NFS is TSO, so disabling TSO is
the first thing to try (if the processes aren`t stuck inside zfs).
(In the machine that is sending data, since it is a transmit segment limit
problem. If I understood what you were doing, that would be the NFS server,
but I`d disable it on both server and client.)
- If that doesn`t fix it, try rsize=32768,wsize=32768 mount options for the
NFS mount.
These TSO issues are slowly getting resolved, but some drivers may still be
broken, especially if you aren`t running head. (For example, only a very recent
em(4) driver is fixed.)
You can also do things like `netstat -m` to look for mbuf cluster exhaustion
and look at the stats for your net driver (usually a sysctl).
Good luck with it, rick
> There is nothing in the logs on the new system, but the old one has a
> number of entries like:
>
> Nov 28 10:28:45 narawntapu kernel: sonewconn: pcb
> 0xfffff80086231930: Listen queue overflow: 8 already in queue
> awaiting acceptance (62 occurrences)
> Nov 28 10:29:45 narawntapu kernel: sonewconn: pcb
> 0xfffff80086231930: Listen queue overflow: 8 already in queue
> awaiting acceptance (50 occurrences)
> Nov 28 10:30:46 narawntapu kernel: sonewconn: pcb
> 0xfffff80086231930: Listen queue overflow: 8 already in queue
> awaiting acceptance (59 occurrences)
> Nov 28 10:31:46 narawntapu kernel: sonewconn: pcb
> 0xfffff80086231930: Listen queue overflow: 8 already in queue
> awaiting acceptance (57 occurrences)
> Nov 28 10:32:46 narawntapu kernel: sonewconn: pcb
> 0xfffff80086231930: Listen queue overflow: 8 already in queue
> awaiting acceptance (68 occurrences)
>
> Both systems are largely idle now. I'm not in a hurry -- is anybody
> interested in investigating it in situ? What is "fifoor" -- does this
> point to a trouble in the ZFS, the NFS-client, or the NFS-server? Both
> systems run FreeBSD/amd64 of recent 10.x-vintage.
>
> Thanks!
>
> -mi
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>
More information about the freebsd-stable
mailing list