file system deadlock in RELENG_11

Mike Tancsa mike at sentex.net
Thu Aug 24 15:48:45 UTC 2017


I upgraded a server yesterday from a March 2017 RELENG_11 to r322800
(Aug 22) and noticed that the server locks up under heavy disk IO in a
VM.  In the VM, I was doing a large untar, and I noticed that prior to
the lockup the hypervisor was struggling to keep up with the disk
writes.  The VM is on a zvol, if that makes any difference.  A few
times, IO in the VM was clogged to the point that the disk would time
out in the VM:

Aug 24 08:32:02 kernel: ahcich6: Timeout on slot 14 port 0
Aug 24 08:32:02 kernel: ahcich6: is 00000000 cs 00000000 ss ffffffff rs ffffffff tfd 50 serr 00000000 cmd 0001db17
Aug 24 08:32:02 kernel: (ada1:ahcich6:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 a8 47 d8 40 01 00 00 01 00 00
Aug 24 08:32:02 kernel: (ada1:ahcich6:0:0:0): CAM status: Command timeout
Aug 24 08:32:02 kernel: (ada1:ahcich6:0:0:0): Retrying command

When the parent deadlocks, I can't run anything that's not already in
RAM.  shutdown doesn't work, and I have to reboot the box via IPMI.

Any ideas on how to debug this, or how to better understand the problem
so I can at least work around it?

	---Mike

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

