ZFS / NFS deadlock??? (Was: Re: Unexpected reboot after ctld run into trouble.)
Willem Jan Withagen
wjw at digiware.nl
Wed May 20 15:43:26 UTC 2015
On 16/05/2015 14:25, Willem Jan Withagen wrote:
> Hi,
>
> Found the following in my logs:
> Lots of
> ----
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (740688 occurrences)
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (713721 occurrences)
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (691776 occurrences)
> ----
>
> Which then ends in:
> ----
> panic: deadlkres: possible deadlock detected for 0xfffff8001ee94920,
> blocked for 1801009 ticks
>
>
> cpuid = 1
> Uptime: 14d13h13m47s
> Dumping 7557 out of 8175
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%Table 'FACP' at
> 0xcfedbcf8
> ----
> The system is running ZFS with ZFS-on-root: FreeBSD zfs.digiware.nl
> 10.1-STABLE FreeBSD 10.1-STABLE #221 r282282: Fri May 1 06:51:41
> CEST 2015
>
> This could stem from the fact that I woke up my Win8 PC, which has an
> iSCSI volume mounted. It is used to store security cam captures
> and does see somewhat heavier traffic.
>
> Suggestions or questions to look at are welcome.
> I do have a core in /var/crash, but will need some guidance to
> retrieve stuff from it.
Follow-up to this story, after some discussion with and debugging by
Edward (trasz@):
>> Now, the bad news: I don't think I'll be able to help you with this
>> one. It looks like the problem is actually NFS-related. Using the
>> hex address from the deadlock message in dmesg:
>>
>> % kgdb boot/kernel/kernel vmcore.3
>>
>> (kgdb) p ((struct thread *)0xfffff8001ee94920)->td_proc->p_comm
>> $6 = "nfsd", '\0' <repeats 15 times>
>> (kgdb) p ((struct thread *)0xfffff8001ee94920)->td_wmesg
>> $7 = 0xffffffff80edcfc3 "zfs"
>>
>> So it might actually be a ZFS deadlock the nfsd thread tripped on.
> The panic was triggered by deadlkres; it noticed that there was a
> thread that spent way too much time waiting for something - so,
> presumably, it became "hung" due to a deadlock.
> The 0xfffff8001ee94920 in dmesg is the address of the "struct thread"
> of the problematic thread.
> The first print shows the "command name" (p_comm) of the process the
> thread belongs to. The second print shows the "wait channel" on
> which the thread slept.
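
For anyone repeating these steps on their own crash dump, here is a
minimal Python sketch that pulls the thread address and tick count out
of the deadlkres panic line and prints the two kgdb commands quoted
above. The helper names (parse_deadlkres, kgdb_commands) are made up for
illustration, and the tick-to-seconds conversion assumes kern.hz is
1000, which may not hold on every system (check "sysctl kern.hz").

```python
import re

# Assumption: kern.hz is 1000; verify with `sysctl kern.hz` on the host.
ASSUMED_HZ = 1000

# Matches the panic text even when dmesg wraps it across two lines.
PANIC_RE = re.compile(
    r"deadlkres: possible deadlock detected for (0x[0-9a-f]+),\s*"
    r"blocked for (\d+) ticks"
)


def parse_deadlkres(message, hz=ASSUMED_HZ):
    """Return (thread_address, blocked_seconds), or None if no match."""
    m = PANIC_RE.search(message)
    if m is None:
        return None
    return m.group(1), int(m.group(2)) / hz


def kgdb_commands(addr):
    """Build the two kgdb print commands used earlier in this thread."""
    return [
        f"p ((struct thread *){addr})->td_proc->p_comm",
        f"p ((struct thread *){addr})->td_wmesg",
    ]


msg = ("panic: deadlkres: possible deadlock detected for 0xfffff8001ee94920,\n"
       "blocked for 1801009 ticks")
addr, seconds = parse_deadlkres(msg)
print(f"thread {addr} blocked for about {seconds / 60:.0f} minutes")
for cmd in kgdb_commands(addr):
    print(cmd)
```

Under that hz assumption, the 1801009 ticks in the panic message work
out to roughly 30 minutes of waiting before deadlkres gave up.
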
So now the questions are:
1) Is this indeed a ZFS / NFS deadlock problem?
2) Who can/will help to get this worked out?
Thanx,
--WjW