The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

Ken Merry ken at freebsd.org
Mon Jul 24 16:25:37 UTC 2017


It is possible that the change I MFCed today (r321207 in head, r321415 in stable/11) is related, but Mark will have to boot his machine with the fix to see if it makes any difference.

What happened in my case on one particular machine (not on most machines in our lab running the same code) was that mps_wait_command() / mpr_wait_command() would not wait the full 60 seconds for a write to the DPM (Driver Persistent Mapping) table in the controller.  So, it reported that there was a timeout.

There is a secondary bug that is still in the mps(4) / mpr(4) drivers when a timeout does happen — the error recovery code in the wait_command() routine reinitializes the controller, which clears out all the commands.  When the wait_command() routine returns, the command passed in has been freed, but the caller doesn’t know that.  So the caller (it happens in a number of places) dereferences a pointer to freed memory and the kernel panics.

I’m planning to fix that bug, too, if slm@ doesn’t get to it first; I’ve just had other bugs to fix first.

Eliminating bogus timeouts will eliminate almost all of the sources of those panics anyway.

Ken
-- 
Ken Merry
ken at FreeBSD.ORG



> On Jul 24, 2017, at 12:10 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
> 
> Based on your boot info you're using mps, so this could be related to the mps fix committed to stable/11 today by ken@
> https://svnweb.freebsd.org/changeset/base/321415
> 
> re@ cc'ed as this could cause hangs for others too on 11.1-RELEASE if this is the case.
> 
>     Regards
>     Steve
> 
> On 24/07/2017 15:55, Mark Martinec wrote:
>>> Thanks! Tried it, and the message (or a backtrace) does not show 
>>> during a boot of a generic (patched) kernel, at least not in 
>>> the last 40-lines screen before the hang occurs. 
>>> (It also does not show during a "Safe mode" successful boot.) 
>> 
>> Btw (may or may not be relevant): after the above experiment 
>> I have rebooted the machine in "Safe mode" (generic kernel, 
>> EARLY_AP_STARTUP enabled by default) - and spent some time 
>> doing non-intensive interactive work on this host (web browsing, 
>> editor, shell, all under KDE) - and after about an hour the 
>> machine froze (clock display not updating, keyboard unresponsive, 
>> console virtual terminals inaccessible) - so I had to reboot. 
>> According to the fan speed the machine was idle. 
>> The /var/log/messages does not show anything of interest 
>> before the freeze. All disks are under ZFS. 
>> 
>> Can EARLY_AP_STARTUP have an effect also _after_ booting? 
>> This host never hung during normal work when EARLY_AP_STARTUP 
>> was disabled (or with 11.0 and earlier). 
>> 
>>   Mark 
> 
