head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

Mark Millard marklmi at yahoo.com
Tue Oct 23 01:43:29 UTC 2018


[I'm unable to reproduce the under-Hyper-V early kernel
crash for a WITH_ZFS= (implicit) build that includes the
for-loaders patch I was given to try.]

On 2018-Oct-22, at 10:01 AM, Mark Millard <marklmi at yahoo.com> wrote:

> [I will note that the loader problem has been shown to
> not be involved in the kernel problem that this
> "Subject:" was originally for.]
> 
> On 2018-Oct-22, at 9:26 AM, Warner Losh <imp at bsdimp.com> wrote:
> 
>> On Mon, Oct 22, 2018 at 6:39 AM Mark Millard <marklmi at yahoo.com> wrote:
>>> On 2018-Oct-22, at 4:07 AM, Toomas Soome <tsoome at me.com> wrote:
>>> 
>>>> On 22 Oct 2018, at 13:58, Mark Millard <marklmi at yahoo.com> wrote:
>>>>> 
>>>>> On 2018-Oct-22, at 2:27 AM, Toomas Soome <tsoome at me.com> wrote:
>>>>>> 
>>>>>>> On 22 Oct 2018, at 06:30, Warner Losh <imp at bsdimp.com> wrote:
>>>>>>> 
>>>>>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh <imp at bsdimp.com> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>>>>>>> freebsd-stable at freebsd.org> wrote:
>>>>>>>> 
>>>>>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>>>>>>> after installing the build, Hyper-V based boots are
>>>>>>>>> working.]
>>>>>>>>> 
>>>>>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>>>>>> . . .
>>>>>>> 
>>>>>> 
>>>>>> It would help to get output from loader lsdev -v command.
>>>>> 
>>>>> That turned out to be very interesting: The non-ZFS loader
>>>>> crashes during the listing, during disk8, which shows a
>>>>> x0 instead of a x512.
>>>>> 
>>>> 
>>>> Yes, that's the root cause there. The non-zfs loader only *reads* the boot disk; that's why the issue was not revealed there.
>>>> 
>>>> It would help to identify the sector size for that disk, at least from OS, so we can compare with what we can get from INT13.
>>>> 
>>>> I have a pretty good idea what to look for there, but I am afraid we need to run a few tests with you to understand why that disk is reporting sector size 0 there.
>>>> 
>>>> 
>>> 
>>> Looks like I guessed wrong about the device
>>> for "drive8".
>>> 
>>> So I unplugged the only other external
>>> storage device, so the original drives
>>> 0-13 become 0-11 overall.
>>> 
>>> The machine has a multi-LUN media card reader with
>>> no cards plugged in. It is built-in rather than
>>> one that I plugged into a port. It has 4 LUN's.
>>> 
>>> So 8+4=12 and drives 0-7 show up with media before
>>> it tries any of the 4 LUN's with no card in place.
>>> 
>>> I conclude that "drive8" is an empty LUN in a media
>>> card reader.
>>> 
>>> I conclude that there is no sector size available for
>>> any of the empty LUNs in the media reader.
>>> 
>> I think you are probably right and we're hitting some divide by 0 error when we should just ignore the disk.
> 
> In the Hyper-V context, the loader and kernel do not
> see the 4-LUN media reader at all: only drives with
> normal freebsd-* style partitions and free space.
> This explains why I did not see a loader problem
> in that context.
> 
> So I conclude that the kernel crash under Hyper-V
> associated with -r338807 is a separate issue even
> though WITHOUT_ZFS= seems to have avoided the
> crash.
> 
> My plan is to continue with the -r338807 investigation
> after the loader problem is fixed in my builds. Then
> I'll go back to trying builds using WITH_ZFS= (implicit),
> both native boots and Hyper-V based ones.

So much for my ability to make that inference correctly:

The WITH_ZFS= (implicit) build worked fine for booting
natively and via Hyper-V when the patch to fix the loaders
was included in the build. I'm now unable to reproduce
this kernel-time crash.

The patch was from: https://reviews.freebsd.org/D11174

The empty LUN's in the media reader now get messages that
look something like:

disk8: Read 1 sector(s) from 0 to 0xffffe000 (0x8000): 0x31

early in the loader activity.
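
For illustration only, the kind of guard that avoids a divide by 0
for a disk reporting sector size 0 might look like the sketch below.
The struct, its fields, and the function name are hypothetical; they
are not taken from the loader source or from D11174, and I do not
know that the patch takes this approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-disk data a loader might keep. */
struct disk_info {
	int      unit;        /* e.g. 8 for "disk8" */
	uint32_t sector_size; /* bytes per sector; 0 when no media is present */
	uint64_t size_bytes;  /* total media size in bytes */
};

/*
 * Store the number of sectors in *out and return 0, or return -1
 * without touching *out when the disk reports no usable sector size
 * (assumed here to be the empty-LUN case).
 */
int
disk_sector_count(const struct disk_info *d, uint64_t *out)
{
	assert(d != NULL && out != NULL);

	if (d->sector_size == 0)
		return (-1);	/* skip the disk instead of dividing by 0 */

	*out = d->size_bytes / d->sector_size;
	return (0);
}
```

A caller listing drives would treat a -1 return as "no media, ignore
this drive" and move on to the next one rather than aborting the
listing.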

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


