Re: 14-current: unable to boot after upgrade (installworld)

From: Toomas Soome via freebsd-current <freebsd-current_at_freebsd.org>
Date: Fri, 10 Dec 2021 16:29:22 UTC

> On 9. Dec 2021, at 20:54, Sergey Dyatko <sergey.dyatko@gmail.com> wrote:
> 
> tiger@dl:~ % gpart show
> 
> =>        40  3907029088  da4  GPT  (1.8T)
>          40        1024    1  freebsd-boot  (512K)
>        1064         984       - free -  (492K)
>        2048  3907026944    2  freebsd-zfs  (1.8T)
>  3907028992         136       - free -  (68K)
> 
> =>        40  3907029088  da5  GPT  (1.8T)
>          40        1024    1  freebsd-boot  (512K)
>        1064         984       - free -  (492K)
>        2048  3907026944    2  freebsd-zfs  (1.8T)
>  3907028992         136       - free -  (68K)
> 
> =>        40  3907029088  da6  GPT  (1.8T)
>          40        1024    1  freebsd-boot  (512K)
>        1064         984       - free -  (492K)
>        2048  3907026944    2  freebsd-zfs  (1.8T)
>  3907028992         136       - free -  (68K)
> 
> =>        40  3907029088  da7  GPT  (1.8T)
>          40        1024    1  freebsd-boot  (512K)
>        1064         984       - free -  (492K)
>        2048  3907026944    2  freebsd-zfs  (1.8T)
>  3907028992         136       - free -  (68K)
> 
> i'm not sure about video, everything happens faster than I can see :-) but
> sometimes the system does not freeze and I can enter commands. Can this
> help in some way?
> 

Maybe, maybe not. On the one hand, the BTX register dump may help us identify the possible location or give other clues: eip=0000004c from your screenshot tells us that some structure with function pointers must have been corrupted, and this looks like a NULL pointer dereference caused by that corruption. So the investigation should try to identify what is causing the corruption.

Since it was booting before, does the old loader start? I see the iKVM window has a record menu entry; can it be used to record the whole incident?
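
To rule out a disk that was missed in step 3 of the quoted message below, the bootcode update over da4..da7 can be sketched as a small loop. This is only a sketch: the `update_bootcode` function and the `DRY_RUN` switch are names made up for illustration, and the gpart invocation is copied from the quoted command. By default it only prints the commands, so nothing is written to the disks.

```shell
#!/bin/sh
# Sketch: re-apply the boot code to each disk of the pool.
# update_bootcode and DRY_RUN are hypothetical names introduced here;
# DRY_RUN defaults to 1, so the loop only prints what it would run.
update_bootcode() {
    for D in "$@"; do
        # Same gpart invocation as in step 3 of the quoted message.
        bootcode_cmd="gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da${D}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$bootcode_cmd"
        else
            # Stop at the first disk where gpart reports a failure.
            $bootcode_cmd || return 1
        fi
    done
}

# Disks da4..da7, as shown in the gpart output above.
update_bootcode 4 5 6 7
```

Running it with `DRY_RUN=0` as root would actually write the boot code, so it is worth reading the printed commands first.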

rgds,
toomas


> Thu, 9 Dec 2021 at 18:19, Toomas Soome <tsoome@me.com> wrote:
> 
>> 
>> 
>>> On 9. Dec 2021, at 20:06, Sergey Dyatko <sergey.dyatko@gmail.com> wrote:
>>> 
>>> I was sure the installer did it when I reinstalled the system from
>>> scratch. I can load 14-current successfully after boot via PXE and
>>> installworld with 13-current.
>>> Now I did the following:
>>> 1) boot from HDDs FreeBSD 14.0-CURRENT #0 main-n251494-f953785b3df (with
>>> 'old' world)
>>> 2) run installworld (f953785b3df)
>>> 3) run `gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da${D}`
>>> command, where D=4..7
>>> root@dl:/usr/src # zpool status
>>> pool: zroot
>>> state: ONLINE
>>> config:
>>> 
>>>       NAME        STATE     READ WRITE CKSUM
>>>       zroot       ONLINE       0     0     0
>>>         mirror-0  ONLINE       0     0     0
>>>           da4p2   ONLINE       0     0     0
>>>           da5p2   ONLINE       0     0     0
>>>         mirror-1  ONLINE       0     0     0
>>>           da6p2   ONLINE       0     0     0
>>>           da7p2   ONLINE       0     0     0
>>> errors: No known data errors
>>> 
>>> After `shutdown -r now` the system still doesn't boot, with the same error.
>>> As far as I can see, /boot/lua/config.lua is present, but when I try to
>>> run the command `more /boot/lua/config.lua` the system hangs with the
>>> following error:
>>> https://imgur.com/5p0xu6W.png
>>> 
>> 
>> You seem to get 2x BTX panic. Could you try to record a video of the
>> console, so we could get the register dumps?
>>
>> Can this system do UEFI boot, and if so, can you test it? Could you post
>> the partition tables?
>> 
>> rgds,
>> toomas
>> 
>> 
>>> 
>>> Thu, 9 Dec 2021 at 15:56, Warner Losh <imp@bsdimp.com> wrote:
>>> 
>>>> On Thu, Dec 9, 2021 at 7:58 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
>>>> wrote:
>>>> 
>>>>> On Thu, 9 Dec 2021 13:36:10 +0000
>>>>> "Sergey V. Dyatko" <sergey.dyatko@gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Yesterday I tried to upgrade an old 13-current (svn rev r368473) to a fresh
>>>>>> 14-current from git. It looked like this:
>>>>>> 1) git pull https://git.freebsd.org/src.git /usr/src
>>>>>> 2) cd /usr/src ; make buildworld; make kernel
>>>>>> 3) shutdown -r now
>>>>>> after that I _successfully_ booted into 14-current and continued with
>>>>>> etcupdate -p
>>>>>> make installworld
>>>>>> etcupdate -B
>>>>>> shutdown -r now
>>>>>> 
>>>>>> but after that the server doesn't come back. After I connected to this
>>>>>> server via IPMI ip-kvm I saw the following (sorry for the external link):
>>>>>> https://i.imgur.com/jH6MHd2.png
>>>>>> 
>>>>>> Well. There was a migration to ZoL between r368473 and the current 'main'
>>>>>> branch, so I decided to install a fresh 14-current from snapshot
>>>>>> FreeBSD-14.0-CURRENT-amd64-20211202-610d908f8a6-251253 in order to avoid
>>>>>> possible problems
>>>>>> 
>>>>>> and again, after make kernel and reboot the OS runs, but after
>>>>>> installworld I ended up in the same situation
>>>>>> 
>>>>>> thoughts ?
>>>>>> 
>>>>>> --
>>>>>> wbr, Sergey
>>>>>> 
>>>>>> 
>>>>> 
>>>>> Bootcode should be updated.
>>>>> The procedure you wrote doesn't seem to update it.
>>>>> 
>>>> 
>>>> The posted error is one you get when you can't read the filesystem, which
>>>> is why you need to update boot blocks across the OpenZFS change.
>>>> 
>>>> Warner
>>>> 
>> 
>>