How does this weird boot timing bug come about: init rc scripts or zfs?

Rick Macklem rmacklem at uoguelph.ca
Sun Oct 4 21:28:11 UTC 2020


Meowthink wrote:
>Hello hackers,
>Recently I ran installworld and rebooted a server. It seems to work, but my
>kerberized nfsd, or more precisely gssd, is not functional.
>At first I thought it might be a bug in stable, so I did some trivial
>tests, replacing the kernel with the releng one, then the whole world, but
>found that this is not related to the kernel, and that it triggers randomly
>when rebooting with a recent stable/11 world (releng/11.4 seems fine).
>To dig deeper, here is what the console showed when it failed:
>```
>Oct  3 20:23:59 r kernel: Starting file system checks:
>Oct  3 20:23:59 r kernel: /etc/rc: WARNING: run_rc_command: cannot run
>/usr/sbin/gssd
>Oct  3 20:23:59 r kernel: Mounting local filesystems:.
>Oct  3 20:23:59 r kernel: Updating CPU Microcode...
>Oct  3 20:23:59 r kernel: Done.
>Oct  3 20:23:59 r kernel: Starting ctld.
>Oct  3 20:23:59 r kernel: ctld: bind(2) failed for [::]: Can't assign
>requested address
>Oct  3 20:23:59 r kernel: ctld: bind(2) failed for 0.0.0.0: Can't
>assign requested address
>Oct  3 20:23:59 r kernel: ctld: failed to apply configuration; exiting
>Oct  3 20:23:59 r kernel: /etc/rc: WARNING: failed to start ctld
>Oct  3 20:23:59 r kernel: ELF ldconfig path: /lib /usr/lib
>/usr/lib/compat /usr/local/lib /usr/dt/lib /usr/local/lib/compat
>/usr/local/lib/gcc9 /usr/local/lib/graphviz /usr/local/lib/nss
>/usr/local/lib/perl5/5.28/mach/CORE /usr/local/lib/pth
>/usr/local/lib/qt4 /usr/local/lib/qt5 /usr/local/lib/samba4
>/usr/local/llvm10/lib /usr/local/share/chromium
>Oct  3 20:23:59 r kernel: 32-bit compatibility ldconfig path:
>/usr/lib32 /usr/local/lib32/compat
>Oct  3 20:23:59 r kernel: Setting hostname: r.domain.net.
>Oct  3 20:23:59 r kernel: Setting up harvesting:
>[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
>Oct  3 20:23:59 r kernel: Feeding entropy: .
>Oct  3 20:23:59 r kernel: Starting Network: lo0 bge0 bge1.
>```
>Then I realized I have / and /usr on separate zfs file systems (same
>zpool). It may be that / was mounted but /usr was not. Thus I changed
>line 7 of my /etc/rc.d/gssd to # REQUIRE: mountcritlocal.
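For anyone following along, the dependency being edited here is the
REQUIRE comment that rcorder(8) reads from each script. A minimal sketch
for listing it, assuming a standard rc.d header; rc_requires is a made-up
helper name and the demo file is illustrative, not a copy of the real
/etc/rc.d/gssd:

```sh
#!/bin/sh
# Print the REQUIRE keywords declared in an rc.d script.
# rc_requires is a hypothetical helper, not part of rc.subr.
rc_requires() {
    sed -n 's/^#[[:space:]]*REQUIRE:[[:space:]]*//p' "$1"
}

# Demo on a throwaway file that mimics an rc.d header (contents assumed).
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
# PROVIDE: gssd
# REQUIRE: mountcritlocal
# KEYWORD: nojail shutdown
EOF
rc_requires "$demo"    # prints: mountcritlocal
rm -f "$demo"
```

On a FreeBSD box, running rcorder /etc/rc.d/* should show where a script
ends up in the resulting start order after such a change.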
I don't know what has changed post-11.4 to cause this.
However, the problem with adding "mountcritlocal" is that it assumes
/usr is a locally mounted file system and not NFS mounted nor a subtree
of "/".
--> I think a better solution might be to move gssd to /sbin, which should
      always be a part of the root fs. (/etc/rc.d/gssd already lists "root"
      in its REQUIRE line.)
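
To see whether a given machine is exposed to this, one can check which
file system holds the daemon's directory. This is only a sketch:
mount_point_of is a hypothetical helper, it relies on POSIX "df -P"
output, and it assumes mount points without spaces in their names.

```sh
#!/bin/sh
# mount_point_of (hypothetical helper): print the mount point of the
# file system that holds the given path, using POSIX "df -P" output.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

bin=/usr/sbin/gssd          # path taken from the console warning
dir=$(dirname "$bin")
if [ -d "$dir" ] && [ "$(mount_point_of "$dir")" != "/" ]; then
    echo "$dir is not on the root file system"
else
    echo "$dir is on the root file system (or does not exist)"
fi
```

If the directory is reported as not being on the root file system, a
script that only requires "root" can run before the binary is reachable.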

I have a similar problem with the rpc.tlsclntd daemon I have developed
for NFS-over-TLS.
- Neither sec=krb5[ip] nor tls can be used for an NFS mounted root,
  but I think we just have to live with that?

Do others have any suggestions? rick

>By the way, I also changed /etc/rc.d/ctld to # REQUIRE: netif.
>Everything works fine now, even after rebooting several times.
>What confuses me is how this happens. It seems that /etc/rc.d/gssd
>(and also /etc/rc.d/ctld) hasn't been changed since 2016. Both
>gssd and init in stable/11 have had no functional changes since
>releng/11.4. Maybe zfs? But it's in the kernel, and kernel r366306
>with a releng/11.4 world works (though I only tested a few times, since
>rebooting is tedious).
>Any ideas?
>
>Cheers,
>meowthink