nosh init system

Rodney W. Grimes freebsd-rwg at pdx.rh.CN85.dnsmgr.net
Sun Feb 10 22:41:09 UTC 2019


> On Sun, Feb 10, 2019, 11:34 AM Cy Schubert <Cy.Schubert at cschubert.com> wrote:
> 
> > In message <43C091FC-18ED-49DF-A488-784DC2329D52 at gmail.com>, Enji
> > Cooper writes:
> > > On Feb 9, 2019, at 20:20, Rodney W. Grimes
> > > <freebsd-rwg at pdx.rh.cn85.dnsmgr.net> wrote:
> > >
> > > >> In message
> > > >> <CAG6CVpWXOA6r_aJcefxQBu2QZxprf1ZpDoTb4eb2JSwWsE2m+g at mail.gmail.com>,
> > > >> Conrad Meyer writes:
> > > >>> Hi Cy,
> > > >>>
> > > >>>> On Sat, Feb 9, 2019 at 3:35 PM Cy Schubert
> > > >>>> <Cy.Schubert at cschubert.com> wrote:
> > > >>>> I don't see what's so "incredibly fragile" about rc(8). That's not to
> > > >>>> say there aren't better solutions, like SMF.
> > > >>>
> > > >>> Maybe "incredibly" as a choice of adjective is inappropriate.  I think
> > > >>> we (you, me, and ngie@) can all agree it is somewhat fragile, and
> > > >>> there are things SMF/systemd/nosh get right that rc(8) does not
> > > >>> (today).  Anyway, your next paragraph goes on to be a good start at
> > > >>> describing some of rc's fragility.  :-)
> > > >>>
> > > >>>> Where rc(8) falls down is any port or a customer's (user of FreeBSD)
> > > >>>> rc script could fail hosing the boot or worse hosing the system*.
> > > >>>> Where a solution like SMF solves the problem is that should a service
> > > >>>> which other services depend on fail, only that branch of the startup
> > > >>>> tree would fail.
> > > >>>
> > > >>> Right; that's a great example.
> > > >>>
> > > >>>> In that scenario, if a service fails but sshd starts, a
> > > >>>> sysadmin would still be able to log in remotely to resolve the problem.
> > > >>>> So in this regard rc(8) is at a disadvantage.
> > > >>>>
> > > >>>> We could address the above paragraph by starting sshd earlier during
> > > >>>> boot thereby allowing the opportunity to fix remotely.
> > > >>>
> > > >>> I don't think that is really sufficient without substantially
> > > >>> modifying init+rc to be closer to something like systemd or SMF,
> > > >>> anyway.  And then we'd rather just have something like SMF :-).
> > > >>
> > > >> I'd rather see SMF but a number felt a CDDL licensed init was
> > > >> unacceptable -- except for the fact that SMF doesn't replace init.
> > > >>
> > > >>>
> > > >>> As soon as *any* rc service fails to start (signal, non-zero exit,
> > > >>> stop_boot), rc(8) exits non-zero, causing init(8) to go to single
> > > >>> user.  All service state is thrown away with rc(8) exit, but any rc.d
> > > >>> "services" that managed to start before boot failed are not
> > > >>> terminated.  Even if an admin manages to log in and fix the
> > > >>> configuration, re-starting rc(8) restarts the runcom process from
> > > >>> scratch, as if nothing had already been done, without first stopping
> > > >>> anything that was already running.  The only safe, reproducible way to
> > > >>> re-start rc(8) is to fully reboot the system.
> > > >
> > > > It -should- be safe to restart rc, as rc scripts should check to
> > > > see if the item they are being requested to start is already running.
> > > > rc scripts that fail to have this check are defective and should be
> > > > fixed.  You should be able to invoke /etc/rc.d/foo start as many
> > > > times as you want in a row and only get 1 instance of foo, with the
> > > > other starts returning "foo already running".  Same with stop.
> > >
> > > I'm not sure if Conrad is referring to the Isilon way of restarting
> > > services. If so, the Isilon parallel start process would effectively wipe
> > > the slate clean and restart everything if interrupted, which (because of
> > > the nature of cleanvar, etc.) would wipe out any and all pidfiles,
> > > resulting in a weird set of services which fail to start on the next run
> > > through.
> > >
> > > In short, I think the fact that Isilon didn't mount tmpfs to /var/run
> > > was begging for pain, as it's a directory one should only set up once at
> > > boot.
> >
> > Regardless of whether they use tmpfs or not, services should be
> > constructed in a manner such that it should still work if the customer
> > chooses not to use tmpfs.
> >
> 
> Correct. If we require this, that's a bug.
> 
> > This also goes for those who mount /usr separately like I do (which has
> > saved my bacon as recently as a couple of weeks ago). A change made to
> > one of the RC scripts assumed /usr was on rootfs. (When I raised the
> > issue the reply was "you should /usr on / anyway.") My point is that we
> > assume our way of setting up a server is the only way and we bulldoze.
> > In reality FreeBSD and prior to that commercial UNIX were set up
> > variously. It's only since Linux became so popular that it has been
> > assumed that one size fits all.
> >
> > These are two examples of why this approach doesn't work. POLA is
> > painful.
> >
> 
> This would also be a bug. I'd just fix the bug. I know people don't want to
> think of these things, but we still support separate filesystems. Saying
> not to run that way is lame and unhelpful.

Then I'll don my Nomex and jump in: separate /usr is
rather seriously broken and neglected, to the point that diskless
booting with a separate /usr is marginal, and I actually gave
up fighting it and merged my / to /usr on the diskless server.

I would really like to see this fixed so that I can undo that merge.

> > > That being said, there are other pseudo services that aren't
> > > necessarily idempotent. If they run twice, the second run could result in
> > > breakage to other dependent services run after them.
> >
> > Cleanvar being the focus of much of our discussion should be able to
> > determine it has run before.
> >
> > I'm purposely not discussing implementation details.
> >
> 
> Yea. That's also a sloppy bug. In this case, there is no concept of
> restarting... we want to run it only once... maybe that is the real bug
> here: we don't adequately have a way to express that notion.
> 
> Of course the bigger issue is that this is the sort of thing you want to be
> 100% sure is done before anything that depends on it runs. When you have a
> complicated topology like our start graph, that makes doing stuff in
> parallel hard.

We do not have to wait for fsck any more, which was a huge
upside; even parallel fsck was at the mercy of your largest
partition.

Doesn't the OpenRC thing have this parallel startup stuff in it?
And what happened to that FPC to move forward on that?
Did it end up in the "lacks enough round tuits" basket?
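
For what it is worth, the "only that branch of the startup tree would fail" behaviour quoted earlier, combined with starting independent services in waves, can be sketched in a few lines. This is only an illustration under assumptions: it is not how rc(8), rcorder(8), SMF, nosh or OpenRC are implemented, and the function name start_in_waves, the start callback and the service names are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_in_waves(requires, start):
    """Start services in dependency order, isolating failures to a branch.

    requires: dict mapping a service name to the set of services it needs.
    start:    hypothetical callback, start(name) -> True on success.
    Returns three lists: (started, failed, skipped).
    """
    sorter = TopologicalSorter(requires)
    sorter.prepare()
    started, failed, skipped = [], [], []
    unavailable = set()  # services that failed or were skipped

    while sorter.is_active():
        # Everything returned here has all of its prerequisites processed,
        # so one wave could be started in parallel. This sketch runs the
        # wave sequentially, in sorted order, to keep the output stable.
        for name in sorted(sorter.get_ready()):
            if unavailable & set(requires.get(name, ())):
                # A prerequisite is down: skip only this dependent branch.
                skipped.append(name)
                unavailable.add(name)
            elif start(name):
                started.append(name)
            else:
                failed.append(name)
                unavailable.add(name)
            sorter.done(name)
    return started, failed, skipped


if __name__ == "__main__":
    # Hypothetical start graph: one broken port script and its dependent.
    requires = {
        "cleanvar": set(),
        "netif": set(),
        "sshd": {"netif", "cleanvar"},
        "broken_port": {"cleanvar"},
        "port_client": {"broken_port"},
    }
    print(start_in_waves(requires, lambda name: name != "broken_port"))
```

In this made-up graph the failing port script takes down only the service that requires it, while sshd still comes up, which is the property being asked for above. The hard part mentioned in the thread, being 100% sure a run-once step such as cleanvar has completed before anything that depends on it starts, is what the wave ordering stands in for here.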

-- 
Rod Grimes                                                 rgrimes at freebsd.org


More information about the freebsd-hackers mailing list