Re: [List] Re: service jail start xyz returns success - when it shouldn't!

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Sat, 25 Oct 2025 12:14:23 UTC
On 20/10/2025 18:19, James Gritton wrote:
> On 2025-10-20 09:31, Frank Leonhardt wrote:
>> So I accidentally installed an ARM base system in an AMD64 jail and 
>> tried to start it. It doesn't work, no one will be surprised to know.
>>
>> The thing is I'd expect service to return non-zero if the jail it's 
>> starting crashes and burns. To quote the FM:
>>
>> "The service utility exits 0 on success, and >0 if an error occurs."
>>
>> A pretty major error, but it still returns success. Now I can see why 
>> this might be considered success - the "service" utility did what it 
>> was told and the service itself went badly wrong. But it's not 
>> exactly helpful if you want to know if your jail has started 
>> properly, is it?
>>
>> My question - is this a bug, or can someone explain why it's actually 
>> a feature?
>
> The service in question is that of creating jails and starting them 
> up.  So I'd call it one more layer removed: the service itself 
> (creating a jail) went fine.  The jail was created and it reported 
> success.  Then the jail's main process (presumably "sh /etc/rc") 
> failed, so the jail crashed and burned.  But that's not the service; 
> that's just whatever the service was trying to do. The service 
> reported back "OK, I've readied my child for the world, and sent it 
> out the door."  That the child took one step out the door and was hit 
> by a bus is neither here nor there.
>
> From a practical standpoint, there's no good way to report the failure 
> of a jail.  A successfully created jail runs for an arbitrarily long 
> time, and that is called success.  Failure could happen immediately, 
> or it could also take a long time.  Even if you choose to wait, jails 
> have no wait(2) call with an exit status like processes do.  They're 
> like daemons that way; you have to judge success or failure on your 
> own by looking at logs, or seeing if what you expect to be running is 
> in fact still running.
>
Thanks. That's how I figured it too, which makes the return code as 
useful as a chocolate teapot. I think most people would expect "service 
start" to return success if the service was actually started, not that 
an attempt to start it had been made. The service script simply execs 
the rc script, so its exit status is whatever the rc script returns. 
Looking at a few of the rc scripts, some return a failure if they fail 
to start a service; others return the exit status of the last command 
they ran, typically a final echo, which is almost always 0. The jail 
script is one of the latter: it prints a "crashed and burnt" message 
and then returns whatever that last command returned, most likely 0.
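To illustrate the "final echo" case, here is a minimal sh sketch. The functions do_start, start_unchecked and start_checked are hypothetical stand-ins, not code taken from /etc/rc.d/jail; the point is only that a shell function with no explicit return reports the status of the last command it ran.

```shell
#!/bin/sh
# Hypothetical stand-in for the real work (e.g. creating the jail); it fails.
do_start() {
    return 1
}

# Pattern 1: print a message and fall off the end of the function.
start_unchecked() {
    do_start
    echo "xyz: failed to start"
    # No explicit return: the function's status is that of the last
    # command it ran, the echo, so the failure above is lost.
}

# Pattern 2: test the real work's status and return the failure explicitly.
start_checked() {
    if ! do_start; then
        echo "xyz: failed to start"
        return 1
    fi
}
```

Running `start_unchecked; echo $?` prints the message followed by 0, while `start_checked; echo $?` prints the same message followed by 1.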

The base system /etc/rc.d scripts do not seem to be consistent at all 
in how they work or what they return, whereas the documentation 
confidently states that service will return a meaningful code. If this 
were a "feature" then it would have to be consistent with the 
documentation, so I'm putting it down as a bug (in either the 
documentation or half the /etc/rc.d scripts).

You make a good point that a jail (or any other service) might start and 
then stop shortly afterwards by design. As these were ARM64 binaries, not 
a single instruction was executed, so I'd say this one lands on the 
"failed" side. But perhaps, as you suggest, total success isn't so easy 
to define: a service can start running and then shut down because of an 
error during initialisation, and the script can't tell other than by 
waiting for, say, 15 seconds and checking that it's still there.

For practical purposes, and for anyone reading this in the future and 
looking for an answer: running "jls -j <jailname>" afterwards gives a 
meaningful exit status, 0 if the jail is running and non-zero if it 
isn't.
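A minimal sh sketch of that check, with the "wait a while, then look" idea from above folded in. It assumes, as described here, that jls(8) exits non-zero when -j names a jail that isn't running, and that a jail which failed to start no longer shows up. The names jail_running and jail_started_ok and the 15-second default are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Succeeds only if jail "$1" is currently running. Assumes jls(8) exits
# non-zero when -j names a jail that does not exist.
jail_running() {
    jls -j "$1" >/dev/null 2>&1
}

# Hypothetical helper: give the jail's startup a grace period (default
# 15 seconds, an arbitrary choice), then check that it is still there.
jail_started_ok() {
    sleep "${2:-15}"
    jail_running "$1"
}

# Usage on a FreeBSD host (needs root and a configured jail "xyz"):
#   service jail start xyz && jail_started_ok xyz || echo "xyz did not stay up" >&2
```

The delay only narrows the window; as noted earlier in the thread, a jail can still fail at any later point, so this is a start-up check, not ongoing monitoring.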

Regards, Frank.