[Bug 235657] /usr/libexec/atrun race causes missed jobs

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Feb 11 05:46:15 UTC 2019


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235657

            Bug ID: 235657
           Summary: /usr/libexec/atrun race causes missed jobs
           Product: Base System
           Version: 12.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs at FreeBSD.org
          Reporter: karl at denninger.net

Created attachment 201915
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=201915&action=edit
Diff against /usr/src/libexec/atrun directory

I have no idea why this hasn't bitten people before, or isn't biting people
now, but it is biting me.

/usr/libexec/atrun is the "batch" job executor; it is run from cron, by default
every 5 minutes.

The code has an unlink call that removes old jobs from the queue.
Unfortunately, the queue code can select a job to run and call fork() to start
it, and the child can then give up the CPU before it opens the file containing
the job. The queue code (which is in the parent) can thus execute the unlink
before the child process gets the file open.  If this happens you get a "file
not found" error in the cron log and the job doesn't run.
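
For illustration only, here is a minimal sketch of the ordering described
above. It is not the atrun source: the function name child_opened_job(), the
temporary file, and the pipe used to hold the child back are all assumptions
made for the demo. The pipe stands in for the child losing the CPU right after
fork(), which makes the bad interleaving happen every time instead of
occasionally.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of atrun).
 * Returns 1 if the child opened the job file, 0 if open() failed with
 * ENOENT, and -1 on any other error.
 */
int child_opened_job(int parent_unlinks_first)
{
    char path[] = "/tmp/atjob.XXXXXX";
    int gate[2], status;
    pid_t pid;
    int fd = mkstemp(path);          /* stand-in for a queued job file */

    if (fd < 0)
        return -1;
    close(fd);
    if (pipe(gate) < 0) {
        unlink(path);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(gate[0]);
        close(gate[1]);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(gate[1]);
        /* Simulate "child gives up the CPU": wait until the parent is done. */
        if (read(gate[0], &c, 1) != 1)
            _exit(2);
        if (open(path, O_RDONLY) >= 0)
            _exit(0);
        _exit(errno == ENOENT ? 1 : 2);
    }

    close(gate[0]);
    if (parent_unlinks_first)
        unlink(path);                /* the racy order: unlink before the open */
    if (write(gate[1], "x", 1) != 1)
        status = 0;                  /* child sees EOF and exits with 2 */
    close(gate[1]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    unlink(path);                    /* cleanup; harmless if already removed */

    if (!WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

With parent_unlinks_first set, the child's open() fails with ENOENT, which
corresponds to the "file not found" symptom; with it clear, the open succeeds.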

The attached patch fixes the potential race by moving the unlink into the
child; it may not be the most elegant fix, but it works.  Unfortunately,
because of the code's structure (it performs multiple tests on the file to be
run for security reasons), there are multiple error exits, and the file must be
unlinked on each of them as well or atrun will try to run it repeatedly -- yet
it can't be unlinked immediately after it is opened, because some of the tests
require that it still be on the filesystem.
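
As a rough sketch of the shape of that fix (not the attached diff itself), the
child can funnel every exit through one cleanup label that does the unlink,
while the parent only waits. The names child_run_job() and spawn_job(), the
expected_uid parameter, and the particular lstat()/fstat() checks are
assumptions chosen for illustration; the real tests in atrun may differ.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical child body: returns 0 if the job passed its checks and
 * nonzero otherwise.  The file is unlinked on every path, but only after
 * the checks that need it to still exist on the filesystem.
 */
static int child_run_job(const char *path, uid_t expected_uid)
{
    struct stat lbuf, fbuf;
    int rc = 1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        goto out;
    /* Illustrative checks that need the path to still be present. */
    if (lstat(path, &lbuf) < 0 || fstat(fd, &fbuf) < 0)
        goto out;
    if (!S_ISREG(lbuf.st_mode))
        goto out;
    if (lbuf.st_dev != fbuf.st_dev || lbuf.st_ino != fbuf.st_ino)
        goto out;
    if (fbuf.st_nlink != 1 || fbuf.st_uid != expected_uid)
        goto out;
    rc = 0;                          /* a real executor would run the job here */
out:
    unlink(path);                    /* single cleanup point for all exits */
    if (fd >= 0)
        close(fd);
    return rc;
}

/* Hypothetical parent side: fork, wait, and never touch the job file. */
int spawn_job(const char *path, uid_t expected_uid)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_run_job(path, expected_uid));
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the parent no longer unlinks anything, it does not matter how long the
child takes to reach open(); the file is removed whether the checks pass or
fail, so a rejected job is not retried on the next run.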

-- 
You are receiving this mail because:
You are the assignee for the bug.

