Looks like atrun has a race condition? (was: at job disappears?)
Dieter
freebsd at sopwith.solgatos.com
Tue May 15 17:11:00 UTC 2007
> FreeBSD 6.2
> AMD64 (single CPU)
> /var is FFS with soft-updates, on SATA.
>
> /var/cron/tabs/root contains:
>
> * * * * * /usr/libexec/atrun
>
> I had three at jobs queued. They all call the same shell
> script with different arguments. First one runs fine.
> Second one gets:
>
> atrun[3212]: cannot open input file: No such file or directory
>
> And then the third one runs fine.
>
> The machine is idle except for the at jobs. No reboot, no fsck.
> As far as I know, nothing should be mucking around in /var/at except
> atrun. Nothing to explain a file disappearing into thin air.
Looking at the atrun source, I think there is a race condition.
When atrun starts running a job, the first thing it does is
chmod the job file to 400. But in main() we have

    /* Delete older files
     */
    if ((run_time < now) && !(S_IXUSR & buf.st_mode) && (S_IRUSR & buf.st_mode))
        unlink(dirent->d_name);

main() doesn't know that run_file() isn't finished with the file and
blindly unlinks it.
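
To illustrate the interleaving I suspect, here is a small standalone sketch. It is not atrun code: the function name and the temp file path are made up for the demo, and it replays the three steps in one process rather than racing two real processes. It only shows that an unlink landing between the chmod and the open produces the same ENOENT ("No such file or directory") that atrun logged.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo, not part of atrun.  Replays the suspected ordering
 * sequentially and returns the errno reported by fopen(), 0 if the open
 * unexpectedly succeeds, or -1 if the setup itself fails.
 */
static int
replay_suspected_race(void)
{
    char path[] = "/tmp/atrace.XXXXXX";   /* made-up temp file name */
    FILE *fp;
    int fd;

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    close(fd);

    /* Step 1: the "runner" marks the job as started (mode 400). */
    if (chmod(path, S_IRUSR) != 0) {
        unlink(path);
        return -1;
    }

    /* Step 2: the "cleanup" pass sees a mode-400 file and removes it. */
    if (unlink(path) != 0)
        return -1;

    /* Step 3: the "runner" now tries to read the job file. */
    errno = 0;
    fp = fopen(path, "r");
    if (fp != NULL) {
        fclose(fp);
        return 0;
    }
    return errno;
}
```

If the real failure works the way I think it does, steps 1 and 3 belong to run_file() and step 2 to the cleanup check in main().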
Since run_file() unlinks the file when it is finished, I assume the unlink
in main() is to clean up files after a crash? Perhaps main() should only
unlink the file if it is really old, say a week.
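
A rough sketch of what I mean, assuming a one-week grace period. The helper name is_stale_job() and the STALE_JOB_AGE constant are my own inventions for illustration, not anything in the atrun source, and I have not tested this against the real tree.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Assumed grace period: one week, in seconds. */
#define STALE_JOB_AGE ((time_t)7 * 24 * 60 * 60)

/*
 * Hypothetical helper, not part of atrun.  Returns nonzero only when a
 * job file looks abandoned: it has been marked as started (readable by
 * owner, not executable) AND its run time is more than STALE_JOB_AGE
 * in the past.
 */
static int
is_stale_job(time_t run_time, time_t now, mode_t mode)
{
    /* Still executable, or not readable: not a started job, leave it. */
    if ((mode & S_IXUSR) || !(mode & S_IRUSR))
        return 0;

    /* Run time in the future or only recently passed: may still be running. */
    if (run_time >= now)
        return 0;

    return (now - run_time) > STALE_JOB_AGE;
}
```

The check in main() would then become something like "if (is_stale_job(run_time, now, buf.st_mode)) unlink(dirent->d_name);". A week is arbitrary; anything comfortably longer than the longest expected job would do.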
More information about the freebsd-questions
mailing list