Looks like atrun has a race condition? (was: at job disappears?)

Dieter freebsd at sopwith.solgatos.com
Tue May 15 17:11:00 UTC 2007


> FreeBSD 6.2
> AMD64 (single CPU)
> /var is FFS with soft-updates, on SATA.
> 
> /var/cron/tabs/root  contains:
> 
> 	*     *       *       *       *           /usr/libexec/atrun
> 
> I had three at jobs queued.  They all call the same shell
> script with different arguments.  First one runs fine.
> Second one gets:
> 
> 	atrun[3212]: cannot open input file: No such file or directory
> 
> And then the third one runs fine.
> 
> The machine is idle except for the at jobs.  No reboot, no fsck.
> As far as I know, nothing should be mucking around in /var/at except
> atrun.  Nothing to explain a file disappearing into thin air.

Looking at the atrun source, I think there is a race condition.

When atrun starts running a job, the first thing it does is
chmod the job file to 400.  But in main() we have

	/*  Delete older files
	 */
	if ((run_time < now) && !(S_IXUSR & buf.st_mode) && (S_IRUSR & buf.st_mode))
		unlink(dirent->d_name);

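main() doesn't know that run_file() isn't finished with the file and
blindly unlinks it.  A job file that is mode 400 and past its run time
is exactly what a job still in progress looks like.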
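
To make the suspected sequence concrete, here is a rough single-process
replay of it.  This is only a sketch, not the atrun code: replay_race()
and the /tmp scratch file are made up for illustration, and it assumes
the order described above (chmod to 400 first, open later) with the
cleanup check from some atrun process landing in between.

```c
#include <assert.h>	/* only needed for the check in the note below */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Replay the suspected interleaving in one process.  Returns the errno
 * from the final open(), 0 if the open unexpectedly succeeded, or -1 if
 * the setup itself failed.
 */
static int
replay_race(void)
{
	char path[] = "/tmp/atrun_race_XXXXXX";	/* hypothetical scratch file */
	struct stat buf;
	time_t now = time(NULL);
	time_t run_time = now - 60;		/* job was due a minute ago */
	int fd;

	if ((fd = mkstemp(path)) == -1)
		return -1;
	close(fd);

	/* "run_file()": mark the job as started by making it mode 400. */
	if (chmod(path, S_IRUSR) == -1) {
		unlink(path);
		return -1;
	}

	/* "main()": stat the file and apply the quoted cleanup test. */
	if (stat(path, &buf) == -1) {
		unlink(path);
		return -1;
	}
	if ((run_time < now) && !(S_IXUSR & buf.st_mode) && (S_IRUSR & buf.st_mode))
		unlink(path);

	/* "run_file()" again: try to open the job file it just chmod'ed. */
	fd = open(path, O_RDONLY);
	if (fd != -1) {
		close(fd);
		unlink(path);
		return 0;
	}
	return errno;
}
```

Under those assumptions, assert(replay_race() == ENOENT) holds, which is
the same "No such file or directory" that atrun logged.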

Since run_file() unlinks the file when it is finished, I assume the unlink
in main() is to clean up files after a crash?  Perhaps main() should only
unlink the file if it is really old, say a week.

