bin/113239: atrun(8) loses jobs due to race condition

Dieter freebsd at sopwith.solgatos.com
Sat Jun 2 00:10:04 UTC 2007


>Number:         113239
>Category:       bin
>Synopsis:       atrun(8) loses jobs due to race condition
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 02 00:10:03 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Dieter
>Release:        6.2
>Organization:
>Environment:
6.2-RELEASE amd64
>Description:
Due to a race condition, one instance of atrun(8) can unlink a job
file before another instance has executed it.  The job is then lost,
which can result in lost data.
>How-To-Repeat:
Put a sleep in atrun to emulate something (fork() perhaps) taking
a long time.  Set up an at job and execute atrun.  Execute atrun
a second time before the sleep returns.  Observe that the at job
was not executed and that an error message appears in syslog.

The patch file has code to demo the problem.
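
For readers without a patched atrun at hand, the interleaving above can
be replayed in one process.  This is only a sketch under assumptions:
simulate_lost_job() and the /tmp/atjob.XXXXXX scratch path are made up
for illustration and are not part of atrun.c, and the two atrun
instances are modelled as consecutive steps rather than real processes.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of atrun.c: replays the interleaving
 * from the report in a single process so the outcome is deterministic.
 * Returns 0 if the "job" could still be opened, the errno from the
 * failed open otherwise, or -1 if the scratch file could not be set up.
 */
int
simulate_lost_job(void)
{
	char path[] = "/tmp/atjob.XXXXXX";	/* assumed scratch location */
	int fd = mkstemp(path);
	FILE *fp;

	if (fd == -1)
		return -1;
	close(fd);

	/* First atrun instance has picked this job and is stalled
	 * (the sleep(70) in the demo code).
	 */

	/* Second atrun instance sees a job whose run time has passed
	 * and removes it.
	 */
	if (unlink(path) == -1)
		return -1;

	/* First instance wakes up and tries to open the job file. */
	fp = fopen(path, "r");
	if (fp == NULL)
		return errno;
	fclose(fp);
	return 0;
}
```

Calling simulate_lost_job() should return ENOENT, which corresponds to
the "cannot open input file" error that atrun logs.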
>Fix:
I have a workaround: only unlink the file if it is more than
6 hours past its scheduled run time.  Strictly speaking this is
not a true fix; the race condition is still present, but if fork
is taking 6 hours you have other problems.

The patch file implements this workaround.
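
As a rough illustration of the age test, here is the time comparison
pulled out into a standalone function.  The helper name
old_enough_to_unlink() is hypothetical and does not appear in atrun.c,
and the mode-bit checks from the real condition are left out.

```c
#include <time.h>

/* Same value as in the patch: number of seconds in 6 hours. */
#define MIN_UNLINK_TIME (60 * 60 * 6)

/* Hypothetical helper, not in atrun.c: nonzero when a job whose run
 * time has passed is old enough to be removed.  Only the time part of
 * the patched condition is modelled here.
 */
int
old_enough_to_unlink(time_t run_time, time_t now)
{
	return (run_time + MIN_UNLINK_TIME) < now;
}
```

With this check, a job that is 70 seconds overdue (the demo sleep) is
left alone, and only a job more than 21600 seconds overdue is removed.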


Patch attached with submission follows:

===================================================================
RCS file: RCS/atrun.c,v
retrieving revision 1.1
diff -r1.1 atrun.c
83a84,88
> /* Workaround for race condition: only unlink file if it is
>  * older than 6 hours.
>  */
> #define MIN_UNLINK_TIME (60*60*6)   /* Number of seconds in 6 hours */
> 
143a149,161
> #if 0
>       /* If something takes too long and another instance of
>        * atrun starts up, it will unlink our file out from
>        * under us.  To demonstrate this race condition,
>        * enable the sleep, set MIN_UNLINK_TIME to 0, create
>        * an at job ("echo hello" is sufficient) and have atrun
>        * run more frequently than the sleep time.  The 70 second
>        * sleep assumes atrun is run from cron once a minute.
>        */
>       syslog(LOG_DEBUG, "Sleeping to trigger race condition, file=%s\n", filename);
>       sleep(70);
> #endif
> 
179c197
< 	perr("cannot open input file");
---
> 	syslog(LOG_ERR, "Cannot open input file %s : %m\n", filename);
479a498,500
> 	 *
> 	 *  Workaround for race condition: only unlink file if it is
> 	 *  older than MIN_UNLINK_TIME seconds.
481c502,504
< 	if ((run_time < now) && !(S_IXUSR & buf.st_mode) && (S_IRUSR & buf.st_mode))
---
> 	if (( (run_time + MIN_UNLINK_TIME) < now) && !(S_IXUSR & buf.st_mode) && (S_IRUSR & buf.st_mode))
> 	  {
> 	    syslog(LOG_DEBUG, "Unlinking %s run_time=%ld now=%ld\n", dirent->d_name, (long)run_time, (long)now);
482a506
> 	  }


>Release-Note:
>Audit-Trail:
>Unformatted:

