flock incorrectly detects deadlock on 7-stable and current

Paul Koch paul.koch at statseeker.com
Fri May 9 06:07:28 UTC 2008


On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls.  flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios.  Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> > (errno 11)
> >
> > If we change 'd' to
> >   Process 3 requests unlock, then requests exclusive lock, it
> > works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 - /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
>   		}
>
>   		/*
> +		 * For flock type locks, we must first remove
> +		 * any shared locks that we hold before we sleep
> +		 * waiting for an exclusive lock.
> +		 */
> +		if ((lock->lf_flags & F_FLOCK) &&
> +		    lock->lf_type == F_WRLCK) {
> +			lock->lf_type = F_UNLCK;
> +			lf_activate_lock(state, lock);
> +			lock->lf_type = F_WRLCK;
> +		}
> +
> +		/*
>   		 * We are blocked. Create edges to each blocking lock,
>   		 * checking for deadlock using the owner graph. For
>   		 * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
>   		}
>
>   		/*
> -		 * For flock type locks, we must first remove
> -		 * any shared locks that we hold before we sleep
> -		 * waiting for an exclusive lock.
> -		 */
> -		if ((lock->lf_flags & F_FLOCK) &&
> -		    lock->lf_type == F_WRLCK) {
> -			lock->lf_type = F_UNLCK;
> -			lf_activate_lock(state, lock);
> -			lock->lf_type = F_WRLCK;
> -		}
> -		/*
>   		 * We have added edges to everything that blocks
>   		 * us. Sleep until they all go away.
>   		 */

Manually applied the patch to stable kern_lockf.c  1.57.2.1.  Ran the 
flock_test program on many of our architectures and it works fine.

Have also been testing our app on a single-core i386 machine today with
no locking problems.  Just set up a quad-core -stable amd64 build and it
also appears to be running fine now.

Thanks

	Paul.
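
For readers without the attachment, the a-d sequence quoted above can be
sketched roughly as follows.  This is not the flock_test program mentioned
in the thread (it is not included in this message); the function names,
the lock file path and the sleep-based ordering are all assumptions made
for illustration.  The return value is 0 if process 3's upgrade succeeds,
otherwise the errno it failed with.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Open the lock file.  Every process opens it separately so that each has
 * its own open file description, which is what flock(2) locks attach to.
 */
static int open_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        perror("open");
        _exit(255);
    }
    return fd;
}

/* Process 1 (step a): take a shared lock, hold it for 3 seconds, exit. */
static void holder(const char *path)
{
    int fd = open_lock_file(path);
    if (flock(fd, LOCK_SH) != 0)
        _exit(255);
    sleep(3);
    _exit(0);               /* exiting closes fd and drops the lock */
}

/* Process 2 (step b): block waiting for an exclusive lock. */
static void waiter(const char *path)
{
    int fd = open_lock_file(path);
    if (flock(fd, LOCK_EX) != 0)
        _exit(255);
    _exit(0);
}

/* Process 3 (steps c and d): shared lock, then upgrade to exclusive. */
static void upgrader(const char *path)
{
    int fd = open_lock_file(path);
    if (flock(fd, LOCK_SH) != 0)
        _exit(255);
    if (flock(fd, LOCK_EX) != 0)
        _exit(errno);       /* e.g. EDEADLK, the failure reported above */
    _exit(0);
}

/* Fork a child that runs one of the roles above (hypothetical helper). */
static pid_t spawn(void (*role)(const char *), const char *path)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0)
        role(path);         /* roles never return; they call _exit() */
    return pid;
}

/* Run steps a-d; the sleeps are a crude way to order the requests. */
int run_upgrade_scenario(const char *path)
{
    int status = 0;
    pid_t p1, p2, p3;

    assert(path != NULL);
    p1 = spawn(holder, path);       /* a. shared lock granted */
    sleep(1);
    p2 = spawn(waiter, path);       /* b. exclusive request blocks */
    sleep(1);
    p3 = spawn(upgrader, path);     /* c. shared, d. upgrade */

    waitpid(p1, NULL, 0);
    waitpid(p2, NULL, 0);
    if (waitpid(p3, &status, 0) != p3 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);     /* 0, or errno from the upgrade */
}
```

Calling run_upgrade_scenario() from a small main() with a scratch file
path and printing strerror() of a non-zero result should be enough to see
whether the upgrade in step d fails.  To try the workaround from the
original report, replace the second flock() call in upgrader() with
flock(fd, LOCK_UN) followed by flock(fd, LOCK_EX).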

