flock incorrectly detects deadlock on 7-stable and current
Paul Koch
paul.koch at statseeker.com
Fri May 9 06:07:28 UTC 2008
On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls. flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios. Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to occur when at least three processes are
> > doing flock() on a file, and one tries to upgrade a shared lock
> > to an exclusive lock but fails with "Resource deadlock avoided".
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> > (errno 11)
> >
> > If we change 'd' to
> > Process 3 requests unlock, then requests exclusive lock, it
> > works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 -
> /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
> }
>
> /*
> + * For flock type locks, we must first remove
> + * any shared locks that we hold before we sleep
> + * waiting for an exclusive lock.
> + */
> + if ((lock->lf_flags & F_FLOCK) &&
> + lock->lf_type == F_WRLCK) {
> + lock->lf_type = F_UNLCK;
> + lf_activate_lock(state, lock);
> + lock->lf_type = F_WRLCK;
> + }
> +
> + /*
> * We are blocked. Create edges to each blocking lock,
> * checking for deadlock using the owner graph. For
> * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
> }
>
> /*
> - * For flock type locks, we must first remove
> - * any shared locks that we hold before we sleep
> - * waiting for an exclusive lock.
> - */
> - if ((lock->lf_flags & F_FLOCK) &&
> - lock->lf_type == F_WRLCK) {
> - lock->lf_type = F_UNLCK;
> - lf_activate_lock(state, lock);
> - lock->lf_type = F_WRLCK;
> - }
> - /*
> * We have added edges to everything that blocks
> * us. Sleep until they all go away.
> */
Manually applied the patch to -stable kern_lockf.c 1.57.2.1. Ran the
flock_test program on many of our architectures and it works fine.
Have also been testing our app on a single-core i386 machine today with
no locking problems. Just set up a quad-core -stable amd64 build and it
also appears to be running fine now.
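
For anyone reading without the attachment, steps a-d quoted above can be
reproduced with something along these lines. This is only a rough sketch,
not the flock_test program itself: the function name
flock_upgrade_scenario, the unlock_first flag, the sleep()-based ordering
and the exit-code convention are all made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SETUP_FAILED 255	/* hypothetical exit code: a step before 'd' failed */

/*
 * Run steps a-d with three child processes.  Returns 0 if process 3
 * obtained the exclusive lock, the errno from its flock(LOCK_EX) call
 * if it did not, or -1 if the scenario could not be set up.
 * With unlock_first != 0, process 3 unlocks before asking for the
 * exclusive lock (the variant of step 'd' that was reported to work).
 */
int
flock_upgrade_scenario(const char *path, int unlock_first)
{
	pid_t pid[3];
	int fd, i, status, result = -1;

	/* Make sure the file exists before the children open it. */
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return (-1);
	close(fd);

	for (i = 0; i < 3; i++) {
		pid[i] = fork();
		if (pid[i] < 0)
			return (-1);	/* sketch only: earlier children exit on their own */
		if (pid[i] == 0) {
			/* Each process needs its own open(); flock locks belong to it. */
			fd = open(path, O_RDWR);
			if (fd < 0)
				_exit(SETUP_FAILED);
			if (i == 0) {
				/* a. Process 1 gets a shared lock and holds it. */
				if (flock(fd, LOCK_SH) != 0)
					_exit(SETUP_FAILED);
				sleep(3);
				_exit(0);	/* exiting releases the lock */
			}
			if (i == 1) {
				/* b. Process 2 blocks waiting for an exclusive lock. */
				_exit(flock(fd, LOCK_EX) == 0 ? 0 : SETUP_FAILED);
			}
			/* c. Process 3 gets a shared lock. */
			if (flock(fd, LOCK_SH) != 0)
				_exit(SETUP_FAILED);
			if (unlock_first && flock(fd, LOCK_UN) != 0)
				_exit(SETUP_FAILED);
			/* d. Process 3 asks for an exclusive lock. */
			_exit(flock(fd, LOCK_EX) == 0 ? 0 : errno);
		}
		sleep(1);	/* crude ordering: let this child reach its flock() */
	}

	for (i = 0; i < 3; i++) {
		if (waitpid(pid[i], &status, 0) < 0)
			return (-1);
		if (i == 2 && WIFEXITED(status) &&
		    WEXITSTATUS(status) != SETUP_FAILED)
			result = WEXITSTATUS(status);
	}
	return (result);
}
```

Going by the report quoted above, flock_upgrade_scenario(path, 0) would be
expected to return EDEADLK on an affected kernel, and 0 with the patch
applied or with unlock_first set. The sleeps only approximate the ordering,
so treat a single run as indicative rather than conclusive.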
Thanks
Paul.