atomic_load_acq_int in sequential_heuristic
Konstantin Belousov
kostikbel at gmail.com
Mon Aug 25 19:35:40 UTC 2014
On Mon, Aug 25, 2014 at 08:04:17PM +0200, Mateusz Guzik wrote:
> On Mon, Aug 25, 2014 at 08:27:55PM +0300, Konstantin Belousov wrote:
> > On Mon, Aug 25, 2014 at 03:04:33PM +0200, Mateusz Guzik wrote:
> > > On Mon, Aug 25, 2014 at 02:10:01PM +0300, Konstantin Belousov wrote:
> > > > On Mon, Aug 25, 2014 at 11:10:56AM +0200, Mateusz Guzik wrote:
> > > > > On Mon, Aug 25, 2014 at 11:35:39AM +0300, Konstantin Belousov wrote:
> > > > > > > + atomic_set_int(&fp->f_flag, FHASLOCK);
> > > > > > You misspelled FRDAHEAD as FHASLOCK, below as well.
> > > > > > Was this tested ?
> > > > > >
> > > > >
> > > > > Oops, damn copy-pasto. Sorry.
> > > > >
> > > > > > > + VOP_UNLOCK(vp, 0);
> > > > > > > } else {
> > > > > > > - do {
> > > > > > > - new = old = fp->f_flag;
> > > > > > > - new &= ~FRDAHEAD;
> > > > > > > - } while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
> > > > > > > + atomic_clear_int(&fp->f_flag, FHASLOCK);
> > > > > > So what about extending the vnode lock to cover the flag reset ?
> > > > > >
> > > > >
> > > > > Sure.
> > > > >
> > > > > So this time I tested it properly and found out it is impossible to
> > > > > disable the hint. The test is:
> > > > >
> > > > > -1 is truncated and then read from intptr_t which yields a big positive
> > > > > number instead. Other users in the function use int tmp to work around
> > > > > this issue.
> > > > Could you provide me with the test case which demonstrates the problem ?
> > > >
> > >
> > > Nothing special:
> > > https://people.freebsd.org/~mjg/patches/F_READAHEAD.c
> > And how did you verify that fcntl(F_READAHEAD, -1) did not work ?
> > I ended up adding a printf() to kern_fcntl() to see the arg value.
> >
>
> Three uprintfs: one with the value, and then one in each if branch.
>
> > >
> > > > The fcntl(2) prototype in sys/fcntl.h is variadic, so an int argument
> > > > is not promoted to long. On the other hand, syscalls.master declares
> > > > arg as long. Did you try to pass -1L as the third argument to disable ?
> > > >
> > >
> > > Yes, -1L deals with the problem. I would still argue that using 'tmp'
> > > like the rest of the function would not hurt as a cheap solution.
> > This would deliberately break the current ABI (which takes the argument
> > as long for F_READAHEAD), which is not acceptable.
> >
>
> Ok.
>
> > I do think that there is a bug in the "-1" stuff, but it is in the compat32
> > shims. The compat/freebsd32/syscalls.master does not provide the compat
> > for fcntl(2), which means that 32bit fcntl(2) does not work when either
> > sign extension is needed (the F_READAHEAD case), or on big-endian
> > machines. On i386, it did not practically matter before F_READAHEAD,
> > since x86 is little-endian and the flags passed as arg did not touch the
> > sign bit.
> >
> > Note that the fcntl(2) man page is wrong: it claims that the optional
> > argument arg is int. That cannot be true, since a pointer on an LP64
> > platform cannot fit into an int. SUSv4 is explicit in describing which
> > command takes which type; our man page must be fixed, but this is for later.
> >
> > See the patch at the end of the reply for the fix. It needs sysent
> > regen for actual build.
> >
>
> I tested the patch and it fixes the problem.
Which patch ? Yours or mine ?
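
To make the sign-extension point above concrete, here is a minimal userland
sketch. It is not kernel code: the helper names are hypothetical, and it only
models how a 32-bit -1 looks once it is widened to the 64-bit syscall
argument, with and without sign extension, and how the "arg >= 0" test then
decides.

```c
#include <assert.h>
#include <stdint.h>

/* The model below assumes a 32-bit value widened to a 64-bit argument. */
static_assert(sizeof(int64_t) == 2 * sizeof(int32_t),
    "model assumes 32-bit and 64-bit widths");

/*
 * Hypothetical helper: zero extension, i.e. what the kernel sees when
 * nothing sign-extends the 32-bit value on its way to a 64-bit argument.
 */
int64_t
widen_zero_ext(int32_t arg32)
{
	return ((int64_t)(uint32_t)arg32);
}

/*
 * Hypothetical helper: sign extension, i.e. what a compat32 wrapper
 * would do for a signed argument.
 */
int64_t
widen_sign_ext(int32_t arg32)
{
	return ((int64_t)arg32);
}

/* Mirrors the "arg >= 0" test that picks enable vs. disable. */
int
readahead_enabled(int64_t arg)
{
	return (arg >= 0);
}
```

With zero extension, -1 becomes 4294967295 and the hint stays enabled; with
sign extension it stays -1 and the hint is disabled.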
>
> > > /*
> > > * Exclusive lock synchronizes against f_seqcount reads and writes in
> > > * sequential_heuristic().
> > > */
> > >
> > > > Another place to add the locking annotation is the struct file in
> > > > sys/file.h. Now f_seqcount is 'protected' by the vnode lock.
> > > > I am not sure how to express the locking model concisely.
> > > >
> > >
> > > /*
> > > * (a) f_vnode lock required (shared allows both reads and writes)
> > > */
> > Ok.
> >
>
> diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
> index 7abdca0..52fc01a 100644
> --- a/sys/kern/kern_descrip.c
> +++ b/sys/kern/kern_descrip.c
> @@ -476,7 +476,6 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
> struct vnode *vp;
> cap_rights_t rights;
> int error, flg, tmp;
> - u_int old, new;
> uint64_t bsize;
> off_t foffset;
>
> @@ -760,26 +759,24 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
> error = EBADF;
> break;
> }
> + vp = fp->f_vnode;
> + /*
> + * Exclusive lock synchronizes against f_seqcount reads and
> + * writes in sequential_heuristic().
> + */
> + error = vn_lock(vp, LK_EXCLUSIVE);
> + if (error != 0) {
> + fdrop(fp, td);
> + break;
> + }
> if (arg >= 0) {
> - vp = fp->f_vnode;
> - error = vn_lock(vp, LK_SHARED);
> - if (error != 0) {
> - fdrop(fp, td);
> - break;
> - }
> bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
> - VOP_UNLOCK(vp, 0);
> fp->f_seqcount = (arg + bsize - 1) / bsize;
> - do {
> - new = old = fp->f_flag;
> - new |= FRDAHEAD;
> - } while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
> + atomic_set_int(&fp->f_flag, FRDAHEAD);
> } else {
> - do {
> - new = old = fp->f_flag;
> - new &= ~FRDAHEAD;
> - } while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
> + atomic_clear_int(&fp->f_flag, FRDAHEAD);
> }
> + VOP_UNLOCK(vp, 0);
> fdrop(fp, td);
> break;
>
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index f1d19ac..98823f3 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -438,7 +438,8 @@ static int
> sequential_heuristic(struct uio *uio, struct file *fp)
> {
>
> - if (atomic_load_acq_int(&(fp->f_flag)) & FRDAHEAD)
> + ASSERT_VOP_LOCKED(fp->f_vnode, __func__);
> + if (fp->f_flag & FRDAHEAD)
> return (fp->f_seqcount << IO_SEQSHIFT);
>
> /*
> diff --git a/sys/sys/file.h b/sys/sys/file.h
> index b7d358b..856f799 100644
> --- a/sys/sys/file.h
> +++ b/sys/sys/file.h
> @@ -143,6 +143,7 @@ struct fileops {
> *
> * Below is the list of locks that protects members in struct file.
> *
> + * (a) f_vnode lock required (shared allows both reads and writes)
> * (f) protected with mtx_lock(mtx_pool_find(fp))
> * (d) cdevpriv_mtx
> * none not locked
> @@ -168,7 +169,7 @@ struct file {
> /*
> * DTYPE_VNODE specific fields.
> */
> - int f_seqcount; /* Count of sequential accesses. */
> + int f_seqcount; /* (a) Count of sequential accesses. */
> off_t f_nextoff; /* next expected read/write offset. */
> union {
> struct cdev_privdata *fvn_cdevpriv;
>
I think this patch is fine.
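
For readers comparing the two styles in the diff, the following is a rough
userland analogue using C11 <stdatomic.h>, not the kernel's atomic(9)
primitives. The function names and the EX_FRDAHEAD bit value are assumptions
made for illustration only. It shows the compare-and-swap retry loop that the
patch removes next to the single atomic OR/AND that replaces it.

```c
#include <assert.h>
#include <stdatomic.h>

#define	EX_FRDAHEAD	0x0200u	/* illustrative bit value, an assumption */

static_assert((EX_FRDAHEAD & (EX_FRDAHEAD - 1)) == 0,
    "flag must be a single bit");

/* Old style: read-modify-write through a compare-and-swap retry loop. */
void
flag_set_cas(atomic_uint *flagp, unsigned int bit)
{
	unsigned int oval, nval;

	oval = atomic_load_explicit(flagp, memory_order_relaxed);
	do {
		nval = oval | bit;
		/* On failure, oval is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(flagp, &oval, nval,
	    memory_order_release, memory_order_relaxed));
}

/* New style: one atomic OR, similar in spirit to atomic_set_int(). */
void
flag_set(atomic_uint *flagp, unsigned int bit)
{
	atomic_fetch_or_explicit(flagp, bit, memory_order_relaxed);
}

/* New style: one atomic AND-NOT, similar in spirit to atomic_clear_int(). */
void
flag_clear(atomic_uint *flagp, unsigned int bit)
{
	atomic_fetch_and_explicit(flagp, ~bit, memory_order_relaxed);
}
```

The sketch uses relaxed ordering for the single-operation variants on the
assumption that, as in the patch, ordering against the f_seqcount update is
provided by the surrounding lock rather than by the flag update itself.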