fcntl(F_RDAHEAD)

Xin LI delphij at delphij.net
Thu Sep 17 22:26:53 UTC 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Igor,

Igor Sysoev wrote:
> Hi,
> 
> nginx-0.8.15 can use a completely non-blocking sendfile() by means of the
> SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read() to
> read a single byte. The first aio_read() preloads the first 128K of the file
> into the VM cache; however, all successive aio_read()s preload just 16K
> parts of the file. This makes non-blocking sendfile() usage ineffective for
> files larger than 128K.
> 
> I've created a small patch that adds a Darwin-compatible F_RDAHEAD fcntl:
> 
>    fcntl(fd, F_RDAHEAD, preload_size)
> 
> There is a small incompatibility: Darwin's fcntl only allows enabling or
> disabling read ahead, while the proposed patch allows setting the exact
> preload size.
> 
> Currently the preload size affects the vn_read() code path only and does not
> affect the sendfile() code path. However, it can easily be extended to the
> sendfile() part too. The preload size is still limited by sysctl vfs.read_max.
> 
> The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.

I have ported this as a patch against -HEAD (it should also apply to 8.0-R,
but it's too late for us to add a new feature there), plus a manual page
entry documenting the feature.

I've used F_READAHEAD as the name, but reading the manual page, it looks
like we could just use F_RDAHEAD instead: Darwin seems to distinguish only
the 0 and != 0 cases, so programmers would not have to use #ifdef or
something else to get code working on different platforms. Does that sound
reasonable?
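
To illustrate the portability concern, here is a minimal sketch of what
application code might have to do if the two names stay different. The
helper name set_readahead() is hypothetical and not part of the patch; it
assumes F_READAHEAD takes a byte count (as in the attached patch) and that
Darwin's F_RDAHEAD only distinguishes zero from non-zero.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_readahead(int fd, int nbytes)
{
#if defined(F_READAHEAD)
	/* Name used by the attached patch: arg is a size in bytes. */
	return (fcntl(fd, F_READAHEAD, nbytes) == -1 ? -1 : 0);
#elif defined(F_RDAHEAD)
	/* Darwin-style: assumed to be a plain on/off switch. */
	return (fcntl(fd, F_RDAHEAD, nbytes != 0) == -1 ? -1 : 0);
#else
	/* No read-ahead control available on this platform. */
	(void)fd;
	(void)nbytes;
	errno = ENOSYS;
	return (-1);
#endif
}
```

With a single shared name, the #if ladder would collapse to one fcntl()
call, and only the interpretation of a non-zero argument would differ.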

Cheers,
- --
Xin LI <delphij at delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (FreeBSD)

iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t
WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM
=uP3m
-----END PGP SIGNATURE-----
-------------- next part --------------
Index: lib/libc/sys/fcntl.2
===================================================================
--- lib/libc/sys/fcntl.2	(revision 197297)
+++ lib/libc/sys/fcntl.2	(working copy)
@@ -28,7 +28,7 @@
 .\"     @(#)fcntl.2	8.2 (Berkeley) 1/12/94
 .\" $FreeBSD$
 .\"
-.Dd March 8, 2008
+.Dd September 19, 2009
 .Dt FCNTL 2
 .Os
 .Sh NAME
@@ -241,6 +241,14 @@
 .Dv SA_RESTART
 (see
 .Xr sigaction 2 ) .
+.It Dv F_READAHEAD
+Set or clear the read ahead amount for sequential access to the third
+argument,
+.Fa arg ,
+which is rounded up to the nearest block size.
+A zero value in
+.Fa arg
+turns off read ahead.
 .El
 .Pp
 When a shared lock has been set on a segment of a file,
Index: sys/kern/kern_descrip.c
===================================================================
--- sys/kern/kern_descrip.c	(revision 197297)
+++ sys/kern/kern_descrip.c	(working copy)
@@ -421,6 +421,7 @@
 	struct vnode *vp;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -686,6 +687,31 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_READAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		fhold(fp);
+		FILEDESC_SUNLOCK(fdp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= O_READAHEAD;
+		} else {
+			fp->f_flag &= ~O_READAHEAD;
+		}
+		fdrop(fp, td);
+		break;
+
 	default:
 		error = EINVAL;
 		break;
Index: sys/kern/vfs_vnops.c
===================================================================
--- sys/kern/vfs_vnops.c	(revision 197297)
+++ sys/kern/vfs_vnops.c	(working copy)
@@ -312,6 +312,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & O_READAHEAD)
+		return (fp->f_seqcount << IO_SEQSHIFT);
+
 	/*
 	 * Offset 0 is handled specially.  open() sets f_seqcount to 1 so
 	 * that the first I/O is normally considered to be slightly
Index: sys/sys/fcntl.h
===================================================================
--- sys/sys/fcntl.h	(revision 197297)
+++ sys/sys/fcntl.h	(working copy)
@@ -112,7 +112,11 @@
 #if __BSD_VISIBLE
 /* Attempt to bypass buffer cache */
 #define O_DIRECT	0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define O_READAHEAD	0x00020000
 #endif
+#endif
 
 /* Defined by POSIX Extended API Set Part 2 */
 #if __BSD_VISIBLE
@@ -218,6 +222,7 @@
 #define	F_SETLK		12		/* set record locking information */
 #define	F_SETLKW	13		/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14		/* debugging support for remote locks */
+#define	F_READAHEAD	15		/* read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1		/* close-on-exec flag */
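
For reference, the byte-to-block rounding done in the F_READAHEAD case of
kern_descrip.c above can be sketched in user space as follows. The helper
name readahead_blocks() and the 16384-byte block size in the usage note are
illustrative assumptions only; the kernel takes the block size from the
mount's f_iosize and handles a zero argument separately by clearing
O_READAHEAD.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression
 * (arg + bsize - 1) / bsize from the patch.
 */
static uint64_t
readahead_blocks(uint64_t nbytes, uint64_t bsize)
{
	/* Round up so a partial trailing block still counts as one block. */
	return ((nbytes + bsize - 1) / bsize);
}
```

Assuming a 16384-byte block size, a request of 131072 bytes gives 8 blocks
and a request of 100000 bytes gives 7. sequential_heuristic() then returns
that count shifted left by IO_SEQSHIFT.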

