svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...

Ken Merry ken at freebsd.org
Tue Jun 20 20:29:56 UTC 2017


I don’t know for sure that this commit is the cause, but it and r320153 are the only ZFS commits between a June 14th build of head that boots off a ZFS mirror and a current build that panics.

Here’s the stack trace:

Fatal trap 12: page fault while in kernel mode
cpuid = 22; 

Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81e47f21
stack pointer           = 0x28:0xfffffe08b37f8810
frame pointer           = 0x28:0xfffffe08b37f8860
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_0_3)
[ thread pid 0 tid 100478 ]
Stopped at      0xffffffff81e47f21 = zio_vdev_io_start+0x1f1:   testb   $0x1,(%rax)
db> bt
Tracing pid 0 tid 100478 td 0xfffff80193156000
zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db> 

(kgdb) list *(zio_vdev_io_start+0x1f1)
0xd9f21 is in zio_vdev_io_start (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
345
346             /*
347              * Ensure that anyone expecting this zio to contain a linear ABD isn't
348              * going to get a nasty surprise when they try to access the data.
349              */
350             IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
351
352             zt->zt_orig_abd = zio->io_abd;
353             zt->zt_orig_size = zio->io_size;
354             zt->zt_bufsize = bufsize;
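
For anyone reading the trace, here is a minimal userland sketch of what the check at zio.c:350 evaluates. It assumes that abd_flags is the first member of abd_t and that the linear flag is bit 0. Both are assumptions, but they would be consistent with `testb $0x1,(%rax)` faulting at address 0x0 if one of the two ABD pointers were NULL. The cut-down struct and the check_linear_implication() helper are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's linear flag; bit 0 is an assumption. */
#define	ABD_FLAG_LINEAR	(1 << 0)

/* Hypothetical, cut-down abd_t; abd_flags at offset 0 is an assumption. */
typedef struct abd {
	int	abd_flags;
	size_t	abd_size;
} abd_t;

/* Reads abd->abd_flags, so a NULL abd means a read at address 0. */
static int
abd_is_linear(const abd_t *abd)
{
	return ((abd->abd_flags & ABD_FLAG_LINEAR) != 0);
}

/* "A implies B" is false only when A is true and B is false. */
static int
imply_holds(int a, int b)
{
	return (!a || b);
}

/*
 * Hypothetical helper, not in zio.c: evaluate the zio.c:350 condition
 * without dereferencing NULL.  Returns 1 if the implication holds,
 * 0 if it is violated, -1 if either pointer is NULL.
 */
static int
check_linear_implication(const abd_t *io_abd, const abd_t *data)
{
	if (io_abd == NULL || data == NULL)
		return (-1);
	return (imply_holds(abd_is_linear(io_abd), abd_is_linear(data)));
}
```

A linear zio ABD paired with a scatter `data` yields 0, which is the case the assertion is meant to catch. A NULL pointer yields -1 instead of faulting.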

I’ll try rebooting to see whether the panic recurs.  If it does, I’ll roll back the ABD change and test again.

Ken
-- 
Ken Merry
ken at FreeBSD.ORG



> On Jun 20, 2017, at 1:39 PM, Andriy Gapon <avg at freebsd.org> wrote:
> 
> Author: avg
> Date: Tue Jun 20 17:39:24 2017
> New Revision: 320156
> URL: https://svnweb.freebsd.org/changeset/base/320156
> 
> Log:
>  MFV r318946: 8021 ARC buf data scatter-ization
> 
>  illumos/illumos-gate at 770499e185d15678ccb0be57ebc626ad18d93383
>  https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc626ad18d93383
> 
>  https://www.illumos.org/issues/8021
>    The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
>    community) changes the way the ARC allocates `b_pdata` memory from using linear
>    `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
>    improves ZFS's performance by helping to defragment the address space occupied
>    by the ARC, in particular for cases where compressed ARC is enabled. It could
>    also ease future work to allocate pages directly from `segkpm` for minimal-
>    overhead memory allocations, bypassing the `kmem` subsystem.
>    This is essentially the same change as the one which recently landed in ZFS on
>    Linux, although they made some platform-specific changes while adapting this
>    work to their codebase:
>    1. Implemented the equivalent of the `segkpm` suggestion for future work
>    mentioned above to bypass issues that they've had with the Linux kernel memory
>    allocator.
>    2. Changed the internal representation of the ABD's scatter/gather list so it
>    could be used to pass I/O directly into Linux block device drivers. (This
>    feature is not available in the illumos block device interface yet.)
> 
>  FreeBSD notes:
>  - the actual (default) chunk size is 4KB (despite the text above saying 1KB)
>  - we can try to reimplement ABDs, so that they are not permanently
>    mapped into the KVA unless explicitly requested, especially on
>    platforms with scarce KVA
>  - we can try to use unmapped I/O and avoid intermediate allocation of a
>    linear, virtual memory mapped buffer
>  - we can try to avoid extra data copying by referring to chunks / pages
>    in the original ABD
> 
>  Reviewed by: Matthew Ahrens <mahrens at delphix.com>
>  Reviewed by: George Wilson <george.wilson at delphix.com>
>  Reviewed by: Paul Dagnelie <pcd at delphix.com>
>  Reviewed by: John Kennedy <john.kennedy at delphix.com>
>  Reviewed by: Prakash Surya <prakash.surya at delphix.com>
>  Reviewed by: Prashanth Sreenivasa <pks at delphix.com>
>  Reviewed by: Pavel Zakharov <pavel.zakharov at delphix.com>
>  Reviewed by: Chris Williamson <chris.williamson at delphix.com>
>  Approved by: Richard Lowe <richlowe at richlowe.net>
>  Author: Dan Kimmel <dan.kimmel at delphix.com>
> 
>  MFC after:	3 weeks
> 
> Added:
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
>     - copied, changed from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/abd.h
>     - copied, changed from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/sys/abd.h
> Modified:
>  head/cddl/contrib/opensolaris/cmd/zdb/zdb.c
>  head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
>  head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
>  head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
>  head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
>  head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
>  head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/edonr_zfs.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sha256.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/skein_zfs.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_checksum.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c
>  head/sys/conf/files
> Directory Properties:
>  head/cddl/contrib/opensolaris/   (props changed)
>  head/cddl/contrib/opensolaris/cmd/zdb/   (props changed)
>  head/cddl/contrib/opensolaris/lib/libzfs/   (props changed)
>  head/sys/cddl/contrib/opensolaris/   (props changed)
> 
> Modified: head/cddl/contrib/opensolaris/cmd/zdb/zdb.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -59,6 +59,7 @@
> #include <sys/arc.h>
> #include <sys/ddt.h>
> #include <sys/zfeature.h>
> +#include <sys/abd.h>
> #include <zfs_comutil.h>
> #undef verify
> #include <libzfs.h>
> @@ -2410,7 +2411,7 @@ zdb_blkptr_done(zio_t *zio)
> 	zdb_cb_t *zcb = zio->io_private;
> 	zbookmark_phys_t *zb = &zio->io_bookmark;
> 
> -	zio_data_buf_free(zio->io_data, zio->io_size);
> +	abd_free(zio->io_abd);
> 
> 	mutex_enter(&spa->spa_scrub_lock);
> 	spa->spa_scrub_inflight--;
> @@ -2477,7 +2478,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
> 	if (!BP_IS_EMBEDDED(bp) &&
> 	    (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) {
> 		size_t size = BP_GET_PSIZE(bp);
> -		void *data = zio_data_buf_alloc(size);
> +		abd_t *abd = abd_alloc(size, B_FALSE);
> 		int flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | ZIO_FLAG_RAW;
> 
> 		/* If it's an intent log block, failure is expected. */
> @@ -2490,7 +2491,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
> 		spa->spa_scrub_inflight++;
> 		mutex_exit(&spa->spa_scrub_lock);
> 
> -		zio_nowait(zio_read(NULL, spa, bp, data, size,
> +		zio_nowait(zio_read(NULL, spa, bp, abd, size,
> 		    zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, flags, zb));
> 	}
> 
> @@ -3270,6 +3271,13 @@ name:
> 	return (NULL);
> }
> 
> +/* ARGSUSED */
> +static int
> +random_get_pseudo_bytes_cb(void *buf, size_t len, void *unused)
> +{
> +	return (random_get_pseudo_bytes(buf, len));
> +}
> +
> /*
>  * Read a block from a pool and print it out.  The syntax of the
>  * block descriptor is:
> @@ -3301,7 +3309,8 @@ zdb_read_block(char *thing, spa_t *spa)
> 	uint64_t offset = 0, size = 0, psize = 0, lsize = 0, blkptr_offset = 0;
> 	zio_t *zio;
> 	vdev_t *vd;
> -	void *pbuf, *lbuf, *buf;
> +	abd_t *pabd;
> +	void *lbuf, *buf;
> 	char *s, *p, *dup, *vdev, *flagstr;
> 	int i, error;
> 
> @@ -3373,7 +3382,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 	psize = size;
> 	lsize = size;
> 
> -	pbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> +	pabd = abd_alloc_linear(SPA_MAXBLOCKSIZE, B_FALSE);
> 	lbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> 
> 	BP_ZERO(bp);
> @@ -3401,15 +3410,15 @@ zdb_read_block(char *thing, spa_t *spa)
> 		/*
> 		 * Treat this as a normal block read.
> 		 */
> -		zio_nowait(zio_read(zio, spa, bp, pbuf, psize, NULL, NULL,
> +		zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, NULL,
> 		    ZIO_PRIORITY_SYNC_READ,
> 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL));
> 	} else {
> 		/*
> 		 * Treat this as a vdev child I/O.
> 		 */
> -		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pbuf, psize,
> -		    ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
> +		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd,
> +		    psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
> 		    ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE |
> 		    ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY |
> 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL, NULL));
> @@ -3432,21 +3441,21 @@ zdb_read_block(char *thing, spa_t *spa)
> 		void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> 		void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> 
> -		bcopy(pbuf, pbuf2, psize);
> +		abd_copy_to_buf(pbuf2, pabd, psize);
> 
> -		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf + psize,
> -		    SPA_MAXBLOCKSIZE - psize) == 0);
> +		VERIFY0(abd_iterate_func(pabd, psize, SPA_MAXBLOCKSIZE - psize,
> +		    random_get_pseudo_bytes_cb, NULL));
> 
> -		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
> -		    SPA_MAXBLOCKSIZE - psize) == 0);
> +		VERIFY0(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
> +		    SPA_MAXBLOCKSIZE - psize));
> 
> 		for (lsize = SPA_MAXBLOCKSIZE; lsize > psize;
> 		    lsize -= SPA_MINBLOCKSIZE) {
> 			for (c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++) {
> -				if (zio_decompress_data(c, pbuf, lbuf,
> -				    psize, lsize) == 0 &&
> -				    zio_decompress_data(c, pbuf2, lbuf2,
> -				    psize, lsize) == 0 &&
> +				if (zio_decompress_data(c, pabd,
> +				    lbuf, psize, lsize) == 0 &&
> +				    zio_decompress_data_buf(c, pbuf2,
> +				    lbuf2, psize, lsize) == 0 &&
> 				    bcmp(lbuf, lbuf2, lsize) == 0)
> 					break;
> 			}
> @@ -3465,7 +3474,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 		buf = lbuf;
> 		size = lsize;
> 	} else {
> -		buf = pbuf;
> +		buf = abd_to_buf(pabd);
> 		size = psize;
> 	}
> 
> @@ -3483,7 +3492,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 		zdb_dump_block(thing, buf, size, flags);
> 
> out:
> -	umem_free(pbuf, SPA_MAXBLOCKSIZE);
> +	abd_free(pabd);
> 	umem_free(lbuf, SPA_MAXBLOCKSIZE);
> 	free(dup);
> }
> 
> Modified: head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,7 +24,7 @@
>  */
> 
> /*
> - * Copyright (c) 2013, 2014 by Delphix. All rights reserved.
> + * Copyright (c) 2013, 2016 by Delphix. All rights reserved.
>  */
> 
> /*
> @@ -41,6 +41,7 @@
> #include <sys/resource.h>
> #include <sys/zil.h>
> #include <sys/zil_impl.h>
> +#include <sys/abd.h>
> 
> extern uint8_t dump_opt[256];
> 
> @@ -117,13 +118,27 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, lr_rena
> }
> 
> /* ARGSUSED */
> +static int
> +zil_prt_rec_write_cb(void *data, size_t len, void *unused)
> +{
> +	char *cdata = data;
> +	for (int i = 0; i < len; i++) {
> +		if (isprint(*cdata))
> +			(void) printf("%c ", *cdata);
> +		else
> +			(void) printf("%2X", *cdata);
> +		cdata++;
> +	}
> +	return (0);
> +}
> +
> +/* ARGSUSED */
> static void
> zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
> {
> -	char *data, *dlimit;
> +	abd_t *data;
> 	blkptr_t *bp = &lr->lr_blkptr;
> 	zbookmark_phys_t zb;
> -	char buf[SPA_MAXBLOCKSIZE];
> 	int verbose = MAX(dump_opt['d'], dump_opt['i']);
> 	int error;
> 
> @@ -144,7 +159,6 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
> 		if (BP_IS_HOLE(bp)) {
> 			(void) printf("\t\t\tLSIZE 0x%llx\n",
> 			    (u_longlong_t)BP_GET_LSIZE(bp));
> -			bzero(buf, sizeof (buf));
> 			(void) printf("%s<hole>\n", prefix);
> 			return;
> 		}
> @@ -157,28 +171,26 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
> 		    lr->lr_foid, ZB_ZIL_LEVEL,
> 		    lr->lr_offset / BP_GET_LSIZE(bp));
> 
> +		data = abd_alloc(BP_GET_LSIZE(bp), B_FALSE);
> 		error = zio_wait(zio_read(NULL, zilog->zl_spa,
> -		    bp, buf, BP_GET_LSIZE(bp), NULL, NULL,
> +		    bp, data, BP_GET_LSIZE(bp), NULL, NULL,
> 		    ZIO_PRIORITY_SYNC_READ, ZIO_FLAG_CANFAIL, &zb));
> 		if (error)
> -			return;
> -		data = buf;
> +			goto out;
> 	} else {
> -		data = (char *)(lr + 1);
> +		/* data is stored after the end of the lr_write record */
> +		data = abd_alloc(lr->lr_length, B_FALSE);
> +		abd_copy_from_buf(data, lr + 1, lr->lr_length);
> 	}
> 
> -	dlimit = data + MIN(lr->lr_length,
> -	    (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE));
> -
> 	(void) printf("%s", prefix);
> -	while (data < dlimit) {
> -		if (isprint(*data))
> -			(void) printf("%c ", *data);
> -		else
> -			(void) printf("%2X", *data);
> -		data++;
> -	}
> +	(void) abd_iterate_func(data,
> +	    0, MIN(lr->lr_length, (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)),
> +	    zil_prt_rec_write_cb, NULL);
> 	(void) printf("\n");
> +
> +out:
> +	abd_free(data);
> }
> 
> /* ARGSUSED */
> 
> Modified: head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -112,6 +112,7 @@
> #include <sys/refcount.h>
> #include <sys/zfeature.h>
> #include <sys/dsl_userhold.h>
> +#include <sys/abd.h>
> #include <stdio.h>
> #include <stdio_ext.h>
> #include <stdlib.h>
> @@ -190,6 +191,7 @@ extern uint64_t metaslab_df_alloc_threshold;
> extern uint64_t zfs_deadman_synctime_ms;
> extern int metaslab_preload_limit;
> extern boolean_t zfs_compressed_arc_enabled;
> +extern boolean_t zfs_abd_scatter_enabled;
> 
> static ztest_shared_opts_t *ztest_shared_opts;
> static ztest_shared_opts_t ztest_opts;
> @@ -5042,7 +5044,7 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
> 	enum zio_checksum checksum = spa_dedup_checksum(spa);
> 	dmu_buf_t *db;
> 	dmu_tx_t *tx;
> -	void *buf;
> +	abd_t *abd;
> 	blkptr_t blk;
> 	int copies = 2 * ZIO_DEDUPDITTO_MIN;
> 
> @@ -5122,14 +5124,14 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
> 	 * Damage the block.  Dedup-ditto will save us when we read it later.
> 	 */
> 	psize = BP_GET_PSIZE(&blk);
> -	buf = zio_buf_alloc(psize);
> -	ztest_pattern_set(buf, psize, ~pattern);
> +	abd = abd_alloc_linear(psize, B_TRUE);
> +	ztest_pattern_set(abd_to_buf(abd), psize, ~pattern);
> 
> 	(void) zio_wait(zio_rewrite(NULL, spa, 0, &blk,
> -	    buf, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
> +	    abd, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
> 	    ZIO_FLAG_CANFAIL | ZIO_FLAG_INDUCE_DAMAGE, NULL));
> 
> -	zio_buf_free(buf, psize);
> +	abd_free(abd);
> 
> 	(void) rw_unlock(&ztest_name_lock);
> }
> @@ -5413,6 +5415,12 @@ ztest_resume_thread(void *arg)
> 		 */
> 		if (ztest_random(10) == 0)
> 			zfs_compressed_arc_enabled = ztest_random(2);
> +
> +		/*
> +		 * Periodically change the zfs_abd_scatter_enabled setting.
> +		 */
> +		if (ztest_random(10) == 0)
> +			zfs_abd_scatter_enabled = ztest_random(2);
> 	}
> 	return (NULL);
> }
> 
> Modified: head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -199,19 +199,19 @@ dump_record(dmu_replay_record_t *drr, void *payload, i
> {
> 	ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
> 	    ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
> -	fletcher_4_incremental_native(drr,
> +	(void) fletcher_4_incremental_native(drr,
> 	    offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum), zc);
> 	if (drr->drr_type != DRR_BEGIN) {
> 		ASSERT(ZIO_CHECKSUM_IS_ZERO(&drr->drr_u.
> 		    drr_checksum.drr_checksum));
> 		drr->drr_u.drr_checksum.drr_checksum = *zc;
> 	}
> -	fletcher_4_incremental_native(&drr->drr_u.drr_checksum.drr_checksum,
> -	    sizeof (zio_cksum_t), zc);
> +	(void) fletcher_4_incremental_native(
> +	    &drr->drr_u.drr_checksum.drr_checksum, sizeof (zio_cksum_t), zc);
> 	if (write(outfd, drr, sizeof (*drr)) == -1)
> 		return (errno);
> 	if (payload_len != 0) {
> -		fletcher_4_incremental_native(payload, payload_len, zc);
> +		(void) fletcher_4_incremental_native(payload, payload_len, zc);
> 		if (write(outfd, payload, payload_len) == -1)
> 			return (errno);
> 	}
> @@ -2096,9 +2096,9 @@ recv_read(libzfs_handle_t *hdl, int fd, void *buf, int
> 
> 	if (zc) {
> 		if (byteswap)
> -			fletcher_4_incremental_byteswap(buf, ilen, zc);
> +			(void) fletcher_4_incremental_byteswap(buf, ilen, zc);
> 		else
> -			fletcher_4_incremental_native(buf, ilen, zc);
> +			(void) fletcher_4_incremental_native(buf, ilen, zc);
> 	}
> 	return (0);
> }
> @@ -3688,7 +3688,8 @@ zfs_receive_impl(libzfs_handle_t *hdl, const char *tos
> 		 * recv_read() above; do it again correctly.
> 		 */
> 		bzero(&zcksum, sizeof (zio_cksum_t));
> -		fletcher_4_incremental_byteswap(&drr, sizeof (drr), &zcksum);
> +		(void) fletcher_4_incremental_byteswap(&drr,
> +		    sizeof (drr), &zcksum);
> 		flags->byteswap = B_TRUE;
> 
> 		drr.drr_type = BSWAP_32(drr.drr_type);
> 
> Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,6 +24,7 @@
>  */
> /*
>  * Copyright 2013 Saso Kiselkov. All rights reserved.
> + * Copyright (c) 2016 by Delphix. All rights reserved.
>  */
> 
> /*
> @@ -133,17 +134,29 @@
> #include <sys/byteorder.h>
> #include <sys/zio.h>
> #include <sys/spa.h>
> +#include <zfs_fletcher.h>
> 
> -/*ARGSUSED*/
> void
> -fletcher_2_native(const void *buf, uint64_t size,
> -    const void *ctx_template, zio_cksum_t *zcp)
> +fletcher_init(zio_cksum_t *zcp)
> {
> +	ZIO_SET_CHECKSUM(zcp, 0, 0, 0, 0);
> +}
> +
> +int
> +fletcher_2_incremental_native(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint64_t *ip = buf;
> 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
> 	uint64_t a0, b0, a1, b1;
> 
> -	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
> +	a0 = zcp->zc_word[0];
> +	a1 = zcp->zc_word[1];
> +	b0 = zcp->zc_word[2];
> +	b1 = zcp->zc_word[3];
> +
> +	for (; ip < ipend; ip += 2) {
> 		a0 += ip[0];
> 		a1 += ip[1];
> 		b0 += a0;
> @@ -151,18 +164,33 @@ fletcher_2_native(const void *buf, uint64_t size,
> 	}
> 
> 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
> +	return (0);
> }
> 
> /*ARGSUSED*/
> void
> -fletcher_2_byteswap(const void *buf, uint64_t size,
> +fletcher_2_native(const void *buf, size_t size,
>     const void *ctx_template, zio_cksum_t *zcp)
> {
> +	fletcher_init(zcp);
> +	(void) fletcher_2_incremental_native((void *) buf, size, zcp);
> +}
> +
> +int
> +fletcher_2_incremental_byteswap(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint64_t *ip = buf;
> 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
> 	uint64_t a0, b0, a1, b1;
> 
> -	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
> +	a0 = zcp->zc_word[0];
> +	a1 = zcp->zc_word[1];
> +	b0 = zcp->zc_word[2];
> +	b1 = zcp->zc_word[3];
> +
> +	for (; ip < ipend; ip += 2) {
> 		a0 += BSWAP_64(ip[0]);
> 		a1 += BSWAP_64(ip[1]);
> 		b0 += a0;
> @@ -170,50 +198,23 @@ fletcher_2_byteswap(const void *buf, uint64_t size,
> 	}
> 
> 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
> +	return (0);
> }
> 
> /*ARGSUSED*/
> void
> -fletcher_4_native(const void *buf, uint64_t size,
> +fletcher_2_byteswap(const void *buf, size_t size,
>     const void *ctx_template, zio_cksum_t *zcp)
> {
> -	const uint32_t *ip = buf;
> -	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> -	uint64_t a, b, c, d;
> -
> -	for (a = b = c = d = 0; ip < ipend; ip++) {
> -		a += ip[0];
> -		b += a;
> -		c += b;
> -		d += c;
> -	}
> -
> -	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	fletcher_init(zcp);
> +	(void) fletcher_2_incremental_byteswap((void *) buf, size, zcp);
> }
> 
> -/*ARGSUSED*/
> -void
> -fletcher_4_byteswap(const void *buf, uint64_t size,
> -    const void *ctx_template, zio_cksum_t *zcp)
> +int
> +fletcher_4_incremental_native(void *buf, size_t size, void *data)
> {
> -	const uint32_t *ip = buf;
> -	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> -	uint64_t a, b, c, d;
> +	zio_cksum_t *zcp = data;
> 
> -	for (a = b = c = d = 0; ip < ipend; ip++) {
> -		a += BSWAP_32(ip[0]);
> -		b += a;
> -		c += b;
> -		d += c;
> -	}
> -
> -	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> -}
> -
> -void
> -fletcher_4_incremental_native(const void *buf, uint64_t size,
> -    zio_cksum_t *zcp)
> -{
> 	const uint32_t *ip = buf;
> 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> 	uint64_t a, b, c, d;
> @@ -231,12 +232,23 @@ fletcher_4_incremental_native(const void *buf, uint64_
> 	}
> 
> 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	return (0);
> }
> 
> +/*ARGSUSED*/
> void
> -fletcher_4_incremental_byteswap(const void *buf, uint64_t size,
> -    zio_cksum_t *zcp)
> +fletcher_4_native(const void *buf, size_t size,
> +    const void *ctx_template, zio_cksum_t *zcp)
> {
> +	fletcher_init(zcp);
> +	(void) fletcher_4_incremental_native((void *) buf, size, zcp);
> +}
> +
> +int
> +fletcher_4_incremental_byteswap(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint32_t *ip = buf;
> 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> 	uint64_t a, b, c, d;
> @@ -254,4 +266,14 @@ fletcher_4_incremental_byteswap(const void *buf, uint6
> 	}
> 
> 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	return (0);
> +}
> +
> +/*ARGSUSED*/
> +void
> +fletcher_4_byteswap(const void *buf, size_t size,
> +    const void *ctx_template, zio_cksum_t *zcp)
> +{
> +	fletcher_init(zcp);
> +	(void) fletcher_4_incremental_byteswap((void *) buf, size, zcp);
> }
> 
> Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,6 +24,7 @@
>  */
> /*
>  * Copyright 2013 Saso Kiselkov. All rights reserved.
> + * Copyright (c) 2016 by Delphix. All rights reserved.
>  */
> 
> #ifndef	_ZFS_FLETCHER_H
> @@ -40,12 +41,15 @@ extern "C" {
>  * fletcher checksum functions
>  */
> 
> -void fletcher_2_native(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_2_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_native(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_incremental_native(const void *, uint64_t, zio_cksum_t *);
> -void fletcher_4_incremental_byteswap(const void *, uint64_t, zio_cksum_t *);
> +void fletcher_init(zio_cksum_t *);
> +void fletcher_2_native(const void *, size_t, const void *, zio_cksum_t *);
> +void fletcher_2_byteswap(const void *, size_t, const void *, zio_cksum_t *);
> +int fletcher_2_incremental_native(void *, size_t, void *);
> +int fletcher_2_incremental_byteswap(void *, size_t, void *);
> +void fletcher_4_native(const void *, size_t, const void *, zio_cksum_t *);
> +void fletcher_4_byteswap(const void *, size_t, const void *, zio_cksum_t *);
> +int fletcher_4_incremental_native(void *, size_t, void *);
> +int fletcher_4_incremental_byteswap(void *, size_t, void *);
> 
> #ifdef	__cplusplus
> }
> 
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -33,6 +33,7 @@
> # common to all SunOS systems.
> 
> ZFS_COMMON_OBJS +=		\
> +	abd.o			\
> 	arc.o			\
> 	bplist.o		\
> 	blkptr.o		\
> 
> Copied and modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c (from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c)
> ==============================================================================
> --- vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c	Fri May 26 12:13:27 2017	(r318946, copy source)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -174,6 +174,7 @@ abd_free_chunk(void *c)
> void
> abd_init(void)
> {
> +#ifdef illumos
> 	vmem_t *data_alloc_arena = NULL;
> 
> #ifdef _KERNEL
> @@ -186,7 +187,10 @@ abd_init(void)
> 	 */
> 	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
> 	    NULL, NULL, NULL, NULL, data_alloc_arena, KMC_NOTOUCH);
> -
> +#else
> +	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
> +	    NULL, NULL, NULL, NULL, 0, KMC_NOTOUCH | KMC_NODEBUG);
> +#endif
> 	abd_ksp = kstat_create("zfs", 0, "abdstats", "misc", KSTAT_TYPE_NAMED,
> 	    sizeof (abd_stats) / sizeof (kstat_named_t), KSTAT_FLAG_VIRTUAL);
> 	if (abd_ksp != NULL) {
> 
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -128,14 +128,14 @@
>  * the arc_buf_hdr_t that will point to the data block in memory. A block can
>  * only be read by a consumer if it has an l1arc_buf_hdr_t. The L1ARC
>  * caches data in two ways -- in a list of ARC buffers (arc_buf_t) and
> - * also in the arc_buf_hdr_t's private physical data block pointer (b_pdata).
> + * also in the arc_buf_hdr_t's private physical data block pointer (b_pabd).
>  *
>  * The L1ARC's data pointer may or may not be uncompressed. The ARC has the
> - * ability to store the physical data (b_pdata) associated with the DVA of the
> - * arc_buf_hdr_t. Since the b_pdata is a copy of the on-disk physical block,
> + * ability to store the physical data (b_pabd) associated with the DVA of the
> + * arc_buf_hdr_t. Since the b_pabd is a copy of the on-disk physical block,
>  * it will match its on-disk compression characteristics. This behavior can be
>  * disabled by setting 'zfs_compressed_arc_enabled' to B_FALSE. When the
> - * compressed ARC functionality is disabled, the b_pdata will point to an
> + * compressed ARC functionality is disabled, the b_pabd will point to an
>  * uncompressed version of the on-disk data.
>  *
>  * Data in the L1ARC is not accessed by consumers of the ARC directly. Each
> @@ -174,7 +174,7 @@
>  *   | l1arc_buf_hdr_t
>  *   |           |              arc_buf_t
>  *   | b_buf     +------------>+-----------+      arc_buf_t
> - *   | b_pdata   +-+           |b_next     +---->+-----------+
> + *   | b_pabd    +-+           |b_next     +---->+-----------+
>  *   +-----------+ |           |-----------|     |b_next     +-->NULL
>  *                 |           |b_comp = T |     +-----------+
>  *                 |           |b_data     +-+   |b_comp = F |
> @@ -191,8 +191,8 @@
>  * When a consumer reads a block, the ARC must first look to see if the
>  * arc_buf_hdr_t is cached. If the hdr is cached then the ARC allocates a new
>  * arc_buf_t and either copies uncompressed data into a new data buffer from an
> - * existing uncompressed arc_buf_t, decompresses the hdr's b_pdata buffer into a
> - * new data buffer, or shares the hdr's b_pdata buffer, depending on whether the
> + * existing uncompressed arc_buf_t, decompresses the hdr's b_pabd buffer into a
> + * new data buffer, or shares the hdr's b_pabd buffer, depending on whether the
>  * hdr is compressed and the desired compression characteristics of the
>  * arc_buf_t consumer. If the arc_buf_t ends up sharing data with the
>  * arc_buf_hdr_t and both of them are uncompressed then the arc_buf_t must be
> @@ -216,7 +216,7 @@
>  *                |           |                 arc_buf_t    (shared)
>  *                |    b_buf  +------------>+---------+      arc_buf_t
>  *                |           |             |b_next   +---->+---------+
> - *                |  b_pdata  +-+           |---------|     |b_next   +-->NULL
> + *                |  b_pabd   +-+           |---------|     |b_next   +-->NULL
>  *                +-----------+ |           |         |     +---------+
>  *                              |           |b_data   +-+   |         |
>  *                              |           +---------+ |   |b_data   +-+
> @@ -230,19 +230,19 @@
>  *                                    |                    +------+     |
>  *                                    +---------------------------------+
>  *
> - * Writing to the ARC requires that the ARC first discard the hdr's b_pdata
> + * Writing to the ARC requires that the ARC first discard the hdr's b_pabd
>  * since the physical block is about to be rewritten. The new data contents
>  * will be contained in the arc_buf_t. As the I/O pipeline performs the write,
>  * it may compress the data before writing it to disk. The ARC will be called
>  * with the transformed data and will bcopy the transformed on-disk block into
> - * a newly allocated b_pdata. Writes are always done into buffers which have
> + * a newly allocated b_pabd. Writes are always done into buffers which have
>  * either been loaned (and hence are new and don't have other readers) or
>  * buffers which have been released (and hence have their own hdr, if there
>  * were originally other readers of the buf's original hdr). This ensures that
>  * the ARC only needs to update a single buf and its hdr after a write occurs.
>  *
> - * When the L2ARC is in use, it will also take advantage of the b_pdata. The
> - * L2ARC will always write the contents of b_pdata to the L2ARC. This means
> + * When the L2ARC is in use, it will also take advantage of the b_pabd. The
> + * L2ARC will always write the contents of b_pabd to the L2ARC. This means
>  * that when compressed ARC is enabled that the L2ARC blocks are identical
>  * to the on-disk block in the main data pool. This provides a significant
>  * advantage since the ARC can leverage the bp's checksum when reading from the
> @@ -263,7 +263,9 @@
> #include <sys/vdev.h>
> #include <sys/vdev_impl.h>
> #include <sys/dsl_pool.h>
> +#include <sys/zio_checksum.h>
> #include <sys/multilist.h>
> +#include <sys/abd.h>
> #ifdef _KERNEL
> #include <sys/dnlc.h>
> #include <sys/racct.h>
> @@ -307,7 +309,7 @@ int zfs_arc_evict_batch_limit = 10;
> /* number of seconds before growing cache again */
> static int		arc_grow_retry = 60;
> 
> -/* shift of arc_c for calculating overflow limit in arc_get_data_buf */
> +/* shift of arc_c for calculating overflow limit in arc_get_data_impl */
> int		zfs_arc_overflow_shift = 8;
> 
> /* shift of arc_c for calculating both min and max arc_p */
> @@ -543,13 +545,13 @@ typedef struct arc_stats {
> 	kstat_named_t arcstat_c_max;
> 	kstat_named_t arcstat_size;
> 	/*
> -	 * Number of compressed bytes stored in the arc_buf_hdr_t's b_pdata.
> +	 * Number of compressed bytes stored in the arc_buf_hdr_t's b_pabd.
> 	 * Note that the compressed bytes may match the uncompressed bytes
> 	 * if the block is either not compressed or compressed arc is disabled.
> 	 */
> 	kstat_named_t arcstat_compressed_size;
> 	/*
> -	 * Uncompressed size of the data stored in b_pdata. If compressed
> +	 * Uncompressed size of the data stored in b_pabd. If compressed
> 	 * arc is disabled then this value will be identical to the stat
> 	 * above.
> 	 */
> @@ -988,7 +990,7 @@ typedef struct l1arc_buf_hdr {
> 	refcount_t		b_refcnt;
> 
> 	arc_callback_t		*b_acb;
> -	void			*b_pdata;
> +	abd_t			*b_pabd;
> } l1arc_buf_hdr_t;
> 
> typedef struct l2arc_dev l2arc_dev_t;
> @@ -1341,7 +1343,7 @@ typedef struct l2arc_read_callback {
> 	blkptr_t		l2rcb_bp;		/* original blkptr */
> 	zbookmark_phys_t	l2rcb_zb;		/* original bookmark */
> 	int			l2rcb_flags;		/* original flags */
> -	void			*l2rcb_data;		/* temporary buffer */
> +	void			*l2rcb_abd;		/* temporary buffer */
> } l2arc_read_callback_t;
> 
> typedef struct l2arc_write_callback {
> @@ -1351,7 +1353,7 @@ typedef struct l2arc_write_callback {
> 
> typedef struct l2arc_data_free {
> 	/* protected by l2arc_free_on_write_mtx */
> -	void		*l2df_data;
> +	abd_t		*l2df_abd;
> 	size_t		l2df_size;
> 	arc_buf_contents_t l2df_type;
> 	list_node_t	l2df_list_node;
> @@ -1361,10 +1363,14 @@ static kmutex_t l2arc_feed_thr_lock;
> static kcondvar_t l2arc_feed_thr_cv;
> static uint8_t l2arc_thread_exit;
> 
> +static abd_t *arc_get_data_abd(arc_buf_hdr_t *, uint64_t, void *);
> static void *arc_get_data_buf(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_get_data_impl(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_free_data_abd(arc_buf_hdr_t *, abd_t *, uint64_t, void *);
> static void arc_free_data_buf(arc_buf_hdr_t *, void *, uint64_t, void *);
> -static void arc_hdr_free_pdata(arc_buf_hdr_t *hdr);
> -static void arc_hdr_alloc_pdata(arc_buf_hdr_t *);
> +static void arc_free_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag);
> +static void arc_hdr_free_pabd(arc_buf_hdr_t *);
> +static void arc_hdr_alloc_pabd(arc_buf_hdr_t *);
> static void arc_access(arc_buf_hdr_t *, kmutex_t *);
> static boolean_t arc_is_overflowing();
> static void arc_buf_watch(arc_buf_t *);
> @@ -1718,7 +1724,9 @@ static inline boolean_t
> arc_buf_is_shared(arc_buf_t *buf)
> {
> 	boolean_t shared = (buf->b_data != NULL &&
> -	    buf->b_data == buf->b_hdr->b_l1hdr.b_pdata);
> +	    buf->b_hdr->b_l1hdr.b_pabd != NULL &&
> +	    abd_is_linear(buf->b_hdr->b_l1hdr.b_pabd) &&
> +	    buf->b_data == abd_to_buf(buf->b_hdr->b_l1hdr.b_pabd));
> 	IMPLY(shared, HDR_SHARED_DATA(buf->b_hdr));
> 	IMPLY(shared, ARC_BUF_SHARED(buf));
> 	IMPLY(shared, ARC_BUF_COMPRESSED(buf) || ARC_BUF_LAST(buf));
> @@ -1822,7 +1830,8 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
> 		uint64_t csize;
> 
> 		void *cbuf = zio_buf_alloc(HDR_GET_PSIZE(hdr));
> -		csize = zio_compress_data(compress, zio->io_data, cbuf, lsize);
> +		csize = zio_compress_data(compress, zio->io_abd, cbuf, lsize);
> +
> 		ASSERT3U(csize, <=, HDR_GET_PSIZE(hdr));
> 		if (csize < HDR_GET_PSIZE(hdr)) {
> 			/*
> @@ -1857,7 +1866,7 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
> 	 * logical I/O size and not just a gang fragment.
> 	 */
> 	valid_cksum = (zio_checksum_error_impl(zio->io_spa, zio->io_bp,
> -	    BP_GET_CHECKSUM(zio->io_bp), zio->io_data, zio->io_size,
> +	    BP_GET_CHECKSUM(zio->io_bp), zio->io_abd, zio->io_size,
> 	    zio->io_offset, NULL) == 0);
> 	zio_pop_transforms(zio);
> 	return (valid_cksum);
> @@ -2161,7 +2170,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
> 
> 	if (hdr_compressed == compressed) {
> 		if (!arc_buf_is_shared(buf)) {
> -			bcopy(hdr->b_l1hdr.b_pdata, buf->b_data,
> +			abd_copy_to_buf(buf->b_data, hdr->b_l1hdr.b_pabd,
> 			    arc_buf_size(buf));
> 		}
> 	} else {
> @@ -2213,7 +2222,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
> 			return (0);
> 		} else {
> 			int error = zio_decompress_data(HDR_GET_COMPRESS(hdr),
> -			    hdr->b_l1hdr.b_pdata, buf->b_data,
> +			    hdr->b_l1hdr.b_pabd, buf->b_data,
> 			    HDR_GET_PSIZE(hdr), HDR_GET_LSIZE(hdr));
> 
> 			/*
> @@ -2250,7 +2259,7 @@ arc_decompress(arc_buf_t *buf)
> }
> 
> /*
> - * Return the size of the block, b_pdata, that is stored in the arc_buf_hdr_t.
> + * Return the size of the block, b_pabd, that is stored in the arc_buf_hdr_t.
>  */
> static uint64_t
> arc_hdr_size(arc_buf_hdr_t *hdr)
> @@ -2282,14 +2291,14 @@ arc_evictable_space_increment(arc_buf_hdr_t *hdr, arc_
> 	if (GHOST_STATE(state)) {
> 		ASSERT0(hdr->b_l1hdr.b_bufcnt);
> 		ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -		ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +		ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		(void) refcount_add_many(&state->arcs_esize[type],
> 		    HDR_GET_LSIZE(hdr), hdr);
> 		return;
> 	}
> 
> 	ASSERT(!GHOST_STATE(state));
> -	if (hdr->b_l1hdr.b_pdata != NULL) {
> +	if (hdr->b_l1hdr.b_pabd != NULL) {
> 		(void) refcount_add_many(&state->arcs_esize[type],
> 		    arc_hdr_size(hdr), hdr);
> 	}
> @@ -2317,14 +2326,14 @@ arc_evictable_space_decrement(arc_buf_hdr_t *hdr, arc_
> 	if (GHOST_STATE(state)) {
> 		ASSERT0(hdr->b_l1hdr.b_bufcnt);
> 		ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -		ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +		ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		(void) refcount_remove_many(&state->arcs_esize[type],
> 		    HDR_GET_LSIZE(hdr), hdr);
> 		return;
> 	}
> 
> 	ASSERT(!GHOST_STATE(state));
> -	if (hdr->b_l1hdr.b_pdata != NULL) {
> +	if (hdr->b_l1hdr.b_pabd != NULL) {
> 		(void) refcount_remove_many(&state->arcs_esize[type],
> 		    arc_hdr_size(hdr), hdr);
> 	}
> @@ -2421,7 +2430,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 		old_state = hdr->b_l1hdr.b_state;
> 		refcnt = refcount_count(&hdr->b_l1hdr.b_refcnt);
> 		bufcnt = hdr->b_l1hdr.b_bufcnt;
> -		update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pdata != NULL);
> +		update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL);
> 	} else {
> 		old_state = arc_l2c_only;
> 		refcnt = 0;
> @@ -2491,7 +2500,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 			 */
> 			(void) refcount_add_many(&new_state->arcs_size,
> 			    HDR_GET_LSIZE(hdr), hdr);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		} else {
> 			uint32_t buffers = 0;
> 
> @@ -2520,7 +2529,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 			}
> 			ASSERT3U(bufcnt, ==, buffers);
> 
> -			if (hdr->b_l1hdr.b_pdata != NULL) {
> +			if (hdr->b_l1hdr.b_pabd != NULL) {
> 				(void) refcount_add_many(&new_state->arcs_size,
> 				    arc_hdr_size(hdr), hdr);
> 			} else {
> @@ -2533,7 +2542,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 		ASSERT(HDR_HAS_L1HDR(hdr));
> 		if (GHOST_STATE(old_state)) {
> 			ASSERT0(bufcnt);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 
> 			/*
> 			 * When moving a header off of a ghost state,
> @@ -2573,7 +2582,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 				    buf);
> 			}
> 			ASSERT3U(bufcnt, ==, buffers);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> 			(void) refcount_remove_many(
> 			    &old_state->arcs_size, arc_hdr_size(hdr), hdr);
> 		}
> @@ -2655,7 +2664,7 @@ arc_space_return(uint64_t space, arc_space_type_t type
> 
> /*
>  * Given a hdr and a buf, returns whether that buf can share its b_data buffer
> - * with the hdr's b_pdata.
> + * with the hdr's b_pabd.
>  */
> static boolean_t
> arc_can_share(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> @@ -2732,20 +2741,23 @@ arc_buf_alloc_impl(arc_buf_hdr_t *hdr, void *tag, bool
> 	/*
> 	 * If the hdr's data can be shared then we share the data buffer and
> 	 * set the appropriate bit in the hdr's b_flags to indicate the hdr is
> -	 * sharing it's b_pdata with the arc_buf_t. Otherwise, we allocate a new
> +	 * sharing it's b_pabd with the arc_buf_t. Otherwise, we allocate a new
> 	 * buffer to store the buf's data.
> 	 *
> -	 * There is one additional restriction here because we're sharing
> -	 * hdr -> buf instead of the usual buf -> hdr: the hdr can't be actively
> -	 * involved in an L2ARC write, because if this buf is used by an
> -	 * arc_write() then the hdr's data buffer will be released when the
> +	 * There are two additional restrictions here because we're sharing
> +	 * hdr -> buf instead of the usual buf -> hdr. First, the hdr can't be
> +	 * actively involved in an L2ARC write, because if this buf is used by
> +	 * an arc_write() then the hdr's data buffer will be released when the
> 	 * write completes, even though the L2ARC write might still be using it.
> +	 * Second, the hdr's ABD must be linear so that the buf's user doesn't
> +	 * need to be ABD-aware.
> 	 */
> -	boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr);
> +	boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr) &&
> +	    abd_is_linear(hdr->b_l1hdr.b_pabd);
> 
> 	/* Set up b_data and sharing */
> 	if (can_share) {
> -		buf->b_data = hdr->b_l1hdr.b_pdata;
> +		buf->b_data = abd_to_buf(hdr->b_l1hdr.b_pabd);
> 		buf->b_flags |= ARC_BUF_FLAG_SHARED;
> 		arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
> 	} else {
> @@ -2841,11 +2853,11 @@ arc_loan_inuse_buf(arc_buf_t *buf, void *tag)
> }
> 
> static void
> -l2arc_free_data_on_write(void *data, size_t size, arc_buf_contents_t type)
> +l2arc_free_abd_on_write(abd_t *abd, size_t size, arc_buf_contents_t type)
> {
> 	l2arc_data_free_t *df = kmem_alloc(sizeof (*df), KM_SLEEP);
> 
> -	df->l2df_data = data;
> +	df->l2df_abd = abd;
> 	df->l2df_size = size;
> 	df->l2df_type = type;
> 	mutex_enter(&l2arc_free_on_write_mtx);
> @@ -2876,7 +2888,7 @@ arc_hdr_free_on_write(arc_buf_hdr_t *hdr)
> 		arc_space_return(size, ARC_SPACE_DATA);
> 	}
> 
> -	l2arc_free_data_on_write(hdr->b_l1hdr.b_pdata, size, type);
> +	l2arc_free_abd_on_write(hdr->b_l1hdr.b_pabd, size, type);
> }
> 
> /*
> @@ -2890,7 +2902,7 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	arc_state_t *state = hdr->b_l1hdr.b_state;
> 
> 	ASSERT(arc_can_share(hdr, buf));
> -	ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +	ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 	ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
> 
> 	/*
> @@ -2899,7 +2911,9 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	 * the refcount whenever an arc_buf_t is shared.
> 	 */
> 	refcount_transfer_ownership(&state->arcs_size, buf, hdr);
> -	hdr->b_l1hdr.b_pdata = buf->b_data;
> +	hdr->b_l1hdr.b_pabd = abd_get_from_buf(buf->b_data, arc_buf_size(buf));
> +	abd_take_ownership_of_buf(hdr->b_l1hdr.b_pabd,
> +	    HDR_ISTYPE_METADATA(hdr));
> 	arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
> 	buf->b_flags |= ARC_BUF_FLAG_SHARED;
> 
> @@ -2919,7 +2933,7 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	arc_state_t *state = hdr->b_l1hdr.b_state;
> 
> 	ASSERT(arc_buf_is_shared(buf));
> -	ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +	ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> 	ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
> 
> 	/*
> @@ -2928,7 +2942,9 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	 */
> 	refcount_transfer_ownership(&state->arcs_size, hdr, buf);
> 	arc_hdr_clear_flags(hdr, ARC_FLAG_SHARED_DATA);
> -	hdr->b_l1hdr.b_pdata = NULL;
> +	abd_release_ownership_of_buf(hdr->b_l1hdr.b_pabd);
> +	abd_put(hdr->b_l1hdr.b_pabd);
> +	hdr->b_l1hdr.b_pabd = NULL;
> 	buf->b_flags &= ~ARC_BUF_FLAG_SHARED;
> 
> 	/*
> @@ -3025,7 +3041,7 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> 	if (ARC_BUF_SHARED(buf) && !ARC_BUF_COMPRESSED(buf)) {
> 		/*
> 		 * If the current arc_buf_t is sharing its data buffer with the
> -		 * hdr, then reassign the hdr's b_pdata to share it with the new
> +		 * hdr, then reassign the hdr's b_pabd to share it with the new
> 		 * buffer at the end of the list. The shared buffer is always
> 		 * the last one on the hdr's buffer list.
> 		 *
> @@ -3040,8 +3056,8 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> 			/* hdr is uncompressed so can't have compressed buf */
> 			VERIFY(!ARC_BUF_COMPRESSED(lastbuf));
> 
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> -			arc_hdr_free_pdata(hdr);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> +			arc_hdr_free_pabd(hdr);
> 
> 			/*
> 			 * We must setup a new shared block between the
> @@ -3079,26 +3095,26 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> }
> 
> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
> 


