ZFS bug: was creating ZIL ignores vfs.zfs.min_auto_ashift, should be ZIL sets improper ashift with AHCI controllers

Steven Hartland killing at multiplay.co.uk
Thu Nov 6 20:26:26 UTC 2014


Something very strange is going on.

I have a boot pool (tank), and if I add ada1p3 (a disk with 512-byte
sectors, with min_auto_ashift = 12) to it as a log device, zdb reports
its ashift as 9.

If I add the same device to another test pool (tpool) on the same 
machine it gets ashift 12.

The attached dtrace script traces the calls and shows that 
vdev_ashift_optimize is correctly called and that the ashift of the vdev 
in both cases should be 12 according to the final vdev_config_generate call.
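
To make the expected value concrete, here is a minimal sketch of the rule
being assumed in this thread: the ashift is log2 of the sector size, and
min_auto_ashift overrides it when it is larger. This is a simplified model
for illustration only, not the kernel's vdev_ashift_optimize code, and
expected_ashift is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified model only: NOT the kernel's vdev_ashift_optimize logic.
# expected_ashift SECTOR_BYTES MIN_AUTO_ASHIFT
# Prints max(log2(SECTOR_BYTES), MIN_AUTO_ASHIFT).
expected_ashift() {
    size=$1
    min=$2
    ashift=0
    # Count how many times the sector size halves before reaching 1.
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        ashift=$((ashift + 1))
    done
    # A higher minimum overrides what the drive reports.
    if [ "$ashift" -lt "$min" ]; then
        ashift=$min
    fi
    echo "$ashift"
}

expected_ashift 512 9     # 512-byte sectors, default minimum
expected_ashift 512 12    # 512-byte sectors, minimum raised to 12
expected_ashift 4096 9    # 4K sectors (or 4K quirk), default minimum
```

Under this model a 512-byte disk with min_auto_ashift = 12 should come out
at 12, which is what makes the reported 9 on the boot pool look wrong.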

More debugging is required.

On 06/11/2014 14:58, Borja Marcos wrote:
> On Nov 6, 2014, at 2:26 PM, Steven Hartland wrote:
>
>> That's not relevant as min when set should override the drives params
> There is more to this than it seems. I just found more funny stuff.
>
> MY CONCLUSION IS: when creating a ZIL device, it behaves differently depending on the disk controller. It works with SAS,
> and it doesn't work with AHCI.
>
> When using an AHCI controller, the ZIL ignores *both* the 4K block quirk and the min_auto_ashift variable. The ashift is fixed at 9. It only
> uses a different ashift when using a "nop" device. For example, I tried with a 4 KB gnop device and this time it used the correct ashift, 12.
>
> When using a SAS controller, ZIL works perfectly with both.
>
> Seems quite odd to me. I am not running exactly the same version on both machines (the one with AHCI controllers is running -STABLE
> from three days ago and the one with the SAS controller is running 10.1-RC4), but the results should be the same.
>
> I've added the relevant quirk to ata_da.c and the SSD is now
> properly "quirked":
>
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <INTEL SSDSA2CT040G3 4PC10362> ATA-8 SATA 2.x device
> ada1: Serial Number PEPR408501DV040AGN
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)
> ada1: quirks=0x1<4K>
>
>
> But still something is wrong:
>
> EXAMPLE 1: AHCI controller, min_auto_ashift with the default value of 9.
>
> The log child has the wrong ashift, 9, regardless of the 4K quirk.
>
>         children[1]:
>              type: 'disk'
>              id: 1
>              guid: 2447450905312007897
>              path: '/dev/ada1'
>              phys_path: '/dev/ada1'
>              whole_disk: 1
>              metaslab_array: 0
>              metaslab_shift: 0
>              ashift: 9
>              asize: 40015757312
>              is_log: 1
>              create_txg: 11741519
>
>
> EXAMPLE 2: AHCI controller, raise min_auto_ashift to 12
>
> # sysctl vfs.zfs.min_auto_ashift=12
> vfs.zfs.min_auto_ashift: 9 -> 12
>
> # zpool add rpool log ada1
>
> And our log child still has the wrong ashift.
>
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 17598938711972588792
>              path: '/dev/ada1'
>              phys_path: '/dev/ada1'
>              whole_disk: 1
>              metaslab_array: 0
>              metaslab_shift: 0
>              ashift: 9
>              asize: 40015757312
>              is_log: 1
>              create_txg: 11741560
>
>
>
> EXAMPLE 3: The same as example 1, but using a SAS controller (mps).
> I haven't changed min_auto_ashift.
>
> da3: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da3: Serial Number S1D9NEADA08568E
> da3: 600.000MB/s transfers
> da3: Command Queueing enabled
> da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da3: quirks=0x8<4K>
> da1: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da1: Serial Number S1D9NEADA08549F
> da1: 600.000MB/s transfers
> da1: Command Queueing enabled
> da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da1: quirks=0x8<4K>
> da2: <ATA Samsung SSD 840 BB0Q> Fixed Direct Access SCSI-6 device
> da2: Serial Number S1D9NEADA08548T
> da2: 600.000MB/s transfers
> da2: Command Queueing enabled
> da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> da2: quirks=0x8<4K>
>
>
> Now, we create a pool. I did this in two steps in order to reproduce my AHCI setup more accurately.
>
> # zpool create sample mirror da2 da3
>
> and add a log device
>
> # zpool add sample log da1
>
> And our log device uses the correct ashift...
>
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 1327562712929751294
>              path: '/dev/da1'
>              phys_path: '/dev/da1'
>              whole_disk: 1
>              metaslab_array: 38
>              metaslab_shift: 33
>              ashift: 12                            <=============== BINGO! 12!!
>              asize: 1000199946240
>              is_log: 1
>              create_txg: 7
>
>
> EXAMPLE 4: Same hardware as before, but I have compiled a "dequirked" kernel. The Samsung 840 SSD is now
> detected with 512 byte sectors.
>
> # sysctl vfs.zfs.min_auto_ashift=12
>
> # zpool create sample da2 da3
>
> # zpool add sample log da1
>
> # zdb
>
> sample:
>      version: 5000
>      name: 'sample'
>      state: 0
>      txg: 10
>      pool_guid: 10244789911221894670
>      hostid: 1065071139
>      hostname: 'elibm'
>      vdev_children: 3
>      vdev_tree:
>          type: 'root'
>          id: 0
>          guid: 10244789911221894670
>          create_txg: 4
>          children[0]:
>              type: 'disk'
>              id: 0
>              guid: 147759032286414284
>              path: '/dev/da2'
>              phys_path: '/dev/da2'
>              whole_disk: 1
>              metaslab_array: 37
>              metaslab_shift: 33
>              ashift: 12
>              asize: 1000199946240
>              is_log: 0
>              create_txg: 4
>          children[1]:
>              type: 'disk'
>              id: 1
>              guid: 2632519121370708463
>              path: '/dev/da3'
>              phys_path: '/dev/da3'
>              whole_disk: 1
>              metaslab_array: 34
>              metaslab_shift: 33
>              ashift: 12
>              asize: 1000199946240
>              is_log: 0
>              create_txg: 4
>          children[2]:
>              type: 'disk'
>              id: 2
>              guid: 10136980984141171426
>              path: '/dev/da1'
>              phys_path: '/dev/da1'
>              whole_disk: 1
>              metaslab_array: 39
>              metaslab_shift: 33
>              ashift: 12                            <========= 12, ashift for the log device
>              asize: 1000199946240
>              is_log: 1
>              create_txg: 8
>      features_for_read:
>          com.delphix:hole_birth
>          com.delphix:embedded_data
> root at elibm:~ #
>
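
For comparing the dumps above side by side, a small filter along these
lines may help. It is only a sketch: list_ashift is a hypothetical helper
name, and it assumes the field order shown in the zdb output above (path,
then ashift, then is_log within each child).

```shell
#!/bin/sh
# list_ashift: read zdb-style config text on stdin and print
# "path ashift is_log" for every vdev that has a path.
# Assumes path, ashift and is_log appear in that order per child.
list_ashift() {
    awk '
        { sub(/^>[ \t]*/, "") }      # drop a mail quote prefix, if present
        $1 == "path:"   { gsub(/\047/, "", $2); path = $2 }   # strip quotes
        $1 == "ashift:" { ashift = $2 }
        $1 == "is_log:" { if (path != "") print path, ashift, $2; path = "" }
    '
}

# Example use on a live system (not run here):
#   zdb | list_ashift
```

Log devices are the lines ending in 1, so a wrong ashift on a log child
stands out immediately.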

-------------- next part --------------
#!/usr/sbin/dtrace -s

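/* Log each vdev's ashift state as vdev_ashift_optimize() is entered. */
fbt::vdev_ashift_optimize:entry {
	this->vd = (vdev_t *)arg0;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
		this->vd->vdev_path ? stringof(this->vd->vdev_path) : "n/a",
		this->vd->vdev_ashift,
		this->vd->vdev_physical_ashift,
		this->vd == this->vd->vdev_top,
		`zfs_min_auto_ashift
	);
}

/* Same fields when the config is generated; the vdev is the second argument. */
fbt::vdev_config_generate:entry {
	this->vd = (vdev_t *)arg1;
	printf("vdev: %s, ashift: %d, physical_ashift: %d, top: %d, min: %d",
		this->vd->vdev_path ? stringof(this->vd->vdev_path) : "n/a",
		this->vd->vdev_ashift,
		this->vd->vdev_physical_ashift,
		this->vd == this->vd->vdev_top,
		`zfs_min_auto_ashift
	);
}

/* For fbt return probes, arg0 is the offset of the returning instruction. */
fbt::vdev_ashift_optimize:return {
	printf("%x", arg0);
}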


