Status of support for 4KB disk sectors

Glen Barber glen.j.barber at gmail.com
Tue Jul 19 03:59:45 UTC 2011


On 7/18/11 7:41 PM, Jeremy Chadwick wrote:
> On Mon, Jul 18, 2011 at 03:50:15PM -0700, Kevin Oberman wrote:
>> I just want to check on the status of 4K sector support in FreeBSD.  I read
>> a long thread on the topic from a while back, and it looks like I might hit
>> some issues if I'm not REALLY careful.  Since I will be keeping the existing
>> Windows installation, I need to be sure that I can set up the disk correctly
>> without screwing up Windows 7.
>>
>> I was planning on just dd'ing the W7 slice over, but I am not sure how well
>> this would play with GPT.  Or should I not try to use GPT at all?  I'd like
>> to, as this laptop spreads Windows 7 over two slices and adds a third for
>> the recovery system, leaving only one for FreeBSD, and I'd like to put my
>> files in a separate slice.  GPT would offer that fifth slice.
>>
>> I have read the handbook and see no reference to 4K sectors and only a
>> one-liner about gpart(8) and GPT.  Once I get this all figured out, I'll
>> see about writing an update on this, as GPT looks like the way to go in
>> the future.
> 
> When you say "4KB sector support", what do you mean by this?  All
> drives on the market as of this writing, at least those I've seen,
> claim a physical/logical sector size of 512 bytes -- yes, even SSDs,
> and EARS drives which we know use 4KB sectors.  They do this to
> guarantee full compatibility with existing software.
> 
> Since you're talking about gpart and "4KB sector support", did you mean
> to ask "what's the state of FreeBSD and aligned partition support to
> ensure decent performance with 4KB-sector drives?"
> 
> If so: there have been some commits in recent days to RELENG_8 to help
> try to address the shortcomings of the existing utilities and GEOM
> infrastructure.  Read the most recent commit text carefully:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/geom/class/part/geom_part.c
> 
> But the currently "known method" is to use gnop(8).  Here's an example:
> 
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
> 
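
For reference, the gnop(8) trick mentioned above goes roughly like this
(a sketch for a single-disk pool; the device name 'ada0' and pool name
'tank' are made up -- adjust to taste):

  # diskinfo -v ada0 | egrep 'sectorsize|stripesize'  # what the drive claims
  # gnop create -S 4096 /dev/ada0     # transparent provider with 4K sectors
  # zpool create tank /dev/ada0.nop   # pool gets created with ashift=12
  # zpool export tank
  # gnop destroy /dev/ada0.nop        # the .nop device is no longer needed
  # zpool import tank                 # the 4K alignment stays with the pool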

Notice: I'm reading this as "how badly do 'green drives' suck?"

FWIW, I recently applied the gnop(8) trick to two "green" drives in one
of my machines because I was seeing horrifying performance problems with
what I consider basic stuff, like 'portsnap extract', but more severely
with copying large amounts of data (file-backed Bacula volumes, to be
exact) onto those drives.  I have yet to retry my read/write tests with
drives I have not converted with gnop(8).
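
For anyone retracing this: whether a pool actually ended up 4K-aligned
shows up as its 'ashift' value (9 means 512-byte allocation, 12 means
4KB).  Something along these lines should show it for all cached pools:

  # zdb | grep ashift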

I have not conclusively tested all possible combinations of
configurations, nor reverted the changes to the drives to retest, but if
it is of any interest, here's what I'm seeing.

I have comparisons between WD "green" and "black" drives.
Unfortunately, the machines are not identical: 'orion' is a Core2Quad
with 6GB RAM running a month-old 8-STABLE, while 'kaos' is a Core2Duo
with 8GB RAM running a 2-week-old -CURRENT.  Both machines are using
ZFSv28:

orion % sysctl -n hw.ncpu; sysctl -n hw.physmem
4
6353416192

kaos % sysctl -n hw.ncpu; sysctl -n hw.physmem
2
8534401024

The drives in 'orion' are 1TB WD green drives in a ZFS mirror; the
drives in 'kaos' are 1TB WD black drives in a raidz1 (3 drives).

First the read test:

kaos % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
	12.94 real         0.60 user        11.95 sys

orion % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
	118.02 real         0.46 user         8.74 sys

I guess no real surprise here.  'kaos' has more spindles to read from,
on top of faster seek times.

Next the write test:

The 'compressed' and 'dedup' datasets referenced below are 'lzjb' and
'sha256,verify', respectively.  I'd wait for the 'compressed+dedup'
tests to finish, but I have to wake up tomorrow morning.
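
For the record, datasets with those properties would have been created
along these lines (the dataset names are inferred from the mount points
below; the underlying pool layout is assumed):

orion# zfs create -o compression=lzjb zstore/perftest_compress
orion# zfs create -o dedup=sha256,verify zstore/perftest_dedup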

orion# sh -c 'time portsnap extract -p /zstore/perftest >/dev/null'
	306.71 real        44.37 user       110.28 sys

orion# sh -c 'time portsnap extract -p /zstore/perftest_compress >/dev/null'
	166.62 real        43.87 user       109.49 sys

orion# sh -c 'time portsnap extract -p /zstore/perftest_dedup >/dev/null'
	3576.43 real        44.98 user       109.12 sys

kaos# sh -c 'time portsnap extract -p /perftest >/dev/null'
	311.31 real        51.23 user       193.37 sys

kaos# sh -c 'time portsnap extract -p /perftest_compress >/dev/null'
	269.85 real        49.55 user       191.56 sys

kaos# sh -c 'time portsnap extract -p /perftest_dedup >/dev/null'
	4655.73 real        51.86 user       196.22 sys

Like I said, I have not yet had the time to retest this on drives
without the gnop(8) fix (another similar zpool with 2 drives), so take
the data above with a grain of salt; but since the gnop(8) fix for 4K
sector drives was mentioned, I thought it was worth sharing.

> Now, that's for ZFS, but I'm under the impression the exact same is
> needed for FFS/UFS.
> 
> <rant> Do I bother doing this with my SSDs?  No.  Am I suffering in
> performance?  Probably.  Why do I not care?  Because the level of
> annoyance is extremely high -- remember, all of this has to be done from
> within the installer environment (referring to the "Emergency Shell"),
> which on FreeBSD is severely lacking in usability, and is even worse to
> deal with when doing a remote install via PXE/serial.  Fixit is the only
> decent environment.  Given that floppies are more or less gone, I don't
> understand why the Fixit environment doesn't replace the "Emergency
> Shell". </rant>
> 
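
On the FFS/UFS point: with the gpart(8) alignment support from the
RELENG_8 commits referenced above, something like the following should
get you a 4K-aligned UFS partition (a sketch, assuming a blank disk at
'ada1'; the GPT label 'data' is made up):

  # gpart create -s gpt ada1
  # gpart add -t freebsd-ufs -a 4k -l data ada1
  # newfs -U /dev/gpt/data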

Not that it necessarily helps in a PXE environment, but a memstick of
9-CURRENT has helped me recover minor "oops" situations a few times over
the past few months.  What are these "floppies" you speak of, again?  :)

Regards,

-- 
Glen Barber

