Status of support for 4KB disk sectors
Glen Barber
glen.j.barber at gmail.com
Tue Jul 19 03:59:45 UTC 2011
On 7/18/11 7:41 PM, Jeremy Chadwick wrote:
> On Mon, Jul 18, 2011 at 03:50:15PM -0700, Kevin Oberman wrote:
>> I just want to check on the status of 4K sector support in FreeBSD. I read
>> a long thread on the topic from a while back and it looks like I might hit some
>> issues if I'm not REALLY careful. Since I will be keeping the existing Windows
>> installation, I need to be sure that I can set up the disk correctly without
>> screwing up Windows 7.
>>
>> I was planning on just dd(1)ing the W7 slice over, but I am not sure how
>> well this would play with GPT. Or should I not try to use GPT at all? I'd
>> like to, as this laptop spreads Windows 7 over two slices and adds a third
>> for the recovery system, leaving only one for FreeBSD, and I'd like to put
>> my files in a separate slice. GPT would offer that fifth slice.
>>
>> I have read the Handbook and don't see any reference to 4K sectors, and
>> only a one-liner about gpart(8) and GPT. Once I get this all figured out,
>> I'll see about writing an update about this, as GPT looks like the way to
>> go in the future.
>
> When you say "4KB sector support", what do you mean by this? All the
> drives on the market as of this writing, at least those I've seen, claim
> a physical/logical sector size of 512 bytes -- yes, even SSDs and the WD
> EARS drives which we know use 4KB sectors internally. They do this to
> guarantee full compatibility with existing software.
>
> Since you're talking about gpart and "4KB sector support", did you mean
> to ask "what's the state of FreeBSD and aligned partition support to
> ensure decent performance with 4KB-sector drives?"
>
> If so: there have been some commits in recent days to RELENG_8 to help
> try to address the shortcomings of the existing utilities and GEOM
> infrastructure. Read the most recent commit text carefully:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/geom/class/part/geom_part.c
>
> But the currently "known method" is to use gnop(8). Here's an example:
>
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
>
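(As an aside: the alignment question above boils down to whether each
partition's starting LBA, counted in 512-byte sectors, is a multiple of
8, since 4096 / 512 = 8. A trivial sketch of that check, with placeholder
LBA values:)

```shell
# Check whether a partition's start LBA (in 512-byte sectors) sits on a
# 4 KiB boundary, i.e. is a multiple of 8 (4096 / 512).
is_aligned() {
    if [ $(( $1 % 8 )) -eq 0 ]; then
        echo "$1: aligned"
    else
        echo "$1: misaligned"
    fi
}

is_aligned 63      # classic MBR first-partition offset -> misaligned
is_aligned 2048    # 1 MiB boundary, used by newer tools -> aligned
```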
Notice: I'm reading this as "how badly do 'green drives' suck?"
FWIW, I recently applied the gnop(8) trick to two "green" drives in one
of my machines because I was seeing horrifying performance with what I
consider to be basic operations, like 'portsnap extract', and more
severely when copying large data (file-backed Bacula volumes, to be
exact) into said datasets. I have yet to retry my read/write tests with
drives I have not converted with gnop(8).
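For the archives, the trick amounts to creating gnop(8) providers that
advertise 4 KiB sectors, building the pool on those so ZFS records
ashift=12 at vdev creation, then re-importing the pool from the raw
disks. A rough sketch (run as root; the device and pool names are
placeholders, not what's in my machines):

```shell
# Transparent providers that report a 4 KiB sector size to consumers.
gnop create -S 4096 /dev/ada0
gnop create -S 4096 /dev/ada1

# Create the pool on the .nop devices so ZFS picks ashift=12.
zpool create tank mirror /dev/ada0.nop /dev/ada1.nop

# The ashift is permanent once set, so the gnop layer can be dropped
# and the pool imported from the underlying disks.
zpool export tank
gnop destroy /dev/ada0.nop /dev/ada1.nop
zpool import tank
```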
I have not conclusively tested all possible combinations of
configurations, nor reverted the changes to the drives to retest, but if
it is of any interest, here's what I'm seeing.
I have comparisons between WD "green" and "black" drives.
Unfortunately, the machines are not completely similar - one is a
Core2Quad, the other Core2Duo; one has 6GB RAM, the other 8GB RAM; also,
'orion' is running a month-old 8-STABLE; 'kaos' is running a 2-week-old
-CURRENT. Both machines are using ZFSv28:
orion % sysctl -n hw.ncpu; sysctl -n hw.physmem
4
6353416192
kaos % sysctl -n hw.ncpu; sysctl -n hw.physmem
2
8534401024
The drives in 'orion' are 1TB WD green drives in a ZFS mirror; the
drives in 'kaos' are 1TB WD black drives in a raidz1 (3 drives).
First the read test:
kaos % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
12.94 real 0.60 user 11.95 sys
orion % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
118.02 real 0.46 user 8.74 sys
I guess no real surprise here. 'kaos' has more spindles to read from,
on top of faster seek times.
Next the write test:
The 'compressed' and 'dedup' datasets referenced below are 'lzjb' and
'sha256,verify', respectively. I'd wait for the 'compressed+dedup'
tests to finish, but I have to wake up tomorrow morning.
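For reference, datasets with those properties would be created roughly
like this (a sketch; 'zstore' is the pool on 'orion', and I'm assuming
default mountpoints):

```shell
# Baseline, compressed, and deduplicated test datasets.
zfs create zstore/perftest
zfs create -o compression=lzjb zstore/perftest_compress
zfs create -o dedup=sha256,verify zstore/perftest_dedup
```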
orion# sh -c 'time portsnap extract -p /zstore/perftest >/dev/null'
306.71 real 44.37 user 110.28 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_compress >/dev/null'
166.62 real 43.87 user 109.49 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_dedup >/dev/null'
3576.43 real 44.98 user 109.12 sys
kaos# sh -c 'time portsnap extract -p /perftest >/dev/null'
311.31 real 51.23 user 193.37 sys
kaos# sh -c 'time portsnap extract -p /perftest_compress >/dev/null'
269.85 real 49.55 user 191.56 sys
kaos# sh -c 'time portsnap extract -p /perftest_dedup >/dev/null'
4655.73 real 51.86 user 196.22 sys
Like I said, I have not yet had time to retest this on drives without the
gnop(8) fix (another similar zpool with 2 drives), so the data I'm
providing may not be conclusive; but since the gnop(8) fix for 4K-sector
drives was mentioned, I thought it was worth sharing.
> Now, that's for ZFS, but I'm under the impression the exact same is
> needed for FFS/UFS.
>
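Right; and with the recent gpart(8) alignment work mentioned above,
getting 4K-aligned partitions for UFS should look something like this (a
sketch only; 'ada0' and the sizes are placeholders, run as root):

```shell
# GPT scheme with partitions forced onto 4 KiB boundaries via -a.
gpart create -s gpt ada0
gpart add -t freebsd-boot -a 4k -s 512k ada0
gpart add -t freebsd-ufs -a 4k ada0
newfs -U /dev/ada0p2
```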
> <rant> Do I bother doing this with my SSDs? No. Am I suffering in
> performance? Probably. Why do I not care? Because the level of
> annoyance is extremely high -- remember, all of this has to be done from
> within the installer environment (referring to "Emergency Shell"), which
> on FreeBSD lacks an incredible amount of usability, and is even worse to
> deal with when doing a remote install via PXE/serial. Fixit is the only
> decent environment. Given that floppies are more or less gone, I don't
> understand why the Fixit environment doesn't replace the "Emergency
> Shell". </rant>
>
Not that it necessarily helps in a PXE environment, but a memstick of
9-CURRENT has helped me recover minor "oops" situations a few times over
the past few months. What are these "floppies" you speak of, again? :)
Regards,
--
Glen Barber