ZFS: How to enable cache and logs.
Daniel Kalchev
daniel at digsys.bg
Thu May 12 10:16:56 UTC 2011
On 12.05.11 11:34, Jeremy Chadwick wrote:
>
> I guess this is why others have mentioned the importance of BBUs and
> supercaps, but I don't know what guarantee there is that during a power
> failure there won't be some degree of filesystem corruption or lost
> data.
You can think of the SLOG as the BBU of ZFS.
The best SLOG is, of course, battery-backed RAM, which is exactly what a BBU is.
Any battery-backed RAM device used as a SLOG will beat (by a large
margin) any SSD, however expensive.
Fear of corruption, besides performance, is what makes people use SLC
flash for SLOG devices. MLC flash is much more prone to errors than
SLC flash, including in situations like power loss.
This is also the reason people talk so much about super-capacitors.
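For reference, a dedicated log device is attached with the `log` vdev
keyword, and mirroring it is common practice, since losing an unmirrored
SLOG can be painful on older pool versions. A rough sketch (the labels
gpt/slog0 and gpt/slog1 are placeholders, not from this setup):

```shell
# Attach a mirrored pair of fast (battery-backed RAM or SLC) devices
# as the pool's SLOG. Device labels here are hypothetical examples.
zpool add storage log mirror gpt/slog0 gpt/slog1

# Confirm the log vdev shows up in the pool layout.
zpool status storage
```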
>
>> How can ever TRIM support influence reading from the drive?!
> I guess you want more proof, so here you go.
Of course :)
> I imagine the reason this happens is similar to why memory performance
> degrades under fragmentation or when there's a lot of "middle-man stuff"
> going on.
TRIM does not change fragmentation.
All TRIM does is erase flash cells in the background, so that when a
new write request arrives, the data can simply be written, instead of
erased and then written. The erase operation is slow in flash memory.
Think of TRIM as OS-assisted garbage collection. It is nothing else --
no matter what the advertising says :)
Also, please note that there is no "fragmentation" in either the SLOG or
the L2ARC to be concerned with. There are no "files" there -- just raw
blocks that can sit anywhere.
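As an aside, on FreeBSD you can at least check whether a SATA device
advertises TRIM support at all; a rough sketch (ada0 is a placeholder
device, and the exact output wording may vary by release):

```shell
# Dump the drive's ATA identify data and look for the
# Data Set Management (TRIM) capability line.
camcontrol identify ada0 | grep -i trim
```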
>> TRIM is a slow operation. How often are these issued?
> Good questions, for which I have no answer. The same could be asked of
> any OS however, not just Windows. And I've asked the same question
> about SSDs internal "garbage collection" too. I have no answers, so you
> and I are both wondering the same question. And yes, I am aware TRIM is
> a costly operation.
Well, at least we know some commodity SSDs on the market have "lazy"
garbage collection, while some do it right away. The "lazy" drives give
good performance initially, at the cost of slowing down later.
Jeremy, thanks for the detailed data.
So much for theory :)
Just a quick "(slow) HDD as SLOG" test, not very scientific :)
Hardware: Supermicro X8DTH-6F (integrated LSI2008)
2xE5620 Xeons
24 GB RAM
6x Hitachi HDS72303 drives
All disks are labeled with GPT, first partition at 1 GB.
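The GPT labels could have been created along these lines (the device
name da0 and the label string are assumptions for illustration, not
taken from the actual setup; sizes and offsets depend on the layout):

```shell
# Put a GPT scheme on the raw disk and add a labeled partition,
# so the disk shows up as /dev/gpt/disk00 for zpool to use.
gpart create -s gpt da0
gpart add -t freebsd-zfs -l disk00 da0
```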
First, create an ashift=12 raidz2 zpool with all six drives:
# gnop create -S 4096 gpt/disk00
# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05
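The gnop provider presents the first label as a 4096-byte-sector device,
and ZFS sizes the vdev's ashift to the largest sector size it sees. That
the pool really came out as ashift=12 can be checked with zdb (a sketch;
output formatting varies between versions):

```shell
# Read the cached pool configuration and show the recorded ashift.
# ashift: 12 means 4 KB alignment; ashift: 9 would mean 512-byte.
zdb -C storage | grep ashift
```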
$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   126  99 293971  93 177423  52   357  99 502710  86 234.2   8
Latency             68881us    2817ms    5388ms   37301us    1266ms     471ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25801  90 +++++ +++ 23915  94 25869  98 +++++ +++ 24858  97
Latency             12098us     117us     141us   24121us      29us      66us
1.96,1.96,a1.register.bg,1,1305158675,48G,,126,99,293971,93,177423,52,357,99,502710,86,234.2,8,16,,,,,25801,90,+++++,+++,23915,94,25869,98,+++++,+++,24858,97,68881us,2817ms,5388ms,37301us,1266ms,471ms,12098us,117us,141us,24121us,29us,66us
Recreate the pool with 5 drives + one drive as SLOG
# zpool destroy storage
# zpool destroy storage
# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 log gpt/disk05
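For what it's worth, a SLOG does not require recreating the pool; on
pool versions that support log device removal (version 19 and later),
a log vdev can be added to and removed from a live pool. A sketch:

```shell
# Add a log device to an existing pool...
zpool add storage log gpt/disk05

# ...and detach it again later (needs pool version 19 or newer).
zpool remove storage gpt/disk05
```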
$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   110  99 306932  68 223853  46   354  99 664034  65 501.8  11
Latency               172ms   11571ms    4217ms   50414us    1895ms     245ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 24673  97 +++++ +++ 24262  98 19108  97 +++++ +++ 23821  97
Latency             12051us     132us     143us   23392us      47us      79us
1.96,1.96,a1.register.bg,1,1305171999,48G,,110,99,306932,68,223853,46,354,99,664034,65,501.8,11,16,,,,,24673,97,+++++,+++,24262,98,19108,97,+++++,+++,23821,97,172ms,11571ms,4217ms,50414us,1895ms,245ms,12051us,132us,143us,23392us,47us,79us
It is interesting to note that
zpool iostat -v 1
never showed more than 128K of usage on the SLOG drive, although from
time to time it was hitting over 1200 IOPS and over 150 MB/s of writes.
Also, the second pool has one disk fewer. For comparison, here is the
same pool with 5 disks and no SLOG:
# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04
$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   118  99 287361  92 152566  40   345  98 398392  51 242.4  24
Latency             56962us    2619ms    4308ms   57304us    1214ms     350ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 27438  95 +++++ +++ 19374  90 25259  97 +++++ +++  6876  99
Latency              8913us     200us     295us   27249us      30us     238us
1.96,1.96,a1.register.bg,1,1305165435,48G,,118,99,287361,92,152566,40,345,98,398392,51,242.4,24,16,,,,,27438,95,+++++,+++,19374,90,25259,97,+++++,+++,6876,99,56962us,2619ms,4308ms,57304us,1214ms,350ms,8913us,200us,295us,27249us,30us,238us
One side effect of using a SLOG that I forgot to mention is less
fragmentation in the pool. When the ZIL lives in the main pool, it is
frequently written and freed, and since the ZIL is variable in size,
this leaves undesirable gaps.
Hope this helps.
Daniel