ZFS: How to enable cache and logs.

Daniel Kalchev daniel at digsys.bg
Thu May 12 10:16:56 UTC 2011



On 12.05.11 11:34, Jeremy Chadwick wrote:
>
> I guess this is why others have mentioned the importance of BBUs and
> supercaps, but I don't know what guarantee there is that during a power
> failure there won't be some degree of filesystem corruption or lost
> data.
You can think of the SLOG as the BBU of ZFS.

The best SLOG, of course, is battery-backed RAM -- which is exactly what the BBUs are. 
Any battery-backed RAM device used as a SLOG will beat, by a large margin, any SSD, 
however expensive.
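
For illustration (the pool and device names here are made up, not taken from the test 
below), a dedicated log device is attached to an existing pool with zpool add, 
preferably mirrored so a single failing log device cannot hurt you:

# zpool add tank log mirror gpt/slog0 gpt/slog1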

Fear of corruption, besides performance, is what makes people use SLC 
flash for SLOG devices. MLC flash is much more prone to errors than SLC 
flash, and that includes situations like power loss.
This is also the reason people talk so much about supercapacitors.

>
>> How can TRIM support ever influence reading from the drive?!
> I guess you want more proof, so here you go.
Of course :)
> I imagine the reason this happens is similar to why memory performance
> degrades under fragmentation or when there's a lot of "middle-man stuff"
> going on.
TRIM does not change fragmentation.
All TRIM does is erase the flash cells in the background, so that when a 
new write request arrives the data can simply be written instead of 
erased and then written. The erase operation is slow in flash memory.
Think of TRIM as OS-assisted garbage collection. It is nothing else -- 
no matter what the advertising says :)
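
On FreeBSD you can at least check whether a drive advertises TRIM at all, 
for example for an SSD that shows up as ada0 on an AHCI/CAM controller 
(the device name is just an example):

# camcontrol identify ada0 | grep -i trim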

Also, please note that there is no "fragmentation" in either the SLOG or 
the L2ARC to be concerned with. There are no "files" there -- just raw 
blocks that can sit anywhere.
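
And since the subject mentions cache as well as logs: an L2ARC device is 
added the same way, as just another raw-block consumer (names made up again):

# zpool add tank cache gpt/cache0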

>> TRIM is a slow operation. How often are these issued?
> Good questions, for which I have no answer.  The same could be asked of
> any OS however, not just Windows.  And I've asked the same question
> about SSDs internal "garbage collection" too.  I have no answers, so you
> and I are both wondering the same question.  And yes, I am aware TRIM is
> a costly operation.
Well, at least we know that some commodity SSDs on the market have "lazy" 
garbage collection, while others do it right away. The "lazy" drives give 
good performance initially, but slow down later once dirty blocks pile up 
and have to be erased in the write path.

Jeremy, thanks for the detailed data.

So much for theory :)

Just a quick "(slow) HDD as SLOG" test, not very scientific :)

Hardware: Supermicro X8DTH-6F (integrated LSI2008)
2xE5620 Xeons
24 GB RAM
6x Hitachi HDS72303 drives

All disks are labeled with GPT, with the first partition starting at the 1 GB mark.
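
I did not include the gpart commands; to get such a layout you would do 
something like the following (da0 and the disk00 label are just examples -- 
repeat for each of the six drives):

# gpart create -s gpt da0
# gpart add -t freebsd-zfs -b 2097152 -l disk00 da0

Here -b 2097152 starts the partition at 1 GB (counted in 512-byte sectors), 
-l makes it show up as /dev/gpt/disk00, and with no -s it takes the rest of 
the disk.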

First, create an ashift=12 raidz2 zpool with all six drives (the 4K-sector gnop provider is what forces ashift=12):
# gnop create -S 4096 gpt/disk00
# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05
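
The .nop provider is only needed while the pool is being created. A rough 
sketch of checking the resulting ashift and dropping the gnop device 
afterwards (not part of the test commands):

# zdb | grep ashift
# zpool export storage
# gnop destroy gpt/disk00.nop
# zpool import storage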

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   126  99 293971  93 177423  52   357  99 502710  86 234.2   8
Latency             68881us    2817ms    5388ms   37301us    1266ms     471ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25801  90 +++++ +++ 23915  94 25869  98 +++++ +++ 24858  97
Latency             12098us     117us     141us   24121us      29us      66us
1.96,1.96,a1.register.bg,1,1305158675,48G,,126,99,293971,93,177423,52,357,99,502710,86,234.2,8,16,,,,,25801,90,+++++,+++,23915,94,25869,98,+++++,+++,24858,97,68881us,2817ms,5388ms,37301us,1266ms,471ms,12098us,117us,141us,24121us,29us,66us

Recreate the pool with 5 drives, plus one drive as a SLOG:

# zpool destroy storage
# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 log gpt/disk05
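
To verify the layout, zpool status lists the dedicated log drive under its 
own "logs" heading:

# zpool status storage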

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   110  99 306932  68 223853  46   354  99 664034  65 501.8  11
Latency               172ms   11571ms    4217ms   50414us    1895ms     245ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 24673  97 +++++ +++ 24262  98 19108  97 +++++ +++ 23821  97
Latency             12051us     132us     143us   23392us      47us      79us
1.96,1.96,a1.register.bg,1,1305171999,48G,,110,99,306932,68,223853,46,354,99,664034,65,501.8,11,16,,,,,24673,97,+++++,+++,24262,98,19108,97,+++++,+++,23821,97,172ms,11571ms,4217ms,50414us,1895ms,245ms,12051us,132us,143us,23392us,47us,79us


It is interesting to note that "zpool iostat -v 1" never showed more than 
128K in use on the SLOG drive, even though from time to time it was hitting 
over 1200 IOPS and over 150 MB/s of writes.

Also, note that the second pool has one disk less. For comparison, here is the 
same pool layout with 5 disks and no SLOG:

# zpool create storage raidz2 gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   118  99 287361  92 152566  40   345  98 398392  51 242.4  24
Latency             56962us    2619ms    4308ms   57304us    1214ms     350ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 27438  95 +++++ +++ 19374  90 25259  97 +++++ +++  6876  99
Latency              8913us     200us     295us   27249us      30us     238us
1.96,1.96,a1.register.bg,1,1305165435,48G,,118,99,287361,92,152566,40,345,98,398392,51,242.4,24,16,,,,,27438,95,+++++,+++,19374,90,25259,97,+++++,+++,6876,99,56962us,2619ms,4308ms,57304us,1214ms,350ms,8913us,200us,295us,27249us,30us,238us



One side effect of using a SLOG that I forgot to mention is less 
fragmentation in the pool. When the ZIL lives in the main pool, it is 
frequently written and freed, and because the ZIL is variable in size 
this leaves undesired gaps behind.

Hope this helps.

Daniel

