ZFS: How to enable cache and logs.

Jason Hellenthal jhell at DataIX.net
Thu May 12 01:48:56 UTC 2011


Jeremy, as always the quality of your messages is 101% spot on. I 
always find some new information that comes in handy more often than I 
could say, and there is always something to be learned. 

Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > 
> > Jeremy,
> > 
> > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > >should also keep that in mind when putting an SSD into use in this
> > > > >fashion.
> > > >
> > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > would handle that write load without TRIM and without any performance
> > > > degradation.
> > > >
> > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > and the need for rewriting will be small. If you don't need to
> > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > twice or more the advertised size and always write to fresh cells,
> > > > scheduling a background erase of the 'overwritten' cell.
> > > 
> > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > much, as I'm certain it varies per vendor.
> > > 
> > 
> > Let's not forget here: the size of the separate log device may be quite 
> > small. A rule of thumb is that you should size the separate log to be able 
> > to handle 10 seconds of your expected synchronous write workload. It would 
> > be rare to need more than 100 MB in a separate log device, but the 
> > separate log must be at least 64 MB.
> > 
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > 
> > So, in other words, how effective is TRIM really, given the above?
> > 
> > Even with a high database write load on the disks at full capacity of the 
> > incoming link I would find it hard to believe that anyone could get the 
> > ZIL to even come close to 512MB.
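
A minimal sketch of the sizing rule of thumb quoted above (about 10
seconds of expected synchronous writes, never below the 64 MB minimum).
The function name, the MiB unit, and the example write rates are
assumptions made for illustration; they are not taken from the tuning
guide.

```python
MIB = 1024 * 1024  # assumption: treat the guide's "MB" as MiB


def slog_size_bytes(sync_write_bytes_per_sec, seconds=10, minimum=64 * MIB):
    """Size a separate log for `seconds` of sync writes, clamped to the minimum."""
    return max(int(sync_write_bytes_per_sec * seconds), minimum)


def fraction_of_device(slog_bytes, device_bytes):
    """Share of the whole device that the log would actually occupy."""
    return slog_bytes / device_bytes


# Hypothetical workloads: 5 MiB/s and 10 MiB/s of synchronous writes.
for rate in (5 * MIB, 10 * MIB):
    size = slog_size_bytes(rate)
    share = fraction_of_device(size, 32 * 1024 * MIB)  # a 32 GiB SSD
    print(f"{rate // MIB} MiB/s -> {size // MIB} MiB log, {share:.2%} of the disk")
```

With figures like these the log occupies well under 1% of a 32 GiB
disk, which is the point about justifying the cost of the device.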
> 
> In the case of an SSD being used as a log device (ZIL), I imagine it
> would only matter the longer the drive was kept in use.  I do not use
> log devices anywhere with ZFS, so I can't really comment.
> 
> In the case of an SSD being used as a cache device (L2ARC), I imagine it
> would matter much more.
> 
> In the case of an SSD being used as a pool device, it matters greatly.
> 
> Why it matters: there are two methods of "reclaiming" blocks which were
> used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> reclaimed, it has to be erased -- SSDs erase things in pages rather
> than individual LBAs.  With TRIM, you submit the data management command
> via ATA with a list of LBAs you wish to inform the drive are no longer
> used.  The drive aggregates the LBA ranges, determines if an entire
> flash page can be erased, and does it.  If it can't, it makes some sort
> of mental note that the individual LBA (in some particular page)
> shouldn't be used.
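
A toy model of the aggregation step described above, to make the idea
concrete. It is not how any particular firmware works: the function
names and the fixed number of LBAs per erasable unit (what the text
calls a flash page) are assumptions chosen purely for illustration.

```python
def merge_ranges(ranges):
    """Coalesce (start_lba, count) pairs into sorted, non-overlapping [start, end) spans."""
    spans = sorted((start, start + count) for start, count in ranges if count > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous span.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def erasable_units(ranges, lbas_per_unit):
    """Return indices of erasable units that the trimmed LBAs cover completely."""
    units = []
    for start, end in merge_ranges(ranges):
        first = -(-start // lbas_per_unit)  # ceiling division: first unit starting in the span
        last = end // lbas_per_unit         # one past the last unit ending in the span
        units.extend(range(first, last))
    return units


# Two adjacent 4-LBA ranges fill unit 0; the stray 2-LBA range fills nothing.
print(erasable_units([(0, 4), (4, 4), (10, 2)], lbas_per_unit=8))
```

LBAs that are trimmed but do not complete a unit (10 and 11 here) are
the ones the drive would merely note as unused.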
> 
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)
> 
> Anyway, how all this manifests itself performance-wise is intriguing.
> It's not speculation: there's hard evidence that not using TRIM results
> in SSD performance, bluntly put, sucking badly on some SSDs.
> 
> There's this mentality that wear levelling completely solves all of the
> **performance** concerns -- that isn't the case at all.  In fact, I'm
> under the impression it probably hurts performance, but it depends on
> how it's implemented within the drive firmware.
> 
> bit-tech did an experiment using Windows 7 -- which supports and uses
> TRIM assuming the device advertises the capability -- with different
> models of SSDs.  The testing procedure is documented here, but I'll
> document it as well:
> 
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> 
> Again, remember, this is done on a Windows 7 system which does support
> TRIM if the device supports it.  The testing steps, in this order:
> 
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS
>    (cluster size unknown), copied 100GB of data to the drive, deleted all
>    the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
> 
> Without TRIM, some drives drop their read performance by more than 50%,
> and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> by the way.  I do not care for OCZ or Corsair products.
> 
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> on FreeBSD will mimic (to some degree).
> 
> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.
> 
> Can you refute any of this evidence?
> 

At least at the moment, no. But I can say that, however widely SSDs were 
used by OpenSolaris users before the Oracle takeover, I don't recall 
seeing any relevant bug reports on degradation. Like I said, I haven't 
seen them, but that's not to say there wasn't simply a lack of use. 
Definitely more to look into, test, benchmark & test again.

> > Given that most SSDs come in sizes greater than 32GB, I hope this comes as an 
> > early reminder that the ZIL you are buying that disk for is only going to 
> > be using a small percent of that disk and I hope you justify cost over its 
> > actual use. If you do happen to justify creating a ZIL for your pool then 
> > I hope that you partition it wisely to make use of the rest of the space 
> > that is untouched.
> > 
> > For all other cases I would recommend, if you still want to have a ZIL, that 
> > you take some sort of PCI->SD CARD or USB stick into account with 
> > mirroring.
> 
> Others have pointed out this isn't effective (re: USB sticks).  The read
> and write speeds are too slow, and limit the overall performance of ZFS
> in a very bad way.  I can absolutely confirm this claim (I've tested it
> myself, using a high-end USB flash drive as a cache device (L2ARC)).
> 
> Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> *does* improve performance on older systems which have slower disk I/O
> (e.g. ICH5-based systems).
> 

Agreed. As soon as the bus and write speeds are greater than what USB 2.0 
can handle, any USB-based solution is useless. ICH5 and up is right about 
where you would see this starting to happen.

Mileage with SD cards and CF cards may vary depending on the transfer 
rates. But the same situation still applies, like you said: once your 
main pool throughput outweighs the throughput of your ZIL, it's probably 
not worth even having a ZIL or a cache device. Emphasis on cache more so 
than ZIL.
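
A rough sketch of that heuristic, comparing throughput only (it ignores
latency, which also matters for a log device). The function name and
the pool figures are hypothetical; the 480 Mbit/s figure is the USB 2.0
high-speed signalling rate, which real flash sticks fall well short of.

```python
USB2_RAW_MB_PER_SEC = 480 / 8  # USB 2.0 signalling rate expressed in MB/s


def aux_device_worthwhile(pool_mb_per_sec, device_mb_per_sec):
    """An auxiliary device that is no faster than the pool is not worth adding."""
    return device_mb_per_sec > pool_mb_per_sec


# Hypothetical pools: an older, slow disk subsystem and a faster modern one.
print(aux_device_worthwhile(pool_mb_per_sec=25, device_mb_per_sec=USB2_RAW_MB_PER_SEC))
print(aux_device_worthwhile(pool_mb_per_sec=200, device_mb_per_sec=USB2_RAW_MB_PER_SEC))
```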


Anyway, all good information for those deciding whether they need a 
cache or a ZIL.


Thanks again Jeremy. Always appreciated.

-- 

 Regards, (jhell)
 Jason Hellenthal
