Questions about erasing an ssd to restore performance under FreeBSD

Steven Hartland killing at multiplay.co.uk
Thu Jul 28 13:22:19 UTC 2011


----- Original Message ----- 
From: "Jeremy Chadwick" <freebsd at jdc.parodius.com>
> Well, on FreeBSD /dev/urandom is a symlink to /dev/random.  I've
> discussed in the past why I use /dev/urandom instead of /dev/random (I
> happen to work in a heterogeneous OS environment at work, where urandom
> and random are different things).
> 
> I was mainly curious why you were using if=/some/actual/file rather than
> if=/dev/urandom directly.  'tis okay, not of much importance.

/dev/urandom seems to bottleneck at ~60MB/s, whereas a cached file generated
from it doesn't, e.g.
dd if=/dev/random of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 16.152686 secs (64916509 bytes/sec)

dd if=/dev/random of=/data/test bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 16.178811 secs (64811685 bytes/sec)

dd if=/data/test of=/dev/null bs=1m
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 0.240348 secs (4362738865 bytes/sec)
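
For anyone wanting to repeat the comparison, here is a rough sh sketch of
the same idea: stage the random data in a file once, then feed that file to
dd so the random device is not the limiting factor. The function names
(make_random_file, mib_per_sec) are made up for illustration, and the block
size is given in bytes only so that it behaves the same with other dd
implementations.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any FreeBSD tool.

# make_random_file PATH MIB
# Write MIB mebibytes of random data to PATH. /dev/urandom is used because,
# as noted in the thread, it is a symlink to /dev/random on FreeBSD.
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
}

# mib_per_sec BYTES SECONDS
# Convert the "bytes transferred in N secs" figures that dd prints into MiB/s.
mib_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}
```

With the numbers above, "mib_per_sec 1048576000 16.152686" gives 61.9 for
the random device and "mib_per_sec 1048576000 0.240348" gives 4160.6 for the
cached file, so the staged file is clearly the better input for write tests.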

> Okay, so it sounds like what happened -- if I understand correctly -- is
> that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of
> data copied to it.  It still had 60% free space available.  After, the
> SSD performance for writes really plummeted (~20MByte/sec), but reads
> were still decent.  Performing an actual ATA-level secure erase brought
> the drive back to normal write performance (~190MByte/sec).

Yes this is correct.

> If all of that is correct, then I would say the issue is that the
> internal GC on the Corsair SSD in question sucks.  With 60% of the drive
> still available, performance should not have dropped to such an abysmal
> rate; the FTL and wear levelling should have, ideally, dealt with this
> just fine.  But it didn't.

Agreed

> Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever,
> that's an engineering discussion for elsewhere) lacks TRIM.  The
> underlying filesystem is therefore unable to tell the drive "hey, these
> LBAs aren't used any more, you can consider them free and perform a NAND
> page erase when an entire NAND page is unused".  The FTL has to track
> all LBAs you've written to, otherwise if erasing a NAND page which still
> had used data in it (for the filesystem) it would result in loss of
> data.
> 
> So in summary I'm not too surprised by this situation happening, but I
> *AM* surprised at just how horrible writes became for you.  The white
> paper I linked you goes over this to some degree -- it talks about how
> everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks
> or talks about how horrible they perform when very little free space is
> available, or if the GC is badly implemented.  Maybe Corsair's GC is
> badly implemented -- I don't know.

Agreed again. We've now seen a few disks drop to this level of performance.
At first we thought the disk was failing, as newfs -E didn't fix it even
though the man page indicates it should. That seems to be explained now: it
only works if the device is ada, not da, and even then it's not quite as good
as a secure erase.
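
A small guard along these lines would have saved us some confusion. It is
only a sketch based on what we observed in this thread (the erase took
effect on ada devices and not on da devices), not on anything documented,
and can_newfs_erase is a made-up name.

```shell
#!/bin/sh
# Sketch only: encodes the observation from this thread that newfs -E was
# effective on ada(4) devices but not on da(4) devices. Not a documented rule.

# can_newfs_erase DEVICE
# Succeed (exit 0) if DEVICE looks like an ada device, e.g. ada0 or /dev/ada1p2.
can_newfs_erase() {
    case "${1#/dev/}" in
        ada[0-9]*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

One could call it before relying on the erase, for example
"can_newfs_erase /dev/ada1 || echo 'newfs -E may have no effect here'".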

> I would see if there are any F/W updates for that model of drive.  The
> firmware controls the GC model/method.  Otherwise, if this issue is
> reproducible, I'll add this model of Corsair SSD to my list of drives to
> avoid.

It's the latest firmware version; I've already checked that. The performance
has been good until now, and I suspect it could be a generic SandForce
issue if it is a firmware problem.

> Is it possible to accomplish Secure Erase via "camcontrol cmd" with
> ada(4)?  Yes, but the procedure will be extremely painful, drawn out,
> and very error-prone.
> 
> Given that you've followed the procedure on the Linux hdparm/ATA Secure
> Erase web page, you're aware of the security and "locked" status one has
> to deal with using password-protection to accomplish the erase.  hdparm
> makes this easy because it's just a bunch of command-line flags; the
> "heavy lifting" on the ATA layer is done elsewhere.  With "camcontrol
> cmd", you get to submit the raw ATA CDB yourself, multiple times, at
> different phases.  Just how familiar with the ATA protocol are you?  :-)
> 
> Why I sound paranoid: a typo could potentially "brick" your drive.  If
> you issue a set-password on the drive, ***ALL*** LBA accesses (read and
> write) return I/O errors from that point forward.  Make a typo in the
> password, formulate the CDB wrong, whatever -- suddenly you have a drive
> that you can't access or use any more because the password was wrong,
> etc...  If the user doesn't truly understand what they're doing
> (including the formulation of the CDB), then they're going to panic.
> 
> camcontrol and atacontrol could both be modified to do the heavy
> lifting, making similar options/arguments that would mimic hdparm in
> operation.  This would greatly diminish the risks, but the *EXACT
> PROCEDURE* would need to be explained in the man page.  But keep reading
> for why that may not be enough.
> 
> I've been in the situation where I've gone through the procedure you
> followed on said web page, only to run into a quirk with the ATA/IDE
> subsystem on Windows XP, requiring a power-cycle of the system.  The
> secure erase finished, but I was panicking when I saw the drive spitting
> out I/O errors on every LBA.  I realised that I needed to unlock the
> drive using --security-unlock then disable security by using
> --security-disable.  Once I did that it was fine.  The web page omits
> that part, which matters in an emergency or when anomalies are witnessed.  This
> ordeal happened to me today, no joke, while tinkering with my new Intel
> 510 SSD.  So here's a better page:
> 
> http://tinyapps.org/docs/wipe_drives_hdparm.html
> 
> Why am I pointing this out?  Because, in effect, an entire "HOW TO DO
> THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be
> added to camcontrol/atacontrol to ensure people don't end up with
> "bricked" drives and blame FreeBSD.  Trust me, it will happen.  Give
> users tools to shoot themselves in the foot and they will do so.
> 
> Furthermore, SCSI drives (which is what camcontrol has historically been
> for up until recently) have a completely different secure erase CDB
> command for them.  ATA has SECURITY ERASE UNIT, SCSI has SECURITY
> INITIALIZE -- and in the SCSI realm, this feature is optional!  So
> there's that error-prone issue as well.  Do you know how many times I've
> issued "camcontrol inquiry" instead of "camcontrol identify" on my
> ada(4)-based systems?  Too many.  Food for thought.  :-)
> 
> Anyway, this is probably the only time you will ever find me saying
> this, but: if improving camcontrol/atacontrol to accomplish the above is
> what you want, patches are welcome.  I could try to spend some time on
> this if there is great interest in the community for such (I'm more
> familiar with atacontrol's code given my SMART work in the past), and I
> do have an unused Intel 320-series SSD which I can test with.

This is definitely of interest here, and I suspect to the rest of the
community as well. I'm not at all familiar with ATA codes etc., so I
expect it would take me ages to come up with this.

In our case SSDs are a must, as HDs don't have the IOPS to deal with
our application; we'll just need to manage the write speed drop-offs.

Performing offline maintenance to keep them running at good speed is
not ideal, but it is much easier and more acceptable than booting another OS,
which would be a total PITA: some machines don't have IPMI with virtual
media, so it would mean remote hands etc.

Using a Backup -> Erase -> Restore direct from BSD would hence be my
preferred workaround until TRIM support is added, but I guess that could
well be some time for ZFS.
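
Roughly, the workflow I have in mind would look like the dry-run sketch
below. It only prints the commands rather than running them. The pool name,
device name and backup path are placeholders, print_erase_plan is a made-up
name, and the erase step is left as a comment because, as discussed above,
there is no easy native way to issue it yet.

```shell
#!/bin/sh
# Sketch only: prints a Backup -> Erase -> Restore plan; it runs nothing.
# Pool, device and backup path are supplied by the caller and are assumptions.

# print_erase_plan POOL DEVICE BACKUP_FILE
print_erase_plan() {
    pool=$1
    dev=$2
    backup=$3
    snap="$pool@pre-erase"
    cat <<EOF
zfs snapshot -r $snap
zfs send -R $snap > $backup
zpool destroy $pool
# ERASE STEP: ATA secure erase of /dev/$dev goes here (tooling still to be written)
zpool create $pool $dev
zfs receive -F $pool < $backup
EOF
}
```

For example, "print_erase_plan ssd ada1 /backup/ssd.zfs" prints the six
steps for a pool called ssd; each one should be checked by hand before being
run against a real disk.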

    Regards
    Steve




More information about the freebsd-fs mailing list