SSD errors
Frank Leonhardt
freebsd-doc at fjl.co.uk
Sun Apr 16 09:03:25 UTC 2017
On 13/04/2017 21:59, heasley wrote:
> <snip>
> When I push a lot of data to them, such as an rsync, I receive errors like
> the below. If I move drives between slots, it seems to follow the chassis
> slots, those closest to the power supply, but I'm not positive about this.
>
> I suppose the questions for list are:
> - have I missed any fbsd ssd-specific configuration?
>
> - all 4 have non-zero UDMA_CRC_Error_Count counters; not many, about the
> same number, which I believe implies electrical interference - most
> likely in the cable or chassis backplane. Should I buy some specific
> model cable? other recommendations?
<snip>
I'm not aware of any SSD-specific stuff you've missed. The SSD option on
the initialisation code in the BIOS is probably just there because
there's no need to wait for spin-up time (as you probably thought too).
So I don't have an answer, but here are a few thoughts:
I think it's the CRC error (out of that lot) that you should be worried
about. It means that the drive wrote data, but when it read it back it
didn't match. With ST506 this could be (and often was) a cable fault but
not with IDE. This doesn't mean dodgy cables can't cause you problems
with IDE; only that they'd manifest differently. If the drive wrote the
data to the flash with a CRC and then the CRC didn't match later, it
doesn't make any difference if the data was corrupted on its way to the
drive, or even if it was corrupted on its way back (ZFS would pick that
up). So it must have been corrupted on-drive. Right? (I could be wrong
about where your CRC errors are being tested/detected, so not
necessarily right).
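One way to narrow down when the errors occur is to read the counter before and after a heavy transfer and compare. A minimal sketch, assuming the usual ten-column attribute table printed by `smartctl -A` (the sample text and the function name below are made up for illustration, not taken from the drives in question):

```python
from typing import Optional

# Made-up sample in the usual `smartctl -A` table layout (an assumption).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       17
"""


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw UDMA_CRC_Error_Count value, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

Feed it the saved output from before and after the rsync; if the raw value climbs only under load, that at least tells you when the errors are being counted.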
So with this in mind, why should the drive's location on the shelf
matter (if it does make a difference)? I can think of two reasons -
electromagnetic interference from adjacent circuits or PSU problems.
So if it were me, I'd check the interference theory by using longer
cables and spreading the drives out. Serial transfer on long cables
isn't really a problem like it was with parallel. That's the easy check.
Then it's on to PSU issues. Does an SSD use more or less power than
spinning rust? Really? Most people assume they'll use less but it's not
as much less as you think, and it varies in different ways. If the PSU
can't cope with the peak (e.g. while it's writing), that could be your
problem.
IT people will know all about watts. Add up the number of watts on all
your drives and if it's <= the number of watts written on your PSU, cushty.
Wrong! An engineer will tell you you can't add watts together and get
anything meaningful. And believing the label on a PSU is a mug's game.
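One possible reading of the "can't add watts" point is that a PSU has a separate current limit on each rail, so a total that looks fine can still hide an overloaded rail. A rough sketch of checking each rail on its own; every figure here is hypothetical (not from any datasheet or from the system in question), and it assumes the SSDs draw only from the 5V rail:

```python
def rail_headroom(drives, rail_limits_amps):
    """Return {rail_volts: spare_amps} after summing peak current per rail.

    drives: list of {rail_volts: peak_amps} dicts, one per drive.
    rail_limits_amps: {rail_volts: max_amps} as rated for the PSU.
    A negative result means that rail is overloaded at peak.
    """
    totals = {}
    for drive in drives:
        for rail, amps in drive.items():
            totals[rail] = totals.get(rail, 0.0) + amps
    return {rail: limit - totals.get(rail, 0.0)
            for rail, limit in rail_limits_amps.items()}


# Hypothetical numbers: four SSDs peaking at 1.5 A each on the 5 V rail,
# and a PSU rated for 5 A at 5 V and 20 A at 12 V.
ssds = [{5: 1.5}] * 4
psu = {5: 5.0, 12: 20.0}

if __name__ == "__main__":
    print(rail_headroom(ssds, psu))
```

With these made-up figures the drives total 30 W against a label that adds up to 265 W, yet the 5V rail is 1 A short at peak. It still says nothing about how quickly the supply responds to a sudden change in load, which is where the oscilloscope comes in.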
So, if you've got a decent oscilloscope take a look at the supply rails
where they enter the drives. Try writing, and if you get so much as a
blip on the voltage then do something about it.
If you haven't got a 'scope to hand, I'd try running (some of) the drives
off a different PSU and see if that makes a difference.
Although I haven't hit this problem myself, I'd be surprised if the same
PSU design intended to power spinning rust at a relatively constant
current could cope well with an SSD going from nothing much to lots to
nothing much again over a very short space of time. If I was connecting
a different PSU to the SSD I'd load it with some real drives just to
stabilise the current output a bit (i.e. plug an old drive or two on to
some of the other spare outlets).
Then there's always the chance it's over-cooking, but I think you'd have
mentioned if they were getting very hot.
Regards, Frank.
More information about the freebsd-hardware
mailing list