SSD errors

Karl Denninger karl at denninger.net
Sun Apr 16 14:31:56 UTC 2017


On 4/16/2017 03:49, Frank Leonhardt wrote:
> On 13/04/2017 21:59, heasley wrote:
>> <snip>
>> When I push a lot of data to them, such as an rsync, I receive errors
>> like the below.  If I move drives between slots, it seems to follow the
>> chassis slots, those closest to the power supply, but I'm not positive
>> about this.
>>
>> I suppose the questions for list are:
>> - have I missed any fbsd ssd-specific configuration?
>>
>> - all 4 have non-zero UDMA_CRC_Error_Count counters; not many, about the
>>    same number, which I believe implies electrical interference - most
>>    likely in the cable or chassis backplane.  Should I buy some specific
>>    model cable?  other recommendations?
> <snip>
>
> I'm not aware of any SSD-specific stuff you've missed. The SSD option
> on the initialisation code in the BIOS is probably just there because
> there's no need to wait for spin-up time (as you probably thought too).
>
> So I don't have an answer, but here are a few thoughts:
>
> I think it's the CRC error (out of that lot) that you should be
> worried about. It means that the drive wrote data, but when it read it
> back it didn't match. With ST506 this could be (and often was) a cable
> fault but not with IDE. This doesn't mean dodgy cables can't cause you
> problems with IDE; only that they'd manifest differently. If the drive
> wrote the data to the flash with a CRC and then the CRC didn't match
> later, it doesn't make any difference if the data was corrupted on
> its way to the drive, or even if it was corrupted on its way back
> (ZFS would pick that up). So it must have been corrupted on-drive.
> Right? (I could be wrong about where your CRC errors are being
> tested/detected, so not necessarily right).
>
> So with this in mind, why should the drive's location on the shelf
> matter (if it does make a difference)? I can think of two reasons -
> electromagnetic interference from adjacent circuits or PSU problems.
>
> So if it were me, I'd check the interference theory by using longer
> cables and spreading the drives out. Serial transfer on long cables
> isn't really a problem like it was with parallel. That's the easy check.
>
> Then it's on to PSU issues. Does an SSD use more or less power than
> spinning rust? Really? Most people assume they'll use less but it's
> not as much less as you think, and it varies in different ways. If the
> PSU can't cope with the peak (e.g. while it's writing), you may see
> errors like these.
>
> IT people will know all about watts. Add up the number of watts on all
> your drives and if it's <= the number of watts written on your PSU,
> cushty.
>
> Wrong! An engineer will tell you you can't add watts together and get
> anything meaningful. And believing the label on a PSU is a mug's game.
> So, if you've got a decent oscilloscope take a look at the supply
> rails where they enter the drives. Try writing, and if you get so much
> as a blip on the voltage then do something about it.
>
> If you haven't got a 'scope to hand, I'd try running (some of) the
> drives off a different PSU and see if that makes a difference.
>
> Although I haven't hit this problem myself, I'd be surprised if the
> same PSU design intended to power spinning rust at a relatively
> constant current could cope well with an SSD going from nothing much
> to lots to nothing much again over a very short space of time. If I
> was connecting a different PSU to the SSD I'd load it with some real
> drives just to stabilise the current output a bit (i.e. plug an old
> drive or two on to some of the other spare outlets).
>
> Then there's always the chance it's over-cooking, but I think you'd
> have mentioned if they were getting very hot.
>
> Regards, Frank.

Flaky power has been the cause of more intermittent and very odd
problems, especially under load, than you can count.  I always get
suspicious of power issues when the system seems fine right up until you
place it under heavy load, then bad things happen -- and I'm usually right.
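
One way to tie the errors to load is to record each drive's
UDMA_CRC_Error_Count before and after a heavy write and compare the two
readings. What follows is a minimal sketch, not a tested tool: it assumes
smartmontools is installed, that the attribute table has the usual ten
columns with the raw value last, and that /dev/ada0 is a stand-in for
your own device name. The sample text is invented for illustration.

```python
import subprocess

ATTRIBUTE = "UDMA_CRC_Error_Count"

def read_attributes(device):
    """Return the output of `smartctl -A <device>` (needs root)."""
    # smartctl reports status bits in its exit code, so a non-zero
    # exit is not treated as a failure here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout

def crc_error_count(smart_text):
    """Return the raw CRC error count, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last.
        if len(fields) >= 10 and fields[1] == ATTRIBUTE:
            return int(fields[9])
    return None

def crc_delta(before_text, after_text):
    """Return how many CRC errors were added between two readings."""
    before = crc_error_count(before_text)
    after = crc_error_count(after_text)
    if before is None or after is None:
        return None
    return after - before

# Invented sample output, standing in for two real readings.
TEMPLATE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE"
    "      UPDATED  WHEN_FAILED RAW_VALUE\n"
    "  9 Power_On_Hours          0x0032   099   099   000    Old_age"
    "   Always       -       1234\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age"
    "   Always       -       {crc}\n"
)
SAMPLE_BEFORE = TEMPLATE.format(crc=12)
SAMPLE_AFTER = TEMPLATE.format(crc=19)

print(crc_delta(SAMPLE_BEFORE, SAMPLE_AFTER))  # 7
```

On a live system the two readings would come from
read_attributes("/dev/ada0") taken before and after the rsync. A counter
that only climbs under load, and only in certain slots, points at the
cabling, backplane or supply for those slots rather than at the drive.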

I second Frank's suggestion.
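
The point about not simply adding up watts can be illustrated with a
per-rail budget: a supply can have plenty of total wattage and still be
short on one rail. The sketch below sums peak current per rail and flags
any rail loaded past a chosen fraction of its rating. Every figure in it
is hypothetical, as is the 80% headroom factor; real numbers have to
come from the drive datasheets and the PSU's per-rail label, and only a
'scope on the rails will show what actually happens during a write
burst.

```python
from collections import defaultdict

def rail_load(devices):
    """Sum peak current draw (amps) per supply rail."""
    totals = defaultdict(float)
    for dev in devices:
        for rail, amps in dev["peak_amps"].items():
            totals[rail] += amps
    return dict(totals)

def overloaded_rails(devices, rail_limits, headroom=0.8):
    """Return rails whose summed peak draw exceeds headroom * rating."""
    load = rail_load(devices)
    return sorted(
        rail for rail, amps in load.items()
        if amps > headroom * rail_limits.get(rail, 0.0)
    )

# Hypothetical example: four SSDs drawing only from the 5V rail and
# two spinning drives drawing from both 5V and 12V.
devices = [
    {"name": "ssd0", "peak_amps": {"5V": 1.5}},
    {"name": "ssd1", "peak_amps": {"5V": 1.5}},
    {"name": "ssd2", "peak_amps": {"5V": 1.5}},
    {"name": "ssd3", "peak_amps": {"5V": 1.5}},
    {"name": "hdd0", "peak_amps": {"5V": 0.7, "12V": 2.0}},
    {"name": "hdd1", "peak_amps": {"5V": 0.7, "12V": 2.0}},
]
rail_limits = {"5V": 8.0, "12V": 18.0}  # hypothetical label ratings

print(rail_load(devices))
print(overloaded_rails(devices, rail_limits))
```

With these made-up numbers the 12V rail is barely used while the 5V
rail is past the 80% mark, even though the total wattage is small.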

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/