L2 cache errors???
Willem Jan Withagen
wjw at digiware.nl
Tue Jul 28 19:45:21 UTC 2015
On 28/07/2015 21:04, Josh Paetzel wrote:
>
>
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are?
>>>> Errors in the CPU L2 cache?
>>>>
>>>> Are they ECC corrected?
>>>> Or is the error really "data kaput"?
>>>>
>>>
>>>
>>> Could be. There is also an erratum that triggers these errors on
>>> certain CPUs when running software like VirtualBox. It was fixed in
>>> RELENG_10 some time ago. What are you running?
>>>
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>>
>>> has some details.
>>
>> 'mmm,
>> Not running Haswell stuff, but rather older hardware.
>>
>> Looked in older logfiles, and there are a few more...
>> All with the same data, except that it is detected on different CPUs.
>>
>> And it occurs when running:
>> mbuffer -4 -m 1000M -I 6666 | \
>> zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>>
>
> You can tell ECC corrected the error because on FreeBSD if ECC can't fix
> the error the system will panic. Other systems (Solaris and HP-UX being
> the two I have direct experience with) can detach subsystems that have
> sustained uncorrectable errors in some cases. (Yes, even CPUs!)
Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive'
that the data stream is invalid, and it then aborts.
So I'll have to do more digging to see what is up.
> If a system is generating hundreds or thousands of MCAs a minute you are
> dealing with a hardware issue.
>
> If you are getting spurious MCAs to the tune of a few a day, there's
> nothing abnormal or broken there; it's just the system doing what it's
> supposed to.
I never had them before, and now there have been about 6 this week,
and in the L2 cache at that.
So it got me worried.
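
For reference, the rule of thumb quoted above (bursts of hundreds of MCAs a minute versus a few a day) can be written down as a rough check. This is only a sketch under stated assumptions: the function names, the labels, and the threshold of 100 events per minute are hypothetical choices for illustration, not anything FreeBSD provides, and the timestamps are assumed to have been pulled out of the logs by hand.

```python
from datetime import datetime, timedelta


def mca_rate(timestamps):
    """Return (events_per_day, peak_events_within_one_minute)."""
    if not timestamps:
        return 0.0, 0
    ts = sorted(timestamps)
    # Treat anything shorter than a day as one day to avoid inflated rates.
    span_days = max((ts[-1] - ts[0]).total_seconds() / 86400.0, 1.0)
    per_day = len(ts) / span_days

    # Sliding window: largest number of events inside any 60-second window.
    peak = 0
    start = 0
    for end in range(len(ts)):
        while ts[end] - ts[start] >= timedelta(minutes=1):
            start += 1
        peak = max(peak, end - start + 1)
    return per_day, peak


def classify(timestamps, burst_threshold=100):
    """Label a series of MCA timestamps.

    burst_threshold is an assumed stand-in for "hundreds a minute".
    """
    _, peak = mca_rate(timestamps)
    return "burst" if peak >= burst_threshold else "occasional"
```

With six events spread over a week this returns "occasional"; it says nothing about whether the failing 'zfs receive' is related.
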
> Given the amount of data that flies around inside modern computers I'm
> surprised there aren't more MCAs than there are in most systems.
Perhaps not enough alpha particles hitting the cells. :)
Thanks,
--WjW