em driver, 82574L chip, and possibly ASPM

Mike Tancsa mike at sentex.net
Tue Nov 23 14:45:05 UTC 2010


On 11/23/2010 8:16 AM, Ivan Voras wrote:
> On 11/23/10 14:03, Mike Tancsa wrote:
>> On 11/23/2010 7:47 AM, Ivan Voras wrote:
>>> It looks like I'm unfortunate enough to have to deploy on a machine
>>> which has the 82574L Intel NIC chip on a Supermicro X8SIE-F board, which
>>> apparently has hardware issues, according to this thread:
>>>
>>> http://sourceforge.net/tracker/index.php?func=detail&aid=2908463&group_id=42302&atid=447449
>>>
>>>
>>>
>>
>> Interesting, this is the same nic that has been giving me grief! Mine is
>> on an Intel server board (S3420GPX). The symptoms are VERY similar to
>> what the LINUX user sees as well with RX errors and the traffic patterns.
> 
> I've posted detailed info on this NIC in the thread "em card wedging" -
> can you compare it with yours?
> 
> The whole thing looks very sensitive to BIOS settings. I've just toggled
> something that looked unrelated (don't remember what, I've been toggling
> BIOS settings all day) and the machine has been doing a flood-ping for
> 20 minutes without wedging (which doesn't mean it won't wedge as soon as
> I send this message, it did such things before).


I posted what's in the BIOS at

http://www.tancsa.com/82574.html

Unfortunately, if I disable the highlighted BIOS option, I can no longer
netboot the box :(  For my production box having the issues, this is not
a problem, but it makes testing on my lab box difficult.  I am
not sure whether that even really disables IPMI.  Also, on this box what's
labelled NIC1 and NIC2 is the opposite of what FreeBSD sees as em0 and em1.

So far I have tried

Driver from HEAD -- this seems to help a bit, in that wedges are less frequent
Disabling MSI-X -- no difference, still hangs

It seems the NIC will get one error and never recover. There will just
be a steady stream of them.  On the other onboard NIC (a different type
of em), the card will see the odd "no_buff" error, but it recovers like
all the other em NICs, whereas on this problem NIC the error counters
just keep going up and up. Using the driver from HEAD, I can do an
ifconfig em1 down;sleep 1;ifconfig em1 up and that fixes the problem

dev.em.1.mac_stats.missed_packets: 1292
dev.em.1.mac_stats.recv_no_buff: 31

whereas previous versions of the driver would panic the box when doing that.
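
The difference between "the odd error, then recovery" and "errors that
just keep going up" could be checked mechanically from successive
readings of a counter such as dev.em.1.mac_stats.missed_packets. The
following is only a rough sketch of that idea: the function name
looks_wedged and the rule "the counter rose in every interval" are
assumptions made for illustration, not anything the em driver provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical heuristic: treat the NIC as wedged if the error counter
 * increased in every interval between consecutive samples.
 * A healthy NIC is expected to show at least one flat interval.
 * Returns 1 if the samples look wedged, 0 otherwise.
 */
int
looks_wedged(const uint64_t *samples, size_t n)
{
	size_t i;

	assert(samples != NULL || n == 0);

	/* Fewer than two samples means no interval to judge. */
	if (n < 2)
		return (0);

	for (i = 1; i < n; i++) {
		/* A flat (or reset) interval means the NIC recovered. */
		if (samples[i] <= samples[i - 1])
			return (0);
	}
	return (1);
}
```

The samples would have to be collected separately, for example by
reading the sysctl once every few seconds; how many samples are enough
before bouncing the interface is a judgment call.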

Looking at the driver from HEAD, there does seem to be some mention of
ASPM. Is this what the Linux driver is doing too?



        /* PCI-Ex Control Registers */
        switch (hw->mac.type) {
        case e1000_82574:
        case e1000_82583:
                reg = E1000_READ_REG(hw, E1000_GCR);
                reg |= (1 << 22);
                E1000_WRITE_REG(hw, E1000_GCR, reg);

                /*
                 * Workaround for hardware errata.
                 * apply workaround for hardware errata documented in errata
                 * docs Fixes issue where some error prone or unreliable PCIe
                 * completions are occurring, particularly with ASPM enabled.
                 * Without fix, issue can cause tx timeouts.
                 */
                reg = E1000_READ_REG(hw, E1000_GCR2);
                reg |= 1;
                E1000_WRITE_REG(hw, E1000_GCR2, reg);
                break;
        default:
                break;
        }

        return;
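
For anyone reading along, both register updates in that excerpt are
plain read-modify-write operations: read the register, OR in one bit,
write it back. Here is a minimal stand-alone sketch of the same pattern.
It uses a made-up struct in place of real hardware, so fake_regs,
GCR_BIT22, GCR2_BIT0 and apply_82574_pcie_workaround are hypothetical
names, and it says nothing about what the bits mean beyond the comment
in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the device registers. The real driver
 * reaches the hardware through E1000_READ_REG / E1000_WRITE_REG.
 */
struct fake_regs {
	uint32_t gcr;	/* stands in for E1000_GCR */
	uint32_t gcr2;	/* stands in for E1000_GCR2 */
};

#define GCR_BIT22	(UINT32_C(1) << 22)	/* same bit as (1 << 22) */
#define GCR2_BIT0	UINT32_C(1)		/* same bit as reg |= 1 */

/* Set the two bits while leaving every other bit untouched. */
void
apply_82574_pcie_workaround(struct fake_regs *r)
{
	uint32_t reg;

	assert(r != NULL);

	reg = r->gcr;		/* read */
	reg |= GCR_BIT22;	/* modify */
	r->gcr = reg;		/* write back */

	reg = r->gcr2;
	reg |= GCR2_BIT0;
	r->gcr2 = reg;
}
```

Because the update only ORs bits in, applying it twice leaves the
registers in the same state as applying it once.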




	---Mike

