[ATA] and re(4) stability issues
Victor Balada Diaz
victor at bsdes.net
Thu Dec 11 01:50:23 PST 2008
On Thu, Dec 11, 2008 at 06:00:56PM +0900, Pyun YongHyeon wrote:
> On Thu, Dec 11, 2008 at 09:10:45AM +0100, Victor Balada Diaz wrote:
> > On Thu, Dec 11, 2008 at 08:57:07AM +0100, Victor Balada Diaz wrote:
> > > On Wed, Dec 10, 2008 at 09:07:19PM +0900, Pyun YongHyeon wrote:
> > > > On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
> > > > > Also, I didn't see any problem with interfaces going up and down,
> > > > > but that usually happens after some hours of uptime, so I'll let
> > > > > you know if the error happens again.
> > > > >
> > >
> > > After writing to the HD with dd for a few hours and running
> > > stress -i 10 -d 10, the machine lost connectivity. I waited until
> > > today to be sure whether the machine hung, panicked or just lost
> > > network connectivity. I don't have local or serial access, so this
> > > is the only way I could do it. During the night I saw several of
> > > these messages in the logs:
> > >
> > >
> > > Dec 10 00:33:49 yac kernel: re0: watchdog timeout
> > > Dec 10 00:33:49 yac kernel: re0: link state changed to DOWN
> > > Dec 10 00:33:52 yac kernel: re0: link state changed to UP
> > >
> > > The interface never recovered and I wasn't able to ping the machine
> > > until I rebooted. Nagios was checking the whole time and saw no
> > > recovery.
> > >
> > > The netstat -i output in the daily scripts shows just one Oerrs.
> > > I'm used to seeing a lot of them, but it seems this time the card
> > > didn't recover from the only one. I also want to say that this is
> > > not a regression, as it happened before with 7.1-BETA2 code.
> > >
> > > Is there anything more I can try?
> > Sorry, it's too early in the morning and I thought today was the 10th
> > instead of the 11th. I don't even know what day it is.
> > Looking at today's log I see no link state changed messages,
> > but I do see these other messages, which started at more or
> > less the same time I lost connectivity to the server:
> > Dec 10 18:20:32 yac kernel: re0: link state changed to DOWN
> > Dec 10 18:20:32 yac kernel: re0: PHY read failed
> I've reverted r185756, which caused GMII access issues on some
> controllers. If you are brave enough to try beta code, you can
> get the latest re(4) at the following URL. Note that I don't have
> PCIe based RealTek controllers, so the code was not tested at all.
I've recompiled the kernel with the first file in sys/dev/re/
and the second one in sys/pci/. I'm still testing with MSI enabled.
So far I've rebooted using nextboot(8) (so that I could boot the
old kernel again if I lost the network card) and the card seems to
work, but I'll continue stress testing the machine with stress + dd +
iperf to see if I can take it down. I'll let you know how it goes.
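
To keep track of the re0 messages during the stress run, something
like the following could tally them from the system log. This is only
a rough sketch and not part of the driver or of the tests above: the
function name summarize and the regular expression are my own
assumptions, modelled on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the lines quoted in this thread, e.g.
#   Dec 10 00:33:49 yac kernel: re0: watchdog timeout
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>re\d+):\s+(?P<event>.+?)\s*$"
)


def summarize(lines):
    """Count (interface, event) pairs among kernel messages from re(4)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("ifname"), match.group("event"))] += 1
    return counts


# Sample input copied from the messages reported earlier in the thread.
sample = [
    "Dec 10 00:33:49 yac kernel: re0: watchdog timeout",
    "Dec 10 00:33:49 yac kernel: re0: link state changed to DOWN",
    "Dec 10 00:33:52 yac kernel: re0: link state changed to UP",
    "Dec 10 18:20:32 yac kernel: re0: PHY read failed",
]

for (ifname, event), n in summarize(sample).most_common():
    print(f"{n:3d}  {ifname}: {event}")
```

To run it on a real log, pass an open file object to summarize()
instead of the sample list; lines that are not re(4) kernel messages
are simply ignored.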
The most reliable proof that intelligent life exists on other
planets is that they have not tried to contact us.