60 second stall resuming Thinkpad T23 on 8.x - regression

Ian Smith smithi at nimnet.asn.au
Sat Dec 4 19:13:39 UTC 2010


Hi,

With 6.x and 7.0-R through 7.2-STABLE of about 11 months ago, my T23 
always suspended and resumed reliably every time; I chose it for that.
I'm now running another one of these since then; both behave the same.

Since 8.0-RELEASE it stalls for 60 seconds on resuming from S3 suspend, 
every time, consistently.  I first reported this almost a year ago, and 
nothing substantial has changed through 8.1-STABLE, now 8.2-PRERELEASE 
#0: Fri Dec 3 21:43:58 EST 2010 i386 GENERIC.

Apart from some extra points noted below, I don't want to rewrite this:
http://lists.freebsd.org/pipermail/freebsd-acpi/2009-December/006192.html

Nate Lawson said it looked like ata timing out.  While I can't discount 
that, and gave up trying to sus out the ata code, I'm hoping someone can 
either confirm that, or suggest another cause.  Currently I'm leaning to 
it being an fxp issue, due to symptoms described below, but I'm feeling 
rather exasperated and clueless despite hunting through lots of code.

Verbose dmesg: http://smithi.id.au/dmesg.boot.verbose.8.2-P_0

Here's an annotated verbose dmesg after that of the first suspend/resume 
cycle, with as close as I can get to describing the timings involved:

acpi_button0: sleep button pressed
acpi_lid0: wake_prep enabled for \\_SB_.LID_ (S3)
acpi_button0: wake_prep enabled for \\_SB_.SLPB (S3)

Above are the last messaged appearing on VT0 when that's in view.

vga0: saving 68 bytes of video state
vga0: saving color palette
pci0:1:0:0: Transition from D0 to D3

Video turns off.

fxp0: link state changed to DOWN
pci0:2:8:0: Transition from D0 to D3

That's fxp0 switching off.  There's a possible issue around this:

fxp0: <Intel 82801CAM (ICH3) Pro/100 VE Ethernet> port 0x6400-0x643f
  mem 0xc0200000-0xc0200fff irq 11 at device 8.0 on pci2
fxp0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xc0200000
fxp0: using memory space register mapping
fxp0: PCI IDs: 8086 1031 1014 0209 0042
fxp0: Dynamic Standby mode is disabled   <<<<<<<<<<<<<<<<<
miibus0: <MII bus> on fxp0
inphy0: <i82562ET 10/100 media interface> PHY 1 on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
fxp0: bpf attached
fxp0: Ethernet address: 00:02:8a:36:08:29
fxp0: [MPSAFE]
fxp0: [ITHREAD]

where it's not clear whether fxp0 is powered down or not - more later.  

Also note that fxp0 going DOWN before suspend and UP after resume is new 
in the last month; before then this didn't happen .. I'd actually tried 
ifconfig fxp0 down in rc.suspend and ifconfig fxp0 up in rc.resume quite
recently, but it made no difference to the bad resume behaviour.

ct_to_ts([2010-12-04 00:35:27]) = 1291422927.000000000

The clock is written to the RTC.  I've tried with that disabled too.

======== acpi_printcpu() debug dump ========
gdt[0097:c0e5a320] idt[07ff:c0e62980] ldt[0050] tr[0048] efl[00000006]
eax[0141e000] ebx[00000000] ecx[c141e000] edx[0141e000]
esi[c419bb00] edi[00000202] ebp[df7a2b18] esp[df7a2af8]
cr0[8005003b] cr2[28162840] cr3[0141e000] cr4[000006d1]
cs[0020] ds[0028] es[0028] fs[0008] gs[001b] ss[0028]

Before suspend, then

======== acpi_printcpu() debug dump ========
gdt[0097:c0e5a320] idt[07ff:c0e62980] ldt[0050] tr[0048] efl[00000002]
eax[c48f0801] ebx[00000000] ecx[00000004] edx[c48f0870]
esi[c419bb00] edi[00000202] ebp[df7a2b18] esp[df7a2af8]
cr0[8005003b] cr2[28162840] cr3[0141e000] cr4[000006d1]
cs[0020] ds[0028] es[0028] fs[0008] gs[001b] ss[0028]

after resume.  I have sysctl debug.acpi.resume_beep=1 so the speaker 
starts a steady rather piercing beep here, which continues for the full 
next minute, until (apparently) the ct_to_ts message below, where I 
gather the RTC is read to restore the clock (albeit up to a second off)

Now none of the following messages appear on VT0 until they all do, but 
on 8.0-R these t_delta messages used to appear as they happened, 15 or 
perhaps 16? seconds apart.  I'd still like to know what these really 
mean, and what may be causing the problem they seem to be indicating?

acpi_lid0: run_prep cleaned up for \\_SB_.LID_
acpi_button0: run_prep cleaned up for \\_SB_.SLPB
pci0:1:0:0: Transition from D3 to D0

Straight after pressing the resume button, the screen comes on.  Apart 
from the ultrabay CD light flashing (just BIOS init, I think) none of 
the keyboard or HD LEDs flash until the 60 second stall completes; if 
ata is stalling, it's not indicating even trying to access the HD, and 
there's no difference with my newer 120GB drive or its original 30GB.

t_delta 15.fbb95ad165139480 too short
t_delta 15.fbb94cc1cd800000 too short
t_delta 16.09c9679193616a00 too long

Occasionally only two of these appear, though usually three.  Either way 
the nothing-happening stall seems to last exactly 60 seconds.  If those 
do indicate 16 second intervals, then the one following the third must 
be 12 seconds; as these no longer appear on the console, I can't check.

So after that 60 seconds, the above and the following messages all 
appear at once, and the resume beep also quitss at/near this point:

ct_to_ts([2010-12-04 00:36:31]) = 1291422991.000000000
wakeup from sleeping state (slept 00:01:04)

Here I'd pressed resume immediately, so that's the 3 second sleep in 
rc.suspend, plus the 60 second stall, plus the one second all this takes 
anyway, eg when faking it with debug.acpi.suspend_bounce=1 (see below)

At this point the keyboard LEDs flash, then immediately after the HD LED 
flashes, then all the above plus these messages below appear:

ata0: reiniting channel ..
ata0: reset tp1 mask=03 ostat0=50 ostat1=00
ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata0: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting UDMA100
ata0: reinit done ..

Then there's a 1 or 2 second delay as the CD is tasted, then:

ata1: reiniting channel ..
ata1: reset tp1 mask=03 ostat0=00 ostat1=00
ata1: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata1: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
ata1: reset tp2 stat0=00 stat1=00 devices=0x10000
acd0: setting UDMA33
ata1: reinit done ..
atkbd: the current kbd controller command byte 0047
atkbd: keyboard ID 0x54ab (2)
kbdc: RESET_KBD return code:00fa
kbdc: RESET_KBD status:00aa
kbdc: TEST_AUX_PORT status:0000
kbdc: RESET_AUX return code:00fa
kbdc: RESET_AUX status:00aa
kbdc: RESET_AUX ID:0000
fxp0: link state changed to UP
battery0: battery initialization start
battery0: battery initialization done, tried 1 times

Again, fxp0 going UP here is new.  On boot it actually comes up a bit 
late, /var/log/console.log now shows fxp0 as inactive where before it 
showed as active 100 fdx, though it does come up soon afterwards, and 
doesn't seem to be an issue - ntpdate runs once on boot and that's ok, 
as reported by the final ts_to_ct on the verbose dmesg.

Re fxp, a couple of points: firstly WoL is enabled in BIOS, showing as:

fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2009<RXCSUM,VLAN_MTU,WOL_MAGIC>
        ether 00:02:8a:36:08:29
        inet 192.168.7.7 netmask 0xffffff00 broadcast 192.168.7.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

but attempts to wake it up with net/wakeonlan, both via the switch and 
directly with a cross-over cable, are ignored.  Earlier I tried having 
WoL disabled in BIOS, no difference.  According to if_fxp.c:

        /*
         * Enable workarounds for certain chip revision deficiencies.
         *
         * Systems based on the ICH2/ICH2-M chip from Intel, and possibly
         * some systems based a normal 82559 design, have a defect where
         * the chip can cause a PCI protocol violation if it receives
         * a CU_RESUME command when it is entering the IDLE state.  The
         * workaround is to disable Dynamic Standby Mode, so the chip never
         * deasserts CLKRUN#, and always remains in an active state.

which sounds a bit disturbing, re fxp power consumption while sleeping?

         *
         * See Intel 82801BA/82801BAM Specification Update, Errata #30.
         */
        if ((sc->ident->ich >= 2 && sc->ident->ich <= 3) ||
            (sc->ident->ich == 0 && sc->revision >= FXP_REV_82559_A0)) {
                fxp_read_eeprom(sc, &data, 10, 1);
                if (data & 0x02) {                      /* STB enable */

It's ICH3, and sc->revision >= FXP_REV_82559_A0 is also true, but that 
bit is already off in eeprom so 'fxp0: Dynamic Standby mode is disabled'

        if (bootverbose) {
                device_printf(dev, "PCI IDs: %04x %04x %04x %04x %04x\n",
                    pci_get_vendor(dev), pci_get_device(dev),
                    pci_get_subvendor(dev), pci_get_subdevice(dev),
                    pci_get_revid(dev));
                fxp_read_eeprom(sc, &data, 10, 1);
                device_printf(dev, "Dynamic Standby mode is %s\n",
                    data & 0x02 ? "enabled" : "disabled");
        }

Not that I really know what's going on with this, but I did notice that 
when debug.acpi.suspend_bounce=1, apart from not getting down to the 
suspend/resume 'acpi_printcpu()' messages, and no stall, the difference 
in dmesg is only that on the resume side we see:

pci0:1:0:0: Transition from D0 to D3
fxp0: link state changed to DOWN
pci0:2:8:0: Transition from D0 to D3
ct_to_ts([2010-12-05 05:17:08]) = 1291526228.000000000
acpi_lid0: wake_prep enabled for \\_SB_.LID_ (S3)
acpi_button0: wake_prep enabled for \\_SB_.SLPB (S3)
pci0:1:0:0: Transition from D3 to D0

pci0:2:8:0: Transition from D3 to D0     <<<<<<<<<<<<<<<<<<<<<<<<<<

ct_to_ts([2010-12-05 05:17:09]) = 1291526229.000000000
wakeup from sleeping state (slept 00:00:01)
ata0: reiniting channel ..
[..]

ie fxp0 reports 'pci0:2:8:0: Transition from D3 to D0' which does NOT 
appear after a 'real' resume.  I don't know it that's significant, in 
fact I don't know if I can trust my nose to lead anywhere useful ..

Apologies for the long-winded description, but I wanted to be thorough. 
Any clues welcome, I'm desperate enough to try anything, before I have 
to abandon 8.x, install 7.4-PRE and get on with some real work including 
travel, really requiring 100% reliable suspend/resume, without the nasty 
clock / uptime / process time resets that can occur now, especially when 
running X - all the above is on a bare system not running anything much.

cheers, Ian


More information about the freebsd-stable mailing list