Machine wedges solid after one serial-port source-line addition...

Barry Bouwsma freebsd-misuser at remove-NOSPAM-to-reply.NOSPAM.netscum.dk
Mon Sep 15 14:30:42 PDT 2003


[NOTE:  IPv6-only e-mail above, so you probably want to drop me from
 the recipients and just send to the list, which I'll read later, as
 I'm not always online -- or else remove just the hostname part to reveal
 an IPv4-aware e-mail for me that may well time out and bounce.  Sorry.]


Hello gurus and the like;

In the process of trying to enhance my FreeBSD kernel's PPS and related
NTP timekeeping ability, I discovered I could reliably wedge my machine
(two different machines, actually) solid, such that I couldn't break into
the kernel debugger and the NumLock key wouldn't toggle the LED, and only
hitting the reset/power switch could return me to sanity.

Thinking it was a problem with the logic of my added code, I pruned things
and found that a single printf() line is enough to hang my machine within
a few minutes of boot -- provided, of course, that a PPS source (radio
clock) is connected to the serial port, toggling the DCD line every second
and triggering the printf().

I'd been stuck with STABLE-09.Dec.2002 for a while, but the same thing
seems to happen as well with a RELENG_4 kernel as of a week or so ago --
at least with my hardware.

Would anyone care to explain why the following simple patch could be
enough to wedge my machine solid?  (My original hack-patches without
any console printf() debuggery did the same thing within seconds, as
well...)  All it does is notify the console whenever a serial port DCD
PPS signal transition is detected, as follows (patch against 4.foo; I
haven't tried this with 5.bar or later -- also, this is not a real patch,
as I've included context and snipped my comments):

--- /usr/local/system/src/sys/isa/sio.c Tue Sep  2 08:57:19 2003
+++ /usr/local/source-hacks/sys/isa/sio.c       Tue Sep  2 18:55:31 2003
[...]
@@ -1999,21 +2015,56 @@

        while (!com->gone) {
                if (com->pps.ppsparam.mode & PPS_CAPTUREBOTH) {
                        modem_status = inb(com->modem_status_port);
                        if ((modem_status ^ com->last_modem_status) & MSR_DCD) {
                                tc = timecounter;
                                count = tc->tc_get_timecount(tc);
                                pps_event(&com->pps, tc, count,
                                    (modem_status & MSR_DCD) ?
                                    PPS_CAPTUREASSERT : PPS_CAPTURECLEAR);
+                               printf("DCD status change\n");
                        }
                }
                line_status = inb(com->line_status_port);
[...]



I'd be grateful for enlightenment.  I'd successfully added other code
to record timestamps of other modem lines in addition to DCD
(TIOCDCDTIMESTAMP), but any attempt to do anything with code comparable to
the above invariably resulted in a wedge within seconds to hours, from
which keyboard debugger entry was ineffective.

Also note that added debuggery reveals the solid wedge doesn't happen
anywhere in the suspect section of code that I sprinkled with printf()s,
but I haven't done enough debuggery to narrow down where it does happen.

I'm wondering if it's something really blindingly obvious that I should
be but am not aware of, or something I gotta work on to track down.


Thanks,
Barry Bouwsma



More information about the freebsd-hackers mailing list