kern/56339: select() call (poll() too) hangs, yet call works perfectly (no hang) under gdb

Burton M. Strauss III Burton at ntopsupport.com
Tue Sep 2 16:30:21 PDT 2003


>Number:         56339
>Category:       kern
>Synopsis:       select() call (poll() too) hangs, yet call works perfectly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 02 16:30:17 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Burton M. Strauss III
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
private citizen
>Environment:
System: FreeBSD owl.gateway.2wire.net 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Thu Apr 3 10:53:38 GMT 2003 root at freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/GENERIC i386

>Description:
        A normal multi-threaded user program hangs in a select() call.
        Changing the call to poll() makes no difference; it still hangs.
        Under gdb, the program works perfectly.

        Here is the code:

            while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
                traceEvent(CONST_TRACE_INFO, "DEBUG: Select(ing) %d....", topSock);
                memcpy(&mask, &mask_copy, sizeof(fd_set));
                rc = select(topSock+1, &mask, 0, 0, NULL /* Infinite */);
                traceEvent(CONST_TRACE_INFO, "DEBUG: select returned: %d", rc);
                if(rc > 0) {
                    handleSingleWebConnection(&mask);
                }
            }

        (traceEvent becomes a call to syslog.)
        The log shows the message written just before select(), but never
        the one after it: the call does not return.

        This is true even if I change the timeout from infinite to, say,
        10 seconds.
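
        For comparison, here is a minimal standalone sketch of a timed
        select() wait. It is not taken from ntop: wait_readable() is a
        hypothetical helper written for illustration only. It rebuilds the
        fd_set and the timeval before every call, because select() is
        allowed to modify both, and it keeps the three outcomes (ready,
        timed out, error) distinct.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ntop): wait up to `secs` seconds for
 * `fd` to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set by select()).
 */
static int wait_readable(int fd, long secs)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    /* fd_set can only hold descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    do {
        /* select() may overwrite the set and the timeout: rebuild both. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = secs;
        tv.tv_usec = 0;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);  /* interrupted: retry with a full timeout */

    if (rc > 0)
        return FD_ISSET(fd, &rfds) ? 1 : 0;
    return rc;  /* 0 = timed out, -1 = error */
}
```

        Called as wait_readable(topSock, 10) in a small single-threaded
        test program, a return value of 0 after roughly ten seconds would
        suggest that the timeout itself is honoured outside the threaded
        application.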

        Same behavior is seen on FreeBSD 5.1.

        Same behavior if you convert select() to poll() (example below):
        the call never returns.

              while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
                  for (i=0; i<pollFdsCount; i++) {
                      pollFds[i].revents = 0;
                  }
                  traceEvent(CONST_TRACE_INFO, "DEBUG: poll(%p, %d, 10000)",
                             (void *)&pollFds[0], pollFdsCount);
                  rc = poll(&pollFds[0], pollFdsCount, 10000);
                  traceEvent(CONST_TRACE_INFO, "DEBUG: poll returned: %d", rc);
          ...
              }
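
        The same check can be sketched for poll(). Again this is an
        illustration under assumptions, not ntop code: poll_readable() is
        a hypothetical helper that polls a single descriptor for POLLIN
        with a millisecond timeout and hands back both poll()'s return
        value and the reported revents.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ntop): poll one descriptor for
 * readability. Returns whatever poll() returns (>0 ready, 0 timed out,
 * -1 error) and stores the reported events in *revents_out.
 */
static int poll_readable(int fd, int timeout_ms, short *revents_out)
{
    struct pollfd pfd;
    int rc;

    assert(revents_out != NULL);

    pfd.fd = fd;
    pfd.events = POLLIN;   /* only interested in "data or connection pending" */
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);

    /* revents is only meaningful when poll() reports ready descriptors. */
    *revents_out = (rc > 0) ? pfd.revents : 0;
    return rc;
}
```

        With a 10000 ms timeout, as in the loop above, a return value of 0
        means the timeout expired with nothing pending; a call that never
        returns at all is the behavior this report describes.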

          # netstat -a

        Shows the socket is in LISTEN state:

          Active Internet connections (including servers)
          Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
          tcp4       0      0  *.3000                 *.*                    LISTEN

        The application IS running (its CPU time increases between the two
        ps runs):

          12:36 owl [FreeBSD 4.8] user=root pwd=~ # ps -U ntop
            PID  TT  STAT      TIME COMMAND
          61359  ??  Ss     0:02.06 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5

          12:36 owl [FreeBSD 4.8] user=root pwd=~ # ps -U ntop
            PID  TT  STAT      TIME COMMAND
          61359  ??  Ss     0:02.32 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5

        Attaching gdb to the running program and listing the threads gives:

          (gdb) info thread
            8 process 60269, thread 8  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            7 process 60269, thread 7  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            6 process 60269, thread 6  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            5 process 60269, thread 5  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            4 process 60269, thread 4  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            3 process 60269, thread 3  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
            2 process 60269, thread 2  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          * 1 process 60269, thread 1  0x2830358c in __sys_read () from /usr/lib/libc_r.so.4

        The stack of each thread:

          (gdb) thread 1   --- this is the libpcap thread
          (gdb) info stack
          #0  0x2830358c in __sys_read () from /usr/lib/libc_r.so.4
          #1  0x282ff9c8 in _read () from /usr/lib/libc_r.so.4
          #2  0x282ffa22 in read () from /usr/lib/libc_r.so.4
          #3  0x28549de2 in pcap_read () from /usr/lib/libpcap.so.2
          #4  0x2854997f in pcap_dispatch () from /usr/lib/libpcap.so.2
          #5  0x280fbb19 in pcapDispatch (_i=0x0) at ntop.c:81
          #6  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #7  0x0 in ?? ()

          (gdb) thread 2   -- this is the hung thread
          [Switching to thread 2 (process 60269, thread 2)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
          #2  0x282c5050 in _poll () from /usr/lib/libc_r.so.4
          #3  0x282c50ae in poll () from /usr/lib/libc_r.so.4
          #4  0x280ccb8b in handleWebConnections (notUsed=0x0) at webInterface.c:5351
          #5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #6  0x0 in ?? ()

        -- threads 3, 4, 5 and 7 periodically wake up, do their processing
           and sleep

          (gdb) thread 3
          [Switching to thread 3 (process 60269, thread 3)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
          #2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
          #3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
          #4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
          #5  0x28110251 in ntop_sleep (secs=13) at util.c:2950
          #6  0x2877469f in rrdMainLoop (notUsed=0x0) at rrdPlugin.c:1412
          #7  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #8  0x0 in ?? ()

          (gdb) thread 4
          [Switching to thread 4 (process 60269, thread 4)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
          #2  0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
          #3  0x2810db62 in waitCondvar (condvarId=0x2811d4e4) at util.c:1415
          #4  0x280f0587 in dequeueAddress (notUsed=0x0) at address.c:546
          #5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #6  0x0 in ?? ()

          (gdb) thread 5
          [Switching to thread 5 (process 60269, thread 5)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
          #2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
          #3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
          #4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
          #5  0x28110251 in ntop_sleep (secs=60) at util.c:2950
          #6  0x280fcb50 in scanIdleLoop (notUsed=0x0) at ntop.c:592
          #7  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #8  0x0 in ?? ()

          (gdb) thread 6
          [Switching to thread 6 (process 60269, thread 6)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
          #2  0x28304585 in pthread_cond_timedwait () from /usr/lib/libc_r.so.4
          #3  0x282ebee1 in _thread_gc () from /usr/lib/libc_r.so.4
          #4  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #5  0x0 in ?? ()

          (gdb) thread 7
          [Switching to thread 7 (process 60269, thread 7)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
          #2  0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
          #3  0x2810db62 in waitCondvar (condvarId=0x2811d4d8) at util.c:1415
          #4  0x28101ce9 in dequeuePacket (notUsed=0x0) at pbuf.c:1694
          #5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
          #6  0x0 in ?? ()

          (gdb) thread 8   -- this is main(): it wakes up, checks whether the
                              children are busy, and goes back to sleep.
          [Switching to thread 8 (process 60269, thread 8)]
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          (gdb) info stack
          #0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
          #1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
          #2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
          #3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
          #4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
          #5  0x28110251 in ntop_sleep (secs=10) at util.c:2950
          #6  0x804ce6a in main (argc=8, argv=0xbfbffb34) at main.c:1186
          #7  0x804abf2 in _start ()
          (gdb)

>How-To-Repeat:
        No specific steps - it happens in our code every time.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
 X-send-pr-version: 3.113
 X-GNATS-Notify:
 
 (no hang) under gdb

