kern/56339: select() call (poll() too) hangs, yet call works perfectly (no hang) under gdb
Burton M. Strauss III
Burton at ntopsupport.com
Tue Sep 2 16:30:21 PDT 2003
>Number: 56339
>Category: kern
>Synopsis: select() call (poll() too) hangs, yet call works perfectly (no hang) under gdb
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Sep 02 16:30:17 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator: Burton M. Strauss III
>Release: FreeBSD 4.8-RELEASE i386
>Organization:
private citizen
>Environment:
System: FreeBSD owl.gateway.2wire.net 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Thu Apr 3 10:53:38 GMT 2003 root at freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/GENERIC i386
>Description:
A normal (multi-threaded) user program hangs in a select() call.
Changing the call to poll() does not help; it still hangs.
Under gdb, however, the program works perfectly.
Here is the code:
    while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
        traceEvent(CONST_TRACE_INFO, "DEBUG: Select(ing) %d....", topSock);
        /* select() overwrites mask, so restore it from the saved copy */
        memcpy(&mask, &mask_copy, sizeof(fd_set));
        rc = select(topSock+1, &mask, NULL, NULL, NULL /* Infinite */);
        traceEvent(CONST_TRACE_INFO, "DEBUG: select returned: %d", rc);
        if(rc > 0) {
            handleSingleWebConnection(&mask);
        }
    }
(traceEvent becomes a call to syslog).
The log shows the "Select(ing)" message written just before select(), but select() never returns.
This is true even if I change the timeout from infinite to, say, 10 seconds.
The same behavior is seen on FreeBSD 5.1.
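For reference, the pattern the select() loop above relies on can be sketched in isolation. select() overwrites the fd_set it is given, which is why the loop restores mask from mask_copy on every pass; when a finite timeout such as 10 seconds is used, the struct timeval should be reset on every pass as well, because some systems (Linux, for example) modify it. The helper name wait_readable() and its parameters are hypothetical, not part of ntop, and this is a sketch of the call pattern, not a fix for the hang.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_sec seconds for any descriptor
 * in *master (highest descriptor maxfd) to become readable. The ready set
 * is written to *ready. Returns select()'s result: >0 ready descriptors,
 * 0 on timeout, -1 on error. */
int wait_readable(const fd_set *master, int maxfd, long timeout_sec,
                  fd_set *ready)
{
    struct timeval tv;
    int rc;

    do {
        /* select() overwrites the set and, on some systems, the timeout,
         * so rebuild both from the saved values before every call. */
        *ready = *master;
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(maxfd + 1, ready, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);  /* restart if a signal interrupted */

    return rc;
}
```

A caller would FD_SET() each listening socket into master once, then call wait_readable(&master, topSock, 10, &ready) inside the loop and test FD_ISSET() on ready when the result is positive.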
Converting select() to poll() (example below) gives the same result: the call never returns.
    while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
        for (i=0; i<pollFdsCount; i++) {
            pollFds[i].revents = 0;
        }
        traceEvent(CONST_TRACE_INFO, "DEBUG: poll(%p, %d, 10000)",
                   (void *)&pollFds[0], pollFdsCount);
        rc = poll(&pollFds[0], pollFdsCount, 10000);  /* 10 s timeout */
        traceEvent(CONST_TRACE_INFO, "DEBUG: poll returned: %d", rc);
        ...
    }
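The poll() variant can be sketched the same way. The timeout argument is in milliseconds (the 10000 in the poll() loop above means 10 seconds), and a result of 0 means the timeout expired with nothing ready. The name wait_poll() is hypothetical; clearing revents is kept only to mirror the original loop, since poll() writes revents itself on return.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper mirroring the poll() loop: clear revents, wait up
 * to timeout_ms milliseconds, and restart if a signal interrupts the call.
 * Returns poll()'s result: >0 entries with events, 0 on timeout,
 * -1 on error. */
int wait_poll(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int rc;

    do {
        /* Matches the original loop; poll() fills in revents on return. */
        for (nfds_t i = 0; i < nfds; i++)
            fds[i].revents = 0;
        rc = poll(fds, nfds, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    return rc;
}
```

With a listening socket registered with events = POLLIN, a positive result followed by (revents & POLLIN) would indicate a pending connection.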
netstat -a shows that the socket is in LISTEN state:
# netstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  *.3000                 *.*                    LISTEN
The application IS running; its CPU TIME grows between the two samples:
12:36 owl [FreeBSD 4.8] user=root pwd=~ # ps -U ntop
  PID  TT  STAT      TIME COMMAND
61359  ??  Ss     0:02.06 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5
12:36 owl [FreeBSD 4.8] user=root pwd=~ # ps -U ntop
  PID  TT  STAT      TIME COMMAND
61359  ??  Ss     0:02.32 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5
If you attach gdb to the running program and check the various threads:
(gdb) info thread
  8 process 60269, thread 8  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  7 process 60269, thread 7  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  6 process 60269, thread 6  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  5 process 60269, thread 5  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  4 process 60269, thread 4  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  3 process 60269, thread 3  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  2 process 60269, thread 2  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
* 1 process 60269, thread 1  0x2830358c in __sys_read () from /usr/lib/libc_r.so.4
Inspecting each thread in turn gives:
(gdb) thread 1 --- this is the libpcap thread
(gdb) info stack
#0 0x2830358c in __sys_read () from /usr/lib/libc_r.so.4
#1 0x282ff9c8 in _read () from /usr/lib/libc_r.so.4
#2 0x282ffa22 in read () from /usr/lib/libc_r.so.4
#3 0x28549de2 in pcap_read () from /usr/lib/libpcap.so.2
#4 0x2854997f in pcap_dispatch () from /usr/lib/libpcap.so.2
#5 0x280fbb19 in pcapDispatch (_i=0x0) at ntop.c:81
#6 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#7 0x0 in ?? ()
(gdb) thread 2 -- this is the hung thread
[Switching to thread 2 (process 60269, thread 2)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2 0x282c5050 in _poll () from /usr/lib/libc_r.so.4
#3 0x282c50ae in poll () from /usr/lib/libc_r.so.4
#4 0x280ccb8b in handleWebConnections (notUsed=0x0) at webInterface.c:5351
#5 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6 0x0 in ?? ()
-- threads 3, 4, 5 and 7 periodically wake up, do their processing and sleep
(gdb) thread 3
[Switching to thread 3 (process 60269, thread 3)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2 0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3 0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4 0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5 0x28110251 in ntop_sleep (secs=13) at util.c:2950
#6 0x2877469f in rrdMainLoop (notUsed=0x0) at rrdPlugin.c:1412
#7 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#8 0x0 in ?? ()
(gdb) thread 4
[Switching to thread 4 (process 60269, thread 4)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2 0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
#3 0x2810db62 in waitCondvar (condvarId=0x2811d4e4) at util.c:1415
#4 0x280f0587 in dequeueAddress (notUsed=0x0) at address.c:546
#5 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6 0x0 in ?? ()
(gdb) thread 5
[Switching to thread 5 (process 60269, thread 5)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2 0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3 0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4 0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5 0x28110251 in ntop_sleep (secs=60) at util.c:2950
#6 0x280fcb50 in scanIdleLoop (notUsed=0x0) at ntop.c:592
#7 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#8 0x0 in ?? ()
(gdb) thread 6
[Switching to thread 6 (process 60269, thread 6)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2 0x28304585 in pthread_cond_timedwait () from /usr/lib/libc_r.so.4
#3 0x282ebee1 in _thread_gc () from /usr/lib/libc_r.so.4
#4 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#5 0x0 in ?? ()
(gdb) thread 7
[Switching to thread 7 (process 60269, thread 7)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2 0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
#3 0x2810db62 in waitCondvar (condvarId=0x2811d4d8) at util.c:1415
#4 0x28101ce9 in dequeuePacket (notUsed=0x0) at pbuf.c:1694
#5 0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6 0x0 in ?? ()
(gdb) thread 8 -- this is main(); it wakes up, checks whether the child threads are busy and goes back to sleep.
[Switching to thread 8 (process 60269, thread 8)]
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0 0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1 0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2 0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3 0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4 0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5 0x28110251 in ntop_sleep (secs=10) at util.c:2950
#6 0x804ce6a in main (argc=8, argv=0xbfbffb34) at main.c:1186
#7 0x804abf2 in _start ()
(gdb)
>How-To-Repeat:
No specific recipe; it happens in our code every time.
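As a possible starting point for a reduced test case, the thread layout in the traces (one thread blocked in read() like the libpcap thread, another waiting in select() with a timeout like the web thread) can be mimicked with two pthreads. This is a hypothetical sketch: pipes stand in for the BPF device and the listening socket, the function name is made up, and it has not been shown to reproduce this problem. With a threads library that behaves as expected, select() should time out and return 0 after about one second; whether libc_r on 4.8 or 5.1 hangs here instead is unknown.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reduced test case; pipes stand in for the BPF device
 * and the web server's listening socket. */
static int block_pipe[2];   /* the reader thread blocks on block_pipe[0] */

/* Plays the role of the libpcap thread: sits in read() until a byte
 * is written to block_pipe[1]. */
static void *reader(void *arg)
{
    char c;
    ssize_t n;

    (void)arg;
    n = read(block_pipe[0], &c, 1);
    (void)n;
    return NULL;
}

/* Plays the role of the web thread: select() on an idle pipe with a
 * 1-second timeout while the reader thread runs. Returns select()'s
 * result (0 = timed out as expected) or -1 on a setup error. */
int run_select_while_reader_blocked(void)
{
    int idle[2];
    pthread_t tid;
    fd_set set;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    ssize_t n;
    int rc;

    if (pipe(block_pipe) != 0)
        return -1;
    if (pipe(idle) != 0) {
        close(block_pipe[0]); close(block_pipe[1]);
        return -1;
    }
    if (pthread_create(&tid, NULL, reader, NULL) != 0) {
        close(block_pipe[0]); close(block_pipe[1]);
        close(idle[0]); close(idle[1]);
        return -1;
    }

    FD_ZERO(&set);
    FD_SET(idle[0], &set);
    rc = select(idle[0] + 1, &set, NULL, NULL, &tv);

    /* Release the reader thread, wait for it, and clean up. */
    n = write(block_pipe[1], "x", 1);
    (void)n;
    pthread_join(tid, NULL);
    close(block_pipe[0]); close(block_pipe[1]);
    close(idle[0]); close(idle[1]);
    return rc;
}
```

If the select() thread never returns under a given threads library while the reader thread is blocked, that would point at the same symptom reported here.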
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
X-send-pr-version: 3.113
X-GNATS-Notify: