dhclient taking all cpu

Brooks Davis brooks at one-eyed-alien.net
Wed Jul 27 19:10:59 GMT 2005


On Tue, Jul 26, 2005 at 04:39:33PM -0700, Brooks Davis wrote:
> On Tue, Jul 26, 2005 at 06:53:17PM -0400, Jung-uk Kim wrote:
> > On Tuesday 26 July 2005 04:00 pm, Wilko Bulte wrote:
> > > On Tue, Jul 26, 2005 at 12:33:24PM -0700, Brooks Davis wrote:
> > >
> > > > On Mon, Jul 25, 2005 at 10:39:09PM -0400, Mike Jakubik wrote:
> > > > > On Mon, July 25, 2005 9:54 pm, Brooks Davis said:
> > > > > > >>> Probably something wrong with your interface, but you
> > > > > > >>> haven't provided any useful information so who knows.  At
> > > > > > >>> the very least, I need to know what interface you are
> > > > > > >>> running on, something about its status, and if both
> > > > > > >>> dhclient processes are running.
> > > > > >>
> > > > > > >> The interface is xl0 (3Com 3c905C-TX Fast Etherlink XL), and
> > > > > > >> it has worked fine in this machine for as long as I remember.
> > > > > > >> This seems to have happened since a recent cvsup and
> > > > > > >> buildworld from ~6-BETA to 7-CURRENT. I rebooted three
> > > > > > >> times, and the problem occurred roughly a minute after bootup.
> > > > > > >> On the fourth time, however, it seems to be ok so far.
> > > > > >
> > > > > > > That sounds like a problem with the code that handles the
> > > > > > > link state notifications in the interface driver.  The
> > > > > > > notifications are a relatively new feature that we're only now
> > > > > > > starting to use heavily, so there are going to be bumps in the
> > > > > > > road.  It would be interesting to know if you see link state
> > > > > > > messages promptly if you plug and unplug the network cable.
> > > > >
> > > > > It seems to be back at it again, this time it took longer to
> > > > > kick in. Here is a "ps auxw|grep dhclient" :
> > > > >
> > > > > > _dhcp      219 93.5  0.2  1484  1136  ??  Rs    8:49PM   5:06.00 dhclient: xl0 (dhclient)
> > > > > > root       193  0.0  0.2  1484  1088  d0- S     8:49PM   0:00.02 dhclient: xl0 [priv] (dhclient)
> > > > >
> > > > > top:
> > > > >
> > > > > >   PID USERNAME      THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > > > > >   219 _dhcp           1 129    0  1484K  1136K RUN      9:33 94.24% dhclient
> > > > >
> > > > > Nothing in dmesg about link state changes on xl0. Unplugging
> > > > > and replugging the network cable results in link state
> > > > > notification within a couple seconds.
> > > >
> > > > Could you see what happens if you run dhclient in the foreground?
> > > >  Just running "dhclient -d xl0" should do it.  I'd like to know
> > > > what sort of output it's generating.
> > >
> > > In my case it is not displaying anything:
> > >
> > >
> > > chuck#dhclient -d ath0
> > > DHCPREQUEST on ath0 to 255.255.255.255 port 67
> > > DHCPACK from 192.168.5.254
> > > bound to 192.168.5.20 -- renewal in 21600 seconds.
> > >
> > > <nothing>
> > >
> > > I can tell the phenomenon occurs when my laptop fan springs to
> > > life:
> > >
> > > CPU states: 96.5% user,  0.0% nice,  2.7% system,  0.8% interrupt,  0.0% idle
> > > Mem: 48M Active, 28M Inact, 50M Wired, 680K Cache, 34M Buf, 115M Free
> > > Swap: 257M Total, 257M Free
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > >   719 _dhcp       1 129    0  1384K  1092K RUN      2:14 93.55% dhclient
> > >   607 root        1  98    0 34584K 21212K select   0:09  1.81% Xorg
> > >   663 wb          4  20    0 46712K 40224K kserel   0:27  0.00% mozilla-bin
> > >   503 root        1   8    0  1184K   796K nanslp   0:07  0.00% powerd
> > >
> > > Took (best guess) approx 5-10 minutes for the effect to kick in.
> > 
> > FYI, I have the same issues with bge(4) and ndis(4).
> 
> I've seen it on ath and em interfaces now, but am not sure what's going
> on and have no idea how to reproduce the problem.  As also reported by
> Bakul Shah, we seem to be getting into a state where receive_packet() is
> spinning.  I'm not seeing an obvious way for this to be possible.

I think I've found it.  There was a really odd typo (= instead of +) in
the code that handles undersized captures on the bpf socket, so the
buffer offset was advanced by the capture length only rather than by the
header length plus the capture length.  Please try the following patch
and see if it solves the problem.  I'm testing here, but I don't have a
reliable way to trigger the bug.  The fix is fairly obvious, so I'll
commit it to head shortly.

-- Brooks

==== //depot/user/brooks/cleanup/sbin/dhclient/bpf.c#3 - /usr/home/brooks/working/freebsd/p4/cleanup/sbin/dhclient/bpf.c ====
@@ -316,19 +316,19 @@
 			continue;
 		}
 
+		/* Skip over the BPF header... */
+		interface->rbuf_offset += hdr.bh_hdrlen;
+
 		/*
 		 * If the captured data wasn't the whole packet, or if
 		 * the packet won't fit in the input buffer, all we can
 		 * do is drop it.
 		 */
 		if (hdr.bh_caplen != hdr.bh_datalen) {
-			interface->rbuf_offset += hdr.bh_hdrlen = hdr.bh_caplen;
+			interface->rbuf_offset += hdr.bh_caplen;
 			continue;
 		}
 
-		/* Skip over the BPF header... */
-		interface->rbuf_offset += hdr.bh_hdrlen;
-
 		/* Decode the physical header... */
 		offset = decode_hw_header(interface->rbuf,
 		    interface->rbuf_offset, hfrom);
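
For anyone puzzling over why the one-character change matters, here is a
minimal sketch of the two forms of the offset update.  It is not the
dhclient source: struct fake_bpf_hdr and both helper functions are
hypothetical stand-ins that only mirror the field names used in the patch.
In C, "a += b = c" parses as "a += (b = c)", so the old line overwrote
bh_hdrlen and added only bh_caplen to the offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct bpf_hdr fields used in the patch. */
struct fake_bpf_hdr {
	unsigned int	bh_caplen;	/* bytes actually captured */
	unsigned int	bh_datalen;	/* original length of the packet */
	unsigned short	bh_hdrlen;	/* length of the BPF header */
};

/*
 * Offset update as written before the patch.  The "=" makes this
 * "offset += (hdr.bh_hdrlen = hdr.bh_caplen)": bh_hdrlen is overwritten
 * and the offset moves by bh_caplen only.
 */
static size_t
advance_buggy(size_t offset, struct fake_bpf_hdr hdr)
{
	offset += hdr.bh_hdrlen = hdr.bh_caplen;
	return (offset);
}

/*
 * Offset update after the patch: skip the BPF header first, then skip
 * the captured bytes of the packet that is being dropped.
 */
static size_t
advance_fixed(size_t offset, struct fake_bpf_hdr hdr)
{
	offset += hdr.bh_hdrlen;
	offset += hdr.bh_caplen;
	return (offset);
}
```

With made-up values bh_hdrlen = 18 and bh_caplen = 60 and a starting
offset of 0, advance_buggy() returns 60 and advance_fixed() returns 78.
The buggy form therefore leaves the offset short of the next record, and
if bh_caplen were ever 0 it would not move the offset at all.  Whether
that is exactly how receive_packet() ended up spinning is an assumption
on my part, not something the thread confirms.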

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4