HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you

Mike Tancsa mike at sentex.net
Mon Aug 18 13:37:55 UTC 2008

At 04:14 AM 8/18/2008, Robert Watson wrote:

>On Sun, 3 Aug 2008, Robert Watson wrote:
>>This is an advance warning that, late next week, I will be merging 
>>a fairly large set of changes to the IPv4 and IPv6 protocols 
>>layered over the inpcb/inpcbinfo kernel infrastructure.  To be 
>>specific, this affects TCP, UDP, and raw sockets on both IPv4 and 
>>IPv6.  I will post a further e-mail announcement along with patch 
>>set and schedule in a day or two once it's prepared.
>FYI: This patch has now been committed to Subversion.  I'll keep a 
>close eye out for difficulties; if you run into issues, please send 
>me an e-mail (and CC stable@).

Hi Robert,
         I just did a buildworld/kernel in case your commit fixed the 
routing bugs, but I am still seeing those bogus ARP/routing table 
entries. I narrowed it down to the commits below. I don't think it's 
the Intel stuff, as another user reported the same issue using bce NICs.


Updating collection src-all/cvs
  Edit src/sys/conf/files
   Add delta 1.1243.2.32 2008. kmacy
  Checkout src/sys/dev/e1000/LICENSE
  Checkout src/sys/dev/e1000/README
  Checkout src/sys/dev/e1000/e1000_80003es2lan.c
  Checkout src/sys/dev/e1000/e1000_80003es2lan.h
  Checkout src/sys/dev/e1000/e1000_82540.c
  Checkout src/sys/dev/e1000/e1000_82541.c
  Checkout src/sys/dev/e1000/e1000_82541.h
  Checkout src/sys/dev/e1000/e1000_82542.c
  Checkout src/sys/dev/e1000/e1000_82543.c
  Checkout src/sys/dev/e1000/e1000_82543.h
  Checkout src/sys/dev/e1000/e1000_82571.c
  Checkout src/sys/dev/e1000/e1000_82571.h
  Checkout src/sys/dev/e1000/e1000_82575.c
  Checkout src/sys/dev/e1000/e1000_82575.h
  Checkout src/sys/dev/e1000/e1000_api.c
  Checkout src/sys/dev/e1000/e1000_api.h
  Checkout src/sys/dev/e1000/e1000_defines.h
  Checkout src/sys/dev/e1000/e1000_hw.h
  Checkout src/sys/dev/e1000/e1000_ich8lan.c
  Checkout src/sys/dev/e1000/e1000_ich8lan.h
  Checkout src/sys/dev/e1000/e1000_mac.c
  Checkout src/sys/dev/e1000/e1000_mac.h
  Checkout src/sys/dev/e1000/e1000_manage.c
  Checkout src/sys/dev/e1000/e1000_manage.h
  Checkout src/sys/dev/e1000/e1000_nvm.c
  Checkout src/sys/dev/e1000/e1000_nvm.h
  Checkout src/sys/dev/e1000/e1000_osdep.c
  Checkout src/sys/dev/e1000/e1000_osdep.h
  Checkout src/sys/dev/e1000/e1000_phy.c
  Checkout src/sys/dev/e1000/e1000_phy.h
  Checkout src/sys/dev/e1000/e1000_regs.h
  Checkout src/sys/dev/e1000/if_em.c
  Checkout src/sys/dev/e1000/if_em.h
  Checkout src/sys/dev/e1000/if_igb.h
  Edit src/sys/kern/kern_synch.c
   Add delta 1.302.2.3 2008. rwatson
  Edit src/sys/kern/sys_process.c
   Add delta 2008. jhb
  Edit src/sys/netinet/tcp_subr.c
   Add delta 1.300.2.4 2008. kmacy
  Edit src/sys/netinet/tcp_syncache.c
   Add delta 2008. kmacy
   Add delta 2008. kmacy
  Edit src/sys/netinet/tcp_syncache.h
   Add delta 2008. kmacy
  Edit src/sys/netinet/tcp_usrreq.c
   Add delta 2008. kmacy
  Edit src/sys/netinet/udp_usrreq.c
   Add delta 2008. bz
  Edit src/sys/netinet6/ip6_input.c
   Add delta 2008. bz
  Edit src/sys/netinet6/ip6_var.h
   Add delta 2008. bz
  Edit src/sys/sys/socket.h
   Add delta 2008. kmacy
  Edit src/sys/ufs/ufs/ufs_lookup.c
   Add delta 2008. jhb
  Edit src/sys/vm/vm_object.c
   Add delta 1.385.2.2 2008. jhb
  Edit src/sys/vm/vm_object.h
   Add delta 2008. jhb
  Edit src/sys/vm/vnode_pager.c
   Add delta 2008. jhb


>Robert N M Watson
>Computer Laboratory
>University of Cambridge
>>The thrust of this change is to replace the mutexes protecting the 
>>inpcb and inpcbinfo data structures with read-write locks 
>>(rwlocks).  These structures represent, respectively, particular 
>>sockets and the global socket lists for all socket types in IPv4 
>>and IPv6 except for SCTP.  When you run netstat, inpcbinfo is the 
>>data structure referencing all connections, and each line in the 
>>netstat output reflects the contents of a specific inpcb.
>>
>>In the current stage of this work, the intent is to improve 
>>performance for datagram-related protocols on SMP systems by 
>>allowing concurrent acquisition of both global and connection locks 
>>during receive and transmit.  This is possible because, in the 
>>common case, no connection or global state is modified during 
>>UDP/raw receive and transmit at the IP layer, so a read lock is 
>>sufficient to prevent data in those structures from unexpectedly 
>>changing. For receive, socket layer state is modified, but this is 
>>separately protected by socket layer locks.  On transmit, no state 
>>is modified at any layer, so in principle we will allow fully 
>>parallel transmit from multiple threads down to about the routing 
>>and network interface layers, whereas previously they would bottleneck in UDP.
>>
>>The applications targeted by this change are threaded UDP server 
>>applications, such as BIND9, nsd, and UDP-based memcached.  Kris 
>>Kennaway and Paul Saab have done fairly extensive testing with the 
>>changes and demonstrated significant performance improvements due 
>>to reduced contention and overhead.  Perhaps they can mention some 
>>of those numbers in a follow-up to this post.
>>
>>The reason for the heads-up is that, while carefully tested, 
>>changes of this sort do come with risks.  We've carefully 
>>structured them so as to avoid breaking the ABIs for netstat, etc., 
>>but it's not impossible that some problems will arise as the 
>>changes settle.  The goal, however, is to see these performance 
>>improvements in 7.1, and since they've had a while to shake out in 
>>8.x and seen some heavy use, I think now is the right time to merge them.
>>
>>In any case, I will send out e-mail in a couple of days with a 
>>proposed merge patch and schedule for merging, and perhaps if you 
>>are in a position where you might benefit from these 
>>improvements, or have interesting UDP or raw-socket based 
>>applications running on 7.x, you could test the candidate patch 
>>before it's merged, reporting any problems.  Unless I receive 
>>negative feedback, I will plan on merging the changes late in the 
>>week, and keep a close eye on stable@ for any reports of problems.
>>Robert N M Watson
>>Computer Laboratory
>>University of Cambridge
>>freebsd-stable at freebsd.org mailing list
>>To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"