SMP: protocol control block protection for a multithreaded process (ex: udp).

Robert N. M. Watson rwatson at freebsd.org
Wed May 30 01:43:45 UTC 2012


On 29 May 2012, at 21:09, vasanth rao naik sabavat wrote:

> I am trying to understand the socket <--> protocol layer as part of our project. I was trying to understand why sotoinpcb() is called before taking any locks. Also, I am trying to understand the scenario of a multi-threaded process doing socket operations simultaneously on a multicore CPU.
> 
> I have gone through the socket life cycle comments in the code, and they gave me a good understanding of the socket life cycle. Thank you for the reference.

Hi Vasanth:

Historically, the so->so_pcb pointer in BSD was protected by spl's, and could only be followed safely while at an elevated spl (probably splnet -- details forgotten at this point!).

In FreeBSD 6.x, I made substantial revisions to the semantics of the socket<->pcb relationship in order to reduce the amount of synchronisation required. Among other things, I made it so that the validity of the so->so_pcb pointer is entirely defined by the protocol, and also made it so that all protocols could safely follow so->so_pcb without locks held, by virtue of the reference model. This trades off slightly greater memory use (inpcbs are always allocated for sockets, even after they have closed) for reduced synchronisation overhead and improved stability (due to reduced complexity). The socket life cycle ensures that no access to so->so_pcb occurs before pru_attach() has returned, and also ensures that no socket access will occur from the moment pru_detach() is called. As pru_attach() and pru_detach() are responsible for allocating and freeing pcb state, this means that all other pru_method() calls can safely dereference so_pcb in all protocols.
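
To make that concrete, here is a rough sketch of the pattern -- modelled on, but not copied verbatim from, the UDP code in sys/netinet/udp_usrreq.c -- showing a pru_method dereferencing so_pcb without any lock held and only then taking the inpcb lock:

    /*
     * Sketch only; assumes the usual kernel headers (sys/socket.h,
     * netinet/in_pcb.h, etc.).
     */
    static int
    udp_shutdown(struct socket *so)
    {
            struct inpcb *inp;

            /*
             * Safe without any lock held: the socket life cycle
             * guarantees this cannot run before pru_attach() has
             * returned or after pru_detach() has been called, and the
             * protocol never clears so_pcb in between.
             */
            inp = sotoinpcb(so);
            KASSERT(inp != NULL, ("udp_shutdown: inp == NULL"));

            INP_WLOCK(inp);         /* Lock before touching pcb state. */
            socantsendmore(so);     /* Socket-layer call with inpcb lock held. */
            INP_WUNLOCK(inp);
            return (0);
    }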

Synchronisation is required to use the socket, but the nature of the synchronisation depends on the protocol, and different protocols use quite different locking strategies (e.g., netnatm vs unix domain sockets vs IPv4/IPv6). There are similar reference concerns in the other direction, which among other things allow TCP to hold a reference on the socket it represents until it's done with it, regardless of API-layer close operations. We universally place protocol locks before socket-layer locks in the lock order so that calls into the socket layer are safe from the protocol while holding locks required to stabilise pcbs -- this means that socket locks can't be held over calls down the stack, mandating a stronger reference model.
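
As an illustration of that lock order, a receive-path delivery routine looks roughly like this (hypothetical helper name, heavily simplified compared to the real input paths):

    /*
     * Sketch: hypothetical delivery routine, not actual tree code.
     * The inpcb (protocol) lock is taken first; the socket-layer calls
     * below take the socket buffer lock internally, which is fine
     * because protocol locks precede socket locks in the lock order.
     */
    static void
    proto_deliver(struct inpcb *inp, const struct sockaddr *from,
        struct mbuf *m)
    {
            struct socket *so;

            INP_RLOCK(inp);
            so = inp->inp_socket;
            if (sbappendaddr(&so->so_rcv, from, m, NULL) != 0)
                    sorwakeup(so);  /* Takes and drops the sockbuf lock. */
            else
                    m_freem(m);     /* No buffer space: drop the datagram. */
            INP_RUNLOCK(inp);
    }

The reverse -- holding a socket lock across a call down into the protocol -- would invert the order, which is exactly why the reference model has to carry the weight instead.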

None of this precludes bugs, of course, but the design is fairly coherent. The area of greatest weakness in synchronisation in the network stack is actually in the socket state machine (so_state and friends), where the stack is unclear whether the protocol or the socket layer is driving the state machine. I've been gradually pushing in the direction of the protocol driving state transitions, since that allows atomicity between layers due to protocol locks being held over socket locks when calling into the socket layer from the protocol.
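
When the protocol drives the transition, the pattern looks roughly like this (a sketch modelled on the TCP code, not a verbatim excerpt):

    /* Sketch: protocol-driven so_state transition, TCP-like. */
    static void
    proto_established(struct tcpcb *tp)
    {
            struct inpcb *inp = tp->t_inpcb;

            INP_WLOCK_ASSERT(inp);          /* Protocol lock already held. */

            /*
             * Update protocol state, then call into the socket layer
             * while still holding the inpcb lock, so the so_state change
             * cannot race with other protocol events on this connection.
             */
            tp->t_state = TCPS_ESTABLISHED;
            soisconnected(inp->inp_socket);
    }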

Robert
