recv() with MSG_WAITALL might get stuck when receiving more than rcvbuf

Mikolaj Golub trociny at freebsd.org
Sat Apr 9 13:31:28 UTC 2011


Hi,

While testing HAST synchronization with both the primary and secondary HAST
instances running on the same host, I found that synchronization can be very
slow:

Apr  9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072-byte) blocks. The sending side
splits each block into smaller chunks of MAX_SEND_SIZE (32768 bytes), while
the receiving side reads the whole block with a single recv() call using the
MSG_WAITALL flag.
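The transfer pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not hastd's code: the helper names (send_in_chunks, recv_block, demo_transfer) and the BLOCK_SIZE/CHUNK_SIZE macros are hypothetical, only the sizes come from this report, and the demo uses an AF_UNIX socketpair with fork() rather than the TCP connection hastd uses. On a healthy kernel it simply completes; it is not the scrubbed test_MSG_WAITALL.c reproducer.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sizes from the report; the macro names are illustrative only. */
#define BLOCK_SIZE 131072 /* MAXPHYS: one synchronization block */
#define CHUNK_SIZE 32768  /* MAX_SEND_SIZE: largest single send() */

/* Sending side: write len bytes as a series of send() calls of at most
 * chunk bytes each. Returns 0 on success, -1 on error (errno set). */
static int send_in_chunks(int fd, const unsigned char *buf, size_t len,
    size_t chunk)
{
    assert(chunk > 0);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        ssize_t done = send(fd, buf, n, 0);
        if (done == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += done;
        len -= (size_t)done;
    }
    return 0;
}

/* Receiving side: ask for the whole block in one call. With MSG_WAITALL,
 * recv() is expected to return only once len bytes have been received,
 * unless a signal, an error or a peer shutdown cuts it short. */
static ssize_t recv_block(int fd, unsigned char *buf, size_t len)
{
    return recv(fd, buf, len, MSG_WAITALL);
}

/* Hypothetical demo: a child process sends one block in chunks over a
 * socketpair, the parent receives it. Returns 0 if it arrived intact. */
static int demo_transfer(void)
{
    static unsigned char out[BLOCK_SIZE], in[BLOCK_SIZE];
    int sv[2], status;
    pid_t pid;
    ssize_t got;

    for (size_t i = 0; i < BLOCK_SIZE; i++)
        out[i] = (unsigned char)(i * 7);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) { /* child: the sending side */
        close(sv[0]);
        _exit(send_in_chunks(sv[1], out, BLOCK_SIZE, CHUNK_SIZE) == 0 ? 0 : 1);
    }
    close(sv[1]); /* parent: the receiving side */
    got = recv_block(sv[0], in, BLOCK_SIZE);
    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return (got == BLOCK_SIZE && memcmp(in, out, BLOCK_SIZE) == 0) ? 0 : -1;
}
```

With these sizes one block arrives as four chunks, so a single MSG_WAITALL recv() has to span several arrivals.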

Sometimes recv() gets stuck: in tcpdump I see that the sending side has sent
all chunks and all of them were acked, but the receiving thread is still
waiting in recv(). netstat reports a non-empty Recv-Q on the receiving side
(usually equal to the size of the last chunk sent). It looks as if the kernel
did not notify the receiving userspace process that all the data had arrived.
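One way to observe this symptom from inside the application rather than with netstat is sketched below. This is a diagnostic assumption, not something from the report: the MSG_WAITALL call is bounded with SO_RCVTIMEO, and the FIONREAD ioctl is then used to ask how many bytes are still queued, essentially the figure netstat shows as Recv-Q. The helper names queued_bytes and recv_waitall_timed are hypothetical. When the timeout fires after some data has already been copied, recv() typically returns the partial count instead of blocking forever.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Bytes sitting in the socket's receive buffer that the application has
 * not read yet (roughly netstat's Recv-Q). Returns -1 on error. */
static int queued_bytes(int fd)
{
    int n;

    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}

/* recv() with MSG_WAITALL, bounded by a receive timeout so that a stalled
 * call returns (with a short count, or -1/EAGAIN if nothing arrived)
 * instead of blocking indefinitely. */
static ssize_t recv_waitall_timed(int fd, void *buf, size_t len,
    int timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms > 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
        return -1;
    return recv(fd, buf, len, MSG_WAITALL);
}
```

If a timed call returns short while queued_bytes() still reports data, that matches the stuck state described above.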

I can reproduce the issue with the attached test_MSG_WAITALL.c.

I think the issue is in soreceive_generic(). 

If MSG_WAITALL is set but the request is larger than the receive buffer,
soreceive_generic() has to do the receive in sections. After receiving some
data it notifies the protocol (calls pr_usrreqs->pru_rcvd) with the so_rcv
lock released. When that call returns, it blocks in sbwait() waiting for the
rest of the data. I think there is a race here: while pr_usrreqs->pru_rcvd
runs without the lock held, the rest of the data can arrive, and its wakeup
is lost because nobody is sleeping yet. So soreceive_generic() should check
for already buffered data before calling sbwait().
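The race, and the kind of check proposed above, can be illustrated with a userspace model in which a pthread mutex plays the so_rcv lock, a condition variable plays sbwait()/sorwakeup(), and cc plays the buffered byte count. This is a hypothetical sketch, not soreceive_generic() and not the contents of the attached patch; every name in it is invented for illustration. The inner while loop is where the fix lives: the buffer is re-checked under the lock before sleeping, so data that arrived while the lock was dropped for the pru_rcvd stand-in is consumed rather than waited for.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_CHUNK 32768 /* one arriving chunk, as in the report */
#define MODEL_CHUNKS 4    /* four chunks make one 131072-byte block */

/* Hypothetical model of so_rcv (not kernel code): cc mirrors the
 * buffered byte count, cv stands in for sbwait()/sorwakeup(). */
struct rcvbuf_model {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    size_t cc;
    int eof;
};

/* Data arrival: queue n bytes and wake any sleeper. If nobody is asleep
 * yet, the wakeup has no effect; that is the crux of the race. */
static void model_append(struct rcvbuf_model *sb, size_t n)
{
    assert(n > 0);
    pthread_mutex_lock(&sb->lock);
    sb->cc += n;
    pthread_cond_broadcast(&sb->cv);
    pthread_mutex_unlock(&sb->lock);
}

/* Sender is done: mark EOF and wake any sleeper. */
static void model_shutdown(struct rcvbuf_model *sb)
{
    pthread_mutex_lock(&sb->lock);
    sb->eof = 1;
    pthread_cond_broadcast(&sb->cv);
    pthread_mutex_unlock(&sb->lock);
}

/* Receive want bytes in sections, MSG_WAITALL style. Returns the number
 * of bytes taken (less than want only if the sender shut down). */
static size_t model_recv_waitall(struct rcvbuf_model *sb, size_t want)
{
    size_t got = 0;

    pthread_mutex_lock(&sb->lock);
    while (got < want) {
        /* Re-check the buffer before sleeping. Data appended while the
         * lock was dropped is picked up here; sleeping unconditionally
         * could wait for a wakeup that was already delivered to nobody. */
        while (sb->cc == 0 && !sb->eof)
            pthread_cond_wait(&sb->cv, &sb->lock);
        if (sb->cc == 0)
            break; /* shut down with nothing left */
        size_t take = sb->cc < want - got ? sb->cc : want - got;
        sb->cc -= take;
        got += take;
        /* Stand-in for pr_usrreqs->pru_rcvd(): the lock is released while
         * the protocol is notified, opening the window in which the rest
         * of the data can arrive. */
        pthread_mutex_unlock(&sb->lock);
        pthread_mutex_lock(&sb->lock);
    }
    pthread_mutex_unlock(&sb->lock);
    return got;
}

/* Hypothetical sender thread: delivers MODEL_CHUNKS chunks, then shuts
 * down. */
static void *model_sender(void *arg)
{
    struct rcvbuf_model *sb = arg;

    for (int i = 0; i < MODEL_CHUNKS; i++)
        model_append(sb, MODEL_CHUNK);
    model_shutdown(sb);
    return NULL;
}
```

Replacing the inner while loop with a single unconditional pthread_cond_wait() gives the shape of the bug: if the last chunk lands during the unlocked window and the sender then goes quiet (never calling model_shutdown()), the receiver sleeps with cc > 0 and nothing left to wake it.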

See the attached uipc_socket.c.soreceive.patch. The patch fixes the issue for
me.

Apr  9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE but believe CURRENT is affected as well.

BTW, I also tried the optimized version of soreceive(), soreceive_stream().
It does not have this problem. But with it I was seeing TCP connections get
stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin
(with many contacts). The processes could only be killed with -9. I did not
investigate this much, though.

-- 
Mikolaj Golub

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_MSG_WAITALL.c
Type: application/octet-stream
Size: 2992 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110409/ea5588b3/test_MSG_WAITALL.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uipc_socket.c.soreceive.patch
Type: text/x-patch
Size: 1216 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110409/ea5588b3/uipc_socket.c.soreceive.bin


More information about the freebsd-net mailing list