Extreme console latency during disk IO (8.0-RC1, previous
releases also affected according to others)
ivoras at freebsd.org
Tue Oct 13 13:34:18 UTC 2009
2009/10/13 Robert Watson <rwatson at freebsd.org>:
> On Tue, 13 Oct 2009, Ivan Voras wrote:
>> Thomas Backman wrote:
>>> I'm copying this over from the freebsd-performance list, as I'm looking
>>> for a few more opinions - not on the problems *I* am having, but rather to
>>> check whether the problem is universal or not, and if not, find a possible
>>> common factor. In other words: I want to hear about your experiences, *good
>>> or bad*!
>>> Here's the original thread (not from the beginning, though):
>>> Long story short, my version: when the disk is stressed hard enough,
>>> console IO becomes COMPLETELY unbearable. 10+ seconds to switch between
>>> windows in screen(1), running (or even typing) simple commands, etc. This
>>> happens both via SSH and the serial console.
>> Hmm, this looks familiar - I've noticed it before on the physical (VGA)
>> console and I notice it all the time under VMWare. It sort of looks like
>> disk IO really blocks network IO in this case - I use the VMs over ssh.
> Real hardware and virtual hardware have vastly different performance
> properties, so I'd be careful not to assume that the issue described by the
> original reporter and the issue you're experiencing are the same. In our
> kernel, low level network protocols will essentially always take precedence
> over disk I/O activity. So on face value "disk IO really blocks network IO"
> is highly unlikely.
Yes, I agree on both counts, which is why I hadn't complained until I
came across this thread.
> There are two much more likely possibilities: (1) poor VM implementation
> causes the virtual CPU to be suspended behind synchronous host OS I/O or (2)
> the network stack is running fine but the interactive user application is
> getting I/O or locks scheduled behind a bulk process.
> A useful diagnostic here is to compare the behavior of three kinds of
> network latency tests:
> (1) ping from the host OS to the guest OS
> (2) netperf TCP_RR from the host OS to the guest OS
> (3) ssh interactive latency
> If (1) is highly variable during I/O, it's almost certainly a property of
> the VM technology you're using, and there's nought to be done about it in
> the guest OS.
Here's an example of a ping session with 0.1s resolution during a stall
of a few seconds in ssh:
64 bytes from 188.8.131.52: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 184.108.40.206: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 220.127.116.11: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 18.104.22.168: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 22.214.171.124: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 126.96.36.199: icmp_seq=2306 ttl=64 time=0.376 ms
Note the huge packet loss: icmp_seq jumps from 1578 straight to 2304.
It looks like the VM is at fault, or something like it.
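
For anyone who wants to run the same check on their own ping logs, here is a
minimal sketch (not something I used for the numbers above) that looks for
jumps in icmp_seq. The function name find_seq_gaps is made up for
illustration, and it assumes the usual "icmp_seq=N" field in each reply line.

```python
import re

# Matches the sequence number field printed by ping(8) in each reply line.
SEQ_RE = re.compile(r"icmp_seq=(\d+)")

def find_seq_gaps(lines):
    """Return (last_seen, next_seen, missing) for every jump in icmp_seq.

    Lines without an icmp_seq field (headers, summaries) are skipped.
    Sequence wrap-around is not handled; this is only a rough check.
    """
    gaps = []
    prev = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue
        seq = int(match.group(1))
        if prev is not None and seq > prev + 1:
            # Replies prev+1 .. seq-1 never arrived.
            gaps.append((prev, seq, seq - prev - 1))
        prev = seq
    return gaps
```

Feeding it the saved ping output line by line gives one tuple per gap;
with a 0.1s interval, the number of missing replies divided by 10 is
roughly the length of the stall in seconds.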