[Fwd: Strange networking behaviour in storage server]

Karli Sjöberg karli.sjoberg at slu.se
Mon Jun 1 08:15:14 UTC 2015


-------- Forwarded message --------
> From: Karli Sjöberg <karli.sjoberg at slu.se>
> To: freebsd-fs at freebsd.org <freebsd-fs at freebsd.org>
> Subject: Strange networking behaviour in storage server
> Date: Mon, 1 Jun 2015 07:49:56 +0000
> 
> Hey!
> 
> We have a ZFS storage server that we upgraded from 9.3-RELEASE to
> 10.1-STABLE to get around two problems: 1) not being able to use SSD
> drives as L2ARC[1], and 2) not being able to hot-swap SATA drives[2].
> 
> After the upgrade we've noticed very odd networking behaviour: it
> sends/receives at full speed for a while, then there are a couple of
> minutes of complete silence, during which even terminal commands like
> "ls" just hang until they are finally executed, and then it starts
> sending at full speed again. I've linked to a screenshot showing this
> send-and-pause behaviour. The blue line is the total, green is SMB and
> turquoise is NFS over jumbo frames. It behaves this way regardless of
> the protocol.
> 
> http://oi62.tinypic.com/33xvjb6.jpg
> 
> The problem is that these pauses can sometimes be so long that
> connections drop. For example, someone is copying files over SMB or
> iSCSI and suddenly gets an error message saying that the transfer
> failed, and has to start over with the file(s). That's horrible!
> 
> So far NFS has proven to be the most resilient; its stupid-simple
> nature means it just waits and resumes the transfer when the pause is
> over. Kudos for that.
> 
> The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB
> ECC RAM. The hardware has been ruled out: we happened to have an
> identical MB and CPU lying around, and swapping them in didn't improve
> things. We have also installed an Intel PRO 100/1000 Quad-port
> ethernet adapter to test if that would change things, but it hasn't;
> it still behaves this way.
> 
> The two built-in NICs are Intel 82574L and the Quad-port NICs are
> Intel 82571EB, so both are driven by em(4). I happen to know that the
> em driver was updated between 9.3 and 10.1. Perhaps that is to blame,
> but I have no idea.
> 
> Is there anyone that can make sense of this?
> 
> [1]:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> 
> [2]:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> 
> /K
> 
> 

Another observation I've made is that during these pauses the entire
system is put on hold; even the ZFS scrub stops and then resumes after a
while. Looking in top, the system is completely idle.

Normally during scrub, the kernel eats 20-30% CPU, but during a pause,
even the [kernel] goes down to 0.00%. Makes me think the networking has
nothing to do with it.

What's to blame, then? ZFS?

/K

