[Fwd: Strange networking behaviour in storage server]

Karli Sjöberg karli.sjoberg at slu.se
Mon Jun 1 09:17:43 UTC 2015


On Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> 
> 
> On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg at slu.se>
> wrote:
>         -------- Forwarded message --------
>         > From: Karli Sjöberg <karli.sjoberg at slu.se>
>         > To: freebsd-fs at freebsd.org <freebsd-fs at freebsd.org>
>         > Subject: Strange networking behaviour in storage server
>         > Date: Mon, 1 Jun 2015 07:49:56 +0000
>         >
>         > Hey!
>         >
>         > So we have this ZFS storage server that was upgraded from
>         > 9.3-RELEASE to 10.1-STABLE to get around two problems: 1) not
>         > being able to use SSD drives as L2ARC[1] and 2) not being able
>         > to hotswap SATA drives[2].
>         >
>         > After the upgrade we've noticed some very odd networking
>         > behaviour: it sends/receives at full speed for a while, then
>         > there are a couple of minutes of complete silence, where even
>         > terminal commands like "ls" just wait before they are executed,
>         > and then it starts sending at full speed again. I've linked to
>         > a screenshot showing this send-and-pause behaviour. The blue
>         > line is the total, green is SMB and turquoise is NFS over jumbo
>         > frames. It behaves this way regardless of the protocol.
>         >
>         > http://oi62.tinypic.com/33xvjb6.jpg
>         >
>         > The problem is that these pauses can sometimes be so long that
>         > connections drop. For example, someone is copying files over SMB
>         > or iSCSI and suddenly gets an error message saying that the
>         > transfer failed, and has to start over with the file(s).
>         > That's horrible!
>         >
>         > So far NFS has proven to be the most resilient; its stupidly
>         > simple nature means it just waits and resumes the transfer when
>         > the pause is over. Kudos for that.
>         >
>         > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
>         > 64GB ECC RAM. The hardware has been ruled out: we happened to
>         > have an identical MB and CPU lying around, and that didn't
>         > improve things. We have also installed an Intel PRO 100/1000
>         > Quad-port ethernet adapter to test if that would change things,
>         > but it hasn't; it still behaves this way.
>         >
>         > The two built-in NICs are Intel 82574L and the Quad-port NICs
>         > are Intel 82571EB, so both are driven by em(4). I happen to know
>         > that the em driver was updated between 9.3 and 10.1. Perhaps
>         > that is to blame, but I have no idea.
>         >
>         > Is there anyone that can make sense of this?
>         >
>         > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>         > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>         >
>         > /K
>         >
>         >
>         
>         
>         Another observation I've made is that during these pauses, the
>         entire system is put on hold; even a ZFS scrub stops and then
>         resumes after a while. Looking in top, the system is completely
>         idle.
>         
>         Normally during a scrub, the kernel eats 20-30% CPU, but during a
>         pause, even the [kernel] goes down to 0.00%. That makes me think
>         the networking has nothing to do with it.
>         
>         What's to blame, then? ZFS?
>         
>         /K
> 
> 
> Hello,
> 
> 
> Does this happen when clients are only reading from the server?

Yes it happens when clients are only reading from the server.

> Otherwise I would suspect that it could be caused by ZFS writing out a
> large chunk of data sitting in its caches, and until that is complete
> I/O is stalled.

That's what is so strange: we have three more systems of about the same
size, and none of the others are acting this way.

The only difference I can think of that we haven't ruled out yet is
ctld; the other systems are still running istgt as their iSCSI daemon.
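
Since the whole system seems to freeze and not just the network, one way
to pin down exactly when the pauses happen, independent of any protocol,
could be a small watchdog that sleeps for a fixed interval and logs
whenever it wakes up much later than expected. A minimal sketch, assuming
Python is available on the server and that the monotonic clock keeps
counting through a pause; find_stalls and watch are hypothetical names,
and the thresholds are guesses:

```python
import time


def find_stalls(timestamps, interval=1.0, tolerance=2.0):
    """Return (start, gap) pairs for consecutive wake-ups that are
    further apart than interval + tolerance seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            stalls.append((prev, gap))
    return stalls


def watch(interval=1.0, tolerance=2.0):
    """Loop until interrupted, printing a line for every late wake-up."""
    last = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        late = now - last - interval
        if late > tolerance:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} stall: woke up {late:.1f}s late", flush=True)
        last = now
```

Calling watch() in a spare terminal during a scrub should give timestamps
that can be lined up against the traffic graph and the system logs.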

/K

> 
> 
> Have you tried what is suggested in
> https://wiki.freebsd.org/ZFSTuningGuide ? In particular setting
> vfs.zfs.write_limit_override to something appropriate for your site.
> The timeout seems to be defaulting to 5 now.
> 
> 
> Best regards
> 
> Andreas
> 
> 
> 


