[Fwd: Strange networking behaviour in storage server]
Andreas Nilsson
andrnils at gmail.com
Mon Jun 1 10:11:18 UTC 2015
On Mon, Jun 1, 2015 at 11:56 AM, Mehmet Erol Sanliturk <
m.e.sanliturk at gmail.com> wrote:
>
>
> On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg <karli.sjoberg at slu.se>
> wrote:
>
>> Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
>> >
>> >
>> > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg at slu.se>
>> > wrote:
>> > -------- Forwarded message --------
>> > > From: Karli Sjöberg <karli.sjoberg at slu.se>
>> > > To: freebsd-fs at freebsd.org <freebsd-fs at freebsd.org>
>> > > Subject: Strange networking behaviour in storage server
>> > > Date: Mon, 1 Jun 2015 07:49:56 +0000
>> > >
>> > > Hey!
>> > >
>> > > So we have this ZFS storage server, upgraded from 9.3-RELEASE to
>> > > 10.1-STABLE to overcome 1) not being able to use SSD drives as
>> > > L2ARC[1] and 2) not being able to hotswap SATA drives[2].
>> > >
>> > > After the upgrade we've noticed very odd networking behaviour: it
>> > > sends/receives at full speed for a while, then there are a couple of
>> > > minutes of complete silence, during which even terminal commands like
>> > > "ls" just wait before they are executed, and then it starts sending
>> > > at full speed again. I've linked to a screenshot showing this
>> > > send-and-pause behaviour. The blue line is the total, green is SMB
>> > > and turquoise is NFS over jumbo frames. It behaves this way
>> > > regardless of the protocol.
>> > >
>> > > http://oi62.tinypic.com/33xvjb6.jpg
>> > >
>> > > The problem is that these pauses can sometimes be so long that
>> > > connections drop. For example, someone is copying files over SMB or
>> > > iSCSI and suddenly gets an error message saying that the transfer
>> > > failed, and has to start over with the file(s). That's horrible!
>> > >
>> > > So far NFS has proven to be the most resilient; its stupid-simple
>> > > nature just waits and resumes the transfer when the pause is over.
>> > > Kudos for that.
>> > >
>> > > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB
>> > > ECC RAM. The hardware has been ruled out: we happened to have an
>> > > identical MB and CPU lying around, and that didn't improve things. We
>> > > have also installed an Intel PRO 100/1000 Quad-port ethernet adapter
>> > > to test if that would change things, but it hasn't; it still behaves
>> > > this way.
>> > >
>> > > The two built-in NICs are Intel 82574L and the Quad-port NICs are
>> > > Intel 82571EB, so both are em(4) driven. I happen to know that the em
>> > > driver was updated between 9.3 and 10.1. Perhaps that is to blame,
>> > > but I have no idea.
>> > >
>> > > Is there anyone that can make sense of this?
>> > >
>> > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>> > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>> > >
>> > > /K
>> > >
>> > >
>> >
>> >
>> > Another observation I've made is that during these pauses, the entire
>> > system is put on hold; even a ZFS scrub stops and then resumes after a
>> > while. Looking in top, the system is completely idle.
>> >
>> > Normally during a scrub, the kernel eats 20-30% CPU, but during a pause
>> > even the [kernel] goes down to 0.00%. That makes me think the networking
>> > has nothing to do with it.
>> >
>> > So what is to blame then? ZFS?
>> >
>> > /K
>> >
>> >
>> > Hello,
>> >
>> > Does this happen when clients are only reading from the server?
>>
>> Yes, it happens when clients are only reading from the server.
>>
>> > Otherwise I would suspect that it could be caused by ZFS writing out a
>> > large chunk of data sitting in its caches, with I/O stalled until that
>> > is complete.
>>
>> That's what's so strange: we have three more systems of about the same
>> size, and none of the others are acting this way.
>>
>> The only difference I can think of that we haven't ruled out yet is
>> ctld; the other systems are still running istgt as their iSCSI daemon.
>>
>> /K
>>
What does zpool status say? It could very well be disks starting to fail.
Is there anything in dmesg concerning CAM timeouts?
Best regards
Andreas
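
The dmesg check suggested above can be scripted. Below is a minimal sketch in Python, assuming CAM problems show up as kernel messages containing "CAM status" or "timeout". The pattern, the function names and the sample lines are illustrative assumptions, not output from the server in this thread.

```python
import re
import subprocess

# Assumption: CAM errors are logged with the words "CAM status" or "timeout".
# Adjust the pattern to whatever the controller driver actually prints.
CAM_PATTERN = re.compile(r"cam status|timeout", re.IGNORECASE)


def find_cam_timeouts(dmesg_text):
    """Return the lines of dmesg_text that look like CAM errors or timeouts."""
    return [line for line in dmesg_text.splitlines() if CAM_PATTERN.search(line)]


def read_dmesg():
    """Return the kernel message buffer as text by running the dmesg command."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# Illustrative input only; these lines are made up for the example.
sample = (
    "(ada0:ahcich0:0:0:0): CAM status: Command timeout\n"
    "em0: link state changed to UP\n"
    "ahcich0: Timeout on slot 5 port 0\n"
)
suspicious = find_cam_timeouts(sample)
for line in suspicious:
    print(line)
```

On the server itself, find_cam_timeouts(read_dmesg()) would scan the live kernel message buffer instead of the sample text. An empty result would not rule out failing disks, so the output of zpool status is still worth checking.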