[Fwd: Strange networking behaviour in storage server]
Karli Sjöberg
karli.sjoberg at slu.se
Tue Jun 2 08:48:38 UTC 2015
Don't know why you're writing in English when it's just between us... Maybe you forgot to reply to all?
On Tue, 2015-06-02 at 10:10 +0200, Andreas Nilsson wrote:
> No, mbufs should not affect a scrub.
>
>
> You can get some stats from vmstat -z
Hmm, what would I be looking for exactly?
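For reference, the usual thing to look for in vmstat -z is the FAIL column: a non-zero value on any of the mbuf zones means allocation requests were refused at some point. A minimal sketch, assuming the common 10.x "ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP" column layout (the sample lines below are fabricated, not captured from this server):

```shell
#!/bin/sh
# Sketch: flag UMA zones whose FAIL counter is non-zero, i.e. zones
# where allocation requests have been refused. Column positions assume
# the "ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP" layout.
find_failed_zones() {
    awk -F'[:,]' '$7 + 0 > 0 { print $1 }'
}

# On the server one would run:
#   vmstat -z | find_failed_zones
# and worry if any mbuf* zone shows up.

# Demonstration on fabricated sample output:
printf 'mbuf: 256, 6127254, 1024, 512, 99999, 3, 0\nmbuf_cluster: 2048, 765906, 512, 64, 88888, 0, 0\n' \
    | find_failed_zones
```

The summary counters from netstat -m ("requests for mbufs denied") tell a similar story at a glance.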
>
>
> Have you had systat running while the IO stalls?
Same as above.
>
>
> Also, zpool has a tunable for failmode, which defaults to wait, but as
> you say, scrub/zpool status indicates no errors, so this is unlikely
> to be the cause.
>
>
>
> Other than that I'm out of ideas :(
We have that in common :)
/K
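For reference, the failmode property mentioned above is per pool and easy to check; a sketch, with "tank" as a placeholder pool name and the output shape fabricated rather than captured from this server:

```shell
#!/bin/sh
# Sketch: how one would check the failmode property; "tank" is a
# placeholder pool name. On the server:
#   zpool get failmode tank
# failmode=wait (the default) suspends all pool I/O on an unrecoverable
# device error until the condition clears; "continue" returns EIO to
# new write requests instead; "panic" panics the host.
# Typical output shape (fabricated sample):
printf '%s\n' \
    'NAME  PROPERTY  VALUE  SOURCE' \
    'tank  failmode  wait   default'
# To change it: zpool set failmode=continue tank
```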
>
>
> Best regards
>
> Andreas
>
>
> On Mon, Jun 1, 2015 at 1:01 PM, Karli Sjöberg <karli.sjoberg at slu.se> wrote:
> On Mon, 2015-06-01 at 12:53 +0200, Andreas Nilsson wrote:
> > Interesting.
> >
> >
> > Out of mbufs perhaps?
>
> Hmm, why would depleted mbufs stall even a scrub?
>
> How would I verify that?
>
> /K
>
> >
> >
> > /A
> >
> >
> > On Mon, Jun 1, 2015 at 12:28 PM, Karli Sjöberg <karli.sjoberg at slu.se> wrote:
> > On Mon, 2015-06-01 at 02:56 -0700, Mehmet Erol Sanliturk wrote:
> > >
> > > On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg <karli.sjoberg at slu.se> wrote:
> > > On Mon, 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> > > >
> > > > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg at slu.se> wrote:
> > > > -------- Forwarded message --------
> > > > > From: Karli Sjöberg <karli.sjoberg at slu.se>
> > > > > To: freebsd-fs at freebsd.org <freebsd-fs at freebsd.org>
> > > > > Subject: Strange networking behaviour in storage server
> > > > > Date: Mon, 1 Jun 2015 07:49:56 +0000
> > > > >
> > > > > Hey!
> > > > >
> > > > > So we have this ZFS storage server, upgraded from 9.3-RELEASE to
> > > > > 10.1-STABLE to overcome 1) not being able to use SSD drives as
> > > > > L2ARC[1] and 2) not being able to hotswap SATA drives[2].
> > > > >
> > > > > After the upgrade we've noticed very odd networking behaviour:
> > > > > it sends/receives at full speed for a while, then there are a
> > > > > couple of minutes of complete silence where even terminal
> > > > > commands like an "ls" just wait until they are executed, and
> > > > > then it starts sending at full speed again. I've linked to a
> > > > > screenshot showing this send-and-pause behaviour. The blue line
> > > > > is the total, green is SMB and turquoise is NFS over jumbo
> > > > > frames. It behaves this way regardless of the protocol.
> > > > >
> > > > > http://oi62.tinypic.com/33xvjb6.jpg
> > > > >
> > > > > The problem is that these pauses can sometimes be so long that
> > > > > connections drop. Say someone is copying files over SMB or
> > > > > iSCSI, and suddenly they get an error message saying that the
> > > > > transfer failed and they have to start over with the file(s).
> > > > > That's horrible!
> > > > >
> > > > > So far NFS has proven to be the most resilient; its stupidly
> > > > > simple nature just waits and resumes the transfer when the
> > > > > pause is over. Kudos for that.
> > > > >
> > > > > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
> > > > > 64GB of ECC RAM. The hardware has been ruled out; we happened
> > > > > to have an identical MB and CPU lying around, and swapping them
> > > > > in didn't improve things. We have also installed an Intel PRO
> > > > > 100/1000 quad-port ethernet adapter to test if that would
> > > > > change things, but it hasn't; it still behaves this way.
> > > > >
> > > > > The two built-in NICs are Intel 82574L and the quad-port NICs
> > > > > are Intel 82571EB, so both are em(4) driven. I happen to know
> > > > > that the em driver has been updated between 9.3 and 10.1.
> > > > > Perhaps that is to blame, but I have no idea.
> > > > >
> > > > > Is there anyone who can make sense of this?
> > > > >
> > > > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> > > > >
> > > > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> > > > >
> > > > > /K
> > > >
> > > > Another observation I've made is that during these pauses the
> > > > entire system is put on hold; even the ZFS scrub stops and then
> > > > resumes after a while. Looking in top, the system is completely
> > > > idle.
> > > >
> > > > Normally during a scrub the kernel eats 20-30% CPU, but during a
> > > > pause even the [kernel] goes down to 0.00%. Makes me think the
> > > > networking has nothing to do with it.
> > > >
> > > > What's then to blame? ZFS?
> > > >
> > > > /K
> > > >
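One way to see what such a system-wide pause is actually blocked on, an approach not tried in this thread, is to capture kernel thread stacks during a stall and filter for ZFS wait points. A sketch (the procstat output lines below are fabricated, and the pattern list is an assumption about the likely wait channels):

```shell
#!/bin/sh
# Sketch: filter procstat-style kernel stack dumps for common ZFS wait
# points. On the server, during a pause, one would capture:
#   procstat -kk -a > stalled-stacks.txt
# and then filter it; zio_wait / txg_wait_* showing up everywhere
# usually means everything is queued behind a transaction-group sync.
find_zfs_waits() {
    awk '/zio_wait|txg_wait|spa_sync/ { print }'
}

# Demonstration on a fabricated two-line sample:
printf '  915 100123 ls    mi_switch sleepq_wait _cv_wait zio_wait zfs_read\n  720 100088 sshd  mi_switch sleepq_wait sbwait\n' \
    | find_zfs_waits
```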
> > > > _______________________________________________
> > > > freebsd-fs at freebsd.org mailing list
> > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
> > > >
> > > > Hello,
> > > >
> > > > does this happen when clients are only reading from the server?
> > >
> > > Yes, it happens when clients are only reading from the server.
> > >
> > > > Otherwise I would suspect that it could be caused by ZFS writing
> > > > out a large chunk of data sitting in its caches, and until that
> > > > is complete, I/O is stalled.
> > >
> > > That's what's so strange: we have three more systems set up at
> > > about the same size, and none of the others is acting this way.
> > >
> > > The only thing I can think of that differs, and that we haven't
> > > tested ruling out yet, is ctld; the other systems are still
> > > running istgt as their iSCSI daemon.
> > >
> > > /K
> > >
> > > If there are three other similar systems and they are installed
> > > with exactly the same structure, my first possibility to consider
> > > would be a slowly progressing hardware failure:
> > >
> > > The circuit cannot produce a response in the expected time, but it
> > > does respond after a time which is not normal. Such behaviour may
> > > be caused by a badly soldered or cracked trace in the circuit:
> > > when it is hot it disconnects; when it is cold it connects.
> >
> > As initially stated, both the motherboard and the processor have
> > been replaced with identical hardware that went through a day of
> > memtest before being installed. Then there's an external Supermicro
> > JBOD[*], but I haven't seen any disk timeouts or SES errors logged.
> > At least at the driver level there should have been timeouts with a
> > delay as long as five minutes.
> >
> > /K
> >
> > [*]: http://www.supermicro.nl/products/chassis/3U/837/SC837E26-RJBOD1.cfm
> >
> > > Thank you very much.
> > >
> > > Mehmet Erol Sanliturk
> > >
> > > > Have you tried what is suggested in
> > > > https://wiki.freebsd.org/ZFSTuningGuide ? In particular, setting
> > > > vfs.zfs.write_limit_override to something appropriate for your
> > > > site. The timeout seems to be defaulting to 5 now.
> > > >
> > > > Best regards
> > > >
> > > > Andreas
> > > >
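For reference, a sketch of how one might size the tunable Andreas suggests. The sizing rule (pool throughput times the ~5 s txg interval) and the throughput figures are assumptions for illustration, not measurements from this server, and the tunable only exists on releases that still ship the old write-limit code:

```shell
#!/bin/sh
# Sketch: pick a write limit roughly equal to what the pool can flush
# in one txg interval (~5 s by default). Figures are placeholders.
suggest_limit_bytes() {
    mbps=$1   # assumed aggregate pool throughput, MB/s
    secs=$2   # txg sync interval, seconds
    echo $((mbps * secs * 1024 * 1024))
}

suggest_limit_bytes 1200 5   # e.g. 12 disks at ~100 MB/s, 5 s interval

# One would then apply it with:
#   sysctl vfs.zfs.write_limit_override=$(suggest_limit_bytes 1200 5)
```

Capping the dirty-data backlog this way is meant to keep each txg sync short, so a sync never monopolizes the disks for minutes at a time.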