heavy gmirror problems

Nico -telmich- Schottelius nico-freebsd-geom at schottelius.org
Fri Jun 29 11:25:39 UTC 2007


Hello!

After some weeks of trying hard to get gmirror working with FreeBSD 6.2
we had to give up today. This mail is meant to be a

   a) summary so other people with the same problems do not have
      to go through the same process
   b) clarification: perhaps we did something really wrong here
   c) report for FreeBSD-developers, perhaps there are things to
      be fixed.

We started to use gmirror on a Dell SC 1425 with an Adaptec 39320,
because we were not able to see the "hardware raid" that the Adaptec
provided and because software raid could make us independent of
   a) the vendor supporting the OS
   b) the specific raid card
   c) and even the bus (whether we choose pata/sata/p-scsi/sas)

Then we had problems on the Adaptec 39320 with gmirror:

   a) system was getting slower and slower until it was
      inaccessible
   b) not even an ssh connection was possible
   c) not even logging in via the console was possible

Just yesterday we had some controller resets on one of the
Dell SC 1425 with the Adaptec 39320. So this controller is
dead for us with FreeBSD 6.2.

We were recommended to get LSI Logic hardware, because they
are well supported in FreeBSD.

So we bought an LSI Logic U320-2 and configured that card
to provide both disks as raid0, so we can do gmirror on that
card. This way we could still change the controller, because
it writes its metadata at the end and the harddisks are still
readable by other scsi HBAs.

Today was the day we migrated to the u320-2+gmirror system and
it was *REALLY* slow: we had about 2000 processes running, because
the disk i/o could not keep up (mostly qmail related processes,
qmail-smtpd, qmail-popup or sslserver). So once again we had an unusable
system (load almost always > 15, ~1500 processes hanging).

We first thought that the problem was caused by the sync of gmirror, but
after the sync finished the problem did not vanish. We even tried
rebooting to get a clean state. But after just a few seconds there were
more than 1000 processes hanging. According to systat -vmstat and gstat
the disk i/o on amrd1 was always >= 95%, almost always 100%.

We also wondered why amrd0 was only 10-20% busy, although
the read balance algorithm was set to round-robin.

This morning we removed the gmirror load entry from /boot/loader.conf,
rebooted the system with amrd0s1a as / and now it runs fine.
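
For readers following along: on FreeBSD the gmirror kernel module is
normally loaded at boot by a single line in /boot/loader.conf. The
fragment below is only a sketch of what disabling it looks like. The
mail does not quote the actual file, so treat the exact contents as an
assumption.

```shell
# /boot/loader.conf (sketch, not the original file)
# This line loads the gmirror (geom_mirror) module at boot.
# Removing it, or commenting it out as here, means the mirror
# is no longer assembled on the next boot.
#geom_mirror_load="YES"
```

With the module not loaded, the root file system has to be mounted
from the plain disk device (amrd0s1a in this case) rather than from
the mirror device.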

We will migrate it to hardware raid1 soon, but gmirror is no longer
an option here.

Perhaps we did something really wrong and gmirror may be a nice tool,
but all our attempts here failed.

I am personally very frustrated, because I really thought gmirror
would give us a stable software raid solution without any performance
problems.

For more information about this case, have a look at the mails
listed below.


Nico


For reference, the mails regarding this subject were:
(only initial mail, follow the thread for full discussion)

   freebsd-questions:
      Message-ID: <20070521093931.GD1101 at schottelius.org>
      Message-ID: <20070606134237.GC30443 at schottelius.org>
   freebsd-hackers:
      Message-ID: <20070521092443.GC1101 at schottelius.org>
   freebsd-scsi:
      Message-ID: <20070606143430.GA31380 at schottelius.org>
      Message-ID: <20070606161221.GB31380 at schottelius.org>
      Message-ID: <20070607085611.GC25624 at schottelius.org>
      Message-ID: <20070613072837.GC27749 at schottelius.org>
   freebsd-geom:
      Message-ID: <20070626220914.GA11511 at schottelius.org>


-- 
Think about Free and Open Source Software (FOSS).
http://nico.schottelius.org/documentations/foss/the-term-foss/

PGP: BFE4 C736 ABE5 406F 8F42  F7CF B8BE F92A 9885 188C