Issues with XEN and ZFS
Eric Bautsch
eric.bautsch at pobox.com
Fri Feb 15 07:22:52 UTC 2019
Thanks all for your help, and my apologies for the late reply; I was away on a
long weekend and then at a customer site until Wednesday night....
Comments/answers inline.
Thanks again.
Eric
On 11/02/2019 15:43, Rodney W. Grimes wrote:
>> Thanks for the testing!
>>
>> On Fri, Feb 08, 2019 at 07:35:04PM +0000, Eric Bautsch wrote:
>>> Hi.
>>>
>>>
>>> Brief abstract: I'm having ZFS/Xen interaction issues with the disks being
>>> declared unusable by the dom0.
>>>
>>>
>>> The longer bit:
>>>
>>> I'm new to FreeBSD, so my apologies for all the stupid questions. I'm trying
>>> to migrate from Linux as my virtual platform host (very bad experiences with
>>> stability, let's leave it at that). I'm hosting mostly Solaris VMs (that
>>> being my choice of OS, but again, Betamax/VHS, need I say more), as well as
>>> a Windows VM (because I have to) and a Linux VM (as a future desktop via
>>> thin clients as and when I have to retire my SunRay solution which also runs
>>> on a VM for lack of functionality).
>>>
>>> So, I've got Xen working on FreeBSD now, after my newbie mistake was pointed out to me.
>>>
>>> However, I seem to be stuck again:
>>>
>>> I have, in this initial test server, only two disks. They are SATA hanging
>>> off the on-board SATA controller. The system is one of those Shuttle XPC
>>> cubes, an older one I had hanging around with 16GB memory and I think 4
>>> cores.
>>>
>>> I've given the dom0 2GB of memory and 2 cores to start with.
>> 2GB might be too low when using ZFS; I would suggest 4GB as a minimum
>> for reasonable performance, or even 8GB. ZFS is quite
>> memory hungry.
> 2GB should not be too low; I comfortably run ZFS in 1GB. ZFS is a
> "free memory hog": by design it uses all the memory it can. Unfortunately
> the "free" aspect is often overlooked and it does not return memory when
> it should, leading to OOM kills; those are bugs and need to be fixed.
>
> If you are going to run ZFS at all I do strongly suggest overriding
> the arc memory size with vfs.zfs.arc_max= in /boot/loader.conf to be
> something more reasonable than the default 95% of host memory.
On my machines, I tend to limit it to 2GB where there's plenty of memory about.
As this box only has 2GB, I didn't bother, but thanks for letting me know where
and how to do it, as I will need to know at some point... ;-)
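For the archives, a minimal sketch of the line I put in /boot/loader.conf on the
bigger boxes (the 2048M value is just my usual choice, not a recommendation from
this thread):
vfs.zfs.arc_max="2048M"
Set there, it takes effect on the next reboot.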
>
> For a DOM0 I would start at 50% of memory (so 1GB in this case) and
> monitor the DOM0 internally with top, slowly increasing this limit
> until free memory drops to the 256MB region. If the workload
> on the DOM0 changes dramatically you may need to readjust.
>
>>> The root filesystem is zfs with a mirror between the two disks.
>>>
>>> The entire thing is dead easy to blow away and re-install. I was very
>>> impressed by how easy the FreeBSD automated installer was to understand and
>>> pick up, so I have it all scripted. If I need to blow stuff away to test, no
>>> problem, and I can always get back to a known configuration.
>>>
>>>
>>> As I only have two disks, I have created a zfs volume for the Xen domU thus:
>>>
>>> zfs create -V40G -o volmode=dev zroot/nereid0
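As a sanity check (just a sketch; these are standard zfs and devfs checks), I
confirm the volume looks right from the dom0 side before handing it to Xen:
zfs get volsize,volmode zroot/nereid0
ls -l /dev/zvol/zroot/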
>>>
>>>
>>> The domU nereid is defined thus:
>>>
>>> cat - << EOI > /export/vm/nereid.cfg
>>> builder = "hvm"
>>> name = "nereid"
>>> memory = 2048
>>> vcpus = 1
>>> vif = [ 'mac=00:16:3E:11:11:51,bridge=bridge0',
>>> 'mac=00:16:3E:11:11:52,bridge=bridge1',
>>> 'mac=00:16:3E:11:11:53,bridge=bridge2' ]
>>> disk = [ '/dev/zvol/zroot/nereid0,raw,hda,rw' ]
>>> vnc = 1
>>> vnclisten = "0.0.0.0"
>>> serial = "pty"
>>> EOI
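For completeness, this is roughly how I then bring the guest up and keep an eye
on it; just the standard xl commands, nothing exotic:
xl create /export/vm/nereid.cfg
xl list
xl console nereid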
>>>
>>> nereid itself also auto-installs; it's a Solaris 11.3 instance.
>>>
>>>
>>> As it tries to install, I get this in the dom0:
>>>
>>> Feb 8 18:57:16 bianca.swangage.co.uk kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 a0 ef 88 40 46 00 00 00 00 00
>>> Feb 8 18:57:16 bianca.swangage.co.uk last message repeated 4 times
>>> Feb 8 18:57:16 bianca.swangage.co.uk kernel: (ada1:ahcich1:0:0:0): CAM status: CCB request was invalid
>> That's weird, and I would say it's not related to ZFS; the same could
>> likely happen with UFS, since this is an error message from the
>> disk controller hardware.
> CCB invalid: that's not good; we sent a command to the drive/controller that
> it does not like.
> This drive may need to be quirked in some way, or there may be
> some kind of hardware issue here.
I should perhaps have pointed out that the two disks are identical and are not SSDs:
Geom name: ada0
Providers:
1. Name: ada0
Mediasize: 1000204886016 (932G)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r2w2e3
descr: ST1000LM035-1RK172
lunid: 5000c5009d4d4c12
ident: WDE0R5LL
rotationrate: 5400
fwsectors: 63
fwheads: 16
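(I believe the output above is from geom disk list.) If it helps with the quirk
question, I'm happy to gather more detail on the drives. A sketch of what I'd
run (the smartctl line assumes the sysutils/smartmontools port is installed):
camcontrol identify ada1
smartctl -a /dev/ada1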
>> Can you test whether the same happens _without_ Xen running?
>>
>> Ie: booting FreeBSD without Xen and then doing some kind of disk
>> stress test, like fio [0].
I've just run fio thus (sorry, I've not used it before; this seemed like a
reasonable set of options, but tell me if there's a better set):
fio --name=randwrite --iodepth=4 --rw=randwrite --bs=4k --direct=0 --size=512M --numjobs=10 --runtime=1200 --group_reporting
Leading to this output when I stopped it:
randwrite: (groupid=0, jobs=10): err= 0: pid=68148: Thu Feb 14 09:50:08 2019
write: IOPS=926, BW=3705KiB/s (3794kB/s)(2400MiB/663425msec)
clat (usec): min=10, max=4146.6k, avg=9558.71, stdev=94020.98
lat (usec): min=10, max=4146.6k, avg=9558.97, stdev=94020.98
clat percentiles (usec):
| 1.00th=[ 47], 5.00th=[ 52], 10.00th=[ 100],
| 20.00th=[ 133], 30.00th=[ 161], 40.00th=[ 174],
| 50.00th=[ 180], 60.00th=[ 204], 70.00th=[ 249],
| 80.00th=[ 367], 90.00th=[ 2008], 95.00th=[ 10552],
| 99.00th=[ 160433], 99.50th=[ 566232], 99.90th=[1367344],
| 99.95th=[2055209], 99.99th=[2868904]
bw ( KiB/s): min= 7, max=16383, per=16.36%, avg=606.11, stdev=1379.59, samples=7795
iops : min= 1, max= 4095, avg=151.06, stdev=344.94, samples=7795
lat (usec) : 20=0.51%, 50=2.53%, 100=6.88%, 250=60.31%, 500=12.97%
lat (usec) : 750=2.16%, 1000=1.64%
lat (msec) : 2=2.98%, 4=2.65%, 10=2.27%, 20=1.16%, 50=1.58%
lat (msec) : 100=0.95%, 250=0.63%, 500=0.22%, 750=0.17%, 1000=0.16%
cpu : usr=0.04%, sys=0.63%, ctx=660907, majf=1, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,614484,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
Run status group 0 (all jobs):
WRITE: bw=3705KiB/s (3794kB/s), 3705KiB/s-3705KiB/s (3794kB/s-3794kB/s), io=2400MiB (2517MB), run=663425-663425msec
I didn't manage to produce any errors in the log files...
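One further test I'm considering, to exercise the exact device node Xen uses
rather than going through the filesystem: a raw sequential write to the zvol
with dd. Just a sketch, and obviously destructive to whatever is on the zvol,
which is fine here since nereid auto-installs anyway:
dd if=/dev/zero of=/dev/zvol/zroot/nereid0 bs=1m count=4096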
Just to be on the safe side, I have changed the dom0 memory to 4GB and limited
the ZFS ARC to 1GB thus:
xen_cmdline="dom0_mem=4092M dom0_max_vcpus=2 dom0=pvh console=com1,vga com1=115200,8n1 guest_loglvl=all loglvl=all"
vfs.zfs.arc_max="1024M"
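(For reference, the quick checks I use after a reboot to confirm the limits have
taken; just a sketch:)
sysctl vfs.zfs.arc_max
xl info | grep -E 'total_memory|free_memory'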
I've now re-created one of my domUs and have not experienced any issues at all
this time. Of course, I now don't know whether it was the limiting of the ZFS ARC,
the increase in memory, or both together that fixed it.
I will attempt further tests and update the list....
Thanks again.
Eric
--
____
/ . Eric A. Bautsch
/-- __ ___ ______________________________________
/ / / / /
(_____/____(___(__________________/ email: eric.bautsch at pobox.com