Issues with XEN and ZFS
Eric Bautsch
eric.bautsch at pobox.com
Fri Feb 15 07:22:52 UTC 2019
Thanks all for your help, and my apologies for the late reply; I was away on a
long weekend and then at a customer site until Wednesday night....
Comments/answers inline.
Thanks again.
Eric
On 11/02/2019 15:43, Rodney W. Grimes wrote:
>> Thanks for the testing!
>>
>> On Fri, Feb 08, 2019 at 07:35:04PM +0000, Eric Bautsch wrote:
>>> Hi.
>>>
>>>
>>> Brief abstract: I'm having ZFS/Xen interaction issues with the disks being
>>> declared unusable by the dom0.
>>>
>>>
>>> The longer bit:
>>>
>>> I'm new to FreeBSD, so my apologies for all the stupid questions. I'm trying
>>> to migrate from Linux as my virtual platform host (very bad experiences with
>>> stability, let's leave it at that). I'm hosting mostly Solaris VMs (that
>>> being my choice of OS, but again, Betamax/VHS, need I say more), as well as
>>> a Windows VM (because I have to) and a Linux VM (as a future desktop via
>>> thin clients as and when I have to retire my SunRay solution which also runs
>>> on a VM for lack of functionality).
>>>
>>> So, I've got Xen working on FreeBSD now, after my newbie mistake was pointed out to me.
>>>
>>> However, I seem to be stuck again:
>>>
>>> I have, in this initial test server, only two disks. They are SATA hanging
>>> off the on-board SATA controller. The system is one of those Shuttle XPC
>>> cubes, an older one I had hanging around with 16GB memory and I think 4
>>> cores.
>>>
>>> I've given the dom0 2GB of memory and 2 cores to start with.
>> 2GB might be too low when using ZFS; I would suggest 4GB as a minimum
>> for reasonable performance, or even 8GB. ZFS is quite
>> memory hungry.
> 2GB should not be too low; I comfortably run ZFS in 1GB. ZFS is a
> "free memory hog": by design it uses all the memory it can. Unfortunately
> the "free" aspect is often overlooked and it does not return memory when
> it should, leading to OOM kills; those are bugs and need to be fixed.
>
> If you are going to run ZFS at all I do strongly suggest overriding
> the arc memory size with vfs.zfs.arc_max= in /boot/loader.conf to be
> something more reasonable than the default 95% of host memory.
On my machines, I tend to limit it to 2GB where there's plenty of memory about.
As this box only has 2GB, I didn't bother, but thanks for letting me know where
and how to do it, as I will need to know at some point... ;-)
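For the archives, a minimal sketch of the line I put in /boot/loader.conf on the
bigger boxes (the 2048M value is just my usual choice, not a recommendation from
this thread):
vfs.zfs.arc_max="2048M"
Set there, it takes effect on the next reboot.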
>
> For a DOM0 I would start at 50% of memory (so 1GB in this case) and
> monitor the DOM0 internally with top, slowly increasing this limit
> until free memory drops to the 256MB region. If the workload
> on the DOM0 changes dramatically you may need to readjust.
>
>>> The root filesystem is zfs with a mirror between the two disks.
>>>
>>> The entire thing is dead easy to blow away and re-install. I was very
>>> impressed by how easy the FreeBSD automated installer was to understand and
>>> pick up, so I have it all scripted. If I need to blow stuff away to test, no
>>> problem, and I can always get back to a known configuration.
>>>
>>>
>>> As I only have two disks, I have created a zfs volume for the Xen domU thus:
>>>
>>> zfs create -V40G -o volmode=dev zroot/nereid0
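As a sanity check (just a sketch; these are standard zfs and devfs checks), I
confirm the volume looks right from the dom0 side before handing it to Xen:
zfs get volsize,volmode zroot/nereid0
ls -l /dev/zvol/zroot/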
>>>
>>>
>>> The domU nereid is defined thus:
>>>
>>> cat - << EOI > /export/vm/nereid.cfg
>>> builder = "hvm"
>>> name = "nereid"
>>> memory = 2048
>>> vcpus = 1
>>> vif = [ 'mac=00:16:3E:11:11:51,bridge=bridge0',
>>> 'mac=00:16:3E:11:11:52,bridge=bridge1',
>>> 'mac=00:16:3E:11:11:53,bridge=bridge2' ]
>>> disk = [ '/dev/zvol/zroot/nereid0,raw,hda,rw' ]
>>> vnc = 1
>>> vnclisten = "0.0.0.0"
>>> serial = "pty"
>>> EOI
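For completeness, this is roughly how I then bring the guest up and keep an eye
on it; just the standard xl commands, nothing exotic:
xl create /export/vm/nereid.cfg
xl list
xl console nereid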
>>>
>>> nereid itself also auto-installs; it's a Solaris 11.3 instance.
>>>
>>>
>>> As it tries to install, I get this in the dom0:
>>>
>>> Feb 8 18:57:16 bianca.swangage.co.uk kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 a0 ef 88 40 46 00 00 00 00 00
>>> Feb 8 18:57:16 bianca.swangage.co.uk last message repeated 4 times
>>> Feb 8 18:57:16 bianca.swangage.co.uk kernel: (ada1:ahcich1:0:0:0): CAM status: CCB request was invalid
>> That's weird, and I would say it's not related to ZFS; the same could
>> likely happen with UFS, since this is an error message from the
>> disk controller hardware.
> CCB invalid: that's not good; we sent a command to the drive/controller that
> it does not like.
> This drive may need to be quirked in some way, or there may be
> some kind of hardware issue here.
I should perhaps have pointed out that the two disks are identical and are not SSDs:
Geom name: ada0
Providers:
1. Name: ada0
Mediasize: 1000204886016 (932G)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r2w2e3
descr: ST1000LM035-1RK172
lunid: 5000c5009d4d4c12
ident: WDE0R5LL
rotationrate: 5400
fwsectors: 63
fwheads: 16
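(I believe the output above is from geom disk list.) If it helps with the quirk
question, I'm happy to gather more detail on the drives. A sketch of what I'd
run (the smartctl line assumes the sysutils/smartmontools port is installed):
camcontrol identify ada1
smartctl -a /dev/ada1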
>> Can you test whether the same happens _without_ Xen running?
>>
>> Ie: booting FreeBSD without Xen and then doing some kind of disk
>> stress test, like fio [0].
I've just run fio thus (sorry, I've not used it before; this seemed like a
reasonable set of options, but tell me if there's a better set):
fio --name=randwrite --iodepth=4 --rw=randwrite --bs=4k --direct=0 --size=512M --numjobs=10 --runtime=1200 --group_reporting
Leading to this output when I stopped it:
randwrite: (groupid=0, jobs=10): err= 0: pid=68148: Thu Feb 14 09:50:08 2019
write: IOPS=926, BW=3705KiB/s (3794kB/s)(2400MiB/663425msec)
clat (usec): min=10, max=4146.6k, avg=9558.71, stdev=94020.98
lat (usec): min=10, max=4146.6k, avg=9558.97, stdev=94020.98
clat percentiles (usec):
| 1.00th=[ 47], 5.00th=[ 52], 10.00th=[ 100],
| 20.00th=[ 133], 30.00th=[ 161], 40.00th=[ 174],
| 50.00th=[ 180], 60.00th=[ 204], 70.00th=[ 249],
| 80.00th=[ 367], 90.00th=[ 2008], 95.00th=[ 10552],
| 99.00th=[ 160433], 99.50th=[ 566232], 99.90th=[1367344],
| 99.95th=[2055209], 99.99th=[2868904]
bw ( KiB/s): min= 7, max=16383, per=16.36%, avg=606.11, stdev=1379.59, samples=7795
iops : min= 1, max= 4095, avg=151.06, stdev=344.94, samples=7795
lat (usec) : 20=0.51%, 50=2.53%, 100=6.88%, 250=60.31%, 500=12.97%
lat (usec) : 750=2.16%, 1000=1.64%
lat (msec) : 2=2.98%, 4=2.65%, 10=2.27%, 20=1.16%, 50=1.58%
lat (msec) : 100=0.95%, 250=0.63%, 500=0.22%, 750=0.17%, 1000=0.16%
cpu : usr=0.04%, sys=0.63%, ctx=660907, majf=1, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,614484,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
Run status group 0 (all jobs):
WRITE: bw=3705KiB/s (3794kB/s), 3705KiB/s-3705KiB/s (3794kB/s-3794kB/s), io=2400MiB (2517MB), run=663425-663425msec
I didn't manage to produce any errors in the log files...
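One further test I'm considering, to exercise the exact device node Xen uses
rather than going through the filesystem: a raw sequential write to the zvol
with dd. Just a sketch, and obviously destructive to whatever is on the zvol,
which is fine here since nereid auto-installs anyway:
dd if=/dev/zero of=/dev/zvol/zroot/nereid0 bs=1m count=4096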
Just to be on the safe side, I have changed the dom0 memory to 4GB and limited
the ZFS ARC to 1GB thus:
xen_cmdline="dom0_mem=4092M dom0_max_vcpus=2 dom0=pvh console=com1,vga com1=115200,8n1 guest_loglvl=all loglvl=all"
vfs.zfs.arc_max="1024M"
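(For reference, the quick checks I use after a reboot to confirm the limits have
taken; just a sketch:)
sysctl vfs.zfs.arc_max
xl info | grep -E 'total_memory|free_memory'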
I've now re-created one of my domUs and have not experienced any issues at all
this time. Of course, I now don't know whether it was the limiting of the ZFS ARC,
the increase in memory, or both together that fixed it.
I will attempt further tests and update the list....
Thanks again.
Eric
--
____
/ . Eric A. Bautsch
/-- __ ___ ______________________________________
/ / / / /
(_____/____(___(__________________/ email: eric.bautsch at pobox.com