Re: bhyve disk performance issue
- Reply: Vitaliy Gusev : "Re: bhyve disk performance issue"
- In reply to: Vitaliy Gusev : "Re: bhyve disk performance issue"
Date: Wed, 28 Feb 2024 20:03:03 UTC
On 2/28/24 13:31, Vitaliy Gusev wrote:
> Hi, Matthew.
>

Hi Vitaliy,

Thanks for the pointers.

> I still do not know what command line was used for bhyve. I couldn't
> find it through the thread, sorry. And I couldn't find the virtual
> disk size that you used.
>

Sorry about that. I'll try to get you the exact command line invocation
used to launch the guest process once I have test hardware again.

> Could you, please, simplify the bonnie++ output, it is hard to decode
> due to alignment, and use exact numbers for:
>
> READ seq - I see you had 1.6GB/s for the good time and ~500MB/s for
> the worst.
> WRITE seq - ...
>

I summarized the output for you. Here it is again:

Fast: ~ 1.6g/s seq write and 1.3g/s seq read
Slow: ~ 451m/s seq write and 402m/s seq read

> If you have slow results both for the read and write operations, you
> should probably perform testing _only_ for READs and not do anything
> else until READs are fine.
>
> Again, if you have slow performance for an Ext4 filesystem in the
> guest VM placed on the passed disk image, you should try to test on
> the raw disk image, i.e. without Ext4, because it could be related.
>
> If you run the test inside the VM on a filesystem, you may have to
> deal with filesystem bottlenecks, bugs, fragmentation, etc. Do you
> want to fix them all? I don't think so.
>
> For example, if you pass a 40G disk image and create an Ext4
> filesystem, and during testing the filesystem becomes more than 80%
> full, I/O may not perform so well.
>
> You should probably eliminate that guest filesystem behaviour when
> you hit an IO performance slowdown.
>
> Also, please look at TRIM operations when you perform WRITE testing.
> That could also be related to the slow write I/O.
>

The virtual disks were provisioned with either a 128G disk image or a
1TB raw partition, so I don't think space was an issue. Trim is
definitely not an issue. I'm using a tiny fraction of the 32TB array
and have tried both heavily under-provisioned HW RAID10 and SW RAID10
using GEOM. The latter was tested after sending full trim resets to
all drives individually.

I will try to incorporate the rest of your feedback into my next round
of testing. If I can find a benchmark tool that works with a raw block
device, that would be ideal ( see the fio sketch below ).

Thanks,

-Matthew

> ——
> Vitaliy
>
>> On 28 Feb 2024, at 21:29, Matthew Grooms <mgrooms@shrew.net> wrote:
>>
>> On 2/27/24 04:21, Vitaliy Gusev wrote:
>>> Hi,
>>>
>>>> On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>>
>>>>> ...
>>>> The problem occurs when an image file is used on either ZFS or UFS.
>>>> The problem also occurs when the virtual disk is backed by a raw
>>>> disk partition or a ZVOL. This issue isn't related to a specific
>>>> underlying filesystem.
>>>>
>>>
>>> Do I understand right that you ran the tests inside the guest VM on
>>> an ext4 filesystem? If so, you should be aware of the additional
>>> overhead in comparison to when you were running tests on the host.
>>>
>> Hi Vitaliy,
>>
>> I appreciate you providing the feedback and suggestions. I spent over
>> a week trying as many combinations of host and guest options as
>> possible to narrow this issue down to a specific host storage or
>> guest device model option. Unfortunately the problem occurred with
>> every combination I tested while running Linux as the guest. Note, I
>> only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ).
>> The problem did not occur when I ran FreeBSD as the guest. The
>> problem did not occur when I ran KVM on the host and Linux as the
>> guest.
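( Matthew's wish above for "a benchmark tool that works with a raw
block device" lines up with Vitaliy's fio suggestion just below; a
minimal sketch, assuming the test disk appears in the Linux guest as
/dev/vdb; the device name, queue depth, and runtime are placeholders,
not values from this thread:

    # sequential read against the raw device; --direct=1 bypasses the
    # guest page cache entirely
    fio --name=rawread --filename=/dev/vdb --rw=read --bs=1M \
        --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based

    # the write-side equivalent; note this destroys any data on /dev/vdb
    fio --name=rawwrite --filename=/dev/vdb --rw=write --bs=1M \
        --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based
)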
>>
>>> I would suggest running fio (or even dd) on the raw disk device
>>> inside the VM, i.e. without a filesystem at all. Just do not forget
>>> to do “echo 3 > /proc/sys/vm/drop_caches” in the Linux guest VM
>>> before you run tests.
>>
>> The two servers I was using to test with are no longer available.
>> However, I'll have two more identical servers arriving in the next
>> week or so. I'll try to run additional tests and report back here. I
>> used bonnie++ as that was easily installed from the package repos on
>> all the systems I tested.
>>
>>> Could you also give more information about:
>>>
>>> 1. What results did you get (decode bonnie++ output)?
>>
>> If you look back at this email thread, there are many examples of
>> running bonnie++ on the guest. I first ran the tests on the host
>> system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a
>> baseline of performance. Then I ran bonnie++ tests using bhyve as the
>> hypervisor and Linux & FreeBSD as the guest. The combinations of host
>> and guest storage options included ...
>>
>> 1) block device + virtio blk
>> 2) block device + nvme
>> 3) UFS disk image + virtio blk
>> 4) UFS disk image + nvme
>> 5) ZFS disk image + virtio blk
>> 6) ZFS disk image + nvme
>> 7) ZVOL + virtio blk
>> 8) ZVOL + nvme
>>
>> In every instance, I observed the Linux guest disk IO often perform
>> very well for some time after the guest was first booted. Then the
>> performance of the guest would drop to a fraction of the original
>> performance. The benchmark test was run every 5 or 10 minutes in a
>> cron job. Sometimes the guest would perform well for up to an hour
>> before performance would drop off. Most of the time it would only
>> perform well for a few cycles ( 10 - 30 mins ) before performance
>> would drop off. The only way to restore the performance was to reboot
>> the guest. Once I determined that the problem was not specific to a
>> particular host or guest storage option, I switched my testing to
>> only use a block device as backing storage on the host to avoid
>> hitting any system disk caches.
>>
>> Here is the test script I used in the cron job ...
>>
>> #!/bin/sh
>> FNAME='output.txt'
>>
>> # append a separator and a timestamped bonnie++ run to the output file
>> echo ================================================================================ >> $FNAME
>> echo Begin @ `/usr/bin/date` >> $FNAME
>> echo >> $FNAME
>> /usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
>> echo >> $FNAME
>> echo End @ `/usr/bin/date` >> $FNAME
>>
>> As you can see, I'm calling bonnie++ with the system defaults. That
>> uses a data set size that's 2x the guest RAM in an attempt to
>> minimize the effect of filesystem cache on results.
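( For anyone reproducing this, those defaults can also be pinned
explicitly so runs stay comparable across guests; a sketch with
illustrative sizes and paths, not the exact values used in this thread:

    # force a 64 GiB data set against a declared 32 GiB of RAM ( the 2x
    # ratio bonnie++ picks by default ), writing under /mnt/test;
    # -u root is required when the tool is run as root
    bonnie++ -d /mnt/test -s 64g -r 32g -u root
)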
>>
>> Here is an example of the output that bonnie++ produces ...
>>
>> Version 2.00       ------Sequential Output------ --Sequential Input- --Random-
>>                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Name:Size etc      /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> linux-blk  63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
>> Latency           11579us     535us   11889us    8597us   21819us    8238us
>> Version 2.00       ------Sequential Create------ --------Random Create--------
>> linux-blk          -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                16  +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>> Latency            7620us     126us    1648us     151us      15us     633us
>>
>> --------------------------------- speed drop ---------------------------------
>>
>> Version 2.00       ------Sequential Output------ --Sequential Input- --Random-
>>                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Name:Size etc      /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> linux-blk  63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
>> Latency           11902us    8959us   24711us   10185us   20884us    5831us
>> Version 2.00       ------Sequential Create------ --------Random Create--------
>> linux-blk          -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                16      0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
>> Latency             343us     165us    1636us     113us      55us    1836us
>>
>> In the example above, the benchmark test repeated about 20 times with
>> results similar to the performance shown above the dotted line
>> ( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the
>> performance dropped to what's shown below the dotted line, which is
>> less than 1/4 the original speed ( ~ 451m/s seq write and 402m/s seq
>> read ).
>>
>>> 2. What results were you expecting?
>>>
>> What I expect is that, when I perform the same test with the same
>> parameters, the results stay more or less consistent over time. This
>> is true when KVM is used as the hypervisor on the same hardware and
>> guest options. That said, I'm not worried about bhyve being
>> consistently slower than KVM or a FreeBSD guest being consistently
>> slower than a Linux guest. I'm concerned that the performance drop
>> over time is indicative of an issue with how bhyve interacts with
>> non-FreeBSD guests.
>>
>>> 3. VM configuration, virtio-blk disk size, etc.
>>> 4. Full command for tests (including size of test-set), bhyve, etc.
>>
>> I believe this was answered above. Please let me know if you have
>> additional questions.
>>
>>> 5. Did you pass virtio-blk as 512 or 4K? If 512, you should
>>> probably try 4K.
>>>
>> The testing performed was not exclusively with virtio-blk.
>>
>>> 6. Linux has several read-ahead options for the IO scheduler, and
>>> they could be related too.
>>>
>> I suppose it's possible that bhyve could be somehow causing the disk
>> scheduler in the Linux guest to act differently. I'll see if I can
>> figure out how to disable that in future tests ( a sketch of the
>> relevant guest knobs follows below ).
>>
>>> Additionally, could you also play with the “sync=disabled”
>>> volume/zvol option? Of course it is only for write testing.
>>
>> The testing performed was not exclusively with zvols.
>>
>> Once I have more hardware available, I'll try to report back with
>> more testing. It may be interesting to also see how a Windows guest
>> performs compared to Linux & FreeBSD.
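( As referenced above, the guest-side scheduler and read-ahead knobs
live in sysfs and blockdev(8); a sketch, assuming the virtio disk shows
up in the Linux guest as vda, which is a placeholder:

    # list the available IO schedulers; the active one appears in brackets
    cat /sys/block/vda/queue/scheduler
    # switch to the no-op scheduler to take the elevator out of the picture
    echo none > /sys/block/vda/queue/scheduler
    # inspect and clear read-ahead ( the value is in 512-byte sectors )
    blockdev --getra /dev/vda
    blockdev --setra 0 /dev/vda
)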
>> I suspect that this issue may only be triggered when a fast disk
>> array is in use on the host. My tests use a 16x SSD RAID 10 array.
>> It's also quite possible that the disk IO slowdown is only a symptom
>> of another issue that's triggered by the disk IO test ( please see
>> the end of my last post related to scheduler priority observations ).
>> All I can say for sure is that ...
>>
>> 1) There is a problem and it's reproducible across multiple hosts
>> 2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
>> 3) It is not specific to any host or guest storage option
>>
>> Thanks,
>>
>> -Matthew
>>
>
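( For reference, the 4K sector size and sync=disabled suggestions from
the thread map onto host-side knobs roughly like these; a sketch only,
with placeholder slot numbers, device paths, and dataset names:

    # present the backing device to the guest with 4K logical sectors
    bhyve ... -s 3,virtio-blk,/dev/da0p2,sectorsize=4096 ...

    # disable synchronous writes on a backing zvol ( write testing only )
    zfs set sync=disabled tank/vmdisk
)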