Re: bhyve disk performance issue

From: Matthew Grooms <mgrooms_at_shrew.net>
Date: Fri, 23 Feb 2024 05:51:06 UTC
On 2/18/24 09:21, Matthew Grooms wrote:
> On 2/17/24 15:53, Matthew Grooms wrote:
>>
>> Unfortunately same story with 13.2. I'm going to try and downgrade to 
>> 12.4 and see if it gets any better.
>>
>> ================================================================================
>> Begin @ Sat Feb 17 11:00:01 CST 2024
>>
>> Version  2.00       ------Sequential Output------ --Sequential Input- 
>> --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
>> --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>> localhost.lo 63640M  690k  99  1.5g  97  727m  78  950k  99 1.3g  68 
>> +++++ +++
>> Latency             11759us   29114us    8098us    8649us 23413us    
>> 4540us
>> Version  2.00       ------Sequential Create------ --------Random 
>> Create--------
>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- 
>> -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
>> +++++ +++
>> Latency              7791us     131us    1671us     464us 15us    1811us
>>
>> End @ Sat Feb 17 11:03:13 CST 2024
>> ================================================================================
>> Begin @ Sat Feb 17 11:10:01 CST 2024
>>
>> Version  2.00       ------Sequential Output------ --Sequential Input- 
>> --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
>> --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>> localhost.lo 63640M  667k  99  449m  99  313m  94  940k  99 398m  99 
>> 16204 563
>> Latency             12147us    1079us   24470us    8795us 17853us    
>> 4384us
>> Version  2.00       ------Sequential Create------ --------Random 
>> Create--------
>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- 
>> -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>>                  16     0  93 +++++ +++ +++++ +++     0  96 +++++ +++ 
>> +++++ +++
>> Latency               118us     159us     269us     164us 54us     844us
>>
>> End @ Sat Feb 17 11:18:43 CST 2024
>>
> I wasn't able to get a working 12.4 system in place due to lack of 
> packages. However, I did fire up a FreeBSD 14 VM and let it run 
> overnight on the same SSD array. It consistently ran at a much higher 
> speed for 50+ runs @ 10m intervals ...
>
> ================================================================================
> Begin @ Sun Feb 18 15:00:00 UTC 2024
>
> Version  1.98       ------Sequential Output------ --Sequential Input- 
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
> /sec %CP
> freebsd.shrew.l 64G  628k  99  1.6g  98  831m  60 1278k  99 1.1g  42 
> +++++ +++
> Latency             13447us   68490us     207ms    7187us 195ms   17665us
> Version  1.98       ------Sequential Create------ --------Random 
> Create--------
> freebsd.shrew.lab   -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
> /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
> +++++ +++
> Latency             18225us      18us      28us   18812us 18us      25us
>
> End @ Sun Feb 18 15:03:11 UTC 2024
>
> I used identical options to run both that VM and the Linux VM I've 
> been testing. The backing store for each VM has a 1TB partition and 
> the guest interfaces are NVME. Now I'm really scratching my head.
>
> Chuck, were you testing disk performance in Linux VMs or only FreeBSD?
>
> Anyone have ideas on why Linux disk performance would drop off a cliff 
> over time?
>
I took a detour trying out Xen but apparently that has it's own set of 
performance issues related to the FreeBSD port missing newer features. I 
did install KVM on the same hardware for comparison. I then tested a 
guest provisioned identically to the bhyve VM with a virtio blk device 
which ran for 2.5 hours with consistently impressive output ...

================================================================================
Begin @ Thu Feb 22 20:55:01 CST 2024

Version  2.00       ------Sequential Output------ --Sequential Input- 
--Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
linux2.shrew.pr 31G 2191k  99  1.4g  55  1.1g  59 3484k  99  2.5g 83  
7817 127
Latency              4480us    2528us   17656us    2650us 102us    3568us
Version  2.00       ------Sequential Create------ --------Random 
Create--------
linux2.shrew.prv    -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
+++++ +++
Latency              9722us      90us     123us      61us 20us     998us

End @ Thu Feb 22 20:56:11 CST 2024
================================================================================


For comparison, here is the output from a recent run of a Linux VM on 
bhyve using the virtio blk device. Note the performance drop off between 
the first and second run ...


================================================================================
Begin @ Thu Feb 22 22:00:02 CST 2024

Version  2.00       ------Sequential Output------ --Sequential Input- 
--Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
linux.shrew. 63640M  694k  99  1.5g  97  772m  70  985k  99  1.4g 75  
2302 115
Latency             11557us   28959us   27540us    8308us 25227us    8605us
Version  2.00       ------Sequential Create------ --------Random 
Create--------
linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
+++++ +++
Latency              4058us     125us    1651us      96us 23us     627us

End @ Thu Feb 22 22:03:07 CST 2024
================================================================================
Begin @ Thu Feb 22 22:10:02 CST 2024

Version  2.00       ------Sequential Output------ --Sequential Input- 
--Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
linux.shrew. 63640M  676k  99  406m  99  300m  92  952k  99  373m 99  
2145 158
Latency             11871us     154us   20673us    9926us 22765us   14034us
Version  2.00       ------Sequential Create------ --------Random 
Create--------
linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16     0  95 +++++ +++     0  79     0  96 +++++ 
+++     0  76
Latency              4426us     125us    1687us     576us 72us     654us

End @ Thu Feb 22 22:19:19 CST 2024
================================================================================

It certainly feels like a deficiency in bhyve that isn't tied to any 
particular storage device model. I did notice a pattern in top that I 
thought was peculiar. When watching the bhyve threads while the 
benchmark test is running, I see several CPU threads running at the top 
of the list followed by what I assume to be block device threads. When 
the disk performance is high, it looks like this ...

   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  5628 root         68    0    32G    31G CPU24   24   0:53  88.90% 
bhyve{vcpu 5}
  5628 root         36    0    32G    31G vmidle   7   0:18  17.86% 
bhyve{vcpu 15}
  5628 root         34    0    32G    31G vmidle  26   1:06  16.76% 
bhyve{vcpu 8}
  5628 root         21    0    32G    31G uwait   37   0:05   2.69% 
bhyve{blk-4:0-2}
  5628 root         21    0    32G    31G uwait   60   0:04   2.64% 
bhyve{blk-4:0-0}
  5628 root         21    0    32G    31G uwait   52   0:06   2.62% 
bhyve{blk-4:0-1}
  5628 root         21    0    32G    31G uwait   14   0:05   2.58% 
bhyve{blk-4:0-6}
  5628 root         21    0    32G    31G RUN     50   0:05   2.51% 
bhyve{blk-4:0-4}
  5628 root         20    0    32G    31G uwait   38   0:05   2.51% 
bhyve{blk-4:0-5}
  5628 root         21    0    32G    31G uwait   56   0:05   2.46% 
bhyve{blk-4:0-3}
  5628 root         20    0    32G    31G uwait   22   0:06   2.45% 
bhyve{blk-4:0-7}

When disk performance drops off, I see that one of the bhyve CPU threads 
shows it's PRI value climb quickly until it hits around 135. At that 
point, the one CPU thread basically is pegged at 100% WCPU until the 
test is done. Other bhyve threads are much less busy ...

   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  5518 root        135    0    32G    31G CPU59   59   6:49  99.97% 
bhyve{vcpu 2}
  5518 root         26    0    32G    31G vmidle  36   5:40   8.80% 
bhyve{vcpu 15}
  5518 root         23    0    32G    31G vmidle  18   0:41   4.74% 
bhyve{vcpu 13}
  5518 root         20    0    32G    31G vmidle  37   0:09   0.85% 
bhyve{vcpu 10}
  5518 root         20    0    32G    31G uwait    6   0:20   0.72% 
bhyve{blk-4:0-2}
  5518 root         20    0    32G    31G uwait    8   0:20   0.71% 
bhyve{blk-4:0-7}
  5518 root         20    0    32G    31G uwait   43   0:22   0.70% 
bhyve{blk-4:0-0}
  5518 root         20    0    32G    31G uwait   12   0:20   0.70% 
bhyve{blk-4:0-5}
  5518 root         20    0    32G    31G uwait   43   0:19   0.68% 
bhyve{blk-4:0-3}
  5518 root         20    0    32G    31G uwait   36   0:21   0.68% 
bhyve{blk-4:0-6}
  5518 root         20    0    32G    31G uwait   46   0:20   0.68% 
bhyve{blk-4:0-4}
  5518 root         20    0    32G    31G uwait   32   0:20   0.64% 
bhyve{blk-4:0-1}

This pattern in top repeats each time the benchmark is run unless the 
guest is rebooted.

Does bhyve call pthread_attr_setschedparam at run time under certain 
circumstances or is that the scheduler doing it's thing? Anyone have any 
ideas? I'm pretty much out of them :/

Thanks in advance,

-Matthew