Re: bhyve disk performance issue

From: Jason Tubnor <jason_at_tubnor.net>
Date: Fri, 23 Feb 2024 06:25:39 UTC
On 23/02/2024 4:51 pm, Matthew Grooms wrote:
> On 2/18/24 09:21, Matthew Grooms wrote:
>> On 2/17/24 15:53, Matthew Grooms wrote:
>>>
>>> Unfortunately same story with 13.2. I'm going to try and downgrade 
>>> to 12.4 and see if it gets any better.
>>>
>>> ================================================================================ 
>>>
>>> Begin @ Sat Feb 17 11:00:01 CST 2024
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential 
>>> Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- 
>>> --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec 
>>> %CP  /sec %CP
>>> localhost.lo 63640M  690k  99  1.5g  97  727m  78  950k  99 1.3g  68 
>>> +++++ +++
>>> Latency             11759us   29114us    8098us    8649us 23413us    
>>> 4540us
>>> Version  2.00       ------Sequential Create------ --------Random 
>>> Create--------
>>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- 
>>> --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec 
>>> %CP  /sec %CP
>>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ 
>>> +++ +++++ +++
>>> Latency              7791us     131us    1671us     464us 15us    
>>> 1811us
>>>
>>> End @ Sat Feb 17 11:03:13 CST 2024
>>> ================================================================================ 
>>>
>>> Begin @ Sat Feb 17 11:10:01 CST 2024
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential 
>>> Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- 
>>> --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec 
>>> %CP  /sec %CP
>>> localhost.lo 63640M  667k  99  449m  99  313m  94  940k  99 398m  99 
>>> 16204 563
>>> Latency             12147us    1079us   24470us    8795us 17853us    
>>> 4384us
>>> Version  2.00       ------Sequential Create------ --------Random 
>>> Create--------
>>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- 
>>> --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec 
>>> %CP  /sec %CP
>>>                  16     0  93 +++++ +++ +++++ +++     0  96 +++++ 
>>> +++ +++++ +++
>>> Latency               118us     159us     269us     164us 54us     
>>> 844us
>>>
>>> End @ Sat Feb 17 11:18:43 CST 2024
>>>
>> I wasn't able to get a working 12.4 system in place due to lack of 
>> packages. However, I did fire up a FreeBSD 14 VM and let it run 
>> overnight on the same SSD array. It consistently ran at a much higher 
>> speed for 50+ runs @ 10m intervals ...
>>
>> ================================================================================ 
>>
>> Begin @ Sun Feb 18 15:00:00 UTC 2024
>>
>> Version  1.98       ------Sequential Output------ --Sequential Input- 
>> --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
>> --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>> freebsd.shrew.l 64G  628k  99  1.6g  98  831m  60 1278k  99 1.1g  42 
>> +++++ +++
>> Latency             13447us   68490us     207ms    7187us 195ms   
>> 17665us
>> Version  1.98       ------Sequential Create------ --------Random 
>> Create--------
>> freebsd.shrew.lab   -Create-- --Read--- -Delete-- -Create-- --Read--- 
>> -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP  
>> /sec %CP
>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
>> +++++ +++
>> Latency             18225us      18us      28us   18812us 18us      25us
>>
>> End @ Sun Feb 18 15:03:11 UTC 2024
>>
>> I used identical options to run both that VM and the Linux VM I've 
>> been testing. The backing store for each VM has a 1TB partition and 
>> the guest interfaces are NVME. Now I'm really scratching my head.
>>
>> Chuck, were you testing disk performance in Linux VMs or only FreeBSD?
>>
>> Anyone have ideas on why Linux disk performance would drop off a 
>> cliff over time?
>>
> I took a detour trying out Xen but apparently that has it's own set of 
> performance issues related to the FreeBSD port missing newer features. 
> I did install KVM on the same hardware for comparison. I then tested a 
> guest provisioned identically to the bhyve VM with a virtio blk device 
> which ran for 2.5 hours with consistently impressive output ...
>
> ================================================================================ 
>
> Begin @ Thu Feb 22 20:55:01 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- 
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
> linux2.shrew.pr 31G 2191k  99  1.4g  55  1.1g  59 3484k  99  2.5g 83  
> 7817 127
> Latency              4480us    2528us   17656us    2650us 102us 3568us
> Version  2.00       ------Sequential Create------ --------Random 
> Create--------
> linux2.shrew.prv    -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
> +++++ +++
> Latency              9722us      90us     123us      61us 20us 998us
>
> End @ Thu Feb 22 20:56:11 CST 2024
> ================================================================================ 
>
>
>
> For comparison, here is the output from a recent run of a Linux VM on 
> bhyve using the virtio blk device. Note the performance drop off 
> between the first and second run ...
>
>
> ================================================================================ 
>
> Begin @ Thu Feb 22 22:00:02 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- 
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
> linux.shrew. 63640M  694k  99  1.5g  97  772m  70  985k  99  1.4g 75  
> 2302 115
> Latency             11557us   28959us   27540us    8308us 25227us    
> 8605us
> Version  2.00       ------Sequential Create------ --------Random 
> Create--------
> linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
> +++++ +++
> Latency              4058us     125us    1651us      96us 23us 627us
>
> End @ Thu Feb 22 22:03:07 CST 2024
> ================================================================================ 
>
> Begin @ Thu Feb 22 22:10:02 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- 
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
> linux.shrew. 63640M  676k  99  406m  99  300m  92  952k  99  373m 99  
> 2145 158
> Latency             11871us     154us   20673us    9926us 22765us   
> 14034us
> Version  2.00       ------Sequential Create------ --------Random 
> Create--------
> linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
> /sec %CP
>                  16     0  95 +++++ +++     0  79     0  96 +++++ 
> +++     0  76
> Latency              4426us     125us    1687us     576us 72us 654us
>
> End @ Thu Feb 22 22:19:19 CST 2024
> ================================================================================ 
>
>
> It certainly feels like a deficiency in bhyve that isn't tied to any 
> particular storage device model. I did notice a pattern in top that I 
> thought was peculiar. When watching the bhyve threads while the 
> benchmark test is running, I see several CPU threads running at the 
> top of the list followed by what I assume to be block device threads. 
> When the disk performance is high, it looks like this ...
>
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU 
> COMMAND
>  5628 root         68    0    32G    31G CPU24   24   0:53  88.90% 
> bhyve{vcpu 5}
>  5628 root         36    0    32G    31G vmidle   7   0:18  17.86% 
> bhyve{vcpu 15}
>  5628 root         34    0    32G    31G vmidle  26   1:06  16.76% 
> bhyve{vcpu 8}
>  5628 root         21    0    32G    31G uwait   37   0:05   2.69% 
> bhyve{blk-4:0-2}
>  5628 root         21    0    32G    31G uwait   60   0:04   2.64% 
> bhyve{blk-4:0-0}
>  5628 root         21    0    32G    31G uwait   52   0:06   2.62% 
> bhyve{blk-4:0-1}
>  5628 root         21    0    32G    31G uwait   14   0:05   2.58% 
> bhyve{blk-4:0-6}
>  5628 root         21    0    32G    31G RUN     50   0:05   2.51% 
> bhyve{blk-4:0-4}
>  5628 root         20    0    32G    31G uwait   38   0:05   2.51% 
> bhyve{blk-4:0-5}
>  5628 root         21    0    32G    31G uwait   56   0:05   2.46% 
> bhyve{blk-4:0-3}
>  5628 root         20    0    32G    31G uwait   22   0:06   2.45% 
> bhyve{blk-4:0-7}
>
> When disk performance drops off, I see that one of the bhyve CPU 
> threads shows it's PRI value climb quickly until it hits around 135. 
> At that point, the one CPU thread basically is pegged at 100% WCPU 
> until the test is done. Other bhyve threads are much less busy ...
>
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU 
> COMMAND
>  5518 root        135    0    32G    31G CPU59   59   6:49  99.97% 
> bhyve{vcpu 2}
>  5518 root         26    0    32G    31G vmidle  36   5:40   8.80% 
> bhyve{vcpu 15}
>  5518 root         23    0    32G    31G vmidle  18   0:41   4.74% 
> bhyve{vcpu 13}
>  5518 root         20    0    32G    31G vmidle  37   0:09   0.85% 
> bhyve{vcpu 10}
>  5518 root         20    0    32G    31G uwait    6   0:20   0.72% 
> bhyve{blk-4:0-2}
>  5518 root         20    0    32G    31G uwait    8   0:20   0.71% 
> bhyve{blk-4:0-7}
>  5518 root         20    0    32G    31G uwait   43   0:22   0.70% 
> bhyve{blk-4:0-0}
>  5518 root         20    0    32G    31G uwait   12   0:20   0.70% 
> bhyve{blk-4:0-5}
>  5518 root         20    0    32G    31G uwait   43   0:19   0.68% 
> bhyve{blk-4:0-3}
>  5518 root         20    0    32G    31G uwait   36   0:21   0.68% 
> bhyve{blk-4:0-6}
>  5518 root         20    0    32G    31G uwait   46   0:20   0.68% 
> bhyve{blk-4:0-4}
>  5518 root         20    0    32G    31G uwait   32   0:20   0.64% 
> bhyve{blk-4:0-1}
>
> This pattern in top repeats each time the benchmark is run unless the 
> guest is rebooted.
>
> Does bhyve call pthread_attr_setschedparam at run time under certain 
> circumstances or is that the scheduler doing it's thing? Anyone have 
> any ideas? I'm pretty much out of them :/
>
> Thanks in advance,
>
> -Matthew
>
>
Hi Matthew,

Have you tried using UFS2 or even memory backed host storage for testing 
yet? This will rule out if it is a ZFS or bhyve issue.

Cheers,

Jason.