Re: ena(4) tx timeout messages in dmesg

From: Pete Wright <pete_at_nomadlogic.org>
Date: Mon, 12 May 2025 19:29:57 UTC

On 5/12/25 11:56, Colin Percival wrote:
> On 5/12/25 11:25, Pete Wright wrote:
>> On 5/12/25 11:17, Colin Percival wrote:
>>> On 5/12/25 11:04, Pete Wright wrote:
>>>> hey there - i have an EC2 instance that i'm using as an NFS server
>>>> and have noticed the following messages in my dmesg buffer:
>>>> [...]
>>>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1 
>>>> msecs have passed since last cleanup. Missing Tx timeout value 5000 
>>>> msecs.
>>>>
>>> I've heard that this can be caused by a thread being starved for CPU,
>>> possibly due to FreeBSD kernel scheduler issues, but that was on a far
>>> more heavily loaded system.  What instance type are you running on?
>>
>> oh of course, forgot to provide useful info:
>>
>> # uname -ar
>> FreeBSD airflow-nfs.q0.ringdna.net 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
>>
>> Instance type:
>> t3a.xlarge
>>
>> I also verified that I have plenty of "burstable credit" available,
>> since this is a T class system (the current balance is steady at
>> 2,300 credits).
> 
> Ah, this won't necessarily help you -- T family instances are on shared
> hardware so even if you have burstable credits it's possible that you'll
> be unlucky with "noisy neighbours" and the sibling instances will all want
> CPU at the same time as you.  But I think there's probably something else
> going on as well.
> 


oh that's a good point.  since this is a pre-prod system, that is less of
a concern, as we want to limit spend when possible.  i'll be spinning up
production systems in the next week or so on "c" class instances, and
i'll keep an eye out to see whether similar messages show up in that
environment.
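
To make that comparison easier, one option is to tally these messages per
interface and queue.  Below is a minimal sketch, assuming the driver prints
the message on a single line in the format quoted at the top of this thread
(the line breaks above come from mail wrapping).  The function names are
hypothetical helpers, not part of any FreeBSD tool.

```python
import re
import subprocess
from collections import Counter

# Assumed one-line form of the ena(4) message quoted earlier in this thread.
TX_TIMEOUT_RE = re.compile(
    r"(?P<iface>ena\d+): Found a Tx that wasn't completed on time, "
    r"qid (?P<qid>\d+), index (?P<index>\d+)\. "
    r"(?P<elapsed>\d+) msecs have passed since last cleanup\. "
    r"Missing Tx timeout value (?P<timeout>\d+) msecs\."
)


def count_tx_timeouts(lines):
    """Count Tx-timeout messages per (interface, queue id); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = TX_TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("iface"), int(m.group("qid")))] += 1
    return counts


def read_dmesg():
    """Return the kernel message buffer as a list of lines by running `dmesg`."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

On the instance itself this would be used as
`print(count_tx_timeouts(read_dmesg()))`.  Counting per queue rather than
in total shows whether the timeouts cluster on one Tx queue or are spread
across all of them.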

-pete

-- 
Pete Wright
pete@nomadlogic.org