From: Rick Macklem
To: Adam Stylinski, "freebsd-fs@freebsd.org"
Subject: Re: Issues with NFS RPC
Date: Tue, 6 Jul 2021 14:41:32 +0000
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Hope you don't mind a top post.

Although you provided a lot of information, I didn't see any mention of which version of FreeBSD you are using.

If it is FreeBSD 13.0, then the problem is well known and fixed in stable/13. (There is also a less common problem that is in older versions of FreeBSD as well.)
See PR#256280 on bugs.freebsd.org (Comment #2).

If your problem does not appear to be either of these (determined mostly by the output line of "netstat -a" for that client's connection at the time the client is hung), please post again or comment on the bug report.

rick

________________________________________
From: owner-freebsd-fs@freebsd.org on behalf of Adam Stylinski
Sent: Tuesday, July 6, 2021 9:48 AM
To: freebsd-fs@freebsd.org
Subject: Issues with NFS RPC

Hello,

So this may be something somewhat specific to my configuration, but it's starting to smell like a bug somewhere in NFS's RPC handling (either the Linux client or the FreeBSD rpcbind).

I have two machines, connected via a 40gbps direct attached link, with static IPs. They are leveraging jumbo frames (9000 byte MTU). The storage is backed by a healthy zpool. I can reliably reproduce this issue, but it takes a long time (I captured 40GB worth of packets before giving up, and only then did the issue finally reappear).
It seems that after a long enough time frame over an NFSv3 export, VirtualBox hangs my VM that has disks backed over that share. The rsize and wsize are 128k to match the maximum stripe size of the pool, and I'm just using plain old sec=sys, no Kerberos involved. The error I get from rpcdebug on the Linux client looks as follows:
https://pastebin.com/rCv2ZTri

Error 110, which I looked up, is a generic timeout. During this time, when the server seems to be going deaf to these xids, I can still ping the server over the interface the connection uses. Traffic flows fine, and the NICs are basically unutilized. There are no visible errors on any of the interfaces. The NICs are ConnectX-3s, running in en (Ethernet) mode.

I tried switching to NFSv4 and eventually had the same problem, but with the added bonus that it never seems to successfully retransmit and hangs in perpetuity (NFSv3 eventually recovers, after the likely 600 second timeout).

These seem to be fairly reliable NICs, and I don't see anything on the server or client to indicate that it's a network hardware issue. Is there anything I can do to diagnose this on the FreeBSD server end? The Linux kernel's rpcdebug facilities seem to mostly just give a bunch of noise. I did manage to run Wireshark on the client during this stall period, and I noticed some TCP packets that were classified as duplicate ACKs when the NFS traffic finally turned over again.
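
For anyone following the thread: the reply asks for the "netstat -a" output line for the hung client's connection, taken on the server while the client is hung. A minimal sketch of how that line could be picked out of saved netstat output is below. It is only an illustration, not part of either message. The function name, the sample addresses, and the sample output are all hypothetical, and it assumes the usual FreeBSD layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state) with the port joined to the address by a dot, and the client shown as an IP address (as with "netstat -an") rather than a resolved hostname.

```python
def nfs_connection_lines(netstat_output, client_ip, nfs_ports=("2049", "nfsd")):
    """Return TCP entries whose local port is NFS and whose peer is client_ip."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, Recv-Q, Send-Q, local addr, foreign addr, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign = fields[3], fields[4]
        # The port (or service name) is the part after the last dot.
        local_port = local.rsplit(".", 1)[-1]
        foreign_host = foreign.rsplit(".", 1)[0]
        if local_port in nfs_ports and foreign_host == client_ip:
            matches.append({
                "recv_q": int(fields[1]),
                "send_q": int(fields[2]),
                "state": fields[5],
                "raw": line,
            })
    return matches


# Hypothetical sample output; real addresses, ports, and states will differ.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.0.0.1.nfsd          10.0.0.2.793           ESTABLISHED
tcp4       0      0 10.0.0.1.ssh           10.0.0.2.51514         ESTABLISHED
tcp4       0      0 *.nfsd                 *.*                    LISTEN
"""

if __name__ == "__main__":
    for entry in nfs_connection_lines(SAMPLE, "10.0.0.2"):
        print(entry["state"], entry["raw"])
```

The state column and the two queue counters of the matching line are what would be worth including when posting again or commenting on the bug report.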