From nobody Mon Jul 28 16:19:49 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4brNwk0LsXz63FR5; Mon, 28 Jul 2025 16:19:50 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4brNwj5nGqz3Ymy; Mon, 28 Jul 2025 16:19:49 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1753719589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=D/SE4fImxn58L1N944U4VcVZIwbGtJX/V+trbBRsSuU=; b=Tv5hWJbNEudPyOWaLTM+MEgtk5tHjpLWEA5ik+fuDjOG2BEaK7QLT1f2IbVpybjygLRsCX pzJZU9ozX0Y3ibvyWbqfv2twJy/jZL/symXbkmV3yzGE2imqtyXJ87unymCy1cCuZv05Q7 yk9YPRhZHVNRyqVj3p6eZSEIGgEN4TyaBxokpxP0UJBSh0iVWEw2rPxUBLc+7Ookh4q5Hh RrAm5V8VUSQ9uYFfm2XrbG7J5W0E9v4kgf10A+TXVW/GJMf5R3yZ/bkXsXAW7g4wAhOB1i tH+b9n9Lqx8T3AUVSiuXOs4tYbZup/Br2/PbT1rg4Bs59397pJZPotHZ1R+D/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1753719589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=D/SE4fImxn58L1N944U4VcVZIwbGtJX/V+trbBRsSuU=; b=iU+VYN67YZDqErXXrnveS6sImbQYBWDaf2TWGUKBCCl9RakQc1N9nmrZurZqwfVhF8BvG5 KpMxzOdn+uoF5InynS9SUPz8yotGNT1IfFA/dEOxIiaL+YRXSPCVSPgMpObRMAHnU1xj4I biJENFr6VkCDMtD3ed2SPJLKH1LSYwaZnIz+UzzhMCFKgbUviD3F9lQYHMbdj9kIwr5T38 rMu5d/E8Ohc91BBdPhp3wJ+UNIIOO2UbpVMfTPD5mZfuQdzXQ3UCh532QQLNtBJ5fRSVQO 5S5G5vfb0DGfKpWZrb4VfhL2PPL2Y6Rf3X865lOMpc2dg/h09UEwHFZo2Oi96g== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1753719589; a=rsa-sha256; cv=none; b=cZdpRCaIcaZlPyJPIUzm1uXug3CGMhqdn52o0meUA+6OkYXoazFmEOBr+1cUsyBxl2gzKZ ZCW8XNc+CSHydC5JL0DzD6kJ35hKipVINQpjJWkvCYqVcjCfXScoS3f7ahz2YQ6PQlEb0m KfjO93uMSDtuuV2fnJCTWMg9BdLqCyeGG4R9P93CV+FdzMLyW9xM9Kks1ZV7T3sb+8CteE Olxn9gLdvHOxhMdkdOV3TwXXmQDmDeFlcwXKa5gToryK22YVxS8GfV1OxiyZyT7KgbABYN wTi6i8MJzL2Jgv75WQPlxMkUMyMRVckH5f2KRPzWOH+3lPJsdjHfJizs/hxgLg== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4brNwj4xkXzsFy; Mon, 28 Jul 2025 16:19:49 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 56SGJni7091697; Mon, 28 Jul 2025 16:19:49 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 56SGJnXb091694; Mon, 28 Jul 2025 16:19:49 GMT (envelope-from git) Date: Mon, 28 Jul 2025 16:19:49 GMT Message-Id: <202507281619.56SGJnXb091694@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Mark Johnston Subject: git: 3b263fa76c08 - main - if_tuntap: Try to fix device refcount bugs List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: markj X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 3b263fa76c08993f37ce94cb7cd4605f67dd5965 Auto-Submitted: auto-generated The branch main has been updated by markj: URL: https://cgit.FreeBSD.org/src/commit/?id=3b263fa76c08993f37ce94cb7cd4605f67dd5965 commit 3b263fa76c08993f37ce94cb7cd4605f67dd5965 Author: Mark Johnston AuthorDate: 2025-07-28 16:03:20 +0000 Commit: Mark Johnston CommitDate: 2025-07-28 16:19:38 +0000 if_tuntap: Try to fix device refcount bugs There are two ways to create a tun device, via devfs cloning or with ifconfig. The latter is implemented by tun_clone_create() and the former by tunclone(), which invokes tun_clone_create() via if_clone_create(). Both of these functions were invoking dev_ref() after creating the devfs_node(), but this was extraneous. tunclone() does need to acquire an extra reference since this is required by the dev_clone EVENTHANDLER interface contract, but it was already doing so by specifying MAKEDEV_REF. Fix this by removing unnecessary refcount acquisitions. A second problem is with teardown in a VNET jail. A tun interface created by device cloning will hold a credential reference for the jail, which prevents it from being destroyed and in particular prevents VNET SYSUNINITs from running. To fix this, we need to register a PR_METHOD_REMOVE callback for jail teardown which, in a VNET jail, destroys cloned interfaces. This way, jail teardown can proceed. These bugs are noticeable with something like # jail -c name=test vnet command=ls /dev/tun # jls -vd While here, add some comments and be sure to destroy a nascent mutex and condition variable in an error path. Reviewed by: kib MFC after: 1 month Sponsored by: Stormshield Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D51525 --- sys/net/if_tuntap.c | 76 ++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 58 insertions(+), 18 deletions(-) diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c index 3bab04aa4d38..5e6f65c04b2f 100644 --- a/sys/net/if_tuntap.c +++ b/sys/net/if_tuntap.c @@ -74,6 +74,7 @@ #include #include #include +#include #include #include @@ -178,6 +179,7 @@ struct tuntap_softc { static struct mtx tunmtx; static eventhandler_tag arrival_tag; static eventhandler_tag clone_tag; +static int tuntap_osd_jail_slot; static const char tunname[] = "tun"; static const char tapname[] = "tap"; static const char vmnetname[] = "vmnet"; @@ -497,6 +499,10 @@ vmnet_clone_match(struct if_clone *ifc, const char *name) return (0); } +/* + * Create a clone via the ifnet cloning mechanism. Note that this is invoked + * indirectly by tunclone() below. + */ static int tun_clone_create(struct if_clone *ifc, char *name, size_t len, struct ifc_data *ifd, struct ifnet **ifpp) @@ -532,15 +538,19 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, if (i != 0) i = tun_create_device(drv, unit, NULL, &dev, name); if (i == 0) { - dev_ref(dev); + struct tuntap_softc *tp; + tuncreate(dev); - struct tuntap_softc *tp = dev->si_drv1; + tp = dev->si_drv1; *ifpp = tp->tun_ifp; } return (i); } +/* + * Create a clone via devfs access. + */ static void tunclone(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **dev) @@ -595,11 +605,12 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, } i = tun_create_device(drv, u, cred, dev, name); - } - if (i == 0) { + } else { + /* Consumed by the dev_clone invoker. */ dev_ref(*dev); - if_clone_create(name, namelen, NULL); } + if (i == 0) + if_clone_create(name, namelen, NULL); out: CURVNET_RESTORE(); } @@ -669,16 +680,6 @@ vnet_tun_init(const void *unused __unused) VNET_SYSINIT(vnet_tun_init, SI_SUB_PROTO_IF, SI_ORDER_ANY, vnet_tun_init, NULL); -static void -vnet_tun_uninit(const void *unused __unused) -{ - - for (u_int i = 0; i < NDRV; ++i) - if_clone_detach(V_tuntap_driver_cloners[i]); -} -VNET_SYSUNINIT(vnet_tun_uninit, SI_SUB_PROTO_IF, SI_ORDER_ANY, - vnet_tun_uninit, NULL); - static void tun_uninit(const void *unused __unused) { @@ -689,6 +690,16 @@ tun_uninit(const void *unused __unused) EVENTHANDLER_DEREGISTER(ifnet_arrival_event, arrival_tag); EVENTHANDLER_DEREGISTER(dev_clone, clone_tag); + CURVNET_SET(vnet0); + for (u_int i = 0; i < NDRV; i++) { + if_clone_detach(V_tuntap_driver_cloners[i]); + V_tuntap_driver_cloners[i] = NULL; + } + CURVNET_RESTORE(); + + if (tuntap_osd_jail_slot != 0) + osd_jail_deregister(tuntap_osd_jail_slot); + mtx_lock(&tunmtx); while ((tp = TAILQ_FIRST(&tunhead)) != NULL) { TAILQ_REMOVE(&tunhead, tp, tun_list); @@ -724,6 +735,30 @@ tuntap_driver_from_ifnet(const struct ifnet *ifp) return (NULL); } +/* + * Remove devices that were created by devfs cloning, as they hold references + * which prevent the prison from collapsing, in which state VNET sysuninits will + * not be invoked. + */ +static int +tuntap_prison_remove(void *obj, void *data __unused) +{ +#ifdef VIMAGE + struct prison *pr; + + pr = obj; + if (prison_owns_vnet(pr)) { + CURVNET_SET(pr->pr_vnet); + for (u_int i = 0; i < NDRV; i++) { + if_clone_detach(V_tuntap_driver_cloners[i]); + V_tuntap_driver_cloners[i] = NULL; + } + CURVNET_RESTORE(); + } +#endif + return (0); +} + static int tuntapmodevent(module_t mod, int type, void *data) { @@ -738,8 +773,12 @@ tuntapmodevent(module_t mod, int type, void *data) clone_setup(&drv->clones); drv->unrhdr = new_unrhdr(0, IF_MAXUNIT, &tunmtx); } + osd_method_t methods[PR_MAXMETHOD] = { + [PR_METHOD_REMOVE] = tuntap_prison_remove, + }; + tuntap_osd_jail_slot = osd_jail_register(NULL, methods); arrival_tag = EVENTHANDLER_REGISTER(ifnet_arrival_event, - tunrename, 0, 1000); + tunrename, 0, 1000); if (arrival_tag == NULL) return (ENOMEM); clone_tag = EVENTHANDLER_REGISTER(dev_clone, tunclone, 0, 1000); @@ -747,7 +786,7 @@ tuntapmodevent(module_t mod, int type, void *data) return (ENOMEM); break; case MOD_UNLOAD: - /* See tun_uninit, so it's done after the vnet_sysuninit() */ + /* See tun_uninit(). */ break; default: return EOPNOTSUPP; @@ -798,6 +837,8 @@ tun_create_device(struct tuntap_driver *drv, int unit, struct ucred *cr, args.mda_si_drv1 = tp; error = make_dev_s(&args, dev, "%s", name); if (error != 0) { + mtx_destroy(&tp->tun_mtx); + cv_destroy(&tp->tun_cv); free(tp, M_TUN); return (error); } @@ -914,7 +955,6 @@ tap_transmit(struct ifnet *ifp, struct mbuf *m) return (error); } -/* XXX: should return an error code so it can fail. */ static void tuncreate(struct cdev *dev) {