Date: Fri, 20 May 2022 13:35:56 -0400
From: Mark Johnston
To: Tomoaki AOKI
Cc: Brooks Davis, Allan Jude, freebsd-hackers@freebsd.org
Subject: Re: zfs support in makefs

On Fri, May 20, 2022 at 09:37:01PM +0900, Tomoaki AOKI wrote:
> On Thu, 19 May 2022 18:25:32 +0000
> Brooks Davis wrote:
>
> > On Thu, May 19, 2022 at 01:36:25PM -0400, Allan Jude wrote:
> > > On 5/18/2022 7:04 PM, Brooks Davis wrote:
> > > > On Wed, May 18, 2022 at 03:03:17PM -0400, Mark Johnston wrote:
> > > >> Hi,
> > > >>
> > > >> For the past little while I've been working on ZFS support in makefs(8).
> > > >> At this point I'm able to create a bootable FreeBSD VM image, using the
> > > >> standard FreeBSD ZFS layout, and run through the regression test suite
> > > >> in bhyve. I've also been able to create and boot an EC2 AMI.
> > > >
> > > > Very cool!
> > > >
> > > >> === Interface ===
> > > >>
> > > >> Creating a pool with a single dataset is easy:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test ./zfs.img /path/to/input
> > > >>
> > > >> Upon importing such a pool, you'll get a dataset named "test" mounted at
> > > >> /test containing everything under /path/to/input.
> > > >>
> > > >> It's possible to set properties on the root dataset:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test -o fs=test:setuid=off:atime=on ./zfs.img /path/to/input
> > > >>
> > > >> It's also possible to create additional datasets:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test -o fs=test/ds1:mountpoint=/test/dir1 ./zfs.img /path/to/input
> > > >>
> > > >> The parameter syntax is
> > > >> "-o fs=<dataset>[:<prop>=<value>[:<prop>=<value>[:...]]]". Only a
> > > >> few properties are supported, at least for now.
> > > >>
> > > >> Dataset mountpoints behave the same as they would if created with the
> > > >> standard ZFS tools. So by default the root dataset's mountpoint is
> > > >> /test, test/ds1's mountpoint is /test/ds1, etc. If a dataset overrides
> > > >> its default mountpoint, its children inherit that mountpoint as the
> > > >> base for their own.
> > > >>
> > > >> makefs builds the output filesystem using a single input directory tree.
> > > >> Thus, makefs -t zfs requires that at least one of the datasets'
> > > >> mountpoints map to /path/to/input; that is, there is a "root" mount
> > > >> point.
> > > >>
> > > >> The -o rootpath parameter defines this root mount point. By default it's
> > > >> "/<poolname>". All datasets in the pool must have their mountpoints
> > > >> under this path, and one dataset's mountpoint must be equal to this
> > > >> path. To build bootable images, one sets -o rootpath=/.
> > > >>
> > > >> Putting it all together, one can build an image using the standard layout
> > > >> with an invocation like this:
> > > >>
> > > >> makefs -t zfs -o poolname=zroot -s 20g -o rootpath=/ -o bootfs=zroot/ROOT/default \
> > > >>     -o fs=zroot:canmount=off:mountpoint=none \
> > > >>     -o fs=zroot/ROOT:mountpoint=none \
> > > >>     -o fs=zroot/ROOT/default:mountpoint=/ \
> > > >>     -o fs=zroot/tmp:mountpoint=/tmp:exec=on:setuid=off \
> > > >>     -o fs=zroot/usr:mountpoint=/usr:canmount=off \
> > > >>     -o fs=zroot/usr/home \
> > > >>     -o fs=zroot/usr/ports:setuid=off \
> > > >>     -o fs=zroot/usr/src \
> > > >>     -o fs=zroot/usr/obj \
> > > >>     -o fs=zroot/var:mountpoint=/var:canmount=off \
> > > >>     -o fs=zroot/var/audit:setuid=off:exec=off \
> > > >>     -o fs=zroot/var/crash:setuid=off:exec=off \
> > > >>     -o fs=zroot/var/log:setuid=off:exec=off \
> > > >>     -o fs=zroot/var/mail:atime=on \
> > > >>     -o fs=zroot/var/tmp:setuid=off \
> > > >>     ${HOME}/tmp/zfs.img ${HOME}/tmp/world
> > > >>
> > > >> I'll admit this is somewhat clunky, but it doesn't seem worse than what
> > > >> we have to do otherwise; see poudriere-image for example:
> > > >> https://github.com/freebsd/poudriere/blob/master/src/share/poudriere/image_zfs.sh#L79
> > > >>
> > > >> What do folks think of this interface? Is there anything missing, or
> > > >> anything that doesn't make sense?
> > > >
> > > > I find it slightly confusing that -o options have a default namespace of
> > > > pool options unless they have an fs=*: prefix, but making users type
> > > > "pool:" for other options doesn't seem to make sense, so this is probably
> > > > the best solution.
> > > >
> > > > The density of data in the filesystem specification does suggest that
> > > > someone might want to create a UCL config file format eventually, but
> > > > what's here already seems entirely workable.
> > > >
> > > > -- Brooks
> > >
> > > In normal `zpool create` they use -o for pool properties, and -O for
> > > dataset properties for the root dataset. I wonder if we might also want
> > > -o poolprop=value and -O zroot/var:mountpoint=/var:canmount=off
> > >
> > > just to avoid the conceptual collision of those 2 different items.
> >
> > Sadly -O is taken in makefs.
> >
> > > One other possible issue: dataset properties can have a : in them, for
> > > user-defined properties. Do we maybe want to use a , to separate them
> > > instead? Although values can contain commas (the sharenfs property often
> > > does), so that probably doesn't work either.
> >
> > One solution would be to allow the same fs=foo: to be specified multiple
> > times (I've not checked if the current code allows this) to add options
> > instead of having a separator. That does make the command line even more
> > clunky, though.
> >
> > -- Brooks
>
> Just an idea: what about moving the partitioning (pool creation)
> functionality to sbin/gpart, keeping the relatively common dataset
> functionality in /usr/sbin/makefs as in the primary proposal, and creating,
> for example, /usr/sbin/makefs_zfs for complicated, ZFS-only
> functionality?

I think splitting ZFS pool creation into a separate tool would introduce
some challenges; makefs would have to learn to read pool/vdev metadata
and respect whatever properties are set. Putting everything in one tool
is simpler. gpart also doesn't seem like the right place, since one
would typically use mkimg(1) to build a GPT.

> It would look like gpart / mount / mount_* for other supported
> filesystems, and would keep the common makefs simpler.
>
> IIRC, some fs-specific mount_* have extended functionality that
> `mount -t (fstype)` does not support.

I like the idea of having a makefs_zfs since that would give us a new
option namespace.
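To make the quoted "-o fs=<dataset>[:<prop>=<value>[:...]]" syntax and Allan's objection concrete, here is a minimal Python sketch of a naive parser that treats every ':' as a field separator. parse_fs_spec is a hypothetical helper for illustration only; it is not makefs's parser (makefs itself is written in C). ZFS requires user-defined property names to contain a ':', so a spec such as "zroot:org.example:backup=daily" gets split in the wrong place.

```python
def parse_fs_spec(spec):
    """Split "<dataset>[:<prop>=<value>[:...]]" into (dataset, props).

    Hypothetical helper: treats every ':' as a field separator, which is
    the naive reading of the syntax quoted above.
    """
    dataset, *fields = spec.split(":")
    props = {}
    for field in fields:
        prop, sep, value = field.partition("=")
        if not sep:
            # A ':' inside a user property name produces a field with no '='.
            raise ValueError(f"expected <prop>=<value>, got {field!r}")
        props[prop] = value
    return dataset, props
```

For example, parse_fs_spec("zroot/tmp:mountpoint=/tmp:exec=on:setuid=off") returns ("zroot/tmp", {"mountpoint": "/tmp", "exec": "on", "setuid": "off"}), while parse_fs_spec("zroot:org.example:backup=daily") raises ValueError because "org.example" is read as a field without an '='.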
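The mountpoint and rootpath rules described in the original message can be modelled roughly as follows. This is a sketch under stated assumptions, not makefs's implementation: an unset mountpoint is the parent's mountpoint plus the dataset's last name component, the root dataset defaults to /<poolname>, "none" and "legacy" are inherited unchanged, and datasets whose mountpoint is "none" or "legacy" are ignored by the rootpath check (canmount is not modelled). resolve_mountpoints and check_rootpath are hypothetical names.

```python
import posixpath

# Mountpoint values that are inherited verbatim and never checked against rootpath.
UNMOUNTED = ("none", "legacy")


def resolve_mountpoints(poolname, explicit):
    """Map each dataset to its effective mountpoint.

    explicit: {dataset: locally set mountpoint, or None if not set}.
    """
    resolved = {}

    def resolve(ds):
        if ds in resolved:
            return resolved[ds]
        mp = explicit.get(ds)
        if mp is None:
            if ds == poolname:
                mp = "/" + poolname  # assumed default for the root dataset
            elif "/" not in ds:
                raise ValueError(f"{ds!r} is not in pool {poolname!r}")
            else:
                parent, _, leaf = ds.rpartition("/")
                parent_mp = resolve(parent)
                # Inherit "none"/"legacy" as-is; otherwise append the leaf name.
                mp = parent_mp if parent_mp in UNMOUNTED else posixpath.join(parent_mp, leaf)
        resolved[ds] = mp
        return mp

    for ds in explicit:
        resolve(ds)
    return resolved


def check_rootpath(mountpoints, rootpath):
    """Raise ValueError unless every mountpoint is under rootpath and one equals it."""
    root = rootpath.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"
    mounted = [mp for mp in mountpoints.values() if mp not in UNMOUNTED]
    outside = [mp for mp in mounted if mp != root and not mp.startswith(prefix)]
    if outside:
        raise ValueError(f"mountpoints outside {root}: {outside}")
    if root not in mounted:
        raise ValueError(f"no dataset is mounted at {root}")
```

With a subset of the standard layout above (zroot and zroot/ROOT at none, zroot/ROOT/default at /, zroot/usr at /usr), zroot/usr/home resolves to /usr/home and check_rootpath(..., "/") passes. For the single "test" pool with no rootpath override, check_rootpath(..., "/") fails because nothing is mounted at /, which is why the sketch assumes a /<poolname> default.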
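Brooks's suggestion of repeating "fs=<dataset>:" sidesteps the separator question, since each option would carry a single property. A minimal Python sketch, assuming each option has at most one <prop>=<value> pair and that the dataset name itself contains no ':': only the first ':' and the first '=' after it act as separators, so a user property name like "org.example:note" and a value containing commas or colons survive intact. merge_fs_options is a hypothetical helper, not makefs code.

```python
def merge_fs_options(options):
    """Merge repeated "fs=<dataset>[:<prop>=<value>]" options per dataset.

    Hypothetical helper: splits only on the first ':' and on the first '='
    that follows it, assuming dataset names contain no ':'.
    """
    datasets = {}
    for opt in options:
        if not opt.startswith("fs="):
            raise ValueError(f"not an fs= option: {opt!r}")
        dataset, sep, assignment = opt[len("fs="):].partition(":")
        props = datasets.setdefault(dataset, {})
        if not sep:
            continue  # a bare "fs=<dataset>" just declares the dataset
        prop, eq, value = assignment.partition("=")
        if not eq:
            raise ValueError(f"expected <prop>=<value>, got {assignment!r}")
        props[prop] = value  # a later option for the same property wins
    return datasets
```

For example, the options "fs=zroot/tmp:mountpoint=/tmp", "fs=zroot/tmp:setuid=off" and "fs=zroot:org.example:note=a,b:c" merge into {"zroot/tmp": {"mountpoint": "/tmp", "setuid": "off"}, "zroot": {"org.example:note": "a,b:c"}}. The cost, as Brooks notes, is a longer command line.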