List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
References: <202404050543.4355hDcS009860@critter.freebsd.dk> <202404051354.435Ds1KX086243@critter.freebsd.dk>
From: Rick Macklem
Date: Fri, 5 Apr 2024 07:23:20 -0700
Subject: Re: SEEK_HOLE at EOF
To: alan somers
Cc: Poul-Henning Kamp, Alan Somers, FreeBSD Hackers

On Fri, Apr 5, 2024 at 7:13 AM alan somers wrote:
>
> On Fri, Apr 5, 2024 at 7:54 AM Poul-Henning Kamp wrote:
> >
> > --------
> > Alan Somers writes:
> > > On Thu, Apr 4, 2024 at 11:43 PM Poul-Henning Kamp wrote:
> >
> > > > Just two minor quibbles:
> > > >
> > > > If the file position is EOF, then you /are/ "beyond the end of the file"
> > > > because a read(2) would not be able to return any data.
> > >
> > > Do you distinguish between "at EOF" and "beyond EOF"?

As a bit of an aside, NFSv4.2 does differentiate between "at EOF" and
"beyond EOF" for its Seek operation.
The fun part is that Linux did not implement what is in the RFC, and the
implementation shipped to many users before the "bug" was noticed (and it
still does not conform to the RFC afaik).
As such, there are now two ways to do it: the RFC way or the Linux way.
Selecting between them is what the sysctl vfs.nfsd.linux42server does.

> > > And does it not
> > > trouble you that calling SEEK_HOLE from the beginning of the "virtual
> > > hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
> > > beginning of any real hole will return the current offset?
> >
> > EOF is where the file ends and there's no "hole" there, because there
> > no more file on the other side of that "hole".
> >
> > When you stand on a cliff, the ocean is not "a hole in the landscape",
> > it's where the landscape ends.
>
> Except there is a hole at EOF, a virtual hole.  The draft spec
> specifically says "all seekable files shall have a virtual hole
> starting at the current size of the file".

I think they used the term "virtual" to indicate that this is not a real
hole, and I think it was a good idea, since it allows file systems that do
not support holes to support SEEK_DATA.

However, I still believe that conforming to the Austin Group draft is
preferable.

rick

> > >
> > > > And returning ENXIO is more informative than returning the size of the
> > > > file, since it atomically tells you that there are no more holes.
> > >
> > > Ahh, that's a good point.  It's the first point I've heard in favor of
> > > this option.  Are you aware of any applications that need to know
> > > that?
> >
> > No, but that should not get in the way of good syscall architecture :-)
> >
> > It might be useful for archivers which try to be smart about sparse files.
>
> I imagine that most archivers would work like this:
> ofs = 0
> loop {
>     let start = lseek(fd, ofs, SEEK_DATA);
>     if ENXIO {
>         // No more data regions
>         break
>     }
>     let end = lseek(fd, ofs, SEEK_HOLE);
>     assert!(!ENXIO) // thanks to the virtual hole, we should never have ENXIO here
>     copy(fd, start, end - start, ...)
>     ofs = end
> }
> truncate(output_file, fd.fsize)
>
> Since archivers really only care about data regions, not holes, I
> don't think that they would usually call SEEK_HOLE at EOF.
>
> >
> > --
> > Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> > phk@FreeBSD.ORG         | TCP/IP since RFC 956
> > FreeBSD committer       | BSD since 4.3-tahoe
> > Never attribute to malice what can adequately be explained by incompetence.
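
For illustration, the archiver loop quoted above could be sketched in
Python roughly as follows. This is a minimal sketch and not code from the
thread: the names data_regions, sparse_copy and CHUNK are made up for the
example, and it assumes a platform where Python exposes os.SEEK_DATA and
os.SEEK_HOLE. It differs from the quoted pseudocode in one detail:
SEEK_HOLE is issued from start rather than from ofs, because lseek(2)
returns the supplied offset when that offset already lies inside a hole.

```python
import errno
import os

CHUNK = 1 << 20  # arbitrary copy buffer size chosen for this sketch


def data_regions(fd):
    """Yield (start, end) byte ranges that lseek reports as data."""
    ofs = 0
    while True:
        try:
            start = os.lseek(fd, ofs, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # no data at or after ofs
            raise
        # Seek from `start`: from inside a hole, SEEK_HOLE returns its argument.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            return  # defensive: never loop without making progress
        yield start, end
        ofs = end


def sparse_copy(src_fd, dst_fd):
    """Copy only the data regions, then set dst to the source size."""
    for start, end in data_regions(src_fd):
        pos = start
        while pos < end:
            chunk = os.pread(src_fd, min(CHUNK, end - pos), pos)
            if not chunk:
                break  # source shrank while we were copying
            written = 0
            while written < len(chunk):
                written += os.pwrite(dst_fd, chunk[written:], pos + written)
            pos += len(chunk)
    # Equivalent of truncate(output_file, fd.fsize) in the pseudocode.
    os.ftruncate(dst_fd, os.fstat(src_fd).st_size)
```

With descriptors obtained from os.open, sparse_copy(src_fd, dst_fd) leaves
the skipped ranges unwritten in the destination; whether they end up as
real holes depends on the destination file system. As in the pseudocode,
SEEK_HOLE is never called at EOF, only from the start of a data region.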