From brampton+freebsd-fs at gmail.com Wed Apr 1 07:32:45 2009
From: brampton+freebsd-fs at gmail.com (Andrew Brampton)
Date: Wed Apr 1 07:32:56 2009
Subject: Auto mount and ignore errors
In-Reply-To: <49D2F2DE.1090807@bsd.ee>
References: <49D2F2DE.1090807@bsd.ee>
Message-ID:

2009/4/1 Andrei Kolu :
> Andrew Brampton wrote:
>>
>> So my question is, is there a fstab option which will ignore a failed
>> mount, and if possible continue to boot? I've read the man page, and
>> did a bit of googling, but didn't find anything. Would there be any
>> objection to a patch which implemented a "ignerror" flag?
>>
>
> Mount from /etc/rc.local?
>

You mean create my own script for the mounting? Sure that would work
but I don't see that as "clean" as placing it in fstab.

Andrew

From aaron at goflexitllc.com Wed Apr 1 10:43:31 2009
From: aaron at goflexitllc.com (Aaron Hurt)
Date: Wed Apr 1 10:43:37 2009
Subject: Auto mount and ignore errors
In-Reply-To:
References: <49D2F2DE.1090807@bsd.ee>
Message-ID: <49D3A177.2040106@goflexitllc.com>

Andrew Brampton wrote:
> 2009/4/1 Andrei Kolu :
>
>> Andrew Brampton wrote:
>>
>>> So my question is, is there a fstab option which will ignore a failed
>>> mount, and if possible continue to boot? I've read the man page, and
>>> did a bit of googling, but didn't find anything. Would there be any
>>> objection to a patch which implemented a "ignerror" flag?
>>>
>> Mount from /etc/rc.local?
>>
>
> You mean create my own script for the mounting? Sure that would work
> but I don't see that as "clean" as placing it in fstab.
>
> Andrew
>

I think "clean" here is a bit misplaced. It's not ever "clean" to ignore a
file system mount error.
If your file system in question is prone to mount failures it's probably
not a good idea to mount it from fstab at all.

Aaron Hurt
Managing Partner
Flex I.T., LLC
611 Commerce Street
Suite 3117
Nashville, TN 37203
Phone: 615.438.7101
E-mail: aaron@goflexitllc.com

From fb-fs at psconsult.nl Wed Apr 1 12:44:17 2009
From: fb-fs at psconsult.nl (Paul Schenkeveld)
Date: Wed Apr 1 12:44:24 2009
Subject: Auto mount and ignore errors
In-Reply-To: <49D3A177.2040106@goflexitllc.com>
References: <49D2F2DE.1090807@bsd.ee> <49D3A177.2040106@goflexitllc.com>
Message-ID: <20090401193253.GA87622@psconsult.nl>

On Wed, Apr 01, 2009 at 12:16:39PM -0500, Aaron Hurt wrote:
> Andrew Brampton wrote:
>> 2009/4/1 Andrei Kolu :
>>
>>> Andrew Brampton wrote:
>>>
>>>> So my question is, is there a fstab option which will ignore a failed
>>>> mount, and if possible continue to boot? I've read the man page, and
>>>> did a bit of googling, but didn't find anything. Would there be any
>>>> objection to a patch which implemented a "ignerror" flag?
>>>>
>>> Mount from /etc/rc.local?
>>
>> You mean create my own script for the mounting? Sure that would work
>> but I don't see that as "clean" as placing it in fstab.
>>
>> Andrew
>>
> I think "clean" here is a bit misplaced. It's not ever "clean" to ignore a
> file system mount error. If your file system in question is prone to
> mount failures it's probably not a good idea to mount it from fstab at all.

I use my notebook both in my home office and when visiting customers.
When in my office I like to mount some nfs filesystems that are not
available when I'm out. Before 7.1, /etc/rc worked just the way I wanted:
it mounted these nfs filesystems when I was in my office and, when I was
visiting customers, skipped them with an error telling me that the name of
the nfs server could not be resolved.

That old behaviour was clearly wrong, as it would also continue booting
when really important filesystems cannot be mounted, e.g. due to network
problems.
However I'd like to argue that there are perfectly legal cases for
having filesystems in /etc/fstab that only get mounted when available.

So I agree with OP that there should be a clean way of marking some
filesystems 'optional' or 'ignerror' in fstab.

My 2 cents.

Paul Schenkeveld

From antik at bsd.ee Wed Apr 1 22:56:41 2009
From: antik at bsd.ee (Andrei Kolu)
Date: Wed Apr 1 22:56:48 2009
Subject: Auto mount and ignore errors
In-Reply-To: <20090401193253.GA87622@psconsult.nl>
References: <49D2F2DE.1090807@bsd.ee> <49D3A177.2040106@goflexitllc.com> <20090401193253.GA87622@psconsult.nl>
Message-ID: <49D4539A.3080506@bsd.ee>

Paul Schenkeveld wrote:
> On Wed, Apr 01, 2009 at 12:16:39PM -0500, Aaron Hurt wrote:
>
>> Andrew Brampton wrote:
>>
>>> 2009/4/1 Andrei Kolu :
>>>
>>>> Andrew Brampton wrote:
>>>>
>>>>> So my question is, is there a fstab option which will ignore a failed
>>>>> mount, and if possible continue to boot? I've read the man page, and
>>>>> did a bit of googling, but didn't find anything. Would there be any
>>>>> objection to a patch which implemented a "ignerror" flag?
>>>>>
>>>> Mount from /etc/rc.local?
>>>>
>>> You mean create my own script for the mounting? Sure that would work
>>> but I don't see that as "clean" as placing it in fstab.
>>>
>>> Andrew
>>>
>> I think "clean" here is a bit misplaced. It's not ever "clean" to ignore a
>> file system mount error. If your file system in question is prone to
>> mount failures it's probably not a good idea to mount it from fstab at all.
>>
>
> I use my notebook both in my home office and when visiting customers.
> When in my office I like to mount some nfs filesystems that are not
> available when I'm out. Before 7.1 /etc/rc worked just the way I wanted
> mounting these nfs filesystems when in my office and skipping them with
> an error telling me that the name of the nfs server cannot be resolved
> when visiting customers.
>
>

For NFS I use automounter:

/etc/rc.conf

# NFS automount
amd_enable="YES"

If my NFS backup server is down or there is a problem with the network,
I can still start the fileserver without having to edit fstab by hand to
get the operating system to boot.

Now my backup script can mount filesystems as needed with this script:

#!/bin/sh

# This script does personal backups to a rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tridge@linuxcare.com

# directory to backup
BDIR=/data/samba

########################################################################
cd /host/192.168.0.249/

BACKUPDIR=`date +%A`
OPTS="--progress --force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES --delete --backup --backup-dir=/data/backup/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# the following line clears the last weeks incremental directory
[ -d $BDIR/emptydir ] || mkdir $BDIR/emptydir
rsync --delete -a $BDIR/emptydir/ /data/backup/$BACKUPDIR/
rmdir $BDIR/emptydir

# now the actual transfer
rsync $OPTS $BDIR /data/backup/current

From pho at FreeBSD.org Thu Apr 2 09:20:04 2009
From: pho at FreeBSD.org (Peter Holm)
Date: Thu Apr 2 09:20:11 2009
Subject: kern/94769: [ufs] Multiple file deletions on multi-snapshotted filesystems causes hang
Message-ID: <200904021620.n32GK4AZ047471@freefall.freebsd.org>

The following reply was made to PR kern/94769; it has been noted by GNATS.

From: Peter Holm
To: bug-followup@FreeBSD.org
Cc:
Subject: kern/94769: [ufs] Multiple file deletions on multi-snapshotted filesystems causes hang
Date: Thu, 2 Apr 2009 17:51:48 +0200

With the described test scenario I was able to reproduce a deadlock
consistently on HEAD. The time of deadlock, however, seems to be a bit
different from that described in the PR. On HEAD the deadlock occurs
during deletion of the snapshot files.

http://people.freebsd.org/~pho/stress/log/pr-94769.txt

- Peter

From delphij at FreeBSD.org Thu Apr 2 10:19:44 2009
From: delphij at FreeBSD.org (delphij@FreeBSD.org)
Date: Thu Apr 2 10:19:51 2009
Subject: kern/94480: [libufs] [patch] bread(3) & bwrite(3) can crash under low memory conditions
Message-ID: <200904021719.n32HJhsN027890@freefall.freebsd.org>

Synopsis: [libufs] [patch] bread(3) & bwrite(3) can crash under low memory conditions

State-Changed-From-To: open->patched
State-Changed-By: delphij
State-Changed-When: Thu Apr 2 17:19:12 UTC 2009
State-Changed-Why:
A fix has been applied against -HEAD, MFC reminder.

Responsible-Changed-From-To: freebsd-fs->delphij
Responsible-Changed-By: delphij
Responsible-Changed-When: Thu Apr 2 17:19:12 UTC 2009
Responsible-Changed-Why:
Take.

http://www.freebsd.org/cgi/query-pr.cgi?pr=94480

From rmacklem at uoguelph.ca Thu Apr 2 14:55:19 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Thu Apr 2 14:55:26 2009
Subject: nfsv4 sharing nfssvc() with the regular nfsd
Message-ID:

For nfsv4 to live side-by-side with the regular nfsd, they must either
share the nfssvc() system call or a new one must be allocated for nfsv4.

As such, I've cobbled some code together to allow the nfssvc() syscall
to be shared. It basically consists of a small module called nfssvc with
only the nfssvc() syscall function in it, where nfsserver and nfsv4
"register" with it by setting the appropriate function pointer non-null.
These functions are then called, based on the NFSSVC_xxx flag
value. (I've coalesced the NFSSVC_xxx flags into a separate .h file,
to avoid confusion.)

I also deleted the following, since I believe that it is just cruft.
(sysproto.h is included in all of these files.)

#ifndef _SYS_SYSPROTO_H_
struct nfssvc_args {
	int flag;
	caddr_t argp;
};
#endif

Is there a reason for the above?

I've attached the "diff -u" in case anyone would be willing to
review it, rick.
diff -u -r -N nfsserver/nfs.h nfsserver.new/nfs.h --- nfsserver/nfs.h 2009-04-02 03:55:57.000000000 -0400 +++ nfsserver.new/nfs.h 2009-04-02 05:34:17.000000000 -0400 @@ -40,6 +40,8 @@ #include "opt_nfs.h" #endif +#include + /* * Tunable constants for nfs */ @@ -116,13 +118,6 @@ #endif /* - * Flags for nfssvc() system call. - */ -#define NFSSVC_OLDNFSD 0x004 -#define NFSSVC_ADDSOCK 0x008 -#define NFSSVC_NFSD 0x010 - -/* * vfs.nfsrv sysctl(3) identifiers */ #define NFS_NFSRVSTATS 1 /* struct: struct nfsrvstats */ @@ -447,6 +442,13 @@ struct mbuf **mrq); int nfsrv_write(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp, struct mbuf **mrq); +/* + * #ifdef _SYS_SYSPROTO_H_ so that it is only defined when sysproto.h + * has been included, so that "struct nfssvc_args" is defined. + */ +#ifdef _SYS_SYSPROTO_H_ +int nfssvc_nfsserver(struct thread *, struct nfssvc_args *); +#endif #endif /* _KERNEL */ #endif diff -u -r -N nfsserver/nfs_srvkrpc.c nfsserver.new/nfs_srvkrpc.c --- nfsserver/nfs_srvkrpc.c 2009-04-02 03:55:57.000000000 -0400 +++ nfsserver.new/nfs_srvkrpc.c 2009-04-02 05:34:17.000000000 -0400 @@ -151,6 +151,9 @@ /* * NFS server system calls */ +/* + * This is now called from nfssvc() in nfs/nfs_nfssvc.c. 
+ */ /* * Nfs server psuedo system call for the nfsd's @@ -163,25 +166,14 @@ * - sockaddr with no IPv4-mapped addresses * - mask for both INET and INET6 families if there is IPv4-mapped overlap */ -#ifndef _SYS_SYSPROTO_H_ -struct nfssvc_args { - int flag; - caddr_t argp; -}; -#endif int -nfssvc(struct thread *td, struct nfssvc_args *uap) +nfssvc_nfsserver(struct thread *td, struct nfssvc_args *uap) { struct file *fp; struct nfsd_addsock_args addsockarg; struct nfsd_nfsd_args nfsdarg; int error; - KASSERT(!mtx_owned(&Giant), ("nfssvc(): called with Giant")); - - error = priv_check(td, PRIV_NFS_DAEMON); - if (error) - return (error); if (uap->flag & NFSSVC_ADDSOCK) { error = copyin(uap->argp, (caddr_t)&addsockarg, sizeof(addsockarg)); @@ -208,8 +200,6 @@ } else { error = ENXIO; } - if (error == EINTR || error == ERESTART) - error = 0; return (error); } diff -u -r -N nfsserver/nfs_srvsubs.c nfsserver.new/nfs_srvsubs.c --- nfsserver/nfs_srvsubs.c 2009-04-02 03:55:57.000000000 -0400 +++ nfsserver.new/nfs_srvsubs.c 2009-04-02 05:34:17.000000000 -0400 @@ -100,10 +100,6 @@ int nfsd_head_flag; #endif -static int nfssvc_offset = SYS_nfssvc; -static struct sysent nfssvc_prev_sysent; -MAKE_SYSENT(nfssvc); - struct mtx nfsd_mtx; /* @@ -519,13 +515,14 @@ nfsv3err_commit, }; +extern int (*call_nfsserver)(struct thread *, struct nfssvc_args *); + /* * Called once to initialize data structures... 
*/ static int nfsrv_modevent(module_t mod, int type, void *data) { - static int registered; int error = 0; switch (type) { @@ -560,11 +557,7 @@ NFSD_UNLOCK(); #endif - error = syscall_register(&nfssvc_offset, &nfssvc_sysent, - &nfssvc_prev_sysent); - if (error) - break; - registered = 1; + call_nfsserver = nfssvc_nfsserver; break; case MOD_UNLOAD: @@ -573,8 +566,7 @@ break; } - if (registered) - syscall_deregister(&nfssvc_offset, &nfssvc_prev_sysent); + call_nfsserver = NULL; callout_drain(&nfsrv_callout); #ifdef NFS_LEGACYRPC nfsrv_destroycache(); /* Free the server request cache */ @@ -596,6 +588,7 @@ /* So that loader and kldload(2) can find us, wherever we are.. */ MODULE_VERSION(nfsserver, 1); +MODULE_DEPEND(nfsserver, nfssvc, 1, 1, 1); #ifndef NFS_LEGACYRPC MODULE_DEPEND(nfsserver, krpc, 1, 1, 1); #endif diff -u -r -N nfsserver/nfs_syscalls.c nfsserver.new/nfs_syscalls.c --- nfsserver/nfs_syscalls.c 2009-04-02 03:55:57.000000000 -0400 +++ nfsserver.new/nfs_syscalls.c 2009-04-02 05:34:17.000000000 -0400 @@ -113,6 +113,9 @@ */ /* + * This is now called from nfssvc() in nfs/nfs_nfssvc.c. 
+ */ +/* * Nfs server psuedo system call for the nfsd's * Based on the flag value it either: * - adds a socket to the selection list @@ -123,27 +126,14 @@ * - sockaddr with no IPv4-mapped addresses * - mask for both INET and INET6 families if there is IPv4-mapped overlap */ -#ifndef _SYS_SYSPROTO_H_ -struct nfssvc_args { - int flag; - caddr_t argp; -}; -#endif int -nfssvc(struct thread *td, struct nfssvc_args *uap) +nfssvc_nfsserver(struct thread *td, struct nfssvc_args *uap) { struct file *fp; struct sockaddr *nam; struct nfsd_addsock_args nfsdarg; int error; - KASSERT(!mtx_owned(&Giant), ("nfssvc(): called with Giant")); - - AUDIT_ARG(cmd, uap->flag); - - error = priv_check(td, PRIV_NFS_DAEMON); - if (error) - return (error); NFSD_LOCK(); while (nfssvc_sockhead_flag & SLP_INIT) { nfssvc_sockhead_flag |= SLP_WANTINIT; @@ -181,8 +171,6 @@ } else { error = ENXIO; } - if (error == EINTR || error == ERESTART) - error = 0; return (error); } diff -u -r -N nfs/nfs_nfssvc.c nfs.new/nfs_nfssvc.c --- nfs/nfs_nfssvc.c 1969-12-31 19:00:00.000000000 -0500 +++ nfs.new/nfs_nfssvc.c 2009-04-02 05:34:47.000000000 -0400 @@ -0,0 +1,152 @@ +/*- + * Copyright (c) 1989, 1993 + * The Regents of the University of California. All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * Rick Macklem at The University of Guelph. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. 
Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + */ + +#include + +#include "opt_nfs.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +static int nfssvc_offset = SYS_nfssvc; +static struct sysent nfssvc_prev_sysent; +MAKE_SYSENT(nfssvc); + +/* + * This tiny module simply handles the nfssvc() system call. The other + * nfs modules that use the system call register themselves by setting + * the call_xxx function pointers non-NULL. 
+ */ + +int (*call_nfsserver)(struct thread *, struct nfssvc_args *) = NULL; +int (*call_nfsv4)(struct thread *, struct nfssvc_args *) = NULL; +int (*call_nfsv4client)(struct thread *, struct nfssvc_args *) = NULL; +int (*call_nfsv4server)(struct thread *, struct nfssvc_args *) = NULL; + +/* + * Nfs server psuedo system call for the nfsd's + */ +int +nfssvc(struct thread *td, struct nfssvc_args *uap) +{ + int error; + + KASSERT(!mtx_owned(&Giant), ("nfssvc(): called with Giant")); + + AUDIT_ARG(cmd, uap->flag); + + error = priv_check(td, PRIV_NFS_DAEMON); + if (error) + return (error); + error = EINVAL; + if ((uap->flag & (NFSSVC_ADDSOCK | NFSSVC_OLDNFSD | NFSSVC_NFSD)) && + call_nfsserver != NULL) + error = (*call_nfsserver)(td, uap); + else if ((uap->flag & (NFSSVC_CBADDSOCK | NFSSVC_NFSCBD)) && + call_nfsv4client != NULL) + error = (*call_nfsv4client)(td, uap); + else if ((uap->flag & (NFSSVC_IDNAME | NFSSVC_GETSTATS | + NFSSVC_GSSDADDPORT | NFSSVC_GSSDADDFIRST | NFSSVC_GSSDDELETEALL | + NFSSVC_NFSUSERDPORT | NFSSVC_NFSUSERDDELPORT)) && + call_nfsv4 != NULL) + error = (*call_nfsv4)(td, uap); + else if ((uap->flag & (NFSSVC_NFSDNFSD | NFSSVC_NFSDADDSOCK | + NFSSVC_PUBLICFH | NFSSVC_V4ROOTEXPORT | NFSSVC_NOPUBLICFH | + NFSSVC_STABLERESTART | NFSSVC_ADMINREVOKE | + NFSSVC_DUMPCLIENTS | NFSSVC_DUMPLOCKS)) && + call_nfsv4server != NULL) + error = (*call_nfsv4server)(td, uap); + if (error == EINTR || error == ERESTART) + error = 0; + return (error); +} + +/* + * Called once to initialize data structures... 
+ */ +static int +nfssvc_modevent(module_t mod, int type, void *data) +{ + static int registered; + int error = 0; + + switch (type) { + case MOD_LOAD: + error = syscall_register(&nfssvc_offset, &nfssvc_sysent, + &nfssvc_prev_sysent); + if (error) + break; + registered = 1; + break; + + case MOD_UNLOAD: + if (call_nfsserver != NULL || call_nfsv4 != NULL || + call_nfsv4client != NULL || call_nfsv4server != NULL) { + error = EBUSY; + break; + } + if (registered) + syscall_deregister(&nfssvc_offset, &nfssvc_prev_sysent); + registered = 0; + break; + default: + error = EOPNOTSUPP; + break; + } + return error; +} +static moduledata_t nfssvc_mod = { + "nfssvc", + nfssvc_modevent, + NULL, +}; +DECLARE_MODULE(nfssvc, nfssvc_mod, SI_SUB_VFS, SI_ORDER_ANY); + +/* So that loader and kldload(2) can find us, wherever we are.. */ +MODULE_VERSION(nfssvc, 1); + diff -u -r -N nfs/nfssvc.h nfs.new/nfssvc.h --- nfs/nfssvc.h 1969-12-31 19:00:00.000000000 -0500 +++ nfs.new/nfssvc.h 2009-04-02 05:34:47.000000000 -0400 @@ -0,0 +1,66 @@ +/* + * Copyright (c) 1989, 1993 + * The Regents of the University of California. All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * Rick Macklem at The University of Guelph. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + */ + +#ifndef _NFS_NFSSVC_H_ +#define _NFS_NFSSVC_H_ + +/* + * Flags for nfssvc() system call. + */ +#define NFSSVC_OLDNFSD 0x004 +#define NFSSVC_ADDSOCK 0x008 +#define NFSSVC_NFSD 0x010 + +/* + * and ones for nfsv4. 
+ */
+#define	NFSSVC_NOPUBLICFH	0x00000020
+#define	NFSSVC_STABLERESTART	0x00000040
+#define	NFSSVC_NFSDNFSD	0x00000080
+#define	NFSSVC_NFSDADDSOCK	0x00000100
+#define	NFSSVC_IDNAME	0x00000200
+#define	NFSSVC_GSSDDELETEALL	0x00000400
+#define	NFSSVC_GSSDADDPORT	0x00000800
+#define	NFSSVC_NFSUSERDPORT	0x00001000
+#define	NFSSVC_NFSUSERDDELPORT	0x00002000
+#define	NFSSVC_V4ROOTEXPORT	0x00004000
+#define	NFSSVC_ADMINREVOKE	0x00008000
+#define	NFSSVC_DUMPCLIENTS	0x00010000
+#define	NFSSVC_DUMPLOCKS	0x00020000
+#define	NFSSVC_GSSDADDFIRST	0x00040000
+#define	NFSSVC_PUBLICFH	0x00080000
+#define	NFSSVC_NFSCBD	0x00100000
+#define	NFSSVC_CBADDSOCK	0x00200000
+#define	NFSSVC_GETSTATS	0x00400000
+
+#endif	/* _NFS_NFSSVC_H */

From m at obmail.net Sun Apr 5 00:55:37 2009
From: m at obmail.net (Michael Conlen)
Date: Sun Apr 5 00:55:44 2009
Subject: Bizarre IO errors
Message-ID: <8FEAE0BA-5723-437C-8215-D2AEC7783713@obmail.net>

First the background

FreeBSD nfs4.tarhost.com 7.1-RELEASE-p3 FreeBSD 7.1-RELEASE-p3 #0: Sat Mar 7 00:15:02 EST 2009 root@nfs4.tarhost.com:/usr/obj/usr/src/sys/GENERIC amd64

(Two of these processors)
CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.10-MHz K8-class CPU)

Memory
usable memory = 17165377536 (16370 MB)
avail memory = 16626044928 (15855 MB)

The Disk system
aac0: mem 0xd8000000-0xd81fffff irq 48 at device 0.0 on pci1
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: [ITHREAD]
aac0: Adaptec 52445, aac driver 2.0.0-1

Controller Status                 : Optimal
Channel description               : SAS/SATA
Controller Model                  : Adaptec 52445
Controller Serial Number          : 8A321083874
Physical Slot                     : 5
Temperature                       : 60 C/ 140 F (Normal)
Installed memory                  : 512 MB
Copyback                          : Disabled
Background consistency check      : Disabled
Automatic Failover                : Enabled
Global task priority              : High
Performance Mode                  : Default/Dynamic
Defunct disk drive count          : 0
Logical devices/Failed/Degraded   : 2/0/0

Logical device number 0
Logical device name               : system
RAID level                        : 1
Status of logical device          : Optimal
Size                              : 953334 MB
Read-cache mode                   : Enabled
Write-cache mode                  : Enabled (write-back)
Write-cache setting               : Enabled (write-back)
Partitioned                       : Yes
Protected by Hot-Spare            : Yes
Dedicated Hot-Spare               : 0,4
Dedicated Hot-Spare               : 0,12
Bootable                          : Yes
Failed stripes                    : No
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0                         : Present (0,0)
Segment 1                         : Present (0,8)

Logical device number 1
Logical device name               : data
RAID level                        : 10
Status of logical device          : Optimal
Size                              : 5720064 MB
Stripe-unit size                  : 256 KB
Read-cache mode                   : Enabled
Write-cache mode                  : Disabled (write-through)
Write-cache setting               : Enabled (write-back) when protected by battery
Partitioned                       : Yes
Protected by Hot-Spare            : No
Bootable                          : No
Failed stripes                    : No

Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a     15G    417M     14G     3%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e     15G     30K     14G     0%    /tmp
/dev/aacd0s1f     62G    3.2G     54G     6%    /usr
/dev/aacd0s1g    762G    354M    701G     0%    /usr/local
/dev/aacd0s1d     15G    3.5G     11G    24%    /var
/dev/aacd1p1     5.3T    874G    4.0T    18%    /usr/local/export

This system is set up as an NFS file server. It handled several stress
tests for two weeks before going online. The files were transferred to
the system and it was placed online and ran fine for a few days.
There are 10 web servers which access the file server, but the file
servers have a cluster of caches in front of them so the load isn't too
bad. I see peaks around 60 Mbit/sec of traffic from the NFS server when
taking backups and 30 Mbit/sec otherwise. IO is minimized due to the
large amount of available RAM. It takes about three hours before disk
caching fills the available memory, so there's not a lot of really hot
data it's going to the disks for, mostly just writes. iostat generally
reports about 1 MB/sec, maybe 2 at most. During stress tests I'd seen in
excess of 800 MB/sec, though usually in the 300-400 MB/sec range.

Now for the strange part. First notice the first two g_vfs_done lines.
The offset is negative. After the second is repeated 290 times we see
this odd pattern of the error taking many lines to display. After that
the logs continue, occasionally showing a full error line followed by a
line broken up as below. The first 8 or so lines after the first have
the same offset. After that the offset switches to
offset=1666490991559323648. That's in excess of 1 exabyte. I've got a
lot of disk but not that much. That seems to indicate that it's not
really an IO error, since the offset is way off the end of what could
possibly be disk (can someone confirm or deny that?). The offset
occasionally changes to a negative number or some other value, but this
particular offset is repeated over and over. Between 1 and 2 AM over
2 GB of this log was generated.

About two minutes after this started, NFS stopped responding to the NFS
clients in a prompt manner. Once the server is restarted it runs fine
for some time, but this pattern soon (within minutes) repeats.
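
The arithmetic behind that observation can be sketched in C. This is only an illustration under stated assumptions, not kernel code: offset_in_range() and report_logged_offsets() are hypothetical helpers, the device size is the 5720064 MB the controller reports for the "data" logical device (assumed here to be in units of 2^20 bytes), and the offsets are the ones copied from the g_vfs_done messages in this post.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Size of the "data" logical device as reported by the controller
 * (5720064 MB). Treating "MB" as 2^20 bytes is an assumption.
 */
#define DATA_DEV_BYTES ((int64_t)5720064 * 1024 * 1024)

/*
 * Hypothetical helper (not a kernel function): returns 1 if a read of
 * `length` bytes starting at `offset` lies entirely inside a device of
 * `dev_bytes` bytes, and 0 otherwise.
 */
static int
offset_in_range(int64_t offset, int64_t length, int64_t dev_bytes)
{
	if (offset < 0 || length < 0)
		return (0);
	if (offset > dev_bytes || length > dev_bytes - offset)
		return (0);
	return (1);
}

/* Print a verdict for each offset copied from the syslog excerpt. */
static void
report_logged_offsets(void)
{
	static const int64_t logged[] = {
		-6163487656308658176LL,
		-2344660732015456256LL,
		1666490991559323648LL,
	};
	size_t i;

	for (i = 0; i < sizeof(logged) / sizeof(logged[0]); i++)
		printf("offset=%lld: %s\n", (long long)logged[i],
		    offset_in_range(logged[i], 32768, DATA_DEV_BYTES) ?
		    "within device" : "outside device");
}
```

Calling report_logged_offsets() from a small main() would flag all three logged offsets as outside the device, which is consistent with the suspicion that the requests were bogus before they ever reached the disks. The aacd1p1 partition is no larger than the logical device, so the device size serves as an upper bound here.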

I have noticed that several read errors seem to get logged at a time,
then there will be occasional pauses. The second excerpt is a sample of
just the "last message repeated" lines with 10 or more repeats.

At the moment I can only assume that the negative and exceedingly large
offsets are a symptom of something beyond simply "disk problems", since
the messages logged indicate the OS was attempting to do something it
shouldn't. Can this be confirmed or denied? The controller reports no
problems. Is there anything else you can point me to?

Start of problems in syslog

Apr 4 00:00:00 nfs4 newsyslog[986]: logfile turned over due to size>100K
Apr 4 01:11:59 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 4 14:10:44 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 4 14:10:44 nfs4 rpc.statd: unmon request from localhost, no matching monitor
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():aacd1p1[READ(offset=-6163487656308658176, length=32768)]error = 5
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():aacd1p1[READ(offset=-2344660732015456256, length=32768)]error = 5
Apr 5 01:08:48 nfs4 last message repeated 290 times
Apr 5 01:08:48 nfs4 kernel: g_vfs_done():
Apr 5 01:08:48 nfs4 kernel: aacd1p
Apr 5 01:08:48 nfs4 kernel: 1[RE
Apr 5 01:08:48 nfs4 kernel: AD(
Apr 5 01:08:48 nfs4 kernel: off
Apr 5 01:08:48 nfs4 kernel: set
Apr 5 01:08:48 nfs4 kernel: =-
Apr 5 01:08:48 nfs4 kernel: 234
Apr 5 01:08:48 nfs4 kernel: 466
Apr 5 01:08:48 nfs4 kernel: 073
Apr 5 01:08:48 nfs4 kernel: 201
Apr 5 01:08:48 nfs4 kernel: 545
Apr 5 01:08:48 nfs4 kernel: 6256,
Apr 5 01:08:48 nfs4 kernel: len
Apr 5 01:08:48 nfs4 kernel: gth
Apr 5 01:08:48 nfs4 kernel: =32
Apr 5 01:08:48 nfs4 kernel: 76
Apr 5 01:08:48 nfs4 kernel: 8)]
Apr 5 01:08:48 nfs4 kernel: err
Apr 5 01:08:48 nfs4 kernel: or
Apr 5 01:08:48 nfs4 kernel: = 5
Apr 5 01:08:48 nfs4 kernel:

Log of "message repeated" with more than 9 times repeated.

Apr 5 01:57:27 nfs4 last message repeated 75 times
Apr 5 01:57:28 nfs4 last message repeated 434 times
Apr 5 01:57:38 nfs4 last message repeated 18848 times
Apr 5 01:57:43 nfs4 last message repeated 9894 times
Apr 5 01:57:45 nfs4 last message repeated 435 times
Apr 5 01:57:45 nfs4 last message repeated 105 times
Apr 5 01:57:45 nfs4 last message repeated 433 times
Apr 5 01:57:45 nfs4 last message repeated 303 times
Apr 5 01:57:46 nfs4 last message repeated 421 times

From dfr at rabson.org Mon Apr 6 03:38:43 2009
From: dfr at rabson.org (Doug Rabson)
Date: Mon Apr 6 03:38:50 2009
Subject: nfsv4 sharing nfssvc() with the regular nfsd
In-Reply-To:
References:
Message-ID: <74607C8A-226C-47FB-BFA5-E99AF535AD01@rabson.org>

On 2 Apr 2009, at 23:01, Rick Macklem wrote:

> For nfsv4 to live side-by-side with the regular nfsd, they must either
> share the nfssvc() system call or a new one must be allocated for
> nfsv4.
>
> As such, I've cobbled some code together to allow the nfssvc() syscall
> to be shared. It basically consists of a small module called nfssvc
> with only the nfssvc() syscall function in it, where nfsserver and
> nfsv4 "register" with it by setting the appropriate function pointer
> non-null. These functions are then called, based on the NFSSVC_xxx flag
> value. (I've coalesced the NFSSVC_xxx flags into a separate .h file,
> to avoid confusion.)

This sounds about right.

>
> I also deleted the following, since I believe that it is just cruft.
> (sysproto.h is included in all of these files.)
> #ifndef _SYS_SYSPROTO_H_
> struct nfssvc_args {
>         int flag;
>         caddr_t argp;
> };
> #endif
> Is there a reason for the above?

I can't think of one so I'm going to go with 'historical reasons'.

>
> I've attached the "diff -u" in case anyone would be willing to
> review it, rick.

The patch looks ok. The only thing I would change is to change the
names of the various call_foo variables so that they start with "nfs_"
for consistency.
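
For readers following the thread, the register-and-dispatch idea and the renaming suggested above can be sketched in userland C. This is a simplified illustration, not the patch itself: struct sketch_args, sketch_nfssvc() and the handler are hypothetical stand-ins, the hook names nfs_call_nfsserver and nfs_call_nfsv4 are only one possible reading of the "nfs_" prefix suggestion, and only EINTR is mapped to success because ERESTART is not available to userland programs.

```c
#include <errno.h>
#include <stddef.h>

/* Three flag values copied from the patch's nfssvc.h. */
#define NFSSVC_ADDSOCK	0x008
#define NFSSVC_NFSD	0x010
#define NFSSVC_IDNAME	0x00000200

/* Simplified stand-in for the kernel's struct nfssvc_args. */
struct sketch_args {
	int	flag;
	void	*argp;
};

typedef int (*sketch_handler_t)(struct sketch_args *);

/*
 * Hypothetical hook names using the "nfs_" prefix. A module
 * "registers" by setting its hook non-NULL and clears it on unload.
 */
static sketch_handler_t nfs_call_nfsserver = NULL;
static sketch_handler_t nfs_call_nfsv4 = NULL;

/* Counts how often the toy server handler was reached. */
static int sketch_server_calls = 0;

/* Dispatch on the flag bits to whichever hook is registered. */
static int
sketch_nfssvc(struct sketch_args *uap)
{
	int error = EINVAL;

	if ((uap->flag & (NFSSVC_ADDSOCK | NFSSVC_NFSD)) != 0 &&
	    nfs_call_nfsserver != NULL)
		error = (*nfs_call_nfsserver)(uap);
	else if ((uap->flag & NFSSVC_IDNAME) != 0 && nfs_call_nfsv4 != NULL)
		error = (*nfs_call_nfsv4)(uap);
	/* As in the patch, an interrupted call is reported as success. */
	if (error == EINTR)
		error = 0;
	return (error);
}

/* Toy handler: pretends an nfsd thread was interrupted by a signal. */
static int
sketch_server_handler(struct sketch_args *uap)
{
	sketch_server_calls++;
	return ((uap->flag & NFSSVC_NFSD) != 0 ? EINTR : 0);
}
```

With no hook set, sketch_nfssvc() returns EINVAL for every flag; after nfs_call_nfsserver = sketch_server_handler, requests carrying NFSSVC_ADDSOCK or NFSSVC_NFSD reach the handler while NFSSVC_IDNAME requests still fail until the second hook is set. Keeping the hooks as plain pointers is what lets the small dispatch module refuse to unload while any of them is non-NULL.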
From bugmaster at FreeBSD.org Mon Apr 6 04:06:54 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Apr 6 04:07:51 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200904061106.n36B6rxj061855@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int o kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] mkfs.ext2 creates rotten partition o kern/131084 fs [xfs] xfs destroys itself after copying data o kern/131081 fs [zfs] User cannot delete a file when a ZFS dataset is o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs 
that need iconv o kern/130210 fs [nullfs] Error by check nullfs o bin/130105 fs [zfs] zfs send -R dumps core o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129174 fs [nfs] [zfs] [panic] NFS v3 Panic when under high load o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/94769 fs [ufs] Multiple 
file deletions on multi-snapshotted fil o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/89991 fs [ufs] softupdates with mount -ur causes fs UNREFS o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc 56 problems total.
From linimon at FreeBSD.org Mon Apr 6 18:36:23 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Apr 6 18:36:30 2009 Subject: kern/133373: [zfs] umass attachment causes ZFS checksum errors, data loss Message-ID: <200904070136.n371aNpp041918@freefall.freebsd.org> Old Synopsis: umass attachment causes ZFS checksum errors, data loss New Synopsis: [zfs] umass attachment causes ZFS checksum errors, data loss Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Apr 7 01:34:29 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=133373 From sarawgi.aditya at gmail.com Tue Apr 7 11:37:47 2009 From: sarawgi.aditya at gmail.com (aditya sarawgi) Date: Tue Apr 7 11:38:01 2009 Subject: kern/131086 : [ext2fs] mkfs.ext2 creates rotten partition In-Reply-To: <994ac8b90904071125t1190db74see39afbef9700e1b@mail.gmail.com> References: <994ac8b90904071125t1190db74see39afbef9700e1b@mail.gmail.com> Message-ID: <994ac8b90904071131r9a9f06dm1e285ec0c3e58a11@mail.gmail.com> Hi, I have reproduced this bug and there is no problem with mkfs.ext2. mkfs.ext2 has been updated to create partitions having default inode size of 256 bytes which is not supported by ext2fs 7.1-RELEASE (it supports only 128 bytes). This problem is similar to kern/124621, kern/125536 and kern/128173.
I'm attaching my mkfs.ext2 logs, dump of the file system and a patch that has been committed to 8.0-CURRENT to fix this problem. -- Cheers, Aditya Sarawgi -------------- next part -------------- A non-text attachment was scrubbed... Name: mkfs.log Type: application/octet-stream Size: 953 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/ce890114/mkfs.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: dump Type: application/octet-stream Size: 9465 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/ce890114/dump.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: ext2fs.diff Type: application/octet-stream Size: 2137 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/ce890114/ext2fs.obj From sarawgi.aditya at gmail.com Tue Apr 7 11:53:27 2009 From: sarawgi.aditya at gmail.com (aditya sarawgi) Date: Tue Apr 7 11:53:34 2009 Subject: kern/131086 : [ext2fs] mkfs.ext2 creates rotten partition Message-ID: <994ac8b90904071125t1190db74see39afbef9700e1b@mail.gmail.com> Hi, I have reproduced this bug and there is no problem with mkfs.ext2. mkfs.ext2 has been updated to create partitions having default inode size of 256 bytes which is not supported by ext2fs 7.1-RELEASE (it supports only 128 bytes). This problem is similar to kern/124621, kern/125536 and kern/128173. I'm attaching my mkfs.ext2 logs, dump of the file system and a patch that has been committed to 8.0-CURRENT to fix this problem. -- Cheers, Aditya Sarawgi -------------- next part -------------- A non-text attachment was scrubbed... Name: mkfs.log Type: application/octet-stream Size: 953 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/dbf1f8bb/mkfs.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: dump Type: application/octet-stream Size: 9465 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/dbf1f8bb/dump.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: ext2fs.diff Type: application/octet-stream Size: 2137 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090407/dbf1f8bb/ext2fs.obj From sarawgi.aditya at gmail.com Tue Apr 7 12:20:07 2009 From: sarawgi.aditya at gmail.com (aditya sarawgi) Date: Tue Apr 7 12:20:14 2009 Subject: kern/131086: [ext2fs] mkfs.ext2 creates rotten partition Message-ID: <200904071920.n37JK5U4018662@freefall.freebsd.org> The following reply was made to PR kern/131086; it has been noted by GNATS. From: aditya sarawgi To: bug-followup@FreeBSD.org, estellnb@gmail.com Cc: Subject: Re: kern/131086: [ext2fs] mkfs.ext2 creates rotten partition Date: Tue, 7 Apr 2009 15:12:07 -0400 --000e0cd20c568149aa0466fbca32 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit I have reproduced this bug and there is no problem with mkfs.ext2. mkfs.ext2 has been updated to create partitions having default inode size of 256 bytes which is not supported by ext2fs 7.1-RELEASE (it supports only 128 bytes). This problem is similar to kern/124621, kern/125536 and kern/128173. I'm attaching my mkfs.ext2 logs, dump of the file system and a patch that has been committed to 8.0-CURRENT to fix this problem. 
-- 
Cheers,
Aditya Sarawgi

--000e0cd20c568149aa0466fbca32
Content-Type: text/plain; name="mkfslog.txt"
Content-Disposition: attachment; filename="mkfslog.txt"
X-Attachment-Id: f_ft9j8smy0

Filesystem label=
OS type: FreeBSD
Block size=4096 (log=2)
Fragment size=4096 (log=2)
245280 inodes, 979960 blocks
48998 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1006632960
30 block groups
32768 blocks per group, 32768 fragments per group
8176 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

--000e0cd20c568149aa0466fbca32
Content-Type: text/plain; charset=US-ASCII; name="dump.txt"
Content-Disposition: attachment; filename="dump.txt"
X-Attachment-Id: f_ft9j94q71

Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          c1658ff3-288c-469f-ba6d-dcd29e70e5d1
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       FreeBSD
Inode count:              245280
Block count:              979960
Reserved block count:     48998
Free blocks:              962636
Free inodes:              245269
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      239
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8176
Inode blocks per group:   511
Filesystem created:       Tue Apr  7 23:14:02 2009
Last mount time:          n/a
Last write time:          Tue Apr  7 23:14:52 2009
Mount count:              0
Maximum mount count:      22
Last checked:             Tue Apr  7 23:14:02 2009
Check interval:           15552000 (6 months)
Next check after:         Sun Oct  4 23:14:02 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group wheel)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      1dfec28a-c0a2-472f-821d-55445e0b8130


Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-1
  Reserved GDT blocks at 2-240
  Block bitmap at 241 (+241), Inode bitmap at 242 (+242)
  Inode table at 243-753 (+243)
  32008 free blocks, 8165 free inodes, 2 directories
  Free blocks: 760-32767
  Free inodes: 12-8176
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32769
  Reserved GDT blocks at 32770-33008
  Block bitmap at 33009 (+241), Inode bitmap at 33010 (+242)
  Inode table at 33011-33521 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 33522-65535
  Free inodes: 8177-16352
Group 2: (Blocks 65536-98303)
  Block bitmap at 65536 (+0), Inode bitmap at 65537 (+1)
  Inode table at 65538-66048 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 66049-98303
  Free inodes: 16353-24528
Group 3: (Blocks 98304-131071)
  Backup superblock at 98304, Group descriptors at 98305-98305
  Reserved GDT blocks at 98306-98544
  Block bitmap at 98545 (+241), Inode bitmap at 98546 (+242)
  Inode table at 98547-99057 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 99058-131071
  Free inodes: 24529-32704
Group 4: (Blocks 131072-163839)
  Block bitmap at 131072 (+0), Inode bitmap at 131073 (+1)
  Inode table at 131074-131584 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 131585-163839
  Free inodes: 32705-40880
Group 5: (Blocks 163840-196607)
  Backup superblock at 163840, Group descriptors at 163841-163841
  Reserved GDT blocks at 163842-164080
  Block bitmap at 164081 (+241), Inode bitmap at 164082 (+242)
  Inode table at 164083-164593 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 164594-196607
  Free inodes: 40881-49056
Group 6: (Blocks 196608-229375)
  Block bitmap at 196608 (+0), Inode bitmap at 196609 (+1)
  Inode table at 196610-197120 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 197121-229375
  Free inodes: 49057-57232
Group 7: (Blocks 229376-262143)
  Backup superblock at 229376, Group descriptors at 229377-229377
  Reserved GDT blocks at 229378-229616
  Block bitmap at 229617 (+241), Inode bitmap at 229618 (+242)
  Inode table at 229619-230129 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 230130-262143
  Free inodes: 57233-65408
Group 8: (Blocks 262144-294911)
  Block bitmap at 262144 (+0), Inode bitmap at 262145 (+1)
  Inode table at 262146-262656 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 262657-294911
  Free inodes: 65409-73584
Group 9: (Blocks 294912-327679)
  Backup superblock at 294912, Group descriptors at 294913-294913
  Reserved GDT blocks at 294914-295152
  Block bitmap at 295153 (+241), Inode bitmap at 295154 (+242)
  Inode table at 295155-295665 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 295666-327679
  Free inodes: 73585-81760
Group 10: (Blocks 327680-360447)
  Block bitmap at 327680 (+0), Inode bitmap at 327681 (+1)
  Inode table at 327682-328192 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 328193-360447
  Free inodes: 81761-89936
Group 11: (Blocks 360448-393215)
  Block bitmap at 360448 (+0), Inode bitmap at 360449 (+1)
  Inode table at 360450-360960 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 360961-393215
  Free inodes: 89937-98112
Group 12: (Blocks 393216-425983)
  Block bitmap at 393216 (+0), Inode bitmap at 393217 (+1)
  Inode table at 393218-393728 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 393729-425983
  Free inodes: 98113-106288
Group 13: (Blocks 425984-458751)
  Block bitmap at 425984 (+0), Inode bitmap at 425985 (+1)
  Inode table at 425986-426496 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 426497-458751
  Free inodes: 106289-114464
Group 14: (Blocks 458752-491519)
  Block bitmap at 458752 (+0), Inode bitmap at 458753 (+1)
  Inode table at 458754-459264 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 459265-491519
  Free inodes: 114465-122640
Group 15: (Blocks 491520-524287)
  Block bitmap at 491520 (+0), Inode bitmap at 491521 (+1)
  Inode table at 491522-492032 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 492033-524287
  Free inodes: 122641-130816
Group 16: (Blocks 524288-557055)
  Block bitmap at 524288 (+0), Inode bitmap at 524289 (+1)
  Inode table at 524290-524800 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 524801-557055
  Free inodes: 130817-138992
Group 17: (Blocks 557056-589823)
  Block bitmap at 557056 (+0), Inode bitmap at 557057 (+1)
  Inode table at 557058-557568 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 557569-589823
  Free inodes: 138993-147168
Group 18: (Blocks 589824-622591)
  Block bitmap at 589824 (+0), Inode bitmap at 589825 (+1)
  Inode table at 589826-590336 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 590337-622591
  Free inodes: 147169-155344
Group 19: (Blocks 622592-655359)
  Block bitmap at 622592 (+0), Inode bitmap at 622593 (+1)
  Inode table at 622594-623104 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 623105-655359
  Free inodes: 155345-163520
Group 20: (Blocks 655360-688127)
  Block bitmap at 655360 (+0), Inode bitmap at 655361 (+1)
  Inode table at 655362-655872 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 655873-688127
  Free inodes: 163521-171696
Group 21: (Blocks 688128-720895)
  Block bitmap at 688128 (+0), Inode bitmap at 688129 (+1)
  Inode table at 688130-688640 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 688641-720895
  Free inodes: 171697-179872
Group 22: (Blocks 720896-753663)
  Block bitmap at 720896 (+0), Inode bitmap at 720897 (+1)
  Inode table at 720898-721408 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 721409-753663
  Free inodes: 179873-188048
Group 23: (Blocks 753664-786431)
  Block bitmap at 753664 (+0), Inode bitmap at 753665 (+1)
  Inode table at 753666-754176 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 754177-786431
  Free inodes: 188049-196224
Group 24: (Blocks 786432-819199)
  Block bitmap at 786432 (+0), Inode bitmap at 786433 (+1)
  Inode table at 786434-786944 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 786945-819199
  Free inodes: 196225-204400
Group 25: (Blocks 819200-851967)
  Backup superblock at 819200, Group descriptors at 819201-819201
  Reserved GDT blocks at 819202-819440
  Block bitmap at 819441 (+241), Inode bitmap at 819442 (+242)
  Inode table at 819443-819953 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 819954-851967
  Free inodes: 204401-212576
Group 26: (Blocks 851968-884735)
  Block bitmap at 851968 (+0), Inode bitmap at 851969 (+1)
  Inode table at 851970-852480 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 852481-884735
  Free inodes: 212577-220752
Group 27: (Blocks 884736-917503)
  Backup superblock at 884736, Group descriptors at 884737-884737
  Reserved GDT blocks at 884738-884976
  Block bitmap at 884977 (+241), Inode bitmap at 884978 (+242)
  Inode table at 884979-885489 (+243)
  32014 free blocks, 8176 free inodes, 0 directories
  Free blocks: 885490-917503
  Free inodes: 220753-228928
Group 28: (Blocks 917504-950271)
  Block bitmap at 917504 (+0), Inode bitmap at 917505 (+1)
  Inode table at 917506-918016 (+2)
  32255 free blocks, 8176 free inodes, 0 directories
  Free blocks: 918017-950271
  Free inodes: 228929-237104
Group 29: (Blocks 950272-979959)
  Block bitmap at 950272 (+0), Inode bitmap at 950273 (+1)
  Inode table at 950274-950784 (+2)
  29175 free blocks, 8176 free inodes, 0 directories
  Free blocks: 950785-979959
  Free inodes: 237105-245280

--000e0cd20c568149aa0466fbca32
Content-Type: text/plain; charset=US-ASCII; name="ext2fs.diff.txt"
Content-Disposition: attachment; filename="ext2fs.diff.txt"
X-Attachment-Id: f_ft9j9ez32

diff -ud ext2fs.orig/ext2_fs.h ext2fs/ext2_fs.h
--- ext2fs.orig/ext2_fs.h	2005-06-16 06:51:38.000000000 +0000
+++ ext2fs/ext2_fs.h	2008-09-03 14:10:27.000000000 +0000
@@ -150,7 +150,7 @@
 #else /* !notyet */
 #define	EXT2_INODES_PER_BLOCK(s)	((s)->s_inodes_per_block)
 /* Should be sizeof(struct ext2_inode): */
-#define EXT2_INODE_SIZE			128
+#define EXT2_INODE_SIZE(s)		((s)->s_es->s_inode_size)
 #define EXT2_FIRST_INO			11
 #endif /* notyet */
 
diff -ud ext2fs.orig/ext2_inode.c ext2fs/ext2_inode.c
--- ext2fs.orig/ext2_inode.c	2006-09-26 04:15:58.000000000 +0000
+++ ext2fs/ext2_inode.c	2008-09-03 13:54:49.000000000 +0000
@@ -91,7 +91,7 @@
 		return (error);
 	}
 	ext2_i2ei(ip, (struct ext2_inode *)((char *)bp->b_data +
-	    EXT2_INODE_SIZE * ino_to_fsbo(fs, ip->i_number)));
+	    EXT2_INODE_SIZE(fs) * ino_to_fsbo(fs, ip->i_number)));
 	if (waitfor && (vp->v_mount->mnt_kern_flag & MNTK_ASYNC) == 0)
 		return (bwrite(bp));
 	else {
diff -ud ext2fs.orig/ext2_vfsops.c ext2fs/ext2_vfsops.c
--- ext2fs.orig/ext2_vfsops.c	2008-04-03 18:51:13.000000000 +0000
+++ ext2fs/ext2_vfsops.c	2008-09-03 13:55:37.000000000 +0000
@@ -424,7 +424,7 @@
     V(s_frags_per_group)
     fs->s_inodes_per_group = es->s_inodes_per_group;
     V(s_inodes_per_group)
-    fs->s_inodes_per_block = fs->s_blocksize / EXT2_INODE_SIZE;
+    fs->s_inodes_per_block = fs->s_blocksize / EXT2_INODE_SIZE(fs);
     V(s_inodes_per_block)
     fs->s_itb_per_group = fs->s_inodes_per_group /fs->s_inodes_per_block;
     V(s_itb_per_group)
@@ -578,7 +578,7 @@
 			return (error);
 		}
 		ext2_ei2i((struct ext2_inode *) ((char *)bp->b_data +
-		    EXT2_INODE_SIZE * ino_to_fsbo(fs, ip->i_number)), ip);
+		    EXT2_INODE_SIZE(fs) * ino_to_fsbo(fs, ip->i_number)), ip);
 		brelse(bp);
 		VOP_UNLOCK(vp, 0, td);
 		vrele(vp);
@@ -1013,7 +1013,7 @@
 		return (error);
 	}
 	/* convert ext2 inode to dinode */
-	ext2_ei2i((struct ext2_inode *) ((char *)bp->b_data + EXT2_INODE_SIZE *
+	ext2_ei2i((struct ext2_inode *) ((char *)bp->b_data + EXT2_INODE_SIZE(fs) *
 			ino_to_fsbo(fs, ino)), ip);
 	ip->i_block_group = ino_to_cg(fs, ino);
 	ip->i_next_alloc_block = 0;

--000e0cd20c568149aa0466fbca32--

From sarawgi.aditya at gmail.com Tue Apr 7 12:40:07 2009
From: sarawgi.aditya at gmail.com (aditya sarawgi)
Date: Tue Apr 7 12:40:14 2009
Subject: kern/131086 : [ext2fs] mkfs.ext2 creates rotten partition
Message-ID: <200904071940.n37Je6Dp046446@freefall.freebsd.org>

The following reply was made to PR kern/131086; it has been noted by GNATS.

From: aditya sarawgi
To: bug-followup@FreeBSD.org, estellnb@gmail.com
Cc:
Subject: Re: kern/131086 : [ext2fs] mkfs.ext2 creates rotten partition
Date: Tue, 7 Apr 2009 15:33:16 -0400

Sorry for all the mess. gmail is screwing up the attachments.

Filesystem label=
OS type: FreeBSD
Block size=4096 (log=2)
Fragment size=4096 (log=2)
245280 inodes, 979960 blocks
48998 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1006632960
30 block groups
32768 blocks per group, 32768 fragments per group
8176 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
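The effect of the inode size on the on-disk layout can be checked with a little arithmetic. The sketch below is not the ext2fs driver code; the helper names `sk_itb_per_group` and `sk_inode_offset` are made up for illustration. It uses the 4096-byte block size and 8176 inodes per group from the mkfs output above, together with the 256-byte and 128-byte inode sizes named in the bug analysis.

```c
#include <stdint.h>

/*
 * Inode-table blocks needed per group for a given inode size.
 * Returns 0 for sizes that cannot fit in a block.
 */
static uint32_t
sk_itb_per_group(uint32_t inodes_per_group, uint32_t block_size,
    uint32_t inode_size)
{
	uint32_t per_block;

	if (inode_size == 0 || inode_size > block_size)
		return (0);
	per_block = block_size / inode_size;
	/* Round up so a partly filled last block is still counted. */
	return ((inodes_per_group + per_block - 1) / per_block);
}

/*
 * Byte offset of an inode within its group's inode table.
 * Inode numbers start at 1, hence the "ino - 1".
 */
static uint64_t
sk_inode_offset(uint32_t ino, uint32_t inodes_per_group,
    uint32_t inode_size)
{
	uint32_t index = (ino - 1) % inodes_per_group;

	return ((uint64_t)index * inode_size);
}
```

With 256-byte inodes, 16 fit in a block and a group needs 8176 / 16 = 511 inode-table blocks, which agrees with the "Inode blocks per group: 511" figure that dumpe2fs reports for this filesystem. A reader that assumes 128 bytes would look for inode 11 at byte 1280 of the table instead of byte 2560, so it would land in the middle of the wrong inode.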

Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          c1658ff3-288c-469f-ba6d-dcd29e70e5d1
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       FreeBSD
Inode count:              245280
Block count:              979960
Reserved block count:     48998
Free blocks:              962636
Free inodes:              245269
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      239
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8176
Inode blocks per group:   511
Filesystem created:       Tue Apr  7 23:14:02 2009
Last mount time:          n/a
Last write time:          Tue Apr  7 23:14:52 2009
Mount count:              0
Maximum mount count:      22
Last checked:             Tue Apr  7 23:14:02 2009
Check interval:           15552000 (6 months)
Next check after:         Sun Oct  4 23:14:02 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group wheel)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      1dfec28a-c0a2-472f-821d-55445e0b8130

and here's the patch
http://pflog.net/~floyd/ext2fs.diff

-- 
Cheers,
Aditya Sarawgi

From hali at datapipe.net Wed Apr 8 20:18:52 2009
From: hali at datapipe.net (Hussain Ali)
Date: Wed Apr 8 20:18:59 2009
Subject: ZFSKnownProblems - needs revision?
Message-ID: <20090409031851.GE6052@datapipe.com>

> Ivan Voras wrote:
>
>
>>* Are the issues on the list still there?
>>* Are there any new issues?
>>* Is somebody running ZFS in production (non-trivial loads) with
>>success? What architecture / RAM / load / applications used?
>>* How is your memory load?
(does it leave enough memory for other
>>services)

I have a storage server; it's constantly doing heavy writing, and reading at times, though not with high concurrency:

# df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/ufs/rootfs     19G    395M     17G     2%    /
devfs              1.0K    1.0K      0B   100%    /dev
/dev/ufs/tmp       4.8G     20K    4.5G     0%    /tmp
/dev/ufs/usr        19G    2.9G     15G    16%    /usr
/dev/ufs/var        15G    7.5G    5.8G    56%    /var
backupstorage       94T     80T     13T    86%    /backupstorage

# cat /etc/sysctl.conf
# $Id: sysctl.conf,v 1.3 2009/04/09 03:06:31 hali Exp root $
security.bsd.see_other_uids=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.icmplim=50
net.inet.tcp.sendspace=524288
net.inet.tcp.recvspace=524288
net.inet.ip.intr_queue_maxlen=2048
net.inet.ip.intr_queue_drops=4096
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=8096
kern.maxfiles=443808
vfs.hirunningspace=4194304
vfs.ufs.dirhash_maxmem=4194304
vfs.lookup_shared=1

# cat /boot/loader.conf
# $Id: loader.conf,v 1.4 2009/04/09 03:07:40 hali Exp root $
isp_load="YES"
ispfw_load="YES"
isp_2400_load="NO"
vm.kmem_size_max="1073741824"
vm.kmem_size="1073741824"
vfs.zfs.prefetch_disable=1
vfs.zfs.arc_max="786M"
kern.maxvnodes="50000"

# zpool iostat 3
                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
backupstorage  80.4T  14.9T      2    355   186K  38.7M
backupstorage  80.4T  14.9T      0    316      0  31.7M
backupstorage  80.4T  14.9T      0     99      0  12.2M
backupstorage  80.4T  14.9T      0    164      0  15.7M
backupstorage  80.4T  14.9T      0    225      0  22.0M

I have another one in another DC, but with less capacity:

backupstorage       56T     32T     24T    57%    /backupstorage

Both are the following:
HP ProLiant DL385 G2
8GB RAM
dual dual-core AMD 2.2GHz CPUs
3 x Nexsan SataBEAST for the SAN.

Inbound is about 900Mb/s; uptime has generally been 3-4 months before increasing the ZFS ARC/KVM sizes. I should just max it out, but it's relatively stable. Am looking for ZFSv8+ for L2ARC and a separate ZIL. Load averages about 1.0.
My wish list would be KVM support fot 64GB ARC, fusion-IO driver support for the ZIL, version 8 of ZFS in FreeBSD 7.2, active multipath, etc, etc.. It works, its stable, its production, but its not like i am cvsuping ports tree 100 times concurrently. -- -hussain This message may contain confidential or privileged information. If you are not the intended recipient, please advise us immediately and delete this message. See http://www.datapipe.com/emaildisclaimer.aspx for further information on confidentiality and the risks of non-secure electronic communication. If you cannot access these links, please notify us by reply message and we will send the contents to you. From mailing at gaturkey.com Thu Apr 9 00:43:24 2009 From: mailing at gaturkey.com (Global Access Travel) Date: Thu Apr 9 00:44:00 2009 Subject: Private Shore Excursions-Turkey Message-ID: [http://www.turkeycalling.us] PRIVATE SHORE EXCURSIONS- TURKEY Your cruise clients will make the best of their time in Turkey on a private shore excursion! Istanbul Kusadasi & Ephesus [mailto:incoming@gaturkey.com?subject=Private Shore Excursions- Turkey] **************************************************************************** Yasal Uyar?; Bu e-posta, sadece adreste belirtilen kisi veya kurulusun kullanimini hedeflemekte olup,mesajda yer alan bilgiler kisiye ozel ve gizli olabilir, yasalar ya da anlasmalar geregi ?c?nc? kisiler ile paylasilmasi m?mk?n olmayabilir.Mesaji alan kisi, mesajin g?nderilmek istendigi kisi veya kurulus degilse,bu mesaji yaymak,dagitmak veya kopyalamak yasaktir Mesaj tarafiniza yanlislikla ulasmissa l?tfen mesaji geri g?nderiniz ve sisteminizden siliniz. Global Turizm Hizmetleri Anonim Sirketi bu mesajin icerigi ile ilgili olarak hicbir hukuksal sorumlulugu kabul etmez. 

From brucec at FreeBSD.org Thu Apr 9 03:03:11 2009
From: brucec at FreeBSD.org (brucec@FreeBSD.org)
Date: Thu Apr 9 03:03:17 2009
Subject: kern/129174: [nfs] [zfs] [panic] NFS v3 Panic when under high load exporting ZFS file system
Message-ID: <200904091003.n39A3APQ057478@freefall.freebsd.org>

Synopsis: [nfs] [zfs] [panic] NFS v3 Panic when under high load exporting ZFS file system

State-Changed-From-To: open->closed
State-Changed-By: brucec
State-Changed-When: Thu Apr 9 10:02:17 UTC 2009
State-Changed-Why:
Duplicate of kern/132068

http://www.freebsd.org/cgi/query-pr.cgi?pr=129174

From rmacklem at uoguelph.ca Thu Apr 9 11:58:23 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Thu Apr 9 11:58:30 2009
Subject: integrating nfsv4 locking with nlm and local locking
Message-ID:

My nfsv4 server currently does VOP_ADVLOCK() with the non-blocking F_SETLK type and I had thought that was sufficient, but I now realize (thanks to a recent post by Zachary Loafman) that this breaks when a delegation for the file is issued to a client. (When a delegation for a file is issued to a client, it can do byte range locking locally, and the server doesn't know about these to do VOP_ADVLOCK() on the server machine.)

I believe that Zachary would like to discuss a more general solution, including how to handle Open/Share locks, but in the meantime I'd like to solve this specific case in as simple a way as possible.

Basically, I need a way to make sure delegations for a file don't exist when local byte range locking or locking via the NLM is being done on the file. The simplest thing I can think of is the following:

When VOP_ADVLOCK() is called for a file (outside of the nfsv4 server), do two things:
1 - Make sure any outstanding delegations are recalled. I already have a
    function that does this, so it is a matter of figuring out where to
    put the call(s).
2 - Set a flag on the vnode, so that my nfsv4 server knows not to issue
    another delegation for that file.
    (I could test for locks via VOP_ADVLOCK() before issuing a delegation,
    but that has two problems.)
    1 - Since the vnode is unlocked for VOP_ADVLOCK(), there could be a
        race where the nfsv4 server issues a delegation between the time
        outstanding delegations are recalled at #1 above and the
        VOP_ADVLOCK() sets the lock that I would see during the test.
    2 - It would have to keep checking for a lock and might issue a
        delegation at a point where no lock is held, but one will be
        acquired soon, forcing the delegation recall. (It's much easier
        to not issue a delegation than recall one.)

Once this flag is set, I think it would be ok if the flag remains set until the vnode is recycled, since it seems fairly likely that, once byte range locking is done on a file, more will happen.

(If people were agreeable to the vnode flag, it looks like a VV_xxx flag would make more sense than a VI_xxx one. I think an atomic_set_int() would be sufficient to set it, even though the vnode lock isn't held?)

So, how does this sound?
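
A minimal userland sketch of the flag scheme described above, under stated assumptions: the names VV_NODELEG_SKETCH, struct vnode_sketch, advlock_prepare() and try_issue_delegation() are hypothetical and appear nowhere in the post or in the kernel, C11 atomics stand in for atomic_set_int(), and a plain counter stands in for the real delegation recall. The sketch sets the flag before the recall, which is a choice made here so that no new delegation can slip in while the recall runs; the post does not say which order the real code would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the post only proposes "a VV_xxx flag". */
#define VV_NODELEG_SKETCH 0x1u

/* Toy stand-in for a vnode: one flag word and a delegation count. */
struct vnode_sketch {
	atomic_uint vflag;	/* models the vnode flag word */
	int delegations;	/* outstanding delegations (toy model) */
};

/*
 * Models what would run when VOP_ADVLOCK() is called outside the
 * nfsv4 server: mark the vnode, then recall what is outstanding.
 * The flag is never cleared, matching "remains set until the vnode
 * is recycled".
 */
void
advlock_prepare(struct vnode_sketch *vp)
{
	assert(vp != NULL);
	atomic_fetch_or(&vp->vflag, VV_NODELEG_SKETCH);
	vp->delegations = 0;	/* stands in for the recall function */
}

/*
 * Models the server side: refuse to issue a delegation once the flag
 * is set. This toy is single-threaded; a real server would still have
 * to serialize the check with the act of issuing.
 */
bool
try_issue_delegation(struct vnode_sketch *vp)
{
	assert(vp != NULL);
	if (atomic_load(&vp->vflag) & VV_NODELEG_SKETCH)
		return (false);
	vp->delegations++;
	return (true);
}
```

Because the flag is only ever set, a single atomic OR is enough on the setting side, which is the point of the atomic_set_int() remark above.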

rick

From jh at saunalahti.fi Fri Apr 10 04:50:05 2009
From: jh at saunalahti.fi (Jaakko Heinonen)
Date: Fri Apr 10 04:50:15 2009
Subject: kern/132068: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64
Message-ID: <200904101150.n3ABo4Lg066334@freefall.freebsd.org>

The following reply was made to PR kern/132068; it has been noted by GNATS.

From: Jaakko Heinonen
To: Edward Fisk <7ogcg7g02@sneakemail.com>
Cc: bug-followup@FreeBSD.org, Weldon Godfrey
Subject: Re: kern/132068: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64
Date: Fri, 10 Apr 2009 14:46:02 +0300

On 2009-03-26, Jaakko Heinonen wrote:
> I now know what is going on. The vnode may be reclaimed during
> zfs_zget() because it doesn't hold the vnode lock (except when a new
> znode is created).

OK, I have now put together a patch which should avoid the original panic you reported. The same panic was also reported by Weldon Godfrey (Cc'd) on -fs:

http://lists.freebsd.org/pipermail/freebsd-fs/2008-August/004998.html

--- patch begins here ---
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(revision 190593)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(working copy)
@@ -890,8 +890,9 @@ again:
 		if (zp->z_unlinked) {
 			err = ENOENT;
 		} else {
-			if (ZTOV(zp) != NULL)
-				VN_HOLD(ZTOV(zp));
+			vp = ZTOV(zp);
+			if (vp != NULL)
+				VN_HOLD(vp);
 			else {
 				if (first) {
 					ZFS_LOG(1, "dying znode detected (zp=%p)", zp);
@@ -907,12 +908,25 @@ again:
 				tsleep(zp, 0, "zcollide", 1);
 				goto again;
 			}
-			*zpp = zp;
 			err = 0;
 		}
 		dmu_buf_rele(db, NULL);
 		mutex_exit(&zp->z_lock);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
+		if (err == 0) {
+			/*
+			 * Check if we lost the race against reclaim.
+			 */
+			VI_LOCK(vp);
+			if (vp->v_iflag & VI_DOOMED) {
+				VI_UNLOCK(vp);
+				VN_RELE(vp);
+				ZFS_LOG(1, "doomed vnode detected (zp=%p)", zp);
+				goto again;
+			}
+			VI_UNLOCK(vp);
+			*zpp = zp;
+		}
 		return (err);
 	}
--- patch ends here ---

The fix isn't perfect. Vnodes may still be reclaimed during zfs_zget() in the forced unmount case. However, zfs doesn't support forced unmounts at all right now. The patch is against 8.0-CURRENT.

-- 
Jaakko

From rsofia at poly.edu Fri Apr 10 07:50:07 2009
From: rsofia at poly.edu (Randy Sofia)
Date: Fri Apr 10 07:50:20 2009
Subject: kern/132337: [zfs] [panic] kernel panic in zfs_fuid_create_cred
Message-ID: <200904101450.n3AEo61f008067@freefall.freebsd.org>

The following reply was made to PR kern/132337; it has been noted by GNATS.

From: Randy Sofia
To: bug-followup@FreeBSD.org, freebsd@r.zeeb.org
Cc:
Subject: Re: kern/132337: [zfs] [panic] kernel panic in zfs_fuid_create_cred
Date: Fri, 10 Apr 2009 10:27:09 -0400

Touching a file (`touch hi`) in the mode 777 directory over nfs is the easiest and most reliable way to reproduce this panic.

From linimon at FreeBSD.org Sat Apr 11 14:41:06 2009
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Sat Apr 11 14:41:12 2009
Subject: kern/133614: [smbfs] [panic] panic: ffs_truncate: read-only filesystem
Message-ID: <200904112141.n3BLf5pd060120@freefall.freebsd.org>

Old Synopsis: panic: ffs_truncate: read-only filesystem
New Synopsis: [smbfs] [panic] panic: ffs_truncate: read-only filesystem

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Apr 11 21:40:22 UTC 2009
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=133614

From james-freebsd-fs2 at jrv.org Sat Apr 11 20:52:07 2009
From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen)
Date: Sat Apr 11 20:52:14 2009
Subject: turning off ZFS mountpoint property behavior?
Message-ID: <49E16021.6040900@jrv.org>

Is there a knob to turn off ZFS's mounting of filesystems based on the mountpoint property? It is most unhelpful when receiving replicas of filesystems to have a received snapshot suddenly mounted over /usr.

I have two systems, "prime" and "subprime", both of which have a large ZFS pool and a small UFS partition for maintenance. They are essentially the same except that /boot/loader.conf boots one into ZFS and the other into UFS.

"prime" is the operational server using ZFS. "subprime" is essentially a hot spare booting UFS whose ZFS pool is to be kept in sync with the pool on "prime" via zfs send/recv replication. Should the pool on "prime" fail, /boot/loader.conf on "subprime" can be changed to boot its ZFS pool and the server is quickly available again, at the last snapshot replicated.

Unfortunately, when zfs recv runs and receives a filesystem with property mountpoint=/usr, it mounts that filesystem there. That's not desirable in my situation nor, I suspect, in many others.

Is there a sysctl or some other way to disable the automatic mount behavior?

From rmacklem at uoguelph.ca Sun Apr 12 13:06:19 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Sun Apr 12 13:30:02 2009
Subject: changing semantics of the va_filerev (code review)
Message-ID:

In summary, the nfsv4 server needs 3 changes to the FreeBSD kernel:
1 - Sharing of nfssvc(). (This was just checked into FreeBSD-CURRENT.)
2 - Some calls that recall delegations must be done before local
    VOP_RENAME() and VOP_ADVLOCK(). I am waiting for comments to a vague
    post on this before I mail my first stab at coding this.
3 - Support for the Change attribute, which is what this post is about.

Once the above 3 things are resolved, the code should drop in without further changes outside of its subtree.

As background, I believe va_filerev/i_modrev was added for nqnfs long long ago.
Since it is not exposed to userland via the stat structure, I don't believe anything outside of the kernel uses it. Inside the kernel, the only thing that currently uses it is the nfs server, which uses it as the cookie verifier. (It really doesn't use it, since a client regurgitates it back to the server as opaque bits in the next readdir rpc and the server then ignores those bits. This is correct, since va_filerev is a bogus cookie verifier.) As such, I don't believe changing the semantics of va_filerev will break anything in FreeBSD.

I'd like to change the semantics of va_filerev so that it can be used by the nfsv4 server as the Change attribute. To do this, it needs to change in 2 ways:
- must change upon metadata changes as well as data changes
- must persist across server reboots (ie. be moved to spare space in
  the on-disk i-node instead of in memory i-node)

Here is the patch to ufs for the above, that I have been using for some time. Please review and comment.

Thanks, rick

--- ufs patch to change va_filerev semantics ---
--- ufs/ufs/inode.h.sav	2009-04-12 02:29:05.000000000 -0400
+++ ufs/ufs/inode.h	2009-03-20 12:18:20.000000000 -0400
@@ -74,7 +74,6 @@
 	struct	 fs *i_fs;	/* Associated filesystem superblock. */
 	struct	 dquot *i_dquot[MAXQUOTAS]; /* Dquot structures. */
-	u_quad_t i_modrev;	/* Revision level for NFS lease. */
 	/*
 	 * Side effects; used during directory lookup.
 	 */
--- ufs/ufs/dinode.h.sav	2009-04-12 02:29:40.000000000 -0400
+++ ufs/ufs/dinode.h	2008-08-25 17:31:55.000000000 -0400
@@ -145,7 +145,8 @@
 	ufs2_daddr_t	di_extb[NXADDR];/* 96: External attributes block. */
 	ufs2_daddr_t	di_db[NDADDR];	/* 112: Direct disk blocks. */
 	ufs2_daddr_t	di_ib[NIADDR];	/* 208: Indirect disk blocks. */
-	int64_t		di_spare[3];	/* 232: Reserved; currently unused */
+	u_int64_t	di_modrev;	/* 232: i_modrev for NFSv4 */
+	int64_t		di_spare[2];	/* 240: Reserved; currently unused */
 };

 /*
@@ -183,7 +184,7 @@
 	int32_t		di_gen;		/* 108: Generation number.
*/
 	u_int32_t	di_uid;		/* 112: File owner. */
 	u_int32_t	di_gid;		/* 116: File group. */
-	int32_t		di_spare[2];	/* 120: Reserved; currently unused */
+	u_int64_t	di_modrev;	/* 120: i_modrev for NFSv4 */
 };
 #define	di_ogid		di_u.oldids[1]
 #define	di_ouid		di_u.oldids[0]
--- ufs/ufs/ufs_vnops.c.sav	2009-04-12 02:28:41.000000000 -0400
+++ ufs/ufs/ufs_vnops.c	2009-03-10 16:47:11.000000000 -0400
@@ -157,11 +157,12 @@
 	if (ip->i_flag & IN_UPDATE) {
 		DIP_SET(ip, i_mtime, ts.tv_sec);
 		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
-		ip->i_modrev++;
+		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
 	}
 	if (ip->i_flag & IN_CHANGE) {
 		DIP_SET(ip, i_ctime, ts.tv_sec);
 		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
+		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
 	}

  out:
@@ -446,6 +447,7 @@
 		vap->va_ctime.tv_sec = ip->i_din1->di_ctime;
 		vap->va_ctime.tv_nsec = ip->i_din1->di_ctimensec;
 		vap->va_bytes = dbtob((u_quad_t)ip->i_din1->di_blocks);
+		vap->va_filerev = ip->i_din1->di_modrev;
 	} else {
 		vap->va_rdev = ip->i_din2->di_rdev;
 		vap->va_size = ip->i_din2->di_size;
@@ -456,12 +458,12 @@
 		vap->va_birthtime.tv_sec = ip->i_din2->di_birthtime;
 		vap->va_birthtime.tv_nsec = ip->i_din2->di_birthnsec;
 		vap->va_bytes = dbtob((u_quad_t)ip->i_din2->di_blocks);
+		vap->va_filerev = ip->i_din2->di_modrev;
 	}
 	vap->va_flags = ip->i_flags;
 	vap->va_gen = ip->i_gen;
 	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
 	vap->va_type = IFTOVT(ip->i_mode);
-	vap->va_filerev = ip->i_modrev;
 	return (0);
 }

@@ -2223,7 +2225,6 @@
 	ASSERT_VOP_LOCKED(vp, "ufs_vinit");
 	if (ip->i_number == ROOTINO)
 		vp->v_vflag |= VV_ROOT;
-	ip->i_modrev = init_va_filerev();
 	*vpp = vp;
 	return (0);
 }

From tcberner at gmail.com Sun Apr 12 18:28:12 2009
From: tcberner at gmail.com (Tobias C. Berner)
Date: Sun Apr 12 19:22:58 2009
Subject: zfs and moving devices
Message-ID:

Hi

I have a zfs pool

	NAME        STATE     READ WRITE CKSUM
	multimedia  ONLINE       0     0     0
	  ad8       ONLINE       0     0     0
	  ad10      ONLINE       0     0     0
	  ad12      ONLINE       0     0     0
	  ad14      ONLINE       0     0     0

Now, I need more SATA connectors.
If I activate another onboard controller, the device names move:

   ad8  -> ad14
   ad10 -> ad16
   ad12 -> ad18
   ad14 -> ad20

What is the proper way to handle this in zfs?

thanks, Tobias

From dimitar.vassilev at gmail.com Sun Apr 12 21:23:31 2009
From: dimitar.vassilev at gmail.com (Dimitar Vasilev)
Date: Sun Apr 12 22:01:01 2009
Subject: zfs and moving devices
In-Reply-To:
References:
Message-ID: <59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>

2009/4/13 Tobias C. Berner :
> Hi
>
> I have a zfs pool
>
>        NAME        STATE     READ WRITE CKSUM
>        multimedia  ONLINE       0     0     0
>          ad8       ONLINE       0     0     0
>          ad10      ONLINE       0     0     0
>          ad12      ONLINE       0     0     0
>          ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>   ad8  -> ad14
>   ad10 -> ad16
>   ad12 -> ad18
>   ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?
>
>
> thanks, Tobias
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

There was an option for ata_static_id's in $KERNCONF - you need to enable this to keep the SATA device names from shifting. I don't remember the exact magic incantation - it should be somewhere in LINT/hint/GENERIC. It should resemble something like ATA_STATIC_ID.

Cheers,
Dimitar

From gpalmer at freebsd.org Sun Apr 12 22:05:51 2009
From: gpalmer at freebsd.org (Gary Palmer)
Date: Sun Apr 12 23:04:53 2009
Subject: zfs and moving devices
In-Reply-To: <59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>
References: <59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>
Message-ID: <20090413050550.GA44022@in-addr.com>

On Mon, Apr 13, 2009 at 06:54:12AM +0300, Dimitar Vasilev wrote:
> 2009/4/13 Tobias C.
Berner :
> > Hi
> >
> > I have a zfs pool
> >
> >        NAME        STATE     READ WRITE CKSUM
> >        multimedia  ONLINE       0     0     0
> >          ad8       ONLINE       0     0     0
> >          ad10      ONLINE       0     0     0
> >          ad12      ONLINE       0     0     0
> >          ad14      ONLINE       0     0     0
> >
> > Now, I need more sata-connecters. If I activate
> > an other onboard-controller, the device names
> > move:
> >
> >   ad8  -> ad14
> >   ad10 -> ad16
> >   ad12 -> ad18
> >   ad14 -> ad20
> >
> >
> > What is the proper way to handle this in zfs?
> >
> >
> > thanks, Tobias
> >
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >
>
> There was an option for ata_static_id's in $KERNCONF - you need to
> enable this to keep the sata from shifting.Don't remember the exact
> magic instance - should be somewhere in LINT/hint/GENERIC.
> Should resemble something like ATA_STATIC_ID.

% grep STATIC /sys/i386/conf/GENERIC
options 	ATA_STATIC_ID	# Static device numbering

Regards,
Gary

From chris at young-alumni.com Sun Apr 12 22:22:01 2009
From: chris at young-alumni.com (Chris Ruiz)
Date: Sun Apr 12 23:14:55 2009
Subject: zfs and moving devices
In-Reply-To:
References:
Message-ID:

On Apr 12, 2009, at 8:03 PM, Tobias C. Berner wrote:

> Hi
>
> I have a zfs pool
>
>        NAME        STATE     READ WRITE CKSUM
>        multimedia  ONLINE       0     0     0
>          ad8       ONLINE       0     0     0
>          ad10      ONLINE       0     0     0
>          ad12      ONLINE       0     0     0
>          ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>   ad8  -> ad14
>   ad10 -> ad16
>   ad12 -> ad18
>   ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?

ZFS should just find the pool even though the device names have changed.

Chris

From morganw at chemikals.org Mon Apr 13 00:34:48 2009
From: morganw at chemikals.org (Wes Morgan)
Date: Mon Apr 13 00:53:41 2009
Subject: zfs and moving devices
In-Reply-To:
References:
Message-ID:

On Mon, 13 Apr 2009, Tobias C. Berner wrote:

> I have a zfs pool
>
>        NAME        STATE     READ WRITE CKSUM
>        multimedia  ONLINE       0     0     0
>          ad8       ONLINE       0     0     0
>          ad10      ONLINE       0     0     0
>          ad12      ONLINE       0     0     0
>          ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>   ad8  -> ad14
>   ad10 -> ad16
>   ad12 -> ad18
>   ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?

Export the pool before you make the change and it should work with no problem. You may want to enable ATA_STATIC_ID as well so you won't have to worry about it either.

On another note, that's a 4 device pool with no redundancy. Make sure you have frequent backups! I lost my "multimedia" pool once during a migration and was very sad. Now I use raidz2.

From tcberner at gmail.com Mon Apr 13 03:18:11 2009
From: tcberner at gmail.com (Tobias C. Berner)
Date: Mon Apr 13 04:06:30 2009
Subject: zfs and moving devices
In-Reply-To:
References:
Message-ID:

On 13.04.2009 at 09:34, Wes Morgan wrote:

> On Mon, 13 Apr 2009, Tobias C. Berner wrote:
>
>> I have a zfs pool
>>
>>        NAME        STATE     READ WRITE CKSUM
>>        multimedia  ONLINE       0     0     0
>>          ad8       ONLINE       0     0     0
>>          ad10      ONLINE       0     0     0
>>          ad12      ONLINE       0     0     0
>>          ad14      ONLINE       0     0     0
>>
>> Now, I need more sata-connecters. If I activate
>> an other onboard-controller, the device names
>> move:
>>
>>   ad8  -> ad14
>>   ad10 -> ad16
>>   ad12 -> ad18
>>   ad14 -> ad20
>>
>>
>> What is the proper way to handle this in zfs?
>
> Export the pool before you make the change and it should work no problem.

Ok, I will try that,

> You may want to enable ATA_STATIC_ID as well so you won't have to worry
> about it either.

ATA_STATIC_ID is enabled:

options 	ATA_STATIC_ID	# Static device numbering

thanks, Tobias

>
> On another note, that's a 4 device pool with no redundancy. Make sure you
> have frequent backups! I lost my "multimedia" pool once during a migration
> and was very sad. Now I use raidz2.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

-- 
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/

From brde at optusnet.com.au Mon Apr 13 03:52:41 2009
From: brde at optusnet.com.au (Bruce Evans)
Date: Mon Apr 13 04:30:31 2009
Subject: changing semantics of the va_filerev (code review)
In-Reply-To:
References:
Message-ID: <20090413193936.A52183@delplex.bde.org>

On Sun, 12 Apr 2009, Rick Macklem wrote:

> In summary, the nfsv4 server needs 3 changes to the FreeBSD kernel:
> ...
> 3 - Support for the Change attribute, which is what this post is about.
> ...
> As background, I believe va_filerev/i_modrev was added for nqnfs
> long long ago. Since it is not exposed to userland via the stat structure,

va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this purpose, but it isn't actually a file generation number like its comments say (it is normally set to a random value on file creation then never changed) and it is exposed to userland (st_gen).

> I don't believe anything outside of the kernel uses it. Inside the kernel,
> the only thing that currently uses it is the nfs server, which uses it as
> the cookie verifier. (It really doesn't use it, since a client regurgitates
> it back to the server as opaque bits in the next readdir rpc
> and the server then ignores those bits. This is correct, since va_filerev is
> a bogus cookie verifier.) As such, I don't believe changing the semantics of
> va_filerev will break anything in FreeBSD.

va_gen isn't used much either.
In ext2fs, i_gen is a copy of the on-disk field i_generation which is documented to be /* for NFS */ but nfs doesn't use va_gen at all. nfs3 (getattr, loadattrcache) doesn't even initialize va_gen. nfsv2 initializes it to a non-random value based on a timestamp. I'm not sure if it does this only on creation or on every cache miss or on every call. I think the uninitialized va_gen gives stack garbage in st_gen, but in tests I get 0 for both nfsv3 and nfsv2 (as root -- st_gen is always 0 for non-root). I don't understand the security issues for *_gen, but remember its being changed for security. cvs history shows that it used to actually be a generation number in at least ffs, but for ffs files and not for individual file changes (or for individual ffs file systems or all file systems).

> I'd like to change the semantics of va_filerev so that it can be used
> by the nfsv4 server as the Change attribute. To do this, it needs to
> change in 2 ways:
> - must change upon metadata changes as well as data changes
> - must persist across server reboots (ie. be moved to spare space in
>   the on-disk i-node instead of in memory i-node)

Many nonstandard file systems, e.g., msdosfs, have no space to spare.

Read-only file systems like cd9660 and udf probably don't need a variable generation count (since they never change), but their current handling of va_filerev and va_gen is wrong if these fields have any other semantics. These file systems just initialize va_gen to 1 for all files and va_gen to 0 (with an XXX in udf only) for all files.

va_ctime should give what you want for all file systems, since it should be increased whenever anything changes. However, most file systems always set the nsec part to 0, so va_ctime doesn't track all file changes.
This is a problem for things like make(1) too, so if nsec timestamps aren't available or take too long or are not fine-grained enough, the nsec part should be abused as a generation counter so that any change gives a strictly larger timestamp. The case where someone sets the clock backwards is broken but won't happen often.

Many nonstandard file systems, e.g., msdosfs, have no space for an on-disk ctime, so they fake va_ctime using an on-disk mtime. Since such file systems don't have many attributes, only a few more cases are broken.

> Here is the patch to ufs for the above, that I have been using for some
> time. Please review and comment.
> ...
> --- ufs/ufs/ufs_vnops.c.sav 2009-04-12 02:28:41.000000000 -0400
> +++ ufs/ufs/ufs_vnops.c 2009-03-10 16:47:11.000000000 -0400
> @@ -157,11 +157,12 @@
>  	if (ip->i_flag & IN_UPDATE) {
>  		DIP_SET(ip, i_mtime, ts.tv_sec);
>  		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
> -		ip->i_modrev++;
> +		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
>  	}
>  	if (ip->i_flag & IN_CHANGE) {
>  		DIP_SET(ip, i_ctime, ts.tv_sec);
>  		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
> +		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
>  	}

IN_UPDATE implies IN_CHANGE (unless there is a bug). Thus the above gives an extra increment.

Strictly, if you want to track _all_ metadata changes, then you need an increment for IN_ACCESS, and va_ctime will no longer be nearly usable since it is not changed by read accesses. I hope you don't want this.

Bruce

From bugmaster at FreeBSD.org Mon Apr 13 04:06:54 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Apr 13 04:33:38 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200904131106.n3DB6qDT084932@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/133614  fs         [smbfs] [panic] panic: ffs_truncate: read-only filesys
o kern/133373  fs         [zfs] umass attachment causes ZFS checksum errors, dat
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/133134  fs         [zfs] Missing ZFS zpool labels
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132597  fs         [tmpfs] [panic] tmpfs-related panic while interrupting
o kern/132551  fs         [zfs] ZFS locks up on extattr_list_link syscall
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132337  fs         [zfs] [panic] kernel panic in zfs_fuid_create_cred
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132145  fs         [panic] File System Hard Crashes
f kern/132068  fs         [zfs] page fault when using ZFS over NFS on 7.1-RELEAS
o kern/131995  fs         [nfs] Failure to mount NFSv4 server
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/131086  fs         [ext2fs] [patch] mkfs.ext2 creates rotten partition
o kern/131084  fs         [xfs] xfs destroys itself after copying data
o kern/131081  fs         [zfs] User cannot delete a file when a ZFS dataset is
o kern/130979  fs         [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o bin/130105   fs         [zfs] zfs send -R dumps core
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic]
non-userfriendly panic when trying to mount(8)
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
f kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/89991   fs         [ufs] softupdates with mount -ur causes fs UNREFS
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/51685   fs         [hang]
Unbounded inode allocation causes kernel to loc

57 problems total.

From rmacklem at uoguelph.ca Mon Apr 13 08:07:09 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Mon Apr 13 08:35:40 2009
Subject: changing semantics of the va_filerev (code review)
In-Reply-To: <20090413193936.A52183@delplex.bde.org>
References: <20090413193936.A52183@delplex.bde.org>
Message-ID:

On Mon, 13 Apr 2009, Bruce Evans wrote:

>
> va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
> purpose, but it isn't actually a file generation number like its
> comments say (it is normally set to a random value on file creation
> then never changed) and it is exposed to userland (st_gen).
>
i_gen is used by NFS to create T-stable (valid for a long time, including a long time after the file is removed) file handles. It is used by ffs_vptofh() to create the file handles for NFS that are recognized as representing removed files, even after an i-node gets reused such that the i-node number now represents another file.

>
> va_gen isn't used much either. In ext2fs, i_gen is a copy of the
> on-disk field i_generation which is documented to be /* for NFS */ but
> nfs doesn't use va_gen at all. nfs3 (getattr, loadattrcache) doesn't
> even initialize va_gen. nfsv2 initializes it to a non-random value
> based on a timestamp. I'm not sure if it does this only on creation
> or on every cache miss or on every call. I think the uninitialized
> va_gen gives stack garbage in st_gen, but in tests I get 0 for both
> nfsv3 and nfsv2 (as root -- st_gen is always 0 for non-root). I don't
> understand the security issues for *_gen, but remember its being changed
> for security. cvs history shows that it used to actually be a generation
> number in at least ffs, but for ffs files and not for individual file
> changes (or for individual ffs file systems or all file systems).
>
Its initial value doesn't matter for it to work correctly.
It should get incremented each time an i-node gets reused for a different file. (That's what the ESTALE magic is, the server reporting to the client that the file handle is for a file that has been removed.) The "security" business is a bit bogus to me. It's one of those security by obscurity tricks, imho. The problem was that a file handle was easy to fake when i_gen is initially 0. Initializing it to a non-zero value makes faking one a little harder, but... Part of the reason for doing this was that IP#s were only checked against exports at mount time on some systems (BSD has never been this way) and faking the one file handle for the root of the FS (root i-node#, i_gen == 0) bypassed exports and tah dah. >> I'd like to change the semantics of va_filerev so that it can be used >> by the nfsv4 server as the Change attribute. To do this, it needs to >> change in 2 ways: >> - must change upon metadata changes as well as data changes >> - must persist across server reboots (ie. be moved to spare space in >> the on-disk i-node instead of in memory i-node) > > Many nonstandard file systems, e.g., msdosfs, have no space to spare. > If a file system can't support it correctly, faking it with something like modify time is about all you can do. Since Change is supposed to change on every file modification, this fails when multiple changes occur within the same tod clock time or the clock gets reset backwards, as you noted below. (Linux uses a modify time with a 1sec clock resolution for Change, which isn't correct and the Linux nfs server folks know that. Since this breaks the AIX nfsv4 client, the AIX folks tend to remind them:-) > Read-only file systems like cd9660 and udf probably don't need a a > variable generation count (since they never change), but their current > handling of va_filerev and va_gen is wrong if these fields have any > other semantics. 
These file systems just initialize va_gen to 1 for > all files and va_gen to 0 (with an XXX in udf only) for all files. > Since they only need to change for modifications and their initial values don't really matter, the above sounds fine to me. > va_ctime should give what you want for all file systems, since it > should be increased whenever anything changes. However, most file There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't. Since the Change attribute must change for every file modification, I feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits, so it won't wrap around for a little while.) > systems always set the nsec part to 0, so va_ctime doesn't track > all file changes. This is a problem for things like make(1) too, > so if nsec timestamps aren't available or are take too long or are > not fine-grained enough, the nsec part should be abused as a generation > counter so that any change gives a strictly larger timestamp. The > case where someone sets the clock backwards is broken but won't > happen often. > > Many nonstandard file systems, e.g., msdosfs, have > no space for an on-disk ctime, so they fake va_ctime using an on-disk > mtime. Since such file systems don't have many attributes, only a > few more cases are broken. > Yep, that's why ctime/mtime aren't sufficient. If a read/write file system doesn't have support for it, all you can do is fake it and hope the client works ok. I suspect the Linux folks will eventually start to add support for it to ext3fs etc, because of the above, but who knows. It seems that FreeBSD mostly uses FFS and ZFS (which should have support for it, since the Solaris folks are into NFSv4?), so at least we should be able to make those work correctly. 
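The quoted suggestion of abusing the nsec part as a generation counter, so that any change gives a strictly larger timestamp, might be sketched roughly as below. This is a userland illustration only, not the ffs code: `ts_inode` and `stamp_change` are hypothetical names. As an assumption of the sketch, it also keeps the stamp monotonic when the clock is set backwards, at the cost of the stamp running ahead of the clock, which goes beyond what the quoted text proposes.

```c
#include <time.h>

/* Hypothetical in-memory inode holding only the change time. */
struct ts_inode {
	struct timespec ctime;
};

/*
 * Record a change observed at clock reading "now".  If the clock has not
 * advanced past the stored ctime (coarse resolution, or clock stepped
 * backwards), bump the nsec part by one instead, so that every change
 * yields a strictly larger timestamp.
 */
static void
stamp_change(struct ts_inode *ip, struct timespec now)
{
	int later = now.tv_sec > ip->ctime.tv_sec ||
	    (now.tv_sec == ip->ctime.tv_sec &&
	    now.tv_nsec > ip->ctime.tv_nsec);

	if (later) {
		ip->ctime = now;
		return;
	}
	/* Clock did not move forward: use nsec as a generation counter. */
	if (++ip->ctime.tv_nsec >= 1000000000L) {
		ip->ctime.tv_nsec = 0;
		ip->ctime.tv_sec++;
	}
}
```

With a clock that only ticks once per second, two calls made within the same second still leave ctime strictly increasing, which is the property make(1) and the Change attribute both want.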
Have a good week, rick From linimon at FreeBSD.org Mon Apr 13 10:33:43 2009 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Apr 13 11:13:00 2009 Subject: kern/133676: [smbfs] [panic] umount -f'ing a vnode-based memory disk from off a SMB share caused a reboot Message-ID: <200904131733.n3DHXglX020272@freefall.freebsd.org> Old Synopsis: umount -f'ing a vnode-based memory disk from off a SMB share caused a reboot New Synopsis: [smbfs] [panic] umount -f'ing a vnode-based memory disk from off a SMB share caused a reboot Responsible-Changed-From-To: freebsd-amd64->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon Apr 13 17:31:51 UTC 2009 Responsible-Changed-Why: Reclassify and reassign. http://www.freebsd.org/cgi/query-pr.cgi?pr=133676 From jhb at freebsd.org Mon Apr 13 10:55:37 2009 From: jhb at freebsd.org (John Baldwin) Date: Mon Apr 13 11:17:35 2009 Subject: integrating nfsv4 locking with nlm and local locking In-Reply-To: References: Message-ID: <200904131146.21640.jhb@freebsd.org> On Thursday 09 April 2009 3:04:37 pm Rick Macklem wrote: > My nfsv4 server currently does VOP_ADVLOCK() with the non-blocking F_SETLK > type and I had thought that was sufficient, but I now realize (thanks to > a recent post by Zachary Loafman) that this breaks when a delegation for > the file is issued to a client. (When a delegation for a file is issued > to a client, it can do byte range locking locally, and the server doesn't > know about these to do VOP_ADVLOCK() on the server machine.) > > I believe that Zachary would like to discuss a more general solution, > including how to handle Open/Share locks, but in the meantime I'd like to > solve this specific case in as simple a way as possible. > > Basically, I need a way to make sure delegations for a file don't exist > when local byte range locking or locking via the NLM is being done on > the file. 
> > The simplest thing I can think of is the following: > When VOP_ADVLOCK() is called for a file (outside of the nfsv4 server), > do two things: > 1 - Make sure any outstanding delegations are recalled. > I already have a function that does this, so it is a matter > of figuring out where to put the call(s). > 2 - Set a flag on the vnode, so that my nfsv4 server knows not to > issue another delegation for that file. > (I could test for locks via VOP_ADVLOCK() before issuing a > delegation, but that has two problems.) > 1 - Since the vnode is unlocked for VOP_ADVLOCK(), there could > be a race where the nfsv4 server issues a delegation > between the time outstanding delegations are recalled at > #1 above and the VOP_ADVLOCK() sets the lock that I would > see during the test. > 2 - It would have to keep checking for a lock and might issue > a delegation at a point where no lock is held, but one > will be acquired soon, forcing the delegation recall. > (It's much easier to not issue a delegation than recall > one.) > Once this flag is set, I think it would be ok if the flag > remains set until the vnode is recycled, since it seems > fairly likely that, once byte range locking is done on a > file, more will happen. > (If people were agreeable to the vnode flag, it looks like > a VV_xxx flag would make more sense than a VI_xxx one. I > think an atomic_set_int() would be sufficient to set it, > even though the vnode lock isn't held?) You have to hold the vnode lock to set a VV flag always. Even if you do an atomic operation to set your flag, another thread might be setting a flag at the same time using non-atomic ops and could clobber your change (if it does a read-modify-write and reads a value that pre-dates your atomic_set_int() but its write posts after your write). 
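The interleaving described above can be replayed step by step. The sketch below is a single-threaded illustration, so the outcome is deterministic; the flag names and `replay_lost_update` are hypothetical and nothing here is the real vnode code. It only shows why an atomic set by one party is not enough when another party updates the same word with a plain read-modify-write.

```c
#include <stdint.h>

#define FLAG_A 0x01u	/* hypothetical flag set by the plain read-modify-write */
#define FLAG_B 0x02u	/* hypothetical flag set atomically by the other thread */

/*
 * Replays the problematic interleaving in program order and returns the
 * final flag word.
 */
static uint32_t
replay_lost_update(void)
{
	uint32_t vflag = 0;
	uint32_t tmp;

	tmp = vflag;		/* thread 1: read, pre-dates the atomic set */
	vflag |= FLAG_B;	/* thread 2: atomic set completes in between */
	tmp |= FLAG_A;		/* thread 1: modify its stale copy */
	vflag = tmp;		/* thread 1: write back, clobbering FLAG_B */
	return (vflag);
}
```

The returned word has only FLAG_A set: the atomically set FLAG_B is lost, even though the operation that set it was itself atomic.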
-- John Baldwin From hch at infradead.org Mon Apr 13 11:33:59 2009 From: hch at infradead.org (Christoph Hellwig) Date: Mon Apr 13 12:29:14 2009 Subject: changing semantics of the va_filerev (code review) In-Reply-To: References: <20090413193936.A52183@delplex.bde.org> Message-ID: <20090413183351.GA27610@infradead.org> On Mon, Apr 13, 2009 at 11:13:33AM -0400, Rick Macklem wrote: > If a file system can't support it correctly, faking it with something > like modify time is about all you can do. Since Change is supposed to > change on every file modification, this fails when multiple changes > occur within the same tod clock time or the clock gets reset backwards, > as you noted below. (Linux uses a modify time with a 1sec clock > resolution for Change, which isn't correct and the Linux nfs server > folks know that. Since this breaks the AIX nfsv4 client, the AIX folks > tend to remind them:-) Linux uses whatever granularity the underlying filesystems support. For a lot of old designs that may be 1 second, for most recent filesystems it's better. > Yep, that's why ctime/mtime aren't sufficient. > If a read/write file system doesn't have support for it, all you > can do is fake it and hope the client works ok. I suspect the Linux > folks will eventually start to add support for it to ext3fs etc, because > of the above, but who knows. It seems that FreeBSD mostly uses FFS and > ZFS (which should have support for it, since the Solaris folks are into > NFSv4?), so at least we should be able to make those work correctly. Linux already has the changecount in ext4 but the NFS server doesn't yet use it. Also it's being implemented for XFS and others.
From rmacklem at uoguelph.ca Mon Apr 13 11:37:37 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Mon Apr 13 12:29:41 2009 Subject: integrating nfsv4 locking with nlm and local locking In-Reply-To: <200904131146.21640.jhb@freebsd.org> References: <200904131146.21640.jhb@freebsd.org> Message-ID: On Mon, 13 Apr 2009, John Baldwin wrote: > > You have to hold the vnode lock to set a VV flag always. Even if you do an > atomic operation to set your flag, another thread might be setting a flag at > the same time using non-atomic ops and could clobber your change (if it does > a read-modify-write and reads a value that pre-dates your atomic_set_int() > but its write posts after your write). > Righto, thanks. (I should have realized that.) I guess I'll have to use a VI_xxx flag or add a field to the vnode to make the scheme work. I am just trying to come up with a stopgap solution until something more comprehensive can be done w.r.t. handling delegations. VI_xxx are currently used for handling the vnode and it doesn't seem appropriate to add one of these to indicate "don't issue delegations". How do others feel w.r.t. adding a VI_xxx flag vs adding v_disabledelegate to the structure? There is always the fallback position of shipping an nfsv4 server with delegations disabled, until handling them when local VOPs are done, gets resolved. 
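The "add a field" alternative might look roughly like the sketch below. It is only an illustration under stated assumptions: `fake_vnode` and both helper functions are hypothetical, C11 atomics stand in for whatever the kernel would actually use, and only the field name v_disabledelegate comes from the text above. The point is that a dedicated field which is only ever written with an atomic store has no other writer doing a non-atomic read-modify-write on the same word. It does not by itself close the race between recalling delegations and acquiring the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vnode; only the field in question. */
struct fake_vnode {
	atomic_bool v_disabledelegate;	/* set once, stays set until recycle */
};

/* Called when byte range locking is done on the file outside nfsv4. */
static void
note_byte_range_lock(struct fake_vnode *vp)
{
	atomic_store(&vp->v_disabledelegate, true);
}

/* Checked by the (hypothetical) server path before issuing a delegation. */
static bool
may_issue_delegation(struct fake_vnode *vp)
{
	return (!atomic_load(&vp->v_disabledelegate));
}
```

The trade-off compared with a flag bit is an extra field in every vnode, in exchange for not sharing a word with flags that are updated under a different lock.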
rick From brde at optusnet.com.au Tue Apr 14 03:08:57 2009 From: brde at optusnet.com.au (Bruce Evans) Date: Tue Apr 14 03:38:37 2009 Subject: changing semantics of the va_filerev (code review) In-Reply-To: References: <20090413193936.A52183@delplex.bde.org> Message-ID: <20090414180826.J53102@delplex.bde.org> On Mon, 13 Apr 2009, Rick Macklem wrote: > On Mon, 13 Apr 2009, Bruce Evans wrote: >> va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this >> purpose, but it isn't actually a file generation number like its >> comments say (it is normally set to a random value on file creation >> then never changed) and it is exposed to userland (st_gen). >> > i_gen is used by NFS to create T-stable (valid for a long time, including > a long time after the file is removed) file handles. It is used by > ffs_vptofh() to create the file handles for NFS that are recognized as > representing removed files, even after an i-node gets reused such that > the i-node number now represents another file. Oops, I missed that since nfs's use of i_gen is indirect. What does nfs do for file systems that don't detect removed files, e.g., msdosfs? vptofh and fhtovp routines seem to have too many differences. E.g., file systems based on ffs return ESTALE for removed files, but zfs_fhtovp() returns EINVAL. I just noticed that the increment of i_gen was slightly broken for ffs by a type mismatch in ffs2 (affects ffs1 too). Originally, i_gen had the same type as di_gen (int32_t). Now i_gen has type int64_t but in ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen still has type int32_t (apparently there was insufficient space to expand it). This makes the overflow check in ffs_alloc.c (++ip->i_gen == 0) more broken than before. Previously it only gave undefined behaviour followed by a bogus check when overflow occurs for incrementing from INT32_T_MAX. Now it has no effect, since it takes 293 years of incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX.
Overflow now occurs on assignment to di_gen. The result of this bug is almost the same as removing the silly part of the security code -- the re-randomization on overflow. i_gen may grow larger than UINT32_T_MAX, but usually refresh from the dinode will keep it smaller. When it starts near UINT32_T_MAX and grows larger, the overflow on assignment and a subsequent refresh will make it nearly 0. Except, in 1 in every 2**32 cases, when the overflow makes di_gen exactly 0, the subsequent refresh will randomize i_gen. >> va_ctime should give what you want for all file systems, since it >> should be increased whenever anything changes. However, most file > There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't. Are there? This would be a bug. I checked that ffs doesn't have this bug. > Since the Change attribute must change for every file modification, I > feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits, > so it won't wrap around for a little while.) It would be a large and obvious bug to modify the file data (IN_UPDATE) without setting IN_CHANGE. >> systems always set the nsec part to 0, so va_ctime doesn't track >> all file changes. This is a problem for things like make(1) too, >> so if nsec timestamps aren't available or are take too long or are >> not fine-grained enough, the nsec part should be abused as a generation >> counter so that any change gives a strictly larger timestamp. The >> case where someone sets the clock backwards is broken but won't >> happen often. >> >> Many nonstandard file systems, e.g., msdosfs, have >> no space for an on-disk ctime, so they fake va_ctime using an on-disk >> mtime. Since such file systems don't have many attributes, only a >> few more cases are broken. >> > Yep, that's why ctime/mtime aren't sufficient. > If a read/write file system doesn't have support for it, all you > can do is fake it and hope the client works ok.
I suspect the Linux They need to be fixed or faked well enough for make(1) too. When the dinode has no space to spare, something can be done by keeping state in the inode or vnode. This won't work across reboots of course (except by hashing a reboot counter into the generation counts or timestamps) but might be enough for all short-term uses. I'm not sure how much is safe here. Bruce From rmacklem at uoguelph.ca Tue Apr 14 08:52:35 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Apr 14 09:10:00 2009 Subject: changing semantics of the va_filerev (code review) In-Reply-To: <20090414180826.J53102@delplex.bde.org> References: <20090413193936.A52183@delplex.bde.org> <20090414180826.J53102@delplex.bde.org> Message-ID: On Tue, 14 Apr 2009, Bruce Evans wrote: [stuff snipped] > > Oops, I missed that since nfs's use of i_gen is indirect. What does > nfs do for file systems that don't detect removed files, e.g., msdosfs. > vptofh and fhtovp routines seem to have too many differences. E.g., An nfs client can always think that a file exists for a short period of time (until client side caches time out) after it has been removed locally or by another client, on the server. The more serious failure occurs when the i-node/directory entry gets reallocated. At that point, the client might access the attributes/data of the new file, thinking it was the old file. (In the worst case, this could persist until the client does a umount() of the file system.) However, typically, unless it has the file open when the file is removed locally on the server or by another client, nothing nasty will happen. (And I think if the client has name caching disabled, nothing nasty can happen.) At least, that's my best guess at an answer. > file systems based on ffs return ESTALE for removed files, but > zfs_fhtovp() returns EINVAL. > I don't know why zfs would choose a different errno, but I don't think that a different errno will have much effect. It's a terminal error in either case. 
(I can't think of anything clever that a client can do for ESTALE. I wouldn't be surprised if some clients end up translating ESTALE to EINVAL, since POSIX apps don't expect ESTALE.) I suppose someone could argue it violates the RFC, but only if they know that the server should generate NFS3ERR_STALE for that case. > I just noticed than the increment of i_gen was slightly broken for ffs > by a type mismatch in ffs2 (affects ffs1 too). Originally, i_gen had > the same type as di_gen (int32_t). Now i_gen has type int64_t but in > ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen > still has type int32_t (apparently there was insufficient space to > expand it). This makes the overflow check in ffs_alloc.c (++ip->i_gen > == 0) more broken than before. Previously it only gave undefined > behaviour followed by a bogus check when overflow occurs for incrementing > from INT32_T_MAX. Now it has no effect, since it takes 293 years of > incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX. > Overflow now occurs on assignment to di_gen. > > The result of this bug is almost the the same as removing the silly > part of the security code -- the re-randomization on overflow. i_gen > may grow larger than UINT32_T_MAX, but usually refresh from the dinode > will keep it smaller. When it starts near UINT32_T_MAX and grows > larger, the overflow on assignment and a subsequent refresh will make > it nearly 0. Except, in 1 in every 2**32 cases, when the overflow makes > di_gen exactly 0, the subsequent refresh will randomize i_gen. > Sounds like you have a better understanding of this than I. Since all nfs really cares about is that the value of i_gen has changed after the i-node is re-allocated, I doubt this causes grief in practice. Personally, I'd just leave it as a 32bit number and initialize it to some pseudo-random value in a range that is a small fraction of UINT32_T_MAX (maybe 1<->1000000) if it is 0, otherwise just increment it by a small value. 
(I've already noted that I'm not a big fan of security by obscurity anyhow:-) >>> va_ctime should give what you want for all file systems, since it >>> should be increased whenever anything changes. However, most file >> There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't. > > Are there? This would be a bug. I checked that ffs doesn't have this > bug. > Oops, my mistake. I grep'd again and see it is IN_CHANGE that gets set without IN_UPDATE and not the other way around, which makes sense, since I can't think of how you can modify the data without modifying some attribute. So, the Change attribute only needs to change for IN_CHANGE (with all those uses of "change", it must be good:-). Thanks for pointing this out. > > They need to be fixed or faked well enough for make(1) too. > > When the dinode has no space to spare, something can be done by keeping > state in the inode or vnode. This won't work across reboots of course > (except by hashing a reboot counter into the generation counts or > timestamps) but might be enough for all short-term uses. I'm not sure > how much is safe here. > Yes, definitely. I think doing something like having an in-memory field for va_filerev/i_modrev where the high order bits are initialized by ctime (using whatever bits are valid, given tod clock resolution) when read in and then incrementing by 1 for each change, would be a good compromise. 
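A minimal sketch of the compromise just described, with the high order bits seeded from ctime when the inode is read in and the value incremented by 1 for each change, could look like this. The names `rev_inode`, `modrev_init` and `modrev_bump` are hypothetical, and the 32/32 split between ctime seconds and the per-change counter is an assumption of the sketch, not something settled above.

```c
#include <stdint.h>

/* Hypothetical in-memory inode with only the revision field. */
struct rev_inode {
	uint64_t i_modrev;
};

/*
 * Seed the revision when the inode is read in: ctime seconds go into the
 * high 32 bits, leaving the low 32 bits as a per-change counter.
 */
static void
modrev_init(struct rev_inode *ip, int64_t ctime_sec)
{
	ip->i_modrev = (uint64_t)ctime_sec << 32;
}

/* Called for every change (IN_CHANGE); returns the new revision. */
static uint64_t
modrev_bump(struct rev_inode *ip)
{
	return (++ip->i_modrev);
}
```

A value reseeded after a reboot exceeds the values handed out earlier only if ctime has advanced since the previous seed and fewer than 2**32 changes were counted on top of it, so this remains a fake for file systems with no on-disk space rather than a correct Change attribute.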
rick From gavin at FreeBSD.org Thu Apr 16 04:47:54 2009 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Thu Apr 16 05:28:27 2009 Subject: kern/65920: [nwfs] Mounted Netware filesystem behaves strange Message-ID: <200904161147.n3GBlr9C082160@freefall.freebsd.org> Synopsis: [nwfs] Mounted Netware filesystem behaves strange Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: gavin Responsible-Changed-When: Thu Apr 16 11:47:06 UTC 2009 Responsible-Changed-Why: Over to maintainer(s) http://www.freebsd.org/cgi/query-pr.cgi?pr=65920 From roberto at keltia.freenix.fr Thu Apr 16 05:34:54 2009 From: roberto at keltia.freenix.fr (Ollivier Robert) Date: Thu Apr 16 06:16:05 2009 Subject: turning off ZFS mountpoint property behavior? In-Reply-To: <49E16021.6040900@jrv.org> References: <49E16021.6040900@jrv.org> Message-ID: <20090416123447.GB96263@keltia.freenix.fr> According to James R. Van Artsdalen: > Unfortunately when zfs recv runs and it receive a filesystem with > property mountpoint=/usr it mounts that filesystem there. That's not > desirable in my situation nor I suspect many others. > > Is there a sysctl or some other way to disable the automatic mount behavior? Have you tried to use legacy? zfs set mountpoint=legacy tank/usr -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! 
-=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From roberto at keltia.freenix.fr Thu Apr 16 09:01:30 2009 From: roberto at keltia.freenix.fr (Ollivier Robert) Date: Thu Apr 16 09:31:49 2009 Subject: Booting from ZFS raidz In-Reply-To: References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> Message-ID: <20090416160128.GA831@keltia.freenix.fr> According to Stefan Bethke: > Created a GPT label and one partition on each of the three drives: > > gpart create -s gpt $1 > gpart add -b 34 -s 128 -t freebsd-boot $1 > gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1 > gpart add -b 512 -s 41900000 -t freebsd-zfs $1 > gpart list $1 Coming back to this thread, I'm playing with this setup (and the script mentioned in another thread). When I try to zpool set bootfs=tank with tank containing a raidz array, zpool refuses to set the property, saying it is not available. Using the same commandline on a mirror works. -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From roberto at keltia.freenix.fr Thu Apr 16 09:40:57 2009 From: roberto at keltia.freenix.fr (Ollivier Robert) Date: Thu Apr 16 10:31:26 2009 Subject: Booting from ZFS raidz In-Reply-To: <20090416160128.GA831@keltia.freenix.fr> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> Message-ID: <20090416164053.GA80978@keltia.freenix.fr> According to Ollivier Robert: > with tank containing a raidz array, zpool refuses to set the property, > saying it is not available. Using the same commandline on a mirror works. BTW all messages I've found on this subject assume (like the script does) that one can do installworld/installkernel. I can setup the whole gpt thing from livefs, even extracting all dists on the newly zfs pool manually by playing with livefs/dvd1 but it can not boot afterwards because / can not be found. I must have missed something... 
I long for pcbsd setup with zfs support in fact I think :( -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From hartzell at alerce.com Thu Apr 16 10:09:44 2009 From: hartzell at alerce.com (George Hartzell) Date: Thu Apr 16 10:42:52 2009 Subject: Booting from ZFS raidz In-Reply-To: <20090416160128.GA831@keltia.freenix.fr> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> Message-ID: <18919.25164.567669.809759@already.local> Ollivier Robert writes: > According to Stefan Bethke: > > Created a GPT label and one partition on each of the three drives: > > > > gpart create -s gpt $1 > > gpart add -b 34 -s 128 -t freebsd-boot $1 > > gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1 > > gpart add -b 512 -s 41900000 -t freebsd-zfs $1 > > gpart list $1 > > Coming back to this thread, I'm playing with this setup (and the script > mentioned in another thread). When I try to > > zpool set bootfs=tank > > with tank containing a raidz array, zpool refuses to set the property, > saying it is not available. Using the same commandline on a mirror works. In Doug's original email announcing raidz boot support, http://kerneltrap.org/mailarchive/freebsd-fs/2008/12/17/4441084 he says: Currently the ZFS kernel code refuses to allow you to set the bootfs pool property on raidz pools (because Solaris can't boot from them). This means that you are limited to booting from the root filesystem of the pool for now (it shouldn't be hard to relax this restriction). The root filesystem of the pool should contain a directory /boot with the usual contents which must include a /boot/loader which was built with the 'LOADER_ZFS_SUPPORT' make option. Which just means that you need a populated boot directory at the top of the tank (e.g. /data/boot).
If you're using the create-zfsboot-gpt.sh file that was posted here recently, you'll need to rework it a bit, since it puts the root dir at /data/ROOT/data. g. From nhoyle at hoyletech.com Thu Apr 16 11:25:43 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Thu Apr 16 11:57:26 2009 Subject: Booting from ZFS raidz In-Reply-To: <20090416164053.GA80978@keltia.freenix.fr> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <20090416164053.GA80978@keltia.freenix.fr> Message-ID: <49E774F0.1020706@hoyletech.com> Ollivier Robert wrote: > According to Ollivier Robert: > >> with tank containing a raidz array, zpool refuses to set the property, >> saying it is not available. Using the same commandline on a mirror works. >> > > BTW all messages I've found on this subject assume (like the script does) > that one can do installworld/installkernel. > > I can setup the whole gpt thing from livefs, even extracting all dists on > the newly zfs pool manually by playing with livefs/dvd1 but it can not boot > afterwards because / can not be found. > > I must have missed something... I long for pcbsd setup with zfs support in > fact I think :( > To my knowledge, RAID-Z root (boot) pools are not supported. I know that this is true for upstream (Solaris) ZFS, and unless the FreeBSD folks implemented it when I wasn't looking, you can't do it on FreeBSD either. I believe the current implementation essentially reads "through" the mirror structure on a mirrored device and can find all of the data by "dumb" sequential reads on the first disk, just as it would with unpooled disks. In the case of RAID-Z the boot loader would have to be far more intelligent in locating where to read the next block from. It is my understanding that this is a planned future improvement (at least for upstream) but haven't heard any update on it in a while. 
-Nathanael From nhoyle at hoyletech.com Thu Apr 16 11:28:11 2009 From: nhoyle at hoyletech.com (Nathanael Hoyle) Date: Thu Apr 16 11:58:39 2009 Subject: Booting from ZFS raidz In-Reply-To: <20090416164053.GA80978@keltia.freenix.fr> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <20090416164053.GA80978@keltia.freenix.fr> Message-ID: <49E7758A.30400@hoyletech.com> Ollivier Robert wrote: > According to Ollivier Robert: > >> with tank containing a raidz array, zpool refuses to set the property, >> saying it is not available. Using the same commandline on a mirror works. >> > > BTW all messages I've found on this subject assume (like the script does) > that one can do installworld/installkernel. > > I can setup the whole gpt thing from livefs, even extracting all dists on > the newly zfs pool manually by playing with livefs/dvd1 but it can not boot > afterwards because / can not be found. > > I must have missed something... I long for pcbsd setup with zfs support in > fact I think :( > Ok, I screwed up. Not on my usual workstation and my email client mis-threaded discussions. I now realize you were referring to the experimental capabilities that Doug has been working on; my apologies for jumping the gun with the "can't do that" response. -Nathanael From Alexander at Leidinger.net Fri Apr 17 06:06:11 2009 From: Alexander at Leidinger.net (Alexander Leidinger) Date: Fri Apr 17 06:28:13 2009 Subject: ZFS: unlimited arc cache growth? Message-ID: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> Hi, to fs@, please CC me, as I'm not subscribed. I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some point I've seen more than 500M) than what I have configured in vfs.zfs.arc_max (40M). After a while FS operations (e.g. pkgdb -F with about 900 packages... 
my specific workload is the fixup of gnome packages after the removal of the obsolete libusb port) get very slow (in my specific example I let the pkgdb run several times overnight and it still is not finished). The big problem with this is that at some point in time the machine reboots (panic, page fault, page not present, during a fork1). I have the impression (beware, I have a watchdog configured, as I don't know if a triggered WD would cause the same panic, the following is just a guess) that I run out of memory of some kind (I have 1G RAM, i386, max kmem size 700M). I restarted pkgdb several times after a reboot, and it continues to process the libusb removal, but hey, this is annoying. Does someone see something similar to what I describe (mainly the growth of the arc cache way beyond what is configured)? Anyone with some ideas what to try? Bye, Alexander. -- When you go out to buy, don't show your silver. http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From roberto at keltia.freenix.fr Fri Apr 17 06:14:47 2009 From: roberto at keltia.freenix.fr (Ollivier Robert) Date: Fri Apr 17 06:43:08 2009 Subject: Booting from ZFS raidz In-Reply-To: <18919.25164.567669.809759@already.local> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <18919.25164.567669.809759@already.local> Message-ID: <20090417131443.GD96263@keltia.freenix.fr> According to George Hartzell: > Which just means that you need a populated boot directory at the top > of the tank (e.g. /data/boot). If you're using the > create-zfsboot-gpt.sh file that was posted here recently, you'll need > to rework it a bit, since it puts the root dir at /data/ROOT/data. OK, following this, I managed to get the boot code to find loader & loader.conf. It stops when it can't find the root I want it to boot from though. The ?
prompt shows me all devices (da{0,1,2}, da{0,1,2}p{1,2} and label/swap) but trying to use zfs:whatever does not seem to work. loader.conf is very small: ----- zfs_load="YES" geom_label_load="YES" vfs.root.mountfrom="zfs:tank/ROOT/tank" ----- I did zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root) the other fs are in their usual place zfs set mountpoint=/usr tank/usr zfs set mountpoint=/var tank/var Any other ideas. I'll try to summarize here and on the wiki when I'm done. -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From hartzell at alerce.com Fri Apr 17 06:47:25 2009 From: hartzell at alerce.com (George Hartzell) Date: Fri Apr 17 07:30:19 2009 Subject: Booting from ZFS raidz In-Reply-To: <20090417131443.GD96263@keltia.freenix.fr> References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <18919.25164.567669.809759@already.local> <20090417131443.GD96263@keltia.freenix.fr> Message-ID: <18920.34924.2076.295983@already.local> Ollivier Robert writes: > According to George Hartzell: > > Which jsut means that you need a populated boot directory at the top > > of the tank (e.g. /data/boot). If you're using the > > create-zfsboot-gpt.sh file that was posted here recently, you'll need > > to rework it a bit, since it puts the root dir at /data/ROOT/data. > > OK, following this, I managed the boot code to find loader & loader.conf. > It stops when it can't find the root I want it to boot from though. > > The ? prompt shows me all devices (da{0,1,2}, da{0,1,2}p{1,2} and > label/swap) but trying to use zfs:whatever does not seem to work. 
>
> loader.conf is very small:
> -----
> zfs_load="YES"
> geom_label_load="YES"
> vfs.root.mountfrom="zfs:tank/ROOT/tank"
> -----
>
> I did
> zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root)
>
> the other fs are in their usual place
> zfs set mountpoint=/usr tank/usr
> zfs set mountpoint=/var tank/var
>
> Any other ideas. I'll try to summarize here and on the wiki when I'm done.

Did you build the loader with LOADER_ZFS_SUPPORT=YES enabled?

I just threw that line in my /etc/make.conf and rebuilt everything.

g.

From roberto at keltia.freenix.fr Fri Apr 17 06:57:40 2009
From: roberto at keltia.freenix.fr (Ollivier Robert)
Date: Fri Apr 17 07:35:20 2009
Subject: Booting from ZFS raidz
In-Reply-To: <18920.34924.2076.295983@already.local>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <18919.25164.567669.809759@already.local> <20090417131443.GD96263@keltia.freenix.fr> <18920.34924.2076.295983@already.local>
Message-ID: <20090417135737.GE96263@keltia.freenix.fr>

According to George Hartzell:
> Did you build the loader with LOADER_ZFS_SUPPORT=YES enabled?
>
> I just threw that line in my /etc/make.conf and rebuilt everything.

Yes, I even reinstalled the gpart bootcode.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From ben at wanderview.com Fri Apr 17 07:20:15 2009
From: ben at wanderview.com (Ben Kelly)
Date: Fri Apr 17 07:42:27 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
Message-ID:

On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
> to fs@, please CC me, as I'm not subscribed.
>
> I monitored (by hand) a while the sysctls
> kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
> Both grow way higher (at some point I've seen more than 500M) than
> what I have configured in vfs.zfs.arc_max (40M).
>
> After a while FS operations (e.g. pkgdb -F with about 900
> packages... my specific workload is the fixup of gnome packages
> after the removal of the obsolete libusb port) get very slow (in my
> specific example I let the pkgdb run several times over night and it
> still is not finished).
>
> The big problem with this is, that at some point in time the machine
> reboots (panic, page fault, page not present, during a fork1). I
> have the impression (beware, I have a watchdog configured, as I
> don't know if a triggered WD would cause the same panic, the
> following is just a guess) that I run out of memory of some kind (I
> have 1G RAM, i386, max kmem size 700M). I restarted pkgdb several
> times after a reboot, and it continues to process the libusb
> removal, but hey, this is annoying.
>
> Does someone see something similar to what I describe (mainly the
> growth of the arc cache way beyond what is configured)? Anyone with
> some ideas what to try?

Can you provide the rest of the arcstats from sysctl? Also, does your
arc_reclaim_thread process get any cycles when this problem occurs?
What happens if you kill the pkgdb -F manually before it completes?
Does the arc cache size come back down or is it stuck at the abnormally
high level?

At first glance it looks like the tunable limits the arc_c target value,
but that appears to be only a soft limit. There is code in there to
shrink an ARC that has exceeded its arc_c value. It looks like that code
is supposed to run from the arc_reclaim_thread.

- Ben

From ticso at cicely7.cicely.de Fri Apr 17 07:36:05 2009
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Fri Apr 17 08:12:56 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
Message-ID: <20090417141817.GR11551@cicely7.cicely.de>

On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> Hi,
>
> to fs@, please CC me, as I'm not subscribed.
>
> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
> point I've seen more than 500M) than what I have configured in
> vfs.zfs.arc_max (40M).

My understanding about this is the following:
vfs.zfs.arc_min/max are not used as min/max values.
They are used as high/low watermarks.
If the ARC grows past max, a thread is triggered to reduce the ARC
until it is back at min, but in the meantime other threads can still
grow the ARC, so there is a race between them.

> After a while FS operations (e.g. pkgdb -F with about 900 packages...
> my specific workload is the fixup of gnome packages after the removal
> of the obsolete libusb port) get very slow (in my specific example I
> let the pkgdb run several times over night and it still is not
> finished).

I've seen many workloads where prefetching can saturate disks without
the prefetched data ever being used.
You might want to try disabling prefetch.
Of course prefetching also grows the ARC.

> The big problem with this is, that at some point in time the machine
> reboots (panic, page fault, page not present, during a fork1). I have
> the impression (beware, I have a watchdog configured, as I don't know
> if a triggered WD would cause the same panic, the following is just a
> guess) that I run out of memory of some kind (I have 1G RAM, i386, max
> kmem size 700M). I restarted pkgdb several times after a reboot, and
> it continues to process the libusb removal, but hey, this is annoying.

With just 700M kmem you should set the ARC values extremely small and
avoid anything which can quickly grow it.
Unfortunately accessing many small files is a known ARC-filling workload.
Activating vfs.zfs.cache_flush_disable can help speed up shrinking the
ARC, with the obvious risks of course...

> Does someone see something similar to what I describe (mainly the
> growth of the arc cache way beyond what is configured)? Anyone with
> some ideas what to try?

In my opinion the watermark mechanism can work as it is, but there should
be a forced max - currently there is no guaranteed limit at all.
Nevertheless it is up to the people who know the code to decide.

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and much more.

From roberto at keltia.freenix.fr Fri Apr 17 07:46:10 2009
From: roberto at keltia.freenix.fr (Ollivier Robert)
Date: Fri Apr 17 08:31:20 2009
Subject: Booting from ZFS raidz
In-Reply-To: <20090417131443.GD96263@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <20090416160128.GA831@keltia.freenix.fr> <18919.25164.567669.809759@already.local> <20090417131443.GD96263@keltia.freenix.fr>
Message-ID: <20090417144605.GA2316@keltia.freenix.fr>

According to Ollivier Robert:
> I did
> zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root)
>
> the other fs are in their usual place
> zfs set mountpoint=/usr tank/usr
> zfs set mountpoint=/var tank/var

With a proper zpool.cache in the right place (it was not generated the
first time I tried), it gets further. I'm still missing some bits (/usr
apparently, although I did configure it...).

As this is all done in a vmware vm, I can redo everything whenever I want.

I wish sysinstall was in a higher level language than C, I could hack a
bit on it. Right now, like many others, I feel a bit overwhelmed by the
20k LOC...

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve!
-=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From marius at nuenneri.ch Fri Apr 17 09:58:33 2009
From: marius at nuenneri.ch (Marius Nünnerich)
Date: Fri Apr 17 10:25:09 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090417141817.GR11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090417141817.GR11551@cicely7.cicely.de>
Message-ID:

On Fri, Apr 17, 2009 at 16:18, Bernd Walter wrote:
> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
>> Hi,
>>
>> to fs@, please CC me, as I'm not subscribed.
>>
>> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
>> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
>> point I've seen more than 500M) than what I have configured in
>> vfs.zfs.arc_max (40M).
>
> My understanding about this is the following:
> vfs.zfs.arc_min/max are not used as min max values.
> They are used as high/low watermarks.
> If arc is more than max the arc a thread is triggered to reduce the
> arc cache until min, but in the meantime other threads can still grow
> arc so there is a race between them.

Hmm, if this is true the ARC size should go down to arc_min once it
did grow past arc_max and no new data is coming along but I do not
observe such a thing here. It simply stays near but below arc_max here
all the time. I have only /home on ZFS with moderate load.

>
>> After a while FS operations (e.g. pkgdb -F with about 900 packages...
>> my specific workload is the fixup of gnome packages after the removal
>> of the obsolete libusb port) get very slow (in my specific example I
>> let the pkgdb run several times over night and it still is not
>> finished).
>
> I've seen many workloads were prefetching can saturate disks without
> ever being used.
> You might want to try disabling prefetch.
> Of course prefetching also grows arc.
>
>> The big problem with this is, that at some point in time the machine
>> reboots (panic, page fault, page not present, during a fork1). I have
>> the impression (beware, I have a watchdog configured, as I don't know
>> if a triggered WD would cause the same panic, the following is just a
>> guess) that I run out of memory of some kind (I have 1G RAM, i386, max
>> kmem size 700M). I restarted pkgdb several times after a reboot, and
>> it continues to process the libusb removal, but hey, this is annoying.
>
> With just 700M kmem you should set arc values extremely small and
> avoid anything which can quickly grow it.
> Unfortunately accessing many small files is a known arc filling workload.
> Activating vfs.zfs.cache_flush_disable can help speeding up arc decreasing,
> with the obvious risks of course...
>
>> Does someone see something similar to what I describe (mainly the
>> growth of the arc cache way beyond what is configured)? Anyone with
>> some ideas what to try?
>
> In my opinion the watermark mechanism can work as it is, but there should
> be a forced max - currently there is no guaranteed limit at all.
> Nevertheless it is up for the people which know the code to decide.
>
> --
> B.Walter http://www.bwct.de
> Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and much more.
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

From ticso at cicely7.cicely.de Fri Apr 17 12:05:58 2009
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Fri Apr 17 12:46:40 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To:
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090417141817.GR11551@cicely7.cicely.de>
Message-ID: <20090417190551.GT11551@cicely7.cicely.de>

On Fri, Apr 17, 2009 at 06:28:29PM +0200, Marius Nünnerich wrote:
> On Fri, Apr 17, 2009 at 16:18, Bernd Walter wrote:
> > On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> >> Hi,
> >>
> >> to fs@, please CC me, as I'm not subscribed.
> >>
> >> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
> >> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
> >> point I've seen more than 500M) than what I have configured in
> >> vfs.zfs.arc_max (40M).
> >
> > My understanding about this is the following:
> > vfs.zfs.arc_min/max are not used as min max values.
> > They are used as high/low watermarks.
> > If arc is more than max the arc a thread is triggered to reduce the
> > arc cache until min, but in the meantime other threads can still grow
> > arc so there is a race between them.
>
> Hmm, if this is true the ARC size should go down to arc_min once it
> did grow past arc_max and no new data is coming along but I do not
> observe such a thing here. It simply stays near but below arc_max here
> all the time. I have only /home on ZFS with moderate load.

I had a few ideas why this could be, but scanning the complete sys tree
showed no point at all where arc_min is used.
There are formulas to set this value, but that's all I can find.

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and much more.

From dnelson at allantgroup.com Fri Apr 17 14:44:05 2009
From: dnelson at allantgroup.com (Dan Nelson)
Date: Fri Apr 17 15:04:59 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090417190551.GT11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090417141817.GR11551@cicely7.cicely.de> <20090417190551.GT11551@cicely7.cicely.de>
Message-ID: <20090417205955.GK90152@dan.emsphone.com>

In the last episode (Apr 17), Bernd Walter said:
> On Fri, Apr 17, 2009 at 06:28:29PM +0200, Marius Nünnerich wrote:
> > On Fri, Apr 17, 2009 at 16:18, Bernd Walter wrote:
> > > On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> > >> I monitored (by hand) a while the sysctls
> > >> kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
> > >> Both grow way higher (at some point I've seen more than 500M) than
> > >> what I have configured in vfs.zfs.arc_max (40M).
> > >
> > > My understanding about this is the following: vfs.zfs.arc_min/max are
> > > not used as min max values. They are used as high/low watermarks. If
> > > arc is more than max the arc a thread is triggered to reduce the arc
> > > cache until min, but in the meantime other threads can still grow arc
> > > so there is a race between them.
> >
> > Hmm, if this is true the ARC size should go down to arc_min once it did
> > grow past arc_max and no new data is coming along but I do not observe
> > such a thing here. It simply stays near but below arc_max here all the
> > time. I have only /home on ZFS with moderate load.
>
> I had a few ideas why this could be, but scanning complete sys showed no
> point at all where arc_min is used. There are formulas to set this value,
> but that's all I find.

zfs_arc_{min,max} are just tunables. The real variables arc_c_{min,max}
get autosized and then capped to {min,max} in
uts/common/fs/zfs/arc.c:arc_init().

--
Dan Nelson
dnelson@allantgroup.com

From Alexander at Leidinger.net Sat Apr 18 00:39:14 2009
From: Alexander at Leidinger.net (Alexander Leidinger)
Date: Sat Apr 18 05:14:21 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090417141817.GR11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090417141817.GR11551@cicely7.cicely.de>
Message-ID: <20090418093857.0000199a@unknown>

On Fri, 17 Apr 2009 16:18:17 +0200 Bernd Walter wrote:
> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> > Hi,
> >
> > to fs@, please CC me, as I'm not subscribed.
> >
> > I monitored (by hand) a while the sysctls
> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
> > Both grow way higher (at some point I've seen more than 500M) than
> > what I have configured in vfs.zfs.arc_max (40M).
>
> My understanding about this is the following:
> vfs.zfs.arc_min/max are not used as min max values.
> They are used as high/low watermarks.
> If arc is more than max the arc a thread is triggered to reduce the
> arc cache until min, but in the meantime other threads can still grow
> arc so there is a race between them.

500M (more than 10 times my max) after a night seems to be a big race...

> > After a while FS operations (e.g. pkgdb -F with about 900
> > packages... my specific workload is the fixup of gnome packages
> > after the removal of the obsolete libusb port) get very slow (in my
> > specific example I let the pkgdb run several times over night and
> > it still is not finished).
>
> I've seen many workloads were prefetching can saturate disks without
> ever being used.
> You might want to try disabling prefetch.
> Of course prefetching also grows arc.

Prefetching is already disabled in this case.

> > The big problem with this is, that at some point in time the
> > machine reboots (panic, page fault, page not present, during a
> > fork1). I have the impression (beware, I have a watchdog
> > configured, as I don't know if a triggered WD would cause the same
> > panic, the following is just a guess) that I run out of memory of
> > some kind (I have 1G RAM, i386, max kmem size 700M).
I restarted
> > pkgdb several times after a reboot, and it continues to process the
> > libusb removal, but hey, this is annoying.
>
> With just 700M kmem you should set arc values extremely small and
> avoid anything which can quickly grow it.
> Unfortunately accessing many small files is a known arc filling
> workload. Activating vfs.zfs.cache_flush_disable can help speeding up
> arc decreasing, with the obvious risks of course...

I have this:
---snip---
vfs.zfs.prefetch_disable=1
vm.kmem_size="700M"
vm.kmem_size_max="700M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
vfs.zfs.vdev.cache.bshift="13" # device read ahead: 8k
vfs.zfs.vdev.max_pending="6" # concurrent requests to the device, + for NCQ
---snip---

Bye,
Alexander.

From Alexander at Leidinger.net Sat Apr 18 00:48:31 2009
From: Alexander at Leidinger.net (Alexander Leidinger)
Date: Sat Apr 18 05:14:39 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To:
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
Message-ID: <20090418094821.00002e67@unknown>

On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly wrote:
> On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
> > to fs@, please CC me, as I'm not subscribed.
> >
> > I monitored (by hand) a while the sysctls
> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
> > Both grow way higher (at some point I've seen more than 500M) than
> > what I have configured in vfs.zfs.arc_max (40M).
> >
> > After a while FS operations (e.g. pkgdb -F with about 900
> > packages... my specific workload is the fixup of gnome packages
> > after the removal of the obsolete libusb port) get very slow (in
> > my specific example I let the pkgdb run several times over night
> > and it still is not finished).
> >
> > The big problem with this is, that at some point in time the
> > machine reboots (panic, page fault, page not present, during a
> > fork1).
I have the impression (beware, I have a watchdog
> > configured, as I don't know if a triggered WD would cause the same
> > panic, the following is just a guess) that I run out of memory of
> > some kind (I have 1G RAM, i386, max kmem size 700M). I restarted
> > pkgdb several times after a reboot, and it continues to process the
> > libusb removal, but hey, this is annoying.
> >
> > Does someone see something similar to what I describe (mainly the
> > growth of the arc cache way beyond what is configured)? Anyone
> > with some ideas what to try?
>
> Can you provide the rest of the arcstats from sysctl? Also, does
> your arc_reclaim_thread process get any cycles when this problem
> occurs? What happens if you kill the pkgdb -F manually before it
> completes? Does the arc cache size come back down or is it stuck at
> the abnormally high level?

I haven't tried killing pkgdb and looking at the stats, but on the idle
machine (reboot after the panic and 5h of no use by me... the machine
fetches my mails, has a webmail + mysql + imap interface and is a
fileserver) the size is double my max value. Again, there's no real
load at this time, just fetching my mails (most traffic from the
FreeBSD lists) and a little bit of SpamAssassin filtering of them. When
I logged in this morning the machine had been rebooted about 5h earlier
by a panic and no FS traffic was going on (100% idle).

Currently the arc_reclaim_thread has 0:12 of accumulated CPU time,
the wcpu is at 0%, but it is in the running state. The machine is
about 80% idle.
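
One way to compare the reported ARC size against its configured ceiling, given `sysctl` output in the `name: value` form used in this message, might look like the Python sketch below. The helper names `parse_sysctl` and `arc_overshoot` are illustrative assumptions, not part of any FreeBSD tool; the two sample values are the `c_max` and `size` figures from the listing in this message.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def arc_overshoot(stats):
    """Return arcstats.size divided by arcstats.c_max (hypothetical helper)."""
    prefix = "kstat.zfs.misc.arcstats."
    return stats[prefix + "size"] / stats[prefix + "c_max"]


# Two lines taken from the sysctl listing in this message.
sample = """\
kstat.zfs.misc.arcstats.c_max: 41943040
kstat.zfs.misc.arcstats.size: 130467088
"""

stats = parse_sysctl(sample)
print(round(arc_overshoot(stats), 2))  # ratio of reported size to c_max
```

With these two values the ratio comes out a little above 3, which matches the complaint that the accounted size sits well beyond the 40M limit.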

Here are all zfs sysctls as of now (pkgdb started 5min ago):
---snip---
# sysctl -a | grep zfs
vfs.zfs.arc_meta_limit: 10485760
vfs.zfs.arc_meta_used: 130211600
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 22937600
vfs.zfs.arc_max: 41943040
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 13
vfs.zfs.vdev.cache.size: 5242880
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 6
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 2483157
kstat.zfs.misc.arcstats.misses: 604115
kstat.zfs.misc.arcstats.demand_data_hits: 187200
kstat.zfs.misc.arcstats.demand_data_misses: 78685
kstat.zfs.misc.arcstats.demand_metadata_hits: 2295957
kstat.zfs.misc.arcstats.demand_metadata_misses: 525430
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 1621026
kstat.zfs.misc.arcstats.mru_ghost_hits: 32102
kstat.zfs.misc.arcstats.mfu_hits: 862131
kstat.zfs.misc.arcstats.mfu_ghost_hits: 18804
kstat.zfs.misc.arcstats.deleted: 550853
kstat.zfs.misc.arcstats.recycle_miss: 287993
kstat.zfs.misc.arcstats.mutex_miss: 2
kstat.zfs.misc.arcstats.evict_skip: 654418
kstat.zfs.misc.arcstats.hash_elements: 5363
kstat.zfs.misc.arcstats.hash_elements_max: 8569
kstat.zfs.misc.arcstats.hash_collisions: 133396
kstat.zfs.misc.arcstats.hash_chains: 739
kstat.zfs.misc.arcstats.hash_chain_max: 5
kstat.zfs.misc.arcstats.p: 41943040
kstat.zfs.misc.arcstats.c: 41943040
kstat.zfs.misc.arcstats.c_min: 22937600
kstat.zfs.misc.arcstats.c_max: 41943040
kstat.zfs.misc.arcstats.size: 130467088
kstat.zfs.misc.arcstats.hdr_size: 730456
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 2728
kstat.zfs.misc.vdev_cache_stats.hits: 297326
kstat.zfs.misc.vdev_cache_stats.misses: 368918
---snip---

Bye,
Alexander.

From marius at nuenneri.ch Sat Apr 18 05:58:48 2009
From: marius at nuenneri.ch (Marius Nünnerich)
Date: Sat Apr 18 06:00:03 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090418094821.00002e67@unknown>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090418094821.00002e67@unknown>
Message-ID:

On Sat, Apr 18, 2009 at 09:48, Alexander Leidinger wrote:
> On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly wrote:
>
>
>> On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
>> > to fs@, please CC me, as I'm not subscribed.
>> >
>> > I monitored (by hand) a while the sysctls
>> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
>> > Both grow way higher (at some point I've seen more than 500M) than
>> > what I have configured in vfs.zfs.arc_max (40M).
>> >
>> > After a while FS operations (e.g. pkgdb -F with about 900
>> > packages... my specific workload is the fixup of gnome packages
>> > after the removal of the obsolete libusb port) get very slow (in
>> > my specific example I let the pkgdb run several times over night
>> > and it still is not finished).
>> >
>> > The big problem with this is, that at some point in time the
>> > machine reboots (panic, page fault, page not present, during a
>> > fork1). I have the impression (beware, I have a watchdog
>> > configured, as I don't know if a triggered WD would cause the same
>> > panic, the following is just a guess) that I run out of memory of
>> > some kind (I have 1G RAM, i386, max kmem size 700M). I restarted
>> > pkgdb several times after a reboot, and it continues to process the
>> > libusb removal, but hey, this is annoying.
>> >
>> > Does someone see something similar to what I describe (mainly the
>> > growth of the arc cache way beyond what is configured)? Anyone
>> > with some ideas what to try?
>>
>> Can you provide the rest of the arcstats from sysctl? Also, does
>> your arc_reclaim_thread process get any cycles when this problem
>> occurs? What happens if you kill the pkgdb -F manually before it
>> completes? Does the arc cache size come back down or is it stuck at
>> the abnormally high level?
>
> I haven't tried killing pkgdb and looking at the stats, but on the idle
> machine (reboot after the panic and 5h of no use by me... the machine
> fetches my mails, has a webmail + mysql + imap interface and is a
> fileserver) the size is double of my max value. Again there's no real
> load at this time, just fetching my mails (most traffic from the
> FreeBSD lists) and a little bit of SpamAssassin filtering of them. When
> I logged in this morning the machine was rebooted about 5h ago by a
> panic and no FS traffic was going on (100% idle).
>
> Currently the arc_reclaim_thread has 0:12 of accumulated CPU time,
> the wcpu is at 0%, but it is in the running state. The machine is
> about 80% idle.
> [snip]

How about adding a few DTrace probes into arc_reclaim_thread to see
what it does?

From ben at wanderview.com Sat Apr 18 21:17:04 2009
From: ben at wanderview.com (Ben Kelly)
Date: Sat Apr 18 21:17:10 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <20090418094821.00002e67@unknown>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090418094821.00002e67@unknown>
Message-ID: <6535218D-6292-4F84-A8BA-FFA9B2E47F80@wanderview.com>

On Apr 18, 2009, at 3:48 AM, Alexander Leidinger wrote:
> On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly wrote:
> I haven't tried killing pkgdb and looking at the stats, but on the idle
> machine (reboot after the panic and 5h of no use by me... the machine
> fetches my mails, has a webmail + mysql + imap interface and is a
> fileserver) the size is double of my max value. Again there's no real
> load at this time, just fetching my mails (most traffic from the
> FreeBSD lists) and a little bit of SpamAssassin filtering of them. When
> I logged in this morning the machine was rebooted about 5h ago by a
> panic and no FS traffic was going on (100% idle).

From looking at the code, it's not too surprising it settles out at 2x
your zfs_arc_max tunable. It looks like under normal conditions the
arc_reclaim_thread only tries to evict buffers when the arc_size plus
any ghost buffers exceeds twice the value of arc_c:

	if (needfree ||
	    (2 * arc_c < arc_size +
	    arc_mru_ghost->arcs_size + arc_mfu_ghost->arcs_size))
		arc_adjust();

(The needfree flag is only set when the system lowmem event is fired.)
The arc_reclaim_thread checks this once a second. Perhaps this limit
should be a tunable. Also, it might make sense to have a separate limit
check for the ghost buffers.

I was able to reproduce similar arc_size growth on my machine by running
my rsync backup.
After instrumenting the code it appeared that buffers
were not being evicted because they were "indirect" and had been in the
cache less than a second. The "indirect" flag is set based on the
on-disk level field. When you see the arcstats.evict_skip sysctl going
up, this is probably what is happening. The comments in the code say
this check is only for prefetch data, but it also triggers for indirect
buffers. I'm hesitant to make it really only affect prefetch buffers.
Perhaps we could make the timeout a tunable, or dynamic based on how far
the cache is over its target.

After the rsync completed my machine slowly evicts buffers until it's
back down to about twice arc_c. There was one case, however, where I
saw it stop at about four times arc_c. In that case it was failing to
evict buffers due to a missed lock. It's not clear yet if it was a
buffer lock or hash lock. When this happens you'll see the
arcstats.mutex_miss sysctl go up. I'm going to see if I can track down
why this is occurring under idle conditions. That seems suspicious to
me.

Hope that helps. I'll let you know if I find anything else.

- Ben

From ben at wanderview.com Sat Apr 18 21:25:22 2009
From: ben at wanderview.com (Ben Kelly)
Date: Sat Apr 18 21:25:29 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To:
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090417141817.GR11551@cicely7.cicely.de>
Message-ID: <6FBF637A-6D96-4117-85C5-F205989DCCC1@wanderview.com>

On Apr 17, 2009, at 12:28 PM, Marius Nünnerich wrote:
> On Fri, Apr 17, 2009 at 16:18, Bernd Walter wrote:
>> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
>>> Hi,
>>>
>>> to fs@, please CC me, as I'm not subscribed.
>>>
>>> I monitored (by hand) a while the sysctls
>>> kstat.zfs.misc.arcstats.size
>>> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
>>> point I've seen more than 500M) than what I have configured in
>>> vfs.zfs.arc_max (40M).
>>
>> My understanding about this is the following:
>> vfs.zfs.arc_min/max are not used as min max values.
>> They are used as high/low watermarks.
>> If arc is more than max the arc a thread is triggered to reduce the
>> arc cache until min, but in the meantime other threads can still grow
>> arc so there is a race between them.
>
> Hmm, if this is true the ARC size should go down to arc_min once it
> did grow past arc_max and no new data is coming along but I do not
> observe such a thing here. It simply stays near but below arc_max here
> all the time. I have only /home on ZFS with moderate load.

It appears arc_reclaim_thread only shrinks from arc_max when the system
vm_lowmem event is fired or more than 75% of max kmem is in use by the
system. If you want to make it try to shrink the arc all the time you
could try the patch below. This worked to reduce arc_c on my system,
but it was unable to reduce arc_size to match due to an apparent mutex
miss. I'm still trying to track that down.

Hope that helps.

- Ben

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (revision 205)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (working copy)
@@ -1963,7 +1963,7 @@
 		if (needfree ||
 		    (2 * arc_c < arc_size +
 		    arc_mru_ghost->arcs_size + arc_mfu_ghost->arcs_size))
-			arc_adjust();
+			arc_shrink();
 
 		if (arc_eviction_list != NULL)
 			arc_do_user_evicts();

From morganw at chemikals.org Sat Apr 18 22:57:01 2009
From: morganw at chemikals.org (Wes Morgan)
Date: Sat Apr 18 22:57:07 2009
Subject: Marvell 88SE6480
Message-ID:

Saw this on zfs-discuss:

http://supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm

Has a Marvell 88SE6480 chipset on it. Looks like a good controller for
zfs arrays. It doesn't appear to be supported by FreeBSD (yet). Anyone
know more about it?
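
To make the reclaim check quoted in the ARC thread above more concrete, here is a small Python model of the condition `needfree || 2 * arc_c < arc_size + ghost sizes`. It is only a sketch under stated assumptions: the function name `reclaim_would_run` and the sample ghost-list sizes are made up for illustration; only the inequality itself and the 40M / 130467088 figures come from the messages in this thread.

```python
MB = 1024 * 1024


def reclaim_would_run(arc_c, arc_size, mru_ghost, mfu_ghost, needfree=False):
    """Model of the quoted check: the reclaim thread acts when memory is
    needed, or when arc_size plus both ghost lists exceeds twice arc_c."""
    return needfree or (2 * arc_c < arc_size + mru_ghost + mfu_ghost)


arc_c = 40 * MB  # arc_c target, capped by vfs.zfs.arc_max=40M in the thread

# Hypothetical ghost-list sizes (assumption; not reported in the thread).
below = reclaim_would_run(arc_c, 60 * MB, 5 * MB, 5 * MB)

# arcstats.size value from the posted sysctl listing, ghost lists ignored.
above = reclaim_would_run(arc_c, 130467088, 0, 0)

# A lowmem event forces the check to pass regardless of sizes.
forced = reclaim_would_run(arc_c, 10 * MB, 0, 0, needfree=True)

print(below, above, forced)
```

Under this model the ARC can sit anywhere up to twice `arc_c` without the thread acting, which is consistent with the "settles at about 2x" observation. It says nothing about whether the memory counted in `arc_size` is actually evictable once the check does pass.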

From ben at wanderview.com Mon Apr 20 05:25:36 2009
From: ben at wanderview.com (Ben Kelly)
Date: Mon Apr 20 05:25:42 2009
Subject: ZFS: unlimited arc cache growth?
In-Reply-To: <6535218D-6292-4F84-A8BA-FFA9B2E47F80@wanderview.com>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net> <20090418094821.00002e67@unknown> <6535218D-6292-4F84-A8BA-FFA9B2E47F80@wanderview.com>
Message-ID: <8AF79B5A-3D10-4344-BA2F-02DF84BB3F8A@wanderview.com>

On Apr 18, 2009, at 5:17 PM, Ben Kelly wrote:
> After the rsync completed my machine slowly evicts buffers until it's
> back down to about twice arc_c. There was one case, however, where
> I saw it stop at about four times arc_c. In that case it was
> failing to evict buffers due to a missed lock. It's not clear yet if
> it was a buffer lock or hash lock. When this happens you'll see the
> arcstats.mutex_miss sysctl go up. I'm going to see if I can track
> down why this is occurring under idle conditions. That seems
> suspicious to me.

Sorry to reply to my own mail, but I found some more information I
thought I would share.

First, the missed mutex problem was an error on my part. I had
accidentally deleted a rather important line when I was instrumenting
the code earlier. Once this was restored, the missed mutex count
dropped back to a more reasonable level.

Next, the arcstats.size value is not strictly the amount of cached data.
It represents a combination of cached buffers, actively referenced
buffers, and "other" data. In this case "other" data is things like
dnode structures that are directly allocated using kmem_cache_alloc()
and simply tacked on to the ARC accounting variable using
arc_space_consume(). At this point I don't think the ARC has a way of
signaling these "other" data users of memory pressure.

The amount of memory the ARC has cached that can actually be freed is
limited to buffers it internally allocated that have zero active
references.
This consists of the data and metadata lists for the MRU and MFU caches.

On my server right now I have an arc_c_max of about 40MB. After running
a simple find(1) over /usr/src I ended up with the following memory usage:

  arcstats.size = 132MB
  anonymous inflight buffers = 212KB
  MRU referenced buffers = 80MB
  MFU referenced buffers = 1KB
  dbuf structure "other" data = 8MB
  dnode structure "other" data = 25MB
  unknown "other" data (probably dbuf related) ~= 18MB
  evictable buffer data = 3KB

So right now the ARC has done the best it can to free up data. If you
define the cache as storing only inactive data, then basically the ARC
has emptied the cache completely. This just isn't visible from the
exported arcstats.size variable.

I guess there is some question as to whether data is being referenced
longer than it needs to be by outside consumers.

Anyway, just thought I would share what I found. At this point it
doesn't look like tweaking limits will really help. Also, my previous
idea that the inactive buffers were being prevented from eviction for
too long was incorrect.

If anyone is interested I can put together a patch that exports the
amount of evictable data in the cache.

- Ben

From bugmaster at FreeBSD.org Mon Apr 20 11:06:52 2009
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Apr 20 11:07:51 2009
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200904201106.n3KB6pLm033003@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker Resp.
Description -------------------------------------------------------------------------------- o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [smbfs] [panic] panic: ffs_truncate: read-only filesys o kern/133373 fs [zfs] umass attachment causes ZFS checksum errors, dat o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int o kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] [patch] mkfs.ext2 creates rotten partition o kern/131084 fs [xfs] xfs destroys itself after copying data o kern/131081 fs [zfs] User cannot delete a file when a ZFS dataset is o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o bin/130105 fs [zfs] zfs send -R dumps core o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) f kern/128829 fs 
smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/89991 fs [ufs] softupdates with mount -ur causes fs UNREFS o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/51685 fs [hang] 
Unbounded inode allocation causes kernel to loc 59 problems total. From avg at freebsd.org Tue Apr 21 17:01:43 2009 From: avg at freebsd.org (Andriy Gapon) Date: Tue Apr 21 17:01:55 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: <49EDF80F.3070105@icyb.net.ua> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> Message-ID: <49EDF995.2050508@freebsd.org> on 21/04/2009 19:45 Andriy Gapon said the following: > Maybe this is a check against disk space being re-used for some other fs and > super-block staying sufficiently intact. But, OTOH, fs_fsize and fs_size could > still match the raw media in this case too. > If some extra sanity checks are needed in addition to magic then > fs_bmask/fs_fmask/fs_bshift/fs_fshift and/or any other derived fields could be used. > BTW, right now I put this in my local tree: diff --git a/sys/geom/label/g_label_ufs.c b/sys/geom/label/g_label_ufs.c index 8510fc0..0cffb8d 100644 --- a/sys/geom/label/g_label_ufs.c +++ b/sys/geom/label/g_label_ufs.c @@ -83,10 +83,10 @@ g_label_ufs_taste_common(struct g_consumer *cp, char *label, size_t size, int wh continue; /* Check for magic and make sure things are the right size */ if (fs->fs_magic == FS_UFS1_MAGIC && fs->fs_fsize > 0 && - pp->mediasize / fs->fs_fsize == fs->fs_old_size) { + pp->mediasize / fs->fs_fsize >= fs->fs_old_size) { /* Valid UFS1. */ } else if (fs->fs_magic == FS_UFS2_MAGIC && fs->fs_fsize > 0 && - pp->mediasize / fs->fs_fsize == fs->fs_size) { + pp->mediasize / fs->fs_fsize >= fs->fs_size) { /* Valid UFS2. */ } else { g_free(fs); -- Andriy Gapon From avg at icyb.net.ua Tue Apr 21 17:01:46 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Apr 21 17:01:55 2009 Subject: glabel for ufs: size check is overzealous? 
In-Reply-To: References: <49EDCA21.70908@icyb.net.ua> Message-ID: <49EDF80F.3070105@icyb.net.ua> on 21/04/2009 19:18 Ivan Voras said the following: > Andriy Gapon wrote: >> glabel insists that for UFS2 the following must hold true: >> pp->mediasize / fs->fs_fsize == fs->fs_size >> >> But in reality it doesn't have to be this way, there can be valid reasons to make >> filesystem smaller than available raw media size. >> >> I understand that this is a good sanity check, but maybe there are other ways to >> extra-check that we see a proper superblock, without imposing the limitation in >> question. > > Shouldn't fsck complain of this inconsistency? I don't see why it should and - no, it actually does not. fsck checks only filesystem's internal consistency, it doesn't check media size, etc. > If it doesn't and the [UF]FS code doesn't, I don't see why glabel should > continue to check it. Struct fs has a tonne of int32 fields, some of > which are only used for information whose length is a couple of bits - > if checking magic isn't enough (and it probably is), there are other > fields that can be validated. Maybe this is a check against disk space being re-used for some other fs and super-block staying sufficiently intact. But, OTOH, fs_fsize and fs_size could still match the raw media in this case too. If some extra sanity checks are needed in addition to magic then fs_bmask/fs_fmask/fs_bshift/fs_fshift and/or any other derived fields could be used. -- Andriy Gapon From gallasch at free.de Tue Apr 21 23:00:27 2009 From: gallasch at free.de (Kai Gallasch) Date: Tue Apr 21 23:00:44 2009 Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small" Message-ID: <49EE49D8.7000902@free.de> Hi. Today I had a kernel panic on my server running FreeBSD 7.2-RC1 (amd64), Opteron, 4 Cores, 16GB RAM, when benchmarking a raidz1 pool with bonnie++ benchmark. # bonnie++ -d /zpool1/test/tmp -s 32408 -u kai The server hosts about ten jails with webservers, mail, etc. 
- very low load. I used bonnie++ to somehow provoke a panic, after the server in the past week had several zfs related panics, that ended up with processes stuck in state "zfs". The pattern was always that after booting the server kept running for about a day and then crashed or became unusable. Some sysctl values that I saved during such a "process stuck in zfs" state: kern.maxvnodes: 120000 kern.minvnodes: 25000 vm.stats.vm.v_vnodepgsout: 48 vm.stats.vm.v_vnodepgsin: 33500 vm.stats.vm.v_vnodeout: 48 vm.stats.vm.v_vnodein: 27299 vfs.freevnodes: 25000 vfs.wantfreevnodes: 25000 vfs.numvnodes: 93765 debug.sizeof.vnode: 504 vfs.zfs.arc_min: 37545216 vfs.zfs.arc_max: 901085184 vfs.zfs.mdcomp_disable: 0 vfs.zfs.prefetch_disable: 0 vfs.zfs.zio.taskq_threads: 0 vfs.zfs.recover: 0 vfs.zfs.vdev.cache.size: 10485760 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_disable: 0 vfs.zfs.debug: 1 kstat.zfs.misc.arcstats.hits: 22067589 kstat.zfs.misc.arcstats.misses: 4824470 kstat.zfs.misc.arcstats.demand_data_hits: 5661546 kstat.zfs.misc.arcstats.demand_data_misses: 2512832 kstat.zfs.misc.arcstats.demand_metadata_hits: 13533858 kstat.zfs.misc.arcstats.demand_metadata_misses: 1606419 kstat.zfs.misc.arcstats.prefetch_data_hits: 157869 kstat.zfs.misc.arcstats.prefetch_data_misses: 252444 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 2714316 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 452775 kstat.zfs.misc.arcstats.mru_hits: 10229954 kstat.zfs.misc.arcstats.mru_ghost_hits: 19863 kstat.zfs.misc.arcstats.mfu_hits: 9008171 kstat.zfs.misc.arcstats.mfu_ghost_hits: 159664 kstat.zfs.misc.arcstats.deleted: 4570138 kstat.zfs.misc.arcstats.recycle_miss: 579604 kstat.zfs.misc.arcstats.mutex_miss: 37379 kstat.zfs.misc.arcstats.evict_skip: 90360 kstat.zfs.misc.arcstats.hash_elements: 87460 kstat.zfs.misc.arcstats.hash_elements_max: 248398 kstat.zfs.misc.arcstats.hash_collisions: 2006655 kstat.zfs.misc.arcstats.hash_chains: 11410 
kstat.zfs.misc.arcstats.hash_chain_max: 7 kstat.zfs.misc.arcstats.p: 617419234 kstat.zfs.misc.arcstats.c: 746412403 kstat.zfs.misc.arcstats.c_min: 37545216 kstat.zfs.misc.arcstats.c_max: 901085184 kstat.zfs.misc.arcstats.size: 615520768 My sysctl.conf: # 12328 (default) -> 18000 kern.maxfiles=18000 # 5547 (default) -> 2000 kern.maxprocperuid=2000 # 11095 (default) -> 5000 kern.maxfilesperproc=5000 # postgresql kern.ipc.shmall=32768 kern.ipc.shmmax=134217728 kern.ipc.semmap=256 security.jail.sysvipc_allowed=1 kern.ipc.shm_use_phys=1 vfs.zfs.debug=1 # default 100000 kern.maxvnodes=120000 The crash today (while running bonnie++) gave me some new data: vfs.freevnodes: 24973 vfs.numvnodes: 35789 kstat.zfs.misc.arcstats.hits: 7086527 kstat.zfs.misc.arcstats.misses: 193683 kstat.zfs.misc.arcstats.demand_data_hits: 5599886 kstat.zfs.misc.arcstats.demand_data_misses: 82250 kstat.zfs.misc.arcstats.demand_metadata_hits: 1159851 kstat.zfs.misc.arcstats.demand_metadata_misses: 29224 kstat.zfs.misc.arcstats.prefetch_data_hits: 156004 kstat.zfs.misc.arcstats.prefetch_data_misses: 39321 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 170786 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 42888 kstat.zfs.misc.arcstats.mru_hits: 717887 kstat.zfs.misc.arcstats.mru_ghost_hits: 16917 kstat.zfs.misc.arcstats.mfu_hits: 6089477 kstat.zfs.misc.arcstats.mfu_ghost_hits: 14084 kstat.zfs.misc.arcstats.deleted: 269579 kstat.zfs.misc.arcstats.recycle_miss: 32480 kstat.zfs.misc.arcstats.mutex_miss: 814 kstat.zfs.misc.arcstats.evict_skip: 1687376 kstat.zfs.misc.arcstats.hash_elements: 2263 kstat.zfs.misc.arcstats.hash_elements_max: 65758 kstat.zfs.misc.arcstats.hash_collisions: 51235 kstat.zfs.misc.arcstats.hash_chains: 9 kstat.zfs.misc.arcstats.hash_chain_max: 4 kstat.zfs.misc.arcstats.p: 29036496 kstat.zfs.misc.arcstats.c: 37545216 kstat.zfs.misc.arcstats.c_min: 37545216 kstat.zfs.misc.arcstats.c_max: 901085184 kstat.zfs.misc.arcstats.size: 401183744 On the console I found: panic: 
kmem_malloc(131072): kmem_map too small: 1152401408 total allocated cpuid = 1 In /usr/src/UPDATING I read: [..] 20090207: ZFS users on amd64 machines with 4GB or more of RAM should reevaluate their need for setting vm.kmem_size_max and vm.kmem_size manually. In fact, after recent changes to the kernel, the default value of vm.kmem_size is larger than the suggested manual setting in most ZFS/FreeBSD tuning guides. So I understood this as "vm.kmem_size is set unnecessary large by default. You should think about decreasing it to save some RAM" On my amd64 server the default values of kmem_size are vm.kmem_size_scale: 3 vm.kmem_size_max: 3865468109 vm.kmem_size_min: 0 vm.kmem_size: 1201446912 Can someone give me a hint how to debug this problem further, or how to find some reasonable values for setting vm.kmem_size_max and vm.kmem_size with 16G of RAM? Thanks! Kai.
From avg at icyb.net.ua Wed Apr 22 07:07:13 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Apr 22 07:07:19 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: <49EDF995.2050508@freebsd.org> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> <49EDF995.2050508@freebsd.org> Message-ID: <49EEC21C.7020106@icyb.net.ua> Thinking more about it - maybe that check is useful for finding out what geom provider a filesystem actually belongs too. But I am not sure. E.g. what should happen in the following case? I create partitions ad4s1a and ad4s2a. I create gmirror rootgm using these partitions. I create a filesystem on rootgm with label rootfs. Right now, with my local patch, during boot glabel seems to do "tasting" before gmirror is activated and so it thinks that rootfs is label of filesystem on ad4s1a. I think that this wouldn't have happened without my patch. But, OTOH, I think that this is not the problem of the patch, this is a problem of glabel starting before gmirror. But this is insolvable in principle - what of gmirror is started later manually. So after all the current code makes the most sense for most common usage pattern.
And thus I shall shut up :-)

--
Andriy Gapon

From sb345 at litepc.com Wed Apr 22 08:31:54 2009
From: sb345 at litepc.com (litepc.com)
Date: Wed Apr 22 08:32:00 2009
Subject: Clarrification on fs block size
Message-ID: <49EF5C55.7178.1CEB83C@sb345.litepc.com>

Hello,

I'm trying to track down files that are using bad disk blocks as
reported by SMART drive tests.

I'm struggling to identify which inodes are using which disk sectors
because the various utilities appear to define "blocks" differently.

In the context of smartctl, fdisk, and bsdlabel a "disk block" is a
512 byte sector.

In the context of a UFS file system a "file system block" is 16384
bytes and a "fragment" is 2048 bytes.

So to my mind this means there are 32 x 512-byte blocks in each 16384
byte file system block.

However...

dumpfs reports "fsbtodb 2" which means a disk block = file system
block * 2^2 so there are 4 disk blocks in each file system block -
this is verified using the fsdb "blocks" command to list block
numbers assigned to an inode...which then must be multiplied by 4 to
use the fsdb "findblk" command to find the correct inode.

Which seems to indicate that a "file system block" to dumpfs and fsdb
must be equivalent to a 2048 byte "fragment". Is this correct?

What is confusing is that if dumpfs reports "bsize" as 16384 then the
"b" in "bsize" and "b" in "fsbtodb" appear to be different "block"
definitions.

Can anyone clarify?

I want to be sure that I can take the identified corrupt LBA address
in smartctl, then locate the correct file system and adjusted offset
using bsdlabel and then plug this block number straight into fsdb's
"findblk" command to identify which inode owns the corrupted block.
If fsdb's findblk is expecting some other definition of "disk block"
then it's not going to locate the correct inode!
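
The unit arithmetic in the question above can be written out as a short
Python sketch. It only restates the figures quoted in the message
(512 byte sectors, 2048 byte fragments, 16384 byte blocks, fsbtodb 2).
The function names are hypothetical, the partition start offset has to
come from bsdlabel on the real system, and the sketch makes no claim
about which unit fsdb's "findblk" expects.

```python
SECTOR_SIZE = 512    # bytes per disk sector, as used by smartctl, fdisk, bsdlabel
FRAG_SIZE = 2048     # dumpfs "fsize" quoted in the message
BLOCK_SIZE = 16384   # dumpfs "bsize" quoted in the message
FSBTODB = 2          # dumpfs "fsbtodb": shift from fragment number to sector number


def sectors_per(unit_size, sector_size=SECTOR_SIZE):
    """Number of disk sectors covered by one unit of unit_size bytes."""
    return unit_size // sector_size


def fsb_to_db(frag_no, shift=FSBTODB):
    """Fragment number -> first sector of that fragment (partition relative)."""
    return frag_no << shift


def db_to_fsb(sector_no, shift=FSBTODB):
    """Partition-relative sector number -> fragment number containing it."""
    return sector_no >> shift


def lba_to_partition_sector(lba, partition_start):
    """Absolute LBA -> sector offset inside the partition.

    partition_start is an assumed input: the partition's first sector as
    an absolute LBA, read from the disk label.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return lba - partition_start
```

With these numbers sectors_per(BLOCK_SIZE) is 32 and sectors_per(FRAG_SIZE)
is 4, which is the same factor of 4 that "fsbtodb 2" describes (2^2).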
From ivoras at freebsd.org Wed Apr 22 10:05:16 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Apr 22 10:05:24 2009
Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small"
In-Reply-To: <49EE49D8.7000902@free.de>
References: <49EE49D8.7000902@free.de>
Message-ID:

Kai Gallasch wrote:
> Hi.
>
> Today I had a kernel panic on my server running FreeBSD 7.2-RC1 (amd64),
> Opteron, 4 Cores, 16GB RAM, when benchmarking a raidz1 pool with
> bonnie++ benchmark.

Just for general information - how many drives are in the pool / how
fast are the drives?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090422/e228fa05/signature.pgp

From gallasch at free.de Wed Apr 22 10:15:43 2009
From: gallasch at free.de (Kai Gallasch)
Date: Wed Apr 22 10:15:49 2009
Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small"
In-Reply-To:
References: <49EE49D8.7000902@free.de>
Message-ID: <49EEEE4C.1030601@free.de>

Ivan Voras wrote:
> Kai Gallasch wrote:
>> Hi.
>>
>> Today I had a kernel panic on my server running FreeBSD 7.2-RC1 (amd64),
>> Opteron, 4 Cores, 16GB RAM, when benchmarking a raidz1 pool with
>> bonnie++ benchmark.
>
> Just for general information - how many drives are in the pool / how
> fast are the drives?

raidz1 with 4 x Compaq 147GB, 10K RPM, SCSI-3

This is how the drives show up in dmesg. They are on their own SCSI bus,
connected to an mpt HBA.
da2 at mpt0 bus 0 target 2 lun 0 da2: Fixed Direct Access SCSI-3 device da2: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit) da2: Command Queueing Enabled da2: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C) da3 at mpt0 bus 0 target 3 lun 0 da3: Fixed Direct Access SCSI-3 device da3: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit) da3: Command Queueing Enabled da3: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C) da4 at mpt0 bus 0 target 4 lun 0 da4: Fixed Direct Access SCSI-3 device da4: 320.000MB/s transfers (160.000MHz DT, offset 63, 16bit) da4: Command Queueing Enabled da4: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C) da5 at mpt0 bus 0 target 5 lun 0 da5: Fixed Direct Access SCSI-3 device da5: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit) da5: Command Queueing Enabled da5: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C) From brde at optusnet.com.au Wed Apr 22 10:22:59 2009 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Apr 22 10:23:06 2009 Subject: Clarrification on fs block size In-Reply-To: <49EF5C55.7178.1CEB83C@sb345.litepc.com> References: <49EF5C55.7178.1CEB83C@sb345.litepc.com> Message-ID: <20090422190944.K59813@delplex.bde.org> On Wed, 22 Apr 2009, litepc.com wrote: > I'm trying to to track down files that are using bad disk blocks as > reported by SMART drive tests > > I'm struggling indentifying which inodes are using which disk sectors > because the various utilities appear to define "blocks" differently. > > In the context of smartctl, fdisk, and bsdlabel a "disk block" is a > 512 byte sector > > In the context of UFS file system a "file system block" is 16384 > bytes and a "fragment" is 2048 bytes Actually, ffs has 2 types of blocks, "logical blocks" of configurable size (default 16384) and ordinary "blocks" ("fragments") of configurable size (default 2048). 
Logical blocks are used mainly within files and ordinary
blocks are used in most other contexts, in particular for all block
numbers in metadata. Block numbers in metadata need to have the smaller
units so that they can address fragments.

> So to my mind this means there are 32 x 512byte blocks in each 16384
> byte file system block.
>
> However...
>
> dumpfs reports "fsbtodb 2" which means a disk block = file system
> block * 2^2 so there are 4 disk blocks in each file system block -
> this is verified using the fsdb "blocks" command to list block
> numbers assigned to an inode...which then must be multiplied by 4 to
> use the fsdb "findblk" command to find the correct inode.

4 is the conversion factor for ordinary ffs blocks of size 2048 and
virtual disk blocks of size 512 (actual disk blocks may have a different
size, though 512 is normal, perhaps due to virtualization in the disk
itself).

> Which seems to indicate that a "file system block" to dumpfs and fsdb
> must be equivalent to a 2048 byte "fragment". Is this correct?

Yes.

> What is confusing is that if dumpfs reports "bsize" as 16384 then the
> "b" in "bsize" and "b" in "fsbtodb" appear to be different "block"
> definitions.

It's confusing in ffs sources too.

> I want to be sure that I can take the identified currupt LBA address
> in smartctl, then locate the correct file system and adjusted offset
> using bsdlabel and then plug this block number straight into fsdb's
> "findblk" command to identify which inode owns the corrupted block.
> If fsdb's findblk is expecting some other definition of "disk block"
> then its not going to locate the correct inode!

"findblk" seems to convert from and to virtual disk block units, so you
don't need to know anything about either of ffs's block units. This is a
strange interface since its blocks have different units from the
ordinary block numbers printed by the "blocks" command. "findblk" seems
to be the only command in fsdb that does these conversions.
Bruce From gary.jennejohn at freenet.de Wed Apr 22 10:30:23 2009 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Wed Apr 22 10:30:29 2009 Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small" In-Reply-To: <49EE49D8.7000902@free.de> References: <49EE49D8.7000902@free.de> Message-ID: <20090422123020.42b756c1@ernst.jennejohn.org> On Wed, 22 Apr 2009 00:34:00 +0200 Kai Gallasch wrote: [snip a lot of stuff] > In /usr/src/UPDATING I read: > > [..] > > 20090207: > ZFS users on amd64 machines with 4GB or more of RAM should > reevaluate their need for setting vm.kmem_size_max and > vm.kmem_size manually. In fact, after recent changes to the > kernel, the default value of vm.kmem_size is larger than the > suggested manual setting in most ZFS/FreeBSD tuning guides. > > So I understood this as "vm.kmem_size is set unnecessary large by > default. You should think about decreasing it to save some RAM" > > On my amd64 server the default values of kmem_size are > > vm.kmem_size_scale: 3 > vm.kmem_size_max: 3865468109 > vm.kmem_size_min: 0 > vm.kmem_size: 1201446912 > > Can someone give me a hint how to debug this problem further, or how to > find some reasonable values for setting vm.kmem_size_max and > vm.kmem_size with 16G of RAM? > Hmm, I wonder whether this applies to 7.2-RC1. I don't know whether the kernel changes have been committed to 7.2 or whether they were already present when we started work on 7.2 because I haven't been paying much attention. 
On my 8-current amd64 machine with only 4GB of RAM I see larger values than you see with 16GB: sysctl vm.kmem_size_max vm.kmem_size_max: 4509713203 sysctl vm.kmem_size vm.kmem_size: 1335824384 --- Gary Jennejohn From ivoras at freebsd.org Wed Apr 22 12:16:21 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Wed Apr 22 12:16:28 2009 Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small" In-Reply-To: <20090422123020.42b756c1@ernst.jennejohn.org> References: <49EE49D8.7000902@free.de> <20090422123020.42b756c1@ernst.jennejohn.org> Message-ID: Gary Jennejohn wrote: > On Wed, 22 Apr 2009 00:34:00 +0200 > Kai Gallasch wrote: > > [snip a lot of stuff] >> In /usr/src/UPDATING I read: >> >> [..] >> >> 20090207: >> ZFS users on amd64 machines with 4GB or more of RAM should >> reevaluate their need for setting vm.kmem_size_max and >> vm.kmem_size manually. In fact, after recent changes to the >> kernel, the default value of vm.kmem_size is larger than the >> suggested manual setting in most ZFS/FreeBSD tuning guides. >> >> So I understood this as "vm.kmem_size is set unnecessary large by >> default. You should think about decreasing it to save some RAM" >> >> On my amd64 server the default values of kmem_size are >> >> vm.kmem_size_scale: 3 >> vm.kmem_size_max: 3865468109 >> vm.kmem_size_min: 0 >> vm.kmem_size: 1201446912 >> >> Can someone give me a hint how to debug this problem further, or how to >> find some reasonable values for setting vm.kmem_size_max and >> vm.kmem_size with 16G of RAM? >> > > Hmm, I wonder whether this applies to 7.2-RC1. I don't know whether > the kernel changes have been committed to 7.2 or whether they were > already present when we started work on 7.2 because I haven't been > paying much attention. 7.2 was branched last Friday - quick browsing of commit messages doesn't find any relevant new development between Friday and now. 
> On my 8-current amd64 machine with only 4GB of RAM I see larger values > than you see with 16GB: > > sysctl vm.kmem_size_max > vm.kmem_size_max: 4509713203 > sysctl vm.kmem_size > vm.kmem_size: 1335824384 Ok, but remember that ZFS in -CURRENT is very different from ZFS in -STABLE. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090422/33228537/signature.pgp From avg at icyb.net.ua Wed Apr 22 13:06:16 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Apr 22 13:06:28 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> Message-ID: <49EF1645.70704@icyb.net.ua> on 21/04/2009 21:43 Ivan Voras said the following: > Andriy Gapon wrote: >> I don't see why it should and - no, it actually does not. >> fsck checks only filesystem's internal consistency, it doesn't check media size, etc. > > Well yes, if the number of blocks is really incorrect it should be > visible from the arrangement of the metadata but still - that makes the > field almost useless doesn't it? How do you mean? The field tells the filesystem size, how it can be useless? -- Andriy Gapon From avg at icyb.net.ua Wed Apr 22 13:11:06 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Apr 22 13:11:11 2009 Subject: glabel for ufs: size check is overzealous? 
In-Reply-To: <9bbcef730904220608y73cbf2d2s6921b05c1978a121@mail.gmail.com> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> <49EF1645.70704@icyb.net.ua> <9bbcef730904220608y73cbf2d2s6921b05c1978a121@mail.gmail.com> Message-ID: <49EF1766.7030401@icyb.net.ua> on 22/04/2009 16:08 Ivan Voras said the following: > 2009/4/22 Andriy Gapon : >> on 21/04/2009 21:43 Ivan Voras said the following: >>> Andriy Gapon wrote: >>>> I don't see why it should and - no, it actually does not. >>>> fsck checks only filesystem's internal consistency, it doesn't check media size, etc. >>> Well yes, if the number of blocks is really incorrect it should be >>> visible from the arrangement of the metadata but still - that makes the >>> field almost useless doesn't it? >> How do you mean? >> The field tells the filesystem size, how it can be useless? > > If nothing checks it and everything works, I'd say it's usefulness is > a bit limited... ufs driver doesn't check it, the driver *uses* it, so... :-) -- Andriy Gapon From avg at icyb.net.ua Wed Apr 22 13:36:36 2009 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Apr 22 13:36:42 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: <9bbcef730904220612s3ff4308fpc1d18e216a5c7773@mail.gmail.com> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> <49EF1645.70704@icyb.net.ua> <9bbcef730904220608y73cbf2d2s6921b05c1978a121@mail.gmail.com> <49EF1766.7030401@icyb.net.ua> <9bbcef730904220612s3ff4308fpc1d18e216a5c7773@mail.gmail.com> Message-ID: <49EF1D5F.7050907@icyb.net.ua> on 22/04/2009 16:12 Ivan Voras said the following: > 2009/4/22 Andriy Gapon : >> on 22/04/2009 16:08 Ivan Voras said the following: >>> 2009/4/22 Andriy Gapon : >>>> on 21/04/2009 21:43 Ivan Voras said the following: >>>>> Andriy Gapon wrote: >>>>>> I don't see why it should and - no, it actually does not. >>>>>> fsck checks only filesystem's internal consistency, it doesn't check media size, etc. 
>>>>> Well yes, if the number of blocks is really incorrect it should be >>>>> visible from the arrangement of the metadata but still - that makes the >>>>> field almost useless doesn't it? >>>> How do you mean? >>>> The field tells the filesystem size, how it can be useless? >>> If nothing checks it and everything works, I'd say it's usefulness is >>> a bit limited... >> ufs driver doesn't check it, the driver *uses* it, so... :-) > > But as you said, fsck will not fix an invalid value? It won't, because it can not know the correct value and it is probably not able to safely derive it from anything. Filesystem size is supposed to always stay immutable (modulo growfs), so if this type of corruption happens to superblock, then one has quite a big problem and possibly some fun time with disk editor. -- Andriy Gapon From ivoras at freebsd.org Wed Apr 22 13:37:24 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Wed Apr 22 13:37:31 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: <49EF1645.70704@icyb.net.ua> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> <49EF1645.70704@icyb.net.ua> Message-ID: <9bbcef730904220608y73cbf2d2s6921b05c1978a121@mail.gmail.com> 2009/4/22 Andriy Gapon : > on 21/04/2009 21:43 Ivan Voras said the following: >> Andriy Gapon wrote: >>> I don't see why it should and - no, it actually does not. >>> fsck checks only filesystem's internal consistency, it doesn't check media size, etc. >> >> Well yes, if the number of blocks is really incorrect it should be >> visible from the arrangement of the metadata but still - that makes the >> field almost useless doesn't it? > > How do you mean? > The field tells the filesystem size, how it can be useless? If nothing checks it and everything works, I'd say it's usefulness is a bit limited... 
From ivoras at freebsd.org Wed Apr 22 13:42:08 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Wed Apr 22 13:42:15 2009 Subject: glabel for ufs: size check is overzealous? In-Reply-To: <49EF1766.7030401@icyb.net.ua> References: <49EDCA21.70908@icyb.net.ua> <49EDF80F.3070105@icyb.net.ua> <49EF1645.70704@icyb.net.ua> <9bbcef730904220608y73cbf2d2s6921b05c1978a121@mail.gmail.com> <49EF1766.7030401@icyb.net.ua> Message-ID: <9bbcef730904220612s3ff4308fpc1d18e216a5c7773@mail.gmail.com> 2009/4/22 Andriy Gapon : > on 22/04/2009 16:08 Ivan Voras said the following: >> 2009/4/22 Andriy Gapon : >>> on 21/04/2009 21:43 Ivan Voras said the following: >>>> Andriy Gapon wrote: >>>>> I don't see why it should and - no, it actually does not. >>>>> fsck checks only filesystem's internal consistency, it doesn't check media size, etc. >>>> Well yes, if the number of blocks is really incorrect it should be >>>> visible from the arrangement of the metadata but still - that makes the >>>> field almost useless doesn't it? >>> How do you mean? >>> The field tells the filesystem size, how it can be useless? >> >> If nothing checks it and everything works, I'd say it's usefulness is >> a bit limited... > > ufs driver doesn't check it, the driver *uses* it, so... :-) But as you said, fsck will not fix an invalid value? 
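For readers of the archive, the exchange above turns on whether anything compares the size recorded in the superblock against the size of the underlying media. Below is a minimal sketch in C of the two policies one could apply. It assumes the "overzealous" check named in the thread subject is an equality comparison, which this part of the archive does not show; `struct fs_summary`, `size_check_exact` and `size_check_fits` are hypothetical names, not the actual glabel or UFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the two superblock values under discussion. */
struct fs_summary {
	int64_t size_frags;	/* filesystem size, counted in fragments */
	int32_t frag_bytes;	/* fragment size in bytes */
};

/* Strict policy: the media must hold exactly the recorded fragment count. */
static bool
size_check_exact(const struct fs_summary *fs, int64_t media_bytes)
{
	assert(fs != NULL);
	if (fs->frag_bytes <= 0 || media_bytes < 0)
		return (false);
	return (media_bytes / fs->frag_bytes == fs->size_frags);
}

/* Relaxed policy: the filesystem only has to fit on the media. */
static bool
size_check_fits(const struct fs_summary *fs, int64_t media_bytes)
{
	assert(fs != NULL);
	if (fs->frag_bytes <= 0 || fs->size_frags <= 0 || media_bytes < 0)
		return (false);
	return (fs->size_frags <= media_bytes / fs->frag_bytes);
}
```

With the relaxed policy a filesystem that is smaller than its partition still passes, while a filesystem that claims more fragments than the media can hold is rejected under either policy.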
From gary.jennejohn at freenet.de Wed Apr 22 13:56:29 2009 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Wed Apr 22 13:56:36 2009 Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small" In-Reply-To: References: <49EE49D8.7000902@free.de> <20090422123020.42b756c1@ernst.jennejohn.org> Message-ID: <20090422155627.7b6e127d@ernst.jennejohn.org> On Wed, 22 Apr 2009 14:16:06 +0200 Ivan Voras wrote: > Gary Jennejohn wrote: > > On my 8-current amd64 machine with only 4GB of RAM I see larger values > > than you see with 16GB: > > > > sysctl vm.kmem_size_max > > vm.kmem_size_max: 4509713203 > > sysctl vm.kmem_size > > vm.kmem_size: 1335824384 > > Ok, but remember that ZFS in -CURRENT is very different from ZFS in -STABLE. > True, but the kmem_size stuff has nothing to do with ZFS. It's VM. --- Gary Jennejohn From jh at saunalahti.fi Wed Apr 22 14:40:05 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Wed Apr 22 14:40:25 2009 Subject: kern/132068: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Message-ID: <200904221440.n3MEe4ip001654@freefall.freebsd.org> The following reply was made to PR kern/132068; it has been noted by GNATS. From: Jaakko Heinonen To: Edward Fisk <7ogcg7g02@sneakemail.com> Cc: bug-followup@FreeBSD.org, Weldon Godfrey Subject: Re: kern/132068: [zfs] page fault when using ZFS over NFS on 7.1-RELEASE/amd64 Date: Wed, 22 Apr 2009 17:38:57 +0300 On 2009-04-10, Jaakko Heinonen wrote: > OK, I have now put together a patch which should avoid the original > panic you reported. Have you had a chance to test the patch? 
http://www.freebsd.org/cgi/query-pr.cgi?pr=132068 -- Jaakko From mcdouga9 at egr.msu.edu Wed Apr 22 14:40:54 2009 From: mcdouga9 at egr.msu.edu (Adam McDougall) Date: Wed Apr 22 14:41:01 2009 Subject: FreeBSD 7.2-RC1 - ZFS related kernel panic "kmem_map too small" In-Reply-To: <20090422123020.42b756c1@ernst.jennejohn.org> References: <49EE49D8.7000902@free.de> <20090422123020.42b756c1@ernst.jennejohn.org> Message-ID: <49EF27BB.1060100@egr.msu.edu> Gary Jennejohn wrote: > On Wed, 22 Apr 2009 00:34:00 +0200 > Kai Gallasch wrote: > > [snip a lot of stuff] > >> In /usr/src/UPDATING I read: >> >> [..] >> >> 20090207: >> ZFS users on amd64 machines with 4GB or more of RAM should >> reevaluate their need for setting vm.kmem_size_max and >> vm.kmem_size manually. In fact, after recent changes to the >> kernel, the default value of vm.kmem_size is larger than the >> suggested manual setting in most ZFS/FreeBSD tuning guides. >> >> So I understood this as "vm.kmem_size is set unnecessary large by >> default. You should think about decreasing it to save some RAM" >> >> On my amd64 server the default values of kmem_size are >> >> vm.kmem_size_scale: 3 >> vm.kmem_size_max: 3865468109 >> vm.kmem_size_min: 0 >> vm.kmem_size: 1201446912 >> >> Can someone give me a hint how to debug this problem further, or how to >> find some reasonable values for setting vm.kmem_size_max and >> vm.kmem_size with 16G of RAM? >> >> > > Hmm, I wonder whether this applies to 7.2-RC1. I don't know whether > the kernel changes have been committed to 7.2 or whether they were > already present when we started work on 7.2 because I haven't been > paying much attention. 
> > On my 8-current amd64 machine with only 4GB of RAM I see larger values > than you see with 16GB: > > sysctl vm.kmem_size_max > vm.kmem_size_max: 4509713203 > sysctl vm.kmem_size > vm.kmem_size: 1335824384 > > --- > Gary Jennejohn > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > It has been my experience that after the kmem maximums were raised to allow more than approx 1.6G kmem (a number of months ago), on some systems I still had to specifically raise the vm.kmem_size above the default otherwise I still got out of kmem panics far below the max. I suspect there was pressure for kmem and it was unable to "raise" the limit fast enough, or maybe a fragmentation problem? Additionally, depending on which host, I've found different limits to how high I can set the kmem settings on "recent" builds of 7 and 8 amd64, for example I have one 7.2 system with 4G ram and the kernel would panic if I booted with kmem=2G (1G works fine), but I have a 8.0 system with 2G ram and kmem=2G works fine. Another 8.0 system has 6G ram but we could only boot successfully with kmem=3G, not 4G. From scott at bqinternet.com Thu Apr 23 11:14:40 2009 From: scott at bqinternet.com (Scott Burns) Date: Thu Apr 23 11:14:47 2009 Subject: UFS2 metadata checksums Message-ID: <49F048FB.6000401@bqinternet.com> Hi guys, I have spent some time writing a kernel module which calculates a checksum of a UFS2 dinode structure and stores it in the reserved space of the inode when writing it to disk. It is then verified when the inode is read from disk. If the checksum verification fails, the read returns an error (currently EIO). I believe that protecting metadata integrity is important, especially as storage capacity grows. Bitrot is a fact of life, and bad things can happen if the kernel acts on a corrupted inode. 
Not only does this module improve the stability of a server, but it also helps to prevent additional damage to the filesystem that can be caused by metadata corruption. I'm aware that data integrity issues are addressed with ZFS, but unfortunately ZFS is still not yet suitable for many workloads. I'm also aware that integrity checking can be done by using GELI between the filesystem and the disk, but at a noticeable cost in performance and space utilization. The method this module uses is fast and does not use any additional space. Most importantly, it builds on mature code that has worked well for decades. Before I spend much more time on it, I have some questions: 1) Has anyone else done any work in this area? 2) Is there a demand for this in FreeBSD? -- Scott Burns System Administrator BQ Internet Corporation From ivoras at freebsd.org Thu Apr 23 12:20:04 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Apr 23 12:20:15 2009 Subject: UFS2 metadata checksums In-Reply-To: <49F048FB.6000401@bqinternet.com> References: <49F048FB.6000401@bqinternet.com> Message-ID: Scott Burns wrote: > 2) Is there a demand for this in FreeBSD? Speaking for myself, I'd like it on the systems I maintain. (I'd also like a sysctl to ignore the errors, just in case :) ). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090423/7a618ea3/signature.pgp From wgodfrey at ena.com Thu Apr 23 13:00:06 2009 From: wgodfrey at ena.com (Weldon Godfrey) Date: Thu Apr 23 13:00:12 2009 Subject: kern/132068: [zfs] page fault when using ZFS over NFS on7.1-RELEASE/amd64 Message-ID: <200904231300.n3ND05uq048445@freefall.freebsd.org> The following reply was made to PR kern/132068; it has been noted by GNATS. 
From: "Weldon Godfrey" To: "Jaakko Heinonen" , "Edward Fisk" <7ogcg7g02@sneakemail.com> Cc: Subject: RE: kern/132068: [zfs] page fault when using ZFS over NFS on7.1-RELEASE/amd64 Date: Thu, 23 Apr 2009 07:40:10 -0500 Sorry. Around Dec 12 I switched to head. By increasing kmem to 4GB and using NFS v2, that reduced the panics to a few times a month. The server is in production. I'll need to acquire some additional drives so I can install the OS on different drives (in case I need to back out) and wait until summer to attempt to upgrade. Weldon -----Original Message----- From: Jaakko Heinonen [mailto:jh@saunalahti.fi] Sent: Wednesday, April 22, 2009 9:39 AM To: Edward Fisk Cc: bug-followup@FreeBSD.org; Weldon Godfrey Subject: Re: kern/132068: [zfs] page fault when using ZFS over NFS on7.1-RELEASE/amd64 On 2009-04-10, Jaakko Heinonen wrote: > OK, I have now put together a patch which should avoid the original > panic you reported. Have you had a chance to test the patch? http://www.freebsd.org/cgi/query-pr.cgi?pr=132068 -- Jaakko From morganw at chemikals.org Thu Apr 23 22:19:14 2009 From: morganw at chemikals.org (Wes Morgan) Date: Thu Apr 23 22:19:21 2009 Subject: UFS2 metadata checksums In-Reply-To: References: <49F048FB.6000401@bqinternet.com> Message-ID: On Thu, 23 Apr 2009, Ivan Voras wrote: > Scott Burns wrote: > >> 2) Is there a demand for this in FreeBSD? > > Speaking for myself, I'd like it on the systems I maintain. (I'd also > like a sysctl to ignore the errors, just in case :) ). That's actually something ZFS could use if you ask me. In one instance I had some bad ram that was causing checksum errors (zfs is better than memtest for finding bad ram!), and I had to comment out the ECHKSUM error from the kernel to recover the pieces of the file that were reported corrupt.
From ivoras at freebsd.org Thu Apr 23 22:28:56 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Apr 23 22:29:02 2009 Subject: UFS2 metadata checksums In-Reply-To: References: <49F048FB.6000401@bqinternet.com> Message-ID: <9bbcef730904231528v6badb9d1u27d89fb0e1cb1cb9@mail.gmail.com> 2009/4/24 Wes Morgan : > On Thu, 23 Apr 2009, Ivan Voras wrote: > >> Scott Burns wrote: >> >>> 2) Is there a demand for this in FreeBSD? >> >> Speaking for myself, I'd like it on the systems I maintain. (I'd also >> like a sysctl to ignore the errors, just in case :) ). > > That's actually something ZFS could use if you ask me. In one instance I had > some bad ram that was causing checksum errors (zfs is better than memtest > for finding bad ram!), and I had to comment out the ECHKSUM error from the > kernel to recover the pieces of the file that were reported corrupt. Yes, this is my inspiration :) From kabaev at gmail.com Fri Apr 24 00:16:39 2009 From: kabaev at gmail.com (Alexander Kabaev) Date: Fri Apr 24 00:16:45 2009 Subject: UFS2 metadata checksums In-Reply-To: <49F048FB.6000401@bqinternet.com> References: <49F048FB.6000401@bqinternet.com> Message-ID: <20090423195335.521db0a7@kan.dnsalias.net> On Thu, 23 Apr 2009 06:54:51 -0400 Scott Burns wrote: > Hi guys, > > I have spent some time writing a kernel module which calculates a > checksum of a UFS2 dinode structure and stores it in the reserved > space of the inode when writing it to disk. It is then verified when > the inode is read from disk. If the checksum verification fails, the > read returns an error (currently EIO). > > I believe that protecting metadata integrity is important, especially > as storage capacity grows. Bitrot is a fact of life, and bad things > can happen if the kernel acts on a corrupted inode. Not only does > this module improve the stability of a server, but it also helps to > prevent additional damage to the filesystem that can be caused by > metadata corruption. 
> > I'm aware that data integrity issues are addressed with ZFS, but > unfortunately ZFS is still not yet suitable for many workloads. I'm > also aware that integrity checking can be done by using GELI between > the filesystem and the disk, but at a noticeable cost in performance > and space utilization. The method this module uses is fast and does > not use any additional space. Most importantly, it builds on mature > code that has worked well for decades. > > Before I spend much more time on it, I have some questions: > > 1) Has anyone else done any work in this area? > > 2) Is there a demand for this in FreeBSD? > This is actually something I would love to have in the base system, but inodes are not the only structures that need the integrity protection. Pretty much every other metadata block, from cylinder group blocks to indirect blocks for files need similar protection for this to be of real use. -- Alexander Kabaev -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 188 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20090424/8bce57e9/signature.pgp From andrew at modulus.org Fri Apr 24 00:23:13 2009 From: andrew at modulus.org (Andrew Snow) Date: Fri Apr 24 00:23:20 2009 Subject: UFS2 metadata checksums In-Reply-To: <20090423195335.521db0a7@kan.dnsalias.net> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> Message-ID: <49F10660.201@modulus.org> Ideally you would implement complete disk checksumming as a GEOM device. Then you could layer geom_mirror on top of it, so that if the checksum fails and returns EIO, geom_mirror can try the alternate device and rebuild the one with the bad checksums. That will then complete the feature set implemented by ZFS, but for any filesystem on top of GEOM. 
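The layering suggested here (a checksumming layer that returns EIO, with a mirror above it that falls back to the other half and rewrites the bad one) can be sketched in plain C. This is an in-memory illustration only, not GEOM code: `struct sector`, `toy_sum`, `checked_read` and `mirror_read` are hypothetical names, and the checksum is a toy chosen for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BYTES 16

/* Hypothetical in-memory "disk sector": payload plus its stored checksum. */
struct sector {
	uint8_t  data[SECTOR_BYTES];
	uint32_t sum;
};

/* Toy checksum for this sketch only; not a recommendation. */
static uint32_t
toy_sum(const uint8_t *buf, size_t len)
{
	uint32_t s = 0;
	size_t i;

	for (i = 0; i < len; i++)
		s = s * 31 + buf[i];
	return (s);
}

/* Checksumming layer: fail with EIO rather than return corrupt data. */
static int
checked_read(const struct sector *sec, uint8_t *out)
{
	if (toy_sum(sec->data, SECTOR_BYTES) != sec->sum)
		return (EIO);
	memcpy(out, sec->data, SECTOR_BYTES);
	return (0);
}

/* Mirror layer: use the first half that verifies, then repair the others. */
static int
mirror_read(struct sector *halves, size_t nhalves, uint8_t *out)
{
	size_t i, good = nhalves;

	assert(halves != NULL && out != NULL);
	for (i = 0; i < nhalves; i++) {
		if (checked_read(&halves[i], out) == 0) {
			good = i;
			break;
		}
	}
	if (good == nhalves)
		return (EIO);	/* no half passed verification */
	for (i = 0; i < nhalves; i++) {
		if (i == good)
			continue;
		/* Rewrite any half whose checksum does not match. */
		if (toy_sum(halves[i].data, SECTOR_BYTES) != halves[i].sum)
			halves[i] = halves[good];
	}
	return (0);
}
```

The caller only sees EIO when every half fails verification; a single damaged half is repaired as a side effect of the read.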
- Andrew From scott at bqinternet.com Fri Apr 24 06:45:32 2009 From: scott at bqinternet.com (Scott Burns) Date: Fri Apr 24 06:45:39 2009 Subject: UFS2 metadata checksums In-Reply-To: <20090423195335.521db0a7@kan.dnsalias.net> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> Message-ID: <49F16009.3080206@bqinternet.com> Alexander Kabaev wrote: > On Thu, 23 Apr 2009 06:54:51 -0400 > Scott Burns wrote: > >> Hi guys, >> >> I have spent some time writing a kernel module which calculates a >> checksum of a UFS2 dinode structure and stores it in the reserved >> space of the inode when writing it to disk. It is then verified when >> the inode is read from disk. If the checksum verification fails, the >> read returns an error (currently EIO). >> >> I believe that protecting metadata integrity is important, especially >> as storage capacity grows. Bitrot is a fact of life, and bad things >> can happen if the kernel acts on a corrupted inode. Not only does >> this module improve the stability of a server, but it also helps to >> prevent additional damage to the filesystem that can be caused by >> metadata corruption. >> >> I'm aware that data integrity issues are addressed with ZFS, but >> unfortunately ZFS is still not yet suitable for many workloads. I'm >> also aware that integrity checking can be done by using GELI between >> the filesystem and the disk, but at a noticeable cost in performance >> and space utilization. The method this module uses is fast and does >> not use any additional space. Most importantly, it builds on mature >> code that has worked well for decades. >> >> Before I spend much more time on it, I have some questions: >> >> 1) Has anyone else done any work in this area? >> >> 2) Is there a demand for this in FreeBSD? >> > > This is actually something I would love to have in the base system, > but inodes are not the only structures that need the integrity > protection. 
Pretty much every other metadata block, from cylinder group > blocks to indirect blocks for files need similar protection for > this to be of real use. > > -- > Alexander Kabaev As long as there is some interest in this kind of functionality, I will continue working on it. The next step is to protect metadata structures beyond inodes. I am hoping to have some results to post in the next few weeks. -- Scott Burns System Administrator BQ Internet Corporation From scott at bqinternet.com Fri Apr 24 06:52:01 2009 From: scott at bqinternet.com (Scott Burns) Date: Fri Apr 24 06:52:07 2009 Subject: UFS2 metadata checksums In-Reply-To: <49F10660.201@modulus.org> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F10660.201@modulus.org> Message-ID: <49F1618E.3080208@bqinternet.com> Andrew Snow wrote: > > Ideally you would implement complete disk checksumming as a GEOM device. > > Then you could layer geom_mirror on top of it, so that if the checksum > fails and returns EIO, geom_mirror can try the alternate device and > rebuild the one with the bad checksums. > > That will then complete the feature set implemented by ZFS, but for any > filesystem on top of GEOM. > > - Andrew > The geli(8) GEOM class is able to verify sectors (and I believe it returns EINVAL on ones that fail), but with a noticeable performance impact. I could certainly see the use for a GEOM class that just does simple checksumming. If gmirror can then be aware of it, that does provide functionality similar to a ZFS mirror. -- Scott Burns System Administrator BQ Internet Corporation From jh at saunalahti.fi Fri Apr 24 10:20:09 2009 From: jh at saunalahti.fi (Jaakko Heinonen) Date: Fri Apr 24 10:20:14 2009 Subject: kern/132068: [zfs] page fault when using ZFS over NFS on7.1-RELEASE/amd64 Message-ID: <200904241020.n3OAK8ma090160@freefall.freebsd.org> The following reply was made to PR kern/132068; it has been noted by GNATS. 
From: Jaakko Heinonen To: Weldon Godfrey Cc: Edward Fisk <7ogcg7g02@sneakemail.com>, bug-followup@FreeBSD.org Subject: Re: kern/132068: [zfs] page fault when using ZFS over NFS on7.1-RELEASE/amd64 Date: Fri, 24 Apr 2009 13:14:09 +0300 On 2009-04-23, Weldon Godfrey wrote: > Around Dec 12 I switched to head. ... > I'll need to acquire some additional drives so I can install the OS on > different drives (in case I need to backout) and wait until summer to > attempt to upgrade. FYI, the patch is against head. -- Jaakko From pjd at FreeBSD.org Fri Apr 24 10:51:55 2009 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Fri Apr 24 10:52:02 2009 Subject: UFS2 metadata checksums In-Reply-To: <49F1618E.3080208@bqinternet.com> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F10660.201@modulus.org> <49F1618E.3080208@bqinternet.com> Message-ID: <20090424103252.GC1494@garage.freebsd.pl> On Fri, Apr 24, 2009 at 02:51:58AM -0400, Scott Burns wrote: > > Andrew Snow wrote: > > > >Ideally you would implement complete disk checksumming as a GEOM device. > > > >Then you could layer geom_mirror on top of it, so that if the checksum > >fails and returns EIO, geom_mirror can try the alternate device and > >rebuild the one with the bad checksums. > > > >That will then complete the feature set implemented by ZFS, but for any > >filesystem on top of GEOM. > > > >- Andrew > > > > The geli(8) GEOM class is able to verify sectors (and I believe it > returns EINVAL on ones that fail), but with a noticeable performance > impact. I could certainly see the use for a GEOM class that just does > simple checksumming. If gmirror can then be aware of it, that does > provide functionality similar to a ZFS mirror. Geli uses strong cryptography for integrity verification, which is not needed in this case. The class that does that still needs to use the method I implemented in geli to provide atomicity. 
Gmirror is already "aware" of that - in case of an error on one half, it will use the other half. What gmirror doesn't do (and ZFS does) is self-healing. All in all, in my opinion GEOM class is much better for this - it will protect everything (metadata and data) and will be file system independent. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From gtodd at bellanet.org Fri Apr 24 13:57:12 2009 From: gtodd at bellanet.org (Graham Todd) Date: Fri Apr 24 13:57:18 2009 Subject: UFS2 metadata checksums In-Reply-To: <20090424103252.GC1494@garage.freebsd.pl> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F10660.201@modulus.org> <49F1618E.3080208@bqinternet.com> <20090424103252.GC1494@garage.freebsd.pl> Message-ID: <49F1BF60.10106@bellanet.org> Pawel Jakub Dawidek wrote: > On Fri, Apr 24, 2009 at 02:51:58AM -0400, Scott Burns wrote: >> Andrew Snow wrote: >>> Ideally you would implement complete disk checksumming as a GEOM device. >>> >>> Then you could layer geom_mirror on top of it, so that if the checksum >>> fails and returns EIO, geom_mirror can try the alternate device and >>> rebuild the one with the bad checksums. >>> >>> That will then complete the feature set implemented by ZFS, but for any >>> filesystem on top of GEOM. >>> >>> - Andrew >>> >> The geli(8) GEOM class is able to verify sectors (and I believe it >> returns EINVAL on ones that fail), but with a noticeable performance >> impact. I could certainly see the use for a GEOM class that just does >> simple checksumming. If gmirror can then be aware of it, that does >> provide functionality similar to a ZFS mirror.
> > Geli uses strong cryptography for integrity verification, which is not > needed in this case. The class that does that still needs to use > the method I implemented in geli to provide atomicity. > > Gmirror is already "aware" of that - in case of an error on one half, it > will use the other half. What gmirror doesn't do (and ZFS does) is > self-healing. > > All in all, in my opinion GEOM class is much better for this - it will > protect everything (metadata and data) and will be file system > independent. As a sysadmin one could imagine some useful monitoring, auditing, security, reporting and "ITIL compliant" scripts/utilities that could be built around a geom_checksum class and "/sbin/gchecksum status". From a performance perspective how "fine grained" could the checksum detail on a filesystem be for it to be practical to use in that way? Could such a geom class function like a builtin "tripwire" layer? From ivoras at freebsd.org Fri Apr 24 19:03:09 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Apr 24 19:03:15 2009 Subject: UFS2 metadata checksums In-Reply-To: <49F16009.3080206@bqinternet.com> References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F16009.3080206@bqinternet.com> Message-ID: Scott Burns wrote: > As long as there is some interest in this kind of functionality, I will > continue working on it. The next step is to protect metadata structures > beyond inodes. I am hoping to have some results to post in the next few > weeks. Btw. what checksum do you use? From james-freebsd-fs2 at jrv.org Sat Apr 25 07:45:29 2009 From: james-freebsd-fs2 at jrv.org (James R.
Van Artsdalen) Date: Sat Apr 25 07:45:35 2009 Subject: zfs recv core dump Message-ID: <49F2BF8B.3060603@jrv.org> zfs recv dumps core for me with this command: # zfs send -R -I @snap1 bigtex@snap2 | ssh back zfs recv -vFd bigtex The problem is in libzfs_sendrecv.c here: /* check for rename */ if ((stream_parent_fromsnap_guid != 0 && stream_parent_fromsnap_guid != parent_fromsnap_guid) || strcmp(strrchr(fsname, '/'), strrchr(stream_fsname, '/')) != 0) { fsname and stream_fsname are both "bigtex", no slash, so both strrchr calls return 0, and strcmp (0, 0) segfaults. Any ideas? Is anyone trying to use zfs send/recv to replicate pools? From scott at bqinternet.com Sun Apr 26 09:21:13 2009 From: scott at bqinternet.com (Scott Burns) Date: Sun Apr 26 09:21:19 2009 Subject: UFS2 metadata checksums In-Reply-To: References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F16009.3080206@bqinternet.com> Message-ID: <49F42786.6070008@bqinternet.com> Ivan Voras wrote: > Scott Burns wrote: > >> As long as there is some interest in this kind of functionality, I will >> continue working on it. The next step is to protect metadata structures >> beyond inodes. I am hoping to have some results to post in the next few >> weeks. > > Btw. what checksum do you use? I haven't settled on anything yet. Currently I'm just reading the dinode structure 32 bits at a time and doing a bitwise XOR. It's just a proof of concept and I am open to suggestions. -- Scott Burns System Administrator BQ Internet Corporation From rmacklem at uoguelph.ca Sun Apr 26 20:49:07 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Sun Apr 26 20:49:13 2009 Subject: Why do some ufs i-node fields have 2 copies? Message-ID: Hi, I was just wondering if anyone conversant with ufs/ffs could tell me why the following fields of the i-node have off-disk and on-disk copies? 
(One thought I had was that these fields are sometimes set to values that shouldn't get saved on-disk, but it was just a hunch.) /* * Copies from the on-disk dinode itself. */ u_int16_t i_mode; /* IFMT, permissions; see below. */ int16_t i_nlink; /* File link count. */ u_int64_t i_size; /* File byte count. */ u_int32_t i_flags; /* Status flags (chflags). */ int64_t i_gen; /* Generation number. */ u_int32_t i_uid; /* File owner. */ u_int32_t i_gid; /* File group. */ /* * The real copy of the on-disk inode. Thanks in advance for any info, rick From rick-freebsd2008 at kiwi-computer.com Sun Apr 26 21:30:24 2009 From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty) Date: Sun Apr 26 21:30:31 2009 Subject: Why do some ufs i-node fields have 2 copies? In-Reply-To: References: Message-ID: <20090426210343.GA51829@keira.kiwi-computer.com> On Sun, Apr 26, 2009 at 04:56:03PM -0400, Rick Macklem wrote: > > I was just wondering if anyone conversant with ufs/ffs could tell me why > the following fields of the i-node have off-disk and on-disk copies? > (One thought I had was that these fields are sometimes set to values > that shouldn't get saved on-disk, but it was just a hunch.) > /* > * Copies from the on-disk dinode itself. > */ > u_int16_t i_mode; /* IFMT, permissions; see below. */ > int16_t i_nlink; /* File link count. */ > u_int64_t i_size; /* File byte count. */ > u_int32_t i_flags; /* Status flags (chflags). */ > int64_t i_gen; /* Generation number. */ > u_int32_t i_uid; /* File owner. */ > u_int32_t i_gid; /* File group. */ > /* > * The real copy of the on-disk inode. You missed a few lines: union { struct ufs1_dinode *din1; /* UFS1 on-disk dinode. */ struct ufs2_dinode *din2; /* UFS2 on-disk dinode. */ } dinode_u; The reason is that the first set (the "copies") are what are referenced by the ffs/ufs code. The real copies are what came from the disk (hence the "on-disk"). 
Because UFS1 & UFS2 have such different dinode structures, the copies are moved to/from the dinode structures when the inode is read from or written to the disk. At all other times, the "copies" are used. -- Rick C. Petty From rmacklem at uoguelph.ca Sun Apr 26 22:19:20 2009 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Sun Apr 26 22:19:27 2009 Subject: Why do some ufs i-node fields have 2 copies? In-Reply-To: <20090426210343.GA51829@keira.kiwi-computer.com> References: <20090426210343.GA51829@keira.kiwi-computer.com> Message-ID: On Sun, 26 Apr 2009, Rick C. Petty wrote: > > The reason is that the first set (the "copies") are what are referenced by > the ffs/ufs code. The real copies are what came from the disk (hence the > "on-disk"). Because UFS1 & UFS2 have such different dinode structures, the > copies are moved to/from the dinode structures when the inode is read from or > written to the disk. At all other times, the "copies" are used. > Righto. However some fields like i_mtime are just accessed using the DIP() and DIP_SET() macros. Are you just saying that it was easier to not bother using the macros for frequently referenced fields? (Or, to put it another way, "when support for UFS2 was added, the code would have looked really grotty if DIP() and DIP_SET() were used for frequently used fields?".) The question came up because I had proposed a patch with: DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1); in it and it was suggested that i_modrev might be more appropriate with a "shadow copy". Since I realized I didn't know why there were "shadow copies" of some fields and not others, it led to the original question. (This is the only place i_modrev gets manipulated by the patch.) If my interpretation of your answer is correct, I'd say that a "shadow copy" doesn't really add anything to the code? What do others think?
Thanks for the reply, rick From bugmaster at FreeBSD.org Mon Apr 27 11:06:54 2009 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Apr 27 11:07:53 2009 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200904271106.n3RB6rPq002266@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [smbfs] [panic] panic: ffs_truncate: read-only filesys o kern/133373 fs [zfs] umass attachment causes ZFS checksum errors, dat o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int o kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/133134 fs [zfs] Missing ZFS zpool labels o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132597 fs [tmpfs] [panic] tmpfs-related panic while interrupting o kern/132551 fs [zfs] ZFS locks up on extattr_list_link syscall o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132337 fs [zfs] [panic] kernel panic in zfs_fuid_create_cred o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132145 fs [panic] File System Hard Crashes f kern/132068 fs [zfs] page fault when using ZFS over NFS on 7.1-RELEAS o kern/131995 fs [nfs] Failure to mount NFSv4 server o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/131086 fs [ext2fs] [patch] mkfs.ext2 creates rotten partition o kern/131084 fs [xfs] xfs destroys itself after 
copying data o kern/131081 fs [zfs] User cannot delete a file when a ZFS dataset is o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o bin/130105 fs [zfs] zfs send -R dumps core o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad f kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file f kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs] [panic] changing into .zfs dir from nfs client c f kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for 
relative p
o bin/113049   fs  [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs  [smbfs] [patch] smbfs and caching problems (resolves b
o kern/94769   fs  [ufs] Multiple file deletions on multi-snapshotted fil
o kern/93942   fs  [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs  [ffs] [hang] Filling a filesystem while creating a sna
o kern/89991   fs  [ufs] softupdates with mount -ur causes fs UNREFS
o kern/68978   fs  [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs  [nwfs] Mounted Netware filesystem behaves strange
o kern/51685   fs  [hang] Unbounded inode allocation causes kernel to loc

59 problems total.

From ivoras at freebsd.org Mon Apr 27 11:24:00 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Mon Apr 27 11:24:07 2009
Subject: UFS2 metadata checksums
In-Reply-To: <49F42786.6070008@bqinternet.com>
References: <49F048FB.6000401@bqinternet.com> <20090423195335.521db0a7@kan.dnsalias.net> <49F16009.3080206@bqinternet.com> <49F42786.6070008@bqinternet.com>
Message-ID:

Scott Burns wrote:
> Ivan Voras wrote:
>> Scott Burns wrote:
>>
>>> As long as there is some interest in this kind of functionality, I will
>>> continue working on it. The next step is to protect metadata structures
>>> beyond inodes. I am hoping to have some results to post in the next few
>>> weeks.
>>
>> Btw. what checksum do you use?
>
> I haven't settled on anything yet. Currently I'm just reading the
> dinode structure 32 bits at a time and doing a bitwise XOR. It's just a
> proof of concept and I am open to suggestions.

For 32 bits of hash, adler32 or crc32 should be ok - the code is already
used in the system for various purposes. Adler32 is faster; for a sample
implementation see sys/net/zlib.c at line 5357. ZFS defaults to
"fletcher2" (I guess 16-bit Fletcher?) which should be faster but its
implementation could be a bit problematic (see
http://opensolaris.org/jive/thread.jspa?threadID=69655&tstart=0).
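
[Illustration, not part of the original message] To make the difference
between the two approaches above concrete, here is a minimal userland
sketch in C. It assumes nothing about the actual patch: xor32(),
adler32_simple() and swap_demo() are hypothetical names, the buffer is a
made-up stand-in for a dinode, and adler32_simple() is an unoptimized
version of the algorithm as described in RFC 1950, not the code in
sys/net/zlib.c. It shows one known weakness of a plain XOR fold: swapping
two 32-bit words leaves the result unchanged, while a position-sensitive
checksum such as Adler-32 changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR fold over 32-bit words, in the spirit of the proof of concept. */
uint32_t
xor32(const uint32_t *words, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum ^= words[i];
	return (sum);
}

/* Unoptimized Adler-32 as described in RFC 1950 (illustration only). */
uint32_t
adler32_simple(const unsigned char *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521;	/* running byte sum */
		b = (b + a) % 65521;		/* position-weighted sum */
	}
	return ((b << 16) | a);
}

/*
 * Returns 1 when the XOR fold misses a swap of two words that Adler-32
 * detects. The word values are arbitrary, not a real dinode layout.
 */
int
swap_demo(void)
{
	uint32_t before[4] = { 0x000081a4, 0x00000001, 0x000003e8, 0x00001000 };
	uint32_t after[4]  = { 0x000081a4, 0x000003e8, 0x00000001, 0x00001000 };
	int xor_same, adler_differs;

	xor_same = xor32(before, 4) == xor32(after, 4);
	adler_differs =
	    adler32_simple((const unsigned char *)before, sizeof(before)) !=
	    adler32_simple((const unsigned char *)after, sizeof(after));
	return (xor_same && adler_differs);
}
```

Calling swap_demo() from a small test program is enough to see the
effect. Whether this class of error matters for on-disk inodes is a
separate question that the sketch does not try to answer.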
From rick-freebsd2008 at kiwi-computer.com Mon Apr 27 13:52:02 2009
From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty)
Date: Mon Apr 27 13:52:08 2009
Subject: Why do some ufs i-node fields have 2 copies?
In-Reply-To:
References: <20090426210343.GA51829@keira.kiwi-computer.com>
Message-ID: <20090427135200.GA59201@keira.kiwi-computer.com>

On Sun, Apr 26, 2009 at 06:26:14PM -0400, Rick Macklem wrote:
> >
> >The reason is that the first set (the "copies") are what are referenced by
> >the ffs/ufs code. The real copies are what came from the disk (hence the
> >"on-disk"). Because UFS1 & UFS2 have such different dinode structures, the
> >copies are moved to/from the dinode structures when the inode is read to or
> >written from the disk. At all other times, the "copies" are used.
> >
> Righto. However some fields like i_mtime are just accessed using the
> DIP() and DIP_SET() macros.
>
> Are you just saying that it was easier to not bother using the macros for
> frequently referenced fields? (Or, to put it another way, "when support
> for UFS2 was added, the code would have looked really grotty if DIP() and
> DIP_SET() were used for frequently used fields?".)

No, it is silly to chase up the link to the superblock just to check for
the UFS version for every single access to these items. Instead, all
accesses should be made to the copies *except* the ones which assemble the
dinode and store (or load) to disk. In those cases, you already have to
access other items in the superblock of the cylinder group.

> The question came up because I had proposed a patch with:
>
>     DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
>
> in it and it was suggested that i_modrev might be more appropriate with
> a "shadow copy". Since I realized I didn't know why there were "shadow
> copies" of some fields and not others, it lead to the original question.
> (This is the only place i_modrev gets manipulated by the patch.)

Yes, it would be.

> If my interpretation of your answer is correct, I'd say that a "shadow
> copy" doesn't really add anything to the code? What do others think?

It adds plenty-- you don't have to dereference more than the inode pointer.
If you don't use the shadows, you have to dereference the superblock (i_fs)
to get the UFS version as well as dereference the dinode itself. It
doesn't make sense to change all the macros to not use the shadows.

--
Rick C. Petty

From rmacklem at uoguelph.ca Mon Apr 27 14:51:46 2009
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Mon Apr 27 14:51:52 2009
Subject: Why do some ufs i-node fields have 2 copies?
In-Reply-To: <20090427135200.GA59201@keira.kiwi-computer.com>
References: <20090426210343.GA51829@keira.kiwi-computer.com> <20090427135200.GA59201@keira.kiwi-computer.com>
Message-ID:

On Mon, 27 Apr 2009, Rick C. Petty wrote:
[good stuff snipped]
>> If my interpretation of your answer is correct, I'd say that a "shadow
>> copy" doesn't really add anything to the code? What do others think?
>
> It adds plenty-- you don't have to dereference more than the inode pointer.
> If you don't use the shadows, you have to dereference the superblock (i_fs)
> to get the UFS version as well as dereference the dinode itself. It
> doesn't make sense to change all the macros to not use the shadows.
>
Oops, poorly expressed. What I meant to say was "doesn't really add
anything to the line of code I listed in the original message":

    DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);

since that is the only time i_modrev is manipulated.
(I didn't mean "doesn't add anything to the code, in general".)

If I were to add a "shadow copy" of i_modrev, the line would become:

    ip->i_modrev++;
    DIP_SET(ip, i_modrev, ip->i_modrev);

and then

    ip->i_modrev = ip->i_din1->di_modrev;

and

    ip->i_modrev = ip->i_din2->di_modrev;

would have to be added to ffs_subr.c and an extra field added to the
i-node structure for the "shadow copy". I don't think this would be an
improvement compared to the one line.

Does it make sense this time?

Thanks, rick

From rick-freebsd2008 at kiwi-computer.com Mon Apr 27 15:03:37 2009
From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty)
Date: Mon Apr 27 15:03:44 2009
Subject: Why do some ufs i-node fields have 2 copies?
In-Reply-To:
References: <20090426210343.GA51829@keira.kiwi-computer.com> <20090427135200.GA59201@keira.kiwi-computer.com>
Message-ID: <20090427150334.GA59796@keira.kiwi-computer.com>

On Mon, Apr 27, 2009 at 10:58:43AM -0400, Rick Macklem wrote:
> >
> Oops, poorly expressed. What I meant to say was "doesn't really add
> anything to the line of code I listed in the original message":
>
>     DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
>
> since that is the only time i_modrev is manipulated. (I didn't mean "doesn't
> add anything to the code, in general".)
> Does it make sense this time?

Yes, that makes total sense.

--
Rick C. Petty
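
[Illustration, not part of the original thread] The trade-off discussed
above can be modeled in a few lines of userland C. This is only a sketch
under stated assumptions: every toy_* and TOY_* name is hypothetical, the
structures are cut down to one or two fields, and the macros only imitate
the general shape of DIP()/DIP_SET() (pick the UFS1 or UFS2 dinode at run
time); they are not copied from the kernel headers. The first function is
the one-line variant, the other two show what a shadow copy would add.

```c
#include <assert.h>
#include <stdint.h>

enum toy_fstype { TOY_UFS1 = 1, TOY_UFS2 = 2 };

/* Stand-in for the per-mount data that records the UFS version. */
struct toy_mount { enum toy_fstype um_fstype; };

/* Cut-down on-disk inodes; the two layouts deliberately differ. */
struct toy_dinode1 { uint32_t di_size; uint64_t di_modrev; };
struct toy_dinode2 { uint64_t di_size; uint64_t di_modrev; };

struct toy_inode {
	struct toy_mount *i_ump;
	union {
		struct toy_dinode1 *din1;
		struct toy_dinode2 *din2;
	} dinode_u;
	uint64_t i_modrev;	/* hypothetical shadow copy */
};
#define	i_din1	dinode_u.din1
#define	i_din2	dinode_u.din2

/* Every access checks the version, then dereferences the dinode. */
#define	TOY_DIP(ip, field)						\
	(((ip)->i_ump->um_fstype == TOY_UFS1) ?				\
	(ip)->i_din1->d##field : (ip)->i_din2->d##field)
#define	TOY_DIP_SET(ip, field, val) do {				\
	if ((ip)->i_ump->um_fstype == TOY_UFS1)				\
		(ip)->i_din1->d##field = (val);				\
	else								\
		(ip)->i_din2->d##field = (val);				\
} while (0)

/* Variant without a shadow copy: the single line from the patch. */
void
toy_bump_modrev_dip(struct toy_inode *ip)
{
	TOY_DIP_SET(ip, i_modrev, TOY_DIP(ip, i_modrev) + 1);
}

/* With a shadow copy it must first be loaded when the inode is read. */
void
toy_load_shadow(struct toy_inode *ip)
{
	if (ip->i_ump->um_fstype == TOY_UFS1)
		ip->i_modrev = ip->i_din1->di_modrev;
	else
		ip->i_modrev = ip->i_din2->di_modrev;
}

/* Shadow variant: update the in-memory copy, then write it through. */
void
toy_bump_modrev_shadow(struct toy_inode *ip)
{
	ip->i_modrev++;
	TOY_DIP_SET(ip, i_modrev, ip->i_modrev);
}
```

Reads of the shadow field need only the inode pointer, which is the
benefit described above for frequently used fields. For a field touched
in one place, the extra load step and the extra structure member are the
cost that the one-line variant avoids.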