From kometen at gmail.com Fri Aug 1 12:19:39 2008
From: kometen at gmail.com (Claus Guttesen)
Date: Fri Aug 1 12:19:49 2008
Subject: ZFS patches.
In-Reply-To:
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID:

>> The patch above contains the most recent ZFS version that could be found
>> in OpenSolaris as of today. Apart from a large amount of new functionality,
>> I believe there are many stability (and also performance) improvements
>> compared to the version from the base system.
>>
>> Please test, test, test. If I get enough positive feedback, I may be
>> able to squeeze it into 7.1-RELEASE, but this might be hard.
>>
>
> I applied your patch to a current as of July 31st. I had to
> remove /usr/src and perform a clean csup and remove the two empty
> files as mentioned in this thread.
>
> I have an areca arc-1680 sas-card and an external sas-cabinet with 16
> sas-drives each 1 TB (931 binary GB). They have been set up in three
> raidz-partitions with five disks each in one zpool and one spare.
>
> There does seem to be a speed-improvement. I nfs-mounted a partition
> from solaris 9 on sparc and am copying approx. 400 GB using rsync. I
> saw write of 429 MB/s. The spikes occurred every 10 secs. to begin
> with. After some minutes I get writes almost every sec. (watching
> zpool iostat 1). The limit is clearly the network-connection between
> the two hosts. I'll do some internal copying later.
>
> It's too early to say whether zfs is stable (enough) although I
> haven't been able to make it halt unless I removed a disk. This was
> with version 6. I'll remove a disk tomorrow and see how it goes.

Replying to my own mail! :-)

My conclusion about its stability was a bit hasty. I was copying
approx. 400 GB from a nfs-share mounted from a solaris 9 on sparc
using tcp and read- and write-size of 32768. The files are images
slightly less than 1 MB and a thumbnail (approx. 983000 files).
During creation of my pool I saw these error-messages:

WARNING pid 1065 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1067 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1069 (zfs): ioctl sign-extension ioctl ffffffffcc285a15
WARNING pid 1070 (zfs): ioctl sign-extension ioctl ffffffffcc285a15
WARNING pid 1076 (zfs): ioctl sign-extension ioctl ffffffffcc285a19
WARNING pid 1077 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1079 (zfs): ioctl sign-extension ioctl ffffffffcc285a15

Twice during the copy (rsync) access to the pool stopped. I took a
copy of top during the first and second incident:

last pid: 4287; load averages: 0.00, 0.17, 0.48 up 0+03:02:30 00:27:42
33 processes: 1 running, 32 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 43M Active, 6350M Inact, 1190M Wired, 220M Cache, 682M Buf, 130M Free
Swap: 8192M Total, 16K Used, 8192M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 4237 www        1  58    0 23056K 17328K tx->tx 2   0:07  0.00% rsync
 4159 root       1  44    0 14336K  3476K pause  0   0:00  0.00% zsh
 3681 claus      1  44    0 14480K  3524K pause  1   0:00  0.00% zsh
 4154 claus      1  44    0 36580K  3768K select 1   0:00  0.00% sshd
 4273 claus      1  44    0 14480K  3552K pause  3   0:00  0.00% zsh
 4125 www        1  44    0 14600K  3584K ttyin  1   0:00  0.00% zsh
 4120 root       1  44    0 12992K  3088K pause  0   0:00  0.00% zsh
 4156 claus      1  46    0 13140K  3196K pause  2   0:00  0.00% zsh
 4284 root       1  44    0 12992K  3264K pause  1   0:00  0.00% zsh
 3679 claus      1  44    0 36580K  3612K select 2   0:00  0.00% sshd
 1016 root       1  44    0  6768K  1168K nanslp 0   0:00  0.00% cron
 3676 root       1  46    0 36580K  3624K sbwait 0   0:00  0.00% sshd
 4150 root       1  45    0 36580K  3780K sbwait 2   0:00  0.00% sshd
  793 root       1  44    0  5712K  1164K select 1   0:00  0.00% syslogd
 4268 root       1  45    0 36580K  3896K sbwait 1   0:00  0.00% sshd
 4271 claus      1  44    0 36580K  3892K select 2   0:00  0.00% sshd
 4287 root       1  44    0  8140K  1896K CPU0   0   0:00  0.00% top
 4123 root       1  45    0 20460K  1412K wait   0   0:00  0.00% su
 1007 root       1  44    0 24652K  2788K select 1   0:00  0.00% sshd

last pid: 2812; load averages: 0.01, 0.53, 0.87 up 0+01:01:45 10:03:55
34 processes: 1 running, 33 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 150M Active, 166M Inact, 1469M Wired, 40K Cache, 680M Buf, 6147M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 2787 www        1  44    0   117M    99M select 3   0:05  0.00% rsync
 2785 www        1  65    0   117M   100M zio->i 1   0:05  0.00% rsync
 1326 root       1  44    0 14500K  2300K nanslp 3   0:02  0.00% zpool
 1195 claus      1  44    0  8140K  2704K CPU0   0   0:01  0.00% top
 1224 www        1  65    0 14804K  4576K pause  1   0:00  0.00% zsh
 2786 www        1  44    0 98832K 87432K select 0   0:00  0.00% rsync
 1203 claus      1  44    0 36580K  5320K select 0   0:00  0.00% sshd
 1155 claus      1  44    0 14608K  4408K pause  1   0:00  0.00% zsh
 1177 claus      1  44    0 36580K  5320K select 1   0:00  0.00% sshd
 1208 root       1  44    0 15392K  4292K pause  2   0:00  0.00% zsh
 1153 claus      1  44    0 36580K  5320K select 3   0:00  0.00% sshd
 2708 claus      1  44    0 13140K  3976K ttyin  3   0:00  0.00% zsh
 1219 root       1  44    0 12992K  3892K pause  0   0:00  0.00% zsh
 1179 claus      1  44    0 13140K  3976K pause  2   0:00  0.00% zsh
 1205 claus      1  47    0 13140K  3976K pause  1   0:00  0.00% zsh
 1146 root       1  45    0 36580K  5284K sbwait 1   0:00  0.00% sshd
 2703 root       1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
 1171 root       1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
 1200 root       1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
  795 root       1  44    0  5712K  1412K select 0   0:00  0.00% syslogd
 1018 root       1  44    0  6768K  1484K nanslp 2   0:00  0.00% cron
 2706 claus      1  44    0 36580K  5320K select 1   0:00  0.00% sshd
 1222 root       1  45    0 20460K  1840K wait   1   0:00  0.00% su

When copying was completed I then copied the same data to a different
zfs-partition. It stopped once and I saw the following in dmesg:

Aug 1 09:22:02 malene root: ZFS: checksum mismatch, zpool=ef1 path=/dev/da4 offset=294400 size=512

The zpool was defined with three raidz-partitions with five disks each
and one spare.
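
The pool layout described above (three raidz groups of five "1 TB" disks plus one spare) can be turned into a rough capacity estimate. The following is only a back-of-the-envelope sketch, not something from the original mail: it assumes each disk holds exactly 10^12 bytes, that single-parity raidz costs about one disk per group, and it ignores all ZFS metadata overhead. The helper name `raidz1_data_disks` is made up for illustration.

```python
# Rough capacity arithmetic for the pool described in the mail.
# Assumptions (not from the mail): a "1 TB" disk is exactly 10**12 bytes,
# and raidz1 loses about one disk per group to parity.
DISK_BYTES = 10**12   # decimal terabyte, as marketed
GIB = 2**30           # binary gigabyte


def raidz1_data_disks(disks_per_group: int) -> int:
    """Hypothetical helper: disks left for data in one raidz1 group."""
    return disks_per_group - 1


disk_gib = DISK_BYTES / GIB                  # size of one disk in binary GB
groups, disks_per_group = 3, 5               # layout taken from the mail
data_disks = groups * raidz1_data_disks(disks_per_group)
usable_gib = data_disks * disk_gib           # upper bound, no metadata overhead

print(f"one disk   : {disk_gib:.1f} binary GB")
print(f"data disks : {data_disks}")
print(f"usable     : {usable_gib:.0f} binary GB (about {usable_gib / 1024:.1f} TiB)")
```

The per-disk figure agrees with the "931 binary GB" quoted in the mail; the pool total is an upper bound, and the space that zpool or zfs actually reports will be somewhat lower.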
I need to get some storage available very soon so I re-installed the
server with solaris express b79. Zpool-information (from solaris):

zpool status
  pool: ef1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ef1         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t0d1  ONLINE       0     0     0
            c3t0d2  ONLINE       0     0     0
            c3t0d3  ONLINE       0     0     0
            c3t0d4  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d5  ONLINE       0     0     0
            c3t0d6  ONLINE       0     0     0
            c3t0d7  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t1d1  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t1d2  ONLINE       0     0     0
            c3t1d3  ONLINE       0     0     0
            c3t1d4  ONLINE       0     0     0
            c3t1d5  ONLINE       0     0     0
            c3t1d6  ONLINE       0     0     0
        spares
          c3t1d7    AVAIL

errors: No known data errors

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

From kometen at gmail.com Fri Aug 1 13:38:52 2008
From: kometen at gmail.com (Claus Guttesen)
Date: Fri Aug 1 13:38:57 2008
Subject: ZFS patches.
In-Reply-To:
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID:

>>> The patch above contains the most recent ZFS version that could be found
>>> in OpenSolaris as of today. Apart from a large amount of new functionality,
>>> I believe there are many stability (and also performance) improvements
>>> compared to the version from the base system.
>>>
>>> Please test, test, test. If I get enough positive feedback, I may be
>>> able to squeeze it into 7.1-RELEASE, but this might be hard.
>>>
>>
>> I applied your patch to a current as of July 31st. I had to
>> remove /usr/src and perform a clean csup and remove the two empty
>> files as mentioned in this thread.
>>
>> I have an areca arc-1680 sas-card and an external sas-cabinet with 16
>> sas-drives each 1 TB (931 binary GB). They have been set up in three
>> raidz-partitions with five disks each in one zpool and one spare.
>>
>> There does seem to be a speed-improvement. I nfs-mounted a partition
>> from solaris 9 on sparc and am copying approx. 400 GB using rsync. I
>> saw write of 429 MB/s. The spikes occurred every 10 secs. to begin
>> with. After some minutes I get writes almost every sec. (watching
>> zpool iostat 1). The limit is clearly the network-connection between
>> the two hosts. I'll do some internal copying later.
>>
>> It's too early to say whether zfs is stable (enough) although I
>> haven't been able to make it halt unless I removed a disk. This was
>> with version 6. I'll remove a disk tomorrow and see how it goes.
>
> Replying to my own mail! :-)
>
> My conclusion about its stability was a bit hasty. I was copying
> approx. 400 GB from a nfs-share mounted from a solaris 9 on sparc
> using tcp and read- and write-size of 32768. The files are images
> slightly less than 1 MB and a thumbnail (approx. 983000 files).

Replying once more to my own mail. There seems to be a hardware-related
problem with my setup. I'm getting some 'arcmsr0: scsi id=1 lun=4
ccb='0xffffff02d5cc8e00' outstanding command timeout' (in solaris).
I'll check with my vendor. I did not see such errors in FreeBSD.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

From grafan at gmail.com Sun Aug 3 15:11:07 2008
From: grafan at gmail.com (Rong-en Fan)
Date: Sun Aug 3 15:11:19 2008
Subject: journaling filesystem
Message-ID: <6eb82e0808030743hc41c68bgd0c5121daba95d42@mail.gmail.com>

In NetBSD, they now have metadata journaling support, see

http://www.netbsd.org/changes/#wapbl

I'm not a fs guru, I just want to know what the status is of BluFFS
and UFS journaling support, which were mentioned in recent years.

Thanks,
Rong-En Fan

From hartzell at alerce.com Sun Aug 3 20:59:24 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sun Aug 3 20:59:37 2008
Subject: ZFS patches.
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <18582.7211.285531.190516@almost.alerce.com>

Pawel Jakub Dawidek writes:
 > Hi.
 >
 > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
 >
 > The patch above contains the most recent ZFS version that could be found
 > in OpenSolaris as of today. Apart from a large amount of new functionality,
 > I believe there are many stability (and also performance) improvements
 > compared to the version from the base system.
 >
 > Check out OpenSolaris website to find out the differences between base
 > system version and patch version.
 >
 > Please test, test, test. If I get enough positive feedback, I may be
 > able to squeeze it into 7.1-RELEASE, but this might be hard.
 > [...]

Thanks Pawel!

I have a version of -CURRENT from 7-29-08 running on an AMD 3800+ with
2GB of RAM and two Seagate ST3300622AS disks. This machine was
previously running zfs filesystems using -STABLE. It uses a root on
zfs configuration based on http://yds.coolrat.org/zfsboot.shtml

I tested for stability by firing up a bunch of xterms on my desktop
workstation, ssh'ing into the system and running a bunch of stuff in
parallel. The list of stuff settled down to be: two while(1) loops
that did "dd if=/dev/random of=FILENAME bs=1M count=1000" (w/ distinct
filenames) then removed the file; rsyncing a large directory of mp3's
from another local machine; make -j 4 buildworld; and several windows
with top, systat -vmstat 1, etc....

With the -STABLE zfs I ended up with the following settings to achieve
stability:

  vm.kmem_size="512M"
  vm.kmem_size_max="512M"
  vfs.zfs.arc_max="192M"

I started off with no tuning, added the kmem_size{,_max} settings,
then added the arc_max setting and tuned it down until it didn't
wedge.

With the experimental code on -CURRENT I needed the following
settings:

  vm.kmem_size="512M"
  vm.kmem_size_max="512M"
  vfs.zfs.arc_max="128M"

I don't know if I just got lucky with the larger arc_max under -STABLE
or if something's changed.

Otherwise the new code seems to be behaving nicely.

g.
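
To compare the two sets of tunables quoted above, the sketch below parses them and reports how large the ARC cap is relative to the kernel memory limit. It is a minimal illustration rather than a tuning tool: the helpers `parse_size`, `parse_tunables` and `arc_share` are hypothetical names, the parser only understands the plain K/M/G suffixes used in the mail (an assumption about the format, not a description of how the loader parses it), and the ratio is just a convenient way to see the change, not a recommended value.

```python
import re

# Binary multipliers for the suffixes that appear in the mail (assumed meaning).
UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(value):
    """Hypothetical helper: turn a string such as '512M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def parse_tunables(text):
    """Parse name="value" lines into a dict of byte counts."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = parse_size(value.strip().strip('"'))
    return tunables


def arc_share(tunables):
    """Fraction of vm.kmem_size that vfs.zfs.arc_max is allowed to use."""
    return tunables["vfs.zfs.arc_max"] / tunables["vm.kmem_size"]


# The two configurations exactly as quoted in the mail.
STABLE = '''
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="192M"
'''
EXPERIMENTAL = '''
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="128M"
'''

for label, text in (("-STABLE", STABLE), ("patched -CURRENT", EXPERIMENTAL)):
    share = arc_share(parse_tunables(text))
    print(f"{label}: arc_max is {share:.1%} of kmem_size")
```

With the values from the mail, the ARC cap drops from three eighths to one quarter of kmem_size on the patched system.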
From bugmaster at FreeBSD.org Mon Aug 4 11:06:55 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Aug 4 11:07:37 2008
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
Message-ID: <200808041106.m74B6sEY082049@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t

7 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/124621  fs         [ext3] Cannot mount ext2fs partition
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li

8 problems total.

From dfr at rabson.org Mon Aug 4 13:25:23 2008
From: dfr at rabson.org (Doug Rabson)
Date: Mon Aug 4 13:25:30 2008
Subject: Which GSSAPI library does FreeBSD use?
In-Reply-To:
References: <86myk06e18.fsf@ds4.des.no>
Message-ID: <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>

On 29 Jul 2008, at 15:27, Rick Macklem wrote:

>
>
> On Tue, 29 Jul 2008, Dag-Erling Smørgrav wrote:
>
>> Rick Macklem writes:
>>> Hope this isn't too simplistic for this list, but I need to know which
>>> GSSAPI library sources are being used. They don't appear to be either
>>> vanilla MIT or Heimdal.
>>
>> Homegrown (by Doug Rabson, dfr@) with portions borrowed from Heimdal.
>>
> Ok, thanks. I was able to work around my problem by statically linking
> my gssd against libraries built from vanilla Heimdal sources. It looks
> like it inherited the heimdal-0.6 bug, which ignores the lack of the
> GSS_C_SEQUENCE_FLAG and checks it even if it wasn't specified. This
> breaks the client side of RPCSEC_GSS, since somewhat out-of-order
> Sun RPCs are normal. (RPCSEC_GSS uses a window of recent seq#s to
> protect against replay attempts.)
>
> Should I email Doug or submit a bug report, to see if someone is willing
> to work on fixing this?

Try using current - I updated heimdal to 1.1 in current. The GSS-API
implementation in 7.x and current is a plugin system which heimdal's
krb5 code plugs into as a GSS-API mechanism provider. With heimdal
1.1, it also supports spnego and ntlm as plugins.

From olli at lurza.secnetix.de Mon Aug 4 15:44:47 2008
From: olli at lurza.secnetix.de (Oliver Fromme)
Date: Mon Aug 4 15:44:53 2008
Subject: Should we change dirent for 64 bit directory cookies ?
In-Reply-To: <809288.56058.qm@web32703.mail.mud.yahoo.com>
Message-ID: <200808041518.m74FID4b080414@lurza.secnetix.de>

Pedro Giffuni wrote:
 > I've been sort of following the DragonFly list wrt the changes Matt
 > made for his HAMMER fs.
 > I don't know if anyone is considering a port: he added a lot of stuff
 > to the base system that will be a pain to port, but he also triggered
 > some bugs in the old BSD code that would be nice to fix on FreeBSD too.
 >
 > One of the not-*too*-tough things to consider adopting would be 64 bit
 > directory cookies:
 >
 > Main commit:
 > http://leaf.dragonflybsd.org/mailarchive/commits/2007-11/msg00151.html
 > Follow up for the linuxulator:
 > http://leaf.dragonflybsd.org/mailarchive/commits/2007-11/msg00153.html
 >
 > Here is an excerpt of a discussion from the DragonFly Kernel ML
 > (Re: [Tux3] Comparison to Hammer fs design), that pretty much sums up
 > the issues:
 > [...]
 >
 > I'd recommend dropping support for NFSv2. It is not really worth
 > supporting any more. Does it even support 64 bit inodes? (I don't
 > remember), or 64 bit file offsets? NFSv2 is garbage.

One of the problems with that is that our PXE boot loader
requires NFSv2 support, AFAIK. If you drop NFSv2 server
support from the kernel, you can't boot PXE clients from
it anymore, unless someone adds NFSv3 support to the boot
loader.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"[...] one observation we can make here is that Python makes
an excellent pseudocoding language, with the wonderful attribute
that it can actually be executed." -- Bruce Eckel

From pfgshield-freebsd at yahoo.com Mon Aug 4 16:34:33 2008
From: pfgshield-freebsd at yahoo.com (Pedro Giffuni)
Date: Mon Aug 4 17:17:28 2008
Subject: Should we change dirent for 64 bit directory cookies ?
In-Reply-To: <200808041518.m74FID4b080414@lurza.secnetix.de>
Message-ID: <370602.32048.qm@web32704.mail.mud.yahoo.com>

--- On Mon 4/8/08, Oliver Fromme wrote:

...
One of the problems with that is that our PXE boot loader
requires NFSv2 support, AFAIK.
If you drop NFSv2 server
support from the kernel, you can't boot PXE clients from
it anymore, unless someone adds NFSv3 support to the boot
loader.

Best regards
Oliver
____

I see, thanks for the explanation... yes, it sounds like this will have
to wait a while then. I am looking at the Intel PXE spec and there is
no mention of NFS. I will investigate a bit more but I guess there is
a new wish for the PXE guys ;-).

cheers,

Pedro.

From rmacklem at uoguelph.ca Mon Aug 4 20:53:46 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Mon Aug 4 20:53:56 2008
Subject: doing vfs_hash_get when vnode locked
Message-ID:

There's a place in my nfsv4 client where I need to vfs_hash_get() when
another blocked thread may be holding a lock on the vnode. (It's during
a recovery case where the other threads are blocked, so there isn't a
race problem, as far as I understand it.)

For FreeBSD7, all I did was call vfs_hash_get() with flags == 0 and it
gave me what I wanted (the vnode for the file handle with a reference
count, but no lock).

For FreeBSD-CURRENT/8, this no longer works, because vfs_hash_get() calls
vget(), which calls _vn_lock() and _vn_lock() now complains if the lock
type field of "flags" is 0. I came up with a really ugly workaround, by
setting the flags arg. to LK_EXCLOTHER for vfs_hash_get() and then
providing my own VOP_LOCK1() which just VI_UNLOCK()s and returns 0 for
this case (doing the same as vop_stdlock() for other cases). Yuck!!

Is it possible to re-enable the case of _vn_lock() getting a locktype
field == 0 (or defining one that says "just return 0 unless VI_DOOMED"),
so I don't need the dirty hack?

Have a good week, rick

From kostikbel at gmail.com Tue Aug 5 08:32:35 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Tue Aug 5 08:32:42 2008
Subject: doing vfs_hash_get when vnode locked
In-Reply-To:
References:
Message-ID: <20080805083229.GB97161@deviant.kiev.zoral.com.ua>

On Mon, Aug 04, 2008 at 05:04:58PM -0400, Rick Macklem wrote:
> There's a place in my nfsv4 client where I need to vfs_hash_get() when
> another blocked thread may be holding a lock on the vnode. (It's during
> a recovery case where the other threads are blocked, so there isn't a
> race problem, as far as I understand it.)
>
> For FreeBSD7, all I did was call vfs_hash_get() with flags == 0 and it
> gave me what I wanted (the vnode for the file handle with a reference
> count, but no lock).
>
> For FreeBSD-CURRENT/8, this no longer works, because vfs_hash_get() calls
> vget(), which calls _vn_lock() and _vn_lock() now complains if the lock
> type field of "flags" is 0. I came up with a really ugly workaround, by
> setting the flags arg. to LK_EXCLOTHER for vfs_hash_get() and then
> providing my own VOP_LOCK1() which just VI_UNLOCK()s and returns 0 for
> this case (doing the same as vop_stdlock() for other cases). Yuck!!
>
> Is it possible to re-enable the case of _vn_lock() getting a locktype
> field == 0 (or defining one that says "just return 0 unless VI_DOOMED"),
> so I don't need the dirty hack?

I do not quite understand what you really need there. The non-locked
vnode may be reclaimed at any moment, so the check for !VI_DOOMED
returns no useful information.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080805/2e7236f7/attachment.pgp

From olli at lurza.secnetix.de Tue Aug 5 14:30:50 2008
From: olli at lurza.secnetix.de (Oliver Fromme)
Date: Tue Aug 5 14:31:02 2008
Subject: Should we change dirent for 64 bit directory cookies ?
In-Reply-To: <370602.32048.qm@web32704.mail.mud.yahoo.com>
Message-ID: <200808051430.m75EUQkK037117@lurza.secnetix.de>

Pedro Giffuni wrote:
 > Oliver Fromme wrote:
 > > One of the problems with that is that our PXE boot loader
 > > requires NFSv2 support, AFAIK. If you drop NFSv2 server
 > > support from the kernel, you can't boot PXE clients from
 > > it anymore, unless someone adds NFSv3 support to the boot
 > > loader.
 >
 > I see, thanks for the explanation... yes, it sounds like this will
 > have to wait a while then. I am looking at the Intel PXE spec and
 > there is no mention of NFS. I will investigate a bit more but I
 > guess there is a new wish for the PXE guys ;-).

The PXE spec is rather low-level, i.e. basically it allows
for handling raw packets only. It does not implement a
real TCP/IP stack. So if you want to use any higher-level
protocols, you have to implement them yourself on top of
the PXE interface.

Note that our bootloader uses libstand for the actual
protocol implementations that are needed (BOOTP, NFS, RPC,
TFTP). You'll find the sources in /usr/src/lib/libstand.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"If Java had true garbage collection, most programs would
delete themselves upon execution."
-- Robert Sewell

From rmacklem at uoguelph.ca Tue Aug 5 14:57:46 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Tue Aug 5 14:58:03 2008
Subject: doing vfs_hash_get when vnode locked
In-Reply-To: <20080805083229.GB97161@deviant.kiev.zoral.com.ua>
References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua>
Message-ID:

On Tue, 5 Aug 2008, Kostik Belousov wrote:

> On Mon, Aug 04, 2008 at 05:04:58PM -0400, Rick Macklem wrote:
>> There's a place in my nfsv4 client where I need to vfs_hash_get() when
>> another blocked thread may be holding a lock on the vnode. (It's during
>> a recovery case where the other threads are blocked, so there isn't a
>> race problem, as far as I understand it.)
>>
>> For FreeBSD7, all I did was call vfs_hash_get() with flags == 0 and it
>> gave me what I wanted (the vnode for the file handle with a reference
>> count, but no lock).
>>
>> For FreeBSD-CURRENT/8, this no longer works, because vfs_hash_get() calls
>> vget(), which calls _vn_lock() and _vn_lock() now complains if the lock
>> type field of "flags" is 0. I came up with a really ugly workaround, by
>> setting the flags arg. to LK_EXCLOTHER for vfs_hash_get() and then
>> providing my own VOP_LOCK1() which just VI_UNLOCK()s and returns 0 for
>> this case (doing the same as vop_stdlock() for other cases). Yuck!!
>>
>> Is it possible to re-enable the case of _vn_lock() getting a locktype
>> field == 0 (or defining one that says "just return 0 unless VI_DOOMED"),
>> so I don't need the dirty hack?
>
> I do not quite understand what you really need there. The non-locked
> vnode may be reclaimed at any moment, so the check for !VI_DOOMED
> returns no useful information.
>
I need a referenced vnode (v_usecount incremented, which I thought would
avoid it being recycled) when another blocked thread in the kernel has
the vnode locked. (This thread is doing recovery while the other blocked
thread is waiting for that recovery to complete.)

For FreeBSD7, I call vfs_hash_get(mntp, hash, 0, td, &nvp, mycmpfh, nfhp)
for this case. It:
- calls vget() with (0 | LK_INTERLOCK) for flags
- it calls vn_lock() with the flags as above
- _vn_lock() simply checks for VI_DOOMED and then, otherwise,
  returns 0 because (flags & LK_TYPE_MASK) == 0
- vget then calls v_upgrade_usecount(), incrementing the use count
  (so it won't be recycled, as I understand it)
--> and I get the vnode with an incremented usecount that I need. When
    this thread is done with it, it simply vrele(vp)'s it.

For FreeBSD-CURRENT, this panics because there is now a check at the
beginning of _vn_lock() requiring a non-zero LK_TYPE_MASK field.
(I worked around it by using LK_EXCLOTHER instead of 0 as the argument to
vfs_hash_get() and then implemented my own nfs_lock1() VOP, that handles
the bogus case of the LK_TYPE_MASK being set to LK_EXCLOTHER by just
returning 0 after VI_UNLOCK(vp) if LK_INTERLOCK is set. For all other
cases, it calls _lockmgr_args() just like vop_stdlock().)

Did this help or just muddy the waters? rick

From brooks at freebsd.org Tue Aug 5 15:32:29 2008
From: brooks at freebsd.org (Brooks Davis)
Date: Tue Aug 5 15:32:35 2008
Subject: Should we change dirent for 64 bit directory cookies ?
In-Reply-To: <200808051430.m75EUQkK037117@lurza.secnetix.de>
References: <370602.32048.qm@web32704.mail.mud.yahoo.com> <200808051430.m75EUQkK037117@lurza.secnetix.de>
Message-ID: <20080805153305.GB55735@lor.one-eyed-alien.net>

On Tue, Aug 05, 2008 at 04:30:26PM +0200, Oliver Fromme wrote:
> Pedro Giffuni wrote:
> > Oliver Fromme wrote:
> > > One of the problems with that is that our PXE boot loader
> > > requires NFSv2 support, AFAIK. If you drop NFSv2 server
> > > support from the kernel, you can't boot PXE clients from
> > > it anymore, unless someone adds NFSv3 support to the boot
> > > loader.
> >
> > I see, thanks for the explanation... yes, it sounds like this will
> > have to wait a while then. I am looking at the Intel PXE spec and
> > there is no mention of NFS. I will investigate a bit more but I
> > guess there is a new wish for the PXE guys ;-).
>
> The PXE spec is rather low-level, i.e. basically it allows
> for handling raw packets only. It does not implement a
> real TCP/IP stack. So if you want to use any higher-level
> protocols, you have to implement them yourself on top of
> the PXE interface.
>
> Note that our bootloader uses libstand for the actual
> protocol implementations that are needed (BOOTP, NFS, RPC,
> TFTP). You'll find the sources in /usr/src/lib/libstand.

A summer of code project last year implemented TCP and HTTP support.

-- Brooks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080805/5fb4f3de/attachment.pgp

From kostikbel at gmail.com Tue Aug 5 15:32:35 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Tue Aug 5 15:32:42 2008
Subject: doing vfs_hash_get when vnode locked
In-Reply-To:
References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua>
Message-ID: <20080805153221.GG97161@deviant.kiev.zoral.com.ua>

On Tue, Aug 05, 2008 at 11:09:00AM -0400, Rick Macklem wrote:
>
>
> On Tue, 5 Aug 2008, Kostik Belousov wrote:
>
> >On Mon, Aug 04, 2008 at 05:04:58PM -0400, Rick Macklem wrote:
> >>There's a place in my nfsv4 client where I need to vfs_hash_get() when
> >>another blocked thread may be holding a lock on the vnode. (It's during
> >>a recovery case where the other threads are blocked, so there isn't a
> >>race problem, as far as I understand it.)
> >>
> >>For FreeBSD7, all I did was call vfs_hash_get() with flags == 0 and it
> >>gave me what I wanted (the vnode for the file handle with a reference
> >>count, but no lock).
> >>
> >>For FreeBSD-CURRENT/8, this no longer works, because vfs_hash_get() calls
> >>vget(), which calls _vn_lock() and _vn_lock() now complains if the lock
> >>type field of "flags" is 0. I came up with a really ugly workaround, by
> >>setting the flags arg. to LK_EXCLOTHER for vfs_hash_get() and then
> >>providing my own VOP_LOCK1() which just VI_UNLOCK()s and returns 0 for
> >>this case (doing the same as vop_stdlock() for other cases). Yuck!!
> >>
> >>Is it possible to re-enable the case of _vn_lock() getting a locktype
> >>field == 0 (or defining one that says "just return 0 unless VI_DOOMED"),
> >>so I don't need the dirty hack?
> >
> >I do not quite understand what you really need there. The non-locked
> >vnode may be reclaimed at any moment, so the check for !VI_DOOMED
> >returns no useful information.
> >
> I need a referenced vnode (v_usecount incremented, which I thought would
> avoid it being recycled) when another blocked thread in the kernel has
No, this is a wrong assumption. Use count does not prevent the vnode
from being reclaimed.

Unless you hold the vnode lock, it may be reclaimed. To set the
VI_DOOMED flag, both exclusive vnode lock and vnode interlock must be
held.

> the vnode locked. (This thread is doing recovery while the other blocked
> thread is waiting for that recovery to complete.)
If you can guarantee that the other thread does not relinquish the vnode
lock while curthread operates on the vnode, you may use vref() and
direct check on VI_DOOMED. I shall admit that this is quite perverse
and fragile.

>
> For FreeBSD7, I call vfs_hash_get(mntp, hash, 0, td, &nvp, mycmpfh, nfhp)
> for this case. It:
> - calls vget() with (0 | LK_INTERLOCK) for flags
> - it calls vn_lock() with the flags as above
> - _vn_lock() simply checks for VI_DOOMED and then, otherwise,
>   returns 0 because (flags & LK_TYPE_MASK) == 0
> - vget then calls v_upgrade_usecount(), incrementing the use count
>   (so it won't be recycled, as I understand it)
> --> and I get the vnode with an incremented usecount that I need. When
>     this thread is done with it, it simply vrele(vp)'s it.
>
> For FreeBSD-CURRENT, this panics because there is now a check at the
> beginning of _vn_lock() requiring a non-zero LK_TYPE_MASK field.
> (I worked around it by using LK_EXCLOTHER instead of 0 as the argument to
> vfs_hash_get() and then implemented my own nfs_lock1() VOP, that handles
> the bogus case of the LK_TYPE_MASK being set to LK_EXCLOTHER by just
> returning 0 after VI_UNLOCK(vp) if LK_INTERLOCK is set. For all other
> cases, it calls _lockmgr_args() just like vop_stdlock().)
>
> Did this help or just muddy the waters? rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080805/fafbc09e/attachment.pgp

From rmacklem at uoguelph.ca Tue Aug 5 16:43:11 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Tue Aug 5 16:43:18 2008
Subject: doing vfs_hash_get when vnode locked
In-Reply-To: <20080805153221.GG97161@deviant.kiev.zoral.com.ua>
References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua> <20080805153221.GG97161@deviant.kiev.zoral.com.ua>
Message-ID:

On Tue, 5 Aug 2008, Kostik Belousov wrote:

[stuff snipped]
>>>
>> I need a referenced vnode (v_usecount incremented, which I thought would
>> avoid it being recycled) when another blocked thread in the kernel has
> No, this is a wrong assumption. Use count does not prevent the vnode
> from being reclaimed.
> What does v_usecount mean then, if it doesn't say "I have it in use, so you can't recycle it until I vrele() it"? I suppose I can test for the lock and grab it, if no other thread already has it locked. > Unless you held the vnode lock, it may be reclaimed. To set the > VI_DOOMED flag, both exclusive vnode lock and vnode interlock must be > held. > I don't care about VI_DOOMED nor want to set it. It is just what vget() checked for the case of LK_TYPE_MASK == 0 under FreeBSD7. > If you can guarantee that the other thread does not relinquish the vnode > lock while curthread operates on the vnode, you may use vref() and > direct check on VI_DOOMED. I shall admit that this is quite perversive > and fragile. > I'll have to think about it but, yes, I think I can guarantee that if another thread holds the vnode lock then it is blocked waiting for this thread to complete recovery. (The only other way to do this recovery is without the vnode and that means I have to do a lot of coding. I'm pretty sure holding a v_usecount works for OpenBSD and Mac OS X. I've done quite a bit of testing on both and not had a problem.) rick From nork at FreeBSD.org Tue Aug 5 15:46:05 2008 From: nork at FreeBSD.org (Norikatsu Shigemura) Date: Tue Aug 5 16:47:49 2008 Subject: ZFS patches. In-Reply-To: <20080731013229.9d342ee5.nork@FreeBSD.org> References: <20080727125413.GG1345@garage.freebsd.pl> <488F0C71.9010902@moneybookers.com> <20080729125551.GA70379@eos.sc1.parodius.com> <1217338852.10413.1.camel@dingo-laptop> <488F2078.708@psg.com> <1217347882.10413.5.camel@dingo-laptop> <20080729211137.GA52154@nobby.studby.ntnu.no> <20080731013229.9d342ee5.nork@FreeBSD.org> Message-ID: <20080806004557.6e538e5c.nork@FreeBSD.org> On Thu, 31 Jul 2008 01:32:29 +0900 Norikatsu Shigemura wrote: > > However, this feature is a bit undocumented yet, and it didn't work correctly > > for me. But you can always test it out. > I'm using zfsboot on my note PC, and not using UFS. I know many > problems about it:-). 
> 1. zpool configuration is too limited, only single and mirror > usable. If you want to zfsboot, you can't use RAIDZ, striping > and cache(zpool add ... cache ...):-(. I missed. zfsboot is disregarded zpool cache rather than supports it. > SEE ALSO: > http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004895.html > http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/125878 I found some zfsboot issues, please apply following patches: 1. zfsboot2 (boot2) doesn't %d (printf), so change %d to %u. 2. chase new zpool versioning as SPA_VERSION. Obtained from: sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --- sys/boot/zfs/zfsimpl.c.orig 2008-07-28 01:54:49.194419000 +0900 +++ sys/boot/zfs/zfsimpl.c 2008-08-05 23:48:12.035247220 +0900 @@ -656,8 +656,8 @@ return (EIO); } - if (val != ZFS_VERSION) { - printf("ZFS: unsupported ZFS version %d\n", (int) val); + if (val > SPA_VERSION) { + printf("ZFS: unsupported ZFS version %u (should be %u)\n", (int) val, (int) SPA_VERSION); return (EIO); } --- sys/cddl/boot/zfs/zfsimpl.h.orig 2008-07-28 01:54:49.296418000 +0900 +++ sys/cddl/boot/zfs/zfsimpl.h 2008-08-06 00:07:41.871760182 +0900 @@ -448,19 +448,24 @@ /* * On-disk version number. */ -#define ZFS_VERSION_1 1ULL -#define ZFS_VERSION_2 2ULL -#define ZFS_VERSION_3 3ULL -#define ZFS_VERSION_4 4ULL -#define ZFS_VERSION_5 5ULL -#define ZFS_VERSION_6 6ULL +#define SPA_VERSION_1 1ULL +#define SPA_VERSION_2 2ULL +#define SPA_VERSION_3 3ULL +#define SPA_VERSION_4 4ULL +#define SPA_VERSION_5 5ULL +#define SPA_VERSION_6 6ULL +#define SPA_VERSION_7 7ULL +#define SPA_VERSION_8 8ULL +#define SPA_VERSION_9 9ULL +#define SPA_VERSION_10 10ULL +#define SPA_VERSION_11 11ULL /* * When bumping up ZFS_VERSION, make sure GRUB ZFS understand the on-disk * format change. Go to usr/src/grub/grub-0.95/stage2/{zfs-include/, fsys_zfs*}, * and do the appropriate changes. 
*/ -#define ZFS_VERSION ZFS_VERSION_6 -#define ZFS_VERSION_STRING "6" +#define SPA_VERSION SPA_VERSION_11 +#define SPA_VERSION_STRING "11" /* * Symbolic names for the changes that caused a ZFS_VERSION switch. @@ -473,16 +478,26 @@ * last synced uberblock. Checking the in-flight version can * be dangerous in some cases. */ -#define ZFS_VERSION_INITIAL ZFS_VERSION_1 -#define ZFS_VERSION_DITTO_BLOCKS ZFS_VERSION_2 -#define ZFS_VERSION_SPARES ZFS_VERSION_3 -#define ZFS_VERSION_RAID6 ZFS_VERSION_3 -#define ZFS_VERSION_BPLIST_ACCOUNT ZFS_VERSION_3 -#define ZFS_VERSION_RAIDZ_DEFLATE ZFS_VERSION_3 -#define ZFS_VERSION_DNODE_BYTES ZFS_VERSION_3 -#define ZFS_VERSION_ZPOOL_HISTORY ZFS_VERSION_4 -#define ZFS_VERSION_GZIP_COMPRESSION ZFS_VERSION_5 -#define ZFS_VERSION_BOOTFS ZFS_VERSION_6 +#define SPA_VERSION_INITIAL SPA_VERSION_1 +#define SPA_VERSION_DITTO_BLOCKS SPA_VERSION_2 +#define SPA_VERSION_SPARES SPA_VERSION_3 +#define SPA_VERSION_RAID6 SPA_VERSION_3 +#define SPA_VERSION_BPLIST_ACCOUNT SPA_VERSION_3 +#define SPA_VERSION_RAIDZ_DEFLATE SPA_VERSION_3 +#define SPA_VERSION_DNODE_BYTES SPA_VERSION_3 +#define SPA_VERSION_ZPOOL_HISTORY SPA_VERSION_4 +#define SPA_VERSION_GZIP_COMPRESSION SPA_VERSION_5 +#define SPA_VERSION_BOOTFS SPA_VERSION_6 +#define SPA_VERSION_SLOGS SPA_VERSION_7 +#define SPA_VERSION_DELEGATED_PERMS SPA_VERSION_8 +#define SPA_VERSION_FUID SPA_VERSION_9 +#define SPA_VERSION_REFRESERVATION SPA_VERSION_9 +#define SPA_VERSION_REFQUOTA SPA_VERSION_9 +#define SPA_VERSION_UNIQUE_ACCURATE SPA_VERSION_9 +#define SPA_VERSION_L2CACHE SPA_VERSION_10 +#define SPA_VERSION_NEXT_CLONES SPA_VERSION_11 +#define SPA_VERSION_ORIGIN SPA_VERSION_11 +#define SPA_VERSION_DSL_SCRUB SPA_VERSION_11 /* * The following are configuration names used in the nvlist describing a pool's --- sys/cddl/boot/zfs/zfssubr.c.orig 2008-07-28 01:54:49.297420000 +0900 +++ sys/cddl/boot/zfs/zfssubr.c 2008-08-06 00:19:29.665026084 +0900 @@ -162,7 +162,7 @@ /* ASSERT((uint_t)cpfunc < 
ZIO_COMPRESS_FUNCTIONS); */ if (!ci->ci_decompress) { - printf("ZFS: unsupported compression algorithm %d\n", cpfunc); + printf("ZFS: unsupported compression algorithm %u\n", cpfunc); return (EIO); } - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - From kostikbel at gmail.com Tue Aug 5 16:51:20 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Aug 5 16:51:27 2008 Subject: doing vfs_hash_get when vnode locked In-Reply-To: References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua> <20080805153221.GG97161@deviant.kiev.zoral.com.ua> Message-ID: <20080805165114.GH97161@deviant.kiev.zoral.com.ua> On Tue, Aug 05, 2008 at 12:54:26PM -0400, Rick Macklem wrote: > > > On Tue, 5 Aug 2008, Kostik Belousov wrote: > > [stuff snipped] > >>> > >>I need a referenced vnode (v_usecount incremented, which I thought would > >>avoid it being recycled) when another blocked thread in the kernel has > >No, this is a wrong assumption. Use count does not prevent the vnode > >from being reclaimed. > > > What does v_usecount mean then, if it doesn't say "I have it in use, so > you can't recycle it until I vrele() it"? It means that the vnode memory will not be freed until vrele(). But the VOP_RECLAIM may be called any time, and it requires exclusive lock. After vnode is reclaimed, it is reassigned to the deadfs. In particular, VOP_RECLAIM implementation must clear v_data. For the reclaimed vnode you still hold a reference to, you can reliably obtain the vnode lock. > > I suppose I can test for the lock and grab it, if no other thread already > has it locked. > > >Unless you held the vnode lock, it may be reclaimed. To set the > >VI_DOOMED flag, both exclusive vnode lock and vnode interlock must be > >held. > > > I don't care about VI_DOOMED nor want to set it. It is just what vget() > checked for the case of LK_TYPE_MASK == 0 under FreeBSD7. 
> > >If you can guarantee that the other thread does not relinquish the vnode > >lock while curthread operates on the vnode, you may use vref() and > >direct check on VI_DOOMED. I shall admit that this is quite perversive > >and fragile. > > > I'll have to think about it but, yes, I think I can guarantee that if > another thread holds the vnode lock then it is blocked waiting for this > thread to complete recovery. (The only other way to do this recovery is > without the vnode and that means I have to do a lot of coding. I'm > pretty sure holding a v_usecount works for OpenBSD and Mac OS X. I've > done quite a bit of testing on both and not had a problem.) I do not know about these systems, esp. whether and how they implement a forced unmount. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080805/be6ae6e5/attachment.pgp From rmacklem at uoguelph.ca Tue Aug 5 17:40:26 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Aug 5 17:40:32 2008 Subject: doing vfs_hash_get when vnode locked In-Reply-To: <20080805165114.GH97161@deviant.kiev.zoral.com.ua> References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua> <20080805153221.GG97161@deviant.kiev.zoral.com.ua> <20080805165114.GH97161@deviant.kiev.zoral.com.ua> Message-ID: On Tue, 5 Aug 2008, Kostik Belousov wrote: [stuff snipped] >> What does v_usecount mean then, if it doesn't say "I have it in use, so >> you can't recycle it until I vrele() it"? > It means that the vnode memory will not be freed until vrele(). > > But the VOP_RECLAIM may be called any time, and it requires exclusive lock. > After vnode is reclaimed, it is reassigned to the deadfs. In particular, > VOP_RECLAIM implementation must clear v_data. > > For the reclaimed vnode you still hold a reference to, you can reliably > obtain the vnode lock. 
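The distinction quoted above (a use count keeps the vnode's memory alive, but does not stop the vnode from being reclaimed) can be illustrated with a small userland model. This is only a sketch under stated assumptions: `toy_vnode`, `toy_vget()`, `toy_vref()`, `toy_reclaim()` and `toy_vrele()` are hypothetical names invented for this illustration, not FreeBSD kernel interfaces, and no locking is modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland model; none of this is FreeBSD kernel code. */
struct toy_vnode {
	int	 usecount;	/* stands in for v_usecount */
	bool	 doomed;	/* stands in for the VI_DOOMED flag */
	void	*data;		/* stands in for v_data (per-filesystem state) */
};

/* Allocate a toy vnode holding one reference and some private data. */
static struct toy_vnode *
toy_vget(void)
{
	struct toy_vnode *vp = calloc(1, sizeof(*vp));

	if (vp == NULL)
		return (NULL);
	vp->data = malloc(16);
	if (vp->data == NULL) {
		free(vp);
		return (NULL);
	}
	vp->usecount = 1;
	return (vp);
}

/* Take an additional reference. */
static void
toy_vref(struct toy_vnode *vp)
{
	vp->usecount++;
}

/*
 * Model of a reclaim: the per-filesystem data goes away and the vnode is
 * marked doomed, regardless of how many references are still held.
 */
static void
toy_reclaim(struct toy_vnode *vp)
{
	free(vp->data);
	vp->data = NULL;
	vp->doomed = true;
}

/*
 * Drop one reference. The vnode memory itself is released only when the
 * last reference is dropped. Returns true if the memory was freed.
 */
static bool
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount > 0)
		return (false);
	free(vp->data);		/* free(NULL) is a no-op */
	free(vp);
	return (true);
}
```

In this model a caller that holds a reference can still safely look at the structure after toy_reclaim(), but it finds doomed set and data cleared, which is why a bare reference is not enough to rely on the per-filesystem state.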
> [stuff snipped] > I do not know about these systems, esp. whether and how they implement > a forced unmount. > Ok, I just spent a few minutes snooping around in vfs_subr.c and I think I see the problem. vget() has called vholdl() and then v_upgrade_usecount(), which has incremented the usecount and taken the vnode off the free list. This appears to prevent vgonel() from being called on it for most cases, but there is still the case in vflush() where the FORCECLOSE flag is set. But, it seems that it is my nfs_unmount() that calls this, so I can just delay the FORCECLOSE for this weird case. In fact, it looks like vgonel() would call VOP_CLOSE() because v_usecount is still non-zero (active) and that would block during the recovery in my code, anyhow. Thanks for clarifying it, rick From kostikbel at gmail.com Tue Aug 5 19:43:47 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Aug 5 19:43:54 2008 Subject: doing vfs_hash_get when vnode locked In-Reply-To: References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua> <20080805153221.GG97161@deviant.kiev.zoral.com.ua> <20080805165114.GH97161@deviant.kiev.zoral.com.ua> Message-ID: <20080805194341.GI97161@deviant.kiev.zoral.com.ua> On Tue, Aug 05, 2008 at 01:51:40PM -0400, Rick Macklem wrote: > > > On Tue, 5 Aug 2008, Kostik Belousov wrote: > > [stuff snipped] > >>What does v_usecount mean then, if it doesn't say "I have it in use, so > >>you can't recycle it until I vrele() it"? > >It means that the vnode memory will not be freed until vrele(). > > > >But the VOP_RECLAIM may be called any time, and it requires exclusive lock. > >After vnode is reclaimed, it is reassigned to the deadfs. In particular, > >VOP_RECLAIM implementation must clear v_data. > > > >For the reclaimed vnode you still hold a reference to, you can reliably > >obtain the vnode lock. > > > [stuff snipped] > >I do not know about these systems, esp. whether and how they implement > >a forced unmount. 
> > > Ok, I just spent a few minutes snooping around in vfs_subr.c and I think > I see the problem. vget() has called vholdl() and then > v_upgrade_usecount(), which has incremented the usecount and taken the > vnode off the free list. This appears to prevent vgonel() from being > called on it for most cases, but there is still the case in vflush() > where the FORCECLOSE flag is set. Yes, exactly. > > But, it seems that it is my nfs_unmount() that calls this, so I can just > delay the FORCECLOSE for this weird case. > > In fact, it looks like vgonel() would call VOP_CLOSE() because v_usecount > is still non-zero (active) and that would block during the recovery in my > code, anyhow. But, what guarantees that the vnode would not be reclaimed before/under your vref() it ? For instance, what if the vnode is locked due to reclaim being in progress ? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080805/49857b15/attachment.pgp From rmacklem at uoguelph.ca Tue Aug 5 20:47:14 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Tue Aug 5 20:47:21 2008 Subject: doing vfs_hash_get when vnode locked In-Reply-To: <20080805194341.GI97161@deviant.kiev.zoral.com.ua> References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua> <20080805153221.GG97161@deviant.kiev.zoral.com.ua> <20080805165114.GH97161@deviant.kiev.zoral.com.ua> <20080805194341.GI97161@deviant.kiev.zoral.com.ua> Message-ID: On Tue, 5 Aug 2008, Kostik Belousov wrote: [stuff snipped] >> Ok, I just spent a few minutes snooping around in vfs_subr.c and I think >> I see the problem. vget() has called vholdl() and then >> v_upgrade_usecount(), which has incremented the usecount and taken the >> vnode off the free list. 
This appears to prevent vgonel() from being
>> called on it for most cases, but there is still the case in vflush()
>> where the FORCECLOSE flag is set.
> Yes, exactly.
>
[more stuff snipped]
>
> But, what guarantees that the vnode would not be reclaimed before or while
> you vref() it? For instance, what if the vnode is locked due to reclaim
> being in progress?
>
So long as I never do a vflush() with FORCECLOSE, I can't see anywhere
that will vgonel() it once I have gotten it via vget() (v_usecount
incremented and not on the vnode freelist).

The way I just coded it is:
- the function that does the vfs_hash_get() without LK_EXCLUSIVE just
  fails if MNTK_UNMOUNTF is set.
- my nfs_close just returns when MNTK_UNMOUNTF is set.
- my nfs_unmount() doesn't set FORCECLOSE on the vflush() but instead
  sleeps and retries a bunch of times if vflush() fails for MNT_FORCE.
- my nfs_unmount() and other code (mostly based on the vanilla FreeBSD
  client) make requests all fail with EINTR when MNTK_UNMOUNTF is set.

I think this should work for a forced unmount, since once requests all
fail and the recovery also fails, I think vflush() will work without the
FORCECLOSE flag.

As far as I can see, since I'm not vflush()'ng with FORCECLOSE, then
nothing will vgonel() the vnode until it has been vrele()'d. (If there is
a case other than vflush() with FORCECLOSE that will vgone() it when it is
not on the freelist and has a v_usecount > 0, then I'll need to handle
that as well, but I can't see one.)

rick

From jhb at freebsd.org Tue Aug 5 18:58:45 2008
From: jhb at freebsd.org (John Baldwin)
Date: Tue Aug 5 20:49:28 2008
Subject: ZFS patches.
In-Reply-To: <20080806004557.6e538e5c.nork@FreeBSD.org> References: <20080727125413.GG1345@garage.freebsd.pl> <20080731013229.9d342ee5.nork@FreeBSD.org> <20080806004557.6e538e5c.nork@FreeBSD.org> Message-ID: <200808051328.18308.jhb@freebsd.org> On Tuesday 05 August 2008 11:45:57 am Norikatsu Shigemura wrote: > On Thu, 31 Jul 2008 01:32:29 +0900 > Norikatsu Shigemura wrote: > > > However, this feature is a bit undocumented yet, and it didn't work correctly > > > for me. But you can always test it out. > > I'm using zfsboot on my note PC, and not using UFS. I know many > > problems about it:-). > > 1. zpool configuration is too limited, only single and mirror > > usable. If you want to zfsboot, you can't use RAIDZ, striping > > and cache(zpool add ... cache ...):-(. > > I missed. zfsboot is disregarded zpool cache rather than supports it. > > > SEE ALSO: > > http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004895.html > > http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/125878 > > I found some zfsboot issues, please apply following patches: > 1. zfsboot2 (boot2) doesn't %d (printf), so change %d to %u. > 2. chase new zpool versioning as SPA_VERSION. > Obtained from: sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > --- sys/boot/zfs/zfsimpl.c.orig 2008-07-28 01:54:49.194419000 +0900 > +++ sys/boot/zfs/zfsimpl.c 2008-08-05 23:48:12.035247220 +0900 > @@ -656,8 +656,8 @@ > return (EIO); > } > > - if (val != ZFS_VERSION) { > - printf("ZFS: unsupported ZFS version %d\n", (int) val); > + if (val > SPA_VERSION) { > + printf("ZFS: unsupported ZFS version %u (should be %u)\n", (int) val, (int) SPA_VERSION); > return (EIO); > } > > --- sys/cddl/boot/zfs/zfsimpl.h.orig 2008-07-28 01:54:49.296418000 +0900 > +++ sys/cddl/boot/zfs/zfsimpl.h 2008-08-06 00:07:41.871760182 +0900 > @@ -448,19 +448,24 @@ > /* > * On-disk version number. 
>   */
> -#define ZFS_VERSION_1 1ULL
> -#define ZFS_VERSION_2 2ULL
> -#define ZFS_VERSION_3 3ULL
> -#define ZFS_VERSION_4 4ULL
> -#define ZFS_VERSION_5 5ULL
> -#define ZFS_VERSION_6 6ULL
> +#define SPA_VERSION_1 1ULL
> +#define SPA_VERSION_2 2ULL
> +#define SPA_VERSION_3 3ULL
> +#define SPA_VERSION_4 4ULL
> +#define SPA_VERSION_5 5ULL
> +#define SPA_VERSION_6 6ULL

FYI, style(9) prefers '#define<tab>' to '#define<space>'. Keeping with the
existing style would likely shorten the diffs.

--
John Baldwin

From morganw at chemikals.org Wed Aug 6 00:03:55 2008
From: morganw at chemikals.org (Wes Morgan)
Date: Wed Aug 6 00:04:01 2008
Subject: ZFS Advice
Message-ID:

I'm looking for information and advice from those experienced in building
storage arrays with good performance. Thus far, I've simply been using a
motherboard with a lot of built-in SATA ports. I've concentrated on making
most of the investment in high quality storage rather than controllers,
cases etc. It's just a 4U chassis (I don't even have a rack for it, too
much $$$) with 16 hot-swap bays, for use as a media server.

However, I've reached the point where I have an 8-drive raidz2. Any
additional storage would need to be another independent raidz2 set, and
there are not a lot of inexpensive options for going to 16 ports.

So this brings up a few questions:

- Has anyone looked at what kind of workloads tend to perform best with
  prefetch enabled or disabled?
- Would I have better performance from a dedicated controller, and would
  the improvement be worth the cost? As it stands now, heavy read/write
  activity definitely interferes with both streaming and rtorrent.
- The 16-port controllers tend to have a lot of fancy "Intel RAID chips"
  etc, which is simply a waste of money when using zfs, right?
- Is one 16-port controller better than 2 8-port? Assuming two 8-device
  arrays, which will perform better?
- Which brands of controllers are best supported by FreeBSD?
I've seen 3Ware, Areca and LSI mentioned, and the prices are all pretty much the same. Can anyone share some of their experiences with these vendors? Thanks, WM From linimon at FreeBSD.org Wed Aug 6 00:15:04 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Wed Aug 6 00:15:11 2008 Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Message-ID: <200808060015.m760F389023892@freefall.freebsd.org> Synopsis: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Aug 6 00:14:52 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=126287 From hartzell at alerce.com Wed Aug 6 02:41:05 2008 From: hartzell at alerce.com (George Hartzell) Date: Wed Aug 6 02:41:16 2008 Subject: ZFS Advice In-Reply-To: References: Message-ID: <18585.3903.895425.122613@almost.alerce.com> Wes Morgan writes: > I'm looking for information and advice from those experienced in building > storage arrays with good performance. > [...] I don't have any experimental data to contribute, but there are some interesting hardware discussions on the opensolaris ZFS web site and blog entries. It does seem to be true that ZFS does best when used with simple controllers, and a lot of the opensolaris community seems to like this relatively inexpensive 8-port card (<$100 at newegg), it's apparently what SUN ships in their "Thumper": http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm One of the opensolaris threads links off to this this blog entry which discusses experiences with PCI-X and PCI-E cards, it might be useful. http://jmlittle.blogspot.com/2008/06/recommended-disk-controllers-for-zfs.html The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. 
C0 SATA host controller", which seems to be AKA 88SX6081, which is listed as supported by the ata driver in 7.0-RELEASE. Has anyone had any ZFS experience with it? g. From remko at elvandar.org Wed Aug 6 07:00:07 2008 From: remko at elvandar.org (Remko Lodder) Date: Wed Aug 6 07:00:13 2008 Subject: kern/126287: Kernel panics while mounting an UFS filesystem with snapshot enabled Message-ID: <200808060700.m76706oq066673@freefall.freebsd.org> The following reply was made to PR kern/126287; it has been noted by GNATS. From: "Remko Lodder" To: "Carel Braam" Cc: freebsd-gnats-submit@freebsd.org Subject: Re: kern/126287: Kernel panics while mounting an UFS filesystem with snapshot enabled Date: Wed, 6 Aug 2008 08:54:14 +0200 (CEST) On Tue, August 5, 2008 11:36 pm, Carel Braam wrote: Hello Carel, This information is a bit narrow. If the kernel panics, you should be able to get a kernelcoredump. Please follow the procedure at http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html. Thanks, Remko -- /"\ Best regards, | remko@FreeBSD.org \ / Remko Lodder | remko@EFnet X http://www.evilcoder.org/ | / \ ASCII Ribbon Campaign | Against HTML Mail and News From matt at corp.spry.com Wed Aug 6 07:32:47 2008 From: matt at corp.spry.com (Matt Simerson) Date: Wed Aug 6 07:32:54 2008 Subject: ZFS Advice In-Reply-To: <18585.3903.895425.122613@almost.alerce.com> References: <18585.3903.895425.122613@almost.alerce.com> Message-ID: <46EF69A7-349C-4E15-928C-D305E67273CA@spry.com> On Aug 5, 2008, at 7:41 PM, George Hartzell wrote: > Wes Morgan writes: >> I'm looking for information and advice from those experienced in >> building >> storage arrays with good performance. >> [...] > > I don't have any experimental data to contribute, but there are some > interesting hardware discussions on the opensolaris ZFS web site and > blog entries. 
>
> It does seem to be true that ZFS does best when used with simple
> controllers, and a lot of the opensolaris community seems to like this
> relatively inexpensive 8-port card (<$100 at newegg), it's apparently
> what SUN ships in their "Thumper":
>
> http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm
>
> One of the opensolaris threads links off to this blog entry which
> discusses experiences with PCI-X and PCI-E cards, it might be useful.
>
> http://jmlittle.blogspot.com/2008/06/recommended-disk-controllers-for-zfs.html
>
> The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. C0 SATA host
> controller", which seems to be AKA 88SX6081, which is listed as
> supported by the ata driver in 7.0-RELEASE. Has anyone had any ZFS
> experience with it?

I had 3 of them inside a SuperMicro 24 disk chassis with 16GB RAM, 8 core
Xeon, and 24 1TB disks each. The other 24 disk chassis just like it I
built with two Areca 1231ML SATA RAID controllers.

I first tested UFS performance with a single disk on FreeBSD (Areca and
Marvell) and OpenSolaris (Marvell). Write performance heavily favored the
Areca card with the optional BBWC. The read performance was close enough
to call even.

Then I tested ZFS with RAIDZ in various configs (raidz, raidz2, 4, 6, and
8 disk arrays) on FreeBSD. When using raidz and FreeBSD, the difference in
performance of the controllers is much smaller. It's bad with the Areca
controller and worse with the Marvell. My overall impression is that ZFS
performance under FreeBSD is poor.

I say this because I also tested one of the systems with OpenSolaris on
the Marvell card (OpenSolaris doesn't support the Areca). Read performance
with ZFS and RAIDZ on Solaris was not just 2-3 but 10-12x faster on
Solaris. OpenSolaris write performance was about 50% faster than FreeBSD
on the Areca controller and 100% faster than FreeBSD on the Marvell.
The only way I could get decent performance out of FreeBSD and ZFS was to
use the Areca as a RAID controller and then ZFS stripe the data across the
two RAID arrays. I haven't tried it but I'm willing to bet that if I used
UFS and geom_stripe to do the same thing, I'd get better performance with
UFS. If you are looking for performance, then raidz and ZFS is not where
you want to be looking.

I use ZFS because these are backup servers and without the file system
compression, I'd be using 16TB of storage instead of 11.

As far as workload with prefetch: under my workloads (heavy network &
file system I/O) prefetch=almost instant crash and burn. As soon as I put
any heavy load on it, it hangs (as I've described previously on this
list).

Because I need the performance and prefer FreeBSD, the Areca cards with
BBWC are well worth it. But if you need serious performance on a
shoestring budget, consider running OpenSolaris with the Marvell cards.

As to whether an 8 or 16 port will perform better, it depends on the bus
and the cards. As long as you are using them on a PCIe multilane bus,
you'll likely be hitting your disks' I/O limits long before you reach the
bus limits. So it won't matter much.

3ware controllers = cheap and you get what you pay for. At my last job we
had thousands of 3ware cards deployed because they were so inexpensive and
RAID = RAID, right? Well, they were the controllers most likely to result
in catastrophic data loss for our clients. Maybe it's because the
interface is confusing the NOC technicians, maybe it's because their
recovery tools suck, or because when the controller fails it hoses the
disks in interesting ways. For various reasons, our luck at recovering
failed RAID arrays on 3ware cards was poor.

I've used a lot of LSI in the past. They work well but they aren't
performance stars. The Areca is going to be a faster card than the others
and it comes with a built-in Ethernet jack.
Plug that sucker into your private network and use a web server to remotely manage the card. That's a nice feature. :) Several weeks after deploying both systems, we took down the AoC based one and retrofitted it with another pair of Areca controllers. Publishing the benchmarks is on my TODO list. Matt From koitsu at FreeBSD.org Wed Aug 6 09:22:27 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Wed Aug 6 09:22:34 2008 Subject: ZFS Advice In-Reply-To: <46EF69A7-349C-4E15-928C-D305E67273CA@spry.com> References: <18585.3903.895425.122613@almost.alerce.com> <46EF69A7-349C-4E15-928C-D305E67273CA@spry.com> Message-ID: <20080806092226.GA49691@eos.sc1.parodius.com> On Wed, Aug 06, 2008 at 12:32:43AM -0700, Matt Simerson wrote: > Then I tested ZFS with RAIDZ in various configs (raidz, raidz2, 4,6, and > 8 disk arrays) on FreeBSD. When using raidz and FreeBSD, the difference > in performance of the controllers is much smaller. It's bad with the > Areca controller and worse with the Marvell. My overall impression is > that ZFS performance under FreeBSD is poor. Performance has been significantly improved with a patch from pjd@ provided about a week ago. I have not tested it personally, but there have been a couple reports so far that the performance improvement is significant. I recommend you read the *entire thread*, and yes, it is very long. Subject is "ZFS patches". http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/thread.html It continues into August, with some people using mail clients that don't properly utilise mail reference IDs, so their replies are scattered. Again, look for "ZFS patches". http://lists.freebsd.org/pipermail/freebsd-fs/2008-August/thread.html > I say this because I also tested one of the systems with OpenSolaris on > the Marvell card (OpenSolaris doesn't support the Areca). Read > performance with ZFS and RAIDZ on Solaris was not just 2-3 but 10-12x > faster on Solaris. 
OpenSolaris write performance was about 50% faster > than FreeBSD on the Areca controller and 100% faster than FreeBSD on the > Marvell. > > The only way I could get decent performance out of FreeBSD and ZFS was > to use the Areca as a RAID controller and then ZFS stripe the data > across the two RAID arrays. I haven't tried it but I'm willing to bet > that if I used UFS and geom_stripe to do the same thing, I'd get better > performance with UFS. If you are looking for performance, then raidz and > ZFS is not where you want to be looking. Do you have any actual numbers showing performance differential? If so, how did you obtain them under FreeBSD? Also, what tuning parameters did you use on FreeBSD (specifically kernel/loader.conf stuff)? It seems that using "zpool iostat" is only useful if one wishes to see the amount of I/O that ends up hitting the physical disks in the pool; if the data is cached in memory, "zpool iostat" won't show any I/O. I can get performance data from gstat(8), but that won't tell me how ZFS itself is actually performing, only at what rate the kernel read/write data from the physical disks. On my system (a 3-disk raidz spool, using WDC WD5000AAKS disks, SATA300, on an Intel ICH7 controller), I can get about 70MB/sec from each disk when reading, and somewhere around 55-60MB/sec when writing. But again, this is for actual disk I/O and isn't testing ZFS performance. > As far as workload with prefetch: under my workloads (heavy network & > file system I/O) prefetch=almost instant crash and burn. As soon as I > put any heavy load on it, it hangs (as I've described previously on this > list). I assume by "hangs" you mean the system becomes unresponsive while disk reads/writes are being performed, then recovers, then stalls again, recovers, rinse lather repeat? If so -- yes, that's the exact behaviour others and myself have reported. Disabling prefetch makes the system much more usable during heavy I/O. 
> 3ware controllers = cheap and you get what you pay for. At my last job
> we had thousands of 3ware cards deployed because they were so
> inexpensive and RAID = RAID, right? Well, they were the controllers
> most likely to result in catastrophic data loss for our clients. Maybe
> it's because the interface is confusing the NOC technicians, maybe it's
> because their recovery tools suck, or because when the controller fails
> it hoses the disks in interesting ways. For various reasons, our luck at
> recovering failed RAID arrays on 3ware cards was poor.

This story is pretty much the norm. Once in a while you'll find someone
praising 3ware controllers, but I often wonder just what kind of workload
and failure testing they've done prior to their praise. A friend of mine
at Rackable told me horror stories involving 3ware controllers.

I'm thankful 3ware cares about FreeBSD (most vendors do not), but with a
history of firmware/BIOS bugs and the sensitive nature of their cards, I
choose to stay away from them.

I've heard nothing but praise when it comes to Areca controllers.

All that said -- have you actually performed a hard failure with an Areca
controller on FreeBSD (using both UFS and ZFS)? Assuming you have hot-swap
enclosures/carriers, what happens if you yank a disk on the Areca
controller? How does FreeBSD behave in this case?

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB |

From gergely.czuczy at harmless.hu Wed Aug 6 09:29:48 2008
From: gergely.czuczy at harmless.hu (CZUCZY Gergely)
Date: Wed Aug 6 09:29:55 2008
Subject: ZFS hang issue and prefetch_disable - UPDATE
In-Reply-To: <62D3072A-E41A-4CFC-971D-9924958F38C7@corp.spry.com>
References: <20253C48-38CB-4A77-9C59-B993E7E5D78A@corp.spry.com>
	<62D3072A-E41A-4CFC-971D-9924958F38C7@corp.spry.com>
Message-ID: <20080806112944.6793fc11@twoflower.in.publishing.hu>

Hello,

A few weeks ago I was referring to exactly this, somewhere around here:
http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004796.html

The fact that it works on pointyhat, and that it works on kris@'s box,
is just IWFM-level evidence, not proof of any stability or reliability.

FreeBSD is quite a stable OS, and the code is of relatively good quality
as far as I've seen. For some reason the ZFS port seems to be an
exception: it refuses to be merged properly and to have its issues
solved.

No matter how much someone tunes ZFS, no matter what you disable, it's
not guaranteed, not even at the tiniest level, not to freeze your box,
not to throw a panic, and to keep your data and everything.

Many of us have reported this, but no one looked into it. I know, we're
free to use something else, but that's not the point. The point is, I
don't see the sense in a port of this quality. I know it's quite complex
and whatnot, but at this level it cannot be run in a production
environment. It's missing reliability.

No matter how much you hack it, there's always a not-so-impossible
chance that it will shoot you in the back when you're not watching.

I hope the latest ZFS patches will solve a lot of issues, and we won't
see problems like this anymore.

On Thu, 31 Jul 2008 13:58:26 -0700
Matt Simerson wrote:

>
> My announcement that vfs.zfs.prefetch_disable=1 resulted in a stable
> system was premature.
>
> One of my backup servers (see specs below) hung.
> When I got onto the
> console via KVM, it looked normal with no errors but didn't respond to
> Control-Alt-Delete. After a power cycle, zpool status showed 8 disks
> FAULTED and the action state was: http://www.sun.com/msg/ZFS-8000-5E
>
> Basically, that meant my ZFS file system and 7.5TB of data was gone.
> Ouch.
>
> I'm using a pair of ARECA 1231ML RAID controllers. Previously, I had
> them configured in JBOD with raidz2. This time around, I configured
> both controllers with one 12 disk RAID 6 volume. Now FreeBSD just sees
> two 10TB disks which I stripe with ZFS:
> zpool create back01 /dev/da0 /dev/da1
>
> I also did a bit more fiddling with /boot/loader.conf. Previously I had:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.prefetch_disable=1
>
> This resulted in ZFS using 1.1GB of RAM (as measured using the
> technique described on the wiki) during normal use. The system in
> question hung during the nightly processing (which backs up some other
> systems via rsync) and my suspicions are that when I/O load picked up,
> it exhausted the available kernel memory and hung the system. So now I
> have these settings on one system:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.arc_min="16M"
> vfs.zfs.arc_max="64M"
> vfs.zfs.prefetch_disable=1
>
> and the same except vfs.zfs.arc_max="256M" on the other. The one with
> 64M uses 256MB of RAM for ZFS and the one set at 256M uses 600MB of
> RAM. These are measured under heavy network and disk IO load being
> generated by multiple rsync processes pulling backups from remote
> nodes and storing it on ZFS. I am using ZFS compression.
>
> I get much better performance now with RAID 6 on the controller and
> ZFS striping than using raidz2.
>
> Unless tuning the arc_ settings made the difference. Either way, the
> system I just rebuilt is now quite a bit faster with RAID 6 than JBOD
> + raidz2.
>
> Hopefully tuning vfs.zfs.arc_max will result in stability.
> If it
> doesn't, my next choice is upgrading to -HEAD with the recent ZFS
> patch or ditching ZFS entirely and using geom_stripe. I don't like
> either option.
>
> Matt
>
>
> > From: Matt Simerson
> > Date: July 22, 2008 1:25:42 PM PDT
> > To: freebsd-fs@freebsd.org
> > Subject: ZFS hang issue and prefetch_disable
> >
> > Symptoms
> >
> > Deadlocks under heavy IO load on the ZFS file system with
> > prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a
> > stable system.
> >
> > Configuration
> >
> > Two machines. Identically built. Both exhibit identical behavior.
> > 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks.
> > FreeBSD 7.0 amd64
> > dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt
> >
> > Boot disk is a read only 1GB compact flash
> > # cat /etc/fstab
> > /dev/ad0s1a / ufs ro,noatime 2 2
> >
> > # df -h /
> > Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
> > /dev/ad0s1a        939M    555M     309M    64%    /
> >
> > RAM has been boosted as suggested in ZFS Tuning Guide
> > # cat /boot/loader.conf
> > vm.kmem_size= 1610612736
> > vm.kmem_size_max= 1610612736
> > vfs.zfs.prefetch_disable=1
> >
> > I haven't mucked much with the other memory settings as I'm using
> > amd64 and according to the FreeBSD ZFS wiki, that isn't necessary.
> > I've tried higher settings for kmem but that resulted in a failed
> > boot. I have ample RAM and would love to use as much as possible for
> > network and disk I/O buffers as that's principally all this system
> > does.
> >
> > Disks & ZFS options
> >
> > Sun's "Best Practices" suggests limiting the number of disks in a
> > raidz pool to no more than 6-10, IIRC. ZFS is configured as shown:
> > http://matt.simerson.net/computing/zfs/zpool.txt
> >
> > I'm using all of the ZFS default properties except: atime=off,
> > compression=on.
> >
> > Environment
> >
> > I'm using these machines as backup servers. I wrote an application
> > that generates a list of the thousands of VPS accounts we host.
> > For
> > each host, it generates a rsnapshot configuration file and backs up
> > their VPS to these systems via rsync. The application manages
> > concurrency and will spawn additional rsync processes if system i/o
> > load is below a defined threshold. Which is to say, I can crank up
> > or down the amount of disk IO the system sees.
> >
> > With vfs.zfs.prefetch_disable=0, I can trigger a hang within a few
> > hours (no more than a day). If I keep the i/o load (measured via
> > iostat) down to a low level (< 200 iops) then I still get hangs but
> > less frequently (1-6 days). The only way I have found to prevent
> > the hangs is by setting vfs.zfs.prefetch_disable=1.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

-- 
Regards,

Czuczy Gergely
Harmless Digital Bt
mailto: gergely.czuczy@harmless.hu
Tel: +36-30-9702963
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080806/2ae13997/signature.pgp

From mjguzik at gmail.com Wed Aug 6 09:30:05 2008
From: mjguzik at gmail.com (Mateusz Guzik)
Date: Wed Aug 6 09:30:11 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Message-ID: <200808060930.m769U5Yf008912@freefall.freebsd.org>

The following reply was made to PR kern/126287; it has been noted by GNATS.
From: "Mateusz Guzik" To: bug-followup@freebsd.org Cc: Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Date: Wed, 6 Aug 2008 11:24:16 +0200 ------=_Part_20104_30199813.1218014656704 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Hi, function vfs_deleteopt() was called with NULL pointer (opts) used in TAILQ_FOREACH_SAFE macro -- I believe that simple `if (opts == NULL) return; ' in that function is ok to fix this. (Take a look at attachment.) At least the kernel does not panic. ;) Thanks, -- Mateusz Guzik ------=_Part_20104_30199813.1218014656704 Content-Type: application/octet-stream; name=vfs_mount.diff Content-Transfer-Encoding: base64 X-Attachment-Id: f_fjjqf5in0 Content-Disposition: attachment; filename=vfs_mount.diff LS0tIHN5cy9rZXJuL3Zmc19tb3VudC5jLm9yaWcJMjAwOC0wOC0wNiAxMToxNDoxNi4wMDAwMDAw MDAgKzAyMDAKKysrIHN5cy9rZXJuL3Zmc19tb3VudC5jCTIwMDgtMDgtMDYgMTE6MTQ6MzIuMDAw MDAwMDAwICswMjAwCkBAIC0xOTYsMTAgKzE5NiwxMyBAQAogdm9pZAogdmZzX2RlbGV0ZW9wdChz dHJ1Y3QgdmZzb3B0bGlzdCAqb3B0cywgY29uc3QgY2hhciAqbmFtZSkKIHsKIAlzdHJ1Y3QgdmZz b3B0ICpvcHQsICp0ZW1wOwogCisJaWYgKG9wdHMgPT0gTlVMTCkKKwkJcmV0dXJuOworCiAJVEFJ TFFfRk9SRUFDSF9TQUZFKG9wdCwgb3B0cywgbGluaywgdGVtcCkgIHsKIAkJaWYgKHN0cmNtcChv cHQtPm5hbWUsIG5hbWUpID09IDApCiAJCQl2ZnNfZnJlZW9wdChvcHRzLCBvcHQpOwogCX0KIH0K ------=_Part_20104_30199813.1218014656704-- From kometen at gmail.com Wed Aug 6 10:15:07 2008 From: kometen at gmail.com (Claus Guttesen) Date: Wed Aug 6 10:15:13 2008 Subject: ZFS patches. In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: >>> I have a areca arc-1680 sas-card and an external sas-cabinet with 16 >>> sas-drives each 1 TB (931 binary GB). They have been setup in three >>> raidz-partitions with five disks each in one zpool and one spare. >>> >> My conclusion about it's stability was a bit hasty. I was copying >> approx. 
>> 400 GB from a nfs-share mounted from a solaris 9 on sparc
>> using tcp and read- and write-size of 32768. The files are images
>> slightly less than 1 MB and a thumbnail (approx. 983000 files).
>
> There seems to be a hardware-related problem to my setup. I'm getting
> some 'arcmsr0: scsi id=1 lun=4 ccb='0xffffff02d5cc8e00' outstanding
> command timeout' (in solaris). I'll check with my vendor. I did not
> see such errors in FreeBSD.

I changed the configuration on the areca arc-1680 card, put all disks
in throughput mode, and am now able to copy large amounts of data (more
than 1 TB) without problems. I have 'zpool offline'd a disk and 'zpool
replace'd it with a spare. The resilver is progressing as normal and the
zpool is accessible.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

From mjguzik at gmail.com Wed Aug 6 10:20:05 2008
From: mjguzik at gmail.com (Mateusz Guzik)
Date: Wed Aug 6 10:20:12 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Message-ID: <200808061020.m76AK5NI013323@freefall.freebsd.org>

The following reply was made to PR kern/126287; it has been noted by GNATS.
From: "Mateusz Guzik" To: bug-followup@freebsd.org Cc: Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Date: Wed, 6 Aug 2008 12:15:00 +0200 Something weird happened to my attachment, I'll paste it here: --- sys/kern/vfs_mount.c.orig 2008-08-06 11:14:16.000000000 +0200 +++ sys/kern/vfs_mount.c 2008-08-06 11:14:32.000000000 +0200 @@ -196,10 +196,13 @@ void vfs_deleteopt(struct vfsoptlist *opts, const char *name) { struct vfsopt *opt, *temp; + if (opts == NULL) + return; + TAILQ_FOREACH_SAFE(opt, opts, link, temp) { if (strcmp(opt->name, name) == 0) vfs_freeopt(opts, opt); } } Again, it should work fine ;) Thanks, -- Mateusz Guzik From kostikbel at gmail.com Wed Aug 6 13:34:47 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Wed Aug 6 13:34:53 2008 Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled In-Reply-To: <200808061020.m76AK5NI013323@freefall.freebsd.org> References: <200808061020.m76AK5NI013323@freefall.freebsd.org> Message-ID: <20080806133441.GM97161@deviant.kiev.zoral.com.ua> On Wed, Aug 06, 2008 at 10:20:05AM +0000, Mateusz Guzik wrote: > The following reply was made to PR kern/126287; it has been noted by GNATS. 
>
> From: "Mateusz Guzik"
> To: bug-followup@freebsd.org
> Cc:
> Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
> Date: Wed, 6 Aug 2008 12:15:00 +0200
>
> Something weird happened to my attachment, I'll paste it here:
>
> --- sys/kern/vfs_mount.c.orig	2008-08-06 11:14:16.000000000 +0200
> +++ sys/kern/vfs_mount.c	2008-08-06 11:14:32.000000000 +0200
> @@ -196,10 +196,13 @@
>  void
>  vfs_deleteopt(struct vfsoptlist *opts, const char *name)
>  {
>  	struct vfsopt *opt, *temp;
>
> +	if (opts == NULL)
> +		return;
> +
>  	TAILQ_FOREACH_SAFE(opt, opts, link, temp) {
>  		if (strcmp(opt->name, name) == 0)
>  			vfs_freeopt(opts, opt);
>  	}
>  }
>
> Again, it should work fine ;)
>
> Thanks,
> --
> Mateusz Guzik

The PR lacks the backtrace (preferably the ddb output or "bt full" from
kgdb) for the panic. Please, show me the backtrace.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080806/54ea951a/attachment.pgp

From kostikbel at gmail.com Wed Aug 6 13:39:55 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Wed Aug 6 13:42:38 2008
Subject: doing vfs_hash_get when vnode locked
In-Reply-To:
References: <20080805083229.GB97161@deviant.kiev.zoral.com.ua>
	<20080805153221.GG97161@deviant.kiev.zoral.com.ua>
	<20080805165114.GH97161@deviant.kiev.zoral.com.ua>
	<20080805194341.GI97161@deviant.kiev.zoral.com.ua>
Message-ID: <20080806133950.GN97161@deviant.kiev.zoral.com.ua>

On Tue, Aug 05, 2008 at 04:58:30PM -0400, Rick Macklem wrote:
>
>
> On Tue, 5 Aug 2008, Kostik Belousov wrote:
>
> [stuff snipped]
> >>Ok, I just spent a few minutes snooping around in vfs_subr.c and I think
> >>I see the problem. vget() has called vholdl() and then
> >>v_upgrade_usecount(), which has incremented the usecount and taken the
> >>vnode off the free list.
> >>This appears to prevent vgonel() from being
> >>called on it for most cases, but there is still the case in vflush()
> >>where the FORCECLOSE flag is set.
> >Yes, exactly.
> >
> [more stuff snipped]
> >
> >But, what guarantees that the vnode would not be reclaimed before/under
> >your vref() it ? For instance, what if the vnode is locked due to reclaim
> >being in progress ?
> >
> So long as I never do a vflush() with FORCECLOSE, I can't see anywhere
> that will vgonel() it once I have gotten it via vget(). (v_usecount
> incremented and not on the vnode freelist)
>
> The way I just coded it is:
> - the function that does the vfs_hash_get() without LK_EXCLUSIVE just
>   fails if MNTK_UNMOUNTF is set.
> - my nfs_close just returns when MNTK_UNMOUNTF is set.
> - my nfs_unmount() doesn't set FORCECLOSE on the vflush() but instead
>   sleeps and retries a bunch of times if vflush() fails for MNT_FORCE.
> - my nfs_unmount() and other code (mostly based on the vanilla FreeBSD
>   client makes requests all fail with EINTR when MNTK_UNMOUNTF is set).

You still have the race where MNTK_UNMOUNTF is set after your check
returned false, don't you?

BTW, is your fs marked as mpsafe?

>
> I think this should work for a forced unmount, since once requests all
> fail and the recovery also fails, I think vflush() will work without
> the FORCECLOSE flag.
>
> As far as I can see, since I'm not vflush()'ng with FORCECLOSE, then
> nothing will vgonel() the vnode until it has been vrele()'d. (If there
> is a case other than vflush() with FORCECLOSE that will vgone() it when
> it is not on the freelist and has a v_usecount > 0, then I'll need to
> handle that as well, but I can't see one.)

Yes, ATM it should be safe, since only vflush() does reclamation for
vnodes with usecount > 0. On the other hand, I believe our VFS never
makes a guarantee that this is the only location of the call.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080806/fdbbb661/attachment.pgp

From kostikbel at gmail.com Wed Aug 6 13:40:04 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Wed Aug 6 13:43:00 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Message-ID: <200808061340.m76De44w033942@freefall.freebsd.org>

The following reply was made to PR kern/126287; it has been noted by GNATS.

From: Kostik Belousov
To: Mateusz Guzik
Cc: freebsd-fs@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Date: Wed, 6 Aug 2008 16:34:41 +0300

--11IAkegDWp8TRrA/
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Aug 06, 2008 at 10:20:05AM +0000, Mateusz Guzik wrote:
> The following reply was made to PR kern/126287; it has been noted by GNATS.
>
> From: "Mateusz Guzik"
> To: bug-followup@freebsd.org
> Cc:
> Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
> Date: Wed, 6 Aug 2008 12:15:00 +0200
>
> Something weird happened to my attachment, I'll paste it here:
>
> --- sys/kern/vfs_mount.c.orig	2008-08-06 11:14:16.000000000 +0200
> +++ sys/kern/vfs_mount.c	2008-08-06 11:14:32.000000000 +0200
> @@ -196,10 +196,13 @@
>  void
>  vfs_deleteopt(struct vfsoptlist *opts, const char *name)
>  {
>  	struct vfsopt *opt, *temp;
>
> +	if (opts == NULL)
> +		return;
> +
>  	TAILQ_FOREACH_SAFE(opt, opts, link, temp) {
>  		if (strcmp(opt->name, name) == 0)
>  			vfs_freeopt(opts, opt);
>  	}
>  }
>
> Again, it should work fine ;)
>
> Thanks,
> --
> Mateusz Guzik

The PR lacks the backtrace (preferably the ddb output or "bt full" from
kgdb) for the panic. Please, show me the backtrace.
--11IAkegDWp8TRrA/ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkiZqHAACgkQC3+MBN1Mb4jCYQCg8Zuw0keIHdOXrkv9Q5yK8M6r tEkAn3GayEaX5S9xQqiqDRBTooAe8ggD =zxuG -----END PGP SIGNATURE----- --11IAkegDWp8TRrA/-- From mjguzik at gmail.com Wed Aug 6 14:00:03 2008 From: mjguzik at gmail.com (Mateusz Guzik) Date: Wed Aug 6 14:00:09 2008 Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Message-ID: <200808061400.m76E028t034822@freefall.freebsd.org> The following reply was made to PR kern/126287; it has been noted by GNATS. From: "Mateusz Guzik" To: "Kostik Belousov" Cc: freebsd-fs@freebsd.org, bug-followup@freebsd.org Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled Date: Wed, 6 Aug 2008 15:52:24 +0200 2008/8/6 Kostik Belousov : > On Wed, Aug 06, 2008 at 10:20:05AM +0000, Mateusz Guzik wrote: >> The following reply was made to PR kern/126287; it has been noted by GNATS. >> >> From: "Mateusz Guzik" >> To: bug-followup@freebsd.org >> Cc: >> Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled >> Date: Wed, 6 Aug 2008 12:15:00 +0200 >> >> Something weird happened to my attachment, I'll paste it here: >> >> --- sys/kern/vfs_mount.c.orig 2008-08-06 11:14:16.000000000 +0200 >> +++ sys/kern/vfs_mount.c 2008-08-06 11:14:32.000000000 +0200 >> @@ -196,10 +196,13 @@ >> void >> vfs_deleteopt(struct vfsoptlist *opts, const char *name) >> { >> struct vfsopt *opt, *temp; >> >> + if (opts == NULL) >> + return; >> + >> TAILQ_FOREACH_SAFE(opt, opts, link, temp) { >> if (strcmp(opt->name, name) == 0) >> vfs_freeopt(opts, opt); >> } >> } >> >> Again, it should work fine ;) > > The PR lacks the backtrace (preferrable the ddb output or "bt full" from > kgdb) for the panic. Please, show me the backtrace. 
> Sorry, I don't have currently access to fbsd 7, so here is backtrace from CURRENT(crashed by mount -o snapshot /somefilesystem): [..] #11 0xc06e1e5b in calltrap () at /srv/build/CURRENT/src/sys/i386/i386/exception.s:165 #12 0xc05c86d4 in vfs_deleteopt (opts=0x0, name=0xc074ef52 "snapshot") at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:195 #13 0xc068d689 in ffs_mount (mp=0xc29f52a0, td=0xc2875af0) at /srv/build/CURRENT/src/sys/ufs/ffs/ffs_vfsops.c:172 #14 0xc05cb1d8 in vfs_donmount (td=0xc2875af0, fsflags=0, fsoptions=0xc261db80) at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:1010 #15 0xc05cc3bb in nmount (td=0xc2875af0, uap=0xcd3a7cf8) at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:417 #16 0xc06f9157 in syscall (frame=0xcd3a7d38) at /srv/build/CURRENT/src/sys/i386/i386/trap.c:1081 #17 0xc06e1ef0 in Xint0x80_syscall () at /srv/build/CURRENT/src/sys/i386/i386/exception.s:261 Thanks, -- Mateusz Guzik From mjguzik at gmail.com Wed Aug 6 14:18:00 2008 From: mjguzik at gmail.com (Mateusz Guzik) Date: Wed Aug 6 14:18:29 2008 Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled In-Reply-To: <20080806133441.GM97161@deviant.kiev.zoral.com.ua> References: <200808061020.m76AK5NI013323@freefall.freebsd.org> <20080806133441.GM97161@deviant.kiev.zoral.com.ua> Message-ID: 2008/8/6 Kostik Belousov : > On Wed, Aug 06, 2008 at 10:20:05AM +0000, Mateusz Guzik wrote: >> The following reply was made to PR kern/126287; it has been noted by GNATS. 
>> >> From: "Mateusz Guzik" >> To: bug-followup@freebsd.org >> Cc: >> Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled >> Date: Wed, 6 Aug 2008 12:15:00 +0200 >> >> Something weird happened to my attachment, I'll paste it here: >> >> --- sys/kern/vfs_mount.c.orig 2008-08-06 11:14:16.000000000 +0200 >> +++ sys/kern/vfs_mount.c 2008-08-06 11:14:32.000000000 +0200 >> @@ -196,10 +196,13 @@ >> void >> vfs_deleteopt(struct vfsoptlist *opts, const char *name) >> { >> struct vfsopt *opt, *temp; >> >> + if (opts == NULL) >> + return; >> + >> TAILQ_FOREACH_SAFE(opt, opts, link, temp) { >> if (strcmp(opt->name, name) == 0) >> vfs_freeopt(opts, opt); >> } >> } >> >> Again, it should work fine ;) > > The PR lacks the backtrace (preferrable the ddb output or "bt full" from > kgdb) for the panic. Please, show me the backtrace. > Sorry, I don't have currently access to fbsd 7, so here is backtrace from CURRENT(crashed by mount -o snapshot /somefilesystem): [..] 
#11 0xc06e1e5b in calltrap () at /srv/build/CURRENT/src/sys/i386/i386/exception.s:165
#12 0xc05c86d4 in vfs_deleteopt (opts=0x0, name=0xc074ef52 "snapshot") at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:195
#13 0xc068d689 in ffs_mount (mp=0xc29f52a0, td=0xc2875af0) at /srv/build/CURRENT/src/sys/ufs/ffs/ffs_vfsops.c:172
#14 0xc05cb1d8 in vfs_donmount (td=0xc2875af0, fsflags=0, fsoptions=0xc261db80) at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:1010
#15 0xc05cc3bb in nmount (td=0xc2875af0, uap=0xcd3a7cf8) at /srv/build/CURRENT/src/sys/kern/vfs_mount.c:417
#16 0xc06f9157 in syscall (frame=0xcd3a7d38) at /srv/build/CURRENT/src/sys/i386/i386/trap.c:1081
#17 0xc06e1ef0 in Xint0x80_syscall () at /srv/build/CURRENT/src/sys/i386/i386/exception.s:261

Thanks,
-- 
Mateusz Guzik

From kostikbel at gmail.com Wed Aug 6 14:48:25 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Wed Aug 6 14:48:31 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
In-Reply-To:
References: <200808061020.m76AK5NI013323@freefall.freebsd.org>
	<20080806133441.GM97161@deviant.kiev.zoral.com.ua>
Message-ID: <20080806144820.GO97161@deviant.kiev.zoral.com.ua>

On Wed, Aug 06, 2008 at 03:52:24PM +0200, Mateusz Guzik wrote:
> Sorry, I don't have currently access to fbsd 7, so here is backtrace
> from CURRENT(crashed by mount -o snapshot /somefilesystem):

I very much doubt that the original submitter meant this problem. But
thanks for noting the issue. I prefer the following change, committed
as r181345:

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 5ee123a..4d9754e 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -169,7 +169,8 @@ ffs_mount(struct mount *mp, struct thread *td)
 		 * persist "snapshot" in the options list.
 		 */
 		vfs_deleteopt(mp->mnt_optnew, "snapshot");
-		vfs_deleteopt(mp->mnt_opt, "snapshot");
+		if (mp->mnt_opt != NULL)
+			vfs_deleteopt(mp->mnt_opt, "snapshot");
 	}
 
 	MNT_ILOCK(mp);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080806/74ce3e1c/attachment.pgp

From kostikbel at gmail.com Wed Aug 6 14:50:06 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Wed Aug 6 14:50:38 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Message-ID: <200808061450.m76Eo59Y039894@freefall.freebsd.org>

The following reply was made to PR kern/126287; it has been noted by GNATS.

From: Kostik Belousov
To: Mateusz Guzik
Cc: freebsd-fs@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Date: Wed, 6 Aug 2008 17:48:20 +0300

--ltihE5wS63FR6l1A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Aug 06, 2008 at 03:52:24PM +0200, Mateusz Guzik wrote:
> Sorry, I don't have currently access to fbsd 7, so here is backtrace
> from CURRENT(crashed by mount -o snapshot /somefilesystem):

I very much doubt that the original submitter meant this problem. But
thanks for noting the issue. I prefer the following change, committed
as r181345:

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 5ee123a..4d9754e 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -169,7 +169,8 @@ ffs_mount(struct mount *mp, struct thread *td)
 		 * persist "snapshot" in the options list.
 		 */
 		vfs_deleteopt(mp->mnt_optnew, "snapshot");
-		vfs_deleteopt(mp->mnt_opt, "snapshot");
+		if (mp->mnt_opt != NULL)
+			vfs_deleteopt(mp->mnt_opt, "snapshot");
 	}
 
 	MNT_ILOCK(mp);

--ltihE5wS63FR6l1A
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAkiZubQACgkQC3+MBN1Mb4jkfwCgg+sP0nON+SXsND0/W1nlJU79
aD4AoOOETOGS1Jf5rv4NWLY0cukGLNeR
=Cx00
-----END PGP SIGNATURE-----

--ltihE5wS63FR6l1A--

From weldon at excelsus.com Wed Aug 6 15:29:01 2008
From: weldon at excelsus.com (Weldon S Godfrey 3)
Date: Wed Aug 6 15:29:07 2008
Subject: ZFS-NFS kernel panic under load
Message-ID: <20080806101621.H24586@emmett.excelsus.com>

Hello,

Please forgive me, I didn't really see this discussed in the archives
but I am wondering if anyone has seen this issue. I can replicate this
issue under FreeBSD amd64 7.0-RELEASE and the latest -STABLE (RELENG_7).
I do not replicate any problems running 9 instances of postmark on the
machine directly, so the issue appears to be isolated to NFS.

There are backtraces and more information in ticket kern/124280.

I am experiencing random kernel panics while running the postmark
benchmark from 9 NFS clients (clients on RedHat) to a 3TB ZFS filesystem
exported with NFS. The panics happen as soon as 5 mins from starting the
benchmark or may take hours before it panics and reboots. It doesn't
correspond to a time a cron job is going on.

I am using the following settings in postmark:

set number 20000
set transactions 10000000
set subdirectories 1000
set size 10000 15000
set report verbose
set location /var/mail/store1/X
(where X is a number 1-9 so each is operating in its own tree)

The problem happens if I run 1 postmark on 9 NFS clients at the same
time (each client is its own server) or if I run 9 postmarks on one NFS
client.
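A minimal sketch of how the nine per-client settings files described above
could be generated. The settings are the ones listed in this report; the
helper name postmark_config, the OUTDIR variable and the postmark-N.cfg file
names are hypothetical, and the script only writes the settings to files, it
does not start postmark.

```shell
#!/bin/sh
# postmark_config N
# Print the postmark settings from this report for client number N,
# so that each client works in its own tree under /var/mail/store1.
# (Function name and output file names are illustrative only.)
postmark_config() {
    cat <<EOF
set number 20000
set transactions 10000000
set subdirectories 1000
set size 10000 15000
set report verbose
set location /var/mail/store1/$1
EOF
}

# Write one settings file per client, 1 through 9, into OUTDIR
# (defaults to the current directory).
outdir="${OUTDIR:-.}"
for n in 1 2 3 4 5 6 7 8 9; do
    postmark_config "$n" > "$outdir/postmark-$n.cfg"
done
```

Keeping the location as the only per-client difference makes it easy to
confirm that every client runs the same workload.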
commands used to create filesystem:

zpool create tank mirror da0 da12 mirror da1 da13 mirror da2 da14 mirror da3 da15 \
    mirror da4 da16 mirror da5 da17 mirror da6 da18 mirror da7 da19 mirror da8 da20 \
    mirror da9 da21 mirror da10 da22 spare da11 da23
zfs set atime=off tank
zfs create tank/mail
zfs set mountpoint=/var/mail tank/mail
zfs set sharenfs="-maproot=root -network 192.168.2.0 -mask 255.255.255.0" tank/mail

I am using a 3ware 9690 SAS controller. I have 2 IBM EXP3000 enclosures,
each drive is shown as a single disk by the controller.

this is my loader.conf:

vm.kmem_size_max="1073741824"
vm.kmem_size="1073741824"
kern.maxvnodes="800000"
vfs.zfs.prefetch_disable="1"
vfs.zfs.cache_flush_disable="1"

(I should note that kern.maxvnodes in loader.conf does not appear to do
anything: after boot, sysctl shows it at 100000. It does change to
800000 if I set it manually with sysctl. However, my vnode usage appears
to sit at around 25-26K and is near that within 5s of the panic.)

The server has 16GB of RAM, and 2 quad core XEON processors.

This server is only an NFS fileserver. The only non-default daemon
running is sshd. It is running the GENERIC kernel, right now,
unmodified.

I am using two NICs. NFS is exported only on the secondary NIC. Each
NIC is in its own subnet.

nothing in /var/log/messages near time of panic except:

Aug 6 08:45:30 store1 savecore: reboot after panic: page fault
Aug 6 08:45:30 store1 savecore: writing core to vmcore.2

I can provide cores if needed.

Thank you for your time!

Weldon

kgdb with backtrace:

store1# kgdb kernel.debug /var/crash/vmcore.2
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 5; apic id = 05 fault virtual address = 0xdc fault code = supervisor read data, page not present instruction pointer = 0x8:0xffffffff8063b3d8 stack pointer = 0x10:0xffffffffdfbc5720 frame pointer = 0x10:0xffffff00543ed000 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 839 (nfsd) trap number = 12 panic: page fault cpuid = 5 Uptime: 18m53s Physical memory: 16366 MB Dumping 1991 MB: 1976 1960 1944 1928 1912 1896 1880 1864 1848 1832 1816 1800 1784 1768 1752 1736 1720 1704 1688 1672 1656 1640 1624 1608 1592 1576 1560 1544 1528 1512 1496 1480 1464 1448 1432 1416 1400 1384 1368 1352 1336 1320 1304 1288 1272 1256 1240 1224 1208 1192 1176 1160 1144 1128 1112 1096 1080 1064 1048 1032 1016 1000 984 968 952 936 920 904 888 872 856 840 824 808 792 776 760 744 728 712 696 680 664 648 632 616 600 584 568 552 536 520 504 488 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8 Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/zfs.ko #0 doadump () at pcpu.h:194 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td)); (kgdb) backtrace #0 doadump () at pcpu.h:194 #1 0x0000000000000004 in ?? () #2 0xffffffff804a7049 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 #3 0xffffffff804a744d in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:572 #4 0xffffffff807780e4 in trap_fatal (frame=0xffffff000bce26c0, eva=18446742974395967712) at /usr/src/sys/amd64/amd64/trap.c:724 #5 0xffffffff807784b5 in trap_pfault (frame=0xffffffffdfbc5670, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641 #6 0xffffffff80778de8 in trap (frame=0xffffffffdfbc5670) at /usr/src/sys/amd64/amd64/trap.c:410 #7 0xffffffff8075e7ce in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169 #8 0xffffffff8063b3d8 in nfsrv_access (vp=0xffffff00207d7dc8, flags=128, cred=0xffffff00403d4800, rdonly=0, td=0xffffff000bce26c0, override=0) at /usr/src/sys/nfsserver/nfs_serv.c:4284 #9 0xffffffff8063c4f1 in nfsrv3_access (nfsd=0xffffff00543ed000, slp=0xffffff0006396d00, td=0xffffff000bce26c0, mrq=0xffffffffdfbc5af0) at /usr/src/sys/nfsserver/nfs_serv.c:234 #10 0xffffffff8064cd1d in nfssvc (td=Variable "td" is not available. ) at /usr/src/sys/nfsserver/nfs_syscalls.c:456 #11 0xffffffff80778737 in syscall (frame=0xffffffffdfbc5c70) at /usr/src/sys/amd64/amd64/trap.c:852 #12 0xffffffff8075e9db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290 #13 0x0000000800687acc in ?? () Previous frame inner to this frame (corrupt stack?) From matt at corp.spry.com Wed Aug 6 17:03:18 2008 From: matt at corp.spry.com (Matt Simerson) Date: Wed Aug 6 17:03:24 2008 Subject: ZFS hang issue and prefetch_disable - UPDATE In-Reply-To: <20080806112944.6793fc11@twoflower.in.publishing.hu> References: <20253C48-38CB-4A77-9C59-B993E7E5D78A@corp.spry.com> <62D3072A-E41A-4CFC-971D-9924958F38C7@corp.spry.com> <20080806112944.6793fc11@twoflower.in.publishing.hu> Message-ID: On Aug 6, 2008, at 2:29 AM, CZUCZY Gergely wrote: > A few weeks ago, i was exactly referring to this. Somewhere around > here: > http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004796.html > > The thing, that it works on pointyhat, and it works on kris@'s box, > is just an > IWFM-level, not the proof of any stability, reliability. 
>
> FreeBSD is a quite stable OS, the code has a relatively good quality as
> far as I've seen it, and it's quite stable. Somehow the ZFS port seems
> to be an exception, it's refused to be merged properly and the issues
> to be solved.
>
> No matter how much someone tunes ZFS, no matter what you disable, it's
> not guaranteed, not even on the tiniest level ever, not to freeze your
> box, not to throw a panic, to keep your data and everything.

You want/expect guarantees of stability with experimental features? I
think someone needs their expectations calibrated.

> Many of us have reported this, but no one looked into it.

Just because you haven't seen proof that someone looked into it doesn't
mean nobody has. You are being neither fair nor respectful to the time
that others are investing in ZFS.

> use something else, but that's not the point. The point is, I don't see
> the sense in a port of this quality. I know it's quite complex and
> whatnot, but at this level, it cannot be run in a production
> environment. It's missing reliability.

If you don't see the value of ZFS, don't use it.

I'm not complaining because ZFS isn't stable. I'd like it to be, but the
best way I can help is to provide detailed information about my setup and
under what conditions the feature has problems. By doing so, I'm
providing useful data. Denigrating the authors because ZFS doesn't meet
your expectations doesn't help anybody, so please don't do that.

Matt

> No matter how much you hack it, there's always a not-so-impossible
> chance that it will shoot you in the back when you're not watching.
>
> I hope the latest ZFS patches will solve a lot of issues, and we won't
> see problems like this anymore.
>
> On Thu, 31 Jul 2008 13:58:26 -0700
> Matt Simerson wrote:
>
>>
>> My announcement that vfs.zfs.prefetch_disable=1 resulted in a stable
>> system was premature.
>>
>> One of my backup servers (see specs below) hung.
>> When I got onto the console via KVM, it looked normal with no errors
>> but didn't respond to Control-Alt-Delete. After a power cycle, zpool
>> status showed 8 disks FAULTED and the action state was:
>> http://www.sun.com/msg/ZFS-8000-5E
>>
>> Basically, that meant my ZFS file system and 7.5TB of data were gone.
>> Ouch.
>>
>> I'm using a pair of ARECA 1231ML RAID controllers. Previously, I had
>> them configured in JBOD with raidz2. This time around, I configured
>> both controllers with one 12 disk RAID 6 volume. Now FreeBSD just sees
>> two 10TB disks which I stripe with ZFS:
>>
>> zpool create back01 /dev/da0 /dev/da1
>>
>> I also did a bit more fiddling with /boot/loader.conf. Previously I
>> had:
>>
>> vm.kmem_size="1536M"
>> vm.kmem_size_max="1536M"
>> vfs.zfs.prefetch_disable=1
>>
>> This resulted in ZFS using 1.1GB of RAM (as measured using the
>> technique described on the wiki) during normal use. The system in
>> question hung during the nightly processing (which backs up some other
>> systems via rsync) and my suspicion is that when I/O load picked up,
>> it exhausted the available kernel memory and hung the system. So now I
>> have these settings on one system:
>>
>> vm.kmem_size="1536M"
>> vm.kmem_size_max="1536M"
>> vfs.zfs.arc_min="16M"
>> vfs.zfs.arc_max="64M"
>> vfs.zfs.prefetch_disable=1
>>
>> and the same except vfs.zfs.arc_max="256M" on the other. The one with
>> 64M uses 256MB of RAM for ZFS and the one set at 256M uses 600MB of
>> RAM. These are measured under heavy network and disk IO load being
>> generated by multiple rsync processes pulling backups from remote
>> nodes and storing them on ZFS. I am using ZFS compression.
>>
>> I get much better performance now with RAID 6 on the controller and
>> ZFS striping than using raidz2, unless tuning the arc_ settings made
>> the difference. Either way, the system I just rebuilt is now quite a
>> bit faster with RAID 6 than JBOD + raidz2.
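
The loader.conf values quoted above can be sanity-checked with a little
arithmetic. What follows is only a minimal sketch, assuming the size
suffixes are binary multiples (K = 1024, M = 1024^2, and so on). The
names parse_size and check_tunables are hypothetical helpers, not part
of FreeBSD or ZFS, and the script only checks that the ARC limits fit
inside the kernel memory limit. It says nothing about whether a given
setting will be stable.

```python
# Hypothetical helpers for illustration; not part of FreeBSD or any ZFS tool.
# Assumption: loader.conf size suffixes are binary multiples (1M = 1024 * 1024).
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(value):
    """Convert a size string such as "1536M" (quotes optional) to bytes."""
    text = value.strip().strip('"')
    if text and text[-1].upper() in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1].upper()]
    return int(text)


def check_tunables(tunables):
    """Return a list of consistency problems; an empty list means none found."""
    kmem = parse_size(tunables["vm.kmem_size"])
    kmem_max = parse_size(tunables["vm.kmem_size_max"])
    arc_min = parse_size(tunables["vfs.zfs.arc_min"])
    arc_max = parse_size(tunables["vfs.zfs.arc_max"])

    problems = []
    if kmem > kmem_max:
        problems.append("vm.kmem_size exceeds vm.kmem_size_max")
    if arc_min > arc_max:
        problems.append("vfs.zfs.arc_min exceeds vfs.zfs.arc_max")
    if arc_max >= kmem:
        problems.append("vfs.zfs.arc_max does not fit inside vm.kmem_size")
    return problems


# The settings from the message above (the 64M variant).
SETTINGS = {
    "vm.kmem_size": "1536M",
    "vm.kmem_size_max": "1536M",
    "vfs.zfs.arc_min": "16M",
    "vfs.zfs.arc_max": "64M",
}

if __name__ == "__main__":
    for name, value in SETTINGS.items():
        print(name, "=", parse_size(value), "bytes")
    print("problems:", check_tunables(SETTINGS))
```

Under that assumption "1536M" works out to 1610612736 bytes, which is
the figure given in raw bytes in the July 22 message quoted further
down.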
>>
>> Hopefully tuning vfs.zfs.arc_max will result in stability. If it
>> doesn't, my next choice is upgrading to -HEAD with the recent ZFS
>> patch or ditching ZFS entirely and using geom_stripe. I don't like
>> either option.
>>
>> Matt
>>
>>
>>> From: Matt Simerson
>>> Date: July 22, 2008 1:25:42 PM PDT
>>> To: freebsd-fs@freebsd.org
>>> Subject: ZFS hang issue and prefetch_disable
>>>
>>> Symptoms
>>>
>>> Deadlocks under heavy IO load on the ZFS file system with
>>> prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a
>>> stable system.
>>>
>>> Configuration
>>>
>>> Two machines. Identically built. Both exhibit identical behavior.
>>> 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks.
>>> FreeBSD 7.0 amd64
>>> dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt
>>>
>>> Boot disk is a read only 1GB compact flash
>>> # cat /etc/fstab
>>> /dev/ad0s1a  /  ufs  ro,noatime  2  2
>>>
>>> # df -h /
>>> Filesystem    1K-blocks    Used    Avail  Capacity  Mounted on
>>> /dev/ad0s1a        939M    555M     309M       64%  /
>>>
>>> RAM has been boosted as suggested in ZFS Tuning Guide
>>> # cat /boot/loader.conf
>>> vm.kmem_size= 1610612736
>>> vm.kmem_size_max= 1610612736
>>> vfs.zfs.prefetch_disable=1
>>>
>>> I haven't mucked much with the other memory settings as I'm using
>>> amd64 and according to the FreeBSD ZFS wiki, that isn't necessary.
>>> I've tried higher settings for kmem but that resulted in a failed
>>> boot. I have ample RAM and would love to use as much as possible for
>>> network and disk I/O buffers as that's principally all this system
>>> does.
>>>
>>> Disks & ZFS options
>>>
>>> Sun's "Best Practices" suggests limiting the number of disks in a
>>> raidz pool to no more than 6-10, IIRC. ZFS is configured as shown:
>>> http://matt.simerson.net/computing/zfs/zpool.txt
>>>
>>> I'm using all of the ZFS default properties except: atime=off,
>>> compression=on.
>>>
>>> Environment
>>>
>>> I'm using these machines as backup servers.
>>> I wrote an application that generates a list of the thousands of VPS
>>> accounts we host. For each host, it generates a rsnapshot
>>> configuration file and backs up their VPS to these systems via rsync.
>>> The application manages concurrency and will spawn additional rsync
>>> processes if system i/o load is below a defined threshold. Which is
>>> to say, I can crank up or down the amount of disk IO the system sees.
>>>
>>> With vfs.zfs.prefetch_disable=0, I can trigger a hang within a few
>>> hours (no more than a day). If I keep the i/o load (measured via
>>> iostat) down to a low level (< 200 iops) then I still get hangs but
>>> less frequently (1-6 days). The only way I have found to prevent
>>> the hangs is by setting vfs.zfs.prefetch_disable=1.
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
>
> --
> Best regards,
>
> Czuczy Gergely
> Harmless Digital Bt
> mailto: gergely.czuczy@harmless.hu
> Tel: +36-30-9702963

From peter.schuller at infidyne.com Wed Aug 6 17:21:33 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Wed Aug 6 17:21:40 2008
Subject: ZFS Advice
In-Reply-To:
References:
Message-ID: <200808061922.55472.peter.schuller@infidyne.com>

> - Which brand of controllers are best supported by FreeBSD? I've seen
> 3Ware, Areca and LSI mentioned, and the prices are all pretty much the
> same. Can anyone share some of their experiences with these vendors?

I realize I am ignoring the most interesting bits of your post, but I
recently found indications that the HighPoint driver in FreeBSD supports
port multipliers. I found someone claiming on a mailing list that it is
supported. Checking, HighPoint does list multiplier support as an
"official" card feature for those cards that support it, and they are
maintaining FreeBSD as a supported platform AFAIK.
I sent a request to them to confirm whether this was indeed the case,
but haven't received a response (yet?).

Since you mention you're on a budget, I thought I'd mention this since,
right now, the most likely future path for me will be either HighPoint
if I can confirm multiplier support, or future FreeBSD releases given
that multiplier support seems to be on the way in for various
controllers.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From peter.schuller at infidyne.com Wed Aug 6 17:27:12 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Wed Aug 6 17:27:19 2008
Subject: ZFS Advice
In-Reply-To: <18585.3903.895425.122613@almost.alerce.com>
References: <18585.3903.895425.122613@almost.alerce.com>
Message-ID: <200808061928.37001.peter.schuller@infidyne.com>

> The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. C0 SATA host
> controller", which seems to be AKA 88SX6081, which is listed as
> supported by the ata driver in 7.0-RELEASE. Has anyone had any ZFS
> experience with it?

Yes; it has been working quite fine for me with 7 up to a
release-candidate. In 7.0-RELEASE you must disable the hptrr driver
because it eats the device. See:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120615
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120842

When I say "working fine", this is with 8 SATA drives (6 of them in a
raidz2, two of them for other stuff) and not having any issues like
corruption, timeouts or whatever else people have had with shaky
controllers.

However, I cannot speak to performance because it's a PCI-X card that
I've plugged into a PCI slot, so throughput is limited by the PCI bus
(and the machine is otherwise not the fastest to begin with).

Note that this is on 32 bit; I haven't been able to try it on 64 bit
because the PCI-X card wouldn't work on the motherboard (again PCI, so
it's hit-and-miss) where I would otherwise have tried it.

I'd love to find a buyable PCI-E version that also worked in FreeBSD...
I'll see if the link in your post contains any such hints.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From nork at FreeBSD.org Wed Aug 6 17:36:06 2008
From: nork at FreeBSD.org (Norikatsu Shigemura)
Date: Wed Aug 6 18:06:50 2008
Subject: ZFS patches.
In-Reply-To: <200808051328.18308.jhb@freebsd.org>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<20080731013229.9d342ee5.nork@FreeBSD.org>
	<20080806004557.6e538e5c.nork@FreeBSD.org>
	<200808051328.18308.jhb@freebsd.org>
Message-ID: <20080807023558.39ca4ffa.nork@FreeBSD.org>

On Tue, 5 Aug 2008 13:28:17 -0400
John Baldwin wrote:
> On Tuesday 05 August 2008 11:45:57 am Norikatsu Shigemura wrote:
> > On Thu, 31 Jul 2008 01:32:29 +0900
> > Norikatsu Shigemura wrote:
> > > > However, this feature is a bit undocumented yet, and it didn't work
> > > > correctly for me. But you can always test it out.
> > > I'm using zfsboot on my note PC, and not using UFS. I know many
> > > problems about it:-).
> > > 1. zpool configuration is too limited, only single and mirror
> > >    usable.
> > >    If you want to zfsboot, you can't use RAIDZ, striping
> > >    and cache(zpool add ... cache ...):-(.
> > I missed. zfsboot is disregarded zpool cache rather than supports it.

Grrr my broken English! "rather than doesn't support it."...

> > > SEE ALSO:
> > > http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004895.html
> > > http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/125878
> > I found some zfsboot issues, please apply following patches:
> > 1. zfsboot2 (boot2) doesn't %d (printf), so change %d to %u.

Grrr my broken English! "doesn't support %d(printf)"...

> > +#define SPA_VERSION_6 6ULL
> FYI, style(9) prefers '#define<tab>' to '#define<space>'. Keeping with
> the existing style would likely shorten the diffs.

Yes, thanks for pointing that out.

From olli at lurza.secnetix.de Wed Aug 6 18:07:36 2008
From: olli at lurza.secnetix.de (Oliver Fromme)
Date: Wed Aug 6 18:07:43 2008
Subject: ZFS hang issue and prefetch_disable - UPDATE
In-Reply-To: <20080806112944.6793fc11@twoflower.in.publishing.hu>
Message-ID: <200808061807.m76I7BUj004737@lurza.secnetix.de>

CZUCZY Gergely wrote:
> The thing, that it works on pointyhat, and it works on kris@'s box, is just an
> IWFM-level, not the proof of any stability, reliability.

You cannot "prove" stability or reliability. If you think you can,
please tell me how.

> No matter how much someone tunes ZFS, no matter what you disable, it's not
> garanteed, [...]

You want a guarantee? There is none. Not for ZFS, not for UFS, not for
any other file system on any operating system, be it commercial or
open-source.

I agree that ZFS still has problems, especially related to kernel
memory. AFAIK this is being worked on, and Pawel's latest patches seem
to improve things a lot.

Note that the ZFS code is still considered experimental (you've seen
the fat warning, I assume), so it's reasonable to expect that it
doesn't provide production-quality yet.
It would be totally unreasonable to expect a port of ZFS to appear in
FreeBSD and be bug-free and without problems from day one.

Also note that, in earlier days, certain UFS features such as
soft-updates and dirhash were also known to need a lot of memory (well,
"a lot" by the standards of those days), and it's still wise to disable
them on small embedded boxes with limited RAM. The memory requirements
of ZFS aren't that much different, although on a larger scale.

Alan Cox has worked on the problem and lifted the existing kmem limit
for amd64 in FreeBSD -current. (I'm not sure if he will MFC that to
7-stable.) If you run with that code *and* Pawel's latest ZFS patches,
you should be a lot less likely to see the dreaded kmem panics. And
that's without any tuning.

Of course it might still make sense to tune ZFS for your workload in
order to get better performance (e.g. disable prefetch etc.). Just the
same as UFS.

> Many of us has reported this, bot noone looked into it.

That's completely untrue. Some people put a lot of time and effort into
FreeBSD's ZFS port. Please don't insult them.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich registry court, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Munich registry
court, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"C++ is over-complicated nonsense. And Bjorn Shoestrap's book
a danger to public health. I tried reading it once, I was in
recovery for months."
-- Cliff Sarginson

From pjd at FreeBSD.org Wed Aug 6 18:59:16 2008
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Wed Aug 6 18:59:22 2008
Subject: Asynchronous writing to zvols (ZFS)
In-Reply-To: <200807272026.54907.peter.schuller@infidyne.com>
References: <200807262005.54235.peter.schuller@infidyne.com>
	<20080726205118.GB1345@garage.freebsd.pl>
	<200807272026.54907.peter.schuller@infidyne.com>
Message-ID: <20080806185909.GC2580@garage.freebsd.pl>

On Sun, Jul 27, 2008 at 08:26:46PM +0200, Peter Schuller wrote:
> Hello,
>
> > The problem is that we don't distinguish between async and sync I/O
> > requests on GEOM level, that's why I decided to commit a ZIL log after
> > each write, which wasn't very smart it seems. This is handled
> > differently in the version I have in perforce. Could you try the below
> > patch and see how it performs now?
> >
> > http://people.freebsd.org/~pjd/patches/zvol.c.patch
>
> The above (though the file has moved, for anyone else reading wanting to
> apply) does eliminate the synchronicity problem. I am now seeing 5-15
> MB/second write speeds to the zvol, with 100% constituent disk utilization.
>
> I am not sure why I don't see faster writes; I get more like 40-60 when
> writing to a file in a ZFS file system on the same pool. But regardless, the
> synchronicity issue is gone.

Not sure why that is; I have spent no time on optimizing ZVOL yet,
sorry.

> Does your comment above regarding distinguishing between sync and async apply
> to the section of code affected by the above patch, or did you mean there is
> some other place above the zvol handling where there is lack of distinction?
>
> That is, is the end-effect of the above change that we *never* do synchronous
> writes (because the fact that a write is supposed to be synchronous is
> somehow lost before it reaches that point)?
>
> I understand a zil_commit is only required on BIO_FLUSH requests, which is
> what the patch fixes.
> But I get the impression from your phrasing above that
> the reason that a zil_commit was done on every I/O from the get go was in an
> effort to honor actual synchronous writes by conservatively *always* doing
> synchronous writes, because the synchronicity of synchronous writes would not
> be propagated down to the zvol class. I wouldn't want to sacrifice
> correctness just to get the speed ;)

With the patch above we synchronize in-memory transactions every 5
seconds, or when the queue is full, or when we receive BIO_FLUSH. Of
course the previous behaviour was more conservative, but sending writes
down doesn't mean they will reach the disk platters. There is still the
disk cache in the way. If we really want to be sure that data is safe on
the disk, we should send BIO_FLUSH. In other words, if you use UFS on a
raw disk, sync writes can still be delayed by the disk's cache. When you
use UFS on top of ZVOL, writes can be delayed by the ZFS cache.

I think the way to go is to pass the sync/async property of an I/O
request down the GEOM stack.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

From morganw at chemikals.org Wed Aug 6 19:15:07 2008
From: morganw at chemikals.org (Wes Morgan)
Date: Wed Aug 6 19:15:14 2008
Subject: ZFS Advice
In-Reply-To: <200808061928.37001.peter.schuller@infidyne.com>
References: <18585.3903.895425.122613@almost.alerce.com>
	<200808061928.37001.peter.schuller@infidyne.com>
Message-ID:

On Wed, 6 Aug 2008, Peter Schuller wrote:

>> The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. C0 SATA host
>> controller", which seems to be AKA 88SX6081, which is listed as
>> supported by the ata driver in 7.0-RELEASE.
Has anyone had any ZFS >> experience with it? > > Yes; it has been working quite fine for me with 7 up to a release-candidate. > In 7.0-RELEASE you must disable the hptrr driver because it eats the device. > See: > > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120615 > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120842 > > When I say "working fine", this is with 8 SATA drives (6 of them in a raidz2, > two of them for other stuff) and not having any issues like corruption, > timeouts or whatever else people have had with shaky controllers. > > However, I cannot speak to performance because it's a PCI-X card that I've > plugged into a PCI slow, so throughput is limited by the PCI bus (and the > machine is otherwise not the fastest to begin with). > > Note that this is on 32 bit; haven't been able to try it on 64 bit because I > the PCI-X card wouldn't work on the motherboard (again PCI, so it's > hit-and-miss) where I would otherwise have tried it. > > I'd love to find a buyable PCI-E version that also worked in FreeBSD... I'll > see if the link in your post contains any such hints. Hmmm... That PCI-X card is interesting. Supermicro also lists this: http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for $155. That would be cheaper than buying a new board with a PCI-X slot or two, and would even handle SAS drives. Claims to be based on the "LSISAS 1068E SAS controller". Any idea if that is supported? I don't see it listed in the mfi man page. LSI has a Linux driver for download. That card looks like it would be just what I need. 
From rmacklem at uoguelph.ca Wed Aug 6 20:36:54 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Aug 6 20:37:05 2008 Subject: doing vfs_hash_get when vnode locked Message-ID: On Tue, Aug 05, 2008 at 04:58:30PM -0400, Rick Macklem wrote: [stuff snipped] >> The way I just coded it is: >> - the function that does the vfs_hash_get() without LK_EXCLUSIVE just >> fails if MNTK_UNMOUNTF is set. >> - my nfs_close just returns when MNTK_UNMOUNTF is set. >> - my nfs_unmount() doesn't set FORCECLOSE on the vflush() but instead >> sleeps and retries a bunch of times if vflush() fails for MNT_FORCE. >> - my nfs_unmount() and other code (mostly based on the vanilla FreeBSD >> client makes requests all fail with EINTR when MNTK_UNMOUNTF is set). > You still has the race where the MNTK_UNMOUNTF is set after you check > returned false, isn't it ? I don't think it will be an issue, but I haven't tested the forced unmount without FORCECLOSE yet. > BTW, is your fs marked as mpsafe ? Yep. Except for the low level code doing the RPCs, the client side is basically a clone of the regular nfsclient. > Yes, ATM it should be safe, since only vflush() does reclamation for the > vnodes with usecount > 0. On the other hand, I believe our VFS never > makes a guarantee that this is the only location of the call. Well, at this point, vflush() only seems to be called inside the file systems for a mount point of that file system type. If someone adds a vflush() with FORCECLOSE outside of my code that acts on a mountpoint for my nfs client, it could break horribly. That's life in this game;-) I'm thinking about how to avoid this, but it ain't gonna be trivial and it could be ugly. (The short version is that, for nfsv4, the nfs part of the vnode must keep the directory fh and component name of the file in it, so that Opens can be done. That's ugly enough, given renames and multiple hard links to a file. 
If I can't get at that nfs vnode, I'll have to keep a copy of the directory fh and component name in the Open data structure, which wouldn't be too bad until a rename occurs.) I might try and do this in the next month or so... rick From mcdouga9 at egr.msu.edu Wed Aug 6 23:05:05 2008 From: mcdouga9 at egr.msu.edu (Adam McDougall) Date: Wed Aug 6 23:05:12 2008 Subject: ZFS Advice In-Reply-To: References: <18585.3903.895425.122613@almost.alerce.com> <200808061928.37001.peter.schuller@infidyne.com> Message-ID: <489A2E1F.6000405@egr.msu.edu> Wes Morgan wrote: > On Wed, 6 Aug 2008, Peter Schuller wrote: > >>> The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. C0 SATA host >>> controller", which seems to be AKA 88SX6081, which is listed as >>> supported by the ata driver in 7.0-RELEASE. Has anyone had any ZFS >>> experience with it? >> >> Yes; it has been working quite fine for me with 7 up to a >> release-candidate. >> In 7.0-RELEASE you must disable the hptrr driver because it eats the >> device. >> See: >> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120615 >> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120842 >> >> When I say "working fine", this is with 8 SATA drives (6 of them in a >> raidz2, >> two of them for other stuff) and not having any issues like corruption, >> timeouts or whatever else people have had with shaky controllers. >> >> However, I cannot speak to performance because it's a PCI-X card that >> I've >> plugged into a PCI slow, so throughput is limited by the PCI bus (and >> the >> machine is otherwise not the fastest to begin with). >> >> Note that this is on 32 bit; haven't been able to try it on 64 bit >> because I >> the PCI-X card wouldn't work on the motherboard (again PCI, so it's >> hit-and-miss) where I would otherwise have tried it. >> >> I'd love to find a buyable PCI-E version that also worked in >> FreeBSD... I'll >> see if the link in your post contains any such hints. > > Hmmm... That PCI-X card is interesting. 
Supermicro also lists this: > > http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm > > http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html > > > Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for > $155. That would be cheaper than buying a new board with a PCI-X slot > or two, and would even handle SAS drives. Claims to be based on the > "LSISAS 1068E SAS controller". Any idea if that is supported? I don't > see it listed in the mfi man page. LSI has a Linux driver for > download. That card looks like it would be just what I need. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > mpt2@pci0:8:0:0: class=0x010000 card=0x31501000 chip=0x00581000 rev=0x04 hdr=0x00 vendor = 'LSI Logic (Was: Symbios Logic, NCR)' device = 'SAS 3000 series, 8-port with 1068E -StorPort' class = mass storage subclass = SCSI mpt2: port 0xa800-0xa8ff mem 0xfcbfc000-0xfcbfffff,0xfcbe0000-0xfcbeffff irq 17 at device 0.0 on pci8 da1 at mpt2 bus 0 target 0 lun 0 da1: Fixed Direct Access SCSI-5 device da1: 300.000MB/s transfers da1: Command Queueing Enabled da1: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) da2 at mpt2 bus 0 target 1 lun 0 da2: Fixed Direct Access SCSI-5 device da2: 300.000MB/s transfers da2: Command Queueing Enabled da2: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) da3 at mpt2 bus 0 target 2 lun 0 da3: Fixed Direct Access SCSI-5 device da3: 300.000MB/s transfers da3: Command Queueing Enabled da3: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) da4 at mpt2 bus 0 target 3 lun 0 da4: Fixed Direct Access SCSI-5 device da4: 300.000MB/s transfers da4: Command Queueing Enabled da4: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) da5 at mpt2 bus 0 target 4 lun 0 da5: Fixed Direct Access SCSI-5 device 
da5: 300.000MB/s transfers
da5: Command Queueing Enabled
da5: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da6 at mpt2 bus 0 target 5 lun 0
da6: Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing Enabled
da6: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da7 at mpt2 bus 0 target 6 lun 0
da7: Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing Enabled
da7: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da8 at mpt2 bus 0 target 7 lun 0
da8: Fixed Direct Access SCSI-5 device
da8: 300.000MB/s transfers
da8: Command Queueing Enabled
da8: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1s1d  ONLINE       0     0     0
            da2s1d  ONLINE       0     0     0
            da3s1d  ONLINE       0     0     0
            da4s1d  ONLINE       0     0     0
            da5s1d  ONLINE       0     0     0
            da6s1d  ONLINE       0     0     0
            da7s1d  ONLINE       0     0     0
            da8s1d  ONLINE       0     0     0

From vyeperman at gmail.com Thu Aug 7 04:35:14 2008
From: vyeperman at gmail.com (Vye Wilson)
Date: Thu Aug 7 04:35:26 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
Message-ID: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>

Hello,

I set up a raidz1 zpool to test ZFS with a device failure and to see how
quickly the zpool could be resilvered. The system I'm using has a
backplane that all the drives are connected to, so everything is
hot-swappable. I created the raidz1 zpool and then removed one of the
drives. zpool status showed that the pool was degraded but online. OK,
great, so let's bring the now functioning drive back online.

[root@Touzyoh /home/vye]# zpool online ztemp ad18
Bringing device ad18 online

Everything looks good... let's check the zpool status

  pool: ztemp
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Wed Aug 6 20:59:54 2008
config:

        NAME        STATE     READ WRITE CKSUM
        ztemp       DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            ad10    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad18    UNAVAIL      0     0     0  cannot open

errors: No known data errors

Doh! Still degraded. It shows 'UNAVAIL cannot open'.

I've tried rebooting but it will not open that drive at all. According
to dmesg the drive is functional, and if I destroy the pool and recreate
it the drive works fine. I wasn't able to find any similar issues on
this mailing list or on Google. Does anyone have any ideas? I've
attached my dmesg output.

Thanks.

--
--Vye
-------------- next part --------------
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-STABLE #1: Wed Jul 30 10:29:43 PDT 2008
vye@Touzyoh.wow.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 248 (2200.01-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Features=0x78bfbff
AMD Features=0xe0500800
usable memory = 3744727040 (3571 MB)
avail memory = 3621384192 (3453 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 100000, dff00000 (3) failed
acpi0: reservation of dfefd000, 400 (3) failed
acpi0: reservation of dfefe000, 400 (3) failed
acpi0: reservation of dfeff000, 1000 (3) failed
acpi0: reservation of 0, a0000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.0 (no driver attached)
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
ohci0: mem 0xfdbfc000-0xfdbfcfff irq 20 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 10 ports with 10 removable, self powered
ehci0: mem 0xfdbe0000-0xfdbe00ff irq 21 at device 2.1 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb1: EHCI version 1.0
usb1: companion controller, 4 ports each: usb0
usb1: on ehci0
usb1: USB revision 2.0
uhub1: on usb1
uhub1: 10 ports with 10 removable, self powered
pci0: at device 4.0 (no driver attached)
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x3000-0x300f at device 6.0 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
atapci1: port 0xcc00-0xcc07,0xc800-0xc803,0xc400-0xc407,0xc000-0xc003,0xbc00-0xbc0f mem 0xfdbff000-0xfdbfffff irq 20 at device 7.0 on pci0
atapci1: [ITHREAD]
ata2: on atapci1
ata2: [ITHREAD]
ata3: on atapci1
ata3: [ITHREAD]
atapci2: port 0xb800-0xb807,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfdbfe000-0xfdbfefff irq 21 at device 8.0 on pci0
atapci2: [ITHREAD]
ata4: on atapci2
ata4: [ITHREAD]
ata5: on atapci2
ata5: [ITHREAD]
pcib1: at device 9.0 on pci0
pci5: on pcib1
vgapci0: port 0x9800-0x98ff mem 0xfc000000-0xfcffffff,0xfdaff000-0xfdafffff irq 16 at device 4.0 on pci5
pcib2: at device 11.0 on pci0
pci4: on pcib2
pcib3: at device 12.0 on pci0
pci2: on pcib3
bge0: mem 0xfb8f0000-0xfb8fffff irq 17 at device 0.0 on pci2
miibus0: on bge0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:30:48:57:5a:de
bge0: [ITHREAD]
pcib4: at device 13.0 on pci0
pci3: on pcib4
bge1: mem 0xfb9f0000-0xfb9fffff irq 18 at device 0.0 on pci3
miibus1: on bge1
brgphy1: PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:30:48:57:5a:df
bge1: [ITHREAD]
pcib5: at device 14.0 on pci0
pci1: on pcib5
pcib6: on acpi0
pci128: on pcib6
pci128: at device 0.0 (no driver attached)
pci128: at device 1.0 (no driver attached)
atapci3: port 0xfc00-0xfc07,0xf800-0xf803,0xf400-0xf407,0xf000-0xf003,0xec00-0xec0f mem 0xfdeff000-0xfdefffff irq 44 at device 7.0 on pci128
atapci3: [ITHREAD]
ata6: on atapci3
ata6: [ITHREAD]
ata7: on atapci3
ata7: [ITHREAD]
atapci4: port 0xe800-0xe807,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd80f mem 0xfdefe000-0xfdefefff irq 45 at device 8.0 on pci128
atapci4: [ITHREAD]
ata8: on atapci4
ata8: [ITHREAD]
ata9: on atapci4
ata9: [ITHREAD]
pcib7: at device 13.0 on pci128
pci132: on pcib7
pcib8: at device 14.0 on pci128
pci129: on pcib8
pcib9: at device 0.0 on pci129
pci131: on pcib9
arcmsr0: mem 0xfddff000-0xfddfffff,0xfe000000-0xfe3fffff irq 40 at device 14.0 on pci131
ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.41 2006-5-24
arcmsr0: [ITHREAD]
pcib10: at device 0.2 on pci129
pci130: on pcib10
cpu0: on acpi0
powernow0: on cpu0
device_attach: powernow0 attach returned 6
cpu1: on acpi0
powernow1: on cpu1
device_attach: powernow1 attach returned 6
acpi_button0: on acpi0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
ppc0: port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
ppbus0: [ITHREAD]
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff,0xc9800-0xcafff,0xcb000-0xcbfff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad6: 152627MB at ata3-master SATA150
ad10: 953869MB at ata5-master SATA300
ad14: 286168MB at ata7-master SATA150
ad18: 286168MB at ata9-master SATA150
Waiting 5 seconds for SCSI devices to settle
(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
da0 at arcmsr0 bus 0 target 0 lun 0
da0: Fixed Direct Access
SCSI-3 device
da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da0: 286102MB (585937408 512 byte sectors: 255H 63S/T 36472C)
da1 at arcmsr0 bus 0 target 0 lun 1
da1: Fixed Direct Access SCSI-3 device
da1: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da1: 953674MB (1953124352 512 byte sectors: 255H 63S/T 121576C)
da2 at arcmsr0 bus 0 target 0 lun 2
da2: Fixed Direct Access SCSI-3 device
da2: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da2: 572204MB (1171874816 512 byte sectors: 255H 63S/T 72945C)
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 6
ZFS storage pool version 6

From koitsu at FreeBSD.org Thu Aug 7 04:47:59 2008
From: koitsu at FreeBSD.org (Jeremy Chadwick)
Date: Thu Aug 7 04:48:09 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>
Message-ID: <20080807044759.GA7505@eos.sc1.parodius.com>

On Wed, Aug 06, 2008 at 09:09:02PM -0700, Vye Wilson wrote:
> Hello,
>
> I setup a raidz1 zpool to test ZFS with a device failure and to see how
> quickly the zpool could be resilvered. The system I'm using has a backplane
> that all the drives are connected to, so everything is hotswappable. I
> created the raidz1 zpool and then removed one of the drives. zpool status
> showed that the pool was degraded but online. Ok great, so lets bring the
> now functioning drive back online.
>
> [root@Touzyoh /home/vye]# zpool online ztemp ad18
> Bringing device ad18 online
>
> Everything looks good... lets check the zpool status
>
>   pool: ztemp
>  state: DEGRADED
> status: One or more devices could not be opened. Sufficient replicas exist for
>         the pool to continue functioning in a degraded state.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-D3
>  scrub: resilver completed with 0 errors on Wed Aug 6 20:59:54 2008
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         ztemp       DEGRADED     0     0     0
>           raidz1    DEGRADED     0     0     0
>             ad10    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0
>             ad18    UNAVAIL      0     0     0  cannot open
>
> errors: No known data errors
>
> Doh! still degraded. It shows 'UNAVAIL cannot open' I've tried rebooting but
> it will not open that drive at all. According to dmesg the drive is
> functional, and if I destroy the pool and recreate it the drive works fine.
> I wasn't able to find any similar issues on this mailing list or in google.
> Does anyone have any ideas? I've attached my dmesg output.

What was in your dmesg when you yanked the disk? What was in your dmesg when you re-inserted the disk?

Did you try detaching it administratively using "atacontrol detach" first, then reattaching it using "atacontrol attach"?

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB |

From vyeperman at gmail.com Thu Aug 7 05:12:02 2008
From: vyeperman at gmail.com (Vye Wilson)
Date: Thu Aug 7 05:12:09 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <20080807044759.GA7505@eos.sc1.parodius.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com>
Message-ID: <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>

When I physically disconnected the disk it showed:

subdisk18: detached
ad18: detached

There was nothing in dmesg after plugging the disk back in but atacontrol showed it on channel 9:

ATA channel 9:
Master: ad18 Serial ATA v1.0
Slave: no device present

After detaching and reattaching the device with atacontrol this is what dmesg had to say:
subdisk18: detached
ad18: detached
ata9: [ITHREAD]
ad18: 286168MB at ata9-master SATA150

zpool is still saying it cannot read from ad18 after detaching and reattaching with atacontrol. However, I remade the zpool and instead of physically removing the drive I just used the atacontrol detatch/attach and it was able to resilver without any issues. That helps but what if I do have a drive failure? I shouldn't need to halt the system to switch out the drive.

Thanks.

--
--Vye

From vyeperman at gmail.com Thu Aug 7 05:35:55 2008
From: vyeperman at gmail.com (Vye Wilson)
Date: Thu Aug 7 05:36:02 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>
Message-ID: <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>

I tested it again and it seems I was mistaken in my last email. If the drive is removed manually it will _not_ show up in atacontrol.

atacontrol attach ata3
atacontrol: ioctl(IOCATAATTACH): File exists
[root@Touzyoh /home/vye]# atacontrol list
ATA channel 3:
Master: no device present
Slave: no device present
[root@Touzyoh /home/vye]# atacontrol detach ata6
[root@Touzyoh /home/vye]# atacontrol attach ata6
Master: no device present
Slave: no device present

I'm not sure what happened the first few times but if I reboot the server the device will show up in atacontrol and the zpool will resilver. This doesn't seem to be a ZFS problem like I originally thought.

Thanks.

--
--Vye

From koitsu at FreeBSD.org Thu Aug 7 05:58:41 2008
From: koitsu at FreeBSD.org (Jeremy Chadwick)
Date: Thu Aug 7 05:58:48 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>
Message-ID: <20080807055841.GB9735@eos.sc1.parodius.com>

On Wed, Aug 06, 2008 at 10:35:54PM -0700, Vye Wilson wrote:
> I tested it again and it seems I was mistaken in my last email. If the drive
> is removed manually it will _not_ show up in atacontrol.
>
> atacontrol attach ata3
> atacontrol: ioctl(IOCATAATTACH): File exists
> [root@Touzyoh /home/vye]# atacontrol list
> ATA channel 3:
> Master: no device present
> Slave: no device present
> [root@Touzyoh /home/vye]# atacontrol detach ata6
> [root@Touzyoh /home/vye]# atacontrol attach ata6
> Master: no device present
> Slave: no device present
>
> I'm not sure what happened the first few times but if I reboot the server
> the device will show up in atacontrol and the zpool will resilver. This
> doesn't seem to be a ZFS problem like I originally thought.

Correct, it's a FreeBSD ATA subsystem/driver problem.
I'm actually amazed your kernel didn't panic when you did the 2nd attach. The first attach returning "File exists" doesn't surprise me either. I've been following these kinds of issues with ATA for many months now, so I want you to know that you're not alone. It is NOT specific to your nVidia controller either; others see the same thing on Intel ICH, and VIA. If you attach those SATA disks to your Areca controller and perform the exact same tests (you'll need to use camcontrol, of course), I can almost guarantee you things will behave 100% correctly. My advice at this point in time, because as of today I have officially lost faith in it: avoid ata(4) at all costs. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From bu7cher at yandex.ru Thu Aug 7 06:44:45 2008 From: bu7cher at yandex.ru (Andrey V. Elsukov) Date: Thu Aug 7 06:44:52 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <20080807055841.GB9735@eos.sc1.parodius.com> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> Message-ID: <489A9739.20707@yandex.ru> Jeremy Chadwick wrote: > Correct, it's a FreeBSD ATA subsystem/driver problem. I tried 8.0-CURRENT on marvell's, nvida's and intel's controllers. Hot plug and attach/detach works on any of these controllers without any problems.. What i should to do to get similar problems? :) > My advice at this point in time, because as of today I have officially > lost faith in it: avoid ata(4) at all costs. I tried to contact you some time ago, but didn't receive any answers.. Do you still want to resolve your problems with ATA? -- WBR, Andrey V. 
Elsukov From koitsu at FreeBSD.org Thu Aug 7 07:14:35 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Thu Aug 7 07:14:41 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <489A9739.20707@yandex.ru> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> Message-ID: <20080807071434.GA15465@eos.sc1.parodius.com> On Thu, Aug 07, 2008 at 10:33:29AM +0400, Andrey V. Elsukov wrote: > Jeremy Chadwick wrote: >> Correct, it's a FreeBSD ATA subsystem/driver problem. > > I tried 8.0-CURRENT on marvell's, nvida's and intel's controllers. > Hot plug and attach/detach works on any of these controllers without > any problems.. What i should to do to get similar problems? :) I haven't tried CURRENT; I don't track HEAD. I will work on setting up another testbed environment at home and repeating my tests on HEAD. That will take me some time, however. My test method is very simple, at least in regards to disk removal. Here's the step-by-step I've used to hit the bugs in question: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html >> My advice at this point in time, because as of today I have officially >> lost faith in it: avoid ata(4) at all costs. > > I tried to contact you some time ago, but didn't receive any > answers.. Do you still want to resolve your problems with ATA? Yes, I did receive your mails, but you just wanted to know "if I was still having problems". I should have replied, but I did not. That is my fault, and for that I apologise. The issues aren't problems specific to me -- they are affecting a significant userbase, specifically folks who use servers in production environments. 
But maybe I've misunderstood what you meant by "your problems" -- my apologies if I have.

But have you looked at my Wiki page, documenting most (but not all) of the issues?

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

We still don't have an answer to the famous "DMA timeout issue", which continues to haunt many. I provided a small analysis in my Wiki, but the technical justification is over my head -- it needs review from someone who is familiar with the ATA protocol. I interpret the NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an invalid LBA. I received one mail from a user (I forget if a mailing list was CC'd or not -- I need to dig up the mail) who said that in some cases NID_NOT_FOUND is normal.

The FreeNAS folks reported that increasing the internal ATA command timeout from 5 seconds to 10 or 15 has helped (FreeNAS users), but those on FreeBSD who suffer from said timeouts and have tried the patches said they have made no difference.

That said, I have some questions:

1) Are you trying to tell me that individuals running commercial services in production environments should run CURRENT? I don't think many are willing to do this; I know I'm not, and I can probably speak for Randy Bush. ;-)

2) If the issues above were fixed in HEAD, why were none of the PRs listed in my Wiki updated to reflect that?

3) If the above issues were fixed in HEAD, can you point me to the CVS commits for them? Any time I see ATA commits happen in RELENG_7, I immediately use cvsweb to look at the changes and commit message -- that means I look at HEAD, RELENG_7, and any other branchpoint. I haven't seen anything committed for these issues.

4) If the above issues were actually fixed in HEAD, are there scheduled plans to MFC the fixes?

I appreciate you taking the time to help track these down and investigate them, but I feel like you, myself, Scott Long, and the users are the only ones who care about these issues.
The maintainer is alive and active, but hasn't said a word, and some of those PRs go untouched for 2+ years...

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

From randy at psg.com Thu Aug 7 07:28:22 2008
From: randy at psg.com (Randy Bush)
Date: Thu Aug 7 07:28:29 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <20080807071434.GA15465@eos.sc1.parodius.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com>
Message-ID: <489AA414.5080709@psg.com>

> 1) Are you trying to tell me that individuals running commercial
> services in production environments should run CURRENT? I don't
> think many are willing to do this; I know I'm not, and I can probably
> speak for Randy Bush. ;-)

depends on what you say. :)

i am personally (not day job) running current with zfs on one production server and contemplating more[0]. i am running current ufs on a number of other servers. i figure someone has to put the stuff under load. and it does not bite me too often. of course, when it does, i kick myself. :)

> I feel like you, myself, Scott Long, and the users are the only ones
> who care about these issues.

i imagine some others may be like i. as i have no time to code, i try not to demand a lot unless something is really likely to bite someone. i just report bugs and don't expect a refund.

randy

---

[0] need to build a 20tb raidz2 system, kind of an nfs store to serve some data collector(s) and compute engine(s).
trying to sort out disk controller and 10ge card decisions and the hardware / freebsd support space is complex.

From tamaru at myn.rcast.u-tokyo.ac.jp Thu Aug 7 08:51:06 2008
From: tamaru at myn.rcast.u-tokyo.ac.jp (Hiroharu Tamaru)
Date: Thu Aug 7 08:51:15 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <489A9739.20707@yandex.ru>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru>
Message-ID:

Hi,

At Thu, 07 Aug 2008 10:33:29 +0400, Andrey V. Elsukov wrote:
> Jeremy Chadwick wrote:
> > Correct, it's a FreeBSD ATA subsystem/driver problem.
>
> I tried 8.0-CURRENT on marvell's, nvida's and intel's controllers.
> Hot plug and attach/detach works on any of these controllers without
> any problems.. What i should to do to get similar problems? :)

Since the topic is here:

Can you boot up the machine with that ata HDD offline (unplugged), and then insert it and attach it?

On my 6.2-STABLE machine, I can detach-replace-attach ata disks (most of the time), but only if I have that slot active at boot time.

And yes, though I haven't come up with an absolute sequence to reproduce it, I occasionally have the same problem that hot replaced disks are not recognized by 'atacontrol attach', be it the same physical disk or another. I feel there is a 'right sequence' of atacontrol commands (or the lack of), and some 'wrong sequences'. The only way to reattach it then is to reboot the machine.
Thanks -- Hiroharu Tamaru From fbsd-fs at mawer.org Thu Aug 7 11:30:10 2008 From: fbsd-fs at mawer.org (Antony Mawer) Date: Thu Aug 7 11:30:16 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <20080807071434.GA15465@eos.sc1.parodius.com> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> Message-ID: <489ADD89.8070809@mawer.org> On 7/08/2008 5:14 PM, Jeremy Chadwick wrote: >>> My advice at this point in time, because as of today I have officially >>> lost faith in it: avoid ata(4) at all costs. >> I tried to contact you some time ago, but didn't receive any >> answers.. Do you still want to resolve your problems with ATA? > > Yes, I did receive your mails, but you just wanted to know "if I was > still having problems". I should have replied, but I did not. That is > my fault, and for that I apologise. > > The issues aren't problems specific to me -- they are affecting a > significant userbase, specifically folks who use servers in production > environments. But maybe I've misunderstood what you meant by "your > problems" -- my apologies if I have. ... > We still don't have an answer to the famous "DMA timeout issue", which > continues to haunt many. I provided a small analysis in my Wiki, but > the technical justification is over my head -- it needs review from > someone who is familiar with the ATA protocol. I inteprete the > NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an > invalid LBA. I received one mail from a user (I forget if a mailing > list was CC'd or not -- I need to dig up the mail) who said that in some > cases NID_NOT_FOUND is normal. 
Do you know if most people found these are something that go away, at least temporarily, when the server is rebooted? Or do they persist across reboots? I'm going to do some analysis and find out whether I can find any of our systems that may be experiencing ATA errors that don't correlate with what their SMART data is saying. To date I haven't caught any, but that's not to say they may not be happening... just that all of the ones I have caught to date do appear to have been hardware-related issues... It seems a shame that, at one point, bits of FreeBSD ATA code wound up in Linux (the controversy when parts were lifted but Soren's copyright removed)... now FreeBSD's ata subsystem is left languishing with no immediate solution in sight :-( Is there any means or interest for the FreeBSD Foundation to find someone experienced enough with ATA to spend some time reviewing/testing it? Or does that sort of thing just "not happen"? --Antony From koitsu at FreeBSD.org Thu Aug 7 12:12:45 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Thu Aug 7 12:12:52 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <489ADD89.8070809@mawer.org> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org> Message-ID: <20080807121245.GA26629@eos.sc1.parodius.com> On Thu, Aug 07, 2008 at 09:33:29PM +1000, Antony Mawer wrote: > On 7/08/2008 5:14 PM, Jeremy Chadwick wrote: >>>> My advice at this point in time, because as of today I have officially >>>> lost faith in it: avoid ata(4) at all costs. >>> I tried to contact you some time ago, but didn't receive any >>> answers.. Do you still want to resolve your problems with ATA? 
>> >> Yes, I did receive your mails, but you just wanted to know "if I was >> still having problems". I should have replied, but I did not. That is >> my fault, and for that I apologise. >> >> The issues aren't problems specific to me -- they are affecting a >> significant userbase, specifically folks who use servers in production >> environments. But maybe I've misunderstood what you meant by "your >> problems" -- my apologies if I have. > ... >> We still don't have an answer to the famous "DMA timeout issue", which >> continues to haunt many. I provided a small analysis in my Wiki, but >> the technical justification is over my head -- it needs review from >> someone who is familiar with the ATA protocol. I inteprete the >> NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an >> invalid LBA. I received one mail from a user (I forget if a mailing >> list was CC'd or not -- I need to dig up the mail) who said that in some >> cases NID_NOT_FOUND is normal. > > Do you know if most people found these are something that go away, at > least temporarily, when the server is rebooted? Or do they persist > across reboots? For some people, the problems are permanent. Those people have been referred to talk to Scott Long, who offered to help look into the issue, specifically those who can reproduce the errors. For others, the problems eventually disappear, or possibly disappear after "messing around" with their systems. Some people swapped SATA cables, others replaced their entire motherboard; "things work fine" was the result. The problem is that these were considered solutions, and I'm not so sure the problem really was with their hardware, cables, or anything else. For example, the FreeNAS folks found that for some users, increasing the ATA command timeout in the code from an arbitrary 5 seconds to something larger (10-15 seconds) fixed the problem. 
I don't know why an ATA transaction would take 10-15 seconds, but I suppose it's possible if the disk is doing something internal.

I have personal experience with an example of drives doing such. Some older (circa late 90s/early 2000) IBM ATA disks have a feature called ADM, where almost like clockwork, the drive would spin down and start doing some sort of internal maintenance. Upon receiving an ATA command, it would abort ADM, spin up, then complete the request. The result in FreeBSD (4.x) was a slew of ATA timeout/DMA errors. I eventually found this feature mentioned in the disk specification PDF, and contacted IBM about it. IBM confirmed the feature, confirmed it was responsible, and stated there was no way to disable it on ATA disks. (On SCSI, the feature defaulted to off, but could be toggled on via a custom vendor-specific SCSI command). I still have the mail from IBM if anyone wants to see it. A few months later, when IBM released their next generation version of their ATA disks, I found the ADM feature completely gone (from the disk and the disk specification PDF).

In almost every case I've looked at so far, the individuals' chipsets, disks, and overall setup are different. SMART statistics on the drives show absolutely no sign of errors, or anything that indicates a hardware failure. Many of the users are using AHCI as well (myself included, and I have seen the DMA error issue myself), which is more reliable than classic IDE.

Even in the case of temporary failures ("I saw those DMA errors once, but they haven't returned"), it would be beneficial if users could submit that data to me, so I can put more example cases in the Wiki.

The NID_NOT_FOUND error bothers me, because the ATA specification implies the OS is asking for an invalid LBA, which would likely be caused by a bug in FreeBSD. That's why that error condition needs to be analysed more.
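
For readers following the "invalid LBA" reading of NID_NOT_FOUND, the check involved can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lba_request_in_range` is made up here, nothing in it is taken from ata(4) or from drive firmware, and the sector count is simply the da0 figure from the dmesg earlier in the thread, borrowed as a realistic number.

```python
def lba_request_in_range(start_lba: int, sector_count: int, total_sectors: int) -> bool:
    """Return True if the request [start_lba, start_lba + sector_count)
    lies entirely inside a device that has total_sectors addressable sectors.

    Hypothetical helper for illustration only; not part of any FreeBSD API.
    """
    if start_lba < 0 or sector_count <= 0:
        return False
    return start_lba + sector_count <= total_sectors


# Assumed device size: 585937408 sectors of 512 bytes (the da0 line in the dmesg).
TOTAL_SECTORS = 585937408

# A request for the last 8 sectors fits on the device.
print(lba_request_in_range(TOTAL_SECTORS - 8, 8, TOTAL_SECTORS))

# The same request shifted by one sector runs past the end of the device.
print(lba_request_in_range(TOTAL_SECTORS - 7, 8, TOTAL_SECTORS))
```

If a request like the second one ever reached a disk, an IDNF-style error would be the expected answer, which is why a bug on the OS side is a plausible suspect. The sketch does not show that this is what actually happens in FreeBSD.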
I will point people to the libata FAQ on errors, too -- look at the definition of error type IDNF (this is what FreeBSD calls NID_NOT_FOUND): http://ata.wiki.kernel.org/index.php/Libata_error_messages For SATA, FreeBSD does not appear to support printing SATA SError codes, so for those of us with SATA disks, we're actually missing some verbosity on what the errors could be caused by. It would be benefitial if there was some form of sysctl to increase the verbosity from the ATA subsystem when an error happens. The existing data we get back is terse, and barely useful. I know for a fact there's more debug information that could be output in such scenarios. And please do not reply with "good idea, send patches" unless you're wanting to be chewed out. :-) > I'm going to do some analysis and find out whether I can find any of our > systems that may be experiencing ATA errors that don't correlate with > what their SMART data is saying. To date I haven't caught any, but > that's not to say they may not be happening... just that all of the ones > I have caught to date do appear to have been hardware-related issues... > > It seems a shame that, at one point, bits of FreeBSD ATA code wound up > in Linux (the controversy when parts were lifted but Soren's copyright > removed)... now FreeBSD's ata subsystem is left languishing with no > immediate solution in sight :-( I had forgotten all about that piece of history -- thanks for reminding me. The only solutions/options that are on the horizon: * Recommending people buy SATA controllers that utilise CAM and da(4), thus avoiding ata(4) entirely. Areca makes such controllers, but it's impractical to ask users to buy one, since they're expensive; end-users should be able to use their onboard SATA controllers like the Intel ICH series and nVidia nForce and MCP with reliability. 
* Implement a form of ATA-to-SCSI translation, similar to what Linux libata does; this would make ATA/SATA disks utilise CAM and da(4) through a translation layer. Scott Long has told me that he is actually in the process of writing such, but I know Scott is also *incredibly* busy, so that project may take a very long time.

* If you use ATA (read: PATA) disks, disabling DMA and forcing PIO does apparently work. The performance hit is quite substantial, however, and this isn't practical for servers. Disabling DMA is not possible with SATA.

* Contact Jeff Garzik (Linux libata maintainer) and ask for help. This might upset some people, especially considering the history item you brought up earlier. By "help" I don't mean "Hey man, write the code", I mean "can you help expand on what this error means, and have you folks seen it on Linux?"

> Is there any means or interest for the FreeBSD Foundation to find someone experienced enough with ATA to spend some time reviewing/testing it? Or does that sort of thing just "not happen"?

That's a very good question. I don't know how the politics of the FreeBSD Foundation work. But I would be more than happy to donate money (read: a couple thousand US dollars out of my pocket) to the Foundation assuming the proceeds went to getting someone very familiar with ATA to look at the problem, look at the code, address existing PRs, and fix things.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From morganw at chemikals.org Thu Aug 7 15:46:22 2008 From: morganw at chemikals.org (Wes Morgan) Date: Thu Aug 7 15:46:33 2008 Subject: ZFS Advice In-Reply-To: <489A2E1F.6000405@egr.msu.edu> References: <18585.3903.895425.122613@almost.alerce.com> <200808061928.37001.peter.schuller@infidyne.com> <489A2E1F.6000405@egr.msu.edu> Message-ID: On Wed, 6 Aug 2008, Adam McDougall wrote: > Wes Morgan wrote: >> On Wed, 6 Aug 2008, Peter Schuller wrote: >> >>>> The AoC-SAT2-MV8 is based on the "Marvell Hercules-2 Rev. C0 SATA host >>>> controller", which seems to be AKA 88SX6081, which is listed as >>>> supported by the ata driver in 7.0-RELEASE. Has anyone had any ZFS >>>> experience with it? >>> >>> Yes; it has been working quite fine for me with 7 up to a >>> release-candidate. >>> In 7.0-RELEASE you must disable the hptrr driver because it eats the >>> device. >>> See: >>> >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120615 >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/120842 >>> >>> When I say "working fine", this is with 8 SATA drives (6 of them in a >>> raidz2, >>> two of them for other stuff) and not having any issues like corruption, >>> timeouts or whatever else people have had with shaky controllers. >>> >>> However, I cannot speak to performance because it's a PCI-X card that I've >>> plugged into a PCI slow, so throughput is limited by the PCI bus (and the >>> machine is otherwise not the fastest to begin with). >>> >>> Note that this is on 32 bit; haven't been able to try it on 64 bit because >>> I >>> the PCI-X card wouldn't work on the motherboard (again PCI, so it's >>> hit-and-miss) where I would otherwise have tried it. >>> >>> I'd love to find a buyable PCI-E version that also worked in FreeBSD... >>> I'll >>> see if the link in your post contains any such hints. >> >> Hmmm... That PCI-X card is interesting. 
Supermicro also lists this: >> >> http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm >> >> http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html >> >> Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for $155. >> That would be cheaper than buying a new board with a PCI-X slot or two, and >> would even handle SAS drives. Claims to be based on the "LSISAS 1068E SAS >> controller". Any idea if that is supported? I don't see it listed in the >> mfi man page. LSI has a Linux driver for download. That card looks like it >> would be just what I need. >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > mpt2@pci0:8:0:0: class=0x010000 card=0x31501000 chip=0x00581000 > rev=0x04 hdr=0x00 > vendor = 'LSI Logic (Was: Symbios Logic, NCR)' > device = 'SAS 3000 series, 8-port with 1068E -StorPort' > class = mass storage > subclass = SCSI > mpt2: port 0xa800-0xa8ff mem > 0xfcbfc000-0xfcbfffff,0xfcbe0000-0xfcbeffff irq 17 at device 0.0 on pci8 > > da1 at mpt2 bus 0 target 0 lun 0 > da1: Fixed Direct Access SCSI-5 device > da1: 300.000MB/s transfers > da1: Command Queueing Enabled > da1: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da2 at mpt2 bus 0 target 1 lun 0 > da2: Fixed Direct Access SCSI-5 device > da2: 300.000MB/s transfers > da2: Command Queueing Enabled > da2: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da3 at mpt2 bus 0 target 2 lun 0 > da3: Fixed Direct Access SCSI-5 device > da3: 300.000MB/s transfers > da3: Command Queueing Enabled > da3: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da4 at mpt2 bus 0 target 3 lun 0 > da4: Fixed Direct Access SCSI-5 device > da4: 300.000MB/s transfers > da4: Command Queueing Enabled > da4: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > 
da5 at mpt2 bus 0 target 4 lun 0 > da5: Fixed Direct Access SCSI-5 device > da5: 300.000MB/s transfers > da5: Command Queueing Enabled > da5: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da6 at mpt2 bus 0 target 5 lun 0 > da6: Fixed Direct Access SCSI-5 device > da6: 300.000MB/s transfers > da6: Command Queueing Enabled > da6: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da7 at mpt2 bus 0 target 6 lun 0 > da7: Fixed Direct Access SCSI-5 device > da7: 300.000MB/s transfers > da7: Command Queueing Enabled > da7: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) > da8 at mpt2 bus 0 target 7 lun 0 > da8: Fixed Direct Access SCSI-5 device > da8: 300.000MB/s transfers > da8: Command Queueing Enabled > da8: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) Excellent! How is your experience with the performance and reliability of the card? From boris.kotzev at gmail.com Thu Aug 7 16:50:07 2008 From: boris.kotzev at gmail.com (Boris Kotzev) Date: Thu Aug 7 16:50:13 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges Message-ID: <200808071925.45786.boris.kotzev@gmail.com> Hello, I used the zfs port to Mac OS X (http://zfs.macosforge.org) to create a storage pool under Mac OS X. The pool can be imported successfully under FreeBSD: root:~-114# zpool import macpool root:~-115# zpool list macpool NAME SIZE USED AVAIL CAP HEALTH ALTROOT macpool 6,94G 510K 6,94G 0% ONLINE - root:~-116# zfs list macpool NAME USED AVAIL REFER MOUNTPOINT macpool 474K 6,83G 308K /macpool and is fully accessible to the root user: root:~-118# id uid=0(root) gid=0(wheel) groups=0(wheel),5(operator) root:~-119# ls -ld /macpool drwxr-xr-x 7 root wheel 8 7 ??? 16:59 /macpool root:~-120# ls -l /macpool total 43 drwx------ 3 root wheel 3 7 ??? 16:31 .Spotlight-V100 -rw-r--r-- 1 root wheel 35014 7 ??? 16:31 .VolumeIcon.icns drwx------ 2 root wheel 4 7 ??? 16:32 .fseventsd drwxr-xr-x 2 root wheel 2 7 ??? 
16:59 backup drwxr-xr-x 2 root wheel 2 7 ??? 16:59 downloads drwxr-xr-x 2 root wheel 2 7 ??? 16:58 music According to the file permissions on /macpool (drwxr-xr-x), anyone should have read access to it. This is not the case though: root:~-121# su user % id uid=1003(user) gid=1003(user) groups=1003(user),0(wheel),5(operator) % ls -l /macpool ls: /macpool: Permission denied % cd /macpool /macpool: Permission denied. Is this a bug, or is there some way to get access to /macpool as an ordinary user? The pool was created under version zfs-119 of the Mac OS X port; the FreeBSD version is: root:~-122# uname -a FreeBSD xxxx 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Aug 2 14:19:33 EEST 2008 root@xxxx:/usr/obj/usr/src/sys/MACBOOK amd64 with the latest zfs patch, but the problem was also present before applying the patch. Thanks, Boris Kotzev From koitsu at FreeBSD.org Thu Aug 7 16:55:03 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Thu Aug 7 16:55:10 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <200808071925.45786.boris.kotzev@gmail.com> References: <200808071925.45786.boris.kotzev@gmail.com> Message-ID: <20080807165502.GA39420@eos.sc1.parodius.com> On Thu, Aug 07, 2008 at 07:25:45PM +0300, Boris Kotzev wrote: > Hello, > > I used the zfs port to Mac OS X (http://zfs.macosforge.org) to create > a storage pool under Mac OS X. The pool can be imported successfully > under FreeBSD: > > root:~-114# zpool import macpool > root:~-115# zpool list macpool > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > macpool 6,94G 510K 6,94G 0% ONLINE - > root:~-116# zfs list macpool > NAME USED AVAIL REFER MOUNTPOINT > macpool 474K 6,83G 308K /macpool > > and is fully accessible to the root user: > > root:~-118# id > uid=0(root) gid=0(wheel) groups=0(wheel),5(operator) > root:~-119# ls -ld /macpool > drwxr-xr-x 7 root wheel 8 7 ??? 16:59 /macpool > root:~-120# ls -l /macpool > total 43 > drwx------ 3 root wheel 3 7 ??? 
16:31 .Spotlight-V100 > -rw-r--r-- 1 root wheel 35014 7 ??? 16:31 .VolumeIcon.icns > drwx------ 2 root wheel 4 7 ??? 16:32 .fseventsd > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 backup > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 downloads > drwxr-xr-x 2 root wheel 2 7 ??? 16:58 music > > According to the file permissions on /macpool (drwxr-xr-x), anyone > should have read access to it. This is not the case though: > > root:~-121# su user > % id > uid=1003(user) gid=1003(user) groups=1003(user),0(wheel),5(operator) > % ls -l /macpool > ls: /macpool: Permission denied > % cd /macpool > /macpool: Permission denied. > > Is this a bug, or is there some way to get access to /macpool as an > ordinary user? > > The pool was created under version zfs-119 of the Mac OS X port; the > FreeBSD version is: > > root:~-122# uname -a > FreeBSD xxxx 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Aug 2 14:19:33 > EEST 2008 root@xxxx:/usr/obj/usr/src/sys/MACBOOK amd64 > > with the latest zfs patch, but the problem was also present before > applying the patch. As root, what does "zfs get all macpool" return on FreeBSD? -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From boris.kotzev at gmail.com Thu Aug 7 17:41:22 2008 From: boris.kotzev at gmail.com (Boris Kotzev) Date: Thu Aug 7 17:41:29 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <20080807165502.GA39420@eos.sc1.parodius.com> References: <200808071925.45786.boris.kotzev@gmail.com> <20080807165502.GA39420@eos.sc1.parodius.com> Message-ID: <200808072040.55571.boris.kotzev@gmail.com> ?? Thursday 07 August 2008 19:55:02 Jeremy Chadwick ??????: > On Thu, Aug 07, 2008 at 07:25:45PM +0300, Boris Kotzev wrote: > > Hello, > > > > I used the zfs port to Mac OS X (http://zfs.macosforge.org) to > > create a storage pool under Mac OS X. 
The pool can be imported > > successfully under FreeBSD: > > > > root:~-114# zpool import macpool > > root:~-115# zpool list macpool > > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > > macpool 6,94G 510K 6,94G 0% ONLINE - > > root:~-116# zfs list macpool > > NAME USED AVAIL REFER MOUNTPOINT > > macpool 474K 6,83G 308K /macpool > > > > and is fully accessible to the root user: > > > > root:~-118# id > > uid=0(root) gid=0(wheel) groups=0(wheel),5(operator) > > root:~-119# ls -ld /macpool > > drwxr-xr-x 7 root wheel 8 7 ??? 16:59 /macpool > > root:~-120# ls -l /macpool > > total 43 > > drwx------ 3 root wheel 3 7 ??? 16:31 .Spotlight-V100 > > -rw-r--r-- 1 root wheel 35014 7 ??? 16:31 .VolumeIcon.icns > > drwx------ 2 root wheel 4 7 ??? 16:32 .fseventsd > > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 backup > > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 downloads > > drwxr-xr-x 2 root wheel 2 7 ??? 16:58 music > > > > According to the file permissions on /macpool (drwxr-xr-x), > > anyone should have read access to it. This is not the case > > though: > > > > root:~-121# su user > > % id > > uid=1003(user) gid=1003(user) > > groups=1003(user),0(wheel),5(operator) % ls -l /macpool > > ls: /macpool: Permission denied > > % cd /macpool > > /macpool: Permission denied. > > > > Is this a bug, or is there some way to get access to /macpool as > > an ordinary user? > > > > The pool was created under version zfs-119 of the Mac OS X port; > > the FreeBSD version is: > > > > root:~-122# uname -a > > FreeBSD xxxx 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Aug 2 > > 14:19:33 EEST 2008 root@xxxx:/usr/obj/usr/src/sys/MACBOOK amd64 > > > > with the latest zfs patch, but the problem was also present > > before applying the patch. > > As root, what does "zfs get all macpool" return on FreeBSD? root@:~-116# zfs get all macpool NAME PROPERTY VALUE SOURCE macpool type filesystem - macpool creation ?? ??? 
7 16:31 2008 - macpool used 474K - macpool available 6,83G - macpool referenced 308K - macpool compressratio 1.00x - macpool mounted yes - macpool quota none default macpool reservation none default macpool recordsize 128K default macpool mountpoint /macpool default macpool sharenfs off default macpool checksum on default macpool compression off default macpool atime on default macpool devices on default macpool exec on default macpool setuid on default macpool readonly off default macpool jailed off default macpool snapdir hidden default macpool aclmode groupmask default macpool aclinherit restricted default macpool canmount on default macpool shareiscsi off default macpool xattr off temporary macpool copies 1 default macpool version 1 - macpool utf8only off - macpool normalization none - macpool casesensitivity sensitive - macpool vscan off default macpool nbmand off default macpool sharesmb off default macpool refquota none default macpool refreservation none default Thanks! Boris Kotzev From bp at barryp.org Thu Aug 7 17:59:30 2008 From: bp at barryp.org (Barry Pederson) Date: Thu Aug 7 17:59:37 2008 Subject: ZFS Advice In-Reply-To: References: <18585.3903.895425.122613@almost.alerce.com> <200808061928.37001.peter.schuller@infidyne.com> <489A2E1F.6000405@egr.msu.edu> Message-ID: <489B2FFE.5050406@barryp.org> Wes Morgan wrote: > On Wed, 6 Aug 2008, Adam McDougall wrote: > >> Wes Morgan wrote: >>> On Wed, 6 Aug 2008, Peter Schuller wrote: >>> Hmmm... That PCI-X card is interesting. Supermicro also lists this: >>> >>> http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm >>> >>> http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html >>> >>> Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for >>> $155. I think CDW is mistaken in saying it's a PCI-E card, UIO is a proprietary Supermicro bus that some of their motherboards support. 
http://www.supermicro.com/products/nfo/UIO.cfm Too bad, that's a good price for an 8-port SAS card. I've got some LSI-branded PCI-X and PCI-E SAS cards that have been working very well with ZFS and SATA drives, but the 8-port card was more like $288. Barry From mcdouga9 at egr.msu.edu Thu Aug 7 18:47:51 2008 From: mcdouga9 at egr.msu.edu (Adam McDougall) Date: Thu Aug 7 18:48:05 2008 Subject: ZFS Advice In-Reply-To: References: <18585.3903.895425.122613@almost.alerce.com> <200808061928.37001.peter.schuller@infidyne.com> <489A2E1F.6000405@egr.msu.edu> Message-ID: <489B4356.8060702@egr.msu.edu> Wes Morgan wrote: > On Wed, 6 Aug 2008, Adam McDougall wrote: > >> Wes Morgan wrote: >>> >>> http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html >>> >>> Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for >>> $155. That would be cheaper than buying a new board with a PCI-X >>> slot or two, and would even handle SAS drives. Claims to be based on >>> the "LSISAS 1068E SAS controller". Any idea if that is supported? I >>> don't see it listed in the mfi man page. LSI has a Linux driver for >>> download. That card looks like it would be just what I need. >>> _______________________________________________ >>> >> mpt2@pci0:8:0:0: class=0x010000 card=0x31501000 >> chip=0x00581000 rev=0x04 hdr=0x00 >> vendor = 'LSI Logic (Was: Symbios Logic, NCR)' >> device = 'SAS 3000 series, 8-port with 1068E -StorPort' >> class = mass storage >> subclass = SCSI >> mpt2: port 0xa800-0xa8ff mem >> 0xfcbfc000-0xfcbfffff,0xfcbe0000-0xfcbeffff irq 17 at device 0.0 on pci8 >> >> da1 at mpt2 bus 0 target 0 lun 0 >> da1: Fixed Direct Access SCSI-5 device >> da1: 300.000MB/s transfers >> da1: Command Queueing Enabled >> da1: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C) >> > > Excellent! How is your experience with the performance and reliability > of the card? 
>

The system is in production right now with a smallish amount of load, so the quick test I did to see 85MB/sec write to a raidz1 and 60MB/sec read is probably a bit unfair. Each set of 4 disks is attached to a SAS expander which is attached to the card internally in a Sun X4150. I didn't try to tune it for performance before I started using that volume for storage in a manner where size and reliability were more important than speed. But on the whole it seems reliable. I won't have time any time soon to get it into a better state for testing, but I may test the new ZFS patches on it before that. I've been pretty happy with all other MPT family chips so I would doubt it has any crippling speed problems.

From matt at corp.spry.com Thu Aug 7 21:12:58 2008
From: matt at corp.spry.com (Matt Simerson)
Date: Thu Aug 7 21:13:11 2008
Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive
In-Reply-To: <489AA414.5080709@psg.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489AA414.5080709@psg.com>
Message-ID: <135AC59E-45A2-4254-AC2C-43D3BB60EE39@corp.spry.com>

On Aug 7, 2008, at 12:28 AM, Randy Bush wrote:

>> 1) Are you trying to tell me that individuals running commercial services in production environments should run CURRENT? I don't think many are willing to do this; I know I'm not, and I can probably speak for Randy Bush. ;-)
>
> depends on what you say. :)
>
> i am personally (not day job) running current with zfs on one production server and contemplating more[0]. i am running current ufs on a number of other servers.
>
> ---
>
> [0] need to build a 20tb raidz2 system, kind of an nfs store to serve some data collector(s) and compute engine(s). trying to sort out disk controller and 10ge card decisions and the hardware / freebsd support space is complex.

SuperMicro 24 disk chassis (CSE-846TQ-R900B)
24 1TB SATA disks
(2) Areca 1231ML controllers with BBWC

Two RAID 5 partitions with no hot-spare disks will net you 19TB of formatted storage. If you ZFS stripe across the two RAID arrays, you'll get decent performance. If your data is compressible, enabling file system compression will easily store the 20TB you need.

If you have heavy I/O load, do not use raidz2 in 7.0 lest you experience the instability and hangs that many of us are seeing. Use 8.0-HEAD with the latest patches and with a little luck, you'll have lots of storage and stability.

Matt

From randy at psg.com Thu Aug 7 21:32:02 2008
From: randy at psg.com (Randy Bush)
Date: Thu Aug 7 21:33:02 2008
Subject: medium sized zfs nfs server (was: zpool degraded - 'UNAVAIL cannot open' functioning drive)
In-Reply-To: <135AC59E-45A2-4254-AC2C-43D3BB60EE39@corp.spry.com>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489AA414.5080709@psg.com> <135AC59E-45A2-4254-AC2C-43D3BB60EE39@corp.spry.com>
Message-ID: <489B69D0.4050303@psg.com>

[ i am an internet / research / software guy, and this pee cee hardware stuff scares the outta me. so i am seriously desperate with these questions and *highly* appreciative of the responses ]

>> [0] need to build a 20tb raidz2 system, kind of an nfs store to serve some data collector(s) and compute engine(s).
>> trying to sort out disk controller and 10ge card decisions and the hardware / freebsd support space is complex.
>
> SuperMicro 24 disk chassis (CSE-846TQ-R900B)
> 24 1TB SATA disks
> (2) Areca 1231ML controllers with BBWC

thank you. and the 10gige card? thinking copper cx as it's just between two or three hosts in the rack.

> Two RAID 5 partitions with no hot-spare disks will net you 19TB of formatted storage. If you ZFS stripe across the two RAID arrays, you'll get decent performance. If your data is compressible, enabling file system compression will easily store the 20TB you need.

actually, i was thinking of what i hope is a bit more conservative (except for the last line)

  o exactly that chassis with a mobo with 8gb ram and > 2 cores
  o the goal is 20tb of spindles to get 10tb of data store [ may even wait some weeks for the 1.5tb drives, oink oink ]
  o two hot spare drives as this is in a remote rack
  o no hardware raid, have been burned by 3ware, ...
  o zfs's raidz2
  o (gasp!) -current for the zfs fixes

whack me harder with the clue bat, please

randy

From md at hudora.de Thu Aug 7 22:21:24 2008
From: md at hudora.de (Maximillian Dornseif)
Date: Thu Aug 7 22:21:31 2008
Subject: Strange (?) hangs with ZFS/rsync.
Message-ID: <18880785.post@talk.nabble.com>

I have a ZFS based backup server which rsyncs from a dozen other machines. There is only a single rsync active at any point in time and the system is used for nothing else. It has an amd64 kernel and 4GB RAM. I tried to follow the ZFS tuning guide.

For a few weeks the machine worked well. Since Monday rsync always hangs when fetching a big logfile from a remote machine. The two rsync processes are in state "zfs:&b" and "zfs:lo" and can't be killed. The other strange thing is that the system is reporting 3367M (!) wired memory.

I have uploaded dmesg output and some additional information at http://static.23.nu/md/Pictures/freebsd+zfs+issue.txt

Any suggestions on how to debug this issue?
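
The 3367M wired figure can be put in proportion with a little arithmetic. What follows is a minimal sketch, assuming the sysctl names hw.physmem, hw.pagesize, vm.stats.vm.v_wire_count and kstat.zfs.misc.arcstats.size are present on the system in question (worth confirming with sysctl -a). The helper names are invented for illustration, and the sample input is constructed to match the 3367M and 4GB figures in the message, with a made-up ARC size:

```python
MIB = 1024 * 1024

# Hypothetical sysctl output: page count chosen so wired memory is 3367 MiB,
# ARC size invented purely for the example.
SAMPLE = """\
hw.physmem: 4294967296
hw.pagesize: 4096
vm.stats.vm.v_wire_count: 861952
kstat.zfs.misc.arcstats.size: 3221225472
"""


def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl(8) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def memory_report(values):
    """Summarise wired memory and how much of it the ZFS ARC accounts for."""
    wired = values["vm.stats.vm.v_wire_count"] * values["hw.pagesize"]
    arc = values.get("kstat.zfs.misc.arcstats.size", 0)
    return {
        "wired_mib": wired // MIB,
        "wired_share": wired / values["hw.physmem"],
        "arc_share_of_wired": arc / wired if wired else 0.0,
    }


if __name__ == "__main__":
    report = memory_report(parse_sysctl(SAMPLE))
    for key, value in report.items():
        print(key, value)
```

Fed with real output from the hung machine, this would show whether the ARC accounts for most of the wired memory; if it does, the vfs.zfs.arc_max loader tunable is the usual knob to look at.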
Regards

Maximillian Dornseif
--
View this message in context: http://www.nabble.com/Strange-%28-%29-hangs-with-ZFS-rsync.-tp18880785p18880785.html
Sent from the freebsd-fs mailing list archive at Nabble.com.

From md at hudora.de Thu Aug 7 22:33:28 2008
From: md at hudora.de (Maximillian Dornseif)
Date: Thu Aug 7 22:33:35 2008
Subject: Strange (?) hangs with ZFS/rsync.
In-Reply-To: <18880785.post@talk.nabble.com>
References: <18880785.post@talk.nabble.com>
Message-ID: <18881178.post@talk.nabble.com>

Maximillian Dornseif wrote:
>
> For a few weeks the machine worked well. Since Monday rsync always hangs when fetching a big logfile from a remote machine.
>

I forgot to mention: other processes still work and the ZFS filesystem is still accessible to casual inspection by ls and cat.

The machine does not reboot but hangs during the "syncing" phase of shutdown. It has to be powercycled to restart.

--md
--
View this message in context: http://www.nabble.com/Strange-%28-%29-hangs-with-ZFS-rsync.-tp18880785p18881178.html
Sent from the freebsd-fs mailing list archive at Nabble.com.

From des at des.no Thu Aug 7 23:09:13 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Thu Aug 7 23:09:19 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
In-Reply-To: <20080806144820.GO97161@deviant.kiev.zoral.com.ua> (Kostik Belousov's message of "Wed, 6 Aug 2008 17:48:20 +0300")
References: <200808061020.m76AK5NI013323@freefall.freebsd.org> <20080806133441.GM97161@deviant.kiev.zoral.com.ua> <20080806144820.GO97161@deviant.kiev.zoral.com.ua>
Message-ID: <86zlnoe82w.fsf@ds4.des.no>

Kostik Belousov writes:
> @@ -169,7 +169,8 @@ ffs_mount(struct mount *mp, struct thread *td)
>                   * persist "snapshot" in the options list.
>                   */
>                  vfs_deleteopt(mp->mnt_optnew, "snapshot");
> -                vfs_deleteopt(mp->mnt_opt, "snapshot");
> +                if (mp->mnt_opt != NULL)
> +                        vfs_deleteopt(mp->mnt_opt, "snapshot");
>          }
>
>          MNT_ILOCK(mp);

I would suggest also adding a KASSERT to vfs_deleteopt().

DES
--
Dag-Erling Smørgrav - des@des.no

From des at des.no Thu Aug 7 23:30:04 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Thu Aug 7 23:30:09 2008
Subject: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Message-ID: <200808072330.m77NU3GP052285@freefall.freebsd.org>

The following reply was made to PR kern/126287; it has been noted by GNATS.

From: Dag-Erling Smørgrav
To: Kostik Belousov
Cc: Mateusz Guzik , freebsd-fs@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/126287: [ufs] [panic] Kernel panics while mounting an UFS filesystem with snapshot enabled
Date: Fri, 08 Aug 2008 01:09:11 +0200

Kostik Belousov writes:
> @@ -169,7 +169,8 @@ ffs_mount(struct mount *mp, struct thread *td)
>                   * persist "snapshot" in the options list.
>                   */
>                  vfs_deleteopt(mp->mnt_optnew, "snapshot");
> -                vfs_deleteopt(mp->mnt_opt, "snapshot");
> +                if (mp->mnt_opt != NULL)
> +                        vfs_deleteopt(mp->mnt_opt, "snapshot");
>          }
>
>          MNT_ILOCK(mp);

I would suggest also adding a KASSERT to vfs_deleteopt().

DES
--
Dag-Erling Smørgrav - des@des.no

From rmacklem at uoguelph.ca Thu Aug 7 23:53:13 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Thu Aug 7 23:53:19 2008
Subject: Which GSSAPI library does FreeBSD use?
In-Reply-To: <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>
References: <86myk06e18.fsf@ds4.des.no> <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>
Message-ID:

On Mon, 4 Aug 2008, Doug Rabson wrote:
>
> Try using current - I updated heimdal to 1.1 in current.
>
> The GSS-API implementation in 7.x and current is a plugin system which heimdal's krb5 code plugs into as a GSS-API mechanism provider. With heimdal
With heimdal > 1.1, it also supports spnego and ntlm as plugins. > Well, vanilla Heimdal-1.1 seems to work fine. However, when I try to link to the libraries in FreeBSD-CURRENT, I get a bunch of multiply defined globals, because it gets both external.o and gss_names.o, out of libgssapi.a and libgssapi_krb5.a respectively. Btw, I was able to use gss_inquire_sec_context_by_oid() to get the session key out of the security context. rick From koitsu at FreeBSD.org Fri Aug 8 03:39:02 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Aug 8 03:39:08 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <200808072040.55571.boris.kotzev@gmail.com> References: <200808071925.45786.boris.kotzev@gmail.com> <20080807165502.GA39420@eos.sc1.parodius.com> <200808072040.55571.boris.kotzev@gmail.com> Message-ID: <20080808033902.GA72860@eos.sc1.parodius.com> On Thu, Aug 07, 2008 at 08:40:55PM +0300, Boris Kotzev wrote: > ?? Thursday 07 August 2008 19:55:02 Jeremy Chadwick ??????: > > On Thu, Aug 07, 2008 at 07:25:45PM +0300, Boris Kotzev wrote: > > > Hello, > > > > > > I used the zfs port to Mac OS X (http://zfs.macosforge.org) to > > > create a storage pool under Mac OS X. The pool can be imported > > > successfully under FreeBSD: > > > > > > root:~-114# zpool import macpool > > > root:~-115# zpool list macpool > > > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > > > macpool 6,94G 510K 6,94G 0% ONLINE - > > > root:~-116# zfs list macpool > > > NAME USED AVAIL REFER MOUNTPOINT > > > macpool 474K 6,83G 308K /macpool > > > > > > and is fully accessible to the root user: > > > > > > root:~-118# id > > > uid=0(root) gid=0(wheel) groups=0(wheel),5(operator) > > > root:~-119# ls -ld /macpool > > > drwxr-xr-x 7 root wheel 8 7 ??? 16:59 /macpool > > > root:~-120# ls -l /macpool > > > total 43 > > > drwx------ 3 root wheel 3 7 ??? 16:31 .Spotlight-V100 > > > -rw-r--r-- 1 root wheel 35014 7 ??? 16:31 .VolumeIcon.icns > > > drwx------ 2 root wheel 4 7 ??? 
16:32 .fseventsd > > > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 backup > > > drwxr-xr-x 2 root wheel 2 7 ??? 16:59 downloads > > > drwxr-xr-x 2 root wheel 2 7 ??? 16:58 music > > > > > > According to the file permissions on /macpool (drwxr-xr-x), > > > anyone should have read access to it. This is not the case > > > though: > > > > > > root:~-121# su user > > > % id > > > uid=1003(user) gid=1003(user) > > > groups=1003(user),0(wheel),5(operator) % ls -l /macpool > > > ls: /macpool: Permission denied > > > % cd /macpool > > > /macpool: Permission denied. > > > > > > Is this a bug, or is there some way to get access to /macpool as > > > an ordinary user? > > > > > > The pool was created under version zfs-119 of the Mac OS X port; > > > the FreeBSD version is: > > > > > > root:~-122# uname -a > > > FreeBSD xxxx 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Aug 2 > > > 14:19:33 EEST 2008 root@xxxx:/usr/obj/usr/src/sys/MACBOOK amd64 > > > > > > with the latest zfs patch, but the problem was also present > > > before applying the patch. > > > > As root, what does "zfs get all macpool" return on FreeBSD? > > root@:~-116# zfs get all macpool > NAME PROPERTY VALUE SOURCE > macpool type filesystem - > macpool creation ?? ??? 
7 16:31 2008 - > macpool used 474K - > macpool available 6,83G - > macpool referenced 308K - > macpool compressratio 1.00x - > macpool mounted yes - > macpool quota none default > macpool reservation none default > macpool recordsize 128K default > macpool mountpoint /macpool default > macpool sharenfs off default > macpool checksum on default > macpool compression off default > macpool atime on default > macpool devices on default > macpool exec on default > macpool setuid on default > macpool readonly off default > macpool jailed off default > macpool snapdir hidden default > macpool aclmode groupmask default > macpool aclinherit restricted default > macpool canmount on default > macpool shareiscsi off default > macpool xattr off temporary > macpool copies 1 default > macpool version 1 - > macpool utf8only off - > macpool normalization none - > macpool casesensitivity sensitive - > macpool vscan off default > macpool nbmand off default > macpool sharesmb off default > macpool refquota none default > macpool refreservation none default It's interesting to note that your filesystem has a significantly larger number of properties returned than mine. I wonder if the ZFS code has support for those properties on FreeBSD, but they simply aren't listed. Or maybe the patch you're using adds all of them? I don't know. Anyway, the property that may be relevant is aclinherit. The zfs(1) manpage on FreeBSD makes no mention of what "restricted" means for property "aclinherit". I believe it may be the source of the problem. A ZFS filesystem made on FreeBSD has a different value for that property. 
I explicitly enabled compression on the below fs, BTW, which is why that value is not the default value: NAME PROPERTY VALUE SOURCE storage type filesystem - storage creation Sun May 25 19:33 2008 - storage used 183G - storage available 730G - storage referenced 183G - storage compressratio 1.02x - storage mounted yes - storage quota none default storage reservation none default storage recordsize 128K default storage mountpoint /storage default storage sharenfs off default storage checksum on default storage compression on local storage atime off local storage devices on default storage exec on default storage setuid on default storage readonly off default storage jailed off default storage snapdir hidden default storage aclmode groupmask default storage aclinherit secure default storage canmount on default storage shareiscsi off default storage xattr off temporary storage copies 1 default -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From bu7cher at yandex.ru Fri Aug 8 04:24:08 2008 From: bu7cher at yandex.ru (Andrey V. Elsukov) Date: Fri Aug 8 04:24:15 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <20080807121245.GA26629@eos.sc1.parodius.com> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org> <20080807121245.GA26629@eos.sc1.parodius.com> Message-ID: <489BCA4D.3050704@yandex.ru> Jeremy Chadwick wrote: > In almost every case I've looked at so far, the individuals' chipsets, > disks, and overall setup are different. 
SMART statistics on the drives
> show absolutely no sign of errors, or anything that indicates a hardware
> failure. Many of the users are using AHCI as well (myself included, and
> I have seen the DMA error issue myself), which is more reliable than
> classic IDE.

I have done some work on the AHCI part of the ATA driver and I am looking
for testers...
http://perforce.freebsd.org/changeList.cgi?CMD=changes&FSPC=//depot/user/butcher/src/...

> It would be beneficial if there was some form of sysctl to increase the
> verbosity from the ATA subsystem when an error happens. The existing
> data we get back is terse, and barely useful. I know for a fact there's
> more debug information that could be output in such scenarios. And
> please do not reply with "good idea, send patches" unless you're wanting
> to be chewed out. :-)

Ok, I'll try to add some verbose 'printfs' in my branch in perforce :)

>> I'm going to do some analysis and find out whether I can find any of our
>> systems that may be experiencing ATA errors that don't correlate with
>> what their SMART data is saying. To date I haven't caught any, but
>> that's not to say they may not be happening... just that all of the ones
>> I have caught to date do appear to have been hardware-related issues...

IMHO. Today we have many hardware versions and revisions, and some of them
are buggy. But other OSes (Windows, Linux) work with buggy hardware
without big problems. Yes, some developers have docs and can make
workarounds. I think our ATA driver needs a new error-handling subsystem
that can handle errors correctly.

-- 
WBR, Andrey V. Elsukov

From md at hudora.de Fri Aug 8 06:37:02 2008
From: md at hudora.de (Maximillian Dornseif)
Date: Fri Aug 8 06:37:09 2008
Subject: Strange (?) hangs with ZFS/rsync.
In-Reply-To: <18881178.post@talk.nabble.com>
References: <18880785.post@talk.nabble.com> <18881178.post@talk.nabble.com>
Message-ID: <18886364.post@talk.nabble.com>

Maximillian Dornseif wrote:
>
> The machine does not reboot but hangs during the "syncing" phase of
> shutdown. It has to be powercycled to restart.
>

After a "shutdown -r now" the box complains that it couldn't terminate
vnlru, bufdaemon and syncer.

From dfr at rabson.org Fri Aug 8 08:08:01 2008
From: dfr at rabson.org (Doug Rabson)
Date: Fri Aug 8 08:08:08 2008
Subject: Which GSSAPI library does FreeBSD use?
In-Reply-To:
References: <86myk06e18.fsf@ds4.des.no> <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>
Message-ID:

On 8 Aug 2008, at 01:04, Rick Macklem wrote:

>
>
> On Mon, 4 Aug 2008, Doug Rabson wrote:
>>
>> Try using current - I updated heimdal to 1.1 in current.
>>
>> The GSS-API implementation in 7.x and current is a plugin system
>> which heimdal's krb5 code plugs into as a GSS-API mechanism
>> provider. With heimdal 1.1, it also supports spnego and ntlm as
>> plugins.
>>
> Well, vanilla Heimdal-1.1 seems to work fine. However, when I try to
> link to the libraries in FreeBSD-CURRENT, I get a bunch of multiply
> defined globals, because it gets both external.o and gss_names.o, out of
> libgssapi.a and libgssapi_krb5.a respectively.

Don't use static linking?

>
> Btw, I was able to use gss_inquire_sec_context_by_oid() to get the
> session key out of the security context.

Excellent.
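
[Archive note] The "multiply defined globals" report above can be checked by comparing the symbol tables of the two static libraries. The following is only a rough sketch under stated assumptions: the helper name common_symbols is made up for illustration, and it expects two text files that each hold the output of `nm -g --defined-only` for one library. Nothing in the thread says this is how the problem was actually diagnosed.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): print the global symbols that
# are defined in BOTH of two symbol listings.
# Each argument is a file holding `nm -g --defined-only` output for one
# library; archive member headers ("external.o:") and blank lines are
# skipped because they do not have exactly three fields.
common_symbols() {
    a=$(mktemp /tmp/syms_a.XXXXXX) || return 1
    b=$(mktemp /tmp/syms_b.XXXXXX) || { rm -f "$a"; return 1; }
    # Symbol lines look like: <address> <type letter> <name>
    awk 'NF == 3 { print $3 }' "$1" | sort -u > "$a"
    awk 'NF == 3 { print $3 }' "$2" | sort -u > "$b"
    # comm -12 keeps only the lines present in both sorted inputs.
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Possible usage (library paths are assumptions, adjust as needed):
#   nm -g --defined-only /usr/lib/libgssapi.a      > gssapi.syms
#   nm -g --defined-only /usr/lib/libgssapi_krb5.a > gssapi_krb5.syms
#   common_symbols gssapi.syms gssapi_krb5.syms
```

Any name printed is defined in both archives and would be a candidate for a duplicate-definition error when both are linked statically.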
From gary.jennejohn at freenet.de Fri Aug 8 11:01:30 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Fri Aug 8 11:01:36 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <20080808033902.GA72860@eos.sc1.parodius.com> References: <200808071925.45786.boris.kotzev@gmail.com> <20080807165502.GA39420@eos.sc1.parodius.com> <200808072040.55571.boris.kotzev@gmail.com> <20080808033902.GA72860@eos.sc1.parodius.com> Message-ID: <20080808130127.4cc71ac9@peedub.jennejohn.org> On Thu, 7 Aug 2008 20:39:02 -0700 Jeremy Chadwick wrote: > On Thu, Aug 07, 2008 at 08:40:55PM +0300, Boris Kotzev wrote: [snip] > > macpool aclinherit restricted default > > macpool canmount on default > > macpool shareiscsi off default > > macpool xattr off temporary > > macpool copies 1 default > > macpool version 1 - > > macpool utf8only off - > > macpool normalization none - > > macpool casesensitivity sensitive - > > macpool vscan off default > > macpool nbmand off default > > macpool sharesmb off default > > macpool refquota none default > > macpool refreservation none default > > It's interesting to note that your filesystem has a significantly larger > number of properties returned than mine. I wonder if the ZFS code has > support for those properties on FreeBSD, but they simply aren't listed. > Or maybe the patch you're using adds all of them? I don't know. > > Anyway, the property that may be relevant is aclinherit. The zfs(1) > manpage on FreeBSD makes no mention of what "restricted" means for > property "aclinherit". I believe it may be the source of the problem. > > A ZFS filesystem made on FreeBSD has a different value for that > property. I explicitly enabled compression on the below fs, BTW, which > is why that value is not the default value: > No, it doesn't necessarily. 
Here is the output from a ZFS FS made with
FreeBSD but using the old version 6 ZFS:

root:peedub:~:bash:1> zfs get all mirpool
NAME     PROPERTY         VALUE                  SOURCE
mirpool  type             filesystem             -
mirpool  creation         Sat Nov 24 17:53 2007  -
mirpool  used             141G                   -
mirpool  available        316G                   -
mirpool  referenced       18K                    -
mirpool  compressratio    1.00x                  -
mirpool  mounted          yes                    -
mirpool  quota            none                   default
mirpool  reservation      none                   default
mirpool  recordsize       128K                   default
mirpool  mountpoint       /mirpool               default
mirpool  sharenfs         off                    local
mirpool  checksum         on                     default
mirpool  compression      off                    default
mirpool  atime            on                     default
mirpool  devices          on                     default
mirpool  exec             on                     default
mirpool  setuid           on                     default
mirpool  readonly         off                    default
mirpool  jailed           off                    default
mirpool  snapdir          hidden                 default
mirpool  aclmode          groupmask              default
mirpool  aclinherit       restricted             default  <==
mirpool  canmount         on                     default
mirpool  shareiscsi       off                    default
mirpool  xattr            off                    temporary
mirpool  copies           1                      default
mirpool  version          1                      -
mirpool  utf8only         off                    -
mirpool  normalization    none                   -
mirpool  casesensitivity  sensitive              -
mirpool  vscan            off                    default
mirpool  nbmand           off                    default
mirpool  sharesmb         off                    default
mirpool  refquota         none                   default
mirpool  refreservation   none                   default

root:peedub:~:bash:2> zfs set aclinherit=secure mirpool
property 'aclinherit' not supported on FreeBSD: permission denied

Apparently it's not really used.

> NAME PROPERTY VALUE SOURCE > storage type filesystem - > storage creation Sun May 25 19:33 2008 - > storage used 183G - > storage available 730G - > storage referenced 183G - > storage compressratio 1.02x - > storage mounted yes - > storage quota none default > storage reservation none default > storage recordsize 128K default > storage mountpoint /storage default > storage sharenfs off default > storage checksum on default > storage compression on local > storage atime off local > storage devices on default > storage exec on default > storage setuid on default > storage readonly off default > storage jailed off default > storage snapdir hidden default > storage aclmode groupmask default > storage aclinherit secure default <== > storage canmount on default > storage shareiscsi off default > storage xattr off temporary > storage copies 1 default > --- Gary Jennejohn From koitsu at FreeBSD.org Fri Aug 8 11:26:13 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Aug 8 11:26:24 2008 Subject: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <20080808130127.4cc71ac9@peedub.jennejohn.org> References: <200808071925.45786.boris.kotzev@gmail.com> <20080807165502.GA39420@eos.sc1.parodius.com> <200808072040.55571.boris.kotzev@gmail.com> <20080808033902.GA72860@eos.sc1.parodius.com> <20080808130127.4cc71ac9@peedub.jennejohn.org> Message-ID: <20080808112613.GA91032@eos.sc1.parodius.com> On Fri, Aug 08, 2008 at 01:01:27PM +0200, Gary Jennejohn wrote: > On Thu, 7 Aug 2008 20:39:02 -0700 > Jeremy Chadwick wrote: > > > On Thu, Aug 07, 2008 at 08:40:55PM +0300, Boris Kotzev wrote: > [snip] > > > macpool aclinherit restricted default > > > macpool canmount on default > > > macpool shareiscsi off default > > > macpool xattr off temporary > > > macpool copies 1 default > > > macpool version 1 - > > > macpool utf8only off - > > > macpool normalization none - > > > macpool casesensitivity sensitive - > > > macpool vscan off default > > > macpool 
nbmand off default > > > macpool sharesmb off default > > > macpool refquota none default > > > macpool refreservation none default > > > > It's interesting to note that your filesystem has a significantly larger > > number of properties returned than mine. I wonder if the ZFS code has > > support for those properties on FreeBSD, but they simply aren't listed. > > Or maybe the patch you're using adds all of them? I don't know. > > > > Anyway, the property that may be relevant is aclinherit. The zfs(1) > > manpage on FreeBSD makes no mention of what "restricted" means for > > property "aclinherit". I believe it may be the source of the problem. > > > > A ZFS filesystem made on FreeBSD has a different value for that > > property. I explicitly enabled compression on the below fs, BTW, which > > is why that value is not the default value: > > No, it doesn't necessarily. Here the output from a ZFS FS made with > FreeBSD but using the old version 6 ZFS: > > root:peedub:~:bash:1> zfs get all mirpool > NAME PROPERTY VALUE SOURCE > mirpool type filesystem - > mirpool creation Sat Nov 24 17:53 2007 - > mirpool used 141G - > mirpool available 316G - > mirpool referenced 18K - > mirpool compressratio 1.00x - > mirpool mounted yes - > mirpool quota none default > mirpool reservation none default > mirpool recordsize 128K default > mirpool mountpoint /mirpool default > mirpool sharenfs off local > mirpool checksum on default > mirpool compression off default > mirpool atime on default > mirpool devices on default > mirpool exec on default > mirpool setuid on default > mirpool readonly off default > mirpool jailed off default > mirpool snapdir hidden default > mirpool aclmode groupmask default > mirpool aclinherit restricted default <== > mirpool canmount on default > mirpool shareiscsi off default > mirpool xattr off temporary > mirpool copies 1 default > mirpool version 1 - > mirpool utf8only off - > mirpool normalization none - > mirpool casesensitivity sensitive - > mirpool 
vscan off default
> mirpool  nbmand           off                    default
> mirpool  sharesmb         off                    default
> mirpool  refquota         none                   default
> mirpool  refreservation   none                   default
>
> root:peedub:~:bash:2> zfs set aclinherit=secure mirpool
> property 'aclinherit' not supported on FreeBSD: permission denied
>
> Apparently it's not really used.

You need to remember the individual is using the patch on CURRENT
provided by pjd, which brings ZFS up to the latest OpenSolaris version.
It's possible on that version it *is* implemented; I do not know.

Based on the manpage description for aclinherit, that option could
definitely cause what he's seeing.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From gary.jennejohn at freenet.de Fri Aug 8 11:51:15 2008
From: gary.jennejohn at freenet.de (Gary Jennejohn)
Date: Fri Aug 8 11:51:22 2008
Subject: Fw: zfs - no access to a Mac OS X zfs pool without root privileges
Message-ID: <20080808135112.7bf59d83@peedub.jennejohn.org>

Oops, forgot to include the ML.

Begin forwarded message:

Date: Fri, 8 Aug 2008 13:49:38 +0200
From: Gary Jennejohn
To: Jeremy Chadwick
Subject: Re: zfs - no access to a Mac OS X zfs pool without root privileges


On Fri, 8 Aug 2008 04:26:13 -0700
Jeremy Chadwick wrote:

> On Fri, Aug 08, 2008 at 01:01:27PM +0200, Gary Jennejohn wrote:
[BIG snip]
> > mirpool  aclinherit       restricted             default  <==
> > mirpool  canmount         on                     default
> > mirpool  shareiscsi       off                    default
> > mirpool  xattr            off                    temporary
> > mirpool  copies           1                      default
> > mirpool  version          1                      -
> > mirpool  utf8only         off                    -
> > mirpool  normalization    none                   -
> > mirpool  casesensitivity  sensitive              -
> > mirpool  vscan            off                    default
> > mirpool  nbmand           off                    default
> > mirpool  sharesmb         off                    default
> > mirpool  refquota         none                   default
> > mirpool  refreservation   none                   default
> >
> > root:peedub:~:bash:2> zfs set aclinherit=secure mirpool
> > property 'aclinherit' not supported on FreeBSD: permission denied
> >
> > Apparently it's not really used.
>
> You need to remember the individual is using the patch on CURRENT
> provided by pjd, which bring ZFS up to the latest OpenSolaris version.
> It's possible on that version it *is* implemented; I do not know.
>
> Based on the manpage description for aclinherit, that option could
> definitely cause what he's seeing.
>

I _am_ using the patched version from pjd@.

garyj:peedub:freebsd:-bash:11> grep ZFS /var/run/dmesg.boot
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 11
ZFS storage pool version 11

---
Gary Jennejohn

From boris.kotzev at gmail.com Fri Aug 8 13:00:55 2008
From: boris.kotzev at gmail.com (Boris Kotzev)
Date: Fri Aug 8 13:01:01 2008
Subject: zfs - no access to a Mac OS X zfs pool without root privileges
In-Reply-To: <20080808033902.GA72860@eos.sc1.parodius.com>
References: <200808071925.45786.boris.kotzev@gmail.com> <200808072040.55571.boris.kotzev@gmail.com> <20080808033902.GA72860@eos.sc1.parodius.com>
Message-ID: <200808081600.47603.boris.kotzev@gmail.com>

??
Friday 08 August 2008 06:39:02 ?????????: > On Thu, Aug 07, 2008 at 08:40:55PM +0300, Boris Kotzev wrote: > > ?? Thursday 07 August 2008 19:55:02 Jeremy Chadwick ??????: > > > On Thu, Aug 07, 2008 at 07:25:45PM +0300, Boris Kotzev wrote: > > > > Hello, > > > > > > > > I used the zfs port to Mac OS X (http://zfs.macosforge.org) > > > > to create a storage pool under Mac OS X. The pool can be > > > > imported successfully under FreeBSD: > > > > > > > > root:~-114# zpool import macpool > > > > root:~-115# zpool list macpool > > > > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > > > > macpool 6,94G 510K 6,94G 0% ONLINE - > > > > root:~-116# zfs list macpool > > > > NAME USED AVAIL REFER MOUNTPOINT > > > > macpool 474K 6,83G 308K /macpool > > > > > > > > and is fully accessible to the root user: > > > > > > > > root:~-118# id > > > > uid=0(root) gid=0(wheel) groups=0(wheel),5(operator) > > > > root:~-119# ls -ld /macpool > > > > drwxr-xr-x 7 root wheel 8 7 ??? 16:59 /macpool > > > > root:~-120# ls -l /macpool > > > > total 43 > > > > drwx------ 3 root wheel 3 7 ??? 16:31 .Spotlight-V100 > > > > -rw-r--r-- 1 root wheel 35014 7 ??? 16:31 > > > > .VolumeIcon.icns drwx------ 2 root wheel 4 7 ??? > > > > 16:32 .fseventsd drwxr-xr-x 2 root wheel 2 7 ??? > > > > 16:59 backup drwxr-xr-x 2 root wheel 2 7 ??? 16:59 > > > > downloads drwxr-xr-x 2 root wheel 2 7 ??? 16:58 music > > > > > > > > According to the file permissions on /macpool (drwxr-xr-x), > > > > anyone should have read access to it. This is not the case > > > > though: > > > > > > > > root:~-121# su user > > > > % id > > > > uid=1003(user) gid=1003(user) > > > > groups=1003(user),0(wheel),5(operator) % ls -l /macpool > > > > ls: /macpool: Permission denied > > > > % cd /macpool > > > > /macpool: Permission denied. > > > > > > > > Is this a bug, or is there some way to get access to /macpool > > > > as an ordinary user? 
> > > > > > > > The pool was created under version zfs-119 of the Mac OS X > > > > port; the FreeBSD version is: > > > > > > > > root:~-122# uname -a > > > > FreeBSD xxxx 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Aug 2 > > > > 14:19:33 EEST 2008 root@xxxx:/usr/obj/usr/src/sys/MACBOOK > > > > amd64 > > > > > > > > with the latest zfs patch, but the problem was also present > > > > before applying the patch. > > > > > > As root, what does "zfs get all macpool" return on FreeBSD? > > > > root@:~-116# zfs get all macpool > > NAME PROPERTY VALUE SOURCE > > macpool type filesystem - > > macpool creation ?? ??? 7 16:31 2008 - > > macpool used 474K - > > macpool available 6,83G - > > macpool referenced 308K - > > macpool compressratio 1.00x - > > macpool mounted yes - > > macpool quota none default > > macpool reservation none default > > macpool recordsize 128K default > > macpool mountpoint /macpool default > > macpool sharenfs off default > > macpool checksum on default > > macpool compression off default > > macpool atime on default > > macpool devices on default > > macpool exec on default > > macpool setuid on default > > macpool readonly off default > > macpool jailed off default > > macpool snapdir hidden default > > macpool aclmode groupmask default > > macpool aclinherit restricted default > > macpool canmount on default > > macpool shareiscsi off default > > macpool xattr off temporary > > macpool copies 1 default > > macpool version 1 - > > macpool utf8only off - > > macpool normalization none - > > macpool casesensitivity sensitive - > > macpool vscan off default > > macpool nbmand off default > > macpool sharesmb off default > > macpool refquota none default > > macpool refreservation none default > > It's interesting to note that your filesystem has a significantly > larger number of properties returned than mine. I wonder if the > ZFS code has support for those properties on FreeBSD, but they > simply aren't listed. 
Or maybe the patch you're using adds all of
> them? I don't know.
>

The extra properties appeared after applying the ZFS patches. The
newer versions of zfs and zpool expose more properties than zpool
version 6 and zfs version 1:

% zpool upgrade -v
This system is currently running ZFS pool version 11.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance

For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.

% zfs upgrade -v
The following filesystem versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifer (FUID)

For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/zpl/N

Where 'N' is the version number.

> Anyway, the property that may be relevant is aclinherit. The
> zfs(1) manpage on FreeBSD makes no mention of what "restricted"
> means for property "aclinherit". I believe it may be the source of
> the problem.

This property has different values under FreeBSD and Mac OS X.
It is shown as "secure" in Mac OS X:

sh-3.2# zfs get aclinherit macpool
NAME     PROPERTY    VALUE   SOURCE
macpool  aclinherit  secure  default

It is not possible to change the value under FreeBSD:

root@:/-112# zfs set aclinherit=discard macpool
property 'aclinherit' not supported on FreeBSD: permission denied

I set the value under Mac OS X to "discard" but the change did not
seem to make any difference.

>
> A ZFS filesystem made on FreeBSD has a different value for that
> property. I explicitly enabled compression on the below fs, BTW,
> which is why that value is not the default value:
>
> NAME     PROPERTY       VALUE                  SOURCE
> storage  type           filesystem             -
> storage  creation       Sun May 25 19:33 2008  -
> storage  used           183G                   -
> storage  available      730G                   -
> storage  referenced     183G                   -
> storage  compressratio  1.02x                  -
> storage  mounted        yes                    -
> storage  quota          none                   default
> storage  reservation    none                   default
> storage  recordsize     128K                   default
> storage  mountpoint     /storage               default
> storage  sharenfs       off                    default
> storage  checksum       on                     default
> storage  compression    on                     local
> storage  atime          off                    local
> storage  devices        on                     default
> storage  exec           on                     default
> storage  setuid         on                     default
> storage  readonly       off                    default
> storage  jailed         off                    default
> storage  snapdir        hidden                 default
> storage  aclmode        groupmask              default
> storage  aclinherit     secure                 default
> storage  canmount       on                     default
> storage  shareiscsi     off                    default
> storage  xattr          off                    temporary
> storage  copies         1                      default

It is also possible to import a pool created under FreeBSD into Mac OS X,
but whenever I write to the pool in Mac OS X and then try to read the
entries in FreeBSD, I encounter the same problem: the entries created
under Mac OS X are accessible by the root user only.

I also noticed that all entries in a FreeBSD pool acquired ACLs in
Mac OS X. For example, the etc directory of FreeBSD has the following
ACL in Mac OS X:

sh-3.2# ls -lde etc
drwxr-xr-x+ 19 root wheel 122 7 ???
18:39 etc 0: group:nogroup deny This ACL looks suspicious to me though when I compare it to the ACL's on the Mac OS X hfs+ volume: sh-3.2# ls -lde /Applications drwxrwxr-x+ 49 root admin 1666 6 ??? 21:27 /Applications 0: group:everyone deny delete Can the problem be related to the fact that I run the AMD 64 version of FreeBSD? Thanks, Boris Kotzev From koitsu at FreeBSD.org Fri Aug 8 13:12:41 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Aug 8 13:12:47 2008 Subject: Fw: zfs - no access to a Mac OS X zfs pool without root privileges In-Reply-To: <20080808135112.7bf59d83@peedub.jennejohn.org> References: <20080808135112.7bf59d83@peedub.jennejohn.org> Message-ID: <20080808131241.GA94716@eos.sc1.parodius.com> On Fri, Aug 08, 2008 at 01:51:12PM +0200, Gary Jennejohn wrote: > Oops, forgot to include the ML. > > Begin forwarded message: > > Date: Fri, 8 Aug 2008 13:49:38 +0200 > From: Gary Jennejohn > To: Jeremy Chadwick > Subject: Re: zfs - no access to a Mac OS X zfs pool without root privileges > > > On Fri, 8 Aug 2008 04:26:13 -0700 > Jeremy Chadwick wrote: > > > On Fri, Aug 08, 2008 at 01:01:27PM +0200, Gary Jennejohn wrote: > [BIG snip] > > > mirpool aclinherit restricted default <== > > > mirpool canmount on default > > > mirpool shareiscsi off default > > > mirpool xattr off temporary > > > mirpool copies 1 default > > > mirpool version 1 - > > > mirpool utf8only off - > > > mirpool normalization none - > > > mirpool casesensitivity sensitive - > > > mirpool vscan off default > > > mirpool nbmand off default > > > mirpool sharesmb off default > > > mirpool refquota none default > > > mirpool refreservation none default > > > > > > root:peedub:~:bash:2> zfs set aclinherit=secure mirpool > > > property 'aclinherit' not supported on FreeBSD: permission denied > > > > > > Apparently it's not really used. > > > > You need to remember the individual is using the patch on CURRENT > > provided by pjd, which bring ZFS up to the latest OpenSolaris version. 
> > It's possible on that version it *is* implemented; I do not know.
> >
> > Based on the manpage description for aclinherit, that option could
> > definitely cause what he's seeing.
> >
>
> I _am_ using the patched version from pjd@.
>
> garyj:peedub:freebsd:-bash:11> grep ZFS /var/run/dmesg.boot
> WARNING: ZFS is considered to be an experimental feature in FreeBSD.
> ZFS filesystem version 11
> ZFS storage pool version 11

Then I don't have an explanation. The only difference I see in the
filesystem is that the aclinherit option is "restricted" (and the value of
that option, I believe, could cause what you're seeing), while on ZFS
fs/pool version 6 the default appears to be "secure". Possibly they're
the same, just renamed?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From rmacklem at uoguelph.ca Fri Aug 8 14:18:08 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Aug 8 14:18:14 2008
Subject: Which GSSAPI library does FreeBSD use?
In-Reply-To:
References: <86myk06e18.fsf@ds4.des.no> <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>
Message-ID:

On Thu, 7 Aug 2008, Rick Macklem wrote:

>
>
> On Mon, 4 Aug 2008, Doug Rabson wrote:
>>
>> Try using current - I updated heimdal to 1.1 in current.
>>
>> The GSS-API implementation in 7.x and current is a plugin system which
>> heimdal's krb5 code plugs into as a GSS-API mechanism provider. With
>> heimdal 1.1, it also supports spnego and ntlm as plugins.
>>
> Well, vanilla Heimdal-1.1 seems to work fine. However, when I try to link
> to the libraries in FreeBSD-CURRENT, I get a bunch of multiply defined
> globals, because it gets both external.o and gss_names.o, out of
> libgssapi.a and libgssapi_krb5.a respectively.
>
Oops, spoke too soon. It worked for a mount last night, but couldn't
re-acquire fresh credentials this morning.
(There are slightly different problems with Heimdal-0.8 and Heimdal-1.1,
but they both seem related to getting a TGT via the keytab entry.)

I'm going to try contacting the Heimdal folks. (In the meantime, I'm
back to Heimdal-0.7 which works fine.)

If you're doing RPCSEC_GSS for the NLM, you are probably going to
want this to work too. (Solaris uses a keytab entry with
root/.@ in it for root access.)

rick

From rmacklem at uoguelph.ca Fri Aug 8 14:24:31 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Fri Aug 8 14:24:37 2008
Subject: Which GSSAPI library does FreeBSD use?
In-Reply-To:
References: <86myk06e18.fsf@ds4.des.no> <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org>
Message-ID:

On Fri, 8 Aug 2008, Doug Rabson wrote:

>
> Don't use static linking?
>
When I linked without "-static", it would crash in the gss_acquire_cred()
call. (For some reason, I couldn't find a core dump left anywhere.)

rick

From ivoras at freebsd.org Fri Aug 8 20:40:02 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Fri Aug 8 20:40:08 2008
Subject: Strange (?) hangs with ZFS/rsync.
In-Reply-To: <18880785.post@talk.nabble.com>
References: <18880785.post@talk.nabble.com>
Message-ID:

Maximillian Dornseif wrote:
> I have a ZFS based backup server which rsyncs from a dozen other machines.
> There is only a single rsync active for any point in time and the system is
> used for nothing else. It has an amd64 kernel and 4GB RAM. I tried to follow
> the ZFS tuning guide.
>
> For a few weeks the machine worked well. Since Monday rsync always hangs
> when fetching a big logfile from a remote machine.
>
> The two rsync processes are in state "zfs:&b" and "zfs:lo" and can't be
> killed.
> The other strange thing is that the system is reporting 3367M (!) wired
> memory.

These look like well known problems with ZFS (see
http://wiki.freebsd.org/ZFSKnownProblems).
A new version of the ZFS port is currently under development. If you want
to try it, you need an 8-CURRENT machine and this patch:
http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004887.html

People have reported that the new patch resolves hangs such as yours.

From peter.schuller at infidyne.com Fri Aug 8 22:06:00 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Fri Aug 8 22:06:12 2008
Subject: Asynchronous writing to zvols (ZFS)
In-Reply-To: <20080806185909.GC2580@garage.freebsd.pl>
References: <200807262005.54235.peter.schuller@infidyne.com> <200807272026.54907.peter.schuller@infidyne.com> <20080806185909.GC2580@garage.freebsd.pl>
Message-ID: <200808090007.25865.peter.schuller@infidyne.com>

[zvol write performance less than file-on-zfs]

> Not sure why's that, I spent no time on optimizing ZVOL yet, sorry.

Absolutely, I was not complaining! I just felt it was worth mentioning. It
was not meant to be negative criticism.

> With the patch above we synchronize in-memory transactions every 5
> seconds or when queue is full or when we receive BIO_FLUSH.

That was my understanding. Sorry, I probably wasn't being clear. I thought
your original comment ("The problem is that we don't [distinguish] between
async and sync I/O request on GEOM level") implied that there was some
reason one could not trust bio_cmd to be preserved correctly (somehow being
downgraded from BIO_FLUSH before it reaches the zvol class). If that
were the case, it would make perfect sense under the circumstances to
treat all writes as flushes, in order to achieve correct semantics for
*actual* flushes.

So my question was meant to confirm that this was not the case, because if
it were, the patch would mean that *actual* flushes would not get treated as
such.

Thanks,

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From peter.schuller at infidyne.com Fri Aug 8 22:18:38 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Fri Aug 8 22:18:45 2008
Subject: ZFS Advice
In-Reply-To: <489B2FFE.5050406@barryp.org>
References: <489B2FFE.5050406@barryp.org>
Message-ID: <200808090020.04315.peter.schuller@infidyne.com>

> >>> http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm
>
> I think CDW is mistaken in saying it's a PCI-E card, UIO is a
> proprietary Supermicro bus that some of their motherboards support.
>
> http://www.supermicro.com/products/nfo/UIO.cfm

As far as I can tell the USAS-L8i on the supermicro page is claimed by them
to be PCI-E. Or are you saying the one on CDW is actually not the same card?

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From peter.schuller at infidyne.com Fri Aug 8 22:27:13 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Fri Aug 8 22:27:20 2008
Subject: ZFS Advice
In-Reply-To:
References: <200808061928.37001.peter.schuller@infidyne.com>
Message-ID: <200808090028.39729.peter.schuller@infidyne.com>

> http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm
>
> http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html
>
> Not much onboard ram, but it's PCI-E and even SAS. CDW lists it for $155.
> That would be cheaper than buying a new board with a PCI-X slot or two,
> and would even handle SAS drives. Claims to be based on the "LSISAS 1068E
> SAS controller". Any idea if that is supported? I don't see it listed in
> the mfi man page. LSI has a Linux driver for download. That card looks
> like it would be just what I need.

Well:

./dev/mpt/mpilib/mpi_cnfg.h:#define MPI_MANUFACTPAGE_DEVID_SAS1068E (0x0058)
./dev/mpt/mpilib/mpi_ioc.h: * 03-11-05 01.05.08 Added family code for 1068E family.
./dev/mpt/mpilib/mpi_ioc.h:#define MPI_FW_HEADER_PID_FAMILY_106xE_SAS (0x0004) /* 1068E, 1066E, and 1064E */

And mpt(4) has "LSI Logic AS1064, LSI Logic AS1068 (SAS/SATA)" listed as
supported.

So as far as I can tell it should be supported, assuming it truly works in
practice. Given past history in trying to find suitable controllers I have
to be paranoid and wonder what I am missing... surely I cannot have found
the perfect controller. Does anyone have hands-on experience?

I especially like the SAS support; particularly since I have recently
noticed the availability of low-cost 7k2rpm SAS drives.

-- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: This is a digitally signed message part. Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080808/ec58f74b/attachment.pgp From hartzell at alerce.com Sat Aug 9 02:03:03 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat Aug 9 02:03:09 2008 Subject: ZFS Advice In-Reply-To: <200808090020.04315.peter.schuller@infidyne.com> References: <489B2FFE.5050406@barryp.org> <200808090020.04315.peter.schuller@infidyne.com> Message-ID: <18588.64214.354495.804458@almost.alerce.com> Peter Schuller writes: > > >>> http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm > > > > I think CDW is mistaken in saying it's a PCI-E card, UIO is a > > proprietary Supermicro bus that some of their motherboards support. > > > > http://www.supermicro.com/products/nfo/UIO.cfm > > As far as I can tell the USAS-L8i on the supermicro page is claimed by them to > be PCI-E. Or are you saying the one on CDW is actually not the same card? > I've been googling around trying to figure out if UIO is the same thing as PCI-E. Supermicro has a bunch of motherboards for which the description explicitly points out PCI-E and UIO slots, which makes it sounds like they're different beasts. Or, it could be that a UIO slot is specifically a PCI-Ex8 slot? You can buy risers that convert "1U PCI-E (x16) to 1 UIO and 1 PCI-E " Supermicro's description of the card does explicitly say that it "uses a PCI Express host interface", but maybe they mean that they're using it in some nonstandard fashion? It's confusing... g. 
From hartzell at alerce.com Sat Aug 9 02:08:59 2008 From: hartzell at alerce.com (George Hartzell) Date: Sat Aug 9 02:09:06 2008 Subject: Strange (?) hangs with ZFS/rsync. In-Reply-To: References: <18880785.post@talk.nabble.com> Message-ID: <18588.64570.786730.812483@almost.alerce.com> Ivan Voras writes: > Maximillian Dornseif wrote: > > I have a ZFS based backup server which rsyncs from a dozen other machines. > > There is only a single rsync active for any point in time and the system is > > used for nothing else. It has an amd64 kernel and 4GB RAM. I tried to follow > > the ZFS tuning guide. > > > > For a few weeks the machine works well. Since monday it rsync always hangs > > when fetching a big logfile from a remote machine. > > > > The two rsync processes are in state "zfs:&b" and "zfs:lo" and can't be > > killed. > > The other strange thing is that the system is reporting 3367M (!) wired > > memory. > > These look like well known problems with ZFS (see > http://wiki.freebsd.org/ZFSKnownProblems). A new version of the ZFS port > is currently under development, if you want to try it, you need a > 8-CURRENT machine and this patch: > http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004887.html > > People have reported that the new patch resolves hangs such as yours. Not to be a party pooper, but on an arguably underconfigured server (2GB RAM, dual core AMD, amd64), moving to -CURRENT and applying the new patch actually required that I tune it with an even lower arc_max than I had been getting away with on the -STABLE zfs with the same hardware. My testing (described on this list a week or two back) is admittedly unscientific and it may be that the -STABLE configuration was already skating on thin ice but it's certainly not the case that the subsystem now autotunes itself on limited hardware. 
But, with kernel mem and arc configured as many folks have described, both versions run nicely in spite of anything I've thrown at them and the zfs feature set is great. Thanks again to Pawel and all involved! g. From peter.schuller at infidyne.com Sat Aug 9 07:16:07 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sat Aug 9 07:16:15 2008 Subject: ZFS Advice In-Reply-To: <18588.64214.354495.804458@almost.alerce.com> References: <200808090020.04315.peter.schuller@infidyne.com> <18588.64214.354495.804458@almost.alerce.com> Message-ID: <200808090917.34149.peter.schuller@infidyne.com> > Or, it could be that a UIO slot is specifically a PCI-Ex8 slot? You > can buy risers that convert "1U PCI-E (x16) to 1 UIO and 1 PCI-E " http://www.supermicro.com/products/nfo/UIO_cards.cfm This one actually says in the clear: "8-Lane PCI-Express interface (Supermicro UIO slot)" My guess is one of the following: * It means nothing but "PCI-E", and the UIO stuff is just marketing BS. * It means "extra PCI-E", and the UIO stuff is just marketing BS. * (Based on the grapical animation on the UIO page) They actually do have a special slot on their motherboard for use with their special UIO card which then provides a few extra PCI-E slots. The mentioning of UIO on pages describing standard PCI-E cards is just marketing BS resulting from the technically correct fact that they can be used with their UIO card. If I don't find specific information to the contrary I'll probably chance it and see. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: This is a digitally signed message part. 
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080809/4d81ea04/attachment.pgp From dimitar.vassilev at gmail.com Sat Aug 9 09:03:38 2008 From: dimitar.vassilev at gmail.com (Dimitar Vasilev) Date: Sat Aug 9 09:03:45 2008 Subject: zfs snapshot panic problem Message-ID: <59adc1a0808090138t7ab9913bmc08d42fb56801e0d@mail.gmail.com> Hi all, I'm having a problem with a 7-stable SMP amd64 machine running zfs snapshots for backups. It starts to complain about bad file descriptors after running 8 days without a problem. Then we try to unmount the problem fs and we got a system crash. Kernel config ident FOO include GENERIC nooptions SCHED_4BSD options SCHED_ULE options GEOM_JOURNAL options ALTQ options ALTQ_CBQ options ALTQ_RED options ALTQ_RIO options ALTQ_HFSC options ALTQ_PRIQ options ALTQ_NOPCC options DEVICE_POLLING options ZERO_COPY_SOCKETS options HZ=2000 # device pf device pflog #logging support interface for PF device pfsync #synchronization interface for PF device carp #Common Address Redundancy Protocol Here is output of backtrace: kgdb -c /var/crash/vmcore.7 /boot/kernel/kernel GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... 
Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0xc0 fault code = supervisor write data, page not present instruction pointer = 0x8:0xffffffff804c2515 stack pointer = 0x10:0xffffffffd77e4980 frame pointer = 0x10:0xffffffffd77e4a20 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 19469 (zfs) trap number = 12 panic: page fault cpuid = 0 Uptime: 8d12h7m45s Physical memory: 2034 MB Dumping 1417 MB: 1402 1386 1370 1354 1338 1322 1306 1290 1274 1258 1242 1226 1210 1194 1178 1162 1146 1130 1114 1098 1082 1066 1050 1034 1018 1002 986 970 954 938 922 906 890 874 858 842 826 810 794 778 762 746 730 714 698 682 666 650 634 618 602 586 570 554 538 522 506 490 474 458 442 426 410 394 378 362 346 330 314 298 282 266 250 234 218 202 186 170 154 138 122 106 90 74 58 42 26 10 Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/geom_journal.ko...Reading symbols from /boot/kernel/geom_journal.ko.symbols...done. done. Loaded symbols for /boot/kernel/geom_journal.ko Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/fdescfs.ko Reading symbols from /boot/kernel/pflog.ko...Reading symbols from /boot/kernel/pflog.ko.symbols...done. done. Loaded symbols for /boot/kernel/pflog.ko Reading symbols from /boot/kernel/pf.ko...Reading symbols from /boot/kernel/pf.ko.symbols...done. done. Loaded symbols for /boot/kernel/pf.ko Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /boot/kernel/accf_http.ko.symbols...done. done. Loaded symbols for /boot/kernel/accf_http.ko #0 doadump () at pcpu.h:194 194 pcpu.h: No such file or directory. 
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:194
#1 0x0000000000000004 in ?? ()
#2 0xffffffff804ba839 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3 0xffffffff804bac3d in panic (fmt=0x104) at /usr/src/sys/kern/kern_shutdown.c:572
#4 0xffffffff807871c4 in trap_fatal (frame=0xffffff00015c7360, eva=18446742974224558304) at /usr/src/sys/amd64/amd64/trap.c:724
#5 0xffffffff80787595 in trap_pfault (frame=0xffffffffd77e48d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6 0xffffffff80787ed8 in trap (frame=0xffffffffd77e48d0) at /usr/src/sys/amd64/amd64/trap.c:410
#7 0xffffffff8076d8ae in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8 0xffffffff804c2515 in _sx_xlock (sx=0xa0, opts=0, file=0xffffffff80cb88e0 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c", line=1069) at atomic.h:142
#9 0xffffffff80c9fb3a in zfsctl_umount_snapshots (vfsp=Variable "vfsp" is not available.) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1069
#10 0xffffffff80ca6988 in zfs_umount (vfsp=0xffffff0001560a68, fflag=0, td=0xffffff00015c7360) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:692
#11 0xffffffff80533dbe in dounmount (mp=0xffffff0001560a68, flags=0, td=0xffffff00015c7360) at /usr/src/sys/kern/vfs_mount.c:1286
#12 0xffffffff8053458e in unmount (td=0xffffff00015c7360, uap=0xffffffffd77e4be0) at /usr/src/sys/kern/vfs_mount.c:1182
#13 0xffffffff80787817 in syscall (frame=0xffffffffd77e4c70) at /usr/src/sys/amd64/amd64/trap.c:852
#14 0xffffffff8076dabb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#15 0x0000000800f1514c in ?? ()

It's the same machine mentioned in:
http://lists.freebsd.org/pipermail/freebsd-fs/2008-February/004377.html
http://lists.freebsd.org/pipermail/freebsd-fs/2008-February/004418.html

Any ideas how to fix? Can compile kernel with debug if needed.
Best regards, Dimitar Vassilev From morganw at chemikals.org Sat Aug 9 12:56:09 2008 From: morganw at chemikals.org (Wes Morgan) Date: Sat Aug 9 12:56:16 2008 Subject: ZFS Advice In-Reply-To: <200808090917.34149.peter.schuller@infidyne.com> References: <200808090020.04315.peter.schuller@infidyne.com> <18588.64214.354495.804458@almost.alerce.com> <200808090917.34149.peter.schuller@infidyne.com> Message-ID: On Sat, 9 Aug 2008, Peter Schuller wrote: >> Or, it could be that a UIO slot is specifically a PCI-Ex8 slot? You >> can buy risers that convert "1U PCI-E (x16) to 1 UIO and 1 PCI-E " > > http://www.supermicro.com/products/nfo/UIO_cards.cfm > > This one actually says in the clear: > > "8-Lane PCI-Express interface (Supermicro UIO slot)" > > My guess is one of the following: > > * It means nothing but "PCI-E", and the UIO stuff is just marketing BS. > > * It means "extra PCI-E", and the UIO stuff is just marketing BS. > > * (Based on the grapical animation on the UIO page) They actually do have a > special slot on their motherboard for use with their special UIO card which > then provides a few extra PCI-E slots. The mentioning of UIO on pages > describing standard PCI-E cards is just marketing BS resulting from the > technically correct fact that they can be used with their UIO card. > > If I don't find specific information to the contrary I'll probably chance it > and see. As you say, it's hard to tell from the SuperMicro page. LSI has this card: http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html http://www.newegg.com/Product/Product.aspx?Item=N82E16816118092 Which is their official card with the same chipset. The slots on both the SuperMicro and the LSI cards look mighty similar. The only obvious difference is the little right-angle thing next to the PCI-E interface. I'm very interested to see if it works out. If I wasn't going to be away for a few weeks I'd try to abuse CDW's return policy to give it a test. 
From pjd at FreeBSD.org Sat Aug 9 20:58:33 2008
From: pjd at FreeBSD.org (Pawel Jakub Dawidek)
Date: Sat Aug 9 20:58:40 2008
Subject: ZFS-NFS kernel panic under load
In-Reply-To: <20080806101621.H24586@emmett.excelsus.com>
References: <20080806101621.H24586@emmett.excelsus.com>
Message-ID: <20080809205835.GE1363@garage.freebsd.pl>

On Wed, Aug 06, 2008 at 11:00:57AM -0400, Weldon S Godfrey 3 wrote:
>
> Hello,
>
> Please forgive me, I didn't really see this discussed in the archives but
> I am wondering if anyone has seen this issue. I can replicate this issue
> under FreeBSD amd64 7.0-RELEASE and the latest -STABLE (RELENG_7). I do
> not replicate any problems running 9 instances of postmark on the machine
> directly, so the issue appears to be isolated with NFS.

The backtrace you posted doesn't suggest it is a ZFS problem, although it
can of course be. Can you try to reproduce it with an NFS-exported UFS file
system?

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080809/6f90aab5/attachment.pgp From koitsu at FreeBSD.org Sun Aug 10 07:11:17 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Sun Aug 10 07:11:24 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <489BCA4D.3050704@yandex.ru> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org> <20080807121245.GA26629@eos.sc1.parodius.com> <489BCA4D.3050704@yandex.ru> Message-ID: <20080810071117.GA3857@eos.sc1.parodius.com> On Fri, Aug 08, 2008 at 08:23:41AM +0400, Andrey V. Elsukov wrote: > Jeremy Chadwick wrote: >> In almost every case I've looked at so far, the individuals' chipsets, >> disks, and overall setup are different. SMART statistics on the drives >> show absolutely no sign of errors, or anything that indicates a hardware >> failure. Many of the users are using AHCI as well (myself included, and >> I have seen the DMA error issue myself), which is more reliable than >> classic IDE. > > I have done some work on AHCI part of ATA driver and I am looking > for testers... > http://perforce.freebsd.org/changeList.cgi?CMD=changes&FSPC=//depot/user/butcher/src/... These look quite good. Regarding change 146184, do you know if this addresses the problems documented in PR 102211, PR 108924, or what I described in http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html ? >> It would be benefitial if there was some form of sysctl to increase the >> verbosity from the ATA subsystem when an error happens. 
The existing >> data we get back is terse, and barely useful. I know for a fact there's >> more debug information that could be output in such scenarios. And >> please do not reply with "good idea, send patches" unless you're wanting >> to be chewed out. :-) > > Ok, I'll try to add some verbose 'printfs' in my branch in perforce :) That'd be great. It appears to me, WRT FreeBSD, that error conditions do not bother to handle SATA-related errors; everything is assumed to be ATA, so the extra granularity SATA implements is not available on FreeBSD. This also starts to enter the realm of why FreeBSD does not implement support for NCQ -- is this because the ATA driver was built solely around ATA, rather than AHCI? Linux appears to have two different drivers depending upon if you're using AHCI or not. FreeBSD's ata(4) code seems to have everything intermixed/jumbled around, so it looks a lot like spaghetti... Is this the problem? >>> I'm going to do some analysis and find out whether I can find any of >>> our systems that may be experiencing ATA errors that don't correlate >>> with what their SMART data is saying. To date I haven't caught any, >>> but that's not to say they may not be happening... just that all of >>> the ones I have caught to date do appear to have been >>> hardware-related issues... > > IMHO. Today we have many hardware versions and revisions and some of > them are buggy. But another OSes (windows, linux) work with buggy > hardware without big problems. Yes, some developers have docs and can > make workarounds.. I think our ata driver needs new error handling > subsystem, which can correctly handle errors. Yep, I understand there is in fact bugs in consumer and commercial-grade hardware/firmwares. However, FreeBSD users will want to know if they're suffering from said bugs, or some other issue. 
I'm more than willing to document both scenarios (known buggy hardware and other bugs which are NOT the result of hardware flaws), but I (obviously) need data and example output for this. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From fbsd-fs at mawer.org Sun Aug 10 07:50:26 2008 From: fbsd-fs at mawer.org (Antony Mawer) Date: Sun Aug 10 07:50:33 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <20080810071117.GA3857@eos.sc1.parodius.com> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org> <20080807121245.GA26629@eos.sc1.parodius.com> <489BCA4D.3050704@yandex.ru> <20080810071117.GA3857@eos.sc1.parodius.com> Message-ID: <489E9DC1.4030802@mawer.org> On 10/08/2008 5:11 PM, Jeremy Chadwick wrote: > On Fri, Aug 08, 2008 at 08:23:41AM +0400, Andrey V. Elsukov wrote: >> Jeremy Chadwick wrote: >>> It would be benefitial if there was some form of sysctl to increase the >>> verbosity from the ATA subsystem when an error happens. The existing >>> data we get back is terse, and barely useful. I know for a fact there's >>> more debug information that could be output in such scenarios. And >>> please do not reply with "good idea, send patches" unless you're wanting >>> to be chewed out. :-) >> Ok, I'll try to add some verbose 'printfs' in my branch in perforce :) > > This also starts to enter the realm of why FreeBSD does not implement > support for NCQ -- is this because the ATA driver was built solely > around ATA, rather than AHCI? 
Linux appears to have two different > drivers depending upon if you're using AHCI or not. FreeBSD's ata(4) > code seems to have everything intermixed/jumbled around, so it looks a > lot like spaghetti... Is this the problem? My understanding of it is that the "legacy" style SATA support is modeled off ATA, while AHCI implements more SCSI-like features (like NCQ). With AHCI mode on Linux, I believe it uses the SCSI subsystem where the infrastructure for things like tagged queuing are available. I thought I heard Scott Long was looking at implementing a SATA subsystem based on CAM at one point, but I gather it succumbed to ENOTIME... --Antony From rmacklem at uoguelph.ca Sun Aug 10 20:58:55 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Sun Aug 10 20:59:03 2008 Subject: NFSv4 client and server for FreeBSD-current needs testing Message-ID: I just put my current nfs client and server port to FreeBSD-CURRENT (actually the June snapshot, since July wouldn't install on my hardware) up on the ftp site. I also updated the FreeBSD7 port to be at the same code level and it includes fixes for a few issues reported by Brooks Davis. If anyone is interested in trying it out, just go to: ftp://ftp.cis.uoguelph.ca/pub/nfsv4/FreeBSD-CURRENT I'll be trying the August snapshot very soon and then looking at a current kernel, to try and bring it right up-to-date. Have a good week, rick From bu7cher at yandex.ru Mon Aug 11 04:37:42 2008 From: bu7cher at yandex.ru (Andrey V. 
Elsukov) Date: Mon Aug 11 04:37:49 2008 Subject: zpool degraded - 'UNAVAIL cannot open' functioning drive In-Reply-To: <20080810071117.GA3857@eos.sc1.parodius.com> References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org> <20080807121245.GA26629@eos.sc1.parodius.com> <489BCA4D.3050704@yandex.ru> <20080810071117.GA3857@eos.sc1.parodius.com> Message-ID: <489FC20C.8040401@yandex.ru> Jeremy Chadwick wrote: > These look quite good. Regarding change 146184, do you know if this > addresses the problems documented in PR 102211, PR 108924, or what I > described in > http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html ? Currently I didn't work with ataraid on ICHx, but i tested hot plug on ICH9 with and without my changes. Without changes hot plug worked, but it did several "reiniting channel .." before a drive becomes online. With patched sources a drive is going online without several reinits. -- WBR, Andrey V. Elsukov From pjd at FreeBSD.org Mon Aug 11 07:22:15 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Aug 11 07:22:21 2008 Subject: ZFS on whole disk vs. slice vs. partition? In-Reply-To: <48902042.3030609@quip.cz> References: <48902042.3030609@quip.cz> Message-ID: <20080811072216.GB2766@garage.freebsd.pl> On Wed, Jul 30, 2008 at 10:03:14AM +0200, Miroslav Lachman wrote: > Hi all, > > I am preparing myself to next try with ZFS and I would like to know if > there are any recomendations / performance differences between using > whole disk device (ad0) or slice (ad0s2) or partition (ad0s1e). 
> > For example, if I have machine with 2 disks and I want to setup small > part of the disk gmirrored with UFS2 (/ + /usr) and the rest of space > for data on ZFS mirror - is it better to use ad0s1 + ad1s1 for gmirror > and ad0s2 + ad1s2 for ZFS mirror? Or is it better to use ad0s1e + ad1s1e > for ZFS mirror? > > Next example could be machine with 4 disks (1TB disks in RAIDZ / RAIDZ2 > as array for backups). It would be nice to user ad0 + ad1 + ad2 + ad3, > but then I cannot boot of it, so again - I can use small piece of each > disk as bootable UFS2 root with gmirror of 4 drives (first slice of each > disk - ad0s1, ad1s1, ad2s1, ad3s1) and the rest for ZFS. Or is there > significant reason not to split disks, use whole device for ZFS pool and > setup UFS2 root on some other media like CF card with CF 2 IDE convertor? > > Thanks for any useful informations, tips, trick, links etc. There should be no difference whatsoever performance-wise between using disks, slices or partitions on FreeBSD. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080811/b1008729/attachment.pgp From bugmaster at FreeBSD.org Mon Aug 11 11:06:57 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 11 11:07:36 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200808111106.m7BB6uPa047176@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t 7 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o bin/118249 fs mv(1): moving a directory changes its mtime o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file 9 problems total. From nejc at skoberne.net Mon Aug 11 22:08:46 2008 From: nejc at skoberne.net (=?ISO-8859-2?Q?Nejc_=A9koberne?=) Date: Mon Aug 11 22:08:53 2008 Subject: fchroot on unionfs Message-ID: <48A0B6E5.3000000@skoberne.net> Hi, I have a strange problem with Apache not seeing the lower layer of unionfs. 
Using ktrace on Apache I have written this C code:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd;
    ssize_t n;
    char buf[512];

    /* This is what Apache does */
    fd = open(".", O_RDONLY, 0);
    fchdir(fd);
    close(fd);

    /* This is how Apache calls open */
    fd = open("/etc/hosts", O_RDONLY, 0x1b6);
    if (fd < 0) {
        printf("error %d,%d\n", fd, errno);
        perror(NULL);
        exit(-1);
    }

    /* Terminate the buffer at the number of bytes actually read */
    n = read(fd, buf, 511);
    buf[n > 0 ? n : 0] = 0;
    printf("%s", buf);
    close(fd);
    return (0);
}

Without the fchdir() call, this program just displays the first 511 bytes
of /etc/hosts. With the fchdir() call and the preceding open(".", ...) call
in place, I get this:

root@web:~# ./a
No such file or directory
error -1,2

Also, if I list the /etc directory with a PHP script, I see only those
files in /etc which are on the upper layer of unionfs. I stumbled upon this
problem while trying to figure out why Apache can't resolve hostnames I
have defined in /etc/hosts.

I have this:

FreeBSD web.jail 7.0-STABLE FreeBSD 7.0-STABLE #5: Sun Aug 10 09:54:42 CEST 2008 root@server.domain.com:/usr/src/sys/amd64/compile/SERVER amd64

So I am running a jail on a unionfs. Everything works now (after MFCing the
unix sockets patch from HEAD to 7-STABLE; MySQL didn't work before that),
so I currently have only this problem.

I also tried to grep the Apache source code for fchdir, but the call seems
to be made implicitly somehow (grep returned no matches).

Thanks,
Nejc

From gw.freebsd at tnode.com Tue Aug 12 01:11:42 2008
From: gw.freebsd at tnode.com (GW)
Date: Tue Aug 12 01:12:12 2008
Subject: Unionfs move directory problems
Message-ID: <48A0DCBF.6050900@tnode.com>

Hi.

It seems that there are even more bugs in unionfs, at least in 7.0-STABLE,
than the "unionfs breakout to upper layer" problem submitted by Nejc...

Moving directories around in unionfs mounts isn't handled correctly, as can
be seen from the long example below. It behaves the same if there is no
"-o below" parameter.
Original hierarchy:

~# mkdir rw ro ro/orig
~# touch ro/orig/file ro/orig/file2
~# mount -t unionfs -o below ro rw
~/rw# ls -aFloWi
total 6
49874 drwx------ 3 root wheel - 512 Aug 12 01:32 ./
49872 drwx------ 4 root wheel - 512 Aug 12 01:20 ../
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 orig/
~/rw# ls -aFloWi orig/
total 4
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 ./
49874 drwx------ 3 root wheel - 512 Aug 12 01:32 ../
49876 -rw------- 1 root wheel - 0 Aug 12 01:20 file
49877 -rw------- 1 root wheel - 0 Aug 12 01:20 file2

Problem appears (lost directory contents):

~/rw# mv orig moved
~/rw# ls -aFloWi
total 6
49874 drwx------ 3 root wheel - 512 Aug 12 01:33 ./
49872 drwx------ 4 root wheel - 512 Aug 12 01:20 ../
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 moved/
0 w--------- 0 root wheel - 0 Jan 1 1970 orig%
~/rw# ls -aFloWi moved
total 4
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 ./
49874 drwx------ 3 root wheel - 512 Aug 12 01:33 ../
<< WTF!??? PROBLEM 1

And now some magic (content reappears):

~/rw# mv moved orig
~/rw# ls -aFloWi
total 6
49874 drwx------ 3 root wheel - 512 Aug 12 01:33 ./
49872 drwx------ 4 root wheel - 512 Aug 12 01:20 ../
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 orig/
~/rw# ls -aFloWi orig
total 4
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 ./
49874 drwx------ 3 root wheel - 512 Aug 12 01:33 ../
49876 -rw------- 1 root wheel - 0 Aug 12 01:20 file
49877 -rw------- 1 root wheel - 0 Aug 12 01:20 file2

Hm, let's test something else (it creates in the upper layer):

~/rw# mkdir new
~/rw# touch new/newfile
~/rw# ls -aFloWi
total 8
49874 drwx------ 4 root wheel - 512 Aug 12 02:04 ./
49872 drwx------ 4 root wheel - 512 Aug 12 02:03 ../
49879 drwx------ 2 root wheel - 512 Aug 12 02:04 new/
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 orig/
~/rw# ls -aFloWi new
total 4
49879 drwx------ 2 root wheel - 512 Aug 12 02:04 ./
49874 drwx------ 4 root wheel - 512 Aug 12 02:04 ../
49880 -rw------- 1 root wheel - 0 Aug 12 02:04 newfile
~/rw# ls -aFloWi orig
total 4
49878 drwx------ 2 root wheel - 512 Aug 12 01:20 ./
49874 drwx------ 4 root wheel - 512 Aug 12 02:04 ../
49876 -rw------- 1 root wheel - 0 Aug 12 01:20 file
49877 -rw------- 1 root wheel - 0 Aug 12 01:20 file2

And let's see if there is another problem:

~/rw# rm -rf orig
~/rw# ls -aFloWi
total 6
49874 drwx------ 3 root wheel - 512 Aug 12 02:06 ./
49872 drwx------ 4 root wheel - 512 Aug 12 02:03 ../
49879 drwx------ 2 root wheel - 512 Aug 12 02:06 new/
0 w--------- 0 root wheel - 0 Jan 1 1970 orig%
~/rw# mv new orig
~/rw# ls -aFloWi
total 6
49874 drwx------ 3 root wheel - 512 Aug 12 02:07 ./
49872 drwx------ 4 root wheel - 512 Aug 12 02:03 ../
49879 drwx------ 2 root wheel - 512 Aug 12 02:06 orig/

Looks fine so far, but (files appear from nowhere):

~/rw# ls -aFloWi orig
total 4
49879 drwx------ 2 root wheel - 512 Aug 12 02:06 ./
49874 drwx------ 3 root wheel - 512 Aug 12 02:07 ../
49876 -rw------- 1 root wheel - 0 Aug 12 01:20 file
49877 -rw------- 1 root wheel - 0 Aug 12 01:20 file2
49880 -rw------- 1 root wheel - 0 Aug 12 02:05 newfile
<< WTF!??? PROBLEM 2

I see two possible solutions to the first problem:
- on directory rename/move the whole file hierarchy should be exactly
  duplicated under a new directory tree
- if both layers exist on the same partition, inodes of files can be reused
  as is done with hard links, but the directory structure needs to be
  duplicated so that it has correct parent inodes

For the second problem:
- in all cases (option whiteout=always or whenneeded) when new directories
  get created unionfs should check whether such a directory exists in the
  lower layer and automatically whiteout all of its entries

Does anyone have any better ideas?

I am using:

~# uname -a
FreeBSD server.domain.com 7.0-STABLE FreeBSD 7.0-STABLE #5: Sun Aug 10 09:54:42 CEST 2008 root@server.domain.com:/usr/src/sys/amd64/compile/SERVER amd64

Is it already fixed in HEAD?
Unrelated:

Will the newest patches for unionfs from HEAD ever appear in 7-STABLE, or do I need to backport them?

Is there, or will there be, a way to separately access any layer in a unionfs mount (without using nullfs)? I made a script to detect duplicates, but in the current design (it could be done differently) you need to unmount the layers before you can use it.

Are whiteouts and the interface to manage them (FTS or something similar) specific to FreeBSD, or does it look like other OSes that have something like unionfs (e.g. Linux) will adopt this sooner or later?

Cheers,
gw

From gw.freebsd at tnode.com Tue Aug 12 01:47:20 2008
From: gw.freebsd at tnode.com (GW)
Date: Tue Aug 12 01:47:27 2008
Subject: Unionfs move directory problems
In-Reply-To: <48A0DCBF.6050900@tnode.com>
References: <48A0DCBF.6050900@tnode.com>
Message-ID: <48A0EAEF.6060802@tnode.com>

GW wrote:
> For the second problem:
> - in all cases (option whiteout=always or whenneeded) when new
> directories get created unionfs should check whether such a directory
> exists in the lower layer and automatically whiteout all of its entries

- or better (cannot work for more than 2 layers): copy all files to the upper layer in the moved directory and set the opaque flag, so that underlying files under the new directory name can't be seen through

This is the way it is done at the end of the following scenario...
First, a slightly different PROBLEM 1 again:

~# mkdir ro ro/foo ro/foo/moveme rw
~# touch ro/foo/moveme/file1
~# touch ro/foo/moveme/file2
~# ls -aFloWi orig
~# mount -t unionfs -o below ro rw
~# cd rw
~/rw# ls -aFloWi
total 6
49876 drwx------ 3 root wheel - 512 Aug 12 03:10 ./
49872 drwx------ 4 root wheel - 512 Aug 12 03:09 ../
49879 drwx------ 2 root wheel - 512 Aug 12 03:09 foo/
~/rw# ls -aFloWi foo/
total 6
49879 drwx------ 2 root wheel - 512 Aug 12 03:09 ./
49876 drwx------ 3 root wheel - 512 Aug 12 03:10 ../
49880 drwx------ 2 root wheel - 512 Aug 12 03:10 moveme/
~/rw# ls -aFloWi foo/moveme/
total 4
49880 drwx------ 2 root wheel - 512 Aug 12 03:10 ./
49879 drwx------ 3 root wheel - 512 Aug 12 03:10 ../
49877 -rw------- 1 root wheel - 0 Aug 12 03:10 file1
49878 -rw------- 1 root wheel - 0 Aug 12 03:10 file2
~/rw# mv foo/moveme/ .
~/rw# ls -aFloWi
total 8
49876 drwx------ 4 root wheel - 512 Aug 12 03:11 ./
49872 drwx------ 4 root wheel - 512 Aug 12 03:09 ../
49879 drwx------ 2 root wheel - 512 Aug 12 03:10 foo/
49880 drwx------ 2 root wheel - 512 Aug 12 03:10 moveme/
~/rw# ls -aFloWi foo
total 4
49879 drwx------ 2 root wheel - 512 Aug 12 03:10 ./
49876 drwx------ 4 root wheel - 512 Aug 12 03:11 ../
0 w--------- 0 root wheel - 0 Jan 1 1970 moveme%
~/rw# ls -aFloWi moveme/
total 4
49880 drwx------ 2 root wheel - 512 Aug 12 03:10 ./
49876 drwx------ 4 root wheel - 512 Aug 12 03:11 ../
<< again PROBLEM 1, nothing new

And now behaviour that is a good solution for PROBLEM 2:

~/rw# mkdir foo/moveme
~/rw# ls -aFloWi foo/
total 6
49879 drwx------ 3 root wheel - 512 Aug 12 03:14 ./
49876 drwx------ 4 root wheel - 512 Aug 12 03:11 ../
49881 drwx------ 2 root wheel opaque 512 Aug 12 03:14 moveme/
~/rw# ls -aFloWi foo/moveme/
total 4
49881 drwx------ 2 root wheel opaque 512 Aug 12 03:14 ./
49879 drwx------ 3 root wheel - 512 Aug 12 03:14 ../

To prove that the opaque flag did the trick:

~/rw# chflags noopaque foo/moveme/
~/rw# ls -aFloWi foo/moveme/
total 4
49881 drwx------ 2 root wheel - 512 Aug 12 03:14 ./
49879 drwx------ 3 root wheel - 512 Aug 12 03:14 ../
49877 -rw------- 1 root wheel - 0 Aug 12 03:10 file1
49878 -rw------- 1 root wheel - 0 Aug 12 03:10 file2

gw

From dchagin at freebsd.org Tue Aug 12 18:44:03 2008
From: dchagin at freebsd.org (Chagin Dmitry)
Date: Tue Aug 12 18:44:12 2008
Subject: new file system (my experiments)
Message-ID: <20080812182028.GA7047@dchagin.dialup.corbina.ru>

Hi

I am experimenting with implementing a new file system based on tmpfs: shmfs for the Linux emulation layer. To begin with, I simply copied the current source code to compat/lintmpfs, compiled a module and tried to mount it:

mount -t lintmpfs lintmpfs /compat/linux/dev/shm

and got a panic:

#11 0xffffffff803b6ade in calltrap ()
    at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/exception.S:217
---Type <return> to continue, or q <return> to quit---
#12 0xffffffff802a794b in vfs_filteropt (opts=0x0, legal=0xffffffff808497e0)
    at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812
#13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000, td=0xffffff004dcb56c0)
    at /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpfs/lintmpfs_vfsops.c:206
#14 0xffffffff802a947f in vfs_donmount (td=0xffffff004dcb56c0, fsflags=0, fsoptions=0xffffff000190e800)
    at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1010
#15 0xffffffff802aaa46 in nmount (td=0xffffff004dcb56c0, uap=0xfffffffe7e7fcbf0)
    at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:417
#16 0xffffffff803d4e47 in syscall (frame=0xfffffffe7e7fcc80)
    at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/trap.c:902
#17 0xffffffff803b6ceb in Xfast_syscall ()

(kgdb) f 12
#12 0xffffffff802a794b in vfs_filteropt (opts=0x0, legal=0xffffffff808497e0)
    at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812
1812    {
(kgdb) list *0xffffffff802a794b
0xffffffff802a794b is in vfs_filteropt (/usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1818).
1813            struct vfsopt *opt;
1814            char errmsg[255];
1815            const char **t, *p, *q;
1816            int ret = 0;
1817
1818            TAILQ_FOREACH(opt, opts, link) {
1819                    p = opt->name;
1820                    q = NULL;
1821                    if (p[0] == 'n' && p[1] == 'o')
1822                            q = p + 2;
(kgdb)
(kgdb) up
#13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000, td=0xffffff004dcb56c0)
    at /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpfs/lintmpfs_vfsops.c:206
206             if (vfs_filteropt(mp->mnt_optnew, lintmpfs_opts))
(kgdb)

The problem is that mp->mnt_optnew is 0, although tmpfs itself works correctly.
I cannot understand what I have missed...

--
Have fun!
chd

From kris at FreeBSD.org Tue Aug 12 18:50:16 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Tue Aug 12 18:50:23 2008
Subject: new file system (my experiments)
In-Reply-To: <20080812182028.GA7047@dchagin.dialup.corbina.ru>
References: <20080812182028.GA7047@dchagin.dialup.corbina.ru>
Message-ID: <48A1DB63.3050804@FreeBSD.org>

Chagin Dmitry wrote:
> Hi
>
> I experiment resalization of new file system based on tmpfs - shmfs for
> Linux emulation layer. for the beginning has simply copied current source
> codes to compat/lintmps, has compiled a module and tried to mount:
>
> mount -t lintmpfs lintmpfs /compat/linux/dev/shm
> and has received a panic:
>
> #11 0xffffffff803b6ade in calltrap ()
> at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/exception.S:217
> ---Type <return> to continue, or q <return> to quit---
> #12 0xffffffff802a794b in vfs_filteropt (opts=0x0, legal=0xffffffff808497e0)
> at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812
> #13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000,
> td=0xffffff004dcb56c0)
> at /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpf
> s/lintmpfs_vfsops.c:206
> #14 0xffffffff802a947f in vfs_donmount (td=0xffffff004dcb56c0, fsflags=0,
> fsoptions=0xffffff000190e800)
> at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1010
> #15 0xffffffff802aaa46 in nmount (td=0xffffff004dcb56c0,
> uap=0xfffffffe7e7fcbf0)
> at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:417
> #16 0xffffffff803d4e47 in syscall (frame=0xfffffffe7e7fcc80)
> at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/trap.c:902
> #17 0xffffffff803b6ceb in Xfast_syscall ()
>
> (kgdb) f 12
> #12 0xffffffff802a794b in vfs_filteropt (opts=0x0, legal=0xffffffff808497e0)
> at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812
> 1812 {
> (kgdb) list *0xffffffff802a794b
> 0xffffffff802a794b is in vfs_filteropt (/usr/local/root/pub/linux_shmfs/sys/kern
> /vfs_mount.c:1818).
> 1813 struct vfsopt *opt;
> 1814 char errmsg[255];
> 1815 const char **t, *p, *q;
> 1816 int ret = 0;
> 1817
> 1818 TAILQ_FOREACH(opt, opts, link) {
> 1819 p = opt->name;
> 1820 q = NULL;
> 1821 if (p[0] == 'n' && p[1] == 'o')
> 1822 q = p + 2;
> (kgdb)
> (kgdb) up
> #13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000,
> td=0xffffff004dcb56c0)
> at /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpf
> s/lintmpfs_vfsops.c:206
> 206 if (vfs_filteropt(mp->mnt_optnew, lintmpfs_opts))
> (kgdb)
>
> Problem in that mp->mnt_optnew is 0, but tmpfs works correctly.
> I shall not understand that I have missied...
>

If you have DEBUG_LOCKS and/or DEBUG_VFS_LOCKS in your kernel, then one of them changes the kernel ABI (it adds entries to structs somewhere). You need to either add them to the module's CFLAGS or build the module with make buildkernel.

Kris

From md at hudora.de Tue Aug 12 19:02:05 2008
From: md at hudora.de (Maximillian Dornseif)
Date: Tue Aug 12 19:02:13 2008
Subject: Strange (?) hangs with ZFS/rsync.
In-Reply-To:
References: <18880785.post@talk.nabble.com>
Message-ID: <18950555.post@talk.nabble.com>

Ivan Voras-7 wrote:
>
> These look like well known problems with ZFS (see
> http://wiki.freebsd.org/ZFSKnownProblems).
>

To be frank, I do not see my problem described there. Rsync from remote machines does not count as "Heavy IO activity in multithreaded applications", does it?

In the meantime, the two ZFS machines both crash whenever I try
zfs delete tank/snapshots@2008

Do I understand correctly that there is little interest in supporting the 7.x port and that most energy is put into the 8.x port?

--md

--
View this message in context: http://www.nabble.com/Strange-%28-%29-hangs-with-ZFS-rsync.-tp18880785p18950555.html
Sent from the freebsd-fs mailing list archive at Nabble.com.
From dchagin at freebsd.org Tue Aug 12 20:50:51 2008 From: dchagin at freebsd.org (Chagin Dmitry) Date: Tue Aug 12 20:50:58 2008 Subject: new file system (my experiments) In-Reply-To: <48A1DB63.3050804@FreeBSD.org> References: <20080812182028.GA7047@dchagin.dialup.corbina.ru> <48A1DB63.3050804@FreeBSD.org> Message-ID: <20080812205043.GA9233@dchagin.dialup.corbina.ru> On Tue, Aug 12, 2008 at 08:50:11PM +0200, Kris Kennaway wrote: > Chagin Dmitry wrote: > >Hi > > > >I experiment resalization of new file system based on tmpfs - shmfs for > >Linux emulation layer. for the beginning has simply copied current source > >codes to compat/lintmps, has compiled a module and tried to mount: > > > >mount -t lintmpfs lintmpfs /compat/linux/dev/shm > >and has received a panic: > > > >#11 0xffffffff803b6ade in calltrap () > > at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/exception.S:217 > >---Type to continue, or q to quit--- > >#12 0xffffffff802a794b in vfs_filteropt (opts=0x0, > >legal=0xffffffff808497e0) > > at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812 > >#13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000, > > td=0xffffff004dcb56c0) > > at > > /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpf > > s/lintmpfs_vfsops.c:206 > >#14 0xffffffff802a947f in vfs_donmount (td=0xffffff004dcb56c0, fsflags=0, > > fsoptions=0xffffff000190e800) > > at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1010 > >#15 0xffffffff802aaa46 in nmount (td=0xffffff004dcb56c0, > > uap=0xfffffffe7e7fcbf0) > > at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:417 > >#16 0xffffffff803d4e47 in syscall (frame=0xfffffffe7e7fcc80) > > at /usr/local/root/pub/linux_shmfs/sys/amd64/amd64/trap.c:902 > >#17 0xffffffff803b6ceb in Xfast_syscall () > > > >(kgdb) f 12 > >#12 0xffffffff802a794b in vfs_filteropt (opts=0x0, > >legal=0xffffffff808497e0) > > at /usr/local/root/pub/linux_shmfs/sys/kern/vfs_mount.c:1812 > >1812 { > >(kgdb) list *0xffffffff802a794b > 
>0xffffffff802a794b is in vfs_filteropt > >(/usr/local/root/pub/linux_shmfs/sys/kern > >/vfs_mount.c:1818). > >1813 struct vfsopt *opt; > >1814 char errmsg[255]; > >1815 const char **t, *p, *q; > >1816 int ret = 0; > >1817 > >1818 TAILQ_FOREACH(opt, opts, link) { > >1819 p = opt->name; > >1820 q = NULL; > >1821 if (p[0] == 'n' && p[1] == 'o') > >1822 q = p + 2; > >(kgdb) > >(kgdb) up > >#13 0xffffffff80847015 in tmpfs_mount (mp=0xffffff0001a3a000, > > td=0xffffff004dcb56c0) > > at > > /usr/local/root/pub/linux_shmfs/sys/modules/lintmpfs/../../compat/lintmpf > >s/lintmpfs_vfsops.c:206 > >206 if (vfs_filteropt(mp->mnt_optnew, lintmpfs_opts)) > >(kgdb) > > > >Problem in that mp->mnt_optnew is 0, but tmpfs works correctly. > >I shall not understand that I have missied... > > > > If you have DEBUG_LOCKS and/or DEBUG_VFS_LOCKS then one of them changes > the kernel ABI (adds entries to structs somewhere). You need to either > add them to the module CFLAGS or use make buildkernel. > yes, it has helped. thnx! -- Have fun! chd From matt at corp.spry.com Tue Aug 12 23:58:24 2008 From: matt at corp.spry.com (Matt Simerson) Date: Tue Aug 12 23:58:41 2008 Subject: ZFS patch against todays -HEAD Message-ID: <863C8170-8DCB-4BBD-9E18-CD03D59BC129@corp.spry.com> I applied the ZFS patch from http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 to a copy of -HEAD checked out today and it fails as shown below. Any pointers on what needs to be tweaked to get past that? 
Matt cvsup /usr/local/etc/cvsup-head cd /usr/src patch -p0 < ~matt/zfs/zfs_20080727.patch rm /usr/src/sys/cddl/compat/opensolaris/sys/acl.h rm /usr/src/sys/cddl/compat/opensolaris/sys/callb.h cd /usr/src && make buildkernel KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common - finline-limit=8000 --param inline-unit-growth=100 --param large- function-growth=1000 -fno-omit-frame-pointer -mcmodel=kernel -mno-red- zone -mfpmath=387 -mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow - msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack- protector -Werror /usr/src/sys/kern/kern_ntptime.c cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -std=c99 -g - Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing- prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer- sign -fformat-extensions -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/ contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit- growth=100 --param large-function-growth=1000 -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2 -mno- sse3 -mno-mmx -mno-3dnow -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -Werror /usr/src/sys/kern/kern_osd.c /usr/src/sys/kern/kern_osd.c:345: error: redefinition of 'M_OSD' /usr/src/sys/kern/kern_osd.c:44: error: previous definition of 'M_OSD' was here /usr/src/sys/kern/kern_osd.c:345: error: redefinition of 'M_OSD_init_sys_init' /usr/src/sys/kern/kern_osd.c:44: error: previous definition of 'M_OSD_init_sys_init' was here /usr/src/sys/kern/kern_osd.c:345: error: redefinition of '__set_sysinit_set_sym_M_OSD_init_sys_init' /usr/src/sys/kern/kern_osd.c:44: error: previous definition of '__set_sysinit_set_sym_M_OSD_init_sys_init' was here /usr/src/sys/kern/kern_osd.c:345: error: redefinition of 'M_OSD_uninit_sys_uninit' /usr/src/sys/kern/kern_osd.c:44: error: previous definition of 
'M_OSD_uninit_sys_uninit' was here /usr/src/sys/kern/kern_osd.c:345: error: redefinition of '__set_sysuninit_set_sym_M_OSD_uninit_sys_uninit' /usr/src/sys/kern/kern_osd.c:44: error: previous definition of '__set_sysuninit_set_sym_M_OSD_uninit_sys_uninit' was here /usr/src/sys/kern/kern_osd.c:347: error: redefinition of 'osd_debug' /usr/src/sys/kern/kern_osd.c:46: error: previous definition of 'osd_debug' was here /usr/src/sys/kern/kern_osd.c:349: error: redefinition of 'sysctl___debug_osd' /usr/src/sys/kern/kern_osd.c:48: error: previous definition of 'sysctl___debug_osd' was here /usr/src/sys/kern/kern_osd.c:349: error: redefinition of '__set_sysctl_set_sym_sysctl___debug_osd' /usr/src/sys/kern/kern_osd.c:48: error: previous definition of '__set_sysctl_set_sym_sysctl___debug_osd' was here /usr/src/sys/kern/kern_osd.c:362: error: conflicting types for 'osd_list' /usr/src/sys/kern/kern_osd.c:61: error: previous declaration of 'osd_list' was here cc1: warnings being treated as errors /usr/src/sys/kern/kern_osd.c:363: warning: redundant redeclaration of 'osd_destructors' /usr/src/sys/kern/kern_osd.c:62: warning: previous declaration of 'osd_destructors' was here /usr/src/sys/kern/kern_osd.c:364: warning: redundant redeclaration of 'osd_nslots' /usr/src/sys/kern/kern_osd.c:63: warning: previous declaration of 'osd_nslots' was here /usr/src/sys/kern/kern_osd.c:365: warning: redundant redeclaration of 'osd_lock' /usr/src/sys/kern/kern_osd.c:64: warning: previous declaration of 'osd_lock' was here /usr/src/sys/kern/kern_osd.c:369: error: redefinition of 'osd_default_destructor' /usr/src/sys/kern/kern_osd.c:68: error: previous definition of 'osd_default_destructor' was here /usr/src/sys/kern/kern_osd.c:375: error: redefinition of 'osd_register' /usr/src/sys/kern/kern_osd.c:74: error: previous definition of 'osd_register' was here /usr/src/sys/kern/kern_osd.c:422: error: redefinition of 'osd_deregister' /usr/src/sys/kern/kern_osd.c:121: error: previous definition of 
'osd_deregister' was here /usr/src/sys/kern/kern_osd.c:461: error: redefinition of 'osd_set' /usr/src/sys/kern/kern_osd.c:160: error: previous definition of 'osd_set' was here /usr/src/sys/kern/kern_osd.c:503: error: redefinition of 'osd_get' /usr/src/sys/kern/kern_osd.c:202: error: previous definition of 'osd_get' was here /usr/src/sys/kern/kern_osd.c:521: error: redefinition of 'osd_del' /usr/src/sys/kern/kern_osd.c:220: error: previous definition of 'osd_del' was here /usr/src/sys/kern/kern_osd.c:572: error: redefinition of 'osd_exit' /usr/src/sys/kern/kern_osd.c:271: error: previous definition of 'osd_exit' was here /usr/src/sys/kern/kern_osd.c:592: error: redefinition of 'osd_init' /usr/src/sys/kern/kern_osd.c:291: error: previous definition of 'osd_init' was here /usr/src/sys/kern/kern_osd.c:602: error: redefinition of 'osd_sys_init' /usr/src/sys/kern/kern_osd.c:301: error: previous definition of 'osd_sys_init' was here /usr/src/sys/kern/kern_osd.c:602: error: redefinition of '__set_sysinit_set_sym_osd_sys_init' /usr/src/sys/kern/kern_osd.c:301: error: previous definition of '__set_sysinit_set_sym_osd_sys_init' was here /usr/src/sys/kern/kern_osd.c:646: error: redefinition of 'M_OSD' /usr/src/sys/kern/kern_osd.c:345: error: previous definition of 'M_OSD' was here /usr/src/sys/kern/kern_osd.c:646: error: redefinition of 'M_OSD_init_sys_init' /usr/src/sys/kern/kern_osd.c:345: error: previous definition of 'M_OSD_init_sys_init' was here /usr/src/sys/kern/kern_osd.c:646: error: redefinition of '__set_sysinit_set_sym_M_OSD_init_sys_init' /usr/src/sys/kern/kern_osd.c:345: error: previous definition of '__set_sysinit_set_sym_M_OSD_init_sys_init' was here /usr/src/sys/kern/kern_osd.c:646: error: redefinition of 'M_OSD_uninit_sys_uninit' /usr/src/sys/kern/kern_osd.c:345: error: previous definition of 'M_OSD_uninit_sys_uninit' was here /usr/src/sys/kern/kern_osd.c:646: error: redefinition of '__set_sysuninit_set_sym_M_OSD_uninit_sys_uninit' 
/usr/src/sys/kern/kern_osd.c:345: error: previous definition of '__set_sysuninit_set_sym_M_OSD_uninit_sys_uninit' was here /usr/src/sys/kern/kern_osd.c:648: error: redefinition of 'osd_debug' /usr/src/sys/kern/kern_osd.c:347: error: previous definition of 'osd_debug' was here /usr/src/sys/kern/kern_osd.c:650: error: redefinition of 'sysctl___debug_osd' /usr/src/sys/kern/kern_osd.c:349: error: previous definition of 'sysctl___debug_osd' was here /usr/src/sys/kern/kern_osd.c:650: error: redefinition of '__set_sysctl_set_sym_sysctl___debug_osd' /usr/src/sys/kern/kern_osd.c:349: error: previous definition of '__set_sysctl_set_sym_sysctl___debug_osd' was here /usr/src/sys/kern/kern_osd.c:663: error: conflicting types for 'osd_list' /usr/src/sys/kern/kern_osd.c:362: error: previous declaration of 'osd_list' was here /usr/src/sys/kern/kern_osd.c:664: warning: redundant redeclaration of 'osd_destructors' /usr/src/sys/kern/kern_osd.c:363: warning: previous declaration of 'osd_destructors' was here /usr/src/sys/kern/kern_osd.c:665: warning: redundant redeclaration of 'osd_nslots' /usr/src/sys/kern/kern_osd.c:364: warning: previous declaration of 'osd_nslots' was here /usr/src/sys/kern/kern_osd.c:666: warning: redundant redeclaration of 'osd_lock' /usr/src/sys/kern/kern_osd.c:365: warning: previous declaration of 'osd_lock' was here /usr/src/sys/kern/kern_osd.c:670: error: redefinition of 'osd_default_destructor' /usr/src/sys/kern/kern_osd.c:369: error: previous definition of 'osd_default_destructor' was here /usr/src/sys/kern/kern_osd.c:676: error: redefinition of 'osd_register' /usr/src/sys/kern/kern_osd.c:74: error: previous definition of 'osd_register' was here /usr/src/sys/kern/kern_osd.c:723: error: redefinition of 'osd_deregister' /usr/src/sys/kern/kern_osd.c:121: error: previous definition of 'osd_deregister' was here /usr/src/sys/kern/kern_osd.c:762: error: redefinition of 'osd_set' /usr/src/sys/kern/kern_osd.c:160: error: previous definition of 'osd_set' was here 
/usr/src/sys/kern/kern_osd.c:804: error: redefinition of 'osd_get' /usr/src/sys/kern/kern_osd.c:202: error: previous definition of 'osd_get' was here /usr/src/sys/kern/kern_osd.c:822: error: redefinition of 'osd_del' /usr/src/sys/kern/kern_osd.c:220: error: previous definition of 'osd_del' was here /usr/src/sys/kern/kern_osd.c:873: error: redefinition of 'osd_exit' /usr/src/sys/kern/kern_osd.c:271: error: previous definition of 'osd_exit' was here /usr/src/sys/kern/kern_osd.c:893: error: redefinition of 'osd_init' /usr/src/sys/kern/kern_osd.c:592: error: previous definition of 'osd_init' was here /usr/src/sys/kern/kern_osd.c:903: error: redefinition of 'osd_sys_init' /usr/src/sys/kern/kern_osd.c:602: error: previous definition of 'osd_sys_init' was here /usr/src/sys/kern/kern_osd.c:903: error: redefinition of '__set_sysinit_set_sym_osd_sys_init' /usr/src/sys/kern/kern_osd.c:602: error: previous definition of '__set_sysinit_set_sym_osd_sys_init' was here *** Error code 1 Stop in /usr/obj/usr/src/sys/GENERIC. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. From gary.jennejohn at freenet.de Wed Aug 13 09:15:25 2008 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Wed Aug 13 09:15:31 2008 Subject: ZFS patch against todays -HEAD In-Reply-To: <863C8170-8DCB-4BBD-9E18-CD03D59BC129@corp.spry.com> References: <863C8170-8DCB-4BBD-9E18-CD03D59BC129@corp.spry.com> Message-ID: <20080813111520.7c508734@peedub.jennejohn.org> On Tue, 12 Aug 2008 16:58:04 -0700 Matt Simerson wrote: > > I applied the ZFS patch from http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > to a copy of -HEAD checked out today and it fails as shown below. > > Any pointers on what needs to be tweaked to get past that? 
> > Matt > > cvsup /usr/local/etc/cvsup-head > cd /usr/src > patch -p0 < ~matt/zfs/zfs_20080727.patch > rm /usr/src/sys/cddl/compat/opensolaris/sys/acl.h > rm /usr/src/sys/cddl/compat/opensolaris/sys/callb.h > cd /usr/src && make buildkernel > > > [error messages snip] I just applied it using ``patch -p0 -E'' and had no problem building a kernel. --- Gary Jennejohn From lists at jnielsen.net Wed Aug 13 17:48:30 2008 From: lists at jnielsen.net (John Nielsen) Date: Wed Aug 13 17:48:48 2008 Subject: ZFS patches. In-Reply-To: <200807281139.45771.lists@jnielsen.net> References: <20080727125413.GG1345@garage.freebsd.pl> <200807281139.45771.lists@jnielsen.net> Message-ID: <200808131348.57683.lists@jnielsen.net> On Monday 28 July 2008, John Nielsen wrote: > On Monday 28 July 2008, David Grochowski wrote: > > Hey, > > > > On Sun, Jul 27, 2008 at 11:24 PM, Adam McDougall > > wrote: > > > On Sun, Jul 27, 2008 at 02:54:13PM +0200, Pawel Jakub Dawidek wrote: > > > > Hi. > > > > > > > > > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > > > Stop in /usr/src. > > > > I had the same issue. Try deleting > > "/usr/src/sys/cddl/compat/opensolaris/sys/acl.h" and > > "/usr/src/sys/cddl/compat/opensolaris/sys/callb.h" (make sure that > > these files have a length of zero first!). When patching, these files > > are supposed to be deleted, but were instead left as empty files. > > Since these files are included before the actual ones in > > "/usr/src/sys/cddl/contrib/opensolaris/uts/common/sys", this will > > cause a problem. > > > > Also, I would like to note that the patch has been working for me > > without any problems. > > Thanks for pointing this out David, I had been scratching my head too. > (Also thanks to those who posted reminders to use patch -p0). > > I'm now up and running with the patch and an upgraded zpool. No issues > thus far. I even tried to reproduce the UDP NFS write lockup issue I > reported recently and was unable to. Thanks PJD! 
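The manual cleanup described in the quoted text (deleting headers that patch left behind as empty files) can be scripted. This is a minimal sketch, assuming a POSIX sh and find; the function names list_empty_files and remove_empty_files are hypothetical and not part of the patch or of FreeBSD. Listing first and deleting second is deliberate, since a source tree may contain files that are meant to be empty.

```shell
#!/bin/sh
# list_empty_files DIR: print every zero-length regular file below DIR.
# (find's "-size 0" matches only files whose size rounds up to 0 blocks,
# i.e. empty files.)
list_empty_files() {
    find "$1" -type f -size 0 -print
}

# remove_empty_files DIR: delete those files, printing each one as it goes.
# Review the output of list_empty_files before running this.
remove_empty_files() {
    find "$1" -type f -size 0 -print -exec rm -f {} \;
}
```

For the case in this thread one would run "list_empty_files /usr/src/sys/cddl/compat/opensolaris/sys", check that only acl.h and callb.h show up, and then run remove_empty_files on the same directory.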
I experienced a couple of panics yesterday while working with some video files. The panics didn't happen until after an hour or two of sustained activity (heavy reading and writing to/from multiple files about 2GB in size). The panic message (most recent) probably looks familiar:

panic: kmem_malloc(65536): kmem_map too small: 757669888 total allocated

This is on an i386 8-CURRENT box w/ the recent ZFS mega-patch applied:

%uname -a
FreeBSD stealth.jnielsen.net 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Mon Jul 28 09:17:49 EDT 2008 john@stealth.jnielsen.net:/usr/obj/usr/src8/src/sys/STEALTH i386
%zfs upgrade
This system is currently running ZFS filesystem version 3.

All filesystems are formatted with the current version.
%zpool upgrade
This system is currently running ZFS pool version 11.

All pools are formatted using this version.

The box has 1.25 GB RAM. The kernel is compiled with KVA_PAGES=384, and vm.kmem_size and vm.kmem_size_max are set to 768M. Since the last panic I have set vfs.zfs.arc_max to 160M and I haven't gotten another one, but I haven't had the same sustained activity since then either. I'll keep an eye on it.

Just thought I'd send a report since I'm not sure if this is still expected behavior with the new patch. I am of course also open to tuning suggestions, though I have read the wiki and kept up on the mailing lists and am willing to experiment to see what ends up working best for this system.

Thanks,

JN

From matt at corp.spry.com Wed Aug 13 21:46:25 2008
From: matt at corp.spry.com (Matt Simerson)
Date: Wed Aug 13 21:46:31 2008
Subject: ZFS patch against todays -HEAD - resolved
In-Reply-To: <20080813111520.7c508734@peedub.jennejohn.org>
References: <863C8170-8DCB-4BBD-9E18-CD03D59BC129@corp.spry.com> <20080813111520.7c508734@peedub.jennejohn.org>
Message-ID:

When I do a cvsup of -HEAD, cvs checks out many of the patched files anew but not all of them. In this case, several files were left alone, including sys/kern/kern_osd.c, which had its own contents duplicated a number of times, explaining the duplicate definition errors. The quantity of duplicated contents corresponded with the number of times I had run the patch -p0 command.

So, to get a kernel built with the patch applied, I needed to:

cd /usr/src
rm -rf kern cddl sys/cddl sys/kern
cvsup /usr/local/etc/cvsup-head
patch -p0 -E < ~matt/zfs/zfs_20080727.patch
make buildkernel

Matt

On Aug 13, 2008, at 2:15 AM, Gary Jennejohn wrote:

> On Tue, 12 Aug 2008 16:58:04 -0700
> Matt Simerson wrote:
>
>> I applied the ZFS patch from http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>> to a copy of -HEAD checked out today and it fails as shown below.
>>
>> Any pointers on what needs to be tweaked to get past that?
>>
>> Matt
>>
>> cvsup /usr/local/etc/cvsup-head
>> cd /usr/src
>> patch -p0 < ~matt/zfs/zfs_20080727.patch
>> rm /usr/src/sys/cddl/compat/opensolaris/sys/acl.h
>> rm /usr/src/sys/cddl/compat/opensolaris/sys/callb.h
>> cd /usr/src && make buildkernel
>>
>>
>>
> [error messages snip]
>
> I just applied it using ``patch -p0 -E'' and had no problem building a
> kernel.
>
> ---
> Gary Jennejohn

From timbob at bigpond.com Thu Aug 14 01:11:03 2008
From: timbob at bigpond.com (Timothy Bourke)
Date: Thu Aug 14 01:11:10 2008
Subject: msdosfs for an iriver x20
Message-ID: <20080814005210.GB1057@triptrop.cse.unsw.EDU.AU>

The iriver x20 portable media player in MSC mode is detected by the umass driver, but the internal flash memory cannot be mounted (7.6GB FAT32 filesystem on an unsliced disk) under 6.3-RELEASE.

The msdos file system routines detect:
pmp->pm_SecPerTrack=64 (0x40)

The patch below fixes the problem. It looks like HEAD contains more general improvements that should also work, but RELENG_6 does not. Would it be worth MFCing the new changes or committing the attached patch before the 6.4 release?

Tim.

--- sys/fs/msdosfs/msdosfs_vfsops.c.orig	2008-08-14 09:43:06.000000000 +1000
+++ sys/fs/msdosfs/msdosfs_vfsops.c	2008-08-14 09:43:19.000000000 +1000
@@ -504,7 +504,7 @@
 #ifdef PC98
 	    || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 255) {
 #else
-	    || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 63) {
+	    || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 64) {
 #endif
 		error = EINVAL;
 		goto error_exit;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080814/7ceaba8c/attachment.pgp

From kostikbel at gmail.com Thu Aug 14 11:00:13 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Thu Aug 14 11:00:36 2008
Subject: msdosfs for an iriver x20
In-Reply-To: <20080814005210.GB1057@triptrop.cse.unsw.EDU.AU>
References: <20080814005210.GB1057@triptrop.cse.unsw.EDU.AU>
Message-ID: <20080814104709.GL1803@deviant.kiev.zoral.com.ua>

On Thu, Aug 14, 2008 at 10:52:10AM +1000, Timothy Bourke wrote:
> The iriver x20 portable media player in MSC mode is detected by the
> umass driver but the internal flash memory cannot be mounted (7.6GB
> FAT32 filesystem on an unsliced disk) under 6.3-RELEASE.
>
> The msdos file system routines detect:
> pmp->pm_SecPerTrack=64 (0x40)
>
> The patch below fixes the problem. It looks like HEAD contains more
> general improvements that should also work but RELENG_6 does not.
> Would it be worth MFCing the new changes or committing the attached
> patch before the 6.4 release?
>
> Tim.
>
> --- sys/fs/msdosfs/msdosfs_vfsops.c.orig 2008-08-14 09:43:06.000000000 +1000
> +++ sys/fs/msdosfs/msdosfs_vfsops.c 2008-08-14 09:43:19.000000000 +1000
> @@ -504,7 +504,7 @@
> #ifdef PC98
> || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 255) {
> #else
> - || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 63) {
> + || !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 64) {
> #endif
> error = EINVAL;
> goto error_exit;
>

So, could you, please, confirm that the change below works correctly for you on RELENG_6? After your confirmation I will commit it into RELENG_6. I merged it to RELENG_7 exactly to be able to use iriver clix2.

commit 89d237ece000e6ccf208553e95c72efdf217e792
Author: marcel
Date: Thu Feb 21 03:19:46 2008 +0000

    Don't check the bpbSecPerTrack and bpbHeads fields of the BPB. They are
    typically 0 on new ia64 systems. Since we don't use either field, there's
    no harm in not checking.

    git-svn-id: file:///usr/local/arch/freebsd/svn/base/head@176431 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

diff --git a/sys/fs/msdosfs/msdosfs_vfsops.c b/sys/fs/msdosfs/msdosfs_vfsops.c
index 6834381..9bba037 100644
--- a/sys/fs/msdosfs/msdosfs_vfsops.c
+++ b/sys/fs/msdosfs/msdosfs_vfsops.c
@@ -508,14 +508,13 @@ mountmsdosfs(struct vnode *devvp, struct mount *mp, struct thread *td)
 	/* calculate the ratio of sector size to DEV_BSIZE */
 	pmp->pm_BlkPerSec = pmp->pm_BytesPerSec / DEV_BSIZE;
 
-	/* XXX - We should probably check more values here */
-	if (!pmp->pm_BytesPerSec || !SecPerClust
-		|| !pmp->pm_Heads
-#ifdef PC98
-		|| !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 255) {
-#else
-		|| !pmp->pm_SecPerTrack || pmp->pm_SecPerTrack > 63) {
-#endif
+	/*
+	 * We don't check pm_Heads nor pm_SecPerTrack, because
+	 * these may not be set for EFI file systems. We don't
+	 * use these anyway, so we're unaffected if they are
+	 * invalid.
+	 */
+	if (!pmp->pm_BytesPerSec || !SecPerClust) {
 		error = EINVAL;
 		goto error_exit;
 	}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080814/68840c43/attachment.pgp

From timbob at bigpond.com Thu Aug 14 12:00:02 2008
From: timbob at bigpond.com (Timothy Bourke)
Date: Thu Aug 14 12:00:21 2008
Subject: msdosfs for an iriver x20
In-Reply-To: <20080814104709.GL1803@deviant.kiev.zoral.com.ua>
References: <20080814005210.GB1057@triptrop.cse.unsw.EDU.AU> <20080814104709.GL1803@deviant.kiev.zoral.com.ua>
Message-ID: <20080814113256.GA1029@triptrop.cse.unsw.EDU.AU>

Thanks Kostik,

On Aug 14 at 13:47 +0300, Kostik Belousov wrote:
> So, could you, please, confirm that the change below works correctly for
> you on RELENG_6 ? After your confirmation I will commit it into RELENG_6.
> I merged it to RELENG_7 exactly to be able to use iriver clix2.

I can't easily test RELENG_6, but with your patch I can mount the x20 file system under 6.3-RELEASE.

Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080814/052bc0c1/attachment.pgp From kostikbel at gmail.com Thu Aug 14 12:34:52 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Thu Aug 14 12:34:59 2008 Subject: msdosfs for an iriver x20 In-Reply-To: <20080814113256.GA1029@triptrop.cse.unsw.EDU.AU> References: <20080814005210.GB1057@triptrop.cse.unsw.EDU.AU> <20080814104709.GL1803@deviant.kiev.zoral.com.ua> <20080814113256.GA1029@triptrop.cse.unsw.EDU.AU> Message-ID: <20080814123445.GN1803@deviant.kiev.zoral.com.ua> On Thu, Aug 14, 2008 at 09:32:56PM +1000, Timothy Bourke wrote: > Thanks Kostik, > > On Aug 14 at 13:47 +0300, Kostik Belousov wrote: > > So, could you, please, confirm that the change below works correctly for > > you on RELENG_6 ? After your confirmation I will commit it into RELENG_6. > > I merged it to RELENG_7 exactly to be able to use iriver clix2. > > I can't easily test RELENG_6, but with your patch I can mount the x20 > file system under 6.3-RELEASE. > > Tim. > Thanks, committed as r181729. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080814/9789c08f/attachment.pgp
From gw.freebsd at tnode.com Fri Aug 15 13:50:52 2008 From: gw.freebsd at tnode.com (GW) Date: Fri Aug 15 13:51:00 2008 Subject: fchroot on unionfs In-Reply-To: <48A0B6E5.3000000@skoberne.net> References: <48A0B6E5.3000000@skoberne.net> Message-ID: <48A58905.5080100@tnode.com> Nejc ?koberne wrote: > I have a strange problem with Apache not seeing the lower layer of > unionfs. Using > ktrace on Apache I have written this C code: > So without fchdir() call this program just displays (first 511 bytes) of > /etc/hosts. > If I uncomment fchdir() call with precedent open(".",...) call, I get this: Hm, I tried to recreate this, but all attempts failed (mount above/below, run program directly/in chroot/in jail and under root or www user). So it seems that it is either fixed or something else on the system was set differently (sysctl?). lp, gregor From ler at lerctr.org Sun Aug 17 19:38:58 2008 From: ler at lerctr.org (Larry Rosenman) Date: Sun Aug 17 19:39:12 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080817142354.X2181@borg> On Sun, 27 Jul 2008, Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. > > Check out OpenSolaris website to find out the differences between base > system version and patch version. > > Please test, test, test. If I get enough positive feedback, I may be > able to squeeze it into 7.1-RELEASE, but this might be hard.
> > If you have any questions, please use mailing lists > (freebsd-fs@FreeBSD.org would be the best). > > Thank you in advance! > > I upgraded from 7-STABLE to 8-CURRENT yesterday with no issues, so I figured I'd try these. No issues, and now happily running zpool version 11 zfs version 3 on all my fs's (except root :) ) Thank You pjd! (amd64, 4g real, 6 SATA 400g disks). -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 512-248-2683 E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From ler at lerctr.org Sun Aug 17 22:14:03 2008 From: ler at lerctr.org (Larry Rosenman) Date: Sun Aug 17 22:14:16 2008 Subject: ZFS patches. In-Reply-To: <20080817142354.X2181@borg> References: <20080727125413.GG1345@garage.freebsd.pl> <20080817142354.X2181@borg> Message-ID: <20080817171314.M3100@borg> On Sun, 17 Aug 2008, Larry Rosenman wrote: > On Sun, 27 Jul 2008, Pawel Jakub Dawidek wrote: > >> Hi. >> >> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 >> >> The patch above contains the most recent ZFS version that could be found >> in OpenSolaris as of today. Apart for large amount of new functionality, >> I belive there are many stability (and also performance) improvements >> compared to the version from the base system. >> >> Check out OpenSolaris website to find out the differences between base >> system version and patch version. >> >> Please test, test, test. If I get enough positive feedback, I may be >> able to squeeze it into 7.1-RELEASE, but this might be hard. >> >> If you have any questions, please use mailing lists >> (freebsd-fs@FreeBSD.org would be the best). >> >> Thank you in advance! >> >> > I upgraded from 7-STABLE to 8-CURRENT yesterday with no issues, so I figured > I'd try these. 
> > No issues, and now happily running zpool version 11 zfs version 3 on all my > fs's (except root :) ) > One comment, when I issue zfs commands against filesystems I get the following: WARNING pid 2412 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2412 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2479 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2479 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2479 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2479 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2494 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2494 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2494 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 WARNING pid 2494 (zfs): ioctl sign-extension ioctl ffffffffcc285a12 > Thank You pjd! > > (amd64, 4g real, 6 SATA 400g disks). > > > -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 512-248-2683 E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From bugmaster at FreeBSD.org Mon Aug 18 11:06:49 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 18 11:07:36 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200808181106.m7IB6mQc079790@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t 7 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o bin/118249 fs mv(1): moving a directory changes its mtime o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file 9 problems total. 
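A note on the "ioctl sign-extension" warnings reported in the ZFS patches thread above: the logged values (for example ffffffffcc285a12) look like a 32-bit ioctl command number with its top bit set that was widened to 64 bits as a signed integer on amd64. The sketch below only illustrates that arithmetic under this assumption. The function name is hypothetical, and this is not code from the ZFS patch or the FreeBSD kernel.

```python
def sign_extend_32_to_64(value: int) -> int:
    """Treat the low 32 bits of `value` as a signed C int and widen it
    to an unsigned 64-bit quantity, as a signed-to-unsigned-long
    conversion would on an LP64 platform (assumption for illustration)."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:          # top bit set: negative as a 32-bit int
        value -= 1 << 32
    return value & 0xFFFFFFFFFFFFFFFF


# 32-bit command value taken from the low half of a logged warning.
cmd = 0xCC285A12
widened = sign_extend_32_to_64(cmd)
print(f"{widened:x}")               # ffffffffcc285a12, matching the log format

# Masking off the upper half recovers the original 32-bit command.
print(f"{widened & 0xFFFFFFFF:x}")  # cc285a12
```

If this reading is right, the warnings would be cosmetic as long as the upper 32 bits are discarded before the command is dispatched, but that is an inference from the log text, not something confirmed in this thread.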
From weldon at excelsus.com Mon Aug 18 19:29:15 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Mon Aug 18 19:29:22 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080806101621.H24586@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> Message-ID: <20080814091337.Y94482@emmett.excelsus.com>
Update on what else I have tried (all yield the same results and the same backtraces, with no indication in the logs or on the console of why it is panicking other than the page fault):
(FYI--I have tried to load 8-CURRENT, but it panics during install on the Dell 2950-3 I am using. I see a patch for a newer port of zfs that looks like it is for 8; is there a patch for 7.0-RELEASE?)
I tried breaking it into two smaller < 2TB filesystems and performed the same test on one; it still panics.
I tried disabling swap altogether (although I wasn't swapping).
I upped the number of nfs daemons from 12 to 100.
I turned on zfs debugging and WITNESS to see if anything would show, like locking issues (nothing shows).
I ran loops every 3s to monitor max vnodes, kmem, and arc during the tests; up until the panic nothing was climbing.
I turned off ZIL and disabled prefetch; the problem still occurs.
I didn't get a panic in these situations:
I created a zfs mirror filesystem of only two drives (one on each chassis) and performed the test.
I took one drive, created a UFS filesystem and performed the test.
If memory serves me right, sometime around Aug 6, Weldon S Godfrey 3 told me: > > Hello, > > Please forgive me, I didn't really see this discussed in the archives but I am > wondering if anyone has seen this issue. I can replicate this issue under > FreeBSD amd64 7.0-RELEASE and the latest -STABLE (RELENG_7). I do not > replicate any problems running 9 instances of postmark on the machine > directly, so the issue appears to be isolated with NFS.
> > There are backtraces and more information in ticket kern/124280 > > I am experiencing random kernel panics while running postmark benchmark from 9 > NFS clients (clients on RedHat) to a 3TB ZFS filesystem exported with NFS. > The panics happen as soon as 5 mins from starting the benchmark or may take > hours before it panics and reboots. It doesn't correspond to a time a cron > job is going on. I am using the following settings in postmark: > > set number 20000 > set transactions 10000000 > set subdirectories 1000 > set size 10000 15000 > set report verbose > set location /var/mail/store1/X (where X is a number 1-9 so each is operating > in its own tree) > > The problem happens if I run 1 postmark on 9 NFS clients at the same time > (each client is its own server) or if I run 9 postmarks on one NFS client. > > commands used to create filesystem: > zpool create tank mirror da0 da12 mirror da1 da13 mirror da2 da14 mirror da3 > da15\ > mirror da4 da16 mirror da5 da17 mirror da6 da18 mirror da7 da19 mirror da8 > da20 \ > mirror da9 da21 mirror da10 da22 spare da11 da23 > zfs set atime=off tank > zfs create tank/mail > zfs set mountpoint=/var/mail tank/mail > zfs set sharenfs="-maproot=root -network 192.168.2.0 -mask 255.255.255.0" > tank/mail > > I am using a 3ware 9690 SAS controller. I have 2 IBM EXP3000 enclosures, each > drive is shown as single disk by the controller. > > > this is my loader.conf: > vm.kmem_size_max="1073741824" > vm.kmem_size="1073741824" > kern.maxvnodes="800000" > vfs.zfs.prefetch_disable="1" > vfs.zfs.cache_flush_disable="1" > > (I should note that kern.maxnodes in loader.conf does not appear to do > anything, after boot, it is shown to be at 100000 with sysctl. It does change > to 800000 if I manually set it with sysctl. However it appears my vnode usage > sits at around 25-26K and is near that within 5s of the panic. > > The server has 16GB of RAM, and 2 quad core XEON processors. > > This server is only a NFS fileserver. 
The only non-default daemon running is > sshd. It is running the GENERIC kernel, right now, unmodified. > > I am using two NICs. NFS is exported only on the secondary NIC. Each NIC is > in it's own subnet. > > > nothing in /var/log/messages near time of panic except: > Aug 6 08:45:30 store1 savecore: reboot after panic: page fault > Aug 6 08:45:30 store1 savecore: writing core to vmcore.2 > > I can provide cores if needed. > > Thank you for your time! > > Weldon > > > > kgdb with backtrace: > > store1# kgdb kernel.debug /var/crash/vmcore.2 > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > > Fatal trap 12: page fault while in kernel mode > cpuid = 5; apic id = 05 > fault virtual address = 0xdc > fault code = supervisor read data, page not present > instruction pointer = 0x8:0xffffffff8063b3d8 > stack pointer = 0x10:0xffffffffdfbc5720 > frame pointer = 0x10:0xffffff00543ed000 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 839 (nfsd) > trap number = 12 > panic: page fault > cpuid = 5 > Uptime: 18m53s > Physical memory: 16366 MB > Dumping 1991 MB: 1976 1960 1944 1928 1912 1896 1880 1864 1848 1832 1816 1800 > 1784 1768 1752 1736 1720 1704 1688 1672 1656 1640 1624 1608 1592 1576 1560 > 1544 1528 1512 1496 1480 1464 1448 1432 1416 1400 1384 1368 1352 1336 1320 > 1304 1288 1272 1256 1240 1224 1208 1192 1176 1160 1144 1128 1112 1096 1080 > 1064 1048 1032 1016 1000 984 968 952 936 920 904 888 872 856 840 824 808 792 > 776 760 744 728 712 696 680 
664 648 632 616 600 584 568 552 536 520 504 488 > 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 > 168 152 136 120 104 88 72 56 40 24 8 > > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from > /boot/kernel/zfs.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/zfs.ko > #0 doadump () at pcpu.h:194 > 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td)); > (kgdb) backtrace > #0 doadump () at pcpu.h:194 > #1 0x0000000000000004 in ?? () > #2 0xffffffff804a7049 in boot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:418 > #3 0xffffffff804a744d in panic (fmt=0x104
) at > /usr/src/sys/kern/kern_shutdown.c:572 > #4 0xffffffff807780e4 in trap_fatal (frame=0xffffff000bce26c0, > eva=18446742974395967712) > at /usr/src/sys/amd64/amd64/trap.c:724 > #5 0xffffffff807784b5 in trap_pfault (frame=0xffffffffdfbc5670, usermode=0) > at /usr/src/sys/amd64/amd64/trap.c:641 > #6 0xffffffff80778de8 in trap (frame=0xffffffffdfbc5670) at > /usr/src/sys/amd64/amd64/trap.c:410 > #7 0xffffffff8075e7ce in calltrap () at > /usr/src/sys/amd64/amd64/exception.S:169 > #8 0xffffffff8063b3d8 in nfsrv_access (vp=0xffffff00207d7dc8, flags=128, > cred=0xffffff00403d4800, rdonly=0, > td=0xffffff000bce26c0, override=0) at > /usr/src/sys/nfsserver/nfs_serv.c:4284 > #9 0xffffffff8063c4f1 in nfsrv3_access (nfsd=0xffffff00543ed000, > slp=0xffffff0006396d00, td=0xffffff000bce26c0, > mrq=0xffffffffdfbc5af0) at /usr/src/sys/nfsserver/nfs_serv.c:234 > #10 0xffffffff8064cd1d in nfssvc (td=Variable "td" is not available. > ) at /usr/src/sys/nfsserver/nfs_syscalls.c:456 > #11 0xffffffff80778737 in syscall (frame=0xffffffffdfbc5c70) at > /usr/src/sys/amd64/amd64/trap.c:852 > #12 0xffffffff8075e9db in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:290 > #13 0x0000000800687acc in ?? () > Previous frame inner to this frame (corrupt stack?) > From rmacklem at uoguelph.ca Mon Aug 18 20:24:05 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Mon Aug 18 20:24:12 2008 Subject: Which GSSAPI library does FreeBSD use? In-Reply-To: References: <86myk06e18.fsf@ds4.des.no> <326AF658-D96D-4410-9E32-0001FF8264AA@rabson.org> Message-ID: On Fri, 8 Aug 2008, Doug Rabson wrote: > > Don't use static linking? > Just to let everyone know, with help from Doug I have gotten a gssd.c to work with the libraries in FreeBSD-CURRENT and it has been uploaded to the Perforce server. I don't know exactly why it would crash in the gss_acquire_cred() call when dynaically linked, but it no longer does. 
(I changed to specifying the Kerberos mechanism explicitly instead of letting the library function work through the mechanism list, which might explain it. Anyhow, I'm a happy camper now, rick From caelian at gmail.com Wed Aug 20 16:19:30 2008 From: caelian at gmail.com (Pascal Hofstee) Date: Wed Aug 20 16:19:43 2008 Subject: ZFS patches. In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: On Sun, Jul 27, 2008 at 2:54 PM, Pawel Jakub Dawidek wrote: > Hi. > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 Just thought i'd give a slight HEADS UP for those of us running with this ZFS patch. With the recent VIMAGE code having hit the CURRENT source tree, the above patch fails to apply in a single (very minor way) in sys/kern/kern_jail.c contents of sys/kern/kern_jail.c.rej *************** *** 34,39 **** #include #include #include #include #include --- 34,40 ---- #include #include #include + #include #include #include The above rejection is caused by the inclusion of the header immediately after the inclusion in a fresh CURRENT source tree. So when you next update your CURRENT tree ... keep in mind that you will need to manually apply this part of the zfs-patchset (until somebody is kind enough to provide an updated patchset). -- Pascal Hofstee From weldon at excelsus.com Wed Aug 20 18:43:39 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Wed Aug 20 18:43:46 2008 Subject: ZFS patches. In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <20080820143851.D76650@emmett.excelsus.com> I installed the latest from 8-HEAD from cvs today, and applied the ZFS patches in http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2. It compiled fine and came up fine. 
However, every zfs command yields an out of memory error:
store1# zpool list
internal error: out of memory
store1# zpool destroy store1-1
internal error: out of memory
store1# zpool status
internal error: out of memory
Nothing mounts, and I can't create a pool since it says the devices are in use.
loader.conf:
vm.kmem_size_max="16106127360"
vm.kmem_size="1073741824"
kern.maxvnodes="800000"
vfs.zfs.debug="1"
#vfs.zfs.zil_disable="1"
vfs.zfs.prefetch_disable="1"
I verified that the settings took effect with sysctl -a. I first tried with the zfs zil disabled.
From ler at lerctr.org Wed Aug 20 19:10:43 2008 From: ler at lerctr.org (Larry Rosenman) Date: Wed Aug 20 19:10:55 2008 Subject: ZFS patches. In-Reply-To: <20080820143851.D76650@emmett.excelsus.com> References: <20080727125413.GG1345@garage.freebsd.pl> <20080820143851.D76650@emmett.excelsus.com> Message-ID: <019b01c902f8$6ffaeb70$4ff0c250$@org> Did you install the new userland as well? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 512-248-2683 E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Weldon S Godfrey 3 Sent: Wednesday, August 20, 2008 1:44 PM To: Pawel Jakub Dawidek Cc: freebsd-fs@freebsd.org; freebsd-current@freebsd.org Subject: Re: ZFS patches. I installed the latest from 8-HEAD from cvs today, and applied the ZFS patches in http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2. It compiled fine and came up fine. However, every zfs command yields an out of memory error: store1# zpool list internal error: out of memory store1# zpool destroy store1-1 internal error: out of memory store1# zpool status internal error: out of memory nothing mounts, I can't create since it says the devices are in use.
loader.conf: vm.kmem_size_max="16106127360" vm.kmem_size="1073741824" kern.maxvnodes="800000" vfs.zfs.debug="1" #vfs.zfs.zil_disable="1" vfs.zfs.prefetch_disable="1" I verified settings took with sysctl -a. I 1st tried with zfs zil disabled. _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From weldon at excelsus.com Wed Aug 20 19:23:13 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Wed Aug 20 19:23:25 2008 Subject: ZFS patches. In-Reply-To: <019b01c902f8$6ffaeb70$4ff0c250$@org> References: <20080727125413.GG1345@garage.freebsd.pl> <20080820143851.D76650@emmett.excelsus.com> <019b01c902f8$6ffaeb70$4ff0c250$@org> Message-ID: <20080820152239.R76650@emmett.excelsus.com> No, but that is a good point (sorry, my thick head thought it was only a kernel mod)...I'll do that. Thanks If memory serves me right, sometime around 2:10pm, Larry Rosenman told me: > did you install the new userland as well? > > > -- > Larry Rosenman http://www.lerctr.org/~ler > Phone: +1 512-248-2683 E-Mail: ler@lerctr.org > US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 > > > -----Original Message----- > From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On > Behalf Of Weldon S Godfrey 3 > Sent: Wednesday, August 20, 2008 1:44 PM > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org; freebsd-current@freebsd.org > Subject: Re: ZFS patches. > > > I installed the latest from 8-HEAD from cvs today, and applied the ZFS > patches in http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2. > > It compiled fine and came up fine. 
However, every zfs command yeilds an > out of memory error: > > store1# zpool list > internal error: out of memory > store1# zpool destroy store1-1 > internal error: out of memory > store1# zpool status > internal error: out of memory > > nothing mounts, I can't create since it says the devices are in use. > > loader.conf: > vm.kmem_size_max="16106127360" > vm.kmem_size="1073741824" > kern.maxvnodes="800000" > vfs.zfs.debug="1" > #vfs.zfs.zil_disable="1" > vfs.zfs.prefetch_disable="1" > > I verified settings took with sysctl -a. I 1st tried with zfs zil > disabled. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From lopez.on.the.lists at yellowspace.net Thu Aug 21 12:54:06 2008 From: lopez.on.the.lists at yellowspace.net (Lorenzo Perone) Date: Thu Aug 21 12:54:14 2008 Subject: ZFS patches In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl> References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: Hi, Just let me intro this mail with a "Sorry for asking..." as I know the efforts already ongoing ar huge and I do respect this! But, here it is: any chances to see these patches on 7-STABLE anytime... soon? I think there would be many more testers available (me included) than for HEAD. In my case, for example, all I could afford now is to set up a complete-test-only box with the HEAD code, which in turn wouldn't be a real test case as it would be "just" a test box for zfs. Whereas I could afford to test it in much more "real life" situation with 7-STABLE. My guess is that this would be the case for many others. The problem about HEAD is that there would be too many spots with potential problems (which ports work, which don't, scripts that might make 7-bound assumptions, etc..) so that I can't afford that for anything below "test only" boxes.. 
Just experienced a deadlock again on 7-STABLE with zfs, that's why I'm refreshing this... Kudos && Regards, Lorenzo From matt at corp.spry.com Thu Aug 21 18:15:31 2008 From: matt at corp.spry.com (Matt Simerson) Date: Thu Aug 21 18:15:39 2008 Subject: ZFS patches In-Reply-To: References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <9EA26FF4-3B5D-4C41-8A9D-50F752159566@corp.spry.com> It's still a bit too early for me to make any announcement about ZFS and stability on HEAD, but I was having deadlocks on 7.0 every other day under my workload. I took the plunge and upgraded both my servers (which are now in production, BTW) to HEAD. I have one running HEAD without the latest patches and one with HEAD + patch and have not experienced a deadlock since the upgrade. FreeBSD back01.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri Aug 15 16:42:36 PDT 2008 root@back01.int.spry.com:/usr/obj/usr/src/sys/BACK01 amd64 FreeBSD back02.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #1: Wed Aug 13 13:57:19 PDT 2008 root@back02.int.spry.com:/usr/obj/usr/src/sys/BACK02-HEAD amd64 It turns out that I disliked the known instability of ZFS on 7-STABLE more than the unknown risks associated with HEAD. As always, YMMV, but since ZFS is still experimental, odds are good you'll have a better experience if you are willing to upgrade to -HEAD. Matt $ cat /boot/loader.conf vm.kmem_size="1536M" vm.kmem_size_max="1536M" vfs.zfs.arc_min="16M" vfs.zfs.arc_max="64M" vfs.zfs.prefetch_disable=1 On Aug 21, 2008, at 5:44 AM, Lorenzo Perone wrote: > Hi, > > Just let me intro this mail with a "Sorry for asking..." > as I know the efforts already ongoing ar huge and I do > respect this! > > But, here it is: any chances to see these patches on > 7-STABLE anytime... soon? > > I think there would be many more testers available (me included) > than for HEAD.
In my case, for example, all I could afford now > is to set up a complete-test-only box with the HEAD code, which in > turn wouldn't be a real test case as it would be "just" a test box > for zfs. > > Whereas I could afford to test it in much more "real life" > situation with 7-STABLE. > My guess is that this would be the case for many others. > > The problem about HEAD is that there would be too many > spots with potential problems (which ports work, which don't, > scripts that might make 7-bound assumptions, etc..) > so that I can't afford that for anything below "test only" boxes.. > > Just experienced a deadlock again on 7-STABLE with zfs, that's > why I'm refreshing this... > > Kudos && Regards, > > Lorenzo > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From weldon at excelsus.com Thu Aug 21 19:35:07 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Thu Aug 21 19:35:14 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080814091337.Y94482@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> Message-ID: <20080821153107.W76650@emmett.excelsus.com> Looks like the bug with NFS and ZFS still exists. I installed the latest 8-HEAD with the most recent ZFS patch and ran the benchmarks again this morning. After about an hour, it panicked with the same page fault message in nfsd. It dropped to the debugger on shutdown and didn't do a savecore, although dumpdev is set to AUTO. I will be more than happy to provide anything to assist in debugging it. Thanks!
Weldon If memory serves me right, sometime around Monday, Weldon S Godfrey 3 told me: > > Update on what else I have tried (all yeild same results, same backtraces, no > indication in logs/console of why it is panicing other than page fault: > (FYI--I have tried to load 8-CURRENT, but it panics during install on the Dell > 2950-3 I am using, I see a patch for a newer port of zfs, that looks like for > 8, is there a patch for 7.0-RELEASE?) > > > I have tried breaking it into two smaller < 2TB filesystems and performed same > test on one, still > > I tried disabling swap all together (although I wasn't swapping) > > I upped number of nfs daemons from 12 to 100 > > I turned on zfs debugging and WITNESS to see if anything would show, like > locking issues (nothing shows) > > I ran loops every 3s to monitor max vnodes, kmem, and arc during testes and up > until the panic nothing was climbing > > I turned off ZIL and disabled prefetch, the problem still occurs > > > > > I didn't get a panic in these situations: > > I created a zfs mirror filesystem of only two drives (one on each chasis) and > performed the test > > I took one drive, created a UFS filesystem and performed the test. > > > > > If memory serves me right, sometime around Aug 6, Weldon S Godfrey 3 told me: > >> >> Hello, >> >> Please forgive me, I didn't really see this discussed in the archives but I >> am wondering if anyone has seen this issue. I can replicate this issue >> under FreeBSD amd64 7.0-RELEASE and the latest -STABLE (RELENG_7). I do not >> replicate any problems running 9 instances of postmark on the machine >> directly, so the issue appears to be isolated with NFS. >> >> There are backtraces and more information in ticket kern/124280 >> >> I am experiencing random kernel panics while running postmark benchmark from >> 9 NFS clients (clients on RedHat) to a 3TB ZFS filesystem exported with NFS. 
>> The panics happen as soon as 5 mins from starting the benchmark or may take >> hours before it panics and reboots. It doesn't correspond to a time a cron >> job is going on. I am using the following settings in postmark: >> >> set number 20000 >> set transactions 10000000 >> set subdirectories 1000 >> set size 10000 15000 >> set report verbose >> set location /var/mail/store1/X (where X is a number 1-9 so each is >> operating in its own tree) >> >> The problem happens if I run 1 postmark on 9 NFS clients at the same time >> (each client is its own server) or if I run 9 postmarks on one NFS client. >> >> commands used to create filesystem: >> zpool create tank mirror da0 da12 mirror da1 da13 mirror da2 da14 mirror da3 >> da15\ >> mirror da4 da16 mirror da5 da17 mirror da6 da18 mirror da7 da19 mirror da8 >> da20 \ >> mirror da9 da21 mirror da10 da22 spare da11 da23 >> zfs set atime=off tank >> zfs create tank/mail >> zfs set mountpoint=/var/mail tank/mail >> zfs set sharenfs="-maproot=root -network 192.168.2.0 -mask 255.255.255.0" >> tank/mail >> >> I am using a 3ware 9690 SAS controller. I have 2 IBM EXP3000 enclosures, >> each drive is shown as single disk by the controller. >> >> >> this is my loader.conf: >> vm.kmem_size_max="1073741824" >> vm.kmem_size="1073741824" >> kern.maxvnodes="800000" >> vfs.zfs.prefetch_disable="1" >> vfs.zfs.cache_flush_disable="1" >> >> (I should note that kern.maxnodes in loader.conf does not appear to do >> anything, after boot, it is shown to be at 100000 with sysctl. It does >> change to 800000 if I manually set it with sysctl. However it appears my >> vnode usage sits at around 25-26K and is near that within 5s of the panic. >> >> The server has 16GB of RAM, and 2 quad core XEON processors. >> >> This server is only a NFS fileserver. The only non-default daemon running >> is sshd. It is running the GENERIC kernel, right now, unmodified. >> >> I am using two NICs. NFS is exported only on the secondary NIC. 
Each NIC >> is in it's own subnet. >> >> >> nothing in /var/log/messages near time of panic except: >> Aug 6 08:45:30 store1 savecore: reboot after panic: page fault >> Aug 6 08:45:30 store1 savecore: writing core to vmcore.2 >> >> I can provide cores if needed. >> >> Thank you for your time! >> >> Weldon >> >> >> >> kgdb with backtrace: >> >> store1# kgdb kernel.debug /var/crash/vmcore.2 >> GNU gdb 6.1.1 [FreeBSD] >> Copyright 2004 Free Software Foundation, Inc. >> GDB is free software, covered by the GNU General Public License, and you are >> welcome to change it and/or distribute copies of it under certain >> conditions. >> Type "show copying" to see the conditions. >> There is absolutely no warranty for GDB. Type "show warranty" for details. >> This GDB was configured as "amd64-marcel-freebsd"... >> >> Unread portion of the kernel message buffer: >> >> >> Fatal trap 12: page fault while in kernel mode >> cpuid = 5; apic id = 05 >> fault virtual address = 0xdc >> fault code = supervisor read data, page not present >> instruction pointer = 0x8:0xffffffff8063b3d8 >> stack pointer = 0x10:0xffffffffdfbc5720 >> frame pointer = 0x10:0xffffff00543ed000 >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 839 (nfsd) >> trap number = 12 >> panic: page fault >> cpuid = 5 >> Uptime: 18m53s >> Physical memory: 16366 MB >> Dumping 1991 MB: 1976 1960 1944 1928 1912 1896 1880 1864 1848 1832 1816 1800 >> 1784 1768 1752 1736 1720 1704 1688 1672 1656 1640 1624 1608 1592 1576 1560 >> 1544 1528 1512 1496 1480 1464 1448 1432 1416 1400 1384 1368 1352 1336 1320 >> 1304 1288 1272 1256 1240 1224 1208 1192 1176 1160 1144 1128 1112 1096 1080 >> 1064 1048 1032 1016 1000 984 968 952 936 920 904 888 872 856 840 824 808 792 >> 776 760 744 728 712 696 680 664 648 632 616 600 584 568 552 536 520 504 488 >> 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 
200 184 >> 168 152 136 120 104 88 72 56 40 24 8 >> >> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from >> /boot/kernel/zfs.ko.symbols...done. >> done. >> Loaded symbols for /boot/kernel/zfs.ko >> #0 doadump () at pcpu.h:194 >> 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td)); >> (kgdb) backtrace >> #0 doadump () at pcpu.h:194 >> #1 0x0000000000000004 in ?? () >> #2 0xffffffff804a7049 in boot (howto=260) at >> /usr/src/sys/kern/kern_shutdown.c:418 >> #3 0xffffffff804a744d in panic (fmt=0x104
) at >> /usr/src/sys/kern/kern_shutdown.c:572 >> #4 0xffffffff807780e4 in trap_fatal (frame=0xffffff000bce26c0, >> eva=18446742974395967712) >> at /usr/src/sys/amd64/amd64/trap.c:724 >> #5 0xffffffff807784b5 in trap_pfault (frame=0xffffffffdfbc5670, usermode=0) >> at /usr/src/sys/amd64/amd64/trap.c:641 >> #6 0xffffffff80778de8 in trap (frame=0xffffffffdfbc5670) at >> /usr/src/sys/amd64/amd64/trap.c:410 >> #7 0xffffffff8075e7ce in calltrap () at >> /usr/src/sys/amd64/amd64/exception.S:169 >> #8 0xffffffff8063b3d8 in nfsrv_access (vp=0xffffff00207d7dc8, flags=128, >> cred=0xffffff00403d4800, rdonly=0, >> td=0xffffff000bce26c0, override=0) at >> /usr/src/sys/nfsserver/nfs_serv.c:4284 >> #9 0xffffffff8063c4f1 in nfsrv3_access (nfsd=0xffffff00543ed000, >> slp=0xffffff0006396d00, td=0xffffff000bce26c0, >> mrq=0xffffffffdfbc5af0) at /usr/src/sys/nfsserver/nfs_serv.c:234 >> #10 0xffffffff8064cd1d in nfssvc (td=Variable "td" is not available. >> ) at /usr/src/sys/nfsserver/nfs_syscalls.c:456 >> #11 0xffffffff80778737 in syscall (frame=0xffffffffdfbc5c70) at >> /usr/src/sys/amd64/amd64/trap.c:852 >> #12 0xffffffff8075e9db in Xfast_syscall () at >> /usr/src/sys/amd64/amd64/exception.S:290 >> #13 0x0000000800687acc in ?? () >> Previous frame inner to this frame (corrupt stack?) >> > From koitsu at FreeBSD.org Thu Aug 21 19:47:43 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Thu Aug 21 19:47:49 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080821153107.W76650@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> Message-ID: <20080821194742.GA19362@eos.sc1.parodius.com> On Thu, Aug 21, 2008 at 03:35:04PM -0400, Weldon S Godfrey 3 wrote: > Looks like the bug with NFS and ZFS still exists. 
> > Well, I got the lastest 8-HEAD on with the most recent ZFS patch and ran > the benchmarks again this morning and after about an hour, it paniced > with the same message about page fault with nfsd. It dropped to debugger > on shutdown, it didn't do a savecore, dumpdev is set to AUTO. Specifically regarding the debugger/didn't run savecore/dumpdev statement: What exactly did you type once at the debugger prompt? It matters. There's also this, which I reported nearly a year ago: http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/118255 I haven't been able to reproduce my above PR on RELENG_7, but I'm unaware of anything that might have changed in RELENG_7 that fixes this problem. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From pjd at FreeBSD.org Thu Aug 21 19:50:42 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Thu Aug 21 19:50:49 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080821153107.W76650@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> Message-ID: <20080821195043.GA1585@garage.freebsd.pl> On Thu, Aug 21, 2008 at 03:35:04PM -0400, Weldon S Godfrey 3 wrote: > > Looks like the bug with NFS and ZFS still exists. > > Well, I got the lastest 8-HEAD on with the most recent ZFS patch and ran > the benchmarks again this morning and after about an hour, it paniced with > the same message about page fault with nfsd. It dropped to debugger on > shutdown, it didn't do a savecore, dumpdev is set to AUTO. > > I will be more than happy to provide anything to assist in debugging it. [...] > (kgdb) backtrace > #0 doadump () at pcpu.h:194 > #1 0x0000000000000004 in ?? 
() > #2 0xffffffff804a7049 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 > #3 0xffffffff804a744d in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:572 > #4 0xffffffff807780e4 in trap_fatal (frame=0xffffff000bce26c0, eva=18446742974395967712) at /usr/src/sys/amd64/amd64/trap.c:724 > #5 0xffffffff807784b5 in trap_pfault (frame=0xffffffffdfbc5670, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641 > #6 0xffffffff80778de8 in trap (frame=0xffffffffdfbc5670) at /usr/src/sys/amd64/amd64/trap.c:410 > #7 0xffffffff8075e7ce in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169 > #8 0xffffffff8063b3d8 in nfsrv_access (vp=0xffffff00207d7dc8, flags=128, cred=0xffffff00403d4800, rdonly=0, td=0xffffff000bce26c0, override=0) at /usr/src/sys/nfsserver/nfs_serv.c:4284 > #9 0xffffffff8063c4f1 in nfsrv3_access (nfsd=0xffffff00543ed000, slp=0xffffff0006396d00, td=0xffffff000bce26c0, mrq=0xffffffffdfbc5af0) at /usr/src/sys/nfsserver/nfs_serv.c:234 > #10 0xffffffff8064cd1d in nfssvc (td=Variable "td" is not available.) at /usr/src/sys/nfsserver/nfs_syscalls.c:456 > #11 0xffffffff80778737 in syscall (frame=0xffffffffdfbc5c70) at /usr/src/sys/amd64/amd64/trap.c:852 > #12 0xffffffff8075e9db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290 Can you tell me how exactly line 4284 of sys/nfsserver/nfs_serv.c looks in your source? -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080821/bcf511b8/attachment.pgp From olli at lurza.secnetix.de Thu Aug 21 20:16:58 2008 From: olli at lurza.secnetix.de (Oliver Fromme) Date: Thu Aug 21 20:17:10 2008 Subject: ZFS patches In-Reply-To: <9EA26FF4-3B5D-4C41-8A9D-50F752159566@corp.spry.com> Message-ID: <200808212016.m7LKGpkC019592@lurza.secnetix.de> Matt Simerson wrote: > [...] 
> FreeBSD back01.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri
> Aug 15 16:42:36 PDT 2008 root@back01.int.spry.com:/usr/obj/usr/src/
> sys/BACK01 amd64
>
> FreeBSD back02.int.spry.com 8.0-CURRENT FreeBSD 8.0-CURRENT #1: Wed
> Aug 13 13:57:19 PDT 2008 root@back02.int.spry.com:/usr/obj/usr/src/
> sys/BACK02-HEAD amd64
> [...]
> $ cat /boot/loader.conf
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"

I think those two lines can be removed, thanks to Alan Cox's recent
improvements to the kmem addressing on amd64.

> vfs.zfs.arc_min="16M"
> vfs.zfs.arc_max="64M"
> vfs.zfs.prefetch_disable=1

Those are probably OK.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht München,
HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"Documentation is like sex; when it's good, it's very, very good,
and when it's bad, it's better than nothing."
        -- Dick Brandon

From koitsu at FreeBSD.org Thu Aug 21 20:36:02 2008
From: koitsu at FreeBSD.org (Jeremy Chadwick)
Date: Thu Aug 21 20:36:09 2008
Subject: ZFS-NFS kernel panic under load
In-Reply-To: <20080821155118.C76650@emmett.excelsus.com>
References: <20080806101621.H24586@emmett.excelsus.com>
	<20080814091337.Y94482@emmett.excelsus.com>
	<20080821153107.W76650@emmett.excelsus.com>
	<20080821194742.GA19362@eos.sc1.parodius.com>
	<20080821155118.C76650@emmett.excelsus.com>
Message-ID: <20080821203602.GA22354@eos.sc1.parodius.com>

On Thu, Aug 21, 2008 at 03:55:09PM -0400, Weldon S Godfrey 3 wrote:
> To be honest, I told it to reboot. Sorry, I am not familiar with the
> debugger and I didn't see (but I often overlook) anything that would
> initiate a savecore when I typed help.
(although I could have tried go, > next, or whatever command would force it to step ahead, I didn't try > that) If you know the command, I can repeat the test tomorrow and type > the right thing. I do have cores from the crashes with 7.0. First, please do not remove the mailing list from the CC list; I've re-added it. People need to know what you've said. :-) If my memory serves me correctly, the problem is that you typed "reboot" and not "panic". I think this causes the machine to simply reboot without dumping memory contents to swap, thus savecore won't find any panic image in swap when the machine restarts. Others should be able to help you through using the kernel debugger. > If memory serves me right, sometime around 12:47pm, Jeremy Chadwick told me: > >> On Thu, Aug 21, 2008 at 03:35:04PM -0400, Weldon S Godfrey 3 wrote: >>> Looks like the bug with NFS and ZFS still exists. >>> >>> Well, I got the lastest 8-HEAD on with the most recent ZFS patch and ran >>> the benchmarks again this morning and after about an hour, it paniced >>> with the same message about page fault with nfsd. It dropped to debugger >>> on shutdown, it didn't do a savecore, dumpdev is set to AUTO. >> >> Specifically regarding the debugger/didn't run savecore/dumpdev >> statement: >> >> What exactly did you type once at the debugger prompt? It matters. >> >> There's also this, which I reported nearly a year ago: >> http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/118255 >> >> I haven't been able to reproduce my above PR on RELENG_7, but I'm >> unaware of anything that might have changed in RELENG_7 that fixes this >> problem. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB |

From weldon at excelsus.com Thu Aug 21 20:39:38 2008
From: weldon at excelsus.com (Weldon S Godfrey 3)
Date: Thu Aug 21 20:39:46 2008
Subject: ZFS-NFS kernel panic under load
In-Reply-To: <20080821203602.GA22354@eos.sc1.parodius.com>
References: <20080806101621.H24586@emmett.excelsus.com>
	<20080814091337.Y94482@emmett.excelsus.com>
	<20080821153107.W76650@emmett.excelsus.com>
	<20080821194742.GA19362@eos.sc1.parodius.com>
	<20080821155118.C76650@emmett.excelsus.com>
	<20080821203602.GA22354@eos.sc1.parodius.com>
Message-ID: <20080821163736.X76650@emmett.excelsus.com>

That would make sense, since panic dumps (I stupidly thought it had
already panicked). I'll repeat tomorrow and do that. If anyone wants me
to do anything else during this process, let me know.

thanks!

Weldon

If memory serves me right, sometime around 1:36pm, Jeremy Chadwick told me:

> On Thu, Aug 21, 2008 at 03:55:09PM -0400, Weldon S Godfrey 3 wrote:
>> To be honest, I told it to reboot. Sorry, I am not familiar with the
>> debugger and I didn't see (but I often overlook) anything that would
>> initiate a savecore when I typed help. (although I could have tried go,
>> next, or whatever command would force it to step ahead, I didn't try
>> that) If you know the command, I can repeat the test tomorrow and type
>> the right thing. I do have cores from the crashes with 7.0.
>
> First, please do not remove the mailing list from the CC list; I've
> re-added it. People need to know what you've said. :-)
>
> If my memory serves me correctly, the problem is that you typed "reboot"
> and not "panic". I think this causes the machine to simply reboot
> without dumping memory contents to swap, thus savecore won't find any
> panic image in swap when the machine restarts.
>
> Others should be able to help you through using the kernel debugger.
> From weldon at excelsus.com Fri Aug 22 16:02:49 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Fri Aug 22 16:03:00 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080821194742.GA19362@eos.sc1.parodius.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> <20080821194742.GA19362@eos.sc1.parodius.com> Message-ID: <20080822115932.M76650@emmett.excelsus.com> Ok, I tried panic, it gave a page of the typical panic page that this crash generates under 7.0. I rebooted and no core, so I am missing a step. Sorry for being clueless here. Since the panic didn't reboot, I did a bt, it said process it was at process 1001 access.nfsrv and access.nfs3srv (sorry, I know that isn't quite right, I meant to write it down, it was definately something with access and nfsrv) Thanks, Weldon If memory serves me right, sometime around Yesterday, Jeremy Chadwick told me: > On Thu, Aug 21, 2008 at 03:35:04PM -0400, Weldon S Godfrey 3 wrote: >> Looks like the bug with NFS and ZFS still exists. >> >> Well, I got the lastest 8-HEAD on with the most recent ZFS patch and ran >> the benchmarks again this morning and after about an hour, it paniced >> with the same message about page fault with nfsd. It dropped to debugger >> on shutdown, it didn't do a savecore, dumpdev is set to AUTO. > > Specifically regarding the debugger/didn't run savecore/dumpdev > statement: > > What exactly did you type once at the debugger prompt? It matters. > > There's also this, which I reported nearly a year ago: > http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/118255 > > I haven't been able to reproduce my above PR on RELENG_7, but I'm > unaware of anything that might have changed in RELENG_7 that fixes this > problem. 
> > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, USA | > | Making life hard for others since 1977. PGP: 4BD6C0CB | > > > From koitsu at FreeBSD.org Fri Aug 22 17:44:11 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Aug 22 17:44:59 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080822115932.M76650@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> <20080821194742.GA19362@eos.sc1.parodius.com> <20080822115932.M76650@emmett.excelsus.com> Message-ID: <20080822174411.GA89610@eos.sc1.parodius.com> On Fri, Aug 22, 2008 at 12:02:47PM -0400, Weldon S Godfrey 3 wrote: > Ok, I tried panic, it gave a page of the typical panic page that this > crash generates under 7.0. I rebooted and no core, so I am missing a > step. Sorry for being clueless here. Then you're probably being bit by what's listed in the below PR. Supposedly you can do "panic", it should dump memory contents to swap, then upon rebooting go into single-user mode, "mount -a", then run savecore. A real PITA, I know, but supposedly it works. I can't help with the cause of the actual panic, however; it's outside of my skillset. > Since the panic didn't reboot, I did a bt, it said process it was at > process 1001 access.nfsrv and access.nfs3srv (sorry, I know that isn't > quite right, I meant to write it down, it was definately something with > access and nfsrv) > > Thanks, > > Weldon > > > If memory serves me right, sometime around Yesterday, Jeremy Chadwick told me: > >> On Thu, Aug 21, 2008 at 03:35:04PM -0400, Weldon S Godfrey 3 wrote: >>> Looks like the bug with NFS and ZFS still exists. 
>>> >>> Well, I got the lastest 8-HEAD on with the most recent ZFS patch and ran >>> the benchmarks again this morning and after about an hour, it paniced >>> with the same message about page fault with nfsd. It dropped to debugger >>> on shutdown, it didn't do a savecore, dumpdev is set to AUTO. >> >> Specifically regarding the debugger/didn't run savecore/dumpdev >> statement: >> >> What exactly did you type once at the debugger prompt? It matters. >> >> There's also this, which I reported nearly a year ago: >> http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/118255 >> >> I haven't been able to reproduce my above PR on RELENG_7, but I'm >> unaware of anything that might have changed in RELENG_7 that fixes this >> problem. >> >> -- >> | Jeremy Chadwick jdc at parodius.com | >> | Parodius Networking http://www.parodius.com/ | >> | UNIX Systems Administrator Mountain View, CA, USA | >> | Making life hard for others since 1977. PGP: 4BD6C0CB | -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From weldon at excelsus.com Fri Aug 22 18:29:02 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Fri Aug 22 18:29:23 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080822174411.GA89610@eos.sc1.parodius.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> <20080821194742.GA19362@eos.sc1.parodius.com> <20080822115932.M76650@emmett.excelsus.com> <20080822174411.GA89610@eos.sc1.parodius.com> Message-ID: <20080822142834.J76650@emmett.excelsus.com> Thanks, I'll give that a try. 
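[Archive note] The sequence Jeremy describes in this thread (force a panic from the debugger, then recover the dump by hand in single-user mode) can be sketched roughly as the transcript below. This is only an illustration under stated assumptions: the dump device /dev/da0s1b is hypothetical (substitute whatever swap device dumpdev actually resolves to on the machine), and /var/crash is the directory the earlier kgdb session in this thread read vmcore.2 from.

```
# 1. At the kernel debugger prompt, force a dump rather than typing "reboot":
db> panic

# 2. On the next boot, pick single-user mode, then at the shell:
# mount -a
# savecore /var/crash /dev/da0s1b      <- hypothetical dump device

# 3. Inspect the saved core as in the original report (N is the dump number):
# kgdb kernel.debug /var/crash/vmcore.N
```

Whether step 2 is enough depends on the problem described in PR conf/118255; the thread itself only says that it "supposedly" works.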
If memory serves me right, sometime around 10:44am, Jeremy Chadwick told me: > On Fri, Aug 22, 2008 at 12:02:47PM -0400, Weldon S Godfrey 3 wrote: >> Ok, I tried panic, it gave a page of the typical panic page that this >> crash generates under 7.0. I rebooted and no core, so I am missing a >> step. Sorry for being clueless here. > > Then you're probably being bit by what's listed in the below PR. > Supposedly you can do "panic", it should dump memory contents to swap, > then upon rebooting go into single-user mode, "mount -a", then run > savecore. A real PITA, I know, but supposedly it works. > > I can't help with the cause of the actual panic, however; it's outside > of my skillset. > From randy at psg.com Fri Aug 22 23:28:33 2008 From: randy at psg.com (Randy Bush) Date: Fri Aug 22 23:28:40 2008 Subject: zfs bringing a new drive online Message-ID: <48AF4BA0.5040208@psg.com> one of the drives in my pool got funky. i put it offline 2008-07-26.16:23:44 zpool offline -t tank ad6s1 2008-07-26.16:24:28 zpool offline tank ad6s1 and then replaced it. i rebooted so the hw and driver would be happy about the drive, and then # zpool online tank ad6s1 Bringing device ad6s1 online # zpool status -x pool: tank state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. 
see: http://www.sun.com/msg/ZFS-8000-D3 scrub: resilver completed with 0 errors on Fri Aug 22 23:14:44 2008 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 raidz1 DEGRADED 0 0 0 ad4s2 ONLINE 0 0 0 ad8s2 ONLINE 0 0 0 ad6s1 UNAVAIL 0 0 0 cannot open ad10s1 ONLINE 0 0 0 errors: No known data errors smartctl seems to like the spindle # smartctl -a /dev/ad6 smartctl version 5.38 [amd64-portbld-freebsd8.0] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.10 family Device Model: ST3320620AS Serial Number: 6QF3RPZC Firmware Version: 3.AAE User Capacity: 320,072,933,376 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 7 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Fri Aug 22 23:25:37 2008 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 430) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. 
Extended self-test routine recommended polling time: ( 115) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 105 100 006 Pre-fail Always - 145017563 3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 279592 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 4 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 3 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 190 Airflow_Temperature_Cel 0x0022 054 054 045 Old_age Always - 46 (Lifetime Min/Max 37/46) 194 Temperature_Celsius 0x0022 046 046 000 Old_age Always - 46 (0 37 0 0) 195 Hardware_ECC_Recovered 0x001a 063 060 000 Old_age Always - 159460935 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 253 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 4 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. 
clearly, i am not understanding something

randy

From des at des.no Sat Aug 23 11:41:36 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Sat Aug 23 11:41:42 2008
Subject: ZFS patches
In-Reply-To: (Lorenzo Perone's message of "Thu, 21 Aug 2008 14:44:00 +0200")
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <86wsi8dkls.fsf@ds4.des.no>

Lorenzo Perone writes:
> But, here it is: any chances to see these patches on 7-STABLE
> anytime... soon?

They're not even in HEAD yet, and the 7.1 release cycle starts in a few
days, so no.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From des at des.no Sat Aug 23 11:45:44 2008
From: des at des.no (Dag-Erling Smørgrav)
Date: Sat Aug 23 11:45:51 2008
Subject: zfs bringing a new drive online
In-Reply-To: <48AF4BA0.5040208@psg.com> (Randy Bush's message of "Fri, 22 Aug 2008 16:28:32 -0700")
References: <48AF4BA0.5040208@psg.com>
Message-ID: <86skswdkew.fsf@ds4.des.no>

Randy Bush writes:
> one of the drives in my pool got funky. i put it offline
>
> 2008-07-26.16:23:44 zpool offline -t tank ad6s1
> 2008-07-26.16:24:28 zpool offline tank ad6s1
>
> and then replaced it. i rebooted so the hw and driver would be happy
> about the drive, and then
>
> # zpool online tank ad6s1
> Bringing device ad6s1 online

The correct command is 'zpool replace tank ad6s1', as explained in the
fine manual.

BTW, it is generally a bad idea to feed ZFS slices instead of whole
disks.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From randy at psg.com Sat Aug 23 15:36:17 2008
From: randy at psg.com (Randy Bush)
Date: Sat Aug 23 15:36:23 2008
Subject: zfs bringing a new drive online
In-Reply-To: <86skswdkew.fsf@ds4.des.no>
References: <48AF4BA0.5040208@psg.com> <86skswdkew.fsf@ds4.des.no>
Message-ID: <48B02E6F.6070008@psg.com>

Dag-Erling Smørgrav wrote:
> Randy Bush writes:
>> one of the drives in my pool got funky.
i put it offline >> >> 2008-07-26.16:23:44 zpool offline -t tank ad6s1 >> 2008-07-26.16:24:28 zpool offline tank ad6s1 >> >> and then replaced it. i rebooted so the hw and driver would be happy >> about the drive, and then >> >> # zpool online tank ad6s1 >> Bringing device ad6s1 online > > The correct command is 'zpool replace tank ad6s1', as explained in the > fine manual. thanks. read and reread man pages, not wikis. > BTW, it is generally a bad idea to feed ZFS slices instead of whole > disks. i have four drives, o two with gmirrored boot/root slices and the second slice for zfs o two zpooled disks so slicing was the mode of the day. maybe i need to rethink this? randy
From bugmaster at FreeBSD.org Mon Aug 25 11:06:50 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 25 11:07:42 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200808251106.m7PB6nnx027740@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t 7 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o bin/118249 fs mv(1): moving a directory changes its mtime o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file 9 problems total.
From stefan.lambrev at moneybookers.com Mon Aug 25 11:24:26 2008
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Mon Aug 25 11:24:37 2008
Subject: ZFS patches.
In-Reply-To: <20080727125413.GG1345@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
Message-ID: <48B2965E.20700@moneybookers.com>

Hi,

I can add: works for me too.

Will this patch be ready for 7.1R? I guess/hope that if it is not ready
for 7.1R we will see it in 7.1-STABLE a week after 7.1R? :)

Pawel Jakub Dawidek wrote:
> Hi.
>
> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>
> The patch above contains the most recent ZFS version that could be found
> in OpenSolaris as of today. Apart from a large amount of new functionality,
> I believe there are many stability (and also performance) improvements
> compared to the version from the base system.
>
> Check out the OpenSolaris website to find out the differences between the
> base system version and the patch version.
>
> Please test, test, test. If I get enough positive feedback, I may be
> able to squeeze it into 7.1-RELEASE, but this might be hard.
>
> If you have any questions, please use mailing lists
> (freebsd-fs@FreeBSD.org would be the best).
>
> Thank you in advance!
>

-- 
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From netslists at gmail.com Tue Aug 26 06:48:07 2008
From: netslists at gmail.com (Sten Daniel Soersdal)
Date: Tue Aug 26 06:48:13 2008
Subject: Sector size of 4096 bytes (not 512)
Message-ID: <48B3A11C.6030504@gmail.com>

Does anyone know if i might run into any surprises if my hdd has 4096 byte
sized sectors and not the regular 512, it is low-level formatted that way.
I want to run a regular FreeBSD (v7) installation on it for a machine that
goes into production relatively soon after that.
I tried Googling but i have trouble finding definitive answers.
I have noticed on the many lists that people writing device drivers and
filesystem utils often assume it's 512B in size.
My (lack of) confidence in my own understanding of the filesystem code makes me uncertain. (I looked in src/sys/boot and src/sys/ufs). -- Sten Daniel Soersdal From pluknet at gmail.com Tue Aug 26 07:37:34 2008 From: pluknet at gmail.com (pluknet) Date: Tue Aug 26 07:37:40 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: <48B3A11C.6030504@gmail.com> References: <48B3A11C.6030504@gmail.com> Message-ID: 2008/8/26 Sten Daniel Soersdal : > > Does anyone know if i might run into any surprises if my hdd has 4096 byte > sized sectors and not the regular 512, it is low-level formatted that way. > I want to run a regular FreeBSD (v7) installation on it for a machine that > goes into production relatively soon after that. > I tried Googling but i have trouble finding definitive answers. > I have noticed on the many lists that people writing device drivers and > filesystem utils often assume it's 512B in size. > My (lack of) confidence in my own understanding of the filesystem code makes > me uncertain. (I looked in src/sys/boot and src/sys/ufs). > You should not change the sector size, because it is value of the physical parameter on disc (typically it's 512 for magnetic discs and 4096 for optical discs), not whatever logical value. You should better check tuning(7)/tunefs(8) man pages if you want to tune up your system. wbr, pluknet From phk at phk.freebsd.dk Tue Aug 26 07:43:36 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Aug 26 07:43:44 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: Your message of "Tue, 26 Aug 2008 02:22:20 -0400." <48B3A11C.6030504@gmail.com> Message-ID: <35438.1219736614@critter.freebsd.dk> In message <48B3A11C.6030504@gmail.com>, Sten Daniel Soersdal writes: > >Does anyone know if i might run into any surprises if my hdd has 4096 >byte sized sectors and not the regular 512, it is low-level formatted >that way. 
That's no problem, as long as the sectorsize is a power of two, larger than or equal to 512, you should have no trouble. >I want to run a regular FreeBSD (v7) installation on it for a machine >that goes into production relatively soon after that. In the disk menu where you create filesystems, you will need to use the set the newfs arguments (is it 'N' in the menu ?) to -b 32768 -f 4096 -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Tue Aug 26 07:44:40 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Aug 26 07:44:45 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: Your message of "Tue, 26 Aug 2008 11:05:47 +0400." Message-ID: <35461.1219736678@critter.freebsd.dk> In message , plukn et writes: >2008/8/26 Sten Daniel Soersdal : >> >> Does anyone know if i might run into any surprises if my hdd has 4096 byte >> sized sectors and not the regular 512, it is low-level formatted that way. [...] >You should not change the sector size, because it is value of the >physical parameter on disc >(typically it's 512 for magnetic discs and 4096 for optical discs), >not whatever logical value. >You should better check tuning(7)/tunefs(8) man pages if you want to >tune up your system. This answer has nothing to do with the question asked, and is wrong in just about every way it can be. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From des at des.no Tue Aug 26 11:06:34 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Tue Aug 26 11:06:40 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: <35438.1219736614@critter.freebsd.dk> (Poul-Henning Kamp's message of "Tue, 26 Aug 2008 07:43:34 +0000") References: <35438.1219736614@critter.freebsd.dk> Message-ID: <86zln011dz.fsf@ds4.des.no> "Poul-Henning Kamp" writes: > [to install on a disk with 4096-byte sectors] > In the disk menu where you create filesystems, you will need to use the > set the newfs arguments (is it 'N' in the menu ?) to > -b 32768 -f 4096 Doesn't newfs figure this out on its own, based on the parameters reported by ata / cam? DES -- Dag-Erling Smørgrav - des@des.no From des at des.no Tue Aug 26 11:20:24 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Tue Aug 26 11:20:33 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: <86zln011dz.fsf@ds4.des.no> ("Dag-Erling =?utf-8?Q?Sm=C3=B8rg?= =?utf-8?Q?rav=22's?= message of "Tue, 26 Aug 2008 13:06:32 +0200") References: <35438.1219736614@critter.freebsd.dk> <86zln011dz.fsf@ds4.des.no> Message-ID: <86vdxo10qw.fsf@ds4.des.no> Dag-Erling Smørgrav writes: > Doesn't newfs figure this out on its own, based on the parameters > reported by ata / cam? It does, actually:

	if (sectorsize == 0)
		if (ioctl(disk.d_fd, DIOCGSECTORSIZE, &sectorsize) == -1)
			sectorsize = 0;	/* back out on error for safety */
	/* ... */
	if (fsize <= 0)
		fsize = MAX(DFL_FRAGSIZE, sectorsize);
	if (bsize <= 0)
		bsize = MIN(DFL_BLKSIZE, 8 * fsize);

If I read the sysinstall code correctly, it won't specify a block and fragment size unless you actually press 'N' to set custom options - at which point it will initialize them to "-b 16384 -f 2048" (which are the newfs defaults for disks with sector sizes <= 2048) before displaying the dialog box where you can edit them.
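The two default-size rules quoted from newfs above can be tried out in isolation. What follows is a minimal sketch, not the actual newfs source: the helper default_sizes() and struct fs_sizes are hypothetical names, and the values of DFL_FRAGSIZE and DFL_BLKSIZE are assumed from the "-b 16384 -f 2048" defaults mentioned in the message.

```c
#include <stdio.h>

/* Assumed values, inferred from the "-b 16384 -f 2048" defaults cited above. */
#define DFL_FRAGSIZE	2048
#define DFL_BLKSIZE	16384

#define MAX(a, b)	((a) > (b) ? (a) : (b))
#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Hypothetical container for the two computed sizes. */
struct fs_sizes {
	int fsize;	/* fragment size in bytes */
	int bsize;	/* block size in bytes */
};

/*
 * Hypothetical helper mirroring the two quoted if-statements:
 * a value <= 0 means "not given on the command line".
 */
static struct fs_sizes
default_sizes(int sectorsize, int fsize, int bsize)
{
	struct fs_sizes s;

	if (fsize <= 0)
		fsize = MAX(DFL_FRAGSIZE, sectorsize);
	if (bsize <= 0)
		bsize = MIN(DFL_BLKSIZE, 8 * fsize);
	s.fsize = fsize;
	s.bsize = bsize;
	return (s);
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to print the sizes for two sector sizes. */
int
main(void)
{
	int sectors[] = { 512, 4096 };
	size_t i;

	for (i = 0; i < sizeof(sectors) / sizeof(sectors[0]); i++) {
		struct fs_sizes s = default_sizes(sectors[i], 0, 0);

		printf("sectorsize %d: -b %d -f %d\n",
		    sectors[i], s.bsize, s.fsize);
	}
	return (0);
}
#endif
```

Under these assumptions, a disk reporting 4096-byte sectors with no -b or -f given would get a 4096-byte fragment size and a 16384-byte block size. The real newfs may apply further adjustments that are not modelled here.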
DES -- Dag-Erling Smørgrav - des@des.no From phk at phk.freebsd.dk Tue Aug 26 12:45:56 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Aug 26 12:46:06 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: Your message of "Tue, 26 Aug 2008 13:06:32 +0200." <86zln011dz.fsf@ds4.des.no> Message-ID: <95623.1219754754@critter.freebsd.dk> In message <86zln011dz.fsf@ds4.des.no>, Dag-Erling Smørgrav writes: >"Poul-Henning Kamp" writes: >> [to install on a disk with 4096-byte sectors] >> In the disk menu where you create filesystems, you will need to use the >> set the newfs arguments (is it 'N' in the menu ?) to >> -b 32768 -f 4096 > >Doesn't newfs figure this out on its own, based on the parameters >reported by ata / cam? Ohh, actually it might... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From pluknet at gmail.com Tue Aug 26 13:05:50 2008 From: pluknet at gmail.com (pluknet) Date: Tue Aug 26 13:06:06 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: <35461.1219736678@critter.freebsd.dk> References: <35461.1219736678@critter.freebsd.dk> Message-ID: 2008/8/26 Poul-Henning Kamp : > In message , pluknet writes: >>2008/8/26 Sten Daniel Soersdal : >>> >>> Does anyone know if i might run into any surprises if my hdd has 4096 byte >>> sized sectors and not the regular 512, it is low-level formatted that way. > > [...] > >>You should not change the sector size, because it is value of the >>physical parameter on disc >>(typically it's 512 for magnetic discs and 4096 for optical discs), >>not whatever logical value. >>You should better check tuning(7)/tunefs(8) man pages if you want to >>tune up your system. > > This answer has nothing to do with the question asked, and is wrong in > just about every way it can be.
> > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. > I'm talking about on-disk sector-size, not about filesystem's block-size. wbr, pluknet From phk at phk.freebsd.dk Tue Aug 26 13:09:58 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Aug 26 13:10:10 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: Your message of "Tue, 26 Aug 2008 17:05:48 +0400." Message-ID: <95749.1219756196@critter.freebsd.dk> In message , plukne t writes: >2008/8/26 Poul-Henning Kamp : >> This answer has nothing to do with the question asked, and is wrong in >> just about every way it can be. >> >I'm talking about on-disk sector-size, not about filesystem's block-size. I wrote the code that implementes both of those :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
From andrew at modulus.org Tue Aug 26 22:20:36 2008 From: andrew at modulus.org (Andrew Snow) Date: Tue Aug 26 22:20:43 2008 Subject: Poor ZFS prefetch performance Message-ID: <48B47D42.9060307@modulus.org> On latest 8-current with latest patches, I am seeing poor data prefetch hit rate with my workload of rsync on millions of files.
ZFS has been quite good apart from this - UFS was much faster at this particular workload. ZFS just seems to be reading many megabytes over and over from the filesystem with little to show for it. Arcstats reports the following:

kstat.zfs.misc.arcstats.prefetch_data_hits: 475476
kstat.zfs.misc.arcstats.prefetch_data_misses: 5325057
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 126411515
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 3998476

I don't really want to turn off prefetching completely because it does actually help in many situations. From this I deduce it would be really nice to provide an extra tunable that allows you to turn off prefetching for data but leave it on for metadata. This would be easy to do because data prefetches come in through dmu_zfetch, but metadata comes through dmu_prefetch() in dmu.c - Andrew From alvaro.mtr at googlemail.com Wed Aug 27 06:45:04 2008 From: alvaro.mtr at googlemail.com (Alvaro) Date: Wed Aug 27 06:45:14 2008 Subject: hfs+ in freebsd7.0-stable Message-ID: has anyone compiled successfully the hfs+ programs in http://people.freebsd.org/~yar/hfs/ in 7.0-stable? From michael at fuckner.net Wed Aug 27 08:23:10 2008 From: michael at fuckner.net (Michael Fuckner) Date: Wed Aug 27 08:23:16 2008 Subject: hfs+ in freebsd7.0-stable In-Reply-To: References: Message-ID: <48B50B5C.1050002@fuckner.net> Alvaro wrote: > has anyone compiled successfully the hfs+ programs in > http://people.freebsd.org/~yar/hfs/ in 7.0-stable? did you try ports/emulators/hfs or hfsutils? Regards, Michael!
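The arcstats counters quoted in Andrew Snow's "Poor ZFS prefetch performance" message above can be turned into hit ratios to make the data/metadata contrast explicit. This is a minimal sketch: hit_ratio() is a hypothetical helper, not part of ZFS or sysctl(8), and the four numbers are copied from that message.

```c
#include <stdio.h>

/*
 * Hypothetical helper: fraction of prefetches that were hits.
 * Returns 0.0 when no prefetches were recorded at all.
 */
static double
hit_ratio(unsigned long long hits, unsigned long long misses)
{
	unsigned long long total = hits + misses;

	if (total == 0)
		return (0.0);
	return ((double)hits / (double)total);
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to print both ratios as percentages. */
int
main(void)
{
	/* Counter values as quoted in the message. */
	printf("prefetch data hit ratio:     %.1f%%\n",
	    100.0 * hit_ratio(475476ULL, 5325057ULL));
	printf("prefetch metadata hit ratio: %.1f%%\n",
	    100.0 * hit_ratio(126411515ULL, 3998476ULL));
	return (0);
}
#endif
```

With the quoted counters, roughly 8% of data prefetches were hits against roughly 97% of metadata prefetches, which is consistent with the request for a tunable that disables only data prefetch.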
From rebehn at ant.uni-bremen.de Wed Aug 27 09:13:40 2008 From: rebehn at ant.uni-bremen.de (Heinrich Rebehn) Date: Wed Aug 27 09:13:47 2008 Subject: Problem with default ACLs and mask In-Reply-To: <20051014092250.D66245@fledge.watson.org> References: <434F4FF8.9050903@ant.uni-bremen.de> <20051014064145.GA40856@admin.sibptus.tomsk.ru> <20051014092250.D66245@fledge.watson.org> Message-ID: <48B5159B.3070506@ant.uni-bremen.de> Robert Watson wrote: > > On Fri, 14 Oct 2005, Victor Sudakov wrote: > >> Heinrich Rebehn wrote: >>> >> >> [dd] >>> Am i doing something wrong here? Why is the mask not propagated? >> >> I am afraid the current umask prevents it. You must set it to >> something like "umask 002" before you create your files or directories >> (the group write bit matters here). > > The problem, so to speak, is that we actually implement what is > described in the POSIX.1e spec. When we did our initial implementation, > the various OS's varied a bit in the semantics they implemented: > > - Solaris implemented umask override if the mask was specified in the > default ACL. > - IRIX implemented the spec. > > Since that time, Linux has turned up and implemented the Solaris model, > and IRIX has switched to the Solaris model also as a result of peer > pressure. I've previouly looked at switching us, but it tears up our > kernel APIs some and will require significant testing. I had hoped to > do this for FreeBSD 6.x but was derailed working on other problems that > needed to be fixed. My hope is to change the default in FreeBSD 7.x. Hi Robert, has there been any progress so far? 
This might encourage me to move to 7.x (Still on 6.1) Regards, Heinrich -- Heinrich Rebehn University of Bremen Physics / Electrical and Electronics Engineering - Department of Telecommunications - Phone : +49/421/218-4664 Fax : -3341 From pjd at FreeBSD.org Wed Aug 27 11:13:55 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Aug 27 11:14:01 2008 Subject: Sector size of 4096 bytes (not 512) In-Reply-To: <35438.1219736614@critter.freebsd.dk> References: <48B3A11C.6030504@gmail.com> <35438.1219736614@critter.freebsd.dk> Message-ID: <20080827111401.GA1857@garage.freebsd.pl> On Tue, Aug 26, 2008 at 07:43:34AM +0000, Poul-Henning Kamp wrote: > In message <48B3A11C.6030504@gmail.com>, Sten Daniel Soersdal writes: > > > >Does anyone know if i might run into any surprises if my hdd has 4096 > >byte sized sectors and not the regular 512, it is low-level formatted > >that way. > > That's no problem, as long as the sectorsize is a power of two, larger > than or equal to 512, you should have no trouble. Also, sector shouldn't be larger than page size, because once you place UFS on such thing, mmap(2) will stop working, AFAIR. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080827/af49a9cf/attachment.pgp From weldon at excelsus.com Wed Aug 27 20:17:23 2008 From: weldon at excelsus.com (Weldon S Godfrey 3) Date: Wed Aug 27 20:17:30 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080822174411.GA89610@eos.sc1.parodius.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> <20080821194742.GA19362@eos.sc1.parodius.com> <20080822115932.M76650@emmett.excelsus.com> <20080822174411.GA89610@eos.sc1.parodius.com> Message-ID: <20080827161150.G76650@emmett.excelsus.com> Well, I am not sure if it was exactly that bug, since the last time I paniced, it never dumped to memory. Although this time i paniced, it started to dump then locked up solid. before I paniced I did get an error message at the end of the panic: Stopped at nfsrv_access at 0x190 testb 0x1, 0xa4 (%rax) I did hard reboot after lockup and tried savecore manually, it said it found no cores. (I did wait a good while and verified the drive lights was not running when I hard rebooted). Although I am guessing at this point I could go back to 7.0-R for testing this bug since it appears to be a problem with NFS and ZFS on either v6 or v11 of the filesystem. I will be happy to help debug the issue with anyone, I just need someone that is more knowledgeable of NFS and/or ZFS than I (which is most) Thanks, Weldon If memory serves me right, sometime around Friday, Jeremy Chadwick told me: > On Fri, Aug 22, 2008 at 12:02:47PM -0400, Weldon S Godfrey 3 wrote: >> Ok, I tried panic, it gave a page of the typical panic page that this >> crash generates under 7.0. I rebooted and no core, so I am missing a >> step. Sorry for being clueless here. > > Then you're probably being bit by what's listed in the below PR. 
> Supposedly you can do "panic", it should dump memory contents to swap, > then upon rebooting go into single-user mode, "mount -a", then run > savecore. A real PITA, I know, but supposedly it works. > > I can't help with the cause of the actual panic, however; it's outside > of my skillset. > From rmacklem at uoguelph.ca Wed Aug 27 20:45:50 2008 From: rmacklem at uoguelph.ca (Rick Macklem) Date: Wed Aug 27 20:45:57 2008 Subject: ZFS-NFS kernel panic under load In-Reply-To: <20080827161150.G76650@emmett.excelsus.com> References: <20080806101621.H24586@emmett.excelsus.com> <20080814091337.Y94482@emmett.excelsus.com> <20080821153107.W76650@emmett.excelsus.com> <20080821194742.GA19362@eos.sc1.parodius.com> <20080822115932.M76650@emmett.excelsus.com> <20080822174411.GA89610@eos.sc1.parodius.com> <20080827161150.G76650@emmett.excelsus.com> Message-ID: On Wed, 27 Aug 2008, Weldon S Godfrey 3 wrote: > > Well, I am not sure if it was exactly that bug, since the last time I > paniced, it never dumped to memory. Although this time i paniced, it started > to dump then locked up solid. > > before I paniced I did get an error message at the end of the panic: > > > Stopped at nfsrv_access at 0x190 testb 0x1, 0xa4 (%rax) > Well, I'd guess that is the following source line: if (rdonly || (vp->v_mount->mnt_flag & MNT_RDONLY)) { since "mnt_flag" is way down in the structure and MNT_RDONLY == 0x1. This suggests that v_mount is no longer valid, but I have no idea why that might happen. (This was looking at the -current nfsserver sources, but I don't think they've changed much. Maybe someone who knows ZFS and how it handles the mount structure might have some idea? rick From swell.k at gmail.com Fri Aug 29 00:02:55 2008 From: swell.k at gmail.com (swell.k@gmail.com) Date: Fri Aug 29 00:03:02 2008 Subject: ZFS patches. References: <20080727125413.GG1345@garage.freebsd.pl> Message-ID: <86tzd490qx.fsf@gmail.com> (CC'ing Attilio, who made the commits) Pawel Jakub Dawidek writes: > Hi. 
> > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > The patch above contains the most recent ZFS version that could be found > in OpenSolaris as of today. Apart for large amount of new functionality, > I belive there are many stability (and also performance) improvements > compared to the version from the base system. [...] After r182371 and r182383 there are another three rejections. Namely cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h.rej sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.rej sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_replay.c.rej I'm attaching them in case someone has a quick fix or idea how to solve them, especially regarding `+' lines. In the meantime I'm reverting them locally hoping it will not do any harm to me. If this fails then I will stay with r182370 since I already upgraded my pools to 11th version and can't go back easily. -------------- next part -------------- *************** *** 331,374 **** char *v_path; } vnode_t; typedef struct vattr { uint_t va_mask; /* bit-mask of attributes */ u_offset_t va_size; /* file size in bytes */ } vattr_t; - #define AT_TYPE 0x0001 - #define AT_MODE 0x0002 - #define AT_UID 0x0004 - #define AT_GID 0x0008 - #define AT_FSID 0x0010 - #define AT_NODEID 0x0020 - #define AT_NLINK 0x0040 - #define AT_SIZE 0x0080 - #define AT_ATIME 0x0100 - #define AT_MTIME 0x0200 - #define AT_CTIME 0x0400 - #define AT_RDEV 0x0800 - #define AT_BLKSIZE 0x1000 - #define AT_NBLOCKS 0x2000 - #define AT_SEQ 0x8000 #define CRCREAT 0 - #define VOP_CLOSE(vp, f, c, o, cr) 0 - #define VOP_PUTPAGE(vp, of, sz, fl, cr) 0 - #define VOP_GETATTR(vp, vap, fl, cr) ((vap)->va_size = (vp)->v_size, 0) - #define VOP_FSYNC(vp, f, cr) fsync((vp)->v_fd) - #define VN_RELE(vp) vn_close(vp) extern int vn_open(char *path, int x1, int oflags, int mode, vnode_t **vpp, int x2, int x3); extern int vn_openat(char *path, int x1, int oflags, int mode, vnode_t **vpp, - int x2, int x3, vnode_t *vp); extern int 
vn_rdwr(int uio, vnode_t *vp, void *addr, ssize_t len, offset_t offset, int x1, int x2, rlim64_t x3, void *x4, ssize_t *residp); - extern void vn_close(vnode_t *vp); #define vn_remove(path, x1, x2) remove(path) #define vn_rename(from, to, seg) rename((from), (to)) --- 347,439 ---- char *v_path; } vnode_t; + + typedef struct xoptattr { + timestruc_t xoa_createtime; /* Create time of file */ + uint8_t xoa_archive; + uint8_t xoa_system; + uint8_t xoa_readonly; + uint8_t xoa_hidden; + uint8_t xoa_nounlink; + uint8_t xoa_immutable; + uint8_t xoa_appendonly; + uint8_t xoa_nodump; + uint8_t xoa_settable; + uint8_t xoa_opaque; + uint8_t xoa_av_quarantined; + uint8_t xoa_av_modified; + } xoptattr_t; + typedef struct vattr { uint_t va_mask; /* bit-mask of attributes */ u_offset_t va_size; /* file size in bytes */ } vattr_t; + + typedef struct xvattr { + vattr_t xva_vattr; /* Embedded vattr structure */ + uint32_t xva_magic; /* Magic Number */ + uint32_t xva_mapsize; /* Size of attr bitmap (32-bit words) */ + uint32_t *xva_rtnattrmapp; /* Ptr to xva_rtnattrmap[] */ + uint32_t xva_reqattrmap[XVA_MAPSIZE]; /* Requested attrs */ + uint32_t xva_rtnattrmap[XVA_MAPSIZE]; /* Returned attrs */ + xoptattr_t xva_xoptattrs; /* Optional attributes */ + } xvattr_t; + + typedef struct vsecattr { + uint_t vsa_mask; /* See below */ + int vsa_aclcnt; /* ACL entry count */ + void *vsa_aclentp; /* pointer to ACL entries */ + int vsa_dfaclcnt; /* default ACL entry count */ + void *vsa_dfaclentp; /* pointer to default ACL entries */ + size_t vsa_aclentsz; /* ACE size in bytes of vsa_aclentp */ + } vsecattr_t; + + #define AT_TYPE 0x00001 + #define AT_MODE 0x00002 + #define AT_UID 0x00004 + #define AT_GID 0x00008 + #define AT_FSID 0x00010 + #define AT_NODEID 0x00020 + #define AT_NLINK 0x00040 + #define AT_SIZE 0x00080 + #define AT_ATIME 0x00100 + #define AT_MTIME 0x00200 + #define AT_CTIME 0x00400 + #define AT_RDEV 0x00800 + #define AT_BLKSIZE 0x01000 + #define AT_NBLOCKS 0x02000 + #define AT_SEQ 
0x08000 + #define AT_XVATTR 0x10000 #define CRCREAT 0 + #define VOP_CLOSE(vp, f, c, o, cr, ct) 0 + #define VOP_PUTPAGE(vp, of, sz, fl, cr, ct) 0 + #define VOP_GETATTR(vp, vap, cr, td) ((vap)->va_size = (vp)->v_size, 0) + + #define VOP_FSYNC(vp, f, cr, ct) fsync((vp)->v_fd) + #define VN_RELE(vp) vn_close(vp, 0, NULL, NULL) + #define vn_lock(vp, type) + #define VOP_UNLOCK(vp, type) + #ifdef VFS_LOCK_GIANT + #undef VFS_LOCK_GIANT + #endif + #define VFS_LOCK_GIANT(mp) 0 + #ifdef VFS_UNLOCK_GIANT + #undef VFS_UNLOCK_GIANT + #endif + #define VFS_UNLOCK_GIANT(vfslocked) extern int vn_open(char *path, int x1, int oflags, int mode, vnode_t **vpp, int x2, int x3); extern int vn_openat(char *path, int x1, int oflags, int mode, vnode_t **vpp, + int x2, int x3, vnode_t *vp, int fd); extern int vn_rdwr(int uio, vnode_t *vp, void *addr, ssize_t len, offset_t offset, int x1, int x2, rlim64_t x3, void *x4, ssize_t *residp); + extern void vn_close(vnode_t *vp, int openflag, cred_t *cr, kthread_t *td); #define vn_remove(path, x1, x2) remove(path) #define vn_rename(from, to, seg) rename((from), (to)) -------------- next part -------------- *************** *** 81,91 **** } #endif /* * Determine the physical size of the file. */ vattr.va_mask = AT_SIZE; - error = VOP_GETATTR(vp, &vattr, 0, kcred); if (error) { vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; return (error); --- 81,110 ---- } #endif + return (0); + } + + static int + vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift) + { + vdev_file_t *vf; + vattr_t vattr; + vnode_t *vp; + int error; + + if ((error = vdev_file_open_common(vd)) != 0) + return (error); + + vf = vd->vdev_tsd; + vp = vf->vf_vnode; + /* * Determine the physical size of the file. 
*/ vattr.va_mask = AT_SIZE; + vn_lock(vp, LK_SHARED | LK_RETRY); + error = VOP_GETATTR(vp, &vattr, kcred, curthread); + VOP_UNLOCK(vp, 0); if (error) { vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; return (error); -------------- next part -------------- *************** *** 352,386 **** return (error); } - zfs_init_vattr(&va, lr->lr_mask, lr->lr_mode, lr->lr_uid, lr->lr_gid, 0, lr->lr_foid); - va.va_size = lr->lr_size; - ZFS_TIME_DECODE(&va.va_atime, lr->lr_atime); - ZFS_TIME_DECODE(&va.va_mtime, lr->lr_mtime); vp = ZTOV(zp); vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); - error = VOP_SETATTR(vp, &va, kcred, curthread); VOP_UNLOCK(vp, 0); VN_RELE(vp); return (error); } static int - zfs_replay_acl(zfsvfs_t *zfsvfs, lr_acl_t *lr, boolean_t byteswap) { ace_t *ace = (ace_t *)(lr + 1); /* ace array follows lr_acl_t */ #ifdef TODO vsecattr_t vsa; - #endif znode_t *zp; int error; if (byteswap) { byteswap_uint64_array(lr, sizeof (*lr)); - zfs_ace_byteswap(ace, lr->lr_aclcnt); } if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) { --- 766,877 ---- return (error); } + zfs_init_vattr(vap, lr->lr_mask, lr->lr_mode, lr->lr_uid, lr->lr_gid, 0, lr->lr_foid); + vap->va_size = lr->lr_size; + ZFS_TIME_DECODE(&vap->va_atime, lr->lr_atime); + ZFS_TIME_DECODE(&vap->va_mtime, lr->lr_mtime); + + /* + * Fill in xvattr_t portions if necessary. 
+ */ + + start = (lr_setattr_t *)(lr + 1); + if (vap->va_mask & AT_XVATTR) { + zfs_replay_xvattr((lr_attr_t *)start, &xva); + start = (caddr_t)start + + ZIL_XVAT_SIZE(((lr_attr_t *)start)->lr_attr_masksize); + } else + xva.xva_vattr.va_mask &= ~AT_XVATTR; + + zfsvfs->z_fuid_replay = zfs_replay_fuid_domain(start, &start, + lr->lr_uid, lr->lr_gid); vp = ZTOV(zp); vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); + error = VOP_SETATTR(vp, vap, kcred, curthread); VOP_UNLOCK(vp, 0); + + zfs_fuid_info_free(zfsvfs->z_fuid_replay); + zfsvfs->z_fuid_replay = NULL; VN_RELE(vp); return (error); } static int + zfs_replay_acl_v0(zfsvfs_t *zfsvfs, lr_acl_v0_t *lr, boolean_t byteswap) { ace_t *ace = (ace_t *)(lr + 1); /* ace array follows lr_acl_t */ + vsecattr_t vsa; + znode_t *zp; + int error; + + if (byteswap) { + byteswap_uint64_array(lr, sizeof (*lr)); + zfs_oldace_byteswap(ace, lr->lr_aclcnt); + } + + if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) { + /* + * As we can log acls out of order, it's possible the + * file has been removed. In this case just drop the acl + * and return success. + */ + if (error == ENOENT) + error = 0; + return (error); + } + + bzero(&vsa, sizeof (vsa)); + vsa.vsa_mask = VSA_ACE | VSA_ACECNT; + vsa.vsa_aclcnt = lr->lr_aclcnt; + vsa.vsa_aclentsz = sizeof (ace_t) * vsa.vsa_aclcnt; + vsa.vsa_aclflags = 0; + vsa.vsa_aclentp = ace; + #ifdef TODO + error = VOP_SETSECATTR(ZTOV(zp), &vsa, 0, kcred, NULL); + #else + panic("%s:%u: unsupported condition", __func__, __LINE__); + #endif + + VN_RELE(ZTOV(zp)); + + return (error); + } + + /* + * Replaying ACLs is complicated by FUID support. + * The log record may contain some optional data + * to be used for replaying FUID's. These pieces + * are the actual FUIDs that were created initially. + * The FUID table index may no longer be valid and + * during zfs_create() a new index may be assigned. + * Because of this the log will contain the original + * doman+rid in order to create a new FUID. 
+ * + * The individual ACEs may contain an ephemeral uid/gid which is no + * longer valid and will need to be replaced with an actual FUID. + * + */ + static int + zfs_replay_acl(zfsvfs_t *zfsvfs, lr_acl_t *lr, boolean_t byteswap) + { + ace_t *ace = (ace_t *)(lr + 1); vsecattr_t vsa; znode_t *zp; int error; if (byteswap) { byteswap_uint64_array(lr, sizeof (*lr)); + zfs_ace_byteswap(ace, lr->lr_acl_bytes, B_FALSE); + if (lr->lr_fuidcnt) { + byteswap_uint64_array((caddr_t)ace + + ZIL_ACE_LENGTH(lr->lr_acl_bytes), + lr->lr_fuidcnt * sizeof (uint64_t)); + } } if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) { From DarioP at WebNX.com Fri Aug 29 03:36:04 2008 From: DarioP at WebNX.com (Dario Perovich) Date: Fri Aug 29 03:36:10 2008 Subject: move data pool to pool in same tank, free space not recovered. Message-ID: <48B7693D.2060001@WebNX.com> Hello everyone, I used mc-light to try to move about 3.6tb of data from /storage to /storage2 in the same tank. I had about 1.2tb of space free, it copied 1.2tb of data and then ran out of room. /storage shows 2.4tb of data and /storage2 shows 1.2tb but now there's only 460 megs of free space. I deleted a random file and space is now 1.1gb. The pool/tank was created with 7.0-RELEASE and imported into 8.0-CURRENT-200806 (since 07 wont install and 08 has no amd64). Also yes, I have confirmed the data is no longer, at least ls'd, in /storage For those curious why I'm doing this, I've noticed issues with /storage that scrub hasn't caught so I thought I'd see if creating a fresh pool would be a crude work around, just to run across this. Dario ps. sorry if theres a dupe on the list, I sent from the wrong addr before wasn't sure if it hit the list. From dariop at corp.webnx.com Fri Aug 29 04:04:51 2008 From: dariop at corp.webnx.com (Dario Perovich) Date: Fri Aug 29 04:04:57 2008 Subject: move data pool to pool in same tank, free space not recovered.
Message-ID: <3C412955B44C7B43BBB442F61610484435D1DAB1@SBS2K8P01.WEBNX.local> Hello everyone, I used mc-light to try to move about 3.6tb of data from /storage to /storage2 in the same tank. I had about 1.2tb of space free, it copied 1.2tb of data and then ran out of room. /storage shows 2.4tb of data and /storage2 shows 1.2tb but now there's only 460 megs of free space. I deleted a random file and space is now 1.1gb. The pool/tank was created with 7.0-RELEASE and imported into 8.0-CURRENT-200806 (since 07 wont install and 08 has no amd64). Also yes, I have confirmed the data is no longer, at least ls'd, in /storage For those curious why I'm doing this, I've noticed issues with /storage that scrub hasn't caught so I thought I'd see if creating a fresh pool would be a crude work around, just to find this :) Dario From pjd at FreeBSD.org Fri Aug 29 07:47:30 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Fri Aug 29 07:47:36 2008 Subject: ZFS patches. In-Reply-To: <86tzd490qx.fsf@gmail.com> References: <20080727125413.GG1345@garage.freebsd.pl> <86tzd490qx.fsf@gmail.com> Message-ID: <20080829074738.GB3026@garage.freebsd.pl> On Fri, Aug 29, 2008 at 03:29:58AM +0400, swell.k@gmail.com wrote: > (CC'ing Attilio, who made the commits) > > Pawel Jakub Dawidek writes: > > > Hi. > > > > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2 > > > > The patch above contains the most recent ZFS version that could be found > > in OpenSolaris as of today. Apart for large amount of new functionality, > > I belive there are many stability (and also performance) improvements > > compared to the version from the base system. > [...] > > After r182371 and r182383 there are another three rejections. 
Namely
>   cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h.rej
>   sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.rej
>   sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_replay.c.rej
> I'm attaching them in case someone has a quick fix or idea how
> to solve them, especially regarding `+' lines.
>
> In the meantime I'm reverting them locally hoping it will not do any
> harm to me. If this fails then I will stay with r182370 since I already
> upgraded my pools to 11th version and can't go back easily.

There are some rejections, I know, and I'm tracking everything in
perforce. In the meantime there were two ZFS version bumps in
OpenSolaris (so I've 13 in perforce at the moment). I probably won't
create a new patch, but just commit what I have to HEAD. In the meantime
I also fixed quite a few bugs, mostly reported by kris@.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080829/4c5737dd/attachment.pgp

From rwatson at FreeBSD.org Fri Aug 29 07:55:32 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Fri Aug 29 07:55:39 2008
Subject: Problem with default ACLs and mask
In-Reply-To: <48B5159B.3070506@ant.uni-bremen.de>
References: <434F4FF8.9050903@ant.uni-bremen.de>
	<20051014064145.GA40856@admin.sibptus.tomsk.ru>
	<20051014092250.D66245@fledge.watson.org>
	<48B5159B.3070506@ant.uni-bremen.de>
Message-ID:

On Wed, 27 Aug 2008, Heinrich Rebehn wrote:

>> Since that time, Linux has turned up and implemented the Solaris model, and
>> IRIX has switched to the Solaris model also as a result of peer pressure.
>> I've previously looked at switching us, but it tears up our kernel APIs some
>> and will require significant testing.
I had hoped to do this for FreeBSD
>> 6.x but was derailed working on other problems that needed to be fixed.
>> My hope is to change the default in FreeBSD 7.x.
>
> has there been any progress so far? This might encourage me to move to 7.x
> (Still on 6.1)

Dear Heinrich:

Unfortunately, we've not yet made this change, as it's somewhat disruptive to
the in-kernel interfaces for VFS. I'd like to see it made for 8.x; on a
similar note, we had a Google Summer of Code project to implement support for
NT/NFSv4-style ACLs that has made some excellent headway, so we may also be
able to ship support for that in 8.0 as well.

Robert N M Watson
Computer Laboratory
University of Cambridge

From peter.schuller at infidyne.com Fri Aug 29 14:57:39 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Fri Aug 29 14:57:52 2008
Subject: move data pool to pool in same tank, free space not recovered.
In-Reply-To: <3C412955B44C7B43BBB442F61610484435D1DAB1@SBS2K8P01.WEBNX.local>
References: <3C412955B44C7B43BBB442F61610484435D1DAB1@SBS2K8P01.WEBNX.local>
Message-ID: <20080829145736.GA51262@hyperion.scode.org>

> I used mc-light to try to move about 3.6tb of data from /storage to
> /storage2 in the same tank. I had about 1.2tb of space free, it copied
> 1.2tb of data and then ran out of room. /storage shows 2.4tb of data and
> /storage2 shows 1.2tb but now there's only 460 megs of free space. I
> deleted a random file and space is now 1.1gb. The pool/tank was created
> with 7.0-RELEASE and imported into 8.0-CURRENT-200806 (since 07 won't
> install and 08 has no amd64). Also yes, I have confirmed the data is no
> longer, at least ls'd, in /storage

I assume you are *moving* the files and not copying?

It may be obvious but: Do you have snapshots of the file system on
which your deletes are not recovering space?
-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080829/a25760bb/attachment.pgp

From dariop at webnx.com Fri Aug 29 15:13:04 2008
From: dariop at webnx.com (dariop@webnx.com)
Date: Fri Aug 29 15:13:10 2008
Subject: move data pool to pool in same tank, free space not recovered.
In-Reply-To: <20080829145736.GA51262@hyperion.scode.org>
References: <20080829145736.GA51262@hyperion.scode.org>
Message-ID: <8e31ea7a985c341d601fa4c5dcb66d23@localhost>

On Fri, 29 Aug 2008 16:57:36 +0200, Peter Schuller wrote:
>> I used mc-light to try to move about 3.6tb of data from /storage to
>> /storage2 in the same tank. I had about 1.2tb of space free, it copied
>> 1.2tb of data and then ran out of room. /storage shows 2.4tb of data and
>> /storage2 shows 1.2tb but now there's only 460 megs of free space. I
>> deleted a random file and space is now 1.1gb. The pool/tank was created
>> with 7.0-RELEASE and imported into 8.0-CURRENT-200806 (since 07 won't
>> install and 08 has no amd64). Also yes, I have confirmed the data is no
>> longer, at least ls'd, in /storage
>
> I assume you are *moving* the files and not copying?
>
> It may be obvious but: Do you have snapshots of the file system on
> which your deletes are not recovering space?
>

I actually don't use snapshots on it. All the data that shows up in
/storage2 is gone from /storage as far as I can see. This was a simple
"highlight many directories, then move" operation in midnight commander.
Nothing exotic except for how much data was being moved. This is a very
simple pool, nothing fancy, all defaults, just a file dump for the most
part.
I can't figure out how to recover the data. A scrub locked the pool
randomly: no error, I just can't get in or ctrl-c/z the ls. It only
happens on /storage, which is why I wanted to first test moving things out
to /storage2 to see if that made a difference.

Dario

From doug at polands.org Fri Aug 29 16:39:59 2008
From: doug at polands.org (Doug Poland)
Date: Fri Aug 29 16:40:05 2008
Subject: ZFS newbie: pools/mounts not surviving reboots?
Message-ID: <932083ec073ad86a324f24b8725a8477.squirrel@email.polands.org>

Hello,

I'm a newbie when it comes to ZFS and am doing some experimentation to
get up-to-speed. I've been using the man pages, wiki and freebsd-fs
for reference.

Right now I'm running in a VM on 7.x i386. I've created a pool with
the intention of housing /var :

# zpool create zfs01 da7
# zfs create zfs01/var
# dump -0aLC8 -f- /var | ( cd /zfs01/var && restore -rf- )
# zfs set mountpoint=/var zfs01/var
# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    128M    782M    14%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/da3s1d           989M    468K    910M     0%    /var
/dev/da4s1d           989M     12K    910M     0%    /tmp
/dev/da5s1d           989M    390M    520M    43%    /usr
/dev/da6s1d           989M    180K    910M     0%    /home
zfs01                 983M    128K    983M     0%    /zfs01
zfs01/var             984M    640K    983M     0%    /zfs01/var

fbsd7xvm# zfs set mountpoint=/var zfs01/var
fbsd7xvm# mount
/dev/mirror/gm0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/da3s1d on /var (ufs, local, soft-updates)
/dev/da4s1d on /tmp (ufs, local, soft-updates)
/dev/da5s1d on /usr (ufs, local, soft-updates)
/dev/da6s1d on /home (ufs, local, soft-updates)
zfs01 on /zfs01 (zfs, local)
zfs01/var on /var (zfs, local)

Ok, so far so good. My ZFS pool is mounted under /var.
However, after I reboot, the pool is no longer mounted

# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    129M    781M    14%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/da4s1d           989M     12K    910M     0%    /tmp
/dev/da5s1d           989M    390M    520M    43%    /usr
/dev/da6s1d           989M    184K    910M     0%    /home

# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
zfs01       648K   983M    18K  /zfs01
zfs01/var   528K   983M   528K  /var

So what am I doing wrong?

-- 
Regards,
Doug

From freebsd-listen at fabiankeil.de Fri Aug 29 16:51:37 2008
From: freebsd-listen at fabiankeil.de (Fabian Keil)
Date: Fri Aug 29 16:51:43 2008
Subject: ZFS newbie: pools/mounts not surviving reboots?
In-Reply-To: <932083ec073ad86a324f24b8725a8477.squirrel@email.polands.org>
References: <932083ec073ad86a324f24b8725a8477.squirrel@email.polands.org>
Message-ID: <20080829185127.7ebe0b0e@fabiankeil.de>

"Doug Poland" wrote:

> I'm a newbie when it comes to ZFS and am doing some experimentation to
> get up-to-speed. I've been using the man pages, wiki and freebsd-fs
> for reference.
>
> Right now I'm running in a VM on 7.x i386.
I've created a pool with
> the intention of housing /var :
>
> # zpool create zfs01 da7
> # zfs create zfs01/var
> # dump -0aLC8 -f- /var | ( cd /zfs01/var && restore -rf- )
> # zfs set mountpoint=/var zfs01/var
> # df -h
> Filesystem            Size    Used   Avail Capacity  Mounted on
> /dev/mirror/gm0s1a    989M    128M    782M    14%    /
> devfs                 1.0K    1.0K      0B   100%    /dev
> /dev/da3s1d           989M    468K    910M     0%    /var
> /dev/da4s1d           989M     12K    910M     0%    /tmp
> /dev/da5s1d           989M    390M    520M    43%    /usr
> /dev/da6s1d           989M    180K    910M     0%    /home
> zfs01                 983M    128K    983M     0%    /zfs01
> zfs01/var             984M    640K    983M     0%    /zfs01/var
> fbsd7xvm# zfs set mountpoint=/var zfs01/var
> fbsd7xvm# mount
> /dev/mirror/gm0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da3s1d on /var (ufs, local, soft-updates)
> /dev/da4s1d on /tmp (ufs, local, soft-updates)
> /dev/da5s1d on /usr (ufs, local, soft-updates)
> /dev/da6s1d on /home (ufs, local, soft-updates)
> zfs01 on /zfs01 (zfs, local)
> zfs01/var on /var (zfs, local)
>
>
> Ok, so far so good. My ZFS pool is mounted under /var. However,
> after I reboot, the pool is no longer mounted
>
> # df -h
> Filesystem            Size    Used   Avail Capacity  Mounted on
> /dev/mirror/gm0s1a    989M    129M    781M    14%    /
> devfs                 1.0K    1.0K      0B   100%    /dev
> /dev/da4s1d           989M     12K    910M     0%    /tmp
> /dev/da5s1d           989M    390M    520M    43%    /usr
> /dev/da6s1d           989M    184K    910M     0%    /home
>
> # zfs list
> NAME        USED  AVAIL  REFER  MOUNTPOINT
> zfs01       648K   983M    18K  /zfs01
> zfs01/var   528K   983M   528K  /var
>
>
> So what am I doing wrong?

Did you add 'zfs_enable="YES"' to /etc/rc.conf?

Fabian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080829/93221c79/signature.pgp

From morganw at chemikals.org Sat Aug 30 12:20:00 2008
From: morganw at chemikals.org (Wes Morgan)
Date: Sat Aug 30 12:20:07 2008
Subject: ZFS Advice
In-Reply-To:
References: <200808090020.04315.peter.schuller@infidyne.com>
	<18588.64214.354495.804458@almost.alerce.com>
	<200808090917.34149.peter.schuller@infidyne.com>
Message-ID:

On Sat, 9 Aug 2008, Wes Morgan wrote:

> On Sat, 9 Aug 2008, Peter Schuller wrote:
>
>>> Or, it could be that a UIO slot is specifically a PCI-Ex8 slot? You
>>> can buy risers that convert "1U PCI-E (x16) to 1 UIO and 1 PCI-E "
>>
>> http://www.supermicro.com/products/nfo/UIO_cards.cfm
>>
>> This one actually says in the clear:
>>
>> "8-Lane PCI-Express interface (Supermicro UIO slot)"
>>
>> My guess is one of the following:
>>
>> * It means nothing but "PCI-E", and the UIO stuff is just marketing BS.
>>
>> * It means "extra PCI-E", and the UIO stuff is just marketing BS.
>>
>> * (Based on the graphical animation on the UIO page) They actually do have a
>> special slot on their motherboard for use with their special UIO card which
>> then provides a few extra PCI-E slots. The mentioning of UIO on pages
>> describing standard PCI-E cards is just marketing BS resulting from the
>> technically correct fact that they can be used with their UIO card.
>>
>> If I don't find specific information to the contrary I'll probably chance it
>> and see.
>
> As you say, it's hard to tell from the SuperMicro page. LSI has this card:
>
> http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816118092
>
> Which is their official card with the same chipset. The slots on both the
> SuperMicro and the LSI cards look mighty similar.
The only obvious difference
> is the little right-angle thing next to the PCI-E interface. I'm very
> interested to see if it works out. If I wasn't going to be away for a few
> weeks I'd try to abuse CDW's return policy to give it a test.

Has anyone, by chance, tried one of those SuperMicro UIO cards in a PCIe
slot yet? If not, I'm going to give it a try unless someone is sure it
won't work.

From peter.schuller at infidyne.com Sat Aug 30 14:32:07 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Sat Aug 30 14:32:14 2008
Subject: ZFS Advice
In-Reply-To:
References: <200808090020.04315.peter.schuller@infidyne.com>
	<18588.64214.354495.804458@almost.alerce.com>
	<200808090917.34149.peter.schuller@infidyne.com>
Message-ID: <20080830143205.GA6244@hyperion.scode.org>

> Has anyone, by chance, tried one of those SuperMicro UIO cards in a PCIe
> slot yet? If not, I'm going to give it a try unless someone is sure it
> won't work.

I talked to someone off-list who said they are somehow reversed and
won't fit into normal PCI-E slots. In addition he pointed out that
most non-server boards don't have PCI-e 8x slots even if that weren't
the case.

On the plus side, he said he booted a FreeBSD 7 CD on a machine with a
SA8204ELP card, and it did detect the drives (but not RAID volumes),
indicating it should work as a plain controller. This is probably what
I'll be going for. Previously I had not been able to get any
information on whether it would work.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080830/3cfbfb85/attachment.pgp

From koitsu at FreeBSD.org Sat Aug 30 16:33:05 2008
From: koitsu at FreeBSD.org (Jeremy Chadwick)
Date: Sat Aug 30 16:33:16 2008
Subject: ZFS Advice
In-Reply-To:
References: <200808090020.04315.peter.schuller@infidyne.com>
	<18588.64214.354495.804458@almost.alerce.com>
	<200808090917.34149.peter.schuller@infidyne.com>
Message-ID: <20080830161703.GA6133@icarus.home.lan>

On Sat, Aug 30, 2008 at 07:19:53AM -0500, Wes Morgan wrote:
> On Sat, 9 Aug 2008, Wes Morgan wrote:
>
>> On Sat, 9 Aug 2008, Peter Schuller wrote:
>>
>>>> Or, it could be that a UIO slot is specifically a PCI-Ex8 slot? You
>>>> can buy risers that convert "1U PCI-E (x16) to 1 UIO and 1 PCI-E "
>>>
>>> http://www.supermicro.com/products/nfo/UIO_cards.cfm
>>>
>>> This one actually says in the clear:
>>>
>>> "8-Lane PCI-Express interface (Supermicro UIO slot)"
>>>
>>> My guess is one of the following:
>>>
>>> * It means nothing but "PCI-E", and the UIO stuff is just marketing BS.
>>>
>>> * It means "extra PCI-E", and the UIO stuff is just marketing BS.
>>>
>>> * (Based on the graphical animation on the UIO page) They actually do have a
>>> special slot on their motherboard for use with their special UIO card which
>>> then provides a few extra PCI-E slots. The mentioning of UIO on pages
>>> describing standard PCI-E cards is just marketing BS resulting from the
>>> technically correct fact that they can be used with their UIO card.
>>>
>>> If I don't find specific information to the contrary I'll probably
>>> chance it and see.
>>
>> As you say, it's hard to tell from the SuperMicro page.
LSI has this card:
>>
>> http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16816118092
>>
>> Which is their official card with the same chipset. The slots on both
>> the SuperMicro and the LSI cards look mighty similar. The only obvious
>> difference is the little right-angle thing next to the PCI-E interface.
>> I'm very interested to see if it works out. If I wasn't going to be
>> away for a few weeks I'd try to abuse CDW's return policy to give it a
>> test.
>
> Has anyone, by chance, tried one of those SuperMicro UIO cards in a PCIe
> slot yet? If not, I'm going to give it a try unless someone is sure it
> won't work.

I don't think it'll work. If you check Supermicro's site under the
Accessory section, you can find some of their riser cards which convert
the UIO slots into either PCI-X or PCIe depending upon what you
want/need.

Searching Google for higher-quality images of the risers indicates that
the wiring/pinout is indeed different/custom, which is why the riser is
needed.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From ler at lerctr.org Sat Aug 30 20:35:20 2008
From: ler at lerctr.org (Larry Rosenman)
Date: Sat Aug 30 20:35:26 2008
Subject: ZFS patches.
In-Reply-To: <20080829074738.GB3026@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<86tzd490qx.fsf@gmail.com>
	<20080829074738.GB3026@garage.freebsd.pl>
Message-ID: <20080830153311.T32295@borg>

On Fri, 29 Aug 2008, Pawel Jakub Dawidek wrote:

> On Fri, Aug 29, 2008 at 03:29:58AM +0400, swell.k@gmail.com wrote:
>> (CC'ing Attilio, who made the commits)
>>
>> Pawel Jakub Dawidek writes:
>>
>>> Hi.
>>>
>>> http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>
> There are some rejections, I know, and I'm tracking everything in
> perforce. In the meantime there were two ZFS version bumps in
> OpenSolaris (so I've 13 in perforce at the moment). I probably won't
> create a new patch, but just commit what I have to HEAD. In the meantime
> I also fixed quite a few bugs, mostly reported by kris@.

Do you have a time frame for the commit to HEAD? (The current patchset and
the manual fix for the kern_jail.c include of sys/osd.h are working well, but
I'm concerned about getting out of sync easily.)

Thanks!
>
>

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

From rmacklem at uoguelph.ca Sat Aug 30 23:10:52 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Sat Aug 30 23:10:58 2008
Subject: Is curthread always valid in a VOP call?
Message-ID:

Since VOP_GETATTR() and VOP_SETATTR() lost the thread argument in
-current, I did the obvious and used "curthread" instead. Is this
safe to do?

rick

From kostikbel at gmail.com Sun Aug 31 06:42:31 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sun Aug 31 06:42:37 2008
Subject: Is curthread always valid in a VOP call?
In-Reply-To:
References:
Message-ID: <20080831061218.GJ2038@deviant.kiev.zoral.com.ua>

On Sat, Aug 30, 2008 at 07:23:07PM -0400, Rick Macklem wrote:
> Since VOP_GETATTR() and VOP_SETATTR() lost the thread argument in
> -current, I did the obvious and used "curthread" instead. Is this
> safe to do?

Yes. Did the change force you to use curthread often?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080831/43e3755d/attachment.pgp

From rmacklem at uoguelph.ca Sun Aug 31 18:46:16 2008
From: rmacklem at uoguelph.ca (Rick Macklem)
Date: Sun Aug 31 18:46:23 2008
Subject: Is curthread always valid in a VOP call?
In-Reply-To: <20080831061218.GJ2038@deviant.kiev.zoral.com.ua>
References: <20080831061218.GJ2038@deviant.kiev.zoral.com.ua>
Message-ID:

On Sun, 31 Aug 2008, Kostik Belousov wrote:

> On Sat, Aug 30, 2008 at 07:23:07PM -0400, Rick Macklem wrote:
>> Since VOP_GETATTR() and VOP_SETATTR() lost the thread argument in
>> -current, I did the obvious and used "curthread" instead. Is this
>> safe to do?
>
> Yes. Did the change force you to use curthread often?
>
Ok, thanks. How often? 7 (or 2 if I put "struct thread *td = curthread;" at
the beginning of nfs_getattr() and nfs_setattr(), like the vanilla nfsclient
has done).

Basically, any NFS VOP is going to end up doing an RPC sooner or later
(if it's lucky, it hits a cache, but...) and the RPC currently likes to
have a thread/proc pointer so it can check for termination signals for
interruptible mounts. Personally, I'm not fond of interruptible mounts
(they're hard to get right and almost impossible to do correctly for NFS4)
and prefer hard mounts + forced dismounts when a server is dead. But, I'm
not sure others would be ready to get rid of them.

So, until interruptible mounts go away, I can't see avoiding a thread
pointer for the RPC, and that means either passing it down the calling
chain or using curthread at some point. (Doesn't matter to me which it is,
so long as it works.)

rick
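
[Editorial sketch, not part of the original thread.] The pattern Rick
describes, fetching curthread once at the top of the VOP and handing it to
the RPC layer so that interruptible mounts can notice a pending signal, can
be illustrated roughly as below. This is a userland mock under stated
assumptions, not the nfsclient code: struct thread and curthread are local
stand-ins for the kernel's, and struct mount_opts, mock_rpc_request() and
mock_nfs_getattr() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for the kernel's struct thread (hypothetical fields). */
struct thread {
	int	td_tid;
	bool	td_sigpending;	/* a termination signal is waiting */
};

/* Hypothetical per-mount options; "interruptible" models an intr mount. */
struct mount_opts {
	bool	interruptible;
};

/*
 * In the kernel, curthread is supplied by machine-dependent code.  This
 * single-threaded mock uses a plain static pointer instead.
 */
static struct thread *curthread;

/*
 * Mock RPC layer: it wants a thread pointer only so that it can give up
 * with EINTR when the mount is interruptible and a signal is pending.
 */
static int
mock_rpc_request(const struct mount_opts *mo, struct thread *td)
{
	if (mo->interruptible && td != NULL && td->td_sigpending)
		return (EINTR);
	return (0);
}

/*
 * Mock getattr entry point with no thread argument: it picks up
 * curthread once and passes it down the calling chain.
 */
static int
mock_nfs_getattr(const struct mount_opts *mo)
{
	struct thread *td = curthread;

	assert(td != NULL);	/* a VOP always runs in some thread context */
	return (mock_rpc_request(mo, td));
}
```

With curthread pointing at a thread that has td_sigpending set,
mock_nfs_getattr() returns EINTR for an interruptible mount and 0 for a
hard mount, which is the only difference the thread pointer makes in this
sketch.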