From peterjeremy at optushome.com.au Sat Nov 1 02:39:07 2008
From: peterjeremy at optushome.com.au (Peter Jeremy)
Date: Sat Nov 1 02:39:15 2008
Subject: ZFS patches.
In-Reply-To: <20080829074738.GB3026@garage.freebsd.pl>
References: <20080727125413.GG1345@garage.freebsd.pl>
	<86tzd490qx.fsf@gmail.com> <20080829074738.GB3026@garage.freebsd.pl>
Message-ID: <20081031201814.GA54286@server.vk2pj.dyndns.org>

Hi Pawel,

On 2008-Aug-28 20:47:30 +0000, Pawel Jakub Dawidek wrote:
>On Fri, Aug 29, 2008 at 03:29:58AM +0400, swell.k@gmail.com wrote:
>> (CC'ing Attilio, who made the commits)
>>
>> Pawel Jakub Dawidek writes:
>>
>> > Hi.
>> >
>> > http://people.freebsd.org/~pjd/patches/zfs_20080727.patch.bz2
>> >
>> > The patch above contains the most recent ZFS version that could be found
>> > in OpenSolaris as of today. Apart for large amount of new functionality,
>> > I belive there are many stability (and also performance) improvements
>> > compared to the version from the base system.
>> [...]
>>
>> After r182371 and r182383 there are another three rejections. Namely
>> cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h.rej
>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.rej
>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_replay.c.rej
>> I'm attaching them in case someone has a quick fix or idea how
>> to solve them, especially regarding `+' lines.
>>
>> In the meantime I'm reverting them locally hoping it will not do any
>> harm to me. If this fails then I will stay with r182370 since I already
>> upgraded my pools to 11th version and can't go back easily.
>
>There are some rejections, I know, and I'm tracking everything in
>perforce. In the meantime there were two ZFS version bumps in
>OpenSOlaris (so I've 13 in perforce at the moment). I probably won't
>create new patch, but just commit what I've to HEAD. In the meantime
>also I fixes quite a few bugs, mostly reported by kris@.

It's now somewhat over two months later and the latest ZFS patchset
still appears to be zfs_20080727.patch.bz2. Unfortunately, these
patches no longer apply to -current (I tried working through the
rejects but must have missed something because I wound up with an
unkillable running process).

Can you please give us an indication as to when we might expect to
see either an updated set of ZFS patches or, better, the patches
committed to -current?

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.

From glavoie at gmail.com Sat Nov 1 16:16:13 2008
From: glavoie at gmail.com (Gabriel Lavoie)
Date: Sat Nov 1 16:16:19 2008
Subject: FreeBSD 7.0: gjournal on root filesystem problems
Message-ID:

Hello,

I'm currently running some tests to work out how to set up a new home
file server using two 500GB hard drives. I want to create a
gmirror/gjournal setup covering the complete filesystem. I've been able
to set everything up and it works well. The problem I have now is with
the failure test. I create a file with random data on the / filesystem
using "dd" and, while the file is being created, I hit the reset button
of the computer. After that, it won't boot anymore. I get the following
messages:

GEOM_MIRROR: Device mirror/gm launched (2/2)
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains data.
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains journal.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains data.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains journal.
GEOM_JOURNAL: Journal mirror/gmd consistent.
Trying to mount root from ufs:/dev/mirror/gma.journal

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot> ?

List of GEOM managed disk devices:
  mirror/gmd.journal mirror/gmd mirror/gmc mirror/gma mirror/gm ad10s1c
  ad10s1b ad8s1c ad8s1b ad10s2 ad10s1 ad8s1 ad10 ad8 acd0

If I boot from the Fixit CD and try to mount the device, it tells me
that it isn't clean and that I must run fsck on it. I do that, then try
to mount it again; everything is OK and the system will boot again.

If I do the same thing on my /usr filesystem (writing random data, then
reset), there is no problem. The difference is that the GEOM_JOURNAL
messages on boot tell me that the journal is consistent instead of
clean. When the journaled filesystem isn't clean, I also noticed that
"gjournal list" gives me the following line on the provider:

Mode: r0w0e0

instead of this line:

Mode: r1w1e1

Is it possible that, when this happens, the kernel becomes unable to
mount the filesystem read-only to run fsck and then remount it
read-write to boot the rest of the system?

Here is my configuration. Note that this setup isn't optimal; I'm
using it for testing purposes before building the real thing:

/etc/fstab:
# Device                  Mountpoint  FStype  Options    Dump  Pass#
/dev/ad10s1b              none        swap    sw         0     0
/dev/ad8s1b               none        swap    sw         0     0
/dev/mirror/gma.journal   /           ufs     rw,async   1     1
/dev/mirror/gmd.journal   /usr        ufs     rw,async   2     2
/dev/acd0                 /cdrom      cd9660  ro,noauto  0     0

/boot/loader.conf
geom_mirror_load="YES"
geom_journal_load="YES"

headless# gmirror status
     Name    Status  Components
mirror/gm  COMPLETE  ad8s2
                     ad10s2

headless# gmirror list
Geom name: gm
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 1602936318
Providers:
1. Name: mirror/gm
   Mediasize: 4293595648 (4.0G)
   Sectorsize: 512
   Mode: r2w2e3
Consumers:
1. Name: ad8s2
   Mediasize: 4293596160 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 2559388791
2. Name: ad10s2
   Mediasize: 4293596160 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 1481264275

headless# gjournal status
              Name  Status  Components
mirror/gma.journal     N/A  mirror/gma
mirror/gmd.journal     N/A  mirror/gmd

headless# gjournal list
Geom name: gjournal 3672855181
ID: 3672855181
Providers:
1. Name: mirror/gma.journal
   Mediasize: 1073741312 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gma
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 2147483136
   Jstart: 1073741312
   Role: Data,Journal

Geom name: gjournal 3868799910
ID: 3868799910
Providers:
1. Name: mirror/gmd.journal
   Mediasize: 1072361472 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gmd
   Mediasize: 2146103808 (2.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 2146103296
   Jstart: 1072361472
   Role: Data,Journal

headless# mount
/dev/mirror/gma.journal on / (ufs, asynchronous, local, gjournal)
devfs on /dev (devfs, local)
/dev/mirror/gmd.journal on /usr (ufs, asynchronous, local, gjournal)

Thanks

Gabriel Lavoie

-- 
Gabriel Lavoie
glavoie@gmail.com

From freebsd at edvax.de Sat Nov 1 21:23:10 2008
From: freebsd at edvax.de (Polytropon)
Date: Sat Nov 1 21:23:18 2008
Subject: Repairing a defective UFS 2 partition with fsck_ffs (or other means)
Message-ID: <20081102050601.9fccb80f.freebsd@edvax.de>

Dear list,

I need your help in order to solve one of the strangest and most
complicated problems existing in this universe.

First of all I'd like to mention that I have been using FreeBSD nearly
exclusively (along with Solaris and other UNIXes) for many years and I
have never had any problem similar to this. In fact, I have never had
*any* problem that required external help. But now, I'm lost. I don't
know what to try, so I would be glad about any suggestion you could
give me.
I'm familiar with FreeBSD, shell scripting and C. My skills cover the
usual "admin things".

The accident that happened to me is a very strange thing, strange in
that the usual means of solving such a problem don't seem to fit. In
fact, I'm the second (!) person on earth to encounter this problem, as
far as my investigations revealed. So I'm not sure if it's solvable at
all.

In order to explain what it's about, I'd like to follow this path:

	1. What initially happened? (impact)
	2. How does the problem occur? (examination)
	3. What seems to be the reason? (diagnosis)
	4. What did I try to solve the problem? (treatment)
	5. What kind of solution should be possible? (prognosis)

This should help to explain my problem properly. If there's more to
know, please ask me. I'll try to answer as precisely as I can. And
don't mind my bad English, it's not my native language.

It's a long story, sorry. So here I go...


1. What initially happened?
---------------------------

First of all, we're talking about this device:

ad0: 114473MB at ata0-master UDMA100

The installation was a FreeBSD 5.4-p something on a 2 GHz P4 machine
with 768 MB SDR-SDRAM, which had been working perfectly for many years.
The disk contained several partitions (ad0s1a as /, ad0s1d as /var,
ad0s1e as /usr and ad0s1f as /home), formatted as UFS 2 with Soft
Updates (except for /).

While I was doing some web development (running: xterms with Midnight
Commander and its editor, and Opera), the system suddenly froze. Some
seconds later, it rebooted. The last message on VT 0 was something
like this, if I remember correctly:

cannot free some inode: already free
automatic reboot

When the system came up again, I relied on fsck_ffs to solve all
possible problems, as it always had in the past. The result: many
defects in the file system contents. Most of them didn't matter (I can
reinstall), but fsck_ffs wouldn't make the /home partition completely
accessible again.
I could copy the content of the archive and all the other users' home
directories (luckily), but under no circumstances could I access my
own (!) home directory again. HEART ATTACK!!!

Of course, I didn't have a good backup (the last one was many years
old). This is because I never encountered any problems, so I got lazy.
Okay, that seems to be the revenge now. When you don't do your
backups, something will happen. If you do your backups, nothing will
happen, and you won't need them at all. That's their purpose. I'm sure
you're familiar with this wisdom. :-)

We're talking about documentation, mail archives, program sources and
various projects here, data collections created over many years of
hard work. So it's understandable that I want to get the stuff back as
completely as possible; that would be great.


2. How does the problem occur?
------------------------------

The problem occurred at system startup when running fsck_ffs.

** /dev/ad1s1f
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
1035979 BAD I=259127 UNEXPECTED SOFT UPDATE INCONSISTENCY
1101472 DUP I=260035 UNEXPECTED SOFT UPDATE INCONSISTENCY
[...]
1117681 DUP I=260039 UNEXPECTED SOFT UPDATE INCONSISTENCY
1117682 DUP I=260039 UNEXPECTED SOFT UPDATE INCONSISTENCY
EXCESSIVE DUP BLKS I=260039 CONTINUE? yes
[...]
3774433638169537379 BAD I=260051 UNEXPECTED SOFT UPDATE INCONSISTENCY
7021223365635213949 BAD I=260051 UNEXPECTED SOFT UPDATE INCONSISTENCY
8030898235988077411 BAD I=260051 UNEXPECTED SOFT UPDATE INCONSISTENCY
7310315658325879925 BAD I=260051 UNEXPECTED SOFT UPDATE INCONSISTENCY
EXCESSIVE BAD BLKS I=260051 CONTINUE? yes
[...]
1485568 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485569 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485570 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485571 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485572 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485573 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485574 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
1485575 DUP I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
5707022222514874728 BAD I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
8091332836184380774 BAD I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
8598589197767749681 BAD I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
[...]
3631363939722683732 BAD I=290557 UNEXPECTED SOFT UPDATE INCONSISTENCY
EXCESSIVE BAD BLKS I=290557 CONTINUE? yes
INCORRECT BLOCK COUNT I=290557 (3104 should be 736) CORRECT? yes
fsck_ffs: bad inode number 306176 to nextinode

As is obvious, fsck_ffs fails in phase 1. No recovery is done. In my
opinion, this indicates a major defect of the file system. Maybe many
defects, one worse than the other. If fsck_ffs can't repair it, it
must be really bad.

Okay, I took the opportunity to switch to a new hard disk on which I
had already installed FreeBSD 7. Why? Because other partitions were
damaged, too. On /dev/ad0s1a, /, nothing significant happened, but on
/dev/ad0s1e, /usr, for example, the whole X11R6/ subtree disappeared,
and lost+found/ filled up with many directory fragments. So I could
not use the system anymore.

I put in the new disk as ad0 and the former ad0 disk as ad1, and
retried the check, which had failed with fsck_ffs from version 5,
using fsck_ffs from version 7. NB that no matter by which other name
I called fsck_ffs, be it fsck_ufs or fsck_4.2bsd, the problem would
stay the same.

In order to do some tests, I made a 1:1 copy of the defective
partition. This is a wise step, because I can't accidentally damage
important data, and when I have messed up a copy, I can pull a new one.
FreeBSD's dd program did the job well. It ran approx. 4 hours without
any error message. The defect(s) of the disk partition are replicated
1:1 in the image.

% cd ~/rescue
% dd if=/dev/ad1s1f of=ad1s1f.dd bs=1m
86566+1 records in
86566+1 records out
90772014080 bytes transferred in 15156.804004 secs (5988862 bytes/sec)

The file size of ad1s1f.dd seemed to be good, and the partition
contained in this file was correctly recognized:

% file ad1s1f.dd
ad1s1f.dd: Unix Fast File system [v2] (little-endian) last mounted on
/mnt, last written at Wed Jul 2 18:51:06 2008, clean flag 0, readonly
flag 0, number of blocks 44322272, number of data blocks 42925108,
number of cylinder groups 472, block size 16384, fragment size 2048,
average file size 16384, average number of files in dir 64, pending
blocks to free 0, pending inodes to free 0, system-wide uuid 0,
minimum percentage of free blocks 8, TIME optimization

Of course, I tried to mount and access the partition's copy using the
vnode mechanism for memory disks:

% sudo mdconfig -a -t vnode -u 10 -f ad1s1f.dd
% mount -o ro /dev/md10 mnt/

Fine, mount worked, so I could see what's on the disk.

+<-/export/home/poly/rescue/mnt------v>+
|      Name       | Size  |   MTime    |
|/..              |UP--DIR|            |
|/.snap           |    512|Dec 21  2004|
|/archiv          |    512|Feb 27  2006|
|/backup          |    512|Sep 23  2005|
|/gast            |   1024|Aug 25  2005|
|/lost+found      |   2048|Jul  1 10:15|
|/markus          |    512|Nov 20  2003|
|/root            |   1024|Apr 18 16:17|
|/surf            |   1024|Feb 17  2005|
| .fsck_snapshot  | 86567M|Jun 30 20:47|
|?poly            |      0|Jan  1  1970|  <===
+--------------------------------------+
|/..                                   |
+--------------------------------------+
poly@r55:~/rescue/mnt%                                              [^]
1Help 2Menu 3View 4Edit 5Copy 6RenMov 7Mkdir 8Delete 9PullDn 10Quit

Within the Midnight Commander, the name of the home directory was
marked in red with a leading question mark. Do you recognize the
timestamp? Strange. Furthermore, I could not change into this
directory.

% cd mnt/poly
mnt/poly: Not a directory.
% file mnt/poly
mnt/poly: cannot open `mnt/poly' (Bad file descriptor)

But I didn't give up hope yet. The data from within the home directory
seemed to be present. The corresponding inodes don't seem to be marked
as unused. I think this is what "orphan inodes" are called?

Where do I take this idea from? There's an interesting match in the
disk usage percentages I found when trying some df and du
examinations:

% df -h
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/md10      82G     75G    716M    99%    /export/home/poly/rescue/mnt

At this point, a strange situation already occurs: The disk is 82 GB,
75 GB are used, but less than 1 GB is free. So there's something
missing? I remember that at the point the disk went bad there were
only approx. 700 MB free on /home. This matches the numbers above. But
where's the rest?

% sudo du -sch mnt
du: mnt/poly: Bad file descriptor
du: mnt/archiv/cr/clips.w32/s01.wmv: Bad file descriptor
du: mnt/archiv/cr/clips.w32/s02.wmv: Bad file descriptor
 52G    mnt
 52G    total

The disk is 82 GB, 75 GB are used, and the data structures that are
still present make up 52 GB. So there must be approx. 20 GB somewhere.
This could be the content of my home directory, the important data, my
life, the universe, and everything. :-)

Furthermore, you'll see two more "Bad file descriptor" warnings inside
the archive directory. They don't matter, but they surely indicate
that more than just the inode of my home directory died. So more
problems can occur while proceeding.

Of course, checking the partition's dd copy, directly or via the md
device, gives the same error message as already mentioned.

There was a file /.fsck_snapshot of the partition's respective size.
This file could be mounted, too, and within it there was a very old
copy of my home directory. The snapshot was taken at the time when I
initially installed and configured this system, so it was very old,
too old.


3. What seems to be the reason?
-------------------------------

The reason seems to be that the inode describing my home directory
doesn't exist anymore. This explains why its name is still there
(stored in the inode describing the root directory), but no further
information about the file type (here: directory) and its respective
content is available.

But after all, this does not explain why fsck_ffs can't repair the
partition any more, nor can any other program. This is where my
trouble understanding what happened starts.


4. What did I try to solve the problem?
---------------------------------------

As I already mentioned, FreeBSD's fsck_ffs is unable to repair the
partition.

fsck_ffs: bad inode number 306176 to nextinode

Using FreeBSD's clri, I tried to clear the inodes that I thought would
cause the problem for fsck_ffs:

% sudo mdconfig -a -t vnode -u 10 -f ad1s1f.dd
% clri 306176 /dev/md10
% sync

This didn't work at all.

I've tried other versions of fsck_ffs, too, running on my main machine
or another one, from FreeBSD 5, 6 and 7. The only difference was a
FreeBSD 5 system where fsck_ffs crashed within phase 1 with this
message:

fsck_ffs: cannot alloc 1073796864 bytes for inoinfo

It seems that this particular machine didn't have enough RAM
installed.

And no matter if I checked the original partition or the copy I made
with dd, the problem would always be the same.

So then I tried an alternative to FreeBSD's dd, hoping that some
"magical translation" would happen. My first choice was ddrescue from
the ports:

% ddrescue -d -r 3 -n /dev/ad1s1f ad1s1f.ddr logfile
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:         0 B,  errsize:       0 B,  errors:       0
Current status
rescued:    90772 MB,  errsize:       0 B,  current rate:     6815 kB/s
   ipos:    90772 MB,   errors:       0,    average rate:     6723 kB/s
   opos:    90772 MB
Finished

The file ad1s1f.ddr was exactly the same as ad1s1f.dd, so no gain of
hope here.

Another idea was to copy data from the original disk using FreeBSD's
fetch program - fetch -rR. Nope.

Even FreeBSD's recoverdisk, run on the partition or its copy, just
brought up another 1:1 copy including the problem.

% recoverdisk ad1s1f.dd ad1s1f.rd
        start    size  block-len  state         done  remaining     % done
  90771030016  984064     984064      0  90772014080          0  100.00000
Completed

After this, I tried some "hardcore stuff": The Sleuth Kit from the
ports, and first its dls program:

% dls -v -f ufs -i raw ad1s1f.dd > ad1s1f.dls
File system is corrupt (ffs_group_load: Group 12 descriptor offsets too large at 1129104)

Although it didn't help me either, the error message is interesting:
"Group 12 descriptor offsets too large at 1129104". Sadly, I don't
know how to interpret this. Is 1129104 an inode? If yes: it's not
allocated. What group is meant? Cylinder group? Maybe you could tell
me.

Another program from The Sleuth Kit, fls, allowed me to see some
content of the partition. In fact, it even showed data that wasn't
accessible, so it's within the range of the files that need to be
restored.

% fls -i raw -r ad1s1f.dd
[...]
d/- * 259072(realloc): poly
+ d/d * 3438592(realloc): 2003-05-17
[...]
+++ d/d 5840896: brazil
++++ r/r 5840897: kate_bush_-_brazil.mp3
++++ r/r 5840898: shangrila_towers.mp3
++++ r/r 5840899: singing_telegram.mp3
++++ r/r 5840900: the_first_noel.mp3
Segmentation fault (core dumped)

So I checked:

% fsdb -r ad1s1f.dd
ad1s1f.dd is not a disk device
CONTINUE? [yn] y
** ad1s1f.dd
Editing file system `ad1s1f.dd'
Last Mounted on /export/home/poly/rescue/mnt
fsdb (inum: 2)> inode 3438592
current inode: directory
I=3438592 MODE=40700 SIZE=512
        BTIME=Nov 30 14:31:57 2007 [0 nsec]
        MTIME=Jun 26 05:06:14 2008 [0 nsec]
        CTIME=Jun 26 05:06:14 2008 [0 nsec]
        ATIME=Jul 1 21:13:05 2008 [0 nsec]
OWNER=poly GRP=staff LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=4803f917
fsdb (inum: 3438592)> ls
slot 0 ino 3438592 reclen 12: directory, `.'
slot 1 ino 447497 reclen 12: directory, `..'
slot 2 ino 3438593 reclen 24: regular, `.sylpheed_mark'
slot 3 ino 283193 reclen 12: regular, `1'
slot 4 ino 289966 reclen 12: regular, `2'
slot 5 ino 289970 reclen 12: regular, `3'
slot 6 ino 3438620 reclen 24: regular, `.sylpheed_cache'
slot 7 ino 290363 reclen 12: regular, `4'
slot 8 ino 290366 reclen 12: regular, `5'
slot 9 ino 290385 reclen 12: regular, `6'
slot 10 ino 290444 reclen 368: regular, `7'
fsdb (inum: 3438592)> inode 259072
current inode 259072: unallocated inode
fsdb (inum: 259072)> quit

***** FILE SYSTEM STILL DIRTY *****
*** FILE SYSTEM MARKED DIRTY
*** BE SURE TO RUN FSCK TO CLEAN UP ANY DAMAGE
*** IF IT WAS MOUNTED, RE-MOUNT WITH -u -o reload

Although the directory's name "2003-05-17" indicates that it should
hold pictures from the cam/ subtree, its content seems to be a
Sylpheed MH mail directory. According to fls's output, inodes 3438592
and 259072 have been reallocated. And remember 259072? This was my
home directory, I think.

Another program from the ports, scan_ffs, would only confirm what I
already knew:

% scan_ffs -lv /dev/md10
block 128 id 3f67c4e6,354efde8 size 44322272
block 160 id 3f67c4e6,354efde8 size 44322272
X: 177289088 0 4.2BSD 2048 16384 0 # /export/home/poly/rescue/mnt
block 12032 id 616e732e,c0690070 size 44322272
block 12416 id 3f67c4e6,354efde8 size 44322272
block 13248 id 6e73746a,c3577600 size 44322272
block 376512 id 3f67c4e6,354efde8 size 44322272
block 752864 id 3f67c4e6,354efde8 size 44322272
block 1129216 id 3f67c4e6,354efde8 size 44322272
block 1505568 id 3f67c4e6,354efde8 size 44322272
[...]

The 4.2BSD partition is still there and intact, okay. The program
testdisk, also available from the ports, seems to have the same
purpose. But a lost partition is not the real problem, I think.

Another approach I found would be to avoid looking at the file system
at all, and instead to parse the disk "byte-wise" and look for magic
bytes. A tool to do so is magicrescue from the ports.

% magicrescue -r /usr/local/share/magicrescue/recipes -d mr_output /dev/md10
Read error on /dev/md10 at 102400 bytes: Invalid argument

It didn't work on the memory disk, but fortunately it did on the dd
copy:

% magicrescue -r /usr/local/share/magicrescue/recipes -d mr_output ad1s1f.dd

The files recovered by this program were of many different types, such
as JPG images or MP3 files. Furthermore, files from within the
inaccessible home directory had been restored. This is another hint
that the data should still be there. But sadly, the directory
structure could not be retrieved, so I got lots of stuff in one
directory.

From the manual of the program ffs2recov from the ports I learned that
it's possible to create an inode for which you can explicitly specify
name and number. So I tried:

% cd ~/rescue
% ffs2recov -c 259072 -n poly ad1s1f.dd

This created a file called "poly" in the ~/rescue directory. Okay, not
what I wanted to get. So I tried something really stupid:

% cd ~/rescue
% sudo mdconfig -a -t vnode -u 10 -f ad1s1f.dd
% mount -o rw /dev/md10 mnt/
% cd mnt
% ffs2recov -c 259072 -n poly ad1s1f.dd
% sync
panic: ffs_write: type 0xc5d37e04 0 (0,16384)
Dumping 136 MB: 121 105 89 73 57 41 (CTRL-C to abort) 25 9
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
[...]
ad0: 305245MB at ata0-master UDMA100
ad1: 305245MB at ata0-slave UDMA100
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=64
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=64
ad1: FAILURE - READ_DMA status=51 error=84 LBA=64
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=0
ad1: FAILURE - READ_DMA status=51 error=84 LBA=0
savecore: reboot after panic: ffs_write: type 0xc5d37e04 0 (0,16384)

You can imagine my heartbeat going up to 200 at this moment! :-)
Fortunately, no data was lost. I've got no idea what happened, but I'm
sure my approach was wrong. The system would not react in this way
without a proper reason. And NB that the ad0 and ad1 you see here are
completely different things; the original 120 GB Seagate disk is on
the shelf. The new FreeBSD 7 system is on ad0, and ad1 is reserved for
backup purposes. Why does it complain that much? Okay, never mind,
it's not important now.


5. What kind of solution should be possible?
--------------------------------------------

In general, there would be two options:

	a) Modify fsck_ffs so it will work.
	b) Modify the file system so fsck_ffs will work.

Of course, I've got no good clue how to do this in particular.

Let me first describe what I did to fsck_ffs.

I first took a look at fsck_ffs's source code. Well... it's not that I
understood very much of it, sadly, but I could find the position where
the error

fsck_ffs: bad inode number 306176 to nextinode

came from: it was /usr/src/sbin/fsck_ffs/inode.c line 319:

if (inumber != nextino++ || inumber > lastvalidinum)
	errx(EEXIT, "bad inode number %d to nextinode", inumber);

Oh how I love disjunctions in exit conditions! :-)

So I made a change to this part, just to see what would happen. (And:
Yes, I know, "trial & error" is not a programming concept.) I used a
copy of the subtrees sbin/fsck_ffs/ + sbin/mount/ and sys/ufs/ffs/ +
sys/ufs/ufs/ from /usr/src/, then issued the command "make" from
within ~/rescue/sbin/fsck_ffs/, which gave me an executable fsck_ffs
in this directory. I copied it to ~/rescue and tested it with the 1:1
dd copy.

if(inumber != nextino++) {
	printf("--- condition: inumber != nextino++\n");
	printf("--- inumber=%d nextino(++)=%d lastinum=%d\n",
	    inumber, nextino, lastinum);
	errx(EEXIT, "bad inode number %d to nextinode", inumber);
}
if(inumber > lastvalidinum) {
	printf("--- condition: inumber > lastvalidinum\n");
	printf("--- inumber=%d lastvalidinum=%d, lastinum=%d\n",
	    inumber, lastvalidinum, lastinum);
	errx(EEXIT, "bad inode number %d to nextinode", inumber);
}

This was the result:

% ./fsck_ffs -yf ad1s1f.dd
[...]
--- condition: inumber > lastvalidinum
--- inumber=306176 lastvalidinum=306175, lastinum=306176
fsck_ffs: bad inode number 306176 to nextinode

So what's up with inode 306176? When invoking fsdb on this inode, I
could see the content of a directory, and ils from The Sleuth Kit
revealed that it seems to be a directory within the inaccessible home
directory.

slot 150 ino 306176 reclen 20: directory, `hellraiser'
slot 1566 ino 306176 reclen 12: directory, `.'

Strange, isn't it?

Finally, I decided to comment out the whole part. I found fsck_ffs
complaining in fsutil.c line 139:

if (inum > maxino)
	errx(EEXIT, "inoinfo: inumber %d out of range", inum);

So I put in another "checkpoint" there:

printf("---> %d\n", inum);
if (inum > maxino) {
	printf("--- condition: inum > maxino\n");
	printf("--- inum=%d maxino=%d\n", inum, maxino);
	errx(EEXIT, "inoinfo: inumber %d out of range", inum);
}

The result was this:

% ./fsck_ffs -yf ad1s1f.dd
[...]
THE FOLLOWING DISK SECTORS COULD NOT BE READ:
177638368, 177638369, 177638370, 177638371,
177638372, 177638373, 177638374, 177638375,
177638376, 177638377, 177638378, 177638379,
177638380, 177638381, 177638382, 177638383,
177638384, 177638385, 177638386, 177638387,
177638388, 177638389, 177638390, 177638391,
177638392, 177638393, 177638394, 177638395,
177638396, 177638397, 177638398, 177638399,
177638400, 177638401, 177638402, 177638403,
177638404, 177638405, 177638406, 177638407,
177638408, 177638409, 177638410, 177638411,
177638412, 177638413, 177638414, 177638415,
177638416, 177638417, 177638418, 177638419,
177638420, 177638421, 177638422, 177638423,
177638424, 177638425, 177638426, 177638427,
177638428, 177638429, 177638430, 177638431,
177638432, 177638433, 177638434, 177638435,
177638436, 177638437, 177638438, 177638439,
177638440, 177638441, 177638442, 177638443,
177638444, 177638445, 177638446, 177638447,
177638448, 177638449, 177638450, 177638451,
177638452, 177638453, 177638454, 177638455,
177638456, 177638457, 177638458, 177638459,
177638460, 177638461, 177638462, 177638463,
177638464, 177638465, 177638466, 177638467,
177638468, 177638469, 177638470, 177638471,
177638472, 177638473, 177638474, 177638475,
177638476, 177638477, 177638478, 177638479,
177638480, 177638481, 177638482, 177638483,
177638484, 177638485, 177638486, 177638487,
177638488, 177638489, 177638490, 177638491,
177638492, 177638493, 177638494, 177638495,
--- condition: inum > maxino
--- inum=11116545 maxino=11116544
fsck_ffs: inoinfo: inumber 11116545 out of range

Seemed to be an important condition. :-)

So what's this again? The answer was in setup.c line 209:

maxino = sblock.fs_ncg * sblock.fs_ipg;

Is there some information retrieved incorrectly from the file system's
superblock causing all the trouble? Well, I did try checks with
fsck_ffs referring to alternate superblocks, but no luck. Or does it
mean that there are 11116544 inodes on the partition? This would imply
that (not counting directories) 11 million files can be created - or
are stored on this disk in total?

At this point, I decided to give up this way of "fixing"; most of the
conditions seem to be well intended, and the defect on the disk must
be so bad that fsck_ffs can't handle it anymore.

And now for the file system.

As is already clear, the inode of the home directory is gone. So an
idea would be to create a new inode, with the same name and number as
it should have. Good idea? No, obviously not. I tried it in two
different ways, with no luck. So that seems to be insufficient. I do
understand why: The inode created would only be a kind of "link
entry" inside the root directory which points to further information.
But how should the new home directory entry know about its content?

From the friendly FreeBSD questions mailing list I even learned that
there's no way to predict the inode numbers. If I assume a directory D
with its inode number i(D), and within D a file F with its inode
number i(F), I can't claim i(D) < i(F), so I can't expect any special
inode number.

I think it takes more to establish an intact directory structure than
a simple "make inode with name". The directory needs to be populated
correctly, but for that I would need to know which files are inside
it. So it would be necessary to pick all possible inode numbers and
look at what's behind them.
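
As a sanity check on the maxino question above, the formula from
setup.c can be run against numbers already shown in this mail: file(1)
reported 472 cylinder groups, and the patched fsck_ffs printed
maxino=11116544. The sketch below is only that arithmetic, not
fsck_ffs code. The function names are made up for illustration, and
the integer division is assumed to match how FFS maps an inode number
to its cylinder group (the ino_to_cg() macro in sys/ufs/ffs/fs.h).
Under that assumption there are 23552 inode slots per group, so
11116544 is the total number of inode slots the file system was
created with, not the number of files stored on it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Inodes per cylinder group, derived from maxino = ncg * ipg.
 * Hypothetical helper; the real value lives in the superblock (fs_ipg).
 */
uint32_t
ipg_from_maxino(uint32_t maxino, uint32_t ncg)
{
	assert(ncg > 0 && maxino % ncg == 0);
	return maxino / ncg;
}

/* Cylinder group that holds inode number ino (plain integer division). */
uint32_t
cg_of_inode(uint32_t ino, uint32_t ipg)
{
	assert(ipg > 0);
	return ino / ipg;
}

/* Position of inode number ino inside its cylinder group. */
uint32_t
index_in_cg(uint32_t ino, uint32_t ipg)
{
	assert(ipg > 0);
	return ino % ipg;
}
```

For example, cg_of_inode(306176, ipg_from_maxino(11116544, 472))
evaluates to 13 with index 0, so inode 306176 is the first inode of
group 13 and lastvalidinum=306175 is the last inode of group 12.
Likewise, 259072 comes out as the first inode of group 11. Whether
this ties in with the "Group 12" message from dls is a guess, not a
diagnosis.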
This means I would need to "walk back" the .. paths to see which one finally leads to the home directory, and then put the first-level directory name (or the inode number instead of the name, because the name is lost) into one of the directory slots; is that the correct term?

As far as I've learned so far, when "walking back" the path from a file deep within a directory structure, every inode contains a field saying "where it comes from", that is, where CWD and .. are (as inode numbers). Let's assume we're at inode 259301, referring to a file bla.txt. Then something like this structure should exist:

bla.txt        dingens/       foo/           poly/          /
259301 ----->  259285 ----->  259140 ----->  259072 ----->  2

This would be /home/poly/foo/dingens/bla.txt on ad0s1f (where / is then mounted as /home).

If I can assume that every inode still knows "where it came from", what would be a useful tool to build poly/ (12345) again? I think I'll need to reconstruct its content, because if I just create poly/ as 12345, how would the file system know what the content of poly/ is? Is the term "directory slots" I came across related to this topic? Which sources could give good hints?

For all these considerations, I'll assume that only the inode of my home directory is gone. I can't be sure of that; other inodes may have died, too.

In general, I think what's needed is a way to reconnect the "orphan" inodes to "normal" inodes again so they can be accessed. Because the home directory's inode is gone, any information about the files and directories on its first level is gone, too. So these would not be restored with their original names, but with their inode numbers as names, just as fsck_ffs would do with its lost+found/ mechanism. All data within the first-level directories would of course still have their names, because these inodes are present.
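The "walking back" idea described above can be sketched in C roughly as follows. This is only an illustration under assumptions: the table of (inode, parent) pairs is presumed to have been collected beforehand, for example from the ".." entries of surviving directories (in UFS a regular file's inode does not itself store a parent pointer), and the names parent_link, find_parent() and walk_to_root() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recovered relation: inode `ino` is reached from directory `parent`. */
struct parent_link {
    uint32_t ino;
    uint32_t parent;
};

/* Look up the parent of `ino`. Returns 0 (not a usable inode number in this
 * sketch) when no link is known. */
uint32_t find_parent(const struct parent_link *links, size_t nlinks,
                     uint32_t ino)
{
    for (size_t i = 0; i < nlinks; i++)
        if (links[i].ino == ino)
            return links[i].parent;
    return 0;
}

/* Follow parent links from `start` until `root` is reached.
 * The visited inode numbers are stored in `path`, start first, root last.
 * Returns the number of entries stored, or -1 if the chain is broken or
 * longer than `maxlen` (which also stops on cycles in damaged data). */
int walk_to_root(const struct parent_link *links, size_t nlinks,
                 uint32_t start, uint32_t root,
                 uint32_t *path, size_t maxlen)
{
    size_t n = 0;
    uint32_t cur = start;

    assert(path != NULL);
    while (n < maxlen) {
        path[n++] = cur;
        if (cur == root)
            return (int)n;
        cur = find_parent(links, nlinks, cur);
        if (cur == 0)
            return -1;   /* orphan: no known parent */
    }
    return -1;           /* too deep, or a cycle */
}
```

Fed with the four links from the example above (259301 to 259285, 259285 to 259140, 259140 to 259072, 259072 to 2), walk_to_root() started at 259301 with root 2 stores five inode numbers. If the link for the home directory is missing, the walk returns -1 at that point, which is one way to identify the orphaned first-level entries.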
I'm thinking about something like this:

Formerly:

/
  poly/
    foo/
      bar.c
    baz/
      boing/
        boo.h
        boom.h
    bla.c
    .xchat/
      xchat.conf
    .fetchmailrc

After restore:

/
  poly/
    #123456/
      bar.c
    #123789/
      boing/
        boo.h
        boom.h
    #124785
    #127854/
      xchat.conf
    #128745
    ^^^^^^^

There are tools that can help to "restore" the first level, for example FreeBSD's file command. There aren't many files where a problem should occur: file names can usually be recognized from the data they contain (source, note, configuration file etc.), and directories can be recognized by the names of the files they contain. Of course, that's what would happen if fsck_ffs worked as initially intended. When I see the content, I will remember what the correct names were.

So these were my first thoughts about this problem. I hope you can help me with some ideas, concepts or suggestions, or documents or source files worth studying. I don't expect you to solve my problem, I'm not greedy. :-)

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

From freebsd at edvax.de Sun Nov 2 00:20:21 2008
From: freebsd at edvax.de (Polytropon)
Date: Sun Nov 2 00:21:09 2008
Subject: Repairing a defective UFS 2 partition with fsck_ffs (or other means)
In-Reply-To: <1225600923.97152.3.camel@jill.exit.com>
References: <20081102050601.9fccb80f.freebsd@edvax.de> <1225600923.97152.3.camel@jill.exit.com>
Message-ID: <20081102082015.b8d40d77.freebsd@edvax.de>

Hi, many thanks for your quick reply.

On Sat, 01 Nov 2008 21:42:03 -0700, Frank Mayhar wrote:
> Check out /usr/ports/sysutils/ffs2recov. It won't repair the filesystem
> but it may be able to recover some or all of your data.

I've already tried this program:

% mdconfig -a -t vnode -f ad1s1f.dd
md0
% ffs2recov -dav /dev/md0

will create many files named after inode numbers. The file command identifies them as "data", and they are all of the same size, group-wise. I don't know what I could do with these files.
Furthermore, these errors are repeatedly displayed:

getinode: inode size (14199049117881519608 > 175821242368) too big, skipping.

and

main.c:529 memory allocation failed for inode 294970, -1693177856

with different values. I'm not sure whether ffs2recov is the right tool here, or whether I am using it the wrong way. I've read /usr/local/share/doc/sleuthkit/ref_fs.txt carefully, but it doesn't seem to help me either.

I don't want to buy expensive "Windows" software that I have to run in wine (along with the usual problems that come with it) for something that I think could be solved using FreeBSD's on-board means, or even send the disk to a recovery company for a lot of money. I simply can't afford this.

If you have any experience with this special case of data recovery, please advise me on what I should pay attention to.

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

From peter.schuller at infidyne.com Sun Nov 2 07:09:26 2008
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Sun Nov 2 07:09:33 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To: <490A8D23.6030309@modulus.org>
References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8D23.6030309@modulus.org>
Message-ID: <20081102150924.GB59552@hyperion.scode.org>

> Its probably worth playing with vfs.zfs.cache_flush_disable when using
> the hardware RAID.
>
> By default, ZFS will flush the entire hardware cache just to make sure
> the ZFS Intent Log (ZIL) has been written.
>
> This isn't so bad on a group of hard disks with small caches, but bad if
> you have 256mb of controller write cache.

Flushing the cache to constituent drives also has a direct impact on latency, even without any dirty data (save what you have just written) in the cache.
If you're doing anything that calls fsync() frequently, you most likely do not want to wait for actual persistence to disk (given a battery-backed cache).

In any case, why would the actual RAID controller cache be flushed, unless someone explicitly configured it that way? I would expect a regular BIO_FLUSH (that's all ZFS is doing, right?) to be satisfied by the data being contained in the controller cache, under the assumption that the cache is battery backed, and that the storage volume/controller has not been explicitly configured not to rely on the battery for persistence.

Please correct me if I'm wrong, but if synchronous writing to your RAID device results in actually waiting for the underlying disks to commit the data to platters, that sounds like a driver/controller problem or policy issue rather than anything that should be fixed by tweaking ZFS.

Or is it the case that ZFS does both a "regular" request to commit data (which I thought was the purpose of BIO_FLUSH, even though "FLUSH" sounds more specific) and, separately, a "flush any actual caches no matter what" type of request that ends up bypassing controller policy (because it is needed on stupid SATA drives or such)?

--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081102/376b48ac/attachment.pgp From attilio at freebsd.org Sun Nov 2 08:17:21 2008 From: attilio at freebsd.org (Attilio Rao) Date: Sun Nov 2 08:17:27 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> Message-ID: <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> 2008/11/2, Attilio Rao : > 2008/11/2, Yuri Pankov : > > > Hi, > > > > Trying to mount nonexistent smb share with mount_smbfs leads to > > following panic: > > > > # mount_smbfs //yuri@lifebane/blahblah /mnt > > > > Unread portion of the kernel message buffer: > > smb_co_lock: recursive lock for object 1 > > panic: Lock (lockmgr) smb_vc not locked @ > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329. > > cpuid = 0 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > panic() at panic+0x182 > > witness_assert() at witness_assert+0x21a > > __lockmgr_args() at __lockmgr_args+0x17a > > smb_co_put() at smb_co_put+0x76 > > smb_sm_lookup() at smb_sm_lookup+0xfe > > smb_usr_lookup() at smb_usr_lookup+0xcd > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > > giant_ioctl() at giant_ioctl+0x75 > > devfs_ioctl_f() at devfs_ioctl_f+0x76 > > kern_ioctl() at kern_ioctl+0x92 > > ioctl() at ioctl+0xfd > > syscall() at syscall+0x1bf > > Xfast_syscall() at Xfast_syscall+0xab > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > 0x7fffffffe038, rbp = 0x7fffffffe450 --- > > Uptime: 6m46s > > Physical memory: 2032 MB > > > So, what is happening here is that smb_co_lock() is AFU. > Infact looking at the code: > int > smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td) > { > ... 
> if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE && > (flags & LK_CANRECURSE) == 0) { > SMBERROR("recursive lock for object %d\n", cp->co_level); > return 0; > } > ... Yuri, could you please test this fix: http://www.freebsd.org/~attilio/netsmb.diff and report if it works? You could get a KASSERT running but this is expected as I want to identify on the callers who passes a malformed request and fix it. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From attilio at freebsd.org Sun Nov 2 08:10:35 2008 From: attilio at freebsd.org (Attilio Rao) Date: Sun Nov 2 08:42:47 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <20081102123100.GA1434@darklight.homeunix.org> References: <20081102123100.GA1434@darklight.homeunix.org> Message-ID: <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> 2008/11/2, Yuri Pankov : > Hi, > > Trying to mount nonexistent smb share with mount_smbfs leads to > following panic: > > # mount_smbfs //yuri@lifebane/blahblah /mnt > > Unread portion of the kernel message buffer: > smb_co_lock: recursive lock for object 1 > panic: Lock (lockmgr) smb_vc not locked @ > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329. > cpuid = 0 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > panic() at panic+0x182 > witness_assert() at witness_assert+0x21a > __lockmgr_args() at __lockmgr_args+0x17a > smb_co_put() at smb_co_put+0x76 > smb_sm_lookup() at smb_sm_lookup+0xfe > smb_usr_lookup() at smb_usr_lookup+0xcd > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > giant_ioctl() at giant_ioctl+0x75 > devfs_ioctl_f() at devfs_ioctl_f+0x76 > kern_ioctl() at kern_ioctl+0x92 > ioctl() at ioctl+0xfd > syscall() at syscall+0x1bf > Xfast_syscall() at Xfast_syscall+0xab > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > 0x7fffffffe038, rbp = 0x7fffffffe450 --- > Uptime: 6m46s > Physical memory: 2032 MB So, what is happening here is that smb_co_lock() is AFU. 
In fact, looking at the code:

int
smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td)
{
...
	if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE &&
	    (flags & LK_CANRECURSE) == 0) {
		SMBERROR("recursive lock for object %d\n", cp->co_level);
		return 0;
	}
...

From that it is obvious that smb_co_lock() won't really recurse on the lock, but will make the consumer believe it acquired the lock successfully, without panicking at all (the printf is like a little joke there). I think that we don't panic here mainly because these are "user-driven" events and we want to avoid a DoS against the kernel, but this code is unacceptable all the same.

This can be fixed by allowing lockmgr recursion by default, but the problem is broader. For example, it would be very nice to drop the LK_DRAIN support from netsmb in order to completely remove it from the 8.0 kernel series and kill a bogus / dangerous option for lockmgr. That would really be a milestone for the health of lockmgr.

What is really missing here is a valid netsmb maintainer: someone who knows the layers involved well, is motivated to work on it, and can draw on the other kernel developers' experience for the most hardcore parts. It would also be nice, for example, to bring in some of Apple's changes (like the crypto support). Someone willing to step in would be very much appreciated.

Thanks,
Attilio

-- Peace can only be achieved by understanding - A.
Einstein From yuri.pankov at gmail.com Sun Nov 2 09:01:31 2008 From: yuri.pankov at gmail.com (Yuri Pankov) Date: Sun Nov 2 09:01:38 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> Message-ID: <20081102163307.GB1434@darklight.homeunix.org> On Sun, Nov 02, 2008 at 05:17:18PM +0100, Attilio Rao wrote: > 2008/11/2, Attilio Rao : > > 2008/11/2, Yuri Pankov : > > > > > Hi, > > > > > > Trying to mount nonexistent smb share with mount_smbfs leads to > > > following panic: > > > > > > # mount_smbfs //yuri@lifebane/blahblah /mnt > > > > > > Unread portion of the kernel message buffer: > > > smb_co_lock: recursive lock for object 1 > > > panic: Lock (lockmgr) smb_vc not locked @ > > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329. > > > cpuid = 0 > > > KDB: stack backtrace: > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > > panic() at panic+0x182 > > > witness_assert() at witness_assert+0x21a > > > __lockmgr_args() at __lockmgr_args+0x17a > > > smb_co_put() at smb_co_put+0x76 > > > smb_sm_lookup() at smb_sm_lookup+0xfe > > > smb_usr_lookup() at smb_usr_lookup+0xcd > > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > > > giant_ioctl() at giant_ioctl+0x75 > > > devfs_ioctl_f() at devfs_ioctl_f+0x76 > > > kern_ioctl() at kern_ioctl+0x92 > > > ioctl() at ioctl+0xfd > > > syscall() at syscall+0x1bf > > > Xfast_syscall() at Xfast_syscall+0xab > > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > > 0x7fffffffe038, rbp = 0x7fffffffe450 --- > > > Uptime: 6m46s > > > Physical memory: 2032 MB > > > > > > So, what is happening here is that smb_co_lock() is AFU. > > Infact looking at the code: > > int > > smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td) > > { > > ... 
> > if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE && > > (flags & LK_CANRECURSE) == 0) { > > SMBERROR("recursive lock for object %d\n", cp->co_level); > > return 0; > > } > > ... > > Yuri, > could you please test this fix: > http://www.freebsd.org/~attilio/netsmb.diff > > and report if it works? > You could get a KASSERT running but this is expected as I want to > identify on the callers who passes a malformed request and fix it. > > Thanks, > Attilio > > > -- > Peace can only be achieved by understanding - A. Einstein Thanks, Attilio. With this patch system doesn't panic anymore with nonexistent share names (though I had to comment out smb_co_lockstatus prototype and function to get rid of -Werror complaints). Still getting a LOR: netsmb_dev: loaded lock order reversal: 1st 0xffffff0021644008 smb_vc (smb_vc) @ /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:332 2nd 0xffffffff81037368 smbsm (smbsm) @ /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:348 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e __lockmgr_args() at __lockmgr_args+0xc2a smb_co_lock() at smb_co_lock+0x38 smb_co_gone() at smb_co_gone+0x38 smb_sm_lookup() at smb_sm_lookup+0xfe smb_usr_lookup() at smb_usr_lookup+0xcd nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 giant_ioctl() at giant_ioctl+0x75 devfs_ioctl_f() at devfs_ioctl_f+0x76 kern_ioctl() at kern_ioctl+0x92 ioctl() at ioctl+0xfd syscall() at syscall+0x1bf Xfast_syscall() at Xfast_syscall+0xab --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = 0x7fffffffe048, rbp = 0x7fffffffe460 --- Thanks, Yuri From freebsd at edvax.de Sun Nov 2 09:39:18 2008 From: freebsd at edvax.de (Polytropon) Date: Sun Nov 2 09:39:25 2008 Subject: Repairing a defective UFS 2 partition with fsck_ffs (or other means) In-Reply-To: <1225642408.97152.15.camel@jill.exit.com> References: <20081102050601.9fccb80f.freebsd@edvax.de> 
<1225600923.97152.3.camel@jill.exit.com> <20081102082015.b8d40d77.freebsd@edvax.de> <1225642408.97152.15.camel@jill.exit.com> Message-ID: <20081102183913.d4ec58f6.freebsd@edvax.de> On Sun, 02 Nov 2008 08:13:28 -0800, Frank Mayhar wrote: > ffs2recov is a very low-level tool to recover files and data from a > corrupted file system. Among other things, it can search for > directories, print contents of inodes, recover files by name and recover > directory hierarchies given an inode and a name. This is something I must have misunderstood from the manual. As I mentioned before, English is not my native language. So it's possible that this part of the examples section If you would like to recover inode 2385 named dir and all it's decendants the command: % ffs2recov -c 2385 -n dir ffs.image will create dir, populate it, setting modes, permissions, and times to the originals. will not "recreate the inode", but will take this as a starting point to recover the rest... I'll investigate this further. > It's not for the faint > of heart but it can recover data that nothing else can. Up to this point, I agree that ffs2recov seems to be the most promising tool. But you're right, I have to learn more about this topic. It isn't that easy as creating a FAT with a hex editor. :-) > It sounds like your home directory inode has been trashed, in which case > your only choice is to try to find the directory blocks themselves, dump > them and use them to find the names of the files in that directory. > Otherwise you'll be stuck with them named by their inode. Thank you, this is a very good approach. I found ffs2recov very handy for the different stages, such as dumping blocks, retrieving inode informations and such things. > Those errors mean that there is very serious corruption on disk. I do believe it's so. 
Although I'm not sure what exactly caused the damage, according to fsck_ffs there seem to be many corruptions within the soft update data and the inodes, which form some kind of linked list, as far as I understood it.

> The
> program is ignoring those errors and forging on to try to find and
> recover everything it can. It may be that you won't be able to recover
> much.

Depends on the "magnitude of damage"... which I can't exactly tell.

> You should, however, be able to recover at least some of your
> data, especially the stuff that's further away (physically and
> logically) from the trashed area of the disk.

I need to try some more, and I think the ref_fs.txt from The Sleuth Kit mentioned in my first posting will give some good information about how files and directories, inodes, blocks and the other termini technici make up the basic principles of a UFS file system. I never had much interest in how it worked, as long as it worked, but I have found that a lot of knowledge is gained when trying to solve some strange problem. After some time, you can explain how TCP/IP, routing or compilation processes work. :-)

> I really can't hold your hand, here, since I'm overloaded as it is,
> sorry. Best of luck.

Thank you. Your advice really did help me. And after all... all the data loss and damage... I find the topic quite interesting. Although there's no high demand for this data, I'll take the time to learn more, and maybe find a way to solve the problem using ffs2recov. There's probably no need to reinvent the wheel until you learn how to drive. :-)

/* PS. I'm not on the -fs mailing list. */

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
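The suggestion quoted in this thread, to find the directory blocks themselves and read the names out of them, can be sketched in C as below. This is a rough illustration and not how ffs2recov is implemented. It assumes the classic FFS directory entry layout (4-byte inode number, 2-byte record length, 1-byte type, 1-byte name length, then the name) and that the values are stored in the byte order of the machine running the code; the names scan_dir_block(), dirent_cb and print_entry() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIR_HDR 8   /* inode (4) + record length (2) + type (1) + name length (1) */

/* Callback invoked for every plausible entry found in a block. */
typedef void (*dirent_cb)(uint32_t ino, uint8_t type,
                          const char *name, uint8_t namlen, void *arg);

/* Example callback: print "ino N: name" for each entry. */
void print_entry(uint32_t ino, uint8_t type,
                 const char *name, uint8_t namlen, void *arg)
{
    (void)type;
    (void)arg;
    printf("ino %u: %.*s\n", (unsigned)ino, (int)namlen, name);
}

/* Scan one raw block for directory entries. Stops at the first record that
 * fails the sanity checks. Returns the number of in-use entries seen. */
int scan_dir_block(const unsigned char *blk, size_t len,
                   dirent_cb cb, void *arg)
{
    size_t off = 0;
    int found = 0;

    assert(blk != NULL);
    while (off + DIR_HDR <= len) {
        uint32_t ino;
        uint16_t reclen;
        uint8_t type = blk[off + 6];
        uint8_t namlen = blk[off + 7];

        memcpy(&ino, blk + off, sizeof(ino));
        memcpy(&reclen, blk + off + 4, sizeof(reclen));

        /* A record must hold its header and name, be 4-byte aligned in
         * length, and stay inside the block. */
        if (reclen < DIR_HDR || reclen % 4 != 0 ||
            off + reclen > len || (size_t)DIR_HDR + namlen > reclen)
            break;

        if (ino != 0) {              /* inode 0 marks an unused slot */
            if (cb != NULL)
                cb(ino, type, (const char *)blk + off + DIR_HDR, namlen, arg);
            found++;
        }
        off += reclen;
    }
    return found;
}
```

The record lengths in the fsck_ffs output at the start of this thread fit this layout: a one-character name such as `.' needs 8 + 1 + 1 bytes, rounded up to 12, and the ten-character name `hellraiser' needs 8 + 10 + 1 bytes, rounded up to 20. A block that passes these checks for several consecutive records is a reasonable candidate for a directory block, though random data can occasionally pass as well.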
From attilio at freebsd.org Sun Nov 2 09:53:29 2008 From: attilio at freebsd.org (Attilio Rao) Date: Sun Nov 2 09:53:41 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <20081102163307.GB1434@darklight.homeunix.org> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <20081102163307.GB1434@darklight.homeunix.org> Message-ID: <3bbf2fe10811020953l29f1a7eesa4f4eeb49f0a2eba@mail.gmail.com> 2008/11/2, Yuri Pankov : > On Sun, Nov 02, 2008 at 05:17:18PM +0100, Attilio Rao wrote: > > 2008/11/2, Attilio Rao : > > > 2008/11/2, Yuri Pankov : > > > > > > > Hi, > > > > > > > > Trying to mount nonexistent smb share with mount_smbfs leads to > > > > following panic: > > > > > > > > # mount_smbfs //yuri@lifebane/blahblah /mnt > > > > > > > > Unread portion of the kernel message buffer: > > > > smb_co_lock: recursive lock for object 1 > > > > panic: Lock (lockmgr) smb_vc not locked @ > > > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329. 
> > > > cpuid = 0 > > > > KDB: stack backtrace: > > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > > > panic() at panic+0x182 > > > > witness_assert() at witness_assert+0x21a > > > > __lockmgr_args() at __lockmgr_args+0x17a > > > > smb_co_put() at smb_co_put+0x76 > > > > smb_sm_lookup() at smb_sm_lookup+0xfe > > > > smb_usr_lookup() at smb_usr_lookup+0xcd > > > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > > > > giant_ioctl() at giant_ioctl+0x75 > > > > devfs_ioctl_f() at devfs_ioctl_f+0x76 > > > > kern_ioctl() at kern_ioctl+0x92 > > > > ioctl() at ioctl+0xfd > > > > syscall() at syscall+0x1bf > > > > Xfast_syscall() at Xfast_syscall+0xab > > > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > > > 0x7fffffffe038, rbp = 0x7fffffffe450 --- > > > > Uptime: 6m46s > > > > Physical memory: 2032 MB > > > > > > > > > So, what is happening here is that smb_co_lock() is AFU. > > > Infact looking at the code: > > > int > > > smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td) > > > { > > > ... > > > if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE && > > > (flags & LK_CANRECURSE) == 0) { > > > SMBERROR("recursive lock for object %d\n", cp->co_level); > > > return 0; > > > } > > > ... > > > > Yuri, > > could you please test this fix: > > http://www.freebsd.org/~attilio/netsmb.diff > > > > and report if it works? > > You could get a KASSERT running but this is expected as I want to > > identify on the callers who passes a malformed request and fix it. > > > > Thanks, > > Attilio > > > > > > -- > > Peace can only be achieved by understanding - A. Einstein > > > Thanks, Attilio. > > With this patch system doesn't panic anymore with nonexistent share > names (though I had to comment out smb_co_lockstatus prototype and > function to get rid of -Werror complaints). 
Still getting a LOR: > > netsmb_dev: loaded > lock order reversal: > 1st 0xffffff0021644008 smb_vc (smb_vc) @ > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:332 > 2nd 0xffffffff81037368 smbsm (smbsm) @ > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:348 > > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > _witness_debugger() at _witness_debugger+0x2e > witness_checkorder() at witness_checkorder+0x81e > __lockmgr_args() at __lockmgr_args+0xc2a > smb_co_lock() at smb_co_lock+0x38 > smb_co_gone() at smb_co_gone+0x38 > > smb_sm_lookup() at smb_sm_lookup+0xfe > smb_usr_lookup() at smb_usr_lookup+0xcd > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > giant_ioctl() at giant_ioctl+0x75 > devfs_ioctl_f() at devfs_ioctl_f+0x76 > kern_ioctl() at kern_ioctl+0x92 > ioctl() at ioctl+0xfd > syscall() at syscall+0x1bf > Xfast_syscall() at Xfast_syscall+0xab > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > 0x7fffffffe048, rbp = 0x7fffffffe460 --- I've updated the patch in order to fix smb_co_lockstatus problem. Could you please stress test it while I investigate the LOR problem? Are you running with INVARIANTS? Thanks, Attilio -- Peace can only be achieved by understanding - A. 
Einstein From yuri.pankov at gmail.com Sun Nov 2 12:34:10 2008 From: yuri.pankov at gmail.com (Yuri Pankov) Date: Sun Nov 2 12:34:17 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <3bbf2fe10811020953l29f1a7eesa4f4eeb49f0a2eba@mail.gmail.com> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <20081102163307.GB1434@darklight.homeunix.org> <3bbf2fe10811020953l29f1a7eesa4f4eeb49f0a2eba@mail.gmail.com> Message-ID: <20081102202211.GA1549@darklight.homeunix.org> On Sun, Nov 02, 2008 at 06:53:25PM +0100, Attilio Rao wrote: > 2008/11/2, Yuri Pankov : > > On Sun, Nov 02, 2008 at 05:17:18PM +0100, Attilio Rao wrote: > > > 2008/11/2, Attilio Rao : > > > > 2008/11/2, Yuri Pankov : > > > > > > > > > Hi, > > > > > > > > > > Trying to mount nonexistent smb share with mount_smbfs leads to > > > > > following panic: > > > > > > > > > > # mount_smbfs //yuri@lifebane/blahblah /mnt > > > > > > > > > > Unread portion of the kernel message buffer: > > > > > smb_co_lock: recursive lock for object 1 > > > > > panic: Lock (lockmgr) smb_vc not locked @ > > > > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329. 
> > > > > cpuid = 0 > > > > > KDB: stack backtrace: > > > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > > > > panic() at panic+0x182 > > > > > witness_assert() at witness_assert+0x21a > > > > > __lockmgr_args() at __lockmgr_args+0x17a > > > > > smb_co_put() at smb_co_put+0x76 > > > > > smb_sm_lookup() at smb_sm_lookup+0xfe > > > > > smb_usr_lookup() at smb_usr_lookup+0xcd > > > > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > > > > > giant_ioctl() at giant_ioctl+0x75 > > > > > devfs_ioctl_f() at devfs_ioctl_f+0x76 > > > > > kern_ioctl() at kern_ioctl+0x92 > > > > > ioctl() at ioctl+0xfd > > > > > syscall() at syscall+0x1bf > > > > > Xfast_syscall() at Xfast_syscall+0xab > > > > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > > > > 0x7fffffffe038, rbp = 0x7fffffffe450 --- > > > > > Uptime: 6m46s > > > > > Physical memory: 2032 MB > > > > > > > > > > > > So, what is happening here is that smb_co_lock() is AFU. > > > > Infact looking at the code: > > > > int > > > > smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td) > > > > { > > > > ... > > > > if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE && > > > > (flags & LK_CANRECURSE) == 0) { > > > > SMBERROR("recursive lock for object %d\n", cp->co_level); > > > > return 0; > > > > } > > > > ... > > > > > > Yuri, > > > could you please test this fix: > > > http://www.freebsd.org/~attilio/netsmb.diff > > > > > > and report if it works? > > > You could get a KASSERT running but this is expected as I want to > > > identify on the callers who passes a malformed request and fix it. > > > > > > Thanks, > > > Attilio > > > > > > > > > -- > > > Peace can only be achieved by understanding - A. Einstein > > > > > > Thanks, Attilio. > > > > With this patch system doesn't panic anymore with nonexistent share > > names (though I had to comment out smb_co_lockstatus prototype and > > function to get rid of -Werror complaints). 
Still getting a LOR: > > > > netsmb_dev: loaded > > lock order reversal: > > 1st 0xffffff0021644008 smb_vc (smb_vc) @ > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:332 > > 2nd 0xffffffff81037368 smbsm (smbsm) @ > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:348 > > > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > > > _witness_debugger() at _witness_debugger+0x2e > > witness_checkorder() at witness_checkorder+0x81e > > __lockmgr_args() at __lockmgr_args+0xc2a > > smb_co_lock() at smb_co_lock+0x38 > > smb_co_gone() at smb_co_gone+0x38 > > > > smb_sm_lookup() at smb_sm_lookup+0xfe > > smb_usr_lookup() at smb_usr_lookup+0xcd > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6 > > giant_ioctl() at giant_ioctl+0x75 > > devfs_ioctl_f() at devfs_ioctl_f+0x76 > > kern_ioctl() at kern_ioctl+0x92 > > ioctl() at ioctl+0xfd > > syscall() at syscall+0x1bf > > Xfast_syscall() at Xfast_syscall+0xab > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp = > > > > 0x7fffffffe048, rbp = 0x7fffffffe460 --- > > I've updated the patch in order to fix smb_co_lockstatus problem. > Could you please stress test it while I investigate the LOR problem? Not sure what do you mean by "stress test". I've tried mounting several different shares and copied ~100Gb from them, hope this should suffice. > Are you running with INVARIANTS? Yes. > > Thanks, > Attilio > > > -- > Peace can only be achieved by understanding - A. Einstein Thanks, Yuri From andrew at modulus.org Sun Nov 2 16:58:36 2008 From: andrew at modulus.org (Andrew Snow) Date: Sun Nov 2 16:58:43 2008 Subject: zfs snapshots sometimes give errors like "File name too long" Message-ID: <490E4CB1.5070704@modulus.org> I have about 50 filesystems on July 8-current with pjd's patchset. 
While snapshotting works on most of them, occasionally after I create a snapshot and try to use it, I get errors like this: ls -la .zfs/snapshot ls: 20081031_20653: File name too long It does occur only on the longest ones, but they are nowhere near MAXNAMELEN (256) long, they are around 60 characters for the mount point plus another 14 for the snapshot name. From linimon at FreeBSD.org Mon Nov 3 02:36:49 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Nov 3 02:37:01 2008 Subject: kern/128514: [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Adapter Message-ID: <200811031036.mA3AanMH089155@freefall.freebsd.org> Old Synopsis: ZFS and LSILogic SAS/SATA Adapter New Synopsis: [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Adapter Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon Nov 3 10:35:42 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=128514 From bugmaster at FreeBSD.org Mon Nov 3 03:06:52 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Nov 3 03:07:45 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200811031106.mA3B6pLe010888@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description
--------------------------------------------------------------------------------
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs         [ext3] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
a kern/119868  fs         [zfs] [patch] 7.0 kernel panic during boot with ZFS an
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

23 problems total.

From ivoras at freebsd.org Mon Nov 3 04:00:52 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Mon Nov 3 04:00:58 2008
Subject: FreeBSD 7.0: gjournal on root filesystem problems
In-Reply-To:
References:
Message-ID:

Gabriel Lavoie wrote:
> Hello,
> I'm currently making some test to know how to setup a new home
> fileserver using two 500GB hard drives. I want to create a gmirror/gjournal
> setup on the complete filesystem. I've been able to setup everything and it
> works well. Now the problem I have is with the failure test. I create a file
> with random data on the / filesystem using "dd" and while the file is being
> created, I hit the reset button of the computer. Now, it won't boot
> anymore... I get the following message:
>
> GEOM_MIRROR: Device mirror/gm launched (2/2)
> GEOM_JOURNAL: Journal 3672855181: mirror/gma contains data.
> GEOM_JOURNAL: Journal 3672855181: mirror/gma contains journal.
> GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains data.
> GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains journal.
> GEOM_JOURNAL: Journal mirror/gmd consistent.
> Trying to mount root from ufs:/dev/mirror/gma.journal
>
> Manual root filesystem specification:
>   <fstype>:<device>  Mount <device> using filesystem <fstype>
>                        eg. ufs:da0s1a
>   ?                  List valid disk boot devices
>   <empty line>       Abort manual input
>
> mountroot> ?
>
> List of GEOM managed disk devices:
> mirror/gmd.journal mirror/gmd mirror/gmc mirror/gma mirror/gm ad10s1c
> ad10s1b ad8s1c ad8s1b ad10s2 ad10s1 ad8s1 ad10 ad8 acd0

It looks like journal recovery takes too long and the journal device
isn't ready at the time the root file system needs to be mounted.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081103/0794da01/signature.pgp From numisemis at yahoo.com Mon Nov 3 00:32:06 2008 From: numisemis at yahoo.com (Simun Mikecin) Date: Mon Nov 3 04:24:04 2008 Subject: Areca vs. ZFS performance testing. Message-ID: <723911.11212.qm@web36603.mail.mud.yahoo.com> Peter Schuller wrote: > In any case, why would the actual RAID controller cache be flushed, > unless someone expliclitly configured it such? I would expect a > regular BIO_FLUSH (that's all ZFS is going right?) to be satisfied by > the data being contained in the controller cache, under the assumption > that it is battery backed, and that the storage volume/controller has > not been explicitly configured otherwise to not rely on the battery > for persistence. I'm using amr(4) driver with Dell PERC 4e/DC controller (which is a rebranded LSI 320-2E) that has battery-backed cache and write-caching configured to be write-back. This controller is connected to a LED light that shows when there is something in the cache not yet commited to the disks. Ever since I changed from UFS2 to ZFS that light comes off very quickly. It does not stay on for longer periods of time (it did for upto 10 seconds when I used UFS2 - it is a controller BIOS setting). So doing BIO_FLUSH in this case *does* flush battery-backed cache. I can restore old functionality by setting vfs.zfs.cache_flush_disable=1 but I shouldn't use it in my case since the same system also has SATA disks with ZFS on them and turning off BIO_FLUSH for SATA disks would be dangerous. > Please correct me if I'm wrong, but if synchronous writing to your > RAID device results in actually waiting for underlying disks to commit > the data to platters, that sounds like a driver/controller > problem/policy issue rather than anything that should be fixed by > tweaking ZFS. 

AFAIK BIO_FLUSH (which is for now implemented only for ata(4) and
amr(4) - correct me if I'm mistaken) does just that: actually flushes and
waits for the cache content to be written on disk.

> Or is it the case that ZFS does both a "regular" request to commit
> data (which I thought was the purpose of BIO_FLUSH, even though the
> "FLUSH" sounds more specific) and separately does a "flush any actual
> caches no matter what" type of request that ends up bypassing
> controller policy (because it is needed on stupid SATA drives or
> such)?

AFAIK BIO_FLUSH commits *everything* that is in the cache. It is needed
for stupid SATA drives. But I'm not so happy about it being turned on for
amr(4), flushing the entire 128MB battery backed cache.

From jhb at freebsd.org Mon Nov 3 13:03:55 2008
From: jhb at freebsd.org (John Baldwin)
Date: Mon Nov 3 13:04:02 2008
Subject: reproducible panic with mount_smbfs
In-Reply-To: <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com>
References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com>
Message-ID: <200811031458.42549.jhb@freebsd.org>

On Sunday 02 November 2008 11:17:18 am Attilio Rao wrote:
> 2008/11/2, Attilio Rao :
> > 2008/11/2, Yuri Pankov :
> >
> > > Hi,
> > >
> > > Trying to mount nonexistent smb share with mount_smbfs leads to
> > > following panic:
> > >
> > > # mount_smbfs //yuri@lifebane/blahblah /mnt
> > >
> > > Unread portion of the kernel message buffer:
> > > smb_co_lock: recursive lock for object 1
> > > panic: Lock (lockmgr) smb_vc not locked @
> > > /usr/src/sys/modules/smbfs/../../netsmb/smb_conn.c:329.
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x182
> > > witness_assert() at witness_assert+0x21a
> > > __lockmgr_args() at __lockmgr_args+0x17a
> > > smb_co_put() at smb_co_put+0x76
> > > smb_sm_lookup() at smb_sm_lookup+0xfe
> > > smb_usr_lookup() at smb_usr_lookup+0xcd
> > > nsmb_dev_ioctl() at nsmb_dev_ioctl+0x1f6
> > > giant_ioctl() at giant_ioctl+0x75
> > > devfs_ioctl_f() at devfs_ioctl_f+0x76
> > > kern_ioctl() at kern_ioctl+0x92
> > > ioctl() at ioctl+0xfd
> > > syscall() at syscall+0x1bf
> > > Xfast_syscall() at Xfast_syscall+0xab
> > > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800939aec, rsp =
> > > 0x7fffffffe038, rbp = 0x7fffffffe450 ---
> > > Uptime: 6m46s
> > > Physical memory: 2032 MB
> > >
> > > So, what is happening here is that smb_co_lock() is AFU.
> > Infact looking at the code:
> > int
> > smb_co_lock(struct smb_connobj *cp, int flags, struct thread *td)
> > {
> > ...
> >         if (smb_co_lockstatus(cp, td) == LK_EXCLUSIVE &&
> >             (flags & LK_CANRECURSE) == 0) {
> >                 SMBERROR("recursive lock for object %d\n", cp->co_level);
> >                 return 0;
> >         }
> > ...
>
> Yuri,
> could you please test this fix:
> http://www.freebsd.org/~attilio/netsmb.diff
>
> and report if it works?
> You could get a KASSERT running but this is expected as I want to
> identify on the callers who passes a malformed request and fix it.

This allows all smb locks to recurse unlike the original code I think. It
may be better if smb_vclist was initialized with LK_RECURSE, but not all
the other smb locks. Also, in smb_co_addchild() I think you should just
replace the existing asserts with appropriate lockmgr_assert() (you could
add a smb_co_assert() to preserve the layering) rather than removing
assertions altogether.

-- John Baldwin From rwatson at FreeBSD.org Mon Nov 3 13:07:28 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Nov 3 13:07:34 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <200811031458.42549.jhb@freebsd.org> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <200811031458.42549.jhb@freebsd.org> Message-ID: On Mon, 3 Nov 2008, John Baldwin wrote: >> Yuri, could you please test this fix: >> http://www.freebsd.org/~attilio/netsmb.diff >> >> and report if it works? You could get a KASSERT running but this is >> expected as I want to identify on the callers who passes a malformed >> request and fix it. > > This allows all smb locks to recurse unlike the original code I think. It > may be better if smb_vclist was initialized with LK_RECURSE, but not all the > other smb locks. Also, in smb_co_addchild() I think you should just replace > the existing asserts with appropriate lockmgr_assert() (you could add a > smb_co_assert() to preserve the layering) rather than removing assertions > altogether. My general feeling is that the locking in netsmb needs a bit of cleanup, updating, etc. I'm reluctant to change the underlying primitives (as this patch does) without first clarifying what's going on in the code a layer or two above. 
Robert N M Watson Computer Laboratory University of Cambridge From attilio at freebsd.org Mon Nov 3 13:20:06 2008 From: attilio at freebsd.org (Attilio Rao) Date: Mon Nov 3 13:20:15 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <200811031458.42549.jhb@freebsd.org> Message-ID: <3bbf2fe10811031320o5d977babpe37bcf22836b8d34@mail.gmail.com> 2008/11/3, Robert Watson : > On Mon, 3 Nov 2008, John Baldwin wrote: > > > > > > > Yuri, could you please test this fix: > http://www.freebsd.org/~attilio/netsmb.diff > > > > > > and report if it works? You could get a KASSERT running but this is > expected as I want to identify on the callers who passes a malformed request > and fix it. > > > > > > > This allows all smb locks to recurse unlike the original code I think. It > may be better if smb_vclist was initialized with LK_RECURSE, but not all the > other smb locks. Also, in smb_co_addchild() I think you should just replace > the existing asserts with appropriate lockmgr_assert() (you could add a > smb_co_assert() to preserve the layering) rather than removing assertions > altogether. > > > > My general feeling is that the locking in netsmb needs a bit of cleanup, > updating, etc. I'm reluctant to change the underlying primitives (as this > patch does) without first clarifying what's going on in the code a layer or > two above. I agree with Robert. We need to make an upper layers analysis and decide what is the best solution for locks. This was a quick hack just to let it not panic when mounting. Thanks, Attilio -- Peace can only be achieved by understanding - A. 
Einstein From fbsd-fs at mawer.org Mon Nov 3 14:48:18 2008 From: fbsd-fs at mawer.org (Antony Mawer) Date: Mon Nov 3 14:48:31 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <3bbf2fe10811031320o5d977babpe37bcf22836b8d34@mail.gmail.com> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <200811031458.42549.jhb@freebsd.org> <3bbf2fe10811031320o5d977babpe37bcf22836b8d34@mail.gmail.com> Message-ID: <490F77ED.9050501@mawer.org> Attilio Rao wrote: > 2008/11/3, Robert Watson : >> On Mon, 3 Nov 2008, John Baldwin wrote: >>>> Yuri, could you please test this fix: >> http://www.freebsd.org/~attilio/netsmb.diff >>>> and report if it works? You could get a KASSERT running but this is >> expected as I want to identify on the callers who passes a malformed request >> and fix it. >>> This allows all smb locks to recurse unlike the original code I think. It >> may be better if smb_vclist was initialized with LK_RECURSE, but not all the >> other smb locks. Also, in smb_co_addchild() I think you should just replace >> the existing asserts with appropriate lockmgr_assert() (you could add a >> smb_co_assert() to preserve the layering) rather than removing assertions >> altogether. >> My general feeling is that the locking in netsmb needs a bit of cleanup, >> updating, etc. I'm reluctant to change the underlying primitives (as this >> patch does) without first clarifying what's going on in the code a layer or >> two above. > > I agree with Robert. > We need to make an upper layers analysis and decide what is the best > solution for locks. > This was a quick hack just to let it not panic when mounting. This probably also applies to NWFS and netncp as well -- I haven't had a chance to test NWFS in 7.x as of yet, but will hope to do so in the coming months... 
--Antony From rwatson at FreeBSD.org Mon Nov 3 15:11:39 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Nov 3 15:11:46 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <490F77ED.9050501@mawer.org> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <200811031458.42549.jhb@freebsd.org> <3bbf2fe10811031320o5d977babpe37bcf22836b8d34@mail.gmail.com> <490F77ED.9050501@mawer.org> Message-ID: On Tue, 4 Nov 2008, Antony Mawer wrote: > This probably also applies to NWFS and netncp as well -- I haven't had a > chance to test NWFS in 7.x as of yet, but will hope to do so in the coming > months... Ah, someone who actually uses netncp and nwfs! I've been trying to keep the ipx/spx code alive and working through the MPSAFE network stack work, but I'm really not set up to test that, let alone the Netware file system parts. Let us know how it goes. The netsmb and netncp code is well-structured, but designed with pre-SMPng locking primitives and network stack in mind, so will need quite a bit of work to pull forwards. As Attilio has already been running into, netsmb has rather intimate knowledge and reliance on some of the more obscure behaviors of lockmgr :-). This is something that will need to be fairly high on the list of things to change. Apple has started doing some of this in their fork of our netsmb code, but there's a lot more to do I think. 
Robert N M Watson Computer Laboratory University of Cambridge From ivoras at gmail.com Mon Nov 3 16:15:42 2008 From: ivoras at gmail.com (Ivan Voras) Date: Mon Nov 3 16:15:49 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> <200811031458.42549.jhb@freebsd.org> <3bbf2fe10811031320o5d977babpe37bcf22836b8d34@mail.gmail.com> <490F77ED.9050501@mawer.org> Message-ID: <9bbcef730811031554m61bc6762g7f503ad56f0981bd@mail.gmail.com> 2008/11/4 Robert Watson : >Apple has started doing > some of this in their fork of our netsmb code, but there's a lot more to do > I think. Like replacing SMB with CIFS :) I think that right now the only operating systems that use plain old SMB instead of CIFS are Windows 98, NT 4 and FreeBSD :) From bp at freebsd.org Mon Nov 3 19:32:28 2008 From: bp at freebsd.org (Boris Popov) Date: Mon Nov 3 19:32:34 2008 Subject: reproducible panic with mount_smbfs In-Reply-To: <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> References: <20081102123100.GA1434@darklight.homeunix.org> <3bbf2fe10811020737g211dfb3fs54b48e4071db2393@mail.gmail.com> <3bbf2fe10811020817g1409a38ep26c1ee8edf075201@mail.gmail.com> Message-ID: <35ab6dd50811031907t2ecd2ddeq82187464a1bbf3c0@mail.gmail.com> On Sun, Nov 2, 2008 at 10:17 PM, Attilio Rao wrote: > Yuri, > could you please test this fix: > http://www.freebsd.org/~attilio/netsmb.diff This patch looks wrong to me. AFAIR, the test (LK_EXCLUSIVE && (flags & LK_CANRECURSE)) were intended to prevent situations when SMB connection can not handle multiple requests (eg, during connection setup). But I do agree, that error processing of the lock status is bogus. 

--
Boris Popov

From avg at icyb.net.ua Wed Nov 5 08:17:40 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Wed Nov 5 08:17:53 2008
Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ?
Message-ID: <4911C3E9.405@icyb.net.ua>

Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I
partitioned my disk for ZFS root file system more or less as described here:
https://ish.com.au/solutions/articles/freebsdzfs

Big difference is that I created a separate slice to contain a partition
for ZFS pool, so that ZFS pool is ad4s2d (and UFS2 boot is ad4s1a).

Everything was fine, ZFS root was mounted as expected.

Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and
options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel it
couldn't mount ZFS root and I simply rebooted my machine when I stuck at
mountroot prompt (I couldn't enter UFS2 root because of unrelated
keyboard problem).
The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART
messages (errors, warnings).

I'll try to debug this further by booting into UFS root and running
gpart, but I'd like to ask for an advice upfront.

Can something like this be rationally expected?
Is there a way to make ZFS see its pool again (when booted into UFS root)?

Thank you!
--
Andriy Gapon

From marius at nuenneri.ch Wed Nov 5 09:21:04 2008
From: marius at nuenneri.ch (Marius Nünnerich)
Date: Wed Nov 5 09:21:10 2008
Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ?
In-Reply-To: <4911C3E9.405@icyb.net.ua>
References: <4911C3E9.405@icyb.net.ua>
Message-ID:

On Wed, Nov 5, 2008 at 5:03 PM, Andriy Gapon wrote:
>
> Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I
> partitioned my disk for ZFS root file system more or less as described here:
> https://ish.com.au/solutions/articles/freebsdzfs
>
> Big difference is that I created a separate slice to contain a partition
> for ZFS pool, so that ZFS pool is ad4s2d (and UFS2 boot is ad4s1a).
>
> Everything was fine, ZFS root was mounted as expected.
>
> Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and
> options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel it
> couldn't mount ZFS root and I simply rebooted my machine when I stuck at
> mountroot prompt (I couldn't enter UFS2 root because of unrelated
> keyboard problem).
> The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART
> messages (errors, warnings).
>
> I'll try to debug this further by booting into UFS root and running
> gpart, but I'd like to ask for an advice upfront.
>
> Can something like this be rationally expected?
> Is there a way to make ZFS see its pool again (when booted into UFS root)?

Afaict geom_part is a little bit more concerned about the correctness
of the partition tables. I think it's possible that you have a bug in
your tables which doesn't matter for geom_mbr or geom_bsd but for
geom_part.

From avg at icyb.net.ua Wed Nov 5 15:58:31 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Wed Nov 5 15:58:37 2008
Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ?
In-Reply-To:
References: <4911C3E9.405@icyb.net.ua>
Message-ID: <49121772.1040501@icyb.net.ua>

on 05/11/2008 18:55 Marius Nünnerich said the following:
> On Wed, Nov 5, 2008 at 5:03 PM, Andriy Gapon wrote:
>> Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I
>> partitioned my disk for ZFS root file system more or less as described here:
>> https://ish.com.au/solutions/articles/freebsdzfs
>>
>> Big difference is that I created a separate slice to contain a partition
>> for ZFS pool, so that ZFS pool is ad4s2d (and UFS2 boot is ad4s1a).
>>
>> Everything was fine, ZFS root was mounted as expected.
>>
>> Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and
>> options for GEOM_PART_(BSD|MBR).
When I tried to boot this kernel it >> couldn't mount ZFS root and I simply rebooted my machine when I stuck at >> mountroot prompt (I couldn't enter UFS2 root because of unrelated >> keyboard problem). >> The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART >> messages (errors, warnings). >> >> I'll try to debug this further by booting into UFS root and running >> gpart, but I'd like to ask for an advice upfront. >> >> Can something like this be rationally expected? >> Is there a way to make ZFS see its pool again (when booted into UFS root)? > > Afaict geom_part is a little bit more concerned about the correctness > of the partition tables. I think it's possible that you have a bug in > your tables which doesn't matter for geom_mbr or geom_bsd but for > geom_part. I believe that disklabels are correct even for gpart, but this is not proven. Anyway I'll do more tests tomorrow. P.S. I kind of hoped for an advice on ZFS side of things. -- Andriy Gapon From linimon at FreeBSD.org Thu Nov 6 03:25:20 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Thu Nov 6 03:25:27 2008 Subject: kern/128633: [zfs] [lor] lock order reversal in zfs Message-ID: <200811061125.mA6BPKXF072266@freefall.freebsd.org> Old Synopsis: lock order reversal in zfs New Synopsis: [zfs] [lor] lock order reversal in zfs Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Nov 6 11:24:51 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=128633 From gavin at FreeBSD.org Thu Nov 6 03:44:21 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Thu Nov 6 03:44:32 2008 Subject: kern/119868: [geom_gpt] [patch] 7.0 kernel panic with corrupt GPT label Message-ID: <200811061144.mA6BiLow087904@freefall.freebsd.org> Old Synopsis: [zfs] [patch] 7.0 kernel panic during boot with ZFS and WD1600JS New Synopsis: [geom_gpt] [patch] 7.0 kernel panic with corrupt GPT label Responsible-Changed-From-To: freebsd-fs->freebsd-geom Responsible-Changed-By: gavin Responsible-Changed-When: Thu Nov 6 11:39:24 UTC 2008 Responsible-Changed-Why: Jaakko Heinonen points out that this is actually a bug with geom_gpt and not ZFS. The PR contains a patch, confirmed to fix the issue. http://www.freebsd.org/cgi/query-pr.cgi?pr=119868 From unxfbsdi at gmail.com Fri Nov 7 16:16:58 2008 From: unxfbsdi at gmail.com (Manish Jain) Date: Fri Nov 7 16:17:05 2008 Subject: A problem with fork() and subsequent flock() Message-ID: <4914D501.4090400@gmail.com> Hi, I am starting out as a C/C++ programmer on FreeBSD and I am stuck with a small but irritating problem. I was trying out the flock() call and wrote flocksample.cpp, which starts out with a fork() and subsequent calls to flock() from both processes (parent as well as child; the child does an initial sleep(1) before anything else). It compiles okay and the parent's flock() call succeeds. But the child's flock() call too succeeds on the same file descriptor even before the first flock() unlocks. Can anyone please point out where the problem is ? I am not even sure whether the problem is FreeBSD specific. 
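
The symptom described above can be reproduced outside the attached program.
What follows is a minimal sketch and not code from this thread: the helper
name child_lock_attempt and the lock-file path used with it are hypothetical,
chosen only for illustration. It assumes the flock(2) semantics that a lock
belongs to the open file behind a descriptor, and that fork() gives the child
a descriptor referring to that same open file rather than a new one. Under
that assumption a child that reuses the inherited descriptor is not refused
the lock, while a child that calls open() itself is.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the parent takes LOCK_EX on `path`, forks, and the
// child makes one non-blocking attempt at the same lock.
// Returns 1 if the child was refused (EWOULDBLOCK), 0 if the child's
// flock() succeeded, and -1 on any setup error.
int child_lock_attempt(const char *path, bool reopen_in_child)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        // Child: either reuse the inherited descriptor (same open file as
        // the parent) or open the path again (a separate open file).
        int cfd = reopen_in_child ? open(path, O_RDWR) : fd;
        if (cfd < 0)
            _exit(2);
        if (flock(cfd, LOCK_EX | LOCK_NB) == 0)
            _exit(0);
        _exit(errno == EWOULDBLOCK ? 1 : 2);
    }

    // Parent: collect the child's verdict, then drop the lock by closing.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

With these assumptions, child_lock_attempt(path, false) is expected to return
0 (the child shares the parent's lock) and child_lock_attempt(path, true) is
expected to return 1 (the child is held off until the parent unlocks).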

Attached is flocksample.cpp

Thanks for any help

Manish Jain
unxfbsdi@gmail.com
-------------- next part --------------
#include <sys/file.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

using namespace std;

const char * szFileName = "/tmp/flock.log";

int addfunc(int fd);
int readfunc(int fd);

int main()
{
    // O_CREAT requires an explicit mode argument
    int fd = open(szFileName, O_RDWR | O_CREAT, 0644);
    assert(fd > 2);

    if (fork())
    {
        return addfunc(fd);
    }
    else
    {
        cout << "Child sleeping for 1s before attempting to read log" << endl;
        sleep(1); //ensure that child's readfunc starts after parent's addfunc
        return readfunc(fd);
    }
}

int addfunc(int fd)
{
    char buffer[16]; // room for "10\n" plus the terminating NUL
    string answer;
    int i = 0;

    int result = flock(fd, LOCK_EX);
    assert(result == 0);

    while (i < 10)
    {
        snprintf(buffer, sizeof(buffer), "%d\n", ++i);
        write(fd, buffer, strlen(buffer));
    }

    while (answer.empty() || answer[0] != 'U')
    {
        cout << "Blocking on addfunc. Type U to unlock log" << endl;
        if (!(cin >> answer))
            break; // stop waiting if standard input is closed
    }

    cout << "Unblocking on addfunc ..." << endl;
    close(fd); //automatically calls LOCK_UN
    return 0;
}

int readfunc(int fd)
{
    struct stat st;

    int result = flock(fd, LOCK_SH);
    assert(result == 0);

    cout << "readfunc got hold of log ..." << endl;

    // size the buffer from the descriptor that is already open
    result = fstat(fd, &st);
    assert(result == 0);
    char * buffer = new char[st.st_size + 1];

    lseek(fd, 0, SEEK_SET);
    off_t total = 0;
    while (total < st.st_size)
    {
        ssize_t n = read(fd, buffer + total, st.st_size - total);
        if (n <= 0)
            break; // stop on a read error or an unexpected end of file
        total += n;
    }

    close(fd); //automatically calls LOCK_UN
    buffer[total] = 0;

    cout << "Following are the contents of the file flock.log :" << endl;
    cout << buffer << endl;

    delete[] buffer;
    return 0;
}

From kostikbel at gmail.com Sat Nov 8 06:38:37 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Sat Nov 8 06:38:44 2008
Subject: A problem with fork() and subsequent flock()
In-Reply-To: <4914D501.4090400@gmail.com>
References: <4914D501.4090400@gmail.com>
Message-ID: <20081108135857.GH18100@deviant.kiev.zoral.com.ua>

On Sat, Nov 08, 2008 at 05:23:37AM +0530, Manish Jain wrote:
>
> Hi,
>
> I am starting out as a C/C++ programmer on FreeBSD and I am stuck with a
> small but irritating problem.
> I was trying out the flock() call and
> wrote flocksample.cpp, which starts out with a fork() and subsequent
> calls to flock() from both processes (parent as well as child; the child
> does an initial sleep(1) before anything else). It compiles okay and the
> parent's flock() call succeeds. But the child's flock() call too
> succeeds on the same file descriptor even before the first flock()
> unlocks. Can anyone please point out where the problem is ? I am not
> even sure whether the problem is FreeBSD specific.

This is the correct behaviour for flock. Note that the manpage specifies
that the lock is attached to the file. When you open a file descriptor,
you get a structure like this:

    fd -> file -> vnode

where -> means "references". Fork makes a copy of all fds opened in the
process, and the structure becomes

    fd [1] -> file -> vnode
    fd [2] /

where fd[1] lives in the parent, and fd[2] in the child. The lock is still
attached to the file, so both parent and child share the lock.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081108/1b9197dc/attachment.pgp

From znek at mulle-kybernetik.com Sat Nov 8 09:17:08 2008
From: znek at mulle-kybernetik.com (Marcus Müller)
Date: Sat Nov 8 09:19:17 2008
Subject: ZFS patches.
In-Reply-To: <20081031201814.GA54286@server.vk2pj.dyndns.org>
References: <20080727125413.GG1345@garage.freebsd.pl> <86tzd490qx.fsf@gmail.com> <20080829074738.GB3026@garage.freebsd.pl> <20081031201814.GA54286@server.vk2pj.dyndns.org>
Message-ID: <73175263-4A61-4C87-9BF3-69ECC2CC0D17@mulle-kybernetik.com>

On 01.11.2008, at 02:39, Peter Jeremy wrote:

> Can you please give us an indication as to when we might expect to see
> either an updated set of ZFS patches (or, better, the patches
> committed
> to -current).

Me too. ;-) I just want to second that request.
While I'm also still desperately waiting for an update, basically all I'd like to see at this point is a rough schedule when I/we could expect all the recent changes in Perforce to be in CVS HEAD for testing. Cheers, Marcus -- Marcus Mueller . . . crack-admin/coder ;-) Mulle kybernetiK . http://www.mulle-kybernetik.com Current projects: http://www.mulle-kybernetik.com/znek/ From ler at lerctr.org Sat Nov 8 13:10:06 2008 From: ler at lerctr.org (Larry Rosenman) Date: Sat Nov 8 13:10:13 2008 Subject: ZFS patches. In-Reply-To: <73175263-4A61-4C87-9BF3-69ECC2CC0D17@mulle-kybernetik.com> References: <20080727125413.GG1345@garage.freebsd.pl> <86tzd490qx.fsf@gmail.com> <20080829074738.GB3026@garage.freebsd.pl> <20081031201814.GA54286@server.vk2pj.dyndns.org> <73175263-4A61-4C87-9BF3-69ECC2CC0D17@mulle-kybernetik.com> Message-ID: <005401c941e6$5e660e80$1b322b80$@org> >On 01.11.2008, at 02:39, Peter Jeremy wrote: > >> Can you please give us an indication as to when we might expect to see >> either an updated set of ZFS patches (or, better, the patches >> committed >> to -current). > >Me too. ;-) I just want to second that request. While I'm also still >desperately waiting for an update, basically all I'd like to see at >this point is a rough schedule when I/we could expect all the recent >changes in Perforce to be in CVS HEAD for testing. I'd like to 3rd this request. I'm running a -CURRENT system from August with the patches and upgraded ZPOOL/ZFS's. I'd like to get more recent on the system sources, but I know there are issues with the ZFS patch, and would love to see the schedule or a new patch against recent HEAD. Thanks! -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 512-248-2683 E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From dimitar.vassilev at gmail.com Sat Nov 8 20:37:19 2008 From: dimitar.vassilev at gmail.com (Dimitar Vasilev) Date: Sat Nov 8 20:37:26 2008 Subject: ZFS patches. 
In-Reply-To: <005401c941e6$5e660e80$1b322b80$@org> References: <20080727125413.GG1345@garage.freebsd.pl> <86tzd490qx.fsf@gmail.com> <20080829074738.GB3026@garage.freebsd.pl> <20081031201814.GA54286@server.vk2pj.dyndns.org> <73175263-4A61-4C87-9BF3-69ECC2CC0D17@mulle-kybernetik.com> <005401c941e6$5e660e80$1b322b80$@org> Message-ID: <59adc1a0811082037u29594053wd9436bd78c963eb3@mail.gmail.com> 2008/11/8 Larry Rosenman > > >On 01.11.2008, at 02:39, Peter Jeremy wrote: > > > >> Can you please give us an indication as to when we might expect to see > >> either an updated set of ZFS patches (or, better, the patches > >> committed > >> to -current). > > > >Me too. ;-) I just want to second that request. While I'm also still > >desperately waiting for an update, basically all I'd like to see at > >this point is a rough schedule when I/we could expect all the recent > >changes in Perforce to be in CVS HEAD for testing. > > I'd like to 3rd this request. I'm running a -CURRENT system from August > with the patches and upgraded > ZPOOL/ZFS's. I'd like to get more recent on the system sources, but I > know > there are issues with the ZFS > patch, and would love to see the schedule or a new patch against recent > HEAD. > > Thanks! > > > -- > Larry Rosenman http://www.lerctr.org/~ler > Phone: +1 512-248-2683 E-Mail: ler@lerctr.org > US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > Hello, as I'm simple, I'd stick with commiting ZFS patches for start. Once in the tree, it's more "easy" from there. It's up to pjd and core to decide when, but it doesn't hurt ranting. Happy Sunday to all. 
From dan-freebsd-fs at ourbrains.org Sun Nov 9 10:09:44 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Sun Nov 9 10:09:50 2008 Subject: Will XFS be adopted Message-ID: <20081109174303.GA5146@ourbrains.org> With XFS being adopted by Linux now for a number of years, I wonder why it hasn't been by FreeBSD. It's a great FS that can be resized on the fly which makes it a perfect journaling FS with volume managers. Anybody know? From frank at exit.com Sun Nov 9 10:37:40 2008 From: frank at exit.com (Frank Mayhar) Date: Sun Nov 9 10:37:46 2008 Subject: Will XFS be adopted In-Reply-To: <20081109174303.GA5146@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> Message-ID: <1226254613.76915.1.camel@jill.exit.com> On Sun, 2008-11-09 at 12:43 -0500, Dan wrote: > With XFS being adopted by Linux now for a number of years, I wonder why > it hasn't been by FreeBSD. It's a great FS that can be resized on the > fly which makes it a perfect journaling FS with volume managers. Anybody > know? Considering that XFS is under the GPL (totally incompatible with the BSD license) and that ZFS _is_ being adopted by FreeBSD, at least experimentally, there's no good reason to adopt XFS and at least one good reason not to do so. -- Frank Mayhar frank@exit.com http://www.exit.com/ Exit Consulting http://www.gpsclock.com/ http://www.exit.com/blog/frank/ http://www.zazzle.com/fmayhar* From peterjeremy at optushome.com.au Sun Nov 9 10:43:56 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Sun Nov 9 10:44:03 2008 Subject: Will XFS be adopted In-Reply-To: <20081109174303.GA5146@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> Message-ID: <20081109184349.GG51239@server.vk2pj.dyndns.org> On 2008-Nov-09 12:43:03 -0500, Dan wrote: >With XFS being adopted by Linux now for a number of years, I wonder why >it hasn't been by FreeBSD. I guess no-one has been sufficiently motivated. 
> It's a great FS that can be resized on the >fly which makes it a perfect journaling FS with volume managers. FreeBSD has ZFS - which is a re-sizable FS with an integrated volume manager. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081109/1b156b1c/attachment.pgp From freebsd at rgbaz.eu Sun Nov 9 10:46:20 2008 From: freebsd at rgbaz.eu (FBSD UG) Date: Sun Nov 9 10:46:25 2008 Subject: Will XFS be adopted In-Reply-To: <20081109174303.GA5146@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> Message-ID: <27EF9A8F-A7DA-4711-8462-D28B0A1813BC@rgbaz.eu> On 9 nov 2008, at 18:43, Dan wrote: > With XFS being adopted by Linux now for a number of years, I wonder > why > it hasn't been by FreeBSD. It's a great FS that can be resized on the > fly which makes it a perfect journaling FS with volume managers. > Anybody > know? > _______________________________________________ because of the license? gr Arno From michael at fuckner.net Sun Nov 9 10:52:28 2008 From: michael at fuckner.net (Michael Fuckner) Date: Sun Nov 9 10:52:34 2008 Subject: Will XFS be adopted In-Reply-To: <20081109174303.GA5146@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> Message-ID: <49172D82.1010103@fuckner.net> Dan wrote: Hi! > With XFS being adopted by Linux now for a number of years, I wonder why > it hasn't been by FreeBSD. this is not correct. Please read man 5 xfs. The xfs file system support first appeared in FreeBSD 7.0. > It's a great FS that can be resized on the > fly which makes it a perfect journaling FS with volume managers. Anybody > know? Regards, Michael Fuckner! 
From ivoras at freebsd.org Sun Nov 9 11:15:12 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Sun Nov 9 11:15:18 2008 Subject: Will XFS be adopted In-Reply-To: <49172D82.1010103@fuckner.net> References: <20081109174303.GA5146@ourbrains.org> <49172D82.1010103@fuckner.net> Message-ID: Michael Fuckner wrote: > Dan wrote: > > Hi! >> With XFS being adopted by Linux now for a number of years, I wonder why >> it hasn't been by FreeBSD. > this is not correct. > > Please read man 5 xfs. > The xfs file system support first appeared in FreeBSD 7.0. Well yes, but you can't really call read-only support for a file system its proper adoption. AFAIK there have even been talks of removing reiserfs and xfs as the read-only support is almost useless and there have been problems maintaining them actively (nobody's interested enough). FWIW, I'd like read-write XFS, too, regardless of ZFS. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 258 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081109/55a20294/signature.pgp From daichi at ongs.co.jp Sun Nov 9 23:25:44 2008 From: daichi at ongs.co.jp (Daichi GOTO) Date: Sun Nov 9 23:25:50 2008 Subject: [Call for Test] a patch for kern/121385 - Unionfs cross mount issue Message-ID: <4917E0C9.5020105@ongs.co.jp> Hi Unionfs users About kern/121385 - Unionfs cross mount issue, by discussion at EuroBSDCon2008, unionfs does not allow user to do cross mount operation. If you have some interest this issue, please get this patch and try with current. I'll commit this patch after 1 week later. PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/121385 Patch: http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-cross-mount.diff This issue was discussed at EuroBSDCon2008 FreeBSD developer summit. 
Thanks for hrs and gnn :) -- Daichi GOTO, http://people.freebsd.org/~daichi From daichi at ongs.co.jp Sun Nov 9 23:30:44 2008 From: daichi at ongs.co.jp (Daichi GOTO) Date: Sun Nov 9 23:30:50 2008 Subject: [Call for Test] a patch for kern/118346 - Unionfs socket issue Message-ID: <4917DECC.4010302@ongs.co.jp> Hi Unionfs users At final, I have a long awaited patch to fix Unionfs socket issue (kern/118346). If you have interested in Unionfs socket issue, please get this patch and try it with current. I'll commit this patch to current after 2-week later. PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/118346 Patch: http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-vsock.diff This issue was discussed at EuroBSDCon2008 FreeBSD developer summit. Thanks for rwatson, hrs and gnn :) -- Daichi GOTO, http://people.freebsd.org/~daichi From bugmaster at FreeBSD.org Mon Nov 10 03:06:50 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Nov 10 03:07:54 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200811101106.mAAB6nIx049703@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description
--------------------------------------------------------------------------------
o kern/128633  fs  [zfs] [lor] lock order reversal in zfs
o kern/128514  fs  [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
o kern/128173  fs  [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs  [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs  [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs  [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs  [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125536  fs  [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs  [nfs][panic] changing into .zfs dir from nfs client ca
o kern/124621  fs  [ext3] Cannot mount ext2fs partition
o kern/122888  fs  [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs  [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs  [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs  mv(1): moving a directory changes its mtime
o kern/116170  fs  [panic] Kernel panic when mounting /tmp
o kern/114955  fs  [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs  [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs  [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs  [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs  [patch] [request] mount(8): add support for relative p
o bin/113049   fs  [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs  [smbfs] [patch] smbfs and caching problems (resolves b
o kern/93942   fs  [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D

23 problems total.
From koitsu at FreeBSD.org Mon Nov 10 07:17:34 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Mon Nov 10 07:17:44 2008 Subject: boot0: Unable to boot DOS slices/partitions Message-ID: <20081110151732.GA72926@icarus.home.lan> I've been working for the past 2-3 days on a Wiki entry describing how to get two versions or architecture types of FreeBSD, and MS-DOS, all on a USB stick. The intended goal is to allow an administrator to install FreeBSD i386 or FreeBSD amd64 from a USB stick, while also providing the ability to boot DOS for BIOS upgrades and so forth. Please note the below is a complete mess, due to the fact that I have spent the past 9 hours editing things off and on. http://wiki.freebsd.org/JeremyChadwick/Installing_from_USB_flash_drive So far, I've had great success with the FreeBSD part of it -- no issues or oddities. The problem I'm running into is the DOS part. I've been completely unable to get boot0 to boot MS-DOS 6.22 (on a FAT16 slice), MS-DOS 7.10 (on a FAT16 or FAT32 slice), or FreeDOS 1.0 (on a FAT16 or FAT32 slice). With MS-DOS, COMMAND.COM, IO.SYS, and MSDOS.SYS are all copied on to the slice. With FreeDOS, COMMAND.COM and KERNEL.SYS are all copied on to the slice. boot0 shows "F1 DOS" as a choice, but depending upon where I formatted the slice (on Windows XP or via newfs_msdos), I get one of the following two error messages from the bootstraps installed on /dev/da0s1: Non-system disk Disk error I've even gone so far to try using FAT32LBA.BIN from the FreeDOS project as the bootstrap, e.g. newfs_msdos -B FAT32LBA.BIN /dev/da0s1 which just causes a hard-lock. Enabling packet mode in boot0cfg makes no difference. What *does* work, however, is using FAT32LBA.BIN and SYSLINUX together as the actual boot loader itself (e.g. at sector 0). I can successfully boot FreeDOS using this method. 
I'd advocate using SYSLINUX entirely, but it doesn't appear possible to get SYSLINUX to boot a slice/partition, only refer to actual files on the FAT16/FAT32 partition. I'd try GRUB, except that all of my BSD boxes at home are amd64 (sans the one I'm trying to boot the USB stick on, but that doesn't have FreeBSD installed on it), and sysutils/grub only builds on i386. Slice layout: DISK Geometry: 977 cyls/255 heads/63 sectors = 15695505 sectors (7663MB) Offset Size(KB) End Name PType Desc Subtype Flags 0 31 62 - 12 unused 0 63 5759271 11518604 da0s1 4 fat (32-bit,LBA) 12 A 11518605 1044225 13607054 da0s2 8 freebsd 165 13607055 1044225 15695504 da0s3 8 freebsd 165 15695505 183 15695870 - 12 unused 0 I don't have boot0cfg output immediately on hand, but can get it if need be. Ideas? -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From bra at fsn.hu Mon Nov 10 07:53:32 2008 From: bra at fsn.hu (Attila Nagy) Date: Mon Nov 10 07:53:39 2008 Subject: Different inodes Message-ID: <49185518.2020600@fsn.hu> Hello, I don't quite get this: ls -i /20081021/usr/local/lib/python2.5/config/ 3817938 .svn 3817976 Setup.local 3817978 libpython2.5.a 3817979 Makefile 3817975 config.c 3817980 libpython2.5.so 3817973 Setup 3817977 config.c.in 3817982 makesetup 3817974 Setup.config 3817981 install-sh 3817983 python.o ls -i /20081021/usr/local/lib/python2.5/config/libpython2.5.* 3817978 /20081021/usr/local/lib/python2.5/config/libpython2.5.a 73738 /20081021/usr/local/lib/python2.5/config/libpython2.5.so (libpython2.5.so's inode differs in the two output) The filesystem is an asynchronously mounted UFS2. I have a slightly modified kernel, because I need to do redundant NFS serving, so consistent inodes are a must. 
The modification consists of changed arc4randoms in sys/ufs/ffs/ffs_alloc.c and ffs_vfsops.c to a constant value, but I don't think it can be the cause. Could it be? Thanks, From peterjeremy at optushome.com.au Mon Nov 10 11:01:59 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Mon Nov 10 11:02:06 2008 Subject: Different inodes In-Reply-To: <49185518.2020600@fsn.hu> References: <49185518.2020600@fsn.hu> Message-ID: <20081110190151.GI51239@server.vk2pj.dyndns.org> On 2008-Nov-10 16:36:56 +0100, Attila Nagy wrote: >I don't quite get this: > >ls -i /20081021/usr/local/lib/python2.5/config/ >3817938 .svn 3817976 Setup.local 3817978 libpython2.5.a >3817979 Makefile 3817975 config.c 3817980 libpython2.5.so >3817973 Setup 3817977 config.c.in 3817982 makesetup >3817974 Setup.config 3817981 install-sh 3817983 python.o > >ls -i /20081021/usr/local/lib/python2.5/config/libpython2.5.* >3817978 /20081021/usr/local/lib/python2.5/config/libpython2.5.a > 73738 /20081021/usr/local/lib/python2.5/config/libpython2.5.so I can't reproduce it here on 7-stable. Note that libpython2.5.so is a symlink so it seems likely that one of your ls commands is de-referencing the symlink and the other isn't - though I don't know how this is being done. Can you confirm that you are using /bin/ls (not an alias or some alternate variant). You might also verify that you get the same result using /bin/sh as a shell. >The modification consists of changed arc4randoms in >sys/ufs/ffs/ffs_alloc.c and ffs_vfsops.c to a constant value, but I >don't think it can be the cause. Could it be? This shouldn't affect the above. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081110/e7a15cd9/attachment.pgp From bra at fsn.hu Tue Nov 11 01:04:42 2008 From: bra at fsn.hu (Attila Nagy) Date: Tue Nov 11 01:04:49 2008 Subject: Different inodes In-Reply-To: <20081110190151.GI51239@server.vk2pj.dyndns.org> References: <49185518.2020600@fsn.hu> <20081110190151.GI51239@server.vk2pj.dyndns.org> Message-ID: <49194AA3.2090208@fsn.hu> Peter Jeremy wrote: > On 2008-Nov-10 16:36:56 +0100, Attila Nagy wrote: > >> I don't quite get this: >> >> ls -i /20081021/usr/local/lib/python2.5/config/ >> 3817938 .svn 3817976 Setup.local 3817978 libpython2.5.a >> 3817979 Makefile 3817975 config.c 3817980 libpython2.5.so >> 3817973 Setup 3817977 config.c.in 3817982 makesetup >> 3817974 Setup.config 3817981 install-sh 3817983 python.o >> >> ls -i /20081021/usr/local/lib/python2.5/config/libpython2.5.* >> 3817978 /20081021/usr/local/lib/python2.5/config/libpython2.5.a >> 73738 /20081021/usr/local/lib/python2.5/config/libpython2.5.so >> > > I can't reproduce it here on 7-stable. Note that libpython2.5.so > is a symlink so it seems likely that one of your ls commands is > de-referencing the symlink and the other isn't - though I don't > know how this is being done. > > Can you confirm that you are using /bin/ls (not an alias or some > alternate variant). You might also verify that you get the same > result using /bin/sh as a shell. > You are right, it's a symlink. What I run is a 7-STABLE on amd64 built at Wed May 28 17:02:56 CEST 2008. 
With /bin/sh as my shell: # /bin/ls -i /20081021/usr/local/lib/python2.5/config/libpython2.5.so 73738 /20081021/usr/local/lib/python2.5/config/libpython2.5.so # /bin/ls -li /20081021/usr/local/lib/python2.5/config/libpython2.5.so 3817980 lrwxr-xr-x 1 root wheel 30 Oct 29 09:40 /20081021/usr/local/lib/python2.5/config/libpython2.5.so -> /usr/local/lib/libpython2.5.so # /bin/ls -li /usr/local/lib/libpython2.5.so 73739 lrwxr-xr-x 1 root wheel 17 Oct 18 09:23 /usr/local/lib/libpython2.5.so -> libpython2.5.so.1 # /bin/ls -li /usr/local/lib/libpython2.5.so.1 73738 -r-xr-xr-x 1 root wheel 1364800 Oct 18 09:23 /usr/local/lib/libpython2.5.so.1 # /bin/ls -i /20081021/usr/local/lib/python2.5/config 3817938 .svn 3817973 Setup 3817976 Setup.local 3817977 config.c.in 3817978 libpython2.5.a 3817982 makesetup 3817979 Makefile 3817974 Setup.config 3817975 config.c 3817981 install-sh 3817980 libpython2.5.so 3817983 python.o > >> The modification consists of changed arc4randoms in >> sys/ufs/ffs/ffs_alloc.c and ffs_vfsops.c to a constant value, but I >> don't think it can be the cause. Could it be? >> > > This shouldn't affect the above. > Well, the full story is the following: - I have two NFS servers per site, which act as redundant file servers (the above is on one of those machines) - I run a modified kernel on these (see above) to be able to achieve inode consistency between the machines (so in case of failover, the clients won't crash with stale filehandles) - the shares are read only (to the NFS clients) - I update the content from subversion by running an "svn up" on both pairs simultaneously and with the above patch, the files' inodes remained consistent - after the svn up I did an ls -Ri on the working copy on both machines, and made a diff from them. If the diff was empty, the inodes were right, if it wasn't, then there was a problem. Until now, everything was right. - what has changed now is two fold: 1. 
the content has grown so large, that an svn up from the working copy's root ran too long, so I have switched to the method of comparing the revision of the working copy, and the svn repo's head, made a diff summary and only updated (deep in the tree) what's needed 2. for the same cause, I have switched from ls -Ri to printing each modified file's inode number It seems that symlinks are inconsistent then listing them with "ls -i" directly, but with "ls -Ri" shows no errors. Also, I didn't faced any problems so far when switching NFS servers. So what I see now is that everything is OK (for me, the inodes are in sync), the problem is that I must check for the symlink's inode number and must not let ls to resolve that back to the outer filesystem, which is of course differs on the machines. Problem solved for me, the question is whether the above inner working of ls is a misbehaviour, or is it just normal. From avg at icyb.net.ua Tue Nov 11 05:35:26 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Nov 11 05:35:33 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <4911C3E9.405@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> Message-ID: <49198A1A.3080600@icyb.net.ua> on 05/11/2008 18:03 Andriy Gapon said the following: > Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I > partitioned my disk for ZFS root file system more or less as described here: > https://ish.com.au/solutions/articles/freebsdzfs > > Big difference is that I created a separate slice to contain a partition > for ZFS pool, so that ZFS pool is ad4s2d (and UFS2 boot is ad4s1a). > > Everything was fine, ZFS root was mounted as expected. > > Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and > options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel it > couldn't mount ZFS root and I simply rebooted my machine when I stuck at > mountroot prompt (I couldn't enter UFS2 root because of unrelated > keyboard problem). 
> The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART
> messages (errors, warnings).
>
> I'll try to debug this further by booting into UFS root and running
> gpart, but I'd like to ask for an advice upfront.

So I did this. Here are some data:

$ gpart show
=>        63  976773105  ad6  MBR  (500.1GB)
          63   12578832    1  freebsd  [active]  (6.4GB)
    12578895  964189170    2  freebsd  (493.7GB)
   976768065       5103       - free -  (2.6MB)

=>         0   12578832  ad6s1  BSD  (6.4GB)
           0         16         - free -  (8.2KB)
          16    2097152      1  freebsd-ufs  (1073.7MB)
     2097168    2097152         - free -  (1073.7MB)
     4194320    8384512      2  freebsd-swap  (4.3GB)

=>         0  964189170  ad6s2  BSD  (493.7GB)
           0         16         - free -  (8.2KB)
          16  964189154      4  freebsd-swap  (493.7GB)

$ zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME      STATE    READ WRITE CKSUM
        tank      UNAVAIL     0     0     0  insufficient replicas
          ad6s2d  UNAVAIL     0     0     0  cannot open

So gpart sees ad6s2d perfectly well, it has the same parameters as
disklabel previously reported and /dev/ad6s2d exists.
But zfs "cannot open" it.

What I did next was:
1. reboot into "disklabel" kernel single-user
2. zpool export tank
3. reboot into gpart kernel single-user
4. zpool import - it saw tank correctly
5. zpool import tank
6. profit! :-)

As I see it, zpool.cache contained something about ad6s2d that prevented
gpart ad6s2d from being recognized as the same device as "disklabel" one.
I really wonder what that could have been?
Or maybe gpart reported some subtle property of the device differently...
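
The recovery sequence above can be sketched as a short dry-run helper. This is only an illustration and not part of the original report: the function name `recovery_plan` and the phase labels are made up for the example, the pool name `tank` is taken from the output above, and the reboots between the two phases still have to be done by hand. The only commands used are `zpool export`, `zpool import` with no arguments (which lists importable pools), and `zpool import` with a pool name.

```python
# Hypothetical helper: it only builds and prints the commands, it never runs them.
def recovery_plan(pool="tank"):
    """Return (phase, command) pairs for the export/import workaround."""
    return [
        # Phase 1: booted single-user into the old GEOM_BSD/GEOM_MBR kernel.
        ("old kernel", ["zpool", "export", pool]),
        # Phase 2: after rebooting single-user into the GEOM_PART_* kernel.
        ("new kernel", ["zpool", "import"]),        # list pools found by scanning devices
        ("new kernel", ["zpool", "import", pool]),  # import the pool by name
    ]


for phase, command in recovery_plan():
    print(f"[{phase}] {' '.join(command)}")
```

Exporting first presumably matters because it drops the pool from zpool.cache, so the later import has to rediscover the device by scanning instead of trusting a cached entry. That reading is an assumption that fits the guess above; nothing in this thread confirms it.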
--
Andriy Gapon
From mckusick at mckusick.com Tue Nov 11 13:13:56 2008
From: mckusick at mckusick.com (Kirk McKusick)
Date: Tue Nov 11 13:14:03 2008
Subject: Different inodes
Message-ID: <200811112059.mABKxL3Z068989@chez.mckusick.com>

To: Peter Jeremy
Cc: freebsd-fs@freebsd.org
Subject: Re: Different inodes

> On 2008-Nov-10 16:36:56 +0100, Attila Nagy wrote:
> It seems that symlinks are inconsistent then listing them with "ls -i"
> directly, but with "ls -Ri" shows no errors. Also, I didn't faced any
> problems so far when switching NFS servers.

When you do an `ls -i' of a directory, the `ls' command does a stat()
of the name, discovers that it is a directory, does a readdir of the
directory, and prints out the inode numbers listed for each entry in
the directory. For entries that are symbolic links, the inode number
in the directory is the inode number of the symbolic link.

When you do an `ls -i' of a name which is a symbolic link, `ls' does
a stat() of the name. When you do a stat() of a symbolic link, you
follow the link, so the stat info that comes back is for the thing
to which the symbolic link points. Thus when it prints the inode
number, you get the inode number of the file to which the symbolic
link points. To get the inode number of the symbolic link, `ls' would
have to do an lstat() instead of a stat(). That would be a rather
drastic change of historic behavior.

	Kirk McKusick
From linimon at FreeBSD.org Wed Nov 12 17:11:00 2008
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Wed Nov 12 17:11:12 2008
Subject: kern/128829: [smbfs] smbd(8) causes periodic panic on 7-RELEASE
Message-ID: <200811130111.mAD1B0qS050570@freefall.freebsd.org>

Old Synopsis: smbd causes periodic panic on 7-RELEASE
New Synopsis: [smbfs] smbd(8) causes periodic panic on 7-RELEASE

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Nov 13 01:09:39 UTC 2008
Responsible-Changed-Why:
Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=128829 From toasty at dragondata.com Wed Nov 12 18:39:00 2008 From: toasty at dragondata.com (Kevin Day) Date: Wed Nov 12 18:39:07 2008 Subject: UFS Snapshot lock time Message-ID: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> Is there any documentation out there that explains how to optimize UFS snapshotting? Specifically, we've got a rather big filesystem that I'd like to do hourly snapshots of. I don't mind how long the snapshot itself takes, but the amount of time the filesystem is locked is a problem. We're "dead" for about 12 minutes per snapshotting. Is there anything to tweak to speed it up any? (memory v.s. time exchange somewhere?) Is the length of time it takes a function of the number of inodes, directories, or...? Relevant info: Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on /dev/da0s1a 739339824 73453348 606739292 11% 1717543 93856471 2% / which is a 6 drive RAID-0 array. CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2593.52-MHz K8-class CPU) usable memory = 17166548992 (16371 MB) FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs In theory, this should be a rather fast box, but it is a rather large filesystem. -- Kevin From koitsu at FreeBSD.org Wed Nov 12 20:34:16 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Wed Nov 12 20:34:22 2008 Subject: UFS Snapshot lock time In-Reply-To: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> References: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> Message-ID: <20081113043414.GA10272@icarus.home.lan> On Wed, Nov 12, 2008 at 08:22:29PM -0600, Kevin Day wrote: > Is there any documentation out there that explains how to optimize UFS > snapshotting? > > Specifically, we've got a rather big filesystem that I'd like to do > hourly snapshots of. I don't mind how long the snapshot itself takes, > but the amount of time the filesystem is locked is a problem. We're > "dead" for about 12 minutes per snapshotting. 
This topic comes up about once every 2 weeks. There's a discussion
going about it on -stable right now:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-November/046524.html

It's also been documented on my issues Wiki:

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

At this time, there is no fix. Workarounds:

1) Use rsnapshot (which is rsync-based) to accomplish the same; this
   works on a UFS/UFS2 filesystem. However, note that file atimes on
   your source will get destroyed (which will affect the "new mail"
   capability of classic UNIX mboxes; there is no solution for that).

2) Switch to ZFS, which has reliable snapshotting.

> In theory, this should be a rather fast box, but it is a rather large
> filesystem.

The speed of the box has nothing to do with the problem; your hardware
is not to blame.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From toasty at dragondata.com Wed Nov 12 20:54:38 2008
From: toasty at dragondata.com (Kevin Day)
Date: Wed Nov 12 20:54:45 2008
Subject: UFS Snapshot lock time
In-Reply-To: <20081113043414.GA10272@icarus.home.lan>
References: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> <20081113043414.GA10272@icarus.home.lan>
Message-ID: <145D28E0-C04F-456C-990D-0D0672A0EB26@dragondata.com>

On Nov 12, 2008, at 10:34 PM, Jeremy Chadwick wrote:

> On Wed, Nov 12, 2008 at 08:22:29PM -0600, Kevin Day wrote:
>> Is there any documentation out there that explains how to optimize
>> UFS
>> snapshotting?
>>
> This topic comes up about once every 2 weeks. There's a discussion
> going about it on -stable right now:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-November/046524.html
>

My apologies - I looked through recent posts on lists 2 days ago, but
didn't search again before posting here.
I thought the much earlier complaints were about total deadlocks that
didn't come back... re-reading I see otherwise. I'll move this over there.

> It's also been documented on my issues Wiki:
>
> http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>

Ack - I thought I had that page memorized. :)

-- Kevin
From danny at dannysplace.net Wed Nov 12 22:20:02 2008
From: danny at dannysplace.net (Danny Carroll)
Date: Wed Nov 12 22:20:09 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To: <490A8FAD.8060009@dannysplace.net>
References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net>
Message-ID: <491BBF38.9010908@dannysplace.net>

Danny Carroll wrote:
> Jeremy Chadwick wrote:
>> I'd like to see the performance difference between these scenarios:
>>
>> - Memory cache enabled on Areca, write caching enabled on disks
>> - Memory cache enabled on Areca, write caching disabled on disks
>> - Memory cache disabled on Areca, write caching enabled on disks
>> - Memory cache disabled on Areca, write caching disabled on disks
>>

The initial results for a ICH9 vs Areca in JBod mode can be found here:
http://www.dannysplace.net/ZFS-JBODTests.html

Summary: 5 Disk ZFS RaidZ array with atime turned off.

ICH9 - block reads avg 400MByte/Sec
ICH9 - block writes avg 150MByte/Sec
ArecaJBOD - block reads avg 300MByte/Sec
ArecaJBOD - block writes avg 160MByte/Sec

The Areca seems to be slower in all except char and block writes.
Block reads are 75% as fast as the ICH9 and rewrites are about 85% as fast.

There seems to be little difference between enabling and disabling the
disk cache on the Areca. This leads me to two conclusions:
1. Disabling the write cache does nothing on Seagate drives.
2. IO to the drives is so slow that a write cache is irrelevant.
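
As a rough cross-check of the read and write comparison in the summary above, the ratios can be worked out from the four averages quoted there. This is only a sketch: the variable and function names are illustrative, and the rewrite and per-character figures are not in the summary, so they are left out.

```python
# Average throughput figures copied from the summary above, in MByte/sec.
ich9 = {"block_read": 400, "block_write": 150}
areca_jbod = {"block_read": 300, "block_write": 160}


def relative_speed(candidate, baseline):
    """Return candidate throughput as a fraction of baseline, per test."""
    return {name: candidate[name] / baseline[name] for name in baseline}


ratios = relative_speed(areca_jbod, ich9)
for name, ratio in ratios.items():
    print(f"{name}: Areca JBOD runs at {ratio:.0%} of the ICH9 figure")
```

On these numbers the Areca block reads come out at three quarters of the ICH9 rate, matching the 75% quoted, while block writes are slightly ahead of the ICH9.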
These are just some quick tests that I started with, mainly to compare the areca bus versus the ich9 bus. If someone has any tuning suggestions, then now is the time to make them before I migrate the ICH9 drives to the Areca bus. -D p.s. My OS details are: FreeBSD 7.1-PRERELEASE #3: Tue Nov 4 13:58:49 EST 2008 localhost# cat /etc/sysctl.conf kern.maxvnodes=400000 net.key.preferred_oldsa=0 net.key.blockacq_count=0 kern.ipc.maxsockbuf=400000 net.inet.ip.fastforwarding=1 net.inet.tcp.rfc1323=1 kern.ipc.maxsockbuf=16777216 net.local.stream.sendspace=82320 net.local.stream.recvspace=82320 net.inet.tcp.local_slowstart_flightsize=10 net.inet.tcp.nolocaltimewait=1 net.inet.tcp.delayed_ack=1 net.inet.tcp.delacktime=100 net.inet.tcp.mssdflt=1460 net.inet.tcp.sendspace=78840 net.inet.tcp.recvspace=78840 net.inet.tcp.slowstart_flightsize=54 net.inet.tcp.inflight.enable=1 net.inet.tcp.inflight.min=6144 net.inet.tcp.hostcache.expire=3900 localhost# cat /boot/loader.conf hw.em.rxd=4096 hw.em.txd=4096 vm.kmem_size="1536M" vm.kmem_size_max="1536M" smb_load="YES" smbus_load="YES" ichsmb_load="YES" From koitsu at FreeBSD.org Wed Nov 12 23:43:03 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Wed Nov 12 23:43:09 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: <200811130657.GAA26763@sopwith.solgatos.com> References: <491BBF38.9010908@dannysplace.net> <200811130657.GAA26763@sopwith.solgatos.com> Message-ID: <20081113074301.GA13938@icarus.home.lan> On Wed, Nov 12, 2008 at 10:57:58PM +0000, Dieter wrote: > >> For the array(s) > >> 9 x ST31000340AS 1tb disks > >> 1 x ST31000333AS 1tb disk (trying to swap this for a ST31000340AS) > > > There seems to be little difference between enabling and disabling the > > disk cache on the Areca. This leads me to two conclusions: > > 1. Disabling the write cache does nothing on Seagate drives. > > 2. IO to the drives is so slow that a write cache is irrelevant. 
> > I have a couple of the ST31000340AS 1TB disks as well as older lower capacity > Seagates, and turning the write cache on/off makes a MASSIVE (roughly 10:1) > difference in write speed. > > Jeremy reports "about 13%" with Seagate ST3120026AS: > http://lists.freebsd.org/pipermail/freebsd-hardware/2008-October/005450.html > > Perhaps there is something about the Areca or the testing? Is the write cache > really getting turned on/off? The Areca controller he has can do caching of its own (it has 256MBytes of cache). Meaning, if you disable write cache on the disks (but not the Areca controller itself), all of the caching being done is purely controller-based. The actual disk writes between the controller and the disk will, of course, be "slow" -- but between the OS and the controller, things should appear fast. Let me outline the 4 test scenarios (I thought I did this in my original mail to Danny, but I believe I also said "don't get caught up in excessive granularity because it'll just confuse people now" -- case in point): - Areca cache disabled, disk write cache enabled - Areca cache disabled, disk write cache disabled - Areca cache enabled, disk write cache enabled - Areca cache enabled, disk write cache disabled [**] As I understand it, Danny performed the tests with the [**] configuration. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From wjw at digiware.nl Thu Nov 13 00:53:27 2008 From: wjw at digiware.nl (Willem Jan Withagen) Date: Thu Nov 13 00:53:39 2008 Subject: Areca vs. ZFS performance testing. 
In-Reply-To: <491BBF38.9010908@dannysplace.net> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> Message-ID: <491BE632.1020801@IMAP> Danny Carroll wrote: > Danny Carroll wrote: >> Jeremy Chadwick wrote: >>> I'd like to see the performance difference between these scenarios: >>> >>> - Memory cache enabled on Areca, write caching enabled on disks >>> - Memory cache enabled on Areca, write caching disabled on disks >>> - Memory cache disabled on Areca, write caching enabled on disks >>> - Memory cache disabled on Areca, write caching disabled on disks >>> > > > The initial results for a ICH9 vs Areca in JBod mode can be found here: > http://www.dannysplace.net/ZFS-JBODTests.html Just as a polite question, since I'm very much in favor doing benchmarking and do appreciate these kinds of test. You might want to add an introductory page to your results describing how you setup the test: Details of the hardware Details of the disk setup possible version and options with bonnie The script you used.... This would allow others to redo your experiment and try to figure out why their numbers are different. --WjW From fbsd at dannysplace.net Thu Nov 13 03:09:33 2008 From: fbsd at dannysplace.net (Danny Carroll) Date: Thu Nov 13 03:09:45 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: <491BE632.1020801@IMAP> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491BE632.1020801@IMAP> Message-ID: <491C0B00.4030408@dannysplace.net> Good idea. Actually, what I will do eventually is *also* post the results to the mailing list. It will probably be around long after my own server is gone. 
-D Willem Jan Withagen wrote: > Danny Carroll wrote: >> Danny Carroll wrote: >>> Jeremy Chadwick wrote: >>>> I'd like to see the performance difference between these scenarios: >>>> >>>> - Memory cache enabled on Areca, write caching enabled on disks >>>> - Memory cache enabled on Areca, write caching disabled on disks >>>> - Memory cache disabled on Areca, write caching enabled on disks >>>> - Memory cache disabled on Areca, write caching disabled on disks >>>> >> >> >> The initial results for a ICH9 vs Areca in JBod mode can be found here: >> http://www.dannysplace.net/ZFS-JBODTests.html > > Just as a polite question, since I'm very much in favor doing > benchmarking and do appreciate these kinds of test. > > You might want to add an introductory page to your results describing > how you setup the test: > Details of the hardware > Details of the disk setup > possible version and options with bonnie > The script you used.... > > This would allow others to redo your experiment and try to figure out > why their numbers are different. > > --WjW > > From freebsd at sopwith.solgatos.com Wed Nov 12 23:30:30 2008 From: freebsd at sopwith.solgatos.com (Dieter) Date: Thu Nov 13 04:23:51 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: Your message of "Thu, 13 Nov 2008 15:46:32 +1000." <491BBF38.9010908@dannysplace.net> Message-ID: <200811130657.GAA26763@sopwith.solgatos.com> >> For the array(s) >> 9 x ST31000340AS 1tb disks >> 1 x ST31000333AS 1tb disk (trying to swap this for a ST31000340AS) > There seems to be little difference between enabling and disabling the > disk cache on the Areca. This leads me to two conclusions: > 1. Disabling the write cache does nothing on Seagate drives. > 2. IO to the drives is so slow that a write cache is irrelevant. I have a couple of the ST31000340AS 1TB disks as well as older lower capacity Seagates, and turning the write cache on/off makes a MASSIVE (roughly 10:1) difference in write speed. 
Jeremy reports "about 13%" with Seagate ST3120026AS: http://lists.freebsd.org/pipermail/freebsd-hardware/2008-October/005450.html Perhaps there is something about the Areca or the testing? Is the write cache really getting turned on/off? You're getting about 2-3x the speed I'd expect if the write cache were off, so maybe it is still on but there is a bottleneck elsewhere? Have you tried a simple test with /dev/zero and dd to a raw drive to eliminate the effects of the filesystem? From kib at FreeBSD.org Thu Nov 13 05:11:59 2008 From: kib at FreeBSD.org (kib@FreeBSD.org) Date: Thu Nov 13 05:12:05 2008 Subject: kern/128829: smbd(8) causes periodic panic on 7-RELEASE Message-ID: <200811131311.mADDBwwr026494@freefall.freebsd.org> Synopsis: smbd(8) causes periodic panic on 7-RELEASE State-Changed-From-To: open->feedback State-Changed-By: kib State-Changed-When: Thu Nov 13 13:09:51 UTC 2008 State-Changed-Why: I think that the problem you experience might be fixed by r184227. What is exact version of the kernel sources and kern_lockf.c on the problematic machine ? http://www.freebsd.org/cgi/query-pr.cgi?pr=128829 From danny at dannysplace.net Thu Nov 13 05:58:57 2008 From: danny at dannysplace.net (Danny Carroll) Date: Thu Nov 13 05:59:04 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: <20081113074301.GA13938@icarus.home.lan> References: <491BBF38.9010908@dannysplace.net> <200811130657.GAA26763@sopwith.solgatos.com> <20081113074301.GA13938@icarus.home.lan> Message-ID: <491C32BF.7020805@dannysplace.net> Jeremy Chadwick wrote: > On Wed, Nov 12, 2008 at 10:57:58PM +0000, Dieter wrote: > The Areca controller he has can do caching of its own (it has 256MBytes > of cache). Meaning, if you disable write cache on the disks (but not > the Areca controller itself), all of the caching being done is purely > controller-based. 
The actual disk writes between the controller and the > disk will, of course, be "slow" -- but between the OS and the > controller, things should appear fast. It is entirely possible. I do not know, however, whether the Areca cache works only for RAID or also in JBOD mode. The card can be configured via a web interface (it has its own NIC), via the CLI, or via the BIOS. The only setting I do see is: Disk Write Cache Mode. This is what I have tested. It might have been the Areca cache I turned off, or it might have been the disk caches that I turned off. I hope it is the former, otherwise what is the purpose of having a battery backup unit? If the disks cache the write, then you will probably lose data anyway. I think, once I turn on RAID mode, there will be an option to turn caching on/off in the RAID part of the config. The manual shows me that there is an option there, but it only indicates that you can change the cache mode from WriteBack to WriteThrough. But for now, since it's in JBOD mode, I cannot access that. > Let me outline the 4 test scenarios (I thought I did this in my original > mail to Danny, but I believe I also said "don't get caught up in > excessive granularity because it'll just confuse people now" -- case in > point): > > - Areca cache disabled, disk write cache enabled > - Areca cache disabled, disk write cache disabled > - Areca cache enabled, disk write cache enabled > - Areca cache enabled, disk write cache disabled [**] > > As I understand it, Danny performed the tests with the [**] > configuration. > The tests should have names: Test 1: Areca cache disabled, disk write cache enabled Test 2: Areca cache disabled, disk write cache disabled Test 3: Areca cache enabled, disk write cache enabled Test 4: Areca cache enabled, disk write cache disabled You did outline these. I thought I was performing test 2 because I assumed that when you turn on JBOD mode, you do not get caching on the controller.
Once I am sure there is not something glaringly wrong with the FreeBSD side of things I'll run as many of these tests as I can. For now, I think it is only tests 1 and 2. So my question remains: why was the read performance the same, and the write performance actually marginally better, after I turned off the cache? I did a reboot after I turned off the cache but I did not power cycle the drives. Perhaps that is the answer? Or perhaps the Areca controller simply cannot turn off the cache on the ST31000340AS drives. Or perhaps the cache is ALWAYS enabled and cannot be turned off on the controller. That would mean I was doing test 4, as Jeremy suggested. That seems a likely possibility as well. In fact, thinking about it now, it makes the most sense to me. -D From tschulz at sebeka.k12.mn.us Thu Nov 13 06:50:04 2008 From: tschulz at sebeka.k12.mn.us (Thad Schulz) Date: Thu Nov 13 06:50:11 2008 Subject: kern/128829: smbd(8) causes periodic panic on 7-RELEASE Message-ID: <200811131450.mADEo3Er095378@freefall.freebsd.org> The following reply was made to PR kern/128829; it has been noted by GNATS. From: Thad Schulz To: bug-followup@FreeBSD.org, tschulz@sebeka.k12.mn.us, kib@FreeBSD.org Cc: Subject: Re: kern/128829: smbd(8) causes periodic panic on 7-RELEASE Date: Thu, 13 Nov 2008 08:20:01 -0600 The server is running the GENERIC kernel from 7.0-RELEASE and the version of kern_lockf.c looks like v 1.57 2007/08/07 09:04:50 if the GENERIC kernel was built from the sources that came with 7.0-RELEASE. So it looks like the kern_lockf.c from r184227 would be newer. -- Thad Schulz Technology Coordinator Sebeka Public School Phone: 218-837-5101 Email: tschulz@sebeka.k12.mn.us From ndenev at gmail.com Thu Nov 13 07:31:19 2008 From: ndenev at gmail.com (Nikolay Denev) Date: Thu Nov 13 07:31:25 2008 Subject: Areca vs. ZFS performance testing.
In-Reply-To: <491C32BF.7020805@dannysplace.net> References: <491BBF38.9010908@dannysplace.net> <200811130657.GAA26763@sopwith.solgatos.com> <20081113074301.GA13938@icarus.home.lan> <491C32BF.7020805@dannysplace.net> Message-ID: <04B6F041-3052-4650-BE62-817E2B28D034@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13 Nov, 2008, at 15:59 , Danny Carroll wrote: [snip] > > It is entirely possible. I do not know however if the Areca cache > works > just for Raid or also in JBOD mode. > I think some RAID controllers do not use the cache when you export the disks as pass-thru/jbod, but on some controllers you can workaround this by making every disk a RAID0(stripe) array with only one disk. Dunno if that would work on the areca... [snip] - -- Regards, Nikolay Denev -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (Darwin) iEYEARECAAYFAkkcQlsACgkQHNAJ/fLbfrkTkgCgo2NupY2Qe3TglJpoIIwne4uH VRwAnRl9p44NFxyWf9zhjrZOOImtiBAs =4Djt -----END PGP SIGNATURE----- From kostikbel at gmail.com Thu Nov 13 08:00:14 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Thu Nov 13 08:00:20 2008 Subject: kern/128829: smbd(8) causes periodic panic on 7-RELEASE Message-ID: <200811131600.mADG0D92047090@freefall.freebsd.org> The following reply was made to PR kern/128829; it has been noted by GNATS. From: Kostik Belousov To: Thad Schulz Cc: bug-followup@freebsd.org Subject: Re: kern/128829: smbd(8) causes periodic panic on 7-RELEASE Date: Thu, 13 Nov 2008 17:04:00 +0200 On Thu, Nov 13, 2008 at 08:20:01AM -0600, Thad Schulz wrote: > The server is running the GENERIC kernel from 7.0-RELEASE and the > version of kern_lockf.c looks like v 1.57 2007/08/07 09:04:50 if the > GENERIC kernel was built from the sources that came with 7.0-RELEASE. > So it looks like the kern_lockf.c from r184227 would be newer. In fact, forthcoming 7.1 contains a new implementation of the advisory locking. The mentioned r184227 was applicable to new code, not older one in 7.0. 
From jmrueda at diatel.upm.es Thu Nov 13 08:26:43 2008 From: jmrueda at diatel.upm.es (=?ISO-8859-1?Q?Javier_Mart=EDn_Rueda?=) Date: Thu Nov 13 08:26:50 2008 Subject: UFS Snapshot lock time In-Reply-To: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> References: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> Message-ID: <491C51A7.8080000@diatel.upm.es> Kevin Day wrote: > > Is there any documentation out there that explains how to optimize UFS > snapshotting? > > Specifically, we've got a rather big filesystem that I'd like to do > hourly snapshots of. I don't mind how long the snapshot itself takes, > but the amount of time the filesystem is locked is a problem. We're > "dead" for about 12 minutes per snapshotting. > Just a word of caution: I used to do this in some different machines (taking periodic snapshots and leaving a few around), and after a few days or weeks the system would lock up. Any process accessing the filesystem would block in "ufs" or something like that. After rebooting, fsck would report fatal errors and I had to do fsck -y in order to fix them with plenty of scary messages about truncated inodes, unexpected inconsistencies, and so on. This happened in several 6.x releases, on different machines, and both under i386 or amd64. Eventually, I gave up. I strongly suggest you try taking hourly snapshots in a non-production system first for a few weeks, and see if you experience this kind of problems. Sorry to be a party-pooper. It looks as if keeping more than one snapshot eventually is problematic. Taking single snapshots for dump has never been a problem, though. 
From toasty at dragondata.com Thu Nov 13 08:57:49 2008 From: toasty at dragondata.com (Kevin Day) Date: Thu Nov 13 08:57:55 2008 Subject: UFS Snapshot lock time In-Reply-To: <491C51A7.8080000@diatel.upm.es> References: <6EEFB17C-10DF-4CCD-AB07-83B4B75D033F@dragondata.com> <491C51A7.8080000@diatel.upm.es> Message-ID: On Nov 13, 2008, at 10:11 AM, Javier Martín Rueda wrote: > Just a word of caution: I used to do this in some different machines > (taking periodic snapshots and leaving a few around), and after a > few days or weeks the system would lock up. Any process accessing > the filesystem would block in "ufs" or something like that. After > rebooting, fsck would report fatal errors and I had to do fsck -y in > order to fix them with plenty of scary messages about truncated > inodes, unexpected inconsistencies, and so on. This happened in > several 6.x releases, on different machines, and both under i386 or > amd64. Eventually, I gave up. > > I strongly suggest you try taking hourly snapshots in a non- > production system first for a few weeks, and see if you experience > this kind of problems. Sorry to be a party-pooper. > > It looks as if keeping more than one snapshot eventually is > problematic. Taking single snapshots for dump has never been a > problem, though. > We definitely saw this problem in 6.x. Any reboot after a snapshot would be a mess of fsck fun for a few hours, usually resulting in us losing stuff. But, 7.0 has cured that for me. So far hourly/daily snapshots on any of the 7.0 boxes we've tried it on has worked, it's just so slow it's unusable. I'd like to think it's just being slow because it's being very careful. :) -- Kevin From scottl at samsco.org Thu Nov 13 09:15:19 2008 From: scottl at samsco.org (Scott Long) Date: Thu Nov 13 09:15:33 2008 Subject: Areca vs. ZFS performance testing.
In-Reply-To: <491BBF38.9010908@dannysplace.net> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> Message-ID: <491C5AA7.1030004@samsco.org> Danny Carroll wrote: > Danny Carroll wrote: >> Jeremy Chadwick wrote: >>> I'd like to see the performance difference between these scenarios: >>> >>> - Memory cache enabled on Areca, write caching enabled on disks >>> - Memory cache enabled on Areca, write caching disabled on disks >>> - Memory cache disabled on Areca, write caching enabled on disks >>> - Memory cache disabled on Areca, write caching disabled on disks >>> > > > The initial results for a ICH9 vs Areca in JBod mode can be found here: > http://www.dannysplace.net/ZFS-JBODTests.html > > Summary: > 5 Disk ZFS RaidZ array with atime turned off. > ICH9 - block reads avg 400MByte/Sec > ICH9 - block writes avg 150MByte/Sec > ArecaJBOD - block reads avg 300MByte/Sec > ArecaJBOD - block writes avg 160MByte/Sec > > > The Areca seems to be in all except char and block writes. Block reads > are 75% as fast as the ICH9 and rewrites are about 85% as fast. > > There seems to be little difference between enabling and disabling the > disk cache on the Areca. This leads me to two conclusions: > 1. Disabling the write cache does nothing on Seagate drives. > 2. IO to the drives is so slow that a write cache is irrelevant. > > These are just some quick tests that I started with, mainly to compare > the areca bus versus the ich9 bus. If someone has any tuning > suggestions, then now is the time to make them before I migrate the ICH9 > drives to the Areca bus. The Areca controller likely doesn't buffer/cache for disks in JBOD mode, as others in this thread have stated. 
Without buffering, simple disk controllers will almost always be faster than accelerated raid controllers because the accelerated controllers add more latency between the host and the disk. A simple controller will directly funnel data from the host to the disk as soon as it receives a command. An accelerated controller, however, has a CPU and a mini-OS on it that has to schedule the work coming from the host and handle its own tasks and interrupts. This adds latency that quickly adds up under benchmarks. Your numbers clearly demonstrate this. Scott From fbsd at dannysplace.net Thu Nov 13 12:46:47 2008 From: fbsd at dannysplace.net (Danny Carroll) Date: Thu Nov 13 12:46:58 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: <04B6F041-3052-4650-BE62-817E2B28D034@gmail.com> References: <491BBF38.9010908@dannysplace.net> <200811130657.GAA26763@sopwith.solgatos.com> <20081113074301.GA13938@icarus.home.lan> <491C32BF.7020805@dannysplace.net> <04B6F041-3052-4650-BE62-817E2B28D034@gmail.com> Message-ID: <491C9224.4050407@dannysplace.net> Nikolay Denev wrote: > I think some RAID controllers do not use the cache when you export the > disks as pass-thru/jbod, I assumed this is the case but now I am not so sure. > but on some controllers you can workaround this by making > every disk a RAID0(stripe) array with only one disk. > Dunno if that would work on the areca... You can probably do that with this as controller as well. However if I look at the manual I do not see an option to disable the cache for Raid sets. Only to change it from Write-back to Write-Through. I guess write-through is *almost* as if the cache is disabled. -D From fbsd at dannysplace.net Thu Nov 13 12:59:43 2008 From: fbsd at dannysplace.net (Danny Carroll) Date: Thu Nov 13 12:59:55 2008 Subject: Areca vs. ZFS performance testing. 
In-Reply-To: <491C5AA7.1030004@samsco.org> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> Message-ID: <491C9535.3030504@dannysplace.net> Scott Long wrote: > The Areca controller likely doesn't buffer/cache for disks in JBOD mode, > as others in this thread have stated. Without buffering, simple disk > controllers will almost always be faster than accelerated raid > controllers because the accelerated controllers add more latency between > the host and the disk. A simple controller will directly funnel data > from the host to the disk as soon as it receives a command. An > accelerated controller, however, has a CPU and a mini-OS on it that has > to schedule the work coming from the host and handle its own tasks and > interrupts. This adds latency that quickly adds up under benchmarks. > Your numbers clearly demonstrate this. That's nice to know. I'm not sure it tells us why the Non-Cached writes were about 8% faster though. The other thing about the "NoWriteCache" test I performed that I neglected to mention yesterday is that I actually panic'd the box (running out of memory). This was the first time I have had that happen with ZFS even though in previous testing (with cache enabled) I punished the box for a lot longer. Perhaps the ZFS caching took over where the disk caching left off? Could that explain why I did not see a negative difference in the numbers between Cache enabled and Cache disabled? One of the questions I wanted to answer for myself was just this: "Does a battery-backed cache on an Areca card protect me when I am in JBOD mode." If the Areca does not buffer/cache in JBOD mode then that means the answer is no. 
-D From antik at bsd.ee Fri Nov 14 02:56:43 2008 From: antik at bsd.ee (Andrei Kolu) Date: Fri Nov 14 02:56:50 2008 Subject: Filesystem size and free space Message-ID: <491D5296.3000600@bsd.ee> Hi, due to migration from Windows Server 2003 NTFS filesystem to FreeBSD 7.1Beta2 UFS+softupdates filesystem I encountered strange problem. NTFS formatted filesystem seen in FreeBSD as read-only and exactly 500GB with 28GB free space but after format to UFS disk shows up as 484GB and after copying back files that was on same disk (from ntfs) UFS filesystem shows that I got -33GB (minus?) of free space. What's wrong? Is UFS so inefficient filesystem or it is a bug? Andrei From ivoras at freebsd.org Fri Nov 14 03:17:06 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Fri Nov 14 03:17:13 2008 Subject: Filesystem size and free space In-Reply-To: <491D5296.3000600@bsd.ee> References: <491D5296.3000600@bsd.ee> Message-ID: Andrei Kolu wrote: > Hi, > > due to migration from Windows Server 2003 NTFS filesystem to FreeBSD > 7.1Beta2 UFS+softupdates filesystem I encountered strange problem. NTFS > formatted filesystem seen in FreeBSD as read-only and exactly 500GB with > 28GB free space but after format to UFS disk shows up as 484GB and after > copying back files that was on same disk (from ntfs) UFS filesystem > shows that I got -33GB (minus?) of free space. What's wrong? Is UFS so > inefficient filesystem or it is a bug? UFS reserves a small percentage of the space for the superuser (root) utilities and also for performance benefits. "-33 GB" is telling you that you have used 33 GB of this reserved space, which isn't good if the file system is going to be used in a read-write environment. If the file system is going to be used read-only, you can safely ignore this; otherwise try never to use the reserved space. 
In particular, userland programs running under non-root user accounts will not see this reserved space and will get "out of disk space" errors when they try to add data to files in such a file system. You can lower the size of this reserved space with "tunefs -m" which will allow non-root users to access more space, but this isn't recommended. Either delete files or buy a bigger drive. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081114/fbd60744/signature.pgp From antik at bsd.ee Fri Nov 14 03:52:39 2008 From: antik at bsd.ee (Andrei Kolu) Date: Fri Nov 14 03:52:45 2008 Subject: Filesystem size and free space In-Reply-To: References: <491D5296.3000600@bsd.ee> Message-ID: <491D6689.4090800@bsd.ee> Ivan Voras wrote: > Andrei Kolu wrote: > >> Hi, >> >> due to migration from Windows Server 2003 NTFS filesystem to FreeBSD >> 7.1Beta2 UFS+softupdates filesystem I encountered strange problem. NTFS >> formatted filesystem seen in FreeBSD as read-only and exactly 500GB with >> 28GB free space but after format to UFS disk shows up as 484GB and after >> copying back files that was on same disk (from ntfs) UFS filesystem >> shows that I got -33GB (minus?) of free space. What's wrong? Is UFS so >> inefficient filesystem or it is a bug? >> > > UFS reserves a small percentage of the space for the superuser (root) > utilities and also for performance benefits. "-33 GB" is telling you > that you have used 33 GB of this reserved space, which isn't good if the > file system is going to be used in a read-write environment. If the file > system is going to be used read-only, you can safely ignore this; > otherwise try never to use the reserved space. 
In particular, userland > programs running under non-root user accounts will not see this reserved > space and will get "out of disk space" errors when they try to add data > to files in such a file system. > > You can lower the size of this reserved space with "tunefs -m" which > will allow non-root users to access more space, but this isn't > recommended. Either delete files or buy a bigger drive. > > So, I can say bye-bye to 50GB of space due to UFS filesystem features? IIRC then FreeBSD 4.x got 10% root reservation per filesystem and these days we got 1000 times bigger hard drives, do we need so much reservation (8% now I presume)? This filesystem would be read-only for users but writable to administrator to store large image files from other computers on network. If I understand correctly then this "reserved space" is used to avoid filesystem fragmentation. Can ZFS be more efficient, what if I create volumes per disk? Andrei From avg at icyb.net.ua Fri Nov 14 04:37:36 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri Nov 14 04:37:43 2008 Subject: zfs snapdir: from hidden to visible and back again Message-ID: <491D710A.9090308@icyb.net.ua> I needed to check some files in earlier snapshot, so I did: $ zfs set snapdir=visible tank/usr/local everything went well, I examined /usr/local/.zfs/.... and then did: $ zfs set snapdir=hidden tank/usr/local after that .zfs directory disappeared from output of ls -l /usr/local, BUT: $ mount ... tank/usr/local@upgradeall on /usr/local/.zfs/snapshot/upgradeall (zfs, local, noatime, read-only) This is the snapshot that I examined earlier. Hmm, strange. Then I did: $ umount /usr/local/.zfs/snapshot/upgradeall After that .zfs is not listed in /usr/local and mount command does not list the snapshot anymore. Is this correct behavior, did I have to do umount? Also, even with snapdir=hidden, I still can list snapshots (their contents) if I ls full path with .zfs in it. Is this right? 
-- Andriy Gapon From avg at icyb.net.ua Fri Nov 14 06:07:33 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri Nov 14 06:07:39 2008 Subject: zfs snapdir: from hidden to visible and back again In-Reply-To: <491D710A.9090308@icyb.net.ua> References: <491D710A.9090308@icyb.net.ua> Message-ID: <491D8621.40101@icyb.net.ua> on 14/11/2008 14:37 Andriy Gapon said the following: > Also, even with snapdir=hidden, I still can list snapshots (their > contents) if I ls full path with .zfs in it. > Is this right? And it seems that any snapshot accessed in this way gets automatically added to mounts. This doesn't seem to be reasonable. For example, periodic security script would report suid binaries found in these snapshots, etc. BTW, forgot this: FreeBSD 7.1-PRERELEASE amd64 (r184944) -- Andriy Gapon From koitsu at FreeBSD.org Fri Nov 14 11:47:54 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Nov 14 11:48:01 2008 Subject: Filesystem size and free space In-Reply-To: <491D6689.4090800@bsd.ee> References: <491D5296.3000600@bsd.ee> <491D6689.4090800@bsd.ee> Message-ID: <20081114194751.GA88072@icarus.home.lan> On Fri, Nov 14, 2008 at 01:52:41PM +0200, Andrei Kolu wrote: > So, I can say bye-bye to 50GB of space due to UFS filesystem features? > IIRC then FreeBSD 4.x got 10% root reservation per filesystem and these > days we got 1000 times bigger hard drives, do we need so much > reservation (8% now I presume)? man tunefs(8), see -m flag. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From peterjeremy at optushome.com.au Fri Nov 14 17:42:13 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Fri Nov 14 17:42:20 2008 Subject: Filesystem size and free space In-Reply-To: <491D5296.3000600@bsd.ee> References: <491D5296.3000600@bsd.ee> Message-ID: <20081115014203.GE51761@server.vk2pj.dyndns.org> On 2008-Nov-14 12:27:34 +0200, Andrei Kolu wrote: >due to migration from Windows Server 2003 NTFS filesystem to FreeBSD >7.1Beta2 UFS+softupdates filesystem I encountered strange problem. NTFS >formatted filesystem seen in FreeBSD as read-only and exactly 500GB with >28GB free space but after format to UFS disk shows up as 484GB and after >copying back files that was on same disk (from ntfs) UFS filesystem >shows that I got -33GB (minus?) of free space. What's wrong? Is UFS so >inefficient filesystem or it is a bug? Maybe your data is not a good match for the UFS2 defaults. In the case of UFS2, the size shown as x-Blocks reflects the size of the underlying media, less a free space allowance: 8% [not 10%] by default - see the -m option of tunefs for details of this and why it exists. Out of this, UFS2 allocates file and directory data blocks, file metadata and filesystem metadata. By default, data blocks are 16KB with 2KB fragments. Each file or directory needs 256 bytes of metadata (its inode). I can't quickly find the size of the filesystem metadata but estimate it is <<1% of the filesystem size. You haven't said what sort of files you are storing but you might find the following suggestions useful: - As others have suggested, reducing minfree will help remove the negative free space. Be careful doing this unless your filesystem is write-once. - If you have a few very large files, rebuild the filesystem with fewer inodes (large '-i' parameter to newfs) and maybe a bigger blocksize. - If you have lots of small files, you might be better off with an 8K/1K filesystem and maybe even UFS1 (which has a smaller inode size).
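To put rough numbers on the above, here is a minimal sketch in Python. It assumes the reserve is simply minfree percent of the size df reports, and that newfs by default creates one 256-byte inode per 8192 bytes of space; that density is an assumption and is not stated in this thread. The 484GB and -33GB figures are the ones from the original question, and the function names are made up for illustration.

```python
# Rough UFS2 space arithmetic. All figures are approximations under the
# assumptions stated above, not output from any FreeBSD tool.

def ufs_reserve_gb(size_gb, minfree_pct=8.0):
    """Space held back by minfree, assumed to be a plain percentage of size."""
    return size_gb * minfree_pct / 100.0


def real_free_gb(size_gb, avail_gb, minfree_pct=8.0):
    """Space actually unallocated when df reports avail_gb (may be negative)."""
    return avail_gb + ufs_reserve_gb(size_gb, minfree_pct)


def inode_overhead_pct(bytes_per_inode, inode_size=256):
    """Share of the filesystem spent on inodes for a given newfs -i density."""
    return 100.0 * inode_size / bytes_per_inode


if __name__ == "__main__":
    size_gb, avail_gb = 484.0, -33.0  # figures from the original question
    print("reserve:   %.2f GB" % ufs_reserve_gb(size_gb))
    print("real free: %.2f GB" % real_free_gb(size_gb, avail_gb))
    # 8192 bytes per inode is the assumed newfs default density.
    print("inodes:    %.3f %% of the filesystem" % inode_overhead_pct(8192))
```

Under those assumptions the reserve on a 484GB filesystem is a little under 39GB, so a reported -33GB still leaves a few GB physically unallocated, and the default inode density costs roughly 3% of the space.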
-- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081115/3ffaeeed/attachment.pgp From rick-freebsd2008 at kiwi-computer.com Fri Nov 14 22:34:42 2008 From: rick-freebsd2008 at kiwi-computer.com (Rick C. Petty) Date: Fri Nov 14 22:34:49 2008 Subject: Filesystem size and free space In-Reply-To: <20081115014203.GE51761@server.vk2pj.dyndns.org> References: <491D5296.3000600@bsd.ee> <20081115014203.GE51761@server.vk2pj.dyndns.org> Message-ID: <20081115045824.GA23464@keira.kiwi-computer.com> Note, the details in this message are meant for the original poster. On Sat, Nov 15, 2008 at 12:42:03PM +1100, Peter Jeremy wrote: > > In the case of UFS2, the size shown as x-Blocks reflects the size of > the underlying media, less a free space allowance: 8% [not 10%] by > default - see the -m option of tunefs for details of this and why it > exists. Out of this, UFS2 allocates file and direcory data blocks, > file metadata and filesystem metadata. By default, data blocks are > 16KB with 2KB fragments. Each file or directory needs 256 bytes of > metadata (its inode). I can't quickly find the size of the filesystem > metadata but estimate it is <<1% of the filesystem size. Not true. With the current defaults, and not including the 320k reserved at the beginning (for bootblocks, etc.) nor the few MB at the end to align with a cylinder boundary, UFS2 takes around 3% space for its metadata. Almost all of this is inode allocation. Most people don't need nearly this many inodes, but newfs(8) chooses too many instead of too few because running out of inodes is more frustrating (I've done it before and I agree). 
But you can change this default, and I do this for all my filesystems. Specifying a low inode density can get you into trouble but can reduce the metadata consumption to around 0.3% of the filesystem size. I haven't done the math, but it seems that NTFS uses about 12.5% (at best) for storing filesystem metadata. Another big difference is that NTFS slows down considerably at about 75% full whereas UFS and UFS2 perform very well to just past 92%. Also NTFS does not have much flexibility in being able to control the amount of filesystem metadata. With the price of drives nowadays, I find complaints about metadata waste particularly annoying. Still, I suggest that the OP should use the inode density parameter to newfs if he insists that UFS wastes too much space. > - As others have suggested, reducing minfree will help remove the negative > free space. Be careful doing this unless your filesystem is write-once. I wouldn't bother. The performance loss is so great that you're better off buying a larger drive. I have gone into the minfree threshold before but it still beats a similarly-full NTFS partition in terms of performance. > - If you have a few very large files, rebuild the filesystem with fewer > inodes (large '-i' parameter to newfs) and maybe a bigger blocksize. If you know the sizes and numbers of files in advance, it's easy to do the math here. Play with "newfs -N -i " until the number of cylinder groups times the number of inodes (per cylinder group, in case the output is not clear) is higher than the total number of files plus directories you will be storing. I always add an extra margin just to be safe. One warning about using this option: if you ever intend to use growfs(8), be warned that growfs does not have a -i option nor does it account for the number of inodes you previously specified. It's easy to push the numbers such that a growfs will actually reduce your free space below the -8% and thus fail.
I don't see much point in using growfs in general; you either are migrating a volume to a large drive (in which case you're better off newfs-ing and using rsync) or you're trying to fiddle with an existing drive that is probably too small for your needs. I find growfs more useful for working with keyfobs or md(4) devices. > - If you have lots of small files, you might be better off with an 8K/1K > filesystem and maybe even UFS1 (which has a smaller inode size). If you're planning on fiddling with the block and fragment sizes, I would do this before adjusting the -i option, since both affect the outcome of the inode density. I find these parameters harder to configure. You need to know your typical file size (not just average size). If your smaller files are much smaller than the block size, then it makes sense to lower these sizes. If you have fewer but larger files, it might make sense to increase them. It helps to understand how the files are allocated on the filesystem. A file always allocates full blocks (each block being 8 fragments) to store its data except for the last block. If a full block isn't needed, the file allocates the number of fragments needed, leaving the remaining fragments of that block to be used by the last block of another file. This really helps when you have a lot of small files but not as much when your files are large. Another thing I recommend if you have (or are planning to have) a number of filesystems is to keep track of which newfs parameters you used to make each file system. This helps you in planning future filesystems and helps you recreate the original filesystems (in case of a restore from backup or if you plan to move a filesystem to another disk). For example, I created a filesystem to store some videos I recorded from VHS and estimated about 1000 inodes were needed, with an average file size of 300 MB. I created this filesystem using "newfs -U -f 8192 -b 65536 -i 314572800". 
I still got many more inodes than I needed but consumed less space with metadata.

HTH,

-- Rick C. Petty

From peterjeremy at optushome.com.au Sat Nov 15 14:06:36 2008
From: peterjeremy at optushome.com.au (Peter Jeremy)
Date: Sat Nov 15 14:06:43 2008
Subject: Filesystem size and free space
In-Reply-To: <20081115045824.GA23464@keira.kiwi-computer.com>
References: <491D5296.3000600@bsd.ee> <20081115014203.GE51761@server.vk2pj.dyndns.org> <20081115045824.GA23464@keira.kiwi-computer.com>
Message-ID: <20081115220617.GF51761@server.vk2pj.dyndns.org>

On 2008-Nov-14 22:58:24 -0600, "Rick C. Petty" wrote:
>> 16KB with 2KB fragments. Each file or directory needs 256 bytes of
>> metadata (its inode). I can't quickly find the size of the filesystem
>> metadata but estimate it is <<1% of the filesystem size.
>
>Not true. With the current defaults, and not including the 320k reserved
>at the beginning (for bootblocks, etc.) nor the few MB at the end to align
>with a cylinder boundary, UFS2 takes around 3% space for its metadata.
>Almost all of this is inode allocation.

Note that I explicitly differentiated between the inodes (file metadata) and the rest of the filesystem metadata - superblock replicas, cylinder group headers, free block bitmaps, etc. Inodes do have several % reserved for them by default but the other space is very small.

> Most people don't need nearly this
>many inodes, but newfs(8) chooses too many instead of too few because running
>out of inodes is more frustrating (I've done it before and I agree).

Also, UFS generally tries to allocate a file in the same CG as its initial directory and lots of spare inodes help here.

>With the price of drives nowadays, I find complaints about metadata waste
>particularly annoying. Still, I suggest that the OP should use the inode
>density parameter to newfs if insisting that UFS wastes too much space.
It is an issue where you are trying to move data from one FS to another whilst reusing the same physical space - which I gather the OP was. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081115/212065d8/attachment.pgp From ltning at anduin.net Sun Nov 16 13:11:23 2008 From: ltning at anduin.net (=?ISO-8859-1?Q?Eirik_=D8verby?=) Date: Sun Nov 16 13:11:30 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: <491C9535.3030504@dannysplace.net> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> Message-ID: On Nov 13, 2008, at 21:59, Danny Carroll wrote: > Scott Long wrote: >> The Areca controller likely doesn't buffer/cache for disks in JBOD >> mode, >> as others in this thread have stated. Without buffering, simple disk >> controllers will almost always be faster than accelerated raid >> controllers because the accelerated controllers add more latency >> between >> the host and the disk. A simple controller will directly funnel data >> from the host to the disk as soon as it receives a command. An >> accelerated controller, however, has a CPU and a mini-OS on it that >> has >> to schedule the work coming from the host and handle its own tasks >> and >> interrupts. This adds latency that quickly adds up under benchmarks. >> Your numbers clearly demonstrate this. > > That's nice to know. I'm not sure it tells us why the Non-Cached > writes > were about 8% faster though. 
The other thing about the "NoWriteCache" > test I performed that I neglected to mention yesterday is that I > actually panic'd the box (running out of memory). This was the first > time I have had that happen with ZFS even though in previous testing > (with cache enabled) I punished the box for a lot longer. > > Perhaps the ZFS caching took over where the disk caching left off? > Could that explain why I did not see a negative difference in the > numbers between Cache enabled and Cache disabled? > > One of the questions I wanted to answer for myself was just this: > "Does > a battery-backed cache on an Areca card protect me when I am in JBOD > mode." If the Areca does not buffer/cache in JBOD mode then that > means > the answer is no. I have noticed that my 3ware controllers, after updating firmware recently, have removed the JBOD option entirely, classifying it as something you wouldn't want to do with that kind of hardware anyway. I believed then, and even more so now, they are correct. Use the RAID-0 disk trick to be able to utilize the controller cache. And regarding write-back vs write-through; I believe write-through is equvivalent to disabling controller write cache, however it WILL cache the writes in order to respond to future reads of the data being written. I would guess, but I don't know, that this also goes for disk- level caches too, though, so it probably doesn't matter. /Eirik From james-freebsd-fs2 at jrv.org Sun Nov 16 18:53:47 2008 From: james-freebsd-fs2 at jrv.org (James R. Van Artsdalen) Date: Sun Nov 16 18:53:53 2008 Subject: Will XFS be adopted In-Reply-To: <20081109184349.GG51239@server.vk2pj.dyndns.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> Message-ID: <4920D879.3070806@jrv.org> Peter Jeremy wrote: > FreeBSD has ZFS - which is a re-sizable FS with an integrated volume > manager. > > ZFS has limitations. 
It is not appropriate for "appliance" applications such as the Soekris boxes due to memory consumption. ZFS strongly depends on write-ordering around cache flushes, and a pool can easily be corrupted when this dependency is not met.

BTRFS will be another filesystem to watch. Perhaps foreign filesystems could be supported out of ports. But the fundamental limitation, as was said, is that someone has to care enough to do the port.

From fbsd at dannysplace.net Sun Nov 16 19:15:51 2008
From: fbsd at dannysplace.net (Danny Carroll)
Date: Sun Nov 16 19:15:58 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To:
References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net>
Message-ID: <4920E1DD.7000101@dannysplace.net>

Eirik Øverby wrote:
> I have noticed that my 3ware controllers, after updating firmware
> recently, have removed the JBOD option entirely, classifying it as
> something you wouldn't want to do with that kind of hardware anyway. I
> believed then, and even more so now, they are correct.

It kinda depends. If there were a good 8 or 16+ port SATA card out there that *simply* did SATA with no bells and whistles, then there would be no point buying a Raid adaptor when you want to use things like ZFS.

But there are no such cards available.

> Use the RAID-0 disk trick to be able to utilize the controller cache.
> And regarding write-back vs write-through; I believe write-through is
> equivalent to disabling controller write cache, however it WILL cache
> the writes in order to respond to future reads of the data being
> written. I would guess, but I don't know, that this also goes for
> disk-level caches too, though, so it probably doesn't matter.
It is interesting to me that the default setting on the Areca card was to have the disk caches turned on. I think that is strange because by default you have a situation that can lead to data loss even if you have a battery backup unit. -D From pjd at FreeBSD.org Sun Nov 16 20:50:19 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Sun Nov 16 20:50:26 2008 Subject: zfs snapdir: from hidden to visible and back again In-Reply-To: <491D8621.40101@icyb.net.ua> References: <491D710A.9090308@icyb.net.ua> <491D8621.40101@icyb.net.ua> Message-ID: <20081117043042.GA2101@garage.freebsd.pl> On Fri, Nov 14, 2008 at 04:07:29PM +0200, Andriy Gapon wrote: > on 14/11/2008 14:37 Andriy Gapon said the following: > > Also, even with snapdir=hidden, I still can list snapshots (their > > contents) if I ls full path with .zfs in it. > > Is this right? > > And it seems that any snapshot accessed in this way gets automatically > added to mounts. This doesn't seem to be reasonable. > > For example, periodic security script would report suid binaries found > in these snapshots, etc. Everything you described is expected behaviour. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081117/a7fefbf3/attachment.pgp From dan-freebsd-questions at ourbrains.org Sun Nov 16 21:31:02 2008 From: dan-freebsd-questions at ourbrains.org (Dan) Date: Sun Nov 16 21:31:08 2008 Subject: Will XFS be adopted In-Reply-To: <4920D879.3070806@jrv.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> Message-ID: <20081117050441.GA16855@ourbrains.org> James R. 
Van Artsdalen(james-freebsd-fs2@jrv.org)@2008.11.16 20:35:37 -0600: > ZFS has limitations. > > It is not appropriate for "appliance" applications such as the Soekris > boxes does due to memory consumption. YES! In my opinion it's not even appropriate for a machine with 2GB of RAM. Why waste so much RAM on an FS? Does anyone know? Or is this some sort of conspiracy to sell more bgger boxes. It's Sun, afterall.... > BTRFS will be another filesystem to watch. Perhaps foreign filesystems > could be supported out of ports. But the fundamental limitation, as was > said, is that someone has to care enough to do the port. What kinda bugs me is why FreeBSD hasn't adopted a nice journaling FS until now. Look at Linux - Reiser, EXT3 and XFS/JFS have been in it for years. What gives with FreeBSD? From brooks at freebsd.org Sun Nov 16 21:33:38 2008 From: brooks at freebsd.org (Brooks Davis) Date: Sun Nov 16 21:33:45 2008 Subject: Will XFS be adopted In-Reply-To: <20081117050441.GA16855@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> Message-ID: <20081117053423.GA58892@lor.one-eyed-alien.net> On Mon, Nov 17, 2008 at 12:04:41AM -0500, Dan wrote: > James R. Van Artsdalen(james-freebsd-fs2@jrv.org)@2008.11.16 20:35:37 -0600: > > ZFS has limitations. > > > > It is not appropriate for "appliance" applications such as the Soekris > > boxes does due to memory consumption. > YES! In my opinion it's not even appropriate for a machine with 2GB of > RAM. Why waste so much RAM on an FS? Does anyone know? Or is this some > sort of conspiracy to sell more bgger boxes. It's Sun, afterall.... > > > BTRFS will be another filesystem to watch. Perhaps foreign filesystems > > could be supported out of ports. But the fundamental limitation, as was > > said, is that someone has to care enough to do the port. 
> > What kinda bugs me is why FreeBSD hasn't adopted a nice journaling FS > until now. Look at Linux - Reiser, EXT3 and XFS/JFS have been in it for > years. What gives with FreeBSD? Someone needs to actually write one. We don't have all that many file system experts and even a pretty basic file system is a huge undertaking. -- Brooks -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081117/f8979022/attachment.pgp From koitsu at FreeBSD.org Sun Nov 16 21:43:50 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Sun Nov 16 21:43:57 2008 Subject: Will XFS be adopted In-Reply-To: <20081117050441.GA16855@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> Message-ID: <20081117054347.GA20749@icarus.home.lan> On Mon, Nov 17, 2008 at 12:04:41AM -0500, Dan wrote: > James R. Van Artsdalen(james-freebsd-fs2@jrv.org)@2008.11.16 20:35:37 -0600: > > ZFS has limitations. > > > > It is not appropriate for "appliance" applications such as the Soekris > > boxes does due to memory consumption. > YES! In my opinion it's not even appropriate for a machine with 2GB of > RAM. Why waste so much RAM on an FS? Does anyone know? Or is this some > sort of conspiracy to sell more bgger boxes. It's Sun, afterall.... Please. If Sun's sole goal was to "sell bigger boxes", they wouldn't be participating in the open-source world with OpenSolaris and helping other OSes with getting ZFS. None of those things tie you to Sun hardware. The ZFS caching concept as I see it is quite simple to understand: keep as much data as possible in RAM, to decrease overall disk I/O (RAM is significantly faster than disk). There's other reasons (goals) as well, but I'm trying to keep it simple. 
The reality of the situation is that for most desktops and servers, you can buy 4GB of RAM for something like US$25-30. That's incredibly inexpensive -- were you around back in the mid-90s when 2x4MB SIMMs cost you US$200? Or in the late 80s when a 1MB expansion card for the Apple IIGS cost US$300?

I understand (really!) these things can't be compared to an embedded platform, but the entire world does not use embedded hardware. Step back for a moment and reflect.

And you do realise that the memory requirements of ZFS can be tuned, yes? You can literally tell it "only use 16MB of memory for the ARC".

> > BTRFS will be another filesystem to watch. Perhaps foreign filesystems
> > could be supported out of ports. But the fundamental limitation, as was
> > said, is that someone has to care enough to do the port.
>
> What kinda bugs me is why FreeBSD hasn't adopted a nice journaling FS
> until now. Look at Linux - Reiser, EXT3 and XFS/JFS have been in it for
> years. What gives with FreeBSD?

There's gjournal(8), which does journalling on a block level, meaning you can use whatever FS you want atop it.

Also: if Linux has the things you want, use it! Pick whichever OS gets the job done for you, and meets your requirements. If FreeBSD lacks something which Linux has, and that something is important to you, going with Linux is the correct choice. The same applies to any OS, not just Linux. It's about having choices, and solving problems -- not about blind OS advocacy.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

From matt at corp.spry.com Sun Nov 16 22:06:48 2008
From: matt at corp.spry.com (Matt Simerson)
Date: Sun Nov 16 22:06:54 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To: <4920E1DD.7000101@dannysplace.net> References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> Message-ID: On Nov 16, 2008, at 7:15 PM, Danny Carroll wrote: > Eirik ?verby wrote: >> I have noticed that my 3ware controllers, after updating firmware >> recently, have removed the JBOD option entirely, classifying it as >> something you wouldn't want to do with that kind of hardware >> anyway. I >> believed then, and even more so now, they are correct. > > It kinda depends. If there were a good 8 or 16+ port SATA card out > there that *simply* did SATA with no bells and whistles, then there > would be no point buying a Raid adaptor when you want to use things > like > ZFS. > > But there are no such cards available. Allow me to introduce you to Marvell. The sell the SATA controller used in the Sun thumper (X4500). I've used that same SATA controller under OpenSolaris and FreeBSD. Unfortunately, that controller doesn't use multi-lane cables. When you pack in 3 controllers and 24 disks, it's a cabling disaster. http://freebsd.monkey.org/freebsd-fs/200808/msg00027.html >> Use the RAID-0 disk trick to be able to utilize the controller cache. >> And regarding write-back vs write-through; I believe write-through is >> equvivalent to disabling controller write cache, however it WILL >> cache >> the writes in order to respond to future reads of the data being >> written. I would guess, but I don't know, that this also goes for >> disk-level caches too, though, so it probably doesn't matter. > > It is interesting to me that the default setting on the Areca card was > to have the disk caches turned on. 
I think that is strange because by > default you have a situation that can lead to data loss even if you > have > a battery backup unit. The Areca cards do NOT have the cache enabled by default. I ordered the optional battery and RAM upgrade for my collection of 1231ML cards. Even with the BBWC, the cache is not enabled by default. I had to go out of my way to enable it, on every single controller. Matt From koitsu at FreeBSD.org Sun Nov 16 23:08:20 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Sun Nov 16 23:08:27 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: References: <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> Message-ID: <20081117070818.GA22231@icarus.home.lan> On Sun, Nov 16, 2008 at 10:06:42PM -0800, Matt Simerson wrote: > > On Nov 16, 2008, at 7:15 PM, Danny Carroll wrote: > >> Eirik ?verby wrote: >>> I have noticed that my 3ware controllers, after updating firmware >>> recently, have removed the JBOD option entirely, classifying it as >>> something you wouldn't want to do with that kind of hardware anyway. >>> I >>> believed then, and even more so now, they are correct. >> >> It kinda depends. If there were a good 8 or 16+ port SATA card out >> there that *simply* did SATA with no bells and whistles, then there >> would be no point buying a Raid adaptor when you want to use things >> like >> ZFS. >> >> But there are no such cards available. > > Allow me to introduce you to Marvell. The sell the SATA controller used > in the Sun thumper (X4500). I've used that same SATA controller under > OpenSolaris and FreeBSD. Unfortunately, that controller doesn't use > multi-lane cables. When you pack in 3 controllers and 24 disks, it's a > cabling disaster. 
> > http://freebsd.monkey.org/freebsd-fs/200808/msg00027.html I participated in that thread. http://freebsd.monkey.org/freebsd-fs/200808/msg00028.html The questions I had never got answered. The most important one being: have you actually performed a hard failure or forced disk swap with both the Areca and Marvell controllers? And how does FreeBSD behave when you do this? I've a feeling it works fine on the Areca (since CAM/da(4) are used), but if the Marvell card uses ata(4) (and I'm guessing it does) I'm concerned. Why? For sake of comparison: Promise controllers are considered one of the most well-supported controllers under FreeBSD, mainly due to Soren having access to their documentation; yet, when I attempted to do an actual disk upgrade, the Promise controller did nothing but cause me grief, forcing me to yank the entire card from my system. http://wiki.freebsd.org/JeremyChadwick/ZFS_disk_upgrade_gone_bad Users should read this story and the follow-up. And in my situation, the disk wasn't even bad/failed. What was supposed to be a simple procedure (and it was with Intel AHCI, as you'll read) turned into a complete nightmare. Take my story and apply it to a production datacentre -- but with an 8 or 16-port card and a shelf of disks. What're you going to tell your boss when this stuff fails like how I documented? "Yeah so I need US$600 to replace the card" "Why? We don't have that kind of budget. Is the card bad? Can we RMA it?" "No, the card isn't bad" "Then what is the problem?" "Well you see......" So when I see someone say "Yeah, try the , it works great", my first response is "Just how well have you actually tested failure or upgrade scenarios?" Most don't, and instead just *assume* come fail-time, that everything will "just work" -- and they find out the horrible truth when it's already too late. 
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From andrew at modulus.org Mon Nov 17 00:40:57 2008 From: andrew at modulus.org (Andrew Snow) Date: Mon Nov 17 00:41:04 2008 Subject: zfs snapdir: from hidden to visible and back again In-Reply-To: <20081117043042.GA2101@garage.freebsd.pl> References: <491D710A.9090308@icyb.net.ua> <491D8621.40101@icyb.net.ua> <20081117043042.GA2101@garage.freebsd.pl> Message-ID: <49212E04.9000507@modulus.org> > And it seems that any snapshot accessed in this way gets automatically > added to mounts. This doesn't seem to be reasonable. One workaround I use is to use the "clone" command on any specific snapshots I want to have mounted, and then the rest can be left hidden. When you are finished accessing the snapshot, simply destroy the clone. - Andrew From bugmaster at FreeBSD.org Mon Nov 17 03:06:50 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Nov 17 03:07:56 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200811171106.mAHB6ntQ082509@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad o kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs][panic] changing into .zfs dir from nfs client ca o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D 24 problems total. From danny at dannysplace.net Mon Nov 17 03:42:53 2008 From: danny at dannysplace.net (Danny Carroll) Date: Mon Nov 17 03:42:59 2008 Subject: Areca vs. ZFS performance testing. 
In-Reply-To: References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> Message-ID: <492158D2.5020506@dannysplace.net> Matt Simerson wrote: > Allow me to introduce you to Marvell. The sell the SATA controller used > in the Sun thumper (X4500). I've used that same SATA controller under > OpenSolaris and FreeBSD. Unfortunately, that controller doesn't use > multi-lane cables. When you pack in 3 controllers and 24 disks, it's a > cabling disaster. > > http://freebsd.monkey.org/freebsd-fs/200808/msg00027.html Interesting. Wish I had seen it before. To be honest I did consider this board but I was really in favour of PCIe over PCIX. That might have been a mistake :-) > The Areca cards do NOT have the cache enabled by default. I ordered the > optional battery and RAM upgrade for my collection of 1231ML cards. Even > with the BBWC, the cache is not enabled by default. I had to go out of > my way to enable it, on every single controller. Are you talking about the Areca cache or the disks own caches? On my board it was enabled. But maybe mine was the exception. -D From morganw at chemikals.org Mon Nov 17 03:44:50 2008 From: morganw at chemikals.org (Wes Morgan) Date: Mon Nov 17 03:45:01 2008 Subject: Areca vs. ZFS performance testing. 
In-Reply-To: References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> Message-ID: On Sun, 16 Nov 2008, Matt Simerson wrote: > > On Nov 16, 2008, at 7:15 PM, Danny Carroll wrote: > >> Eirik ?verby wrote: >>> I have noticed that my 3ware controllers, after updating firmware >>> recently, have removed the JBOD option entirely, classifying it as >>> something you wouldn't want to do with that kind of hardware anyway. I >>> believed then, and even more so now, they are correct. >> >> It kinda depends. If there were a good 8 or 16+ port SATA card out >> there that *simply* did SATA with no bells and whistles, then there >> would be no point buying a Raid adaptor when you want to use things like >> ZFS. >> >> But there are no such cards available. > > Allow me to introduce you to Marvell. The sell the SATA controller used in > the Sun thumper (X4500). I've used that same SATA controller under > OpenSolaris and FreeBSD. Unfortunately, that controller doesn't use > multi-lane cables. When you pack in 3 controllers and 24 disks, it's a > cabling disaster. > > http://freebsd.monkey.org/freebsd-fs/200808/msg00027.html > >>> Use the RAID-0 disk trick to be able to utilize the controller cache. >>> And regarding write-back vs write-through; I believe write-through is >>> equvivalent to disabling controller write cache, however it WILL cache >>> the writes in order to respond to future reads of the data being >>> written. I would guess, but I don't know, that this also goes for >>> disk-level caches too, though, so it probably doesn't matter. >> >> It is interesting to me that the default setting on the Areca card was >> to have the disk caches turned on. 
I think that is strange because by >> default you have a situation that can lead to data loss even if you have >> a battery backup unit. > > The Areca cards do NOT have the cache enabled by default. I ordered the > optional battery and RAM upgrade for my collection of 1231ML cards. Even with > the BBWC, the cache is not enabled by default. I had to go out of my way to > enable it, on every single controller. Are you using these areca cards successfully with large arrays? I found a 1680i card for a decent price and installed it this weekend, but since then I'm seeing the raidz2 pool that it's running hang so frequently that I can't even trust using it. The hangs occur in both 7-stable and 8-current with the new ZFS patch. Same exact settings that have been rock solid for me before now don't want to work at all. The drives are just set as JBOD -- the controller actually defaulted to this, so I didn't have to make any real changes in the BIOS. Any tips on your setup? Did you have any similar problems? From avg at icyb.net.ua Mon Nov 17 03:57:33 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Mon Nov 17 03:57:40 2008 Subject: zfs snapdir: from hidden to visible and back again In-Reply-To: <20081117043042.GA2101@garage.freebsd.pl> References: <491D710A.9090308@icyb.net.ua> <491D8621.40101@icyb.net.ua> <20081117043042.GA2101@garage.freebsd.pl> Message-ID: <49215C28.1020405@icyb.net.ua> on 17/11/2008 06:31 Pawel Jakub Dawidek said the following: > On Fri, Nov 14, 2008 at 04:07:29PM +0200, Andriy Gapon wrote: >> on 14/11/2008 14:37 Andriy Gapon said the following: >>> Also, even with snapdir=hidden, I still can list snapshots (their >>> contents) if I ls full path with .zfs in it. >>> Is this right? >> And it seems that any snapshot accessed in this way gets automatically >> added to mounts. This doesn't seem to be reasonable. >> >> For example, periodic security script would report suid binaries found >> in these snapshots, etc. 
> > Everything you described is expected behaviour. > I see. I guess there is no way to access something without mounting and no way to auto-unmount after use. Thanks. -- Andriy Gapon From pjd at FreeBSD.org Mon Nov 17 06:15:46 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Mon Nov 17 06:15:52 2008 Subject: zfs snapdir: from hidden to visible and back again In-Reply-To: <49215C28.1020405@icyb.net.ua> References: <491D710A.9090308@icyb.net.ua> <491D8621.40101@icyb.net.ua> <20081117043042.GA2101@garage.freebsd.pl> <49215C28.1020405@icyb.net.ua> Message-ID: <20081117141523.GB2101@garage.freebsd.pl> On Mon, Nov 17, 2008 at 01:57:28PM +0200, Andriy Gapon wrote: > on 17/11/2008 06:31 Pawel Jakub Dawidek said the following: > > On Fri, Nov 14, 2008 at 04:07:29PM +0200, Andriy Gapon wrote: > >> on 14/11/2008 14:37 Andriy Gapon said the following: > >>> Also, even with snapdir=hidden, I still can list snapshots (their > >>> contents) if I ls full path with .zfs in it. > >>> Is this right? > >> And it seems that any snapshot accessed in this way gets automatically > >> added to mounts. This doesn't seem to be reasonable. > >> > >> For example, periodic security script would report suid binaries found > >> in these snapshots, etc. > > > > Everything you described is expected behaviour. > > > > I see. I guess there is no way to access something without mounting and > no way to auto-unmount after use. > Thanks. You can setup a cron job which will try to unmount all the snapshots every few minutes. If something is using the snapshot, unmount should fail. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081117/2bdeb750/attachment.pgp

From matt at corp.spry.com Mon Nov 17 13:05:00 2008
From: matt at corp.spry.com (Matt Simerson)
Date: Mon Nov 17 13:05:07 2008
Subject: Areca vs. ZFS performance testing.
In-Reply-To: <492158D2.5020506@dannysplace.net>
References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> <492158D2.5020506@dannysplace.net>
Message-ID:

On Nov 17, 2008, at 3:43 AM, Danny Carroll wrote:
> Matt Simerson wrote:
>> Allow me to introduce you to Marvell. They sell the SATA controller used
>> in the Sun thumper (X4500). I've used that same SATA controller under
>> OpenSolaris and FreeBSD. Unfortunately, that controller doesn't use
>> multi-lane cables. When you pack in 3 controllers and 24 disks, it's a
>> cabling disaster.
>>
>> http://freebsd.monkey.org/freebsd-fs/200808/msg00027.html
>
> Interesting. Wish I had seen it before. To be honest I did consider
> this board but I was really in favour of PCIe over PCIX. That might
> have been a mistake :-)
>
>> The Areca cards do NOT have the cache enabled by default. I ordered the
>> optional battery and RAM upgrade for my collection of 1231ML cards. Even
>> with the BBWC, the cache is not enabled by default. I had to go out of
>> my way to enable it, on every single controller.
>
> Are you talking about the Areca cache or the disks' own caches?

Disk caching is a completely different animal, and one which I didn't mention. I spoke only about the write cache on the controller. Mine all arrived off by default, which is a VERY reasonable default configuration.
Page 97 of the manual says about it: >>> 3.7.5.12 Disk Write Cache Mode >>> User can set the "Disk Write Cache Mode" to Auto, Enabled, or >>> Disabled. Enabled increases speed, Disabled increases reliability. > On my board it was enabled. But maybe mine was the exception. Perhaps it's model specific, or your vendor configured it that way. Or you got a return that someone else monkeyed with. I'm not going to speak for Areca but it seems quite odd that Areca would ship them with the cache enabled. I've used many hundreds of RAID controllers over the years and without exception, every single one with a write cache had it disabled by default. Matt From matt at corp.spry.com Mon Nov 17 14:07:55 2008 From: matt at corp.spry.com (Matt Simerson) Date: Mon Nov 17 14:08:06 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> Message-ID: <8B620677-C2CA-4408-A0B1-AACC23FD0FF1@corp.spry.com> On Nov 17, 2008, at 3:26 AM, Wes Morgan wrote: >>> The Areca cards do NOT have the cache enabled by default. I >>> ordered the optional battery and RAM upgrade for my collection of >>> 1231ML cards. Even with the BBWC, the cache is not enabled by >>> default. I had to go out of my way to enable it, on every single >>> controller. > > Are you using these areca cards successfully with large arrays? Yes, if you consider 24 x 1TB large. > I found a 1680i card for a decent price and installed it this > weekend, but since then I'm seeing the raidz2 pool that it's running > hang so frequently that I can't even trust using it. The hangs occur > in both 7-stable and 8-current with the new ZFS patch. 
Same exact > settings that have been rock solid for me before now don't want to > work at all. The drives are just set as JBOD -- the controller > actually defaulted to this, so I didn't have to make any real > changes in the BIOS. > > Any tips on your setup? Did you have any similar problems? I talked to a storage vendor of ours that has sold several SuperMicro systems like ours where the client was using OpenSolaris and having similar stability issues to what we see on FreeBSD. It seems to be a lack of maturity in ZFS that underlies these problems. It appears that running ZFS on FreeBSD will either thrill or horrify. When I tested with modest I/O requirements, it worked great and I was tickled. But when I build these new systems as backup servers, I was generating immensely more disk I/O. I started with 7.0 release and saw crashes hourly. With tuning, I was only crashing once or twice a day (always memory related). With 16GB of RAM. I ran for a month with one server on JBOD with RAIDZ2 and another with RAIDZ across two RAID 5 arrays. Then I lost a disk and consequently the array on the JBOD server. Since RAID 5 had proved to run so much faster, I ditched the Marvell cards, installed a pair of 1231MLs and reformatted it with RAID 5. Both 24 disk systems have been ZFS RAIDZ across two RAID 5 hardware arrays for months since. If I build another system tomorrow, that's exactly how I'd do it. After upgrading to 8-HEAD and applying The Great ZFS Patch, I am content with only having to reboot the systems once every 7-12 days. I have another system with only 8 disks and 4GB of RAM with ZFS running on a single RAID 5 array. Under the same workload as the 24 disk systems, it was crashing at least once a day. This was existing hardware, so we were confident it wasn't hardware issues. I finally resolved it by wiping the disks clean, creating a GPT partition on the array and using UFS. 
The system hasn't crashed once since and is far more responsive under heavy load than my ZFS systems. Of course, all of this might get a fair bit better soon: http://svn.freebsd.org/viewvc/base?view=revision&revision=185029 Matt From danny at dannysplace.net Mon Nov 17 15:46:17 2008 From: danny at dannysplace.net (Danny Carroll) Date: Mon Nov 17 15:46:27 2008 Subject: Areca vs. ZFS performance testing. In-Reply-To: References: <490A782F.9060406@dannysplace.net> <20081031033208.GA21220@icarus.home.lan> <490A849C.7030009@dannysplace.net> <20081031043412.GA22289@icarus.home.lan> <490A8FAD.8060009@dannysplace.net> <491BBF38.9010908@dannysplace.net> <491C5AA7.1030004@samsco.org> <491C9535.3030504@dannysplace.net> <4920E1DD.7000101@dannysplace.net> <492158D2.5020506@dannysplace.net> Message-ID: <49220238.2040507@dannysplace.net> Matt Simerson wrote: > Disk caching is a completely different animal, and one which I didn't > mention. I'm spoke only about the write cache on the controller. Mine > all arrived off by default, which is a VERY reasonable default > configuration. Page 97 of the manual says about it: Ahhh, no I was talking about the disk cache setting. That is the one that is set to on by default (at least for me). I find it strange that this is the case. IMHO it makes the idea of a Battery backed cache redundant. > Perhaps it's model specific, or your vendor configured it that way. Or > you got a return that someone else monkeyed with. I'm not going to speak > for Areca but it seems quite odd that Areca would ship them with the > cache enabled. I've used many hundreds of RAID controllers over the > years and without exception, every single one with a write cache had it > disabled by default. I guess I had a return model. It's not really a big deal. -D From xcllnt at mac.com Mon Nov 17 22:27:20 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Mon Nov 17 22:27:27 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? 
In-Reply-To: <49198A1A.3080600@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> Message-ID: [sorry for the delay] On Nov 11, 2008, at 5:35 AM, Andriy Gapon wrote: > on 05/11/2008 18:03 Andriy Gapon said the following: >> Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I *snip* >> Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and >> options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel it >> couldn't mount ZFS root and I simply rebooted my machine when I >> stuck at >> mountroot prompt (I couldn't enter UFS2 root because of unrelated >> keyboard problem). >> The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART >> messages (errors, warnings). The problem is very likely related to change 184204. This change fixes a conflict between MBR and BSD. Unfortunately this fix wasn't in 7.1-BETA2. You should not have a problem with 7.1-RELEASE (nor 7-STABLE). FYI, -- Marcel Moolenaar xcllnt@mac.com From avg at icyb.net.ua Tue Nov 18 00:10:30 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Nov 18 00:10:37 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> Message-ID: <49227875.6090902@icyb.net.ua> on 18/11/2008 07:27 Marcel Moolenaar said the following: > [sorry for the delay] > > On Nov 11, 2008, at 5:35 AM, Andriy Gapon wrote: > >> on 05/11/2008 18:03 Andriy Gapon said the following: >>> Using GENERIC amd64 7-BETA2 system (installed from "official" ISO) I > *snip* >>> Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and >>> options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel it >>> couldn't mount ZFS root and I simply rebooted my machine when I stuck at >>> mountroot prompt (I couldn't enter UFS2 root because of unrelated >>> keyboard problem). 
>>> The boot was verbose and I didn't see any peculiar GEOM or GEOM_PART >>> messages (errors, warnings). > > The problem is very likely related to change 184204. This > change fixes a conflict between MBR and BSD. Unfortunately > this fix wasn't in 7.1-BETA2. You should not have a problem > with 7.1-RELEASE (nor 7-STABLE). Marcel, this particular change was definitely in kernel. As I reported in subsequent posts gpart show reported everything correctly and device node existed in dev, etc. UFS was happy about all its partitions, only ZFS had trouble. I think that this was something different, more subtle. -- Andriy Gapon From xcllnt at mac.com Tue Nov 18 08:45:39 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Tue Nov 18 08:45:51 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <49227875.6090902@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> Message-ID: <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> On Nov 18, 2008, at 12:10 AM, Andriy Gapon wrote: > on 18/11/2008 07:27 Marcel Moolenaar said the following: >> [sorry for the delay] >> On Nov 11, 2008, at 5:35 AM, Andriy Gapon wrote: >>> on 05/11/2008 18:03 Andriy Gapon said the following: >>>> Using GENERIC amd64 7-BETA2 system (installed from "official" >>>> ISO) I >> *snip* >>>> Then I built a custom kernel with nooptions for GEOM_(BSD|MBR) and >>>> options for GEOM_PART_(BSD|MBR). When I tried to boot this kernel >>>> it >>>> couldn't mount ZFS root and I simply rebooted my machine when I >>>> stuck at >>>> mountroot prompt (I couldn't enter UFS2 root because of unrelated >>>> keyboard problem). >>>> The boot was verbose and I didn't see any peculiar GEOM or >>>> GEOM_PART >>>> messages (errors, warnings). >> The problem is very likely related to change 184204. This >> change fixes a conflict between MBR and BSD. Unfortunately >> this fix wasn't in 7.1-BETA2. 
You should not have a problem >> with 7.1-RELEASE (nor 7-STABLE). > > Marcel, > > this particular change was definitely in kernel. > As I reported in subsequent posts gpart show reported everything > correctly and device node existed in dev, etc. UFS was happy about > all its partitions, only ZFS had trouble. I think that this was > something different, more subtle. Hmmm... this goes over my head. Some ZFS guru needs to tell us what criteria are being checked exactly before a disk/provider is considered the one recorded in the meta-data. -- Marcel Moolenaar xcllnt@mac.com From avg at icyb.net.ua Tue Nov 18 09:29:46 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Nov 18 09:29:53 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> Message-ID: <4922FB81.50608@icyb.net.ua> I just remembered that I saved old zpool.cache file before "migrating" the pool. I looked at the diff of hexdumps and there are a number of differences, it's hard to understand them because the file is binary (actually it seems to contain serialized name-value pairs), but one difference is prominent: ... 00000260 64 65 76 69 64 00 00 00 00 00 00 09 00 00 00 01 |devid...........| ... -00000270 00 00 00 15 61 64 3a 47 45 41 35 33 34 52 46 30 |....ad:GEA534RF0| -00000280 54 4b 33 35 41 73 31 73 33 00 00 00 00 00 00 28 |TK35As1s3......(| ... +00000270 00 00 00 11 61 64 3a 47 45 41 35 33 34 52 46 30 |....ad:GEA534RF0| +00000280 54 4b 33 35 41 00 00 00 00 00 00 28 00 00 00 28 |TK35A......(...(| ... It looks like old "devid" value is "ad:GEA534RF0TK35As1s3" and new one is "ad:GEA534RF0TK35A". Just a reminder: actual zpool device is ad6s2d. The new value is what is reported by diskinfo: $ diskinfo -v ad6 ad6 ... ad:GEA534RF0TK35A # Disk ident. $ diskinfo -v ad6s2 ad6s2 ... 
ad:GEA534RF0TK35A # Disk ident. $ diskinfo -v ad6s2d ad6s2d ... ad:GEA534RF0TK35A # Disk ident. Hmm, "ident" is reported to be the same for all three entities. I don't remember what diskinfo reported with the pre-gpart kernel, but I suspect that it was something different. Could anybody please check this (on a 7.X machine without GEOM_PART)? I quickly glanced through the sources and it seems that this comes from the DIOCGIDENT GEOM ioctl, i.e. the "GEOM::ident" attribute. It seems that the geom_slice.c code has some special handling for that. -- Andriy Gapon From peter.schuller at infidyne.com Tue Nov 18 09:52:12 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Tue Nov 18 09:52:19 2008 Subject: Will XFS be adopted In-Reply-To: <20081117050441.GA16855@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> Message-ID: <20081118175210.GA3753@hyperion.scode.org> > YES! In my opinion it's not even appropriate for a machine with 2GB of > RAM. Why waste so much RAM on an FS? Does anyone know? Or is this some > sort of conspiracy to sell more bigger boxes. It's Sun, after all.... I don't know whether, when people say this, they are just trying to spew FUD or they really don't realize the distinction. But regardless, please note that ZFS does not "waste" 2 GB of memory. The memory it "uses" has to do with the fact that it has a dedicated cache - the ARC - that is distinct from the buffer cache that is otherwise integrated with the operating system. Now I *fully* realize that this sucks for some use cases (I have such use cases myself) where you simply do not want to reserve a portion of memory for file system caching. I also realize that the issues people have in terms of forcing the ARC to be really small can be a problem. 
However, the implication when people say that ZFS "wastes" a bunch of memory, seems to be that it somehow just uses up a bunch of memory for no good reason other than some kind of bloat. This is not the case. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081118/4af4cd08/attachment.pgp From xcllnt at mac.com Tue Nov 18 11:50:01 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Tue Nov 18 11:50:08 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <4922FB81.50608@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> <4922FB81.50608@icyb.net.ua> Message-ID: <022C4222-63B2-4535-8B7E-0426E9CE2BEA@mac.com> On Nov 18, 2008, at 9:29 AM, Andriy Gapon wrote: > I just remembered that I saved old zpool.cache file before "migrating" > the pool. > I looked at the diff of hexdumps and there are a number of > differences, > it's hard to understand them because the file is binary (actually it > seems to contain serialized name-value pairs), but one difference is > prominent: > ... > 00000260 64 65 76 69 64 00 00 00 00 00 00 09 00 00 00 01 > |devid...........| > ... > -00000270 00 00 00 15 61 64 3a 47 45 41 35 33 34 52 46 30 > |....ad:GEA534RF0| > -00000280 54 4b 33 35 41 73 31 73 33 00 00 00 00 00 00 28 > |TK35As1s3......(| > ... > +00000270 00 00 00 11 61 64 3a 47 45 41 35 33 34 52 46 30 > |....ad:GEA534RF0| > +00000280 54 4b 33 35 41 00 00 00 00 00 00 28 00 00 00 28 > |TK35A......(...(| > ... 
> > It looks like old "devid" value is "ad:GEA534RF0TK35As1s3" and new one > is "ad:GEA534RF0TK35A". Just a reminder: actual zpool device is > ad6s2d. > > The new value is what is reported by diskinfo: > $ diskinfo -v ad6 > ad6 > ... > ad:GEA534RF0TK35A # Disk ident. > > $ diskinfo -v ad6s2 > ad6s2 > ... > ad:GEA534RF0TK35A # Disk ident. > > $ diskinfo -v ad6s2d > ad6s2d > ... > ad:GEA534RF0TK35A # Disk ident. > > Hmm, "indent" is reported to be the same for all three entities. > > I don't remember what diskinfo reported with pre-gpart kernel, but I > suspect that it was something different. > Could anybody please check this? (on 7.X machine without GEOM_PART). > > I quickly glimpsed through sources and it seems that this comes from > DIOCGIDENT GEOM ioctl i.e. "GEOM::ident" attribute. It seems that > geom_slice.c code has some special handling for that. Interesting. Can you try the attached patch to GPart: -- Marcel Moolenaar xcllnt@mac.com -------------- next part -------------- A non-text attachment was scrubbed... Name: gpart.diff Type: application/octet-stream Size: 1759 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081118/4c51f9ee/gpart.obj From dan-freebsd-fs at ourbrains.org Tue Nov 18 16:17:21 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Tue Nov 18 16:17:28 2008 Subject: Will XFS be adopted In-Reply-To: <20081118175210.GA3753@hyperion.scode.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> <20081118175210.GA3753@hyperion.scode.org> Message-ID: <20081119001742.GA21835@ourbrains.org> Peter Schuller(peter.schuller@infidyne.com)@2008.11.18 18:52:10 +0100: > However, the implication when people say that ZFS "wastes" a bunch of > memory, seems to be that it somehow just uses up a bunch of memory for > no good reason other than some kind of bloat. This is not the case. 
Has anyone done any benchmarks? Is the cache really helping that much? If it doesn't, and it performs similarly to other journaling FSes that do not use this much RAM, well, if it's not waste then what? Does it guarantee the same atomicity that UFS does? Is it OK to run an email server on it? Will I lose messages in case of a power failure or crash? From andrew at modulus.org Tue Nov 18 16:48:55 2008 From: andrew at modulus.org (Andrew Snow) Date: Tue Nov 18 16:49:02 2008 Subject: Will XFS be adopted In-Reply-To: <20081119001742.GA21835@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> <20081118175210.GA3753@hyperion.scode.org> <20081119001742.GA21835@ourbrains.org> Message-ID: <49235D86.4050106@modulus.org> Dan wrote: > Has anyone done any benchmarks? Is the cache really helping that much? 1. The downside to the ZFS benefits of instantaneous snapshots, clones, and filesystem-level RAID is that it has to go through its metadata when you want to search directories or read files. A big cache helps make that faster, as the commonly loaded tree nodes are pre-fetched and cached. File data is also pre-fetched; ZFS can handle multiple forward or backward reading streams per open file. 2. Much of the cache is used as a write cache; the more memory that can be thrown at it, the better the writes to disk can be optimised. > If > it doesn't, and it performs similarly to other journaling FSes that do > not use this much RAM, well, if it's not waste then what? As I said above, the other filesystems don't give you built-in instant snapshotting and RAID. > Does it guarantee the same atomicity that UFS does? Yes. > Is it OK to run an email server on it? Will I lose messages in case of a power failure or crash? It is perfect for running email because the transparent compression saves you space and I/O time. 
However, I would wait until it has been considered stable and moved into the 7-STABLE tree before deploying a production server. - Andrew From dan-freebsd-fs at ourbrains.org Tue Nov 18 21:24:08 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Tue Nov 18 21:24:15 2008 Subject: (no subject) Message-ID: <20081119052428.GC4136@ourbrains.org> A recent question came up about huge numbers of files in one directory. Well, some people actually have to deal with it on the job: http://leaf.dragonflybsd.org/mailarchive/kernel/2008-11/msg00070.html An FS doesn't have to be designed such that file look-ups take a very long time to search when directories are large. When a nice hash is used as part of the FS design, the time to search for 1 in a 100 files or 2 billion is the same. I view it as a feature. I can imagine a few cases where a large, non-human-readable directory is used to store many files. When developers know they have this feature at hand, they might as well use it. FS-based databases, image/sound editing, etc. From des at des.no Wed Nov 19 00:44:40 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Wed Nov 19 00:44:47 2008 Subject: Will XFS be adopted In-Reply-To: <49235D86.4050106@modulus.org> (Andrew Snow's message of "Wed, 19 Nov 2008 11:27:50 +1100") References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> <20081118175210.GA3753@hyperion.scode.org> <20081119001742.GA21835@ourbrains.org> <49235D86.4050106@modulus.org> Message-ID: <86bpwcp1d8.fsf@ds4.des.no> Andrew Snow writes: > [...] I would wait until it has been considered stable and moved into > the 7-STABLE tree before deploying a production server. ZFS has been in 7 for over a year. 
DES -- Dag-Erling Smørgrav - des@des.no From fbsd at dannysplace.net Wed Nov 19 04:46:23 2008 From: fbsd at dannysplace.net (Danny Carroll) Date: Wed Nov 19 04:46:29 2008 Subject: ZFS Drive change best practice. Message-ID: <49240A80.5010106@dannysplace.net> I need to migrate my ZFS drives from one bus to another. As such they will be getting new device names. I am pretty sure that if I do this with an export/import then it will just get the job done. Does anyone know if I need to take anything special into consideration, like perhaps maintaining the order of the drives? -D From koitsu at FreeBSD.org Wed Nov 19 04:51:45 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Wed Nov 19 04:51:52 2008 Subject: ZFS Drive change best practice. In-Reply-To: <49240A80.5010106@dannysplace.net> References: <49240A80.5010106@dannysplace.net> Message-ID: <20081119125138.GA86942@icarus.home.lan> On Wed, Nov 19, 2008 at 10:45:52PM +1000, Danny Carroll wrote: > I need to migrate my ZFS drives from one bus to another. As such they > will be getting new device names. > > I am pretty sure that if I do this with an export/import then it will > just get the job done. > > Does anyone know if I need to take anything special into > consideration, like perhaps maintaining the order of the drives? Drive order does not matter, and drive label (e.g. adX or daX) does not matter; "zpool import -a" will figure it out. HOWEVER, there is a problem where ZFS can list the same drive label twice or more in the members list. Take a look at the last part of my Wiki here -- you'll see two members with the same name. This was induced by moving cables between two different controllers: http://wiki.freebsd.org/JeremyChadwick/ZFS_disk_upgrade_gone_bad I sent mail to pjd@ about this, but didn't receive a response (which is understandable given how busy he is). I have no idea if the latest ZFS commit on CURRENT fixes this problem. Just something to keep in mind. 
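A minimal sketch of the export/import sequence discussed above, assuming a pool named "tank"; the pool name and the run/pool_phase helpers are illustrative and do not come from the thread. By default it only prints the commands it would run, since the export and the import happen on either side of the recabling:

```shell
#!/bin/sh
# Sketch only: "tank" and the helper names are assumptions for illustration.
# With DRY_RUN unset or set to 1, commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# pool_phase before <pool>: run before powering down and moving the drives.
# pool_phase after  <pool>: run after booting with the drives on the new bus.
pool_phase() {
    case "$1" in
        before)
            # Export so that no stale device paths are kept for the pool.
            run zpool export "$2"
            ;;
        after)
            # Import by name ("zpool import -a" would import every pool found),
            # then check that each member is listed once and is ONLINE.
            run zpool import "$2" && run zpool status "$2"
            ;;
        *)
            echo "usage: pool_phase before|after <pool>" >&2
            return 2
            ;;
    esac
}

# Example:
#   pool_phase before tank    # then power down and recable
#   pool_phase after tank     # after the next boot
```

Checking the "zpool status" output after the import is how the duplicate-member situation described above would show up.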
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From nick.barkas at gmail.com Wed Nov 19 05:55:46 2008 From: nick.barkas at gmail.com (Nick Barkas) Date: Wed Nov 19 05:55:52 2008 Subject: (no subject) In-Reply-To: <20081119052428.GC4136@ourbrains.org> References: <20081119052428.GC4136@ourbrains.org> Message-ID: On Wed, Nov 19, 2008 at 06:24, Dan wrote: > A recent question came up about huge numbers of files in one directory. > Well, some people actually have to deal with it on the job: > > http://leaf.dragonflybsd.org/mailarchive/kernel/2008-11/msg00070.html > > An FS doesn't have to be designed such that file look-ups take a very > long time to search when directories are large. When a nice hash is used > as part of the FS design, the time to search for 1 in a 100 files or 2 > billion is the same. I view it as a feature. I can imagine a few cases > where a large, non-human-readable directory is used to store many files. > When developers know they have this feature at hand, they might as well > use it. FS-based databases, image/sound editing, etc. I'm not sure if this is what you're looking for, but FreeBSD's does have some provisions to avoid too much performance degradation with large directories. The VFS name cache will speed up look-up operations on specific individual files in any size directory that are repeatedly searched for, and it is filesystem independent. Specific to UFS2 there is dirhash, which was implemented by Ian Dowse and David Malone. It speeds up more types of operations involving large directories. They wrote a paper about it you can find here: http://www.usenix.org/events/usenix02/tech/freenix/dowse.html More recently I've done a little bit of work on dirhash as well that might further speed things up. It's not committed to SVN yet, but is in Perforce. 
I sent out patches to this list a little while back but have not received any reports from testers. My patches might need to be updated to apply on the latest -CURRENT, and I'll try to update the wiki page (http://wiki.freebsd.org/DirhashDynamicMemory) if I find out that that is the case. I am hoping to find the time in the next few months to start working on on-disk directory indexing for UFS2 so that linear searching through directory entries is never necessary. You are correct in that filesystems don't have to be designed such that searches are slow for large directories, but UFS was designed quite a long time ago. It is not trivial to change disk formats for directories now, especially given that we want to remain backwards compatible and be able to work properly with softupdates. I hope I can help make it happen though :) Nick From dan-freebsd-fs at ourbrains.org Wed Nov 19 06:38:52 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Wed Nov 19 06:38:58 2008 Subject: Large Directories In-Reply-To: References: <20081119052428.GC4136@ourbrains.org> Message-ID: <20081119143913.GA6058@ourbrains.org> Nick Barkas(nick.barkas@gmail.com)@2008.11.19 14:29:24 +0100: I know about dirhash, but it blows out at a few dozen thousand files. There's is a nice discussion here: http://leaf.dragonflybsd.org/mailarchive/kernel/2008-11/msg00055.html From pjd at FreeBSD.org Wed Nov 19 13:12:58 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Wed Nov 19 13:13:04 2008 Subject: ZFS Drive change best practice. In-Reply-To: <20081119125138.GA86942@icarus.home.lan> References: <49240A80.5010106@dannysplace.net> <20081119125138.GA86942@icarus.home.lan> Message-ID: <20081119211227.GA2553@garage.freebsd.pl> On Wed, Nov 19, 2008 at 04:51:38AM -0800, Jeremy Chadwick wrote: > On Wed, Nov 19, 2008 at 10:45:52PM +1000, Danny Carroll wrote: > > I need to migrate my ZFS drives from one bus to another. As such they > > will be getting new device names. 
> > > > I am pretty sure that if I do this with an export/import then it will > > just get the job done. > > > > Does anyone know if there I need to take anything special into > > consideration, like perhaps maintaining the order of the drives? > > Drive order does not matter, and drive label (e.g. adX or daX) does > not matter; "zpool import -a" will figure it out. > > HOWEVER, there is a problem where ZFS can list the same drive label > twice or more in the members list. Take a look at the last part of my > Wiki here -- you'll see two members with the same name. This was > induced by moving cables between two different controllers: > > http://wiki.freebsd.org/JeremyChadwick/ZFS_disk_upgrade_gone_bad > > I sent mail to pjd@ about this, but didn't receive a response (which is > understandable given how busy he is). I have no idea if the latest ZFS > commit on CURRENT fixes this problem. > > Just something to keep in mind. Sorry about lack of response. The situation should improve with recent ZFS, although at the end I decided to give up on disks IDs, as they only work with ATA disks and nobody implemented this functionality for SCSI disks. After some more tests I'll commit changes that make ZFS to always depend on metadata only. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081119/8da145fd/attachment.pgp From fbsd at dannysplace.net Wed Nov 19 14:22:05 2008 From: fbsd at dannysplace.net (Danny Carroll) Date: Wed Nov 19 14:22:11 2008 Subject: ZFS Drive change best practice. 
In-Reply-To: <20081119211227.GA2553@garage.freebsd.pl> References: <49240A80.5010106@dannysplace.net> <20081119125138.GA86942@icarus.home.lan> <20081119211227.GA2553@garage.freebsd.pl> Message-ID: <49249169.9010406@dannysplace.net> Pawel Jakub Dawidek wrote: > After some more tests I'll commit changes that make ZFS to always depend > on metadata only. > So I guess that means in the future, you could power down a system, re-arrange the disk positions and when you powered back up ZFS would just work? Could I also ask what the usual lead time is for ZFS changes to go from current -> stable? I'm just curious. -D From olivier at gid0.org Wed Nov 19 14:35:28 2008 From: olivier at gid0.org (Olivier SMEDTS) Date: Wed Nov 19 14:35:36 2008 Subject: ZFSBoot try and bsdlabel bootstrap code Message-ID: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> Hello, I want to boot off a ZFS pool (version 13) on an USB stick for testing purposes. But I'm stuck with the bsdlabel bootstrap code size... I'm using a 2 hours old CURRENT. # kldload usb2_storage_mass # kldload zfs # dd if=/dev/zero of=/dev/da0 bs=512 count=32 # fdisk -BI da0 # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 # bsdlabel -wB -b /boot/zfsboot da0s1 bsdlabel: boot code /boot/zfsboot is wrong size Is what I'm trying to do with bsdlabel wrong ? I previously tried with the default bootstrap code but I had an (expected) "boot: Not ufs" error at boot. PS : I'm not subscribed to this list. Cheers, Olivier -- Olivier Smedts _ ASCII ribbon campaign ( ) e-mail: olivier@gid0.org - against HTML email & vCards X www: http://www.gid0.org - against proprietary attachments / \ "Il y a seulement 10 sortes de gens dans le monde : ceux qui comprennent le binaire, et ceux qui ne le comprennent pas." 
From dan-freebsd-fs at ourbrains.org Wed Nov 19 15:15:23 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Wed Nov 19 15:15:30 2008 Subject: (no subject) In-Reply-To: References: <20081119052428.GC4136@ourbrains.org> Message-ID: <20081119231543.GA7659@ourbrains.org> Nick Barkas(nick.barkas@gmail.com)@2008.11.19 14:29:24 +0100: > I'm not sure if this is what you're looking for, but FreeBSD's does > have some provisions to avoid too much performance degradation with > large directories. The VFS name cache will speed up look-up operations > on specific individual files in any size directory that are repeatedly > searched for, and it is filesystem independent. Specific to UFS2 there > is dirhash, which was implemented by Ian Dowse and David Malone. It > speeds up more types of operations involving large directories. They I know. dirhash is only great up to a few tens of thousands of files; beyond that it blows out. You might be interested in the discussion here: http://leaf.dragonflybsd.org/mailarchive/kernel/2008-11/msg00055.html From peter.schuller at infidyne.com Wed Nov 19 15:50:51 2008 From: peter.schuller at infidyne.com (Peter Schuller) Date: Wed Nov 19 15:50:58 2008 Subject: ZFS Drive change best practice. In-Reply-To: <49249169.9010406@dannysplace.net> References: <49240A80.5010106@dannysplace.net> <20081119125138.GA86942@icarus.home.lan> <20081119211227.GA2553@garage.freebsd.pl> <49249169.9010406@dannysplace.net> Message-ID: <20081119235049.GA61843@hyperion.scode.org> > So I guess that means in the future, you could power down a system, > re-arrange the disk positions and when you powered back up ZFS would > just work? FWIW, if glabel is used for all constituent drives this is already the case. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081119/220ae898/attachment.pgp From doconnor at gsoft.com.au Wed Nov 19 18:26:45 2008 From: doconnor at gsoft.com.au (Daniel O'Connor) Date: Wed Nov 19 18:26:52 2008 Subject: Unique ID for UFS? Message-ID: <200811201215.42008.doconnor@gsoft.com.au> Hi, I am wondering if there is a unique ID generated for each UFS already? If not would it be possible to add one somehow? There is glabel, but I think having a UUID embedded in the FS would be very handy for automation and would prevent accidents that glabel can cause. So, there could be a gfsid module that reads IDs from the FS (NTFS, ext2/3, UFS) and creates device nodes to allow access. Linux has something like this (or rather ext2/xfs do) and NTFS appears to have a unique ID (or at least Linux thinks so :) Thanks. PS please CC me as I'm not on the list. -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part.
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081120/846ceb8c/attachment.pgp From nick.barkas at gmail.com Thu Nov 20 01:01:13 2008 From: nick.barkas at gmail.com (Nick Barkas) Date: Thu Nov 20 01:01:19 2008 Subject: Large Directories In-Reply-To: <20081119143913.GA6058@ourbrains.org> References: <20081119052428.GC4136@ourbrains.org> <20081119143913.GA6058@ourbrains.org> Message-ID: On Wed, Nov 19, 2008 at 15:39, Dan wrote: > Nick Barkas(nick.barkas@gmail.com)@2008.11.19 14:29:24 +0100: > I know about dirhash, but it blows out at a few dozen thousand files. This is because the default maximum amount of memory dirhash is allowed to use is only 2MB. Try increasing vfs.ufs.dirhash_maxmem. Of course, if you do have a directory larger than the amount of memory you can allow dirhash (e.g. millions of files in a directory on a system that doesn't have tens or hundreds of MB of memory to spare), dirhash can't help you. I just did a test creating ten million fake email messages in a maildir, and dirhash needed about 260MB of memory for it. Nick From ivoras at freebsd.org Thu Nov 20 02:27:42 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Nov 20 02:27:48 2008 Subject: Unique ID for UFS? In-Reply-To: <200811201215.42008.doconnor@gsoft.com.au> References: <200811201215.42008.doconnor@gsoft.com.au> Message-ID: Daniel O'Connor wrote: > Hi, > I am wondering if there is a unique ID generated for each UFS already? If not > would it be possible to add one somehow? > > There is glabel, but I think having a UUID embedded in the FS would be very > handy for automation and would prevent accidents that glabel can cause. > > So, there could be a gfsid module that reads IDs from the FS (NTFS, ext2/3, > UFS) and creates device nodes to allow access.
Looking at the output of dumpfs, there is a 64-bit numeric "id" field that changes from file system to file system, so this might be it: magic 19540119 (UFS2) time Sat Nov 15 04:16:42 2008 superblock location 65536 id [ 46ea67b4 178d71a1 ] (but judging from how the value changes on my file systems it might be related to the timestamp). If this is a usable ID, it should be trivial to make glabel create ID nodes (i.e. /dev/ufs/46ea67b4178d71a1). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081120/d127d8fc/signature.pgp From ivoras at freebsd.org Thu Nov 20 02:33:48 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Nov 20 02:33:54 2008 Subject: Unique ID for UFS? In-Reply-To: References: <200811201215.42008.doconnor@gsoft.com.au> Message-ID: Ivan Voras wrote: > magic 19540119 (UFS2) time Sat Nov 15 04:16:42 2008 > superblock location 65536 id [ 46ea67b4 178d71a1 ] > > (but judging from how the value changes on my file systems it might be > related to the timestamp). ^^^^ The file system *CREATION* timestamp, that is. -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081120/6c0f35f2/signature.pgp From dfr at rabson.org Thu Nov 20 08:07:52 2008 From: dfr at rabson.org (Doug Rabson) Date: Thu Nov 20 08:07:59 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> Message-ID: <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: > Hello, > > I want to boot off a ZFS pool (version 13) on an USB stick for testing > purposes. But I'm stuck with the bsdlabel bootstrap code size... > I'm using a 2 hours old CURRENT. > > # kldload usb2_storage_mass > # kldload zfs > # dd if=/dev/zero of=/dev/da0 bs=512 count=32 > # fdisk -BI da0 > # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 > # bsdlabel -wB -b /boot/zfsboot da0s1 > bsdlabel: boot code /boot/zfsboot is wrong size > > Is what I'm trying to do with bsdlabel wrong ? > I previously tried with the default bootstrap code but I had an > (expected) "boot: Not ufs" error at boot. > > PS : I'm not subscribed to this list. The process for install zfsboot is a bit manual (and undocumented). 
Try something like this: # dd if=/boot/zfsboot of=/dev/da0s1 count=1 # dd if=/boot/zfsboot of=/dev/da0s1 skip=1 seek=1024 Alternatively, you might try using the brand new support for GPT that I committed yesterday: # gpt create -f da0 # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 # gpt add -t freebsd-zfs da0 # zpool create mypool da0p2 From olivier at gid0.org Thu Nov 20 08:20:28 2008 From: olivier at gid0.org (Olivier SMEDTS) Date: Thu Nov 20 08:20:35 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> Message-ID: <367b2c980811200820h5d1a058ax48cceb26d0c48137@mail.gmail.com> 2008/11/20 Doug Rabson : > > The process for install zfsboot is a bit manual (and undocumented). Try > something like this: > > # dd if=/boot/zfsboot of=/dev/da0s1 count=1 > # dd if=/boot/zfsboot of=/dev/da0s1 skip=1 seek=1024 Great, I'm going to try that. Thanks ! > Alternatively, you might try using the brand new support for GPT that I > committed yesterday: > > # gpt create -f da0 > # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 > # gpt add -t freebsd-zfs da0 > # zpool create mypool da0p2 I like testing brand new FreeBSD features on my brand old boxes. I'm going to have fun this weekend :) Cheers, Olivier -- Olivier Smedts _ ASCII ribbon campaign ( ) e-mail: olivier@gid0.org - against HTML email & vCards X www: http://www.gid0.org - against proprietary attachments / \ "Il y a seulement 10 sortes de gens dans le monde : ceux qui comprennent le binaire, et ceux qui ne le comprennent pas."
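For anyone who wants to see what the two dd commands quoted above actually do before pointing them at a real slice, here is a minimal sketch that replays them against ordinary files. The names fake_zfsboot, fake_slice and seek_to_bytes are hypothetical stand-ins invented for this illustration (they are not part of FreeBSD), and the sketch assumes dd's default 512-byte block size, under which seek=1024 means a byte offset of 512 KiB. conv=notrunc is only there because the target is a regular file; it is not part of the original commands.

```sh
#!/bin/sh
# Sketch only: replays the two-step zfsboot install against plain files.
# fake_zfsboot and fake_slice are hypothetical stand-ins for
# /boot/zfsboot and /dev/da0s1; nothing here touches a real disk.
set -e
workdir=$(mktemp -d)
boot="$workdir/fake_zfsboot"
slice="$workdir/fake_slice"

# Byte offset that "seek=N" refers to with dd's default 512-byte blocks.
seek_to_bytes() {
    echo $(( $1 * 512 ))
}

# Build a 3-sector "boot file" and a 1 MiB zero-filled "slice".
dd if=/dev/urandom of="$boot" bs=512 count=3 2>/dev/null
dd if=/dev/zero of="$slice" bs=512 count=2048 2>/dev/null

# Step 1: the first sector of the boot file goes to sector 0 of the slice.
dd if="$boot" of="$slice" count=1 conv=notrunc 2>/dev/null
# Step 2: the rest of the boot file goes to sector 1024 of the slice.
dd if="$boot" of="$slice" skip=1 seek=1024 conv=notrunc 2>/dev/null

echo "remainder of the boot file starts at byte $(seek_to_bytes 1024)"
```

Afterwards sector 0 of fake_slice holds the first sector of the boot file, and the remaining sectors sit at the 512 KiB mark, which is the layout the two commands in the thread are aiming for on da0s1.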
From lulf at stud.ntnu.no Thu Nov 20 08:26:07 2008 From: lulf at stud.ntnu.no (Ulf Lilleengen) Date: Thu Nov 20 08:26:14 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> Message-ID: <20081120172634.GA1438@carrot.studby.ntnu.no> On Thu, Nov 20, 2008 at 04:07:50PM +0000, Doug Rabson wrote: > > On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: > > > Hello, > > > > I want to boot off a ZFS pool (version 13) on an USB stick for testing > > purposes. But I'm stuck with the bsdlabel bootstrap code size... > > I'm using a 2 hours old CURRENT. > > > > # kldload usb2_storage_mass > > # kldload zfs > > # dd if=/dev/zero of=/dev/da0 bs=512 count=32 > > # fdisk -BI da0 > > # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 > > # bsdlabel -wB -b /boot/zfsboot da0s1 > > bsdlabel: boot code /boot/zfsboot is wrong size > > > > Is what I'm trying to do with bsdlabel wrong ? > > I previously tried with the default bootstrap code but I had an > > (expected) "boot: Not ufs" error at boot. > > > > PS : I'm not subscribed to this list. > > The process for install zfsboot is a bit manual (and undocumented). I see bsdlabel is restricted by BBSIZE. Could we perhaps increase it? or is this something that will break everything? I suspect it might be hard to maintain bsdlabel backwards compability with such a change. As an alternative, adding a flag for extended block size might be an option. 
-- Ulf Lilleengen From dfr at rabson.org Thu Nov 20 08:40:25 2008 From: dfr at rabson.org (Doug Rabson) Date: Thu Nov 20 08:40:32 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <20081120172634.GA1438@carrot.studby.ntnu.no> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <20081120172634.GA1438@carrot.studby.ntnu.no> Message-ID: <81FBD67A-6908-4416-822D-CF943CDD50CF@rabson.org> On 20 Nov 2008, at 17:26, Ulf Lilleengen wrote: > On Thu, Nov 20, 2008 at 04:07:50PM +0000, Doug Rabson wrote: >> >> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >> >>> Hello, >>> >>> I want to boot off a ZFS pool (version 13) on an USB stick for >>> testing >>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>> I'm using a 2 hours old CURRENT. >>> >>> # kldload usb2_storage_mass >>> # kldload zfs >>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>> # fdisk -BI da0 >>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>> bsdlabel: boot code /boot/zfsboot is wrong size >>> >>> Is what I'm trying to do with bsdlabel wrong ? >>> I previously tried with the default bootstrap code but I had an >>> (expected) "boot: Not ufs" error at boot. >>> >>> PS : I'm not subscribed to this list. >> >> The process for install zfsboot is a bit manual (and undocumented). > I see bsdlabel is restricted by BBSIZE. Could we perhaps increase > it? or is > this something that will break everything? I suspect it might be > hard to > maintain bsdlabel backwards compability with such a change. As an > alternative, adding a flag for extended block size might be an option. That won't help much - bsdlabel understands how to install bootcode for UFS only. The boot process for ZFS is a bit different. 
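To make the BBSIZE point above a little more concrete, here is a rough sketch of the kind of size check that seems to be behind the "wrong size" message. It is not bsdlabel's actual code: the value 8192 for BBSIZE and the "must match exactly" rule are assumptions on my part, and check_bootcode_size is a hypothetical helper written only for this illustration.

```sh
#!/bin/sh
# Sketch only: compares a boot file's size with an assumed BBSIZE.
BBSIZE=8192   # assumption: size of the boot area bsdlabel works with

# check_bootcode_size FILE: hypothetical helper; prints "fits" if FILE is
# exactly BBSIZE bytes long and "wrong size" otherwise.
check_bootcode_size() {
    size=$(( $(wc -c < "$1") ))
    if [ "$size" -eq "$BBSIZE" ]; then
        echo "fits"
    else
        echo "wrong size"
    fi
}

# Usage (path taken from the thread; the file must exist on your system):
#   check_bootcode_size /boot/zfsboot
```

If the assumption holds, anything larger than the boot area fails this check, which is consistent with zfsboot being written in two pieces with dd rather than through bsdlabel.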
From dan-freebsd-fs at ourbrains.org Thu Nov 20 08:48:03 2008 From: dan-freebsd-fs at ourbrains.org (Dan) Date: Thu Nov 20 08:48:09 2008 Subject: Will XFS be adopted In-Reply-To: <86bpwcp1d8.fsf@ds4.des.no> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> <20081118175210.GA3753@hyperion.scode.org> <20081119001742.GA21835@ourbrains.org> <49235D86.4050106@modulus.org> <86bpwcp1d8.fsf@ds4.des.no> Message-ID: <20081120164823.GA8513@ourbrains.org> Dag-Erling Smørgrav(des@des.no)@2008.11.19 09:26:59 +0100: > Andrew Snow writes: > > [...] I would wait until it has been considered stable and moved into > > the 7-STABLE tree before deploying a production server. > > ZFS has been in 7 for over a year. > > DES > -- > Dag-Erling Smørgrav - des@des.no But is it considered stable? :) From freebsdlists at bsdunix.ch Thu Nov 20 16:48:15 2008 From: freebsdlists at bsdunix.ch (Thomas Vogt) Date: Thu Nov 20 16:48:22 2008 Subject: Will XFS be adopted In-Reply-To: <20081120164823.GA8513@ourbrains.org> References: <20081109174303.GA5146@ourbrains.org> <20081109184349.GG51239@server.vk2pj.dyndns.org> <4920D879.3070806@jrv.org> <20081117050441.GA16855@ourbrains.org> <20081118175210.GA3753@hyperion.scode.org> <20081119001742.GA21835@ourbrains.org> <49235D86.4050106@modulus.org> <86bpwcp1d8.fsf@ds4.des.no> <20081120164823.GA8513@ourbrains.org> Message-ID: <18070C72-6354-43B4-9F36-7E1BE41DDA0A@bsdunix.ch> Hello On 20.11.2008 at 17:48, Dan wrote: > Dag-Erling Smørgrav(des@des.no)@2008.11.19 09:26:59 +0100: >> Andrew Snow writes: >>> [...] I would wait until it has been considered stable and moved >>> into >>> the 7-STABLE tree before deploying a production server. >> >> ZFS has been in 7 for over a year. >> >> DES >> -- >> Dag-Erling Smørgrav - des@des.no > > But is it considered stable?
:) I can share my experience: We run an official mirror server for many open source projects. We use FreeBSD 7 including ZFS as the storage server. The system is mirroring a lot of data every day via rsync to the local zfs pool and offering all data via ftp/rsync and http to the end user. We have a few terabytes of traffic every week and a lot of I/O load. After a few tweaks (vfs.zfs.arc_max etc) FreeBSD 7.x has been running without any problems for a few months. Even with a few crashes at the beginning, we never encountered any data loss with zfs. I'm just talking about FreeBSD 7.x and not FreeBSD current. But current is not considered production-ready. Regards Thomas From olivier at gid0.org Fri Nov 21 13:31:32 2008 From: olivier at gid0.org (Olivier SMEDTS) Date: Fri Nov 21 13:31:39 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> Message-ID: <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> 2008/11/20 Doug Rabson : > > On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: > >> Hello, >> >> I want to boot off a ZFS pool (version 13) on an USB stick for testing >> purposes. But I'm stuck with the bsdlabel bootstrap code size... >> I'm using a 2 hours old CURRENT. >> >> # kldload usb2_storage_mass >> # kldload zfs >> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >> # fdisk -BI da0 >> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >> # bsdlabel -wB -b /boot/zfsboot da0s1 >> bsdlabel: boot code /boot/zfsboot is wrong size >> >> Is what I'm trying to do with bsdlabel wrong ? >> I previously tried with the default bootstrap code but I had an >> (expected) "boot: Not ufs" error at boot. >> >> PS : I'm not subscribed to this list. > > The process for install zfsboot is a bit manual (and undocumented).
Try > something like this: > > # dd if=/boot/zfsboot of=/dev/da0s1 count=1 > # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 It works ! Now I'm stuck at loader(8) prompt. After having a look at your patch, I tried building world with "LOADER_ZFS_SUPPORT=yes". And it seems broken, at least on amd64 : # cd /usr/src # make buildworld LOADER_ZFS_SUPPORT=yes [...] ===> sys (all) ===> sys/boot (all) ===> sys/boot/ficl (all) ===> sys/boot/efi (all) ===> sys/boot/efi/libefi (all) ===> sys/boot/zfs (all) ln -sf /work/src/sys/boot/zfs/../../../i386/include machine cc -O2 -pipe -march=native -I/work/src/sys/boot/zfs/../common -I/work/src/sys/boot/zfs/../.. -I. -I/work/src/sys/boot/zfs/../../../lib/libstand -I/work/src/sys/boot/zfs/../../cddl/boot/zfs -ffreestanding -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -Wformat -Wall -c /work/src/sys/boot/zfs/zfs.c /work/src/sys/boot/zfs/zfs.c:1: error: -mpreferred-stack-boundary=2 is not between 4 and 12 *** Error code 1 Stop in /work/src/sys/boot/zfs. *** Error code 1 Stop in /work/src/sys/boot. *** Error code 1 Stop in /work/src/sys. *** Error code 1 Stop in /work/src. *** Error code 1 Stop in /work/src. *** Error code 1 Stop in /work/src. > Alternatively, you might try using the brand new support for GPT that I > committed yesterday: Well, this one was broken on amd64 and is now disconnected from the build. Any advice ? Olivier -- Olivier Smedts _ ASCII ribbon campaign ( ) e-mail: olivier@gid0.org - against HTML email & vCards X www: http://www.gid0.org - against proprietary attachments / \ "Il y a seulement 10 sortes de gens dans le monde : ceux qui comprennent le binaire, et ceux qui ne le comprennent pas." 
From olivier at gid0.org Fri Nov 21 16:37:19 2008 From: olivier at gid0.org (Olivier SMEDTS) Date: Fri Nov 21 16:37:26 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> Message-ID: <20081122001256.GA16276@q.gid0.org> On Fri, Nov 21, 2008 at 10:31:31PM +0100, Olivier SMEDTS wrote: > 2008/11/20 Doug Rabson : > > > > On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: > > > >> Hello, > >> > >> I want to boot off a ZFS pool (version 13) on an USB stick for testing > >> purposes. But I'm stuck with the bsdlabel bootstrap code size... > >> I'm using a 2 hours old CURRENT. > >> > >> # kldload usb2_storage_mass > >> # kldload zfs > >> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 > >> # fdisk -BI da0 > >> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 > >> # bsdlabel -wB -b /boot/zfsboot da0s1 > >> bsdlabel: boot code /boot/zfsboot is wrong size > >> > >> Is what I'm trying to do with bsdlabel wrong ? > >> I previously tried with the default bootstrap code but I had an > >> (expected) "boot: Not ufs" error at boot. > >> > >> PS : I'm not subscribed to this list. > > > > The process for install zfsboot is a bit manual (and undocumented). Try > > something like this: > > > > # dd if=/boot/zfsboot of=/dev/da0s1 count=1 > > # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 > > It works ! > > Now I'm stuck at loader(8) prompt. > After having a look at your patch, I tried building world with > "LOADER_ZFS_SUPPORT=yes". And it seems broken, at least on amd64 : I managed to complete a fresh "buildworld LOADER_ZFS_SUPPORT=yes" with the attached patch. I took flags from sys/boot/i386/Makefile.inc without trying to really understand what was needed. Loader seems to recognize the zpool but can't "ls". 
I'll investigate that later. > # cd /usr/src > # make buildworld LOADER_ZFS_SUPPORT=yes > [...] > ===> sys (all) > ===> sys/boot (all) > ===> sys/boot/ficl (all) > ===> sys/boot/efi (all) > ===> sys/boot/efi/libefi (all) > ===> sys/boot/zfs (all) > ln -sf /work/src/sys/boot/zfs/../../../i386/include machine > cc -O2 -pipe -march=native -I/work/src/sys/boot/zfs/../common > -I/work/src/sys/boot/zfs/../.. -I. > -I/work/src/sys/boot/zfs/../../../lib/libstand > -I/work/src/sys/boot/zfs/../../cddl/boot/zfs -ffreestanding > -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 > -mno-sse3 -Wformat -Wall -c /work/src/sys/boot/zfs/zfs.c > /work/src/sys/boot/zfs/zfs.c:1: error: -mpreferred-stack-boundary=2 is > not between 4 and 12 > *** Error code 1 > > Stop in /work/src/sys/boot/zfs. > *** Error code 1 > > Stop in /work/src/sys/boot. > *** Error code 1 > > Stop in /work/src/sys. > *** Error code 1 > > Stop in /work/src. > *** Error code 1 > > Stop in /work/src. > *** Error code 1 > > Stop in /work/src. > > > > Alternatively, you might try using the brand new support for GPT that I > > committed yesterday: > > Well, this one was broken on amd64 and is now disconnected from the build. > > Any advice ? > > Olivier > > > -- > Olivier Smedts _ > ASCII ribbon campaign ( ) > e-mail: olivier@gid0.org - against HTML email & vCards X > www: http://www.gid0.org - against proprietary attachments / \ > > "Il y a seulement 10 sortes de gens dans le monde : > ceux qui comprennent le binaire, > et ceux qui ne le comprennent pas." -- Olivier Smedts _ ASCII ribbon campaign ( ) e-mail: olivier@gid0.org - against HTML email & vCards X www: http://www.gid0.org - against proprietary attachments / \ "Il y a seulement 10 sortes de gens dans le monde : ceux qui comprennent le binaire, et ceux qui ne le comprennent pas." 
-------------- next part -------------- --- sys/boot/zfs/Makefile.orig 2008-11-22 00:15:42.000000000 +0100 +++ sys/boot/zfs/Makefile 2008-11-22 00:16:22.000000000 +0100 @@ -17,6 +17,9 @@ CFLAGS+= -Wformat -Wall .if ${MACHINE_ARCH} == "amd64" +CFLAGS+= -m32 -march=i386 +LDFLAGS+= -m elf_i386_fbsd +AFLAGS+= --32 CLEANFILES+= machine machine: ln -sf ${.CURDIR}/../../../i386/include machine From dfr at rabson.org Sat Nov 22 01:27:37 2008 From: dfr at rabson.org (Doug Rabson) Date: Sat Nov 22 01:27:48 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> Message-ID: <111E2DF2-62A3-40E7-96D3-A59BFDF2910C@rabson.org> On 21 Nov 2008, at 21:31, Olivier SMEDTS wrote: > 2008/11/20 Doug Rabson : >> >> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >> >>> Hello, >>> >>> I want to boot off a ZFS pool (version 13) on an USB stick for >>> testing >>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>> I'm using a 2 hours old CURRENT. >>> >>> # kldload usb2_storage_mass >>> # kldload zfs >>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>> # fdisk -BI da0 >>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>> bsdlabel: boot code /boot/zfsboot is wrong size >>> >>> Is what I'm trying to do with bsdlabel wrong ? >>> I previously tried with the default bootstrap code but I had an >>> (expected) "boot: Not ufs" error at boot. >>> >>> PS : I'm not subscribed to this list. >> >> The process for install zfsboot is a bit manual (and undocumented). >> Try >> something like this: >> >> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 > > It works ! > > Now I'm stuck at loader(8) prompt. 
> After having a look at your patch, I tried building world with > "LOADER_ZFS_SUPPORT=yes". And it seems broken, at least on amd64 : > > # cd /usr/src > # make buildworld LOADER_ZFS_SUPPORT=yes > [...] > > >> Alternatively, you might try using the brand new support for GPT >> that I >> committed yesterday: > > Well, this one was broken on amd64 and is now disconnected from the > build. > > Any advice ? I will sort out the amd64 build problems. My test machine was i386 so I missed it. From pjd at FreeBSD.org Sat Nov 22 15:01:57 2008 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Sat Nov 22 15:02:28 2008 Subject: ZFS Drive change best practice. In-Reply-To: <49249169.9010406@dannysplace.net> References: <49240A80.5010106@dannysplace.net> <20081119125138.GA86942@icarus.home.lan> <20081119211227.GA2553@garage.freebsd.pl> <49249169.9010406@dannysplace.net> Message-ID: <20081122230151.GB2016@garage.freebsd.pl> On Thu, Nov 20, 2008 at 08:21:29AM +1000, Danny Carroll wrote: > Pawel Jakub Dawidek wrote: > > After some more tests I'll commit changes that make ZFS to always depend > > on metadata only. > > > > So I guess that means in the future, you could power down a system, > re-arrange the disk positions and when you powered back up ZFS would > just work? Yes. I committed needed changes earlier today. > Could I also ask what the usual lead time is for ZFS changes to go from > current -> stable? I'm just curious. There is no fixed time, it all depends on much testing will it receive and how many bugs will be there plus when *at() syscalls will be MFCed. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081122/950e3b96/attachment.pgp From linimon at FreeBSD.org Sun Nov 23 10:01:51 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sun Nov 23 10:02:03 2008 Subject: kern/129084: [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZE(65536) Message-ID: <200811231801.mANI1pV0011250@freefall.freebsd.org> Old Synopsis: udf panic: getblk: size(67584) > MAXBSIZE(65536) New Synopsis: [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZE(65536) Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sun Nov 23 18:01:23 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=129084 From bugmaster at FreeBSD.org Mon Nov 24 03:07:12 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Nov 24 03:07:55 2008 Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org Message-ID: <200811241107.mAOB7BXj019887@freefall.freebsd.org> Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/129084 fs [udf] [panic] udf panic: getblk: size(67584) > MAXBSIZ f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/128633 fs [zfs] [lor] lock order reversal in zfs o kern/128514 fs [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad o kern/128173 fs [ext2fs] ls gives "Input/output error" on mounted ext3 o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127213 fs [tmpfs] sendfile on tmpfs data corruption o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125536 fs [ext2fs] ext 2 mounts cleanly but fails on commands li o kern/125149 fs [nfs][panic] changing into .zfs dir from nfs client ca o kern/124621 fs [ext3] Cannot mount ext2fs partition o kern/122888 fs [zfs] zfs hang w/ prefetch on, zil off while running t o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/118249 fs mv(1): moving a directory changes its mtime o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D 25 problems total. 
From josh.carroll at gmail.com Mon Nov 24 11:53:43 2008 From: josh.carroll at gmail.com (Josh Carroll) Date: Mon Nov 24 11:53:49 2008 Subject: ext2 inode size patch - RE: PR kern/124621 Message-ID: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> A while back, I submitted a patch for PR kern/124621, which allows the mounting of an ext2(3) filesystem created with an inode size other than 128 bytes. The e2fsprogs' default is now 256, so file systems created on newer Linux distributions or with the port will not be mountable. I was hopeful this would get committed in time for 7.1-RELEASE (and 6.4-RELEASE), however the PR remains open. If there is an issue with the patch itself, I would be glad to fix it. I'm posting to fs@ because hopefully some folks more experienced with file system/kernel code can have a look and see if the patch is ok to commit. I've seen a few people in ##freebsdhelp on Freenode as well as #freebsdhelp on EFnet with this problem, and have had them test this patch out with success (and no obvious adverse effects), so I was hoping it could be committed in time for 7.1-RELEASE. Since 6.4 is so close to release, I'm not so sure about that. Anyway, I would appreciate it if the patch could get some review to see if it can be committed in time.
Regards, Josh From joao.barros at gmail.com Mon Nov 24 18:09:08 2008 From: joao.barros at gmail.com (Joao Barros) Date: Mon Nov 24 18:09:15 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> Message-ID: <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> On Fri, Nov 21, 2008 at 9:31 PM, Olivier SMEDTS wrote: > 2008/11/20 Doug Rabson : >> >> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >> >>> Hello, >>> >>> I want to boot off a ZFS pool (version 13) on an USB stick for testing >>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>> I'm using a 2 hours old CURRENT. >>> >>> # kldload usb2_storage_mass >>> # kldload zfs >>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>> # fdisk -BI da0 >>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>> bsdlabel: boot code /boot/zfsboot is wrong size >>> >>> Is what I'm trying to do with bsdlabel wrong ? >>> I previously tried with the default bootstrap code but I had an >>> (expected) "boot: Not ufs" error at boot. >>> >>> PS : I'm not subscribed to this list. >> >> The process for install zfsboot is a bit manual (and undocumented). Try >> something like this: >> >> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 >> >>Alternatively, you might try using the brand new support for GPT that I committed yesterday: >> >> # gpt create -f da0 >> # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 >> # gpt add -t freebsd-zfs da0 >> # zpool create mypool da0p2 > > It works ! > > Now I'm stuck at loader(8) prompt. That's a me too. 
I tried this under vmware with LOADER_ZFS_SUPPORT=yes on make.conf: # gpart create -s gpt ad0 # gpart add -b 34 -s 128 -t freebsd-boot ad0 ad0p1 added # gpart add -b 162 -s 15078327 -t freebsd-zfs ad0 ad0p2 added # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0 # zpool create tank ad0p2 # zpool set bootfs = tank tank lsdev on loader shows: cd devices: disk devices: disk0: BIOS drive c: disk0p1: FreeBSD boot disk0p2: FreeBSD ZFS pxe devices: zfs devices: Any hints? -- Joao Barros From linimon at FreeBSD.org Mon Nov 24 20:36:38 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Nov 24 20:36:44 2008 Subject: kern/129152: [panic] non-userfriendly panic when trying to mount(8) non-existing or wrong rootdev Message-ID: <200811250436.mAP4acNs021186@freefall.freebsd.org> Old Synopsis: non-userfriendly panic when trying to mount non-existing or wrong rootdev New Synopsis: [panic] non-userfriendly panic when trying to mount(8) non-existing or wrong rootdev Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Nov 25 04:35:55 UTC 2008 Responsible-Changed-Why: Sounds like this could be in the filesystem code. http://www.freebsd.org/cgi/query-pr.cgi?pr=129152 From linimon at FreeBSD.org Mon Nov 24 21:12:47 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Nov 24 21:12:58 2008 Subject: kern/129148: [zfs] panic on concurrent writing & rollback Message-ID: <200811250512.mAP5ClfA054764@freefall.freebsd.org> Synopsis: [zfs] panic on concurrent writing & rollback Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Nov 25 05:12:35 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=129148 From avg at icyb.net.ua Tue Nov 25 03:55:48 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue Nov 25 03:55:55 2008 Subject: unionfs below: only read-only? 
Message-ID: <492BE7C1.8030503@icyb.net.ua> $ uname -srm FreeBSD 7.1-PRERELEASE amd64 $ mount -t unionfs -o below -o rw /usr/ports/distfiles /export/j386/usr/ports/distfiles mount_unionfs: /export/j386/usr/ports/distfiles: : Operation not supported Exit 71 $ mount -t unionfs -o below -o ro /usr/ports/distfiles /export/j386/usr/ports/distfiles Exit 0 My intention was to mount /usr/ports/distfiles into a jail but in such way that jail can not modify any "global" files but can add some files of its own. This is similar to the last example in mount_unionfs(8): mount -t unionfs -o noatime -o below /sys $HOME/sys But it seems that with -o below I can mount a filesystem only in RO mode which is totally useless because I can not write anything to "uniondir" i.e. /export/j386/usr/ports/distfiles: $ cp ~/tar/diablo-caffe-freebsd6-i386-1.5.0_07-b01.tar.bz2 /export/j386/usr/ports/distfiles/ cp: /export/j386/usr/ports/distfiles/diablo-caffe-freebsd6-i386-1.5.0_07-b01.tar.bz2: Read-only file system Exit 1 So effectively this is a nullfs ro mount. Is there a buglet or am I missing something? -- Andriy Gapon From pjd at FreeBSD.org Tue Nov 25 05:36:34 2008 From: pjd at FreeBSD.org (pjd@FreeBSD.org) Date: Tue Nov 25 05:36:40 2008 Subject: kern/129148: [zfs] panic on concurrent writing & rollback Message-ID: <200811251336.mAPDaYZh067008@freefall.freebsd.org> Synopsis: [zfs] panic on concurrent writing & rollback Responsible-Changed-From-To: freebsd-fs->pjd Responsible-Changed-By: pjd Responsible-Changed-When: wto 25 lis 13:36:20 2008 UTC Responsible-Changed-Why: Let me see... 
http://www.freebsd.org/cgi/query-pr.cgi?pr=129148 From kostikbel at gmail.com Tue Nov 25 06:06:09 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Nov 25 06:06:22 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> Message-ID: <20081125140601.GH2042@deviant.kiev.zoral.com.ua> On Mon, Nov 24, 2008 at 02:29:57PM -0500, Josh Carroll wrote: > A while back, I submitted a patch for PR kern/124621, which allows the > mounting of an ext2(3) filesystem created with an inode size other > than 128. The e2fsprogs' default is now 256, so file systems created > on newer Linux distributions or with the port will not be mountable. > > I was hopeful this would get committed in time for 7.1-RELEASE (and > 6.4-RELEASE), however the PR remains open. > > If there is an issue with the patch itself, I would be glad to fix it. > I'm posting to fs@ because hopefully some folks more experienced with > file system/kernel code can have a look and see if the patch is ok to > commit. > > I've seen a few people in ##freebsdhelp on Freenode as well as > #freebsdhelp on EFnet with this problem, and have had them test this > patch out with success (and no obvious adverse effects), so I was > hoping it could committed in time for 7.1-RELEASE. Since 6.4 is so > close to release, I'm not so sure about that. > > Anyway, I would appreciate it if the patch could get some review to > see if it can be committed in time. I already expressed my opinion on http://lists.freebsd.org/pipermail/freebsd-hackers/2008-September/025933.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081125/8dd4f85c/attachment.pgp From josh.carroll at gmail.com Tue Nov 25 06:17:07 2008 From: josh.carroll at gmail.com (Josh Carroll) Date: Tue Nov 25 06:18:32 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <20081125140601.GH2042@deviant.kiev.zoral.com.ua> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> Message-ID: <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> > I already expressed my opinion on > http://lists.freebsd.org/pipermail/freebsd-hackers/2008-September/025933.html > Sorry, I do not subscribe to hackers@ so I did not see that message. So what do you recommend is done to further test it? I tested simple things like copies, writes, deletes, etc on a memory disk, but nothing formal. I don't (currently) have a spare blank disk or space on an existing disk to test on physical media, but I can look into doing so once my development box is back up and running. I'm also curious what about the changes you feel are dangerous, so I can target the testing to exercise the boundary conditions or circumstances you think this patch would elicit problems. 
Thanks, Josh From kostikbel at gmail.com Tue Nov 25 06:28:31 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Nov 25 06:33:26 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> Message-ID: <20081125142827.GI2042@deviant.kiev.zoral.com.ua> On Tue, Nov 25, 2008 at 09:17:06AM -0500, Josh Carroll wrote: > > I already expressed my opinion on > > http://lists.freebsd.org/pipermail/freebsd-hackers/2008-September/025933.html > > > > Sorry, I do not subscribe to hackers@ so I did not see that message. > So what do you recommend is done to further test it? I tested simple > things like copies, writes, deletes, etc on a memory disk, but nothing > formal. I don't (currently) have a spare blank disk or space on an > existing disk to test on physical media, but I can look into doing so > once my development box is back up and running. > > I'm also curious what about the changes you feel are dangerous, so > I can target the testing to exercise the boundary conditions or > circumstances you think this patch would elicit problems. I am not suggesting testing. I suggest understanding what inode metadata is stored in the added 128 bytes and evaluating whether this information can be ignored without dangerous consequences for filesystem consistency or user data. -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081125/5e1343f7/attachment.pgp From josh.carroll at gmail.com Tue Nov 25 06:57:19 2008 From: josh.carroll at gmail.com (Josh Carroll) Date: Tue Nov 25 06:57:25 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <20081125142827.GI2042@deviant.kiev.zoral.com.ua> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> Message-ID: <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> > I do not suggest testing. I suggest understand what inode metadata is stored > in the added 128 bytes and evaluate whether this information can be ignored > without dangerous consequences for filesystem consistency or user data. > Well, to be clear I didn't just double the size of the inode table. It is dynamically determined based on the data structure. I'm not a file system expert (to call me a novice would probably be stretching it), so I'm hoping someone more versed can chime in. All the code does is query the data structure (specifically, the s_inode_size field of the structure) and use that value instead of blindly assuming an inode size of 128. I don't think it's a matter of what is done with the extra bits, since it's just querying the size of an already created filesystem. 
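To make the check described above concrete, here is a minimal userland sketch in Python. It is not the patch from PR kern/124621; the function names are hypothetical, and the field offsets (s_magic at byte 56, s_rev_level at 76, s_inode_size at 88, superblock at byte 1024 of the volume) are taken from the commonly documented ext2 superblock layout and should be treated as assumptions. It only reports the inode size; it says nothing about what the bytes beyond 128 contain.

```python
import struct

EXT2_SUPERBLOCK_OFFSET = 1024      # superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53                # value expected in s_magic
EXT2_GOOD_OLD_REV = 0              # revision 0 has no s_inode_size field
EXT2_GOOD_OLD_INODE_SIZE = 128     # fixed inode size for revision 0


def ext2_inode_size(superblock: bytes) -> int:
    """Return the on-disk inode size described by a raw ext2 superblock."""
    if len(superblock) < 90:
        raise ValueError("superblock buffer too short")
    (magic,) = struct.unpack_from("<H", superblock, 56)       # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    (rev_level,) = struct.unpack_from("<I", superblock, 76)   # s_rev_level
    if rev_level == EXT2_GOOD_OLD_REV:
        # Old-format filesystems always use 128-byte inodes.
        return EXT2_GOOD_OLD_INODE_SIZE
    (inode_size,) = struct.unpack_from("<H", superblock, 88)  # s_inode_size
    # Reject values that are smaller than 128 or not a power of two.
    if inode_size < EXT2_GOOD_OLD_INODE_SIZE or inode_size & (inode_size - 1):
        raise ValueError("implausible inode size: %d" % inode_size)
    return inode_size


def read_inode_size(image_path: str) -> int:
    """Read the superblock from a device or image file and report the inode size."""
    with open(image_path, "rb") as image:
        image.seek(EXT2_SUPERBLOCK_OFFSET)
        return ext2_inode_size(image.read(1024))
```

Calling read_inode_size() on a filesystem image made with a recent e2fsprogs would be expected to report 256, and 128 on an older one.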
Thanks, Josh From kostikbel at gmail.com Tue Nov 25 07:03:49 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Nov 25 07:03:55 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> Message-ID: <20081125150342.GL2042@deviant.kiev.zoral.com.ua> On Tue, Nov 25, 2008 at 09:57:18AM -0500, Josh Carroll wrote: > > I do not suggest testing. I suggest understand what inode metadata is stored > > in the added 128 bytes and evaluate whether this information can be ignored > > without dangerous consequences for filesystem consistency or user data. > > > > Well, to be clear I didn't just double the size of the inode table. It > is dynamically determined based on the data structure. I'm not a file > system expert (to call me a novice would probably be stretching it), > so I'm hoping someone more versed can chime in. > > All the code does is query the data structure (specifically, the > s_inode_size field of the structure) and use that value instead of > blindly assuming an inode size of 128. I don't think it's a matter of > what is done with the extra bits, since it's just querying the size of > an already created filesystem. OK, let me describe my concern once more. I do not object to checking the inode size. But if the inode size has changed, then some data has been added to the inode, and that data could (and usually does; otherwise why extend it?) change the interpretation of the inode. Thus, we need to verify that simply ignoring the added fields does not damage the filesystem or corrupt user data. Verification != testing. Until this work is done, the patch cannot go into the tree.
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081125/5b0a9f8b/attachment.pgp From josh.carroll at gmail.com Tue Nov 25 07:11:10 2008 From: josh.carroll at gmail.com (Josh Carroll) Date: Tue Nov 25 07:11:16 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <20081125150342.GL2042@deviant.kiev.zoral.com.ua> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> Message-ID: <8cb6106e0811250711x39775d2asd601e8a53eaaeac7@mail.gmail.com> > Ok, I describe my concern once more. I do not object against the checking > of the inode size. But, if inode size is changed, then some data is added > to the inode, that could (and usually does, otherwise why extend it ?) > change intrerpetation of the inode. Thus, we need a verification of the > fact that simply ignoring added fields does not damage filesystem or > cause user data corruption. Verification != testing. Ok, I see your point. I will do some more research into the ext2 inode structure on disk and see what happens when inode size > 128. > > Until we make this work, patch cannot go into the tree. > Understood, thanks for your attention. Regards, Josh From peter at simons-rock.edu Tue Nov 25 08:18:25 2008 From: peter at simons-rock.edu (Peter C. 
Lai) Date: Tue Nov 25 08:19:09 2008 Subject: ext2 inode size patch - RE: PR kern/124621 In-Reply-To: <8cb6106e0811250711x39775d2asd601e8a53eaaeac7@mail.gmail.com> References: <8cb6106e0811241129o642dcf28re4ae177c8ccbaa25@mail.gmail.com> <20081125140601.GH2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250617q5fffb41exe20dfb8314fc4a9d@mail.gmail.com> <20081125142827.GI2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250657q6fdf08b0x1e94f35fd0a7ed4f@mail.gmail.com> <20081125150342.GL2042@deviant.kiev.zoral.com.ua> <8cb6106e0811250711x39775d2asd601e8a53eaaeac7@mail.gmail.com> Message-ID: <20081125154826.GI27780@cesium.hyperfine.info> On 2008-11-25 10:11:09AM -0500, Josh Carroll wrote: > > Ok, I describe my concern once more. I do not object against the checking > > of the inode size. But, if inode size is changed, then some data is added > > to the inode, that could (and usually does, otherwise why extend it ?) > > change intrerpetation of the inode. Thus, we need a verification of the > > fact that simply ignoring added fields does not damage filesystem or > > cause user data corruption. Verification != testing. > > Ok, I see your point. I will do some more research into the ext2 inode > structure on disk and see what happens when inode size > 128. Possibly overstating the obvious, but since e2fsprogs were the ones who actually initiated the change in default inode size, maybe start digging through that to see what it actually does with the other 128 bytes (the changelog and some posts on comp.os.linux seem to suggest is that it has something to do with optimizing extended attributes/acls; basically extended acls being inaccessible to kernels that ignore the extra data, and 2.4 kernels refusing to mount those filesystems at all (presumably due to the same assumption we've been making)). -- =========================================================== Peter C. Lai | Bard College at Simon's Rock Systems Administrator | 84 Alford Rd. Information Technology Svcs. | Gt. 
Barrington, MA 01230 USA peter AT simons-rock.edu | (413) 528-7428 =========================================================== From linimon at FreeBSD.org Tue Nov 25 09:34:59 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Tue Nov 25 09:35:06 2008 Subject: kern/129174: [nfs][zfs][panic] NFS v3 Panic when under high load exporting ZFS file system Message-ID: <200811251734.mAPHYwqv049981@freefall.freebsd.org> Synopsis: [nfs][zfs][panic] NFS v3 Panic when under high load exporting ZFS file system Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Nov 25 17:34:50 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=129174 From doconnor at gsoft.com.au Tue Nov 25 15:41:09 2008 From: doconnor at gsoft.com.au (Daniel O'Connor) Date: Tue Nov 25 15:41:21 2008 Subject: Unique ID for UFS? In-Reply-To: References: <200811201215.42008.doconnor@gsoft.com.au> Message-ID: <200811261011.06490.doconnor@gsoft.com.au> On Thursday 20 November 2008 20:58:25 Ivan Voras wrote: > Daniel O'Connor wrote: > > Hi, > > I am wondering if there is a unique ID generated for each UFS already? If > > not would it be possible to add one somehow? > > > > There is glabel, but I think having a UUID embedded in the FS would be > > very handy for automation andwould prevent accidents that glabel can > > cause. > > > > So, there could be a gfsid module that reads IDs from the FS (NTFS, > > ext2/3, UFS) and creates device nodes to allow access. > > Looking at the output of dumpfs, there is an 64-bit numeric "id" field > that changes from file system to file system so this might it: > > magic 19540119 (UFS2) time Sat Nov 15 04:16:42 2008 > superblock location 65536 id [ 46ea67b4 178d71a1 ] > > (but judging from how the value changes on my file systems it might be > related to the timestamp). Yeah, on my system I have.. 
/ 45c14592 caf91460 /var 45c1459d 2461df81 /usr 45c14596 fc5b2e49 Ah I think I found it in newfs.. /usr/src/sbin/newfs/mkfs.c lines 407 & 408.. sblock.fs_id[0] = (long)utime; sblock.fs_id[1] = newfs_random(); > If this is a usable ID, it should be trivial to make glabel create IDs > nodes (i.e. /dev/ufs/46ea67b4178d71a1). Yes indeed. I guess there's no excuse for me not to write such a thing now ;) PS you didn't cc me :) -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081125/63122015/attachment.pgp From daichi at freebsd.org Wed Nov 26 00:45:46 2008 From: daichi at freebsd.org (Daichi GOTO) Date: Wed Nov 26 00:45:53 2008 Subject: unionfs below: only read-only? In-Reply-To: <492BE7C1.8030503@icyb.net.ua> References: <492BE7C1.8030503@icyb.net.ua> Message-ID: <492D0882.7070500@freebsd.org> Andriy Gapon wrote: > $ uname -srm > FreeBSD 7.1-PRERELEASE amd64 > > $ mount -t unionfs -o below -o rw /usr/ports/distfiles > /export/j386/usr/ports/distfiles > mount_unionfs: /export/j386/usr/ports/distfiles: : Operation not supported > Exit 71 > > $ mount -t unionfs -o below -o ro /usr/ports/distfiles > /export/j386/usr/ports/distfiles > Exit 0 I need more information to think about this. Please send the output of df(1) and mount(8). Also, check the permissions of /export/j386/usr/ports/distfiles. I suspect that the permissions of /export/j386/usr/ports/distfiles are read-only.
> My intention was to mount /usr/ports/distfiles into a jail but in such > way that jail can not modify any "global" files but can add some files > of its own. > This is similar to the last example in mount_unionfs(8): > mount -t unionfs -o noatime -o below /sys $HOME/sys > > But it seems that with -o below I can mount a filesystem only in RO mode > which is totally useless because I can not write anything to "uniondir" > i.e. /export/j386/usr/ports/distfiles: > $ cp ~/tar/diablo-caffe-freebsd6-i386-1.5.0_07-b01.tar.bz2 > /export/j386/usr/ports/distfiles/ > cp: > /export/j386/usr/ports/distfiles/diablo-caffe-freebsd6-i386-1.5.0_07-b01.tar.bz2: > Read-only file system > Exit 1 > > So effectively this is a nullfs ro mount. > > Is there a buglet or am I missing something? > -- Daichi GOTO, http://people.freebsd.org/~daichi From ivoras at gmail.com Wed Nov 26 01:48:01 2008 From: ivoras at gmail.com (Ivan Voras) Date: Wed Nov 26 01:48:08 2008 Subject: Unique ID for UFS? In-Reply-To: <200811261011.06490.doconnor@gsoft.com.au> References: <200811201215.42008.doconnor@gsoft.com.au> <200811261011.06490.doconnor@gsoft.com.au> Message-ID: <9bbcef730811260147n473d213y10a5dc93273e4e5d@mail.gmail.com> 2008/11/26 Daniel O'Connor : > On Thursday 20 November 2008 20:58:25 Ivan Voras wrote: >> magic 19540119 (UFS2) time Sat Nov 15 04:16:42 2008 >> superblock location 65536 id [ 46ea67b4 178d71a1 ] > Yes indeed. > I guess there's no excuse for me not to write such a thing now ;) I've created a patch for it already but I've encountered a bug (either in label, slice or geom tasting) that makes having multiple labels for the same device almost unusable. > PS you didn't cc me :) I'm normally accessing the lists through NNTP. From avg at icyb.net.ua Wed Nov 26 02:12:58 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 02:13:05 2008 Subject: endless zpool scrub? Message-ID: <492D2122.4050203@icyb.net.ua> I noticed that zpool scrub on a certain pool runs "like forever". 
I decided to monitor its progress using periodic zpool status command, once in 10 seconds. Here's a snippet from the capture around an interesting point. Please notice two highlighted reports ('oops'). This is stable/7. Thank you in advance for insights/comments. pool: tank state: ONLINE scrub: scrub in progress, 78.87% done, 0h31m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 78.94% done, 0h31m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 78.97% done, 0h31m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 233h25m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 ****oops, went back to 0%**** errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 259h47m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 186h50m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 186h45m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 306h30m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 292h19m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 231h16m to go config: 
NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 283h55m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 232h8m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 164h37m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 183h40m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 166h21m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scrub: scrub stopped with 0 errors on Wed Nov 26 12:02:02 2008 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 ****oops, scrub is reported as stopped**** errors: No known data errors pool: tank state: ONLINE scrub: scrub in progress, 0.00% done, 127h29m to go config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ad6s2d ONLINE 0 0 0 errors: No known data errors -- Andriy Gapon From doconnor at gsoft.com.au Wed Nov 26 02:32:17 2008 From: doconnor at gsoft.com.au (Daniel O'Connor) Date: Wed Nov 26 02:32:23 2008 Subject: Unique ID for UFS? 
In-Reply-To: <9bbcef730811260147n473d213y10a5dc93273e4e5d@mail.gmail.com> References: <200811201215.42008.doconnor@gsoft.com.au> <200811261011.06490.doconnor@gsoft.com.au> <9bbcef730811260147n473d213y10a5dc93273e4e5d@mail.gmail.com> Message-ID: <200811262102.15437.doconnor@gsoft.com.au> On Wednesday 26 November 2008 20:17:58 Ivan Voras wrote: > 2008/11/26 Daniel O'Connor : > > On Thursday 20 November 2008 20:58:25 Ivan Voras wrote: > >> magic 19540119 (UFS2) time Sat Nov 15 04:16:42 2008 > >> superblock location 65536 id [ 46ea67b4 178d71a1 ] > > > > Yes indeed. > > I guess there's no excuse for me not to write such a thing now ;) > > I've created a patch for it already but I've encountered a bug (either > in label, slice or geom tasting) that makes having multiple labels for > the same device almost unusable. Hmm OK.. I must confess I only loaded my module and checked that it created the device nodes, I haven't tried mounting it yet :) > > PS you didn't cc me :) > > I'm normally accessing the lists through NNTP. Fair enough. -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081126/88198138/attachment.pgp From ivoras at gmail.com Wed Nov 26 02:39:38 2008 From: ivoras at gmail.com (Ivan Voras) Date: Wed Nov 26 02:39:44 2008 Subject: Unique ID for UFS? 
In-Reply-To: <200811262102.15437.doconnor@gsoft.com.au> References: <200811201215.42008.doconnor@gsoft.com.au> <200811261011.06490.doconnor@gsoft.com.au> <9bbcef730811260147n473d213y10a5dc93273e4e5d@mail.gmail.com> <200811262102.15437.doconnor@gsoft.com.au> Message-ID: <9bbcef730811260239k68ccec3ah7acf480b7d1b765e@mail.gmail.com> 2008/11/26 Daniel O'Connor : >> I've created a patch for it already but I've encountered a bug (either >> in label, slice or geom tasting) that makes having multiple labels for >> the same device almost unusable. > > Hmm OK.. I must confess I only loaded my module and checked that it created > the device nodes, I haven't tried mounting it yet :) See if re-tasting works after labels have been mounted and unmounted (i.e. if you get both labels back after spoiling destroys them). From andrew at modulus.org Wed Nov 26 03:04:06 2008 From: andrew at modulus.org (Andrew Snow) Date: Wed Nov 26 03:04:12 2008 Subject: endless zpool scrub? In-Reply-To: <492D2122.4050203@icyb.net.ua> References: <492D2122.4050203@icyb.net.ua> Message-ID: <492D2D0F.5020300@modulus.org> Andriy Gapon wrote: > > I noticed that zpool scrub on a certain pool runs "like forever". > I decided to monitor its progress using periodic zpool status command, > once in 10 seconds. > Here's a snippet from the capture around an interesting point. > Please notice two highlighted reports ('oops'). > This is stable/7. Its a bug, and is fixed in 8-current. From glz at hidden-powers.com Wed Nov 26 03:24:26 2008 From: glz at hidden-powers.com (Goran Lowkrantz) Date: Wed Nov 26 03:24:58 2008 Subject: endless zpool scrub? In-Reply-To: <492D2122.4050203@icyb.net.ua> References: <492D2122.4050203@icyb.net.ua> Message-ID: --On November 26, 2008 12:12:50 +0200 Andriy Gapon wrote: > > I noticed that zpool scrub on a certain pool runs "like forever". > I decided to monitor its progress using periodic zpool status command, > once in 10 seconds. 
> Here's a snippet from the capture around an interesting point. > Please notice two highlighted reports ('oops'). > This is stable/7. > Thank you in advance for insights/comments. > > > pool: tank > state: ONLINE > scrub: scrub in progress, 78.87% done, 0h31m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 78.94% done, 0h31m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 78.97% done, 0h31m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 233h25m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > ****oops, went back to 0%**** > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 259h47m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 186h50m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 186h45m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 306h30m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 292h19m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No 
known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 231h16m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 283h55m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 232h8m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 164h37m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 183h40m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 166h21m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub stopped with 0 errors on Wed Nov 26 12:02:02 2008 > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > ****oops, scrub is reported as stopped**** > > errors: No known data errors > pool: tank > state: ONLINE > scrub: scrub in progress, 0.00% done, 127h29m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > ad6s2d ONLINE 0 0 0 > > errors: No known data errors > Do you have a script or something that creates snapshots running? I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a running scrub is reset by a snapshot creation. If you are using this script, we have tested a modified version that suspends snapshot creation on a pool that is scrubbing. 
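As a rough illustration of the idea of skipping snapshots while a scrub is running, here is a minimal Python sketch. It is not the modified zfs-snapshot-mgmt script mentioned above; the function names are hypothetical, and it assumes the "scrub:" line format shown in the zpool status output quoted in this thread (other ZFS versions may label this line differently).

```python
import subprocess


def scrub_in_progress(status_text: str) -> bool:
    """Return True if `zpool status` output reports a running scrub."""
    for line in status_text.splitlines():
        stripped = line.strip()
        # Assumed format: "scrub: scrub in progress, 78.87% done, 0h31m to go"
        if stripped.startswith("scrub:"):
            return "scrub in progress" in stripped
    return False


def pool_is_scrubbing(pool: str) -> bool:
    """Run `zpool status <pool>` and check its scrub line."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True, text=True, check=True,
    )
    return scrub_in_progress(result.stdout)
```

A snapshot job could call pool_is_scrubbing("tank") first and postpone its work when the result is true, so that the scrub is not reset.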
/glz --- "There is hopeful symbolism in the fact that flags do not wave in a vacuum." -- Arthur C. Clarke From avg at icyb.net.ua Wed Nov 26 04:33:43 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 04:33:51 2008 Subject: endless zpool scrub? In-Reply-To: <492D2D0F.5020300@modulus.org> References: <492D2122.4050203@icyb.net.ua> <492D2D0F.5020300@modulus.org> Message-ID: <492D421E.7090109@icyb.net.ua> on 26/11/2008 13:03 Andrew Snow said the following: > Andriy Gapon wrote: >> >> I noticed that zpool scrub on a certain pool runs "like forever". >> I decided to monitor its progress using periodic zpool status command, >> once in 10 seconds. >> Here's a snippet from the capture around an interesting point. >> Please notice two highlighted reports ('oops'). >> This is stable/7. > > Its a bug, and is fixed in 8-current. Andrew, are you speaking of something related to what Goran suggested? -- Andriy Gapon From avg at icyb.net.ua Wed Nov 26 04:42:22 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 04:42:27 2008 Subject: endless zpool scrub? In-Reply-To: References: <492D2122.4050203@icyb.net.ua> Message-ID: <492D4426.1080307@icyb.net.ua> on 26/11/2008 13:24 Goran Lowkrantz said the following: > Do you have a script or something that creates snapshots running? > > I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a > running scrub is reset by a snapshot creation. Yes, I've just realized I had recently enabled zfs-snapshot-mgmt on this machine. And the time when scrub is "disrupted" is the time when snapshots are taken. Thank you! > If you are using this script, we have tested a modified version that > suspends snapshot creation on a pool that is scrubbing. I would be very grateful for the patch! -- Andriy Gapon From morganw at chemikals.org Wed Nov 26 04:50:42 2008 From: morganw at chemikals.org (Wes Morgan) Date: Wed Nov 26 04:50:49 2008 Subject: endless zpool scrub? 
In-Reply-To: <492D4426.1080307@icyb.net.ua> References: <492D2122.4050203@icyb.net.ua> <492D4426.1080307@icyb.net.ua> Message-ID: On Wed, 26 Nov 2008, Andriy Gapon wrote: > on 26/11/2008 13:24 Goran Lowkrantz said the following: >> Do you have a script or something that creates snapshots running? >> >> I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a >> running scrub is reset by a snapshot creation. > > Yes, I've just realized I had recently enabled zfs-snapshot-mgmt on this > machine. And the time when scrub is "disrupted" is the time when > snapshots are taken. Thank you! > >> If you are using this script, we have tested a modified version that >> suspends snapshot creation on a pool that is scrubbing. > > I would be very grateful for the patch! > > Just FYI, once Pawel merges the recent ZFS patches to -stable (please don't ask when, though!), snapshotting will not interfere with a scrub. From avg at icyb.net.ua Wed Nov 26 04:52:04 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 04:52:10 2008 Subject: endless zpool scrub? In-Reply-To: References: <492D2122.4050203@icyb.net.ua> <492D4426.1080307@icyb.net.ua> Message-ID: <492D466B.1080609@icyb.net.ua> on 26/11/2008 14:50 Wes Morgan said the following: > On Wed, 26 Nov 2008, Andriy Gapon wrote: > >> on 26/11/2008 13:24 Goran Lowkrantz said the following: >>> Do you have a script or something that creates snapshots running? >>> >>> I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a >>> running scrub is reset by a snapshot creation. >> >> Yes, I've just realized I had recently enabled zfs-snapshot-mgmt on this >> machine. And the time when scrub is "disrupted" is the time when >> snapshots are taken. Thank you! >> >>> If you are using this script, we have tested a modified version that >>> suspends snapshot creation on a pool that is scrubbing. >> >> I would be very grateful for the patch! 
>> >> > > Just FYI, once Pawel merges the recent ZFS patches to -stable (please > don't ask when, though!), snapshotting will not interfere with a scrub. > Thanks! I guess this is what Andrew also meant. -- Andriy Gapon From glz at hidden-powers.com Wed Nov 26 04:59:41 2008 From: glz at hidden-powers.com (Goran Lowkrantz) Date: Wed Nov 26 04:59:48 2008 Subject: endless zpool scrub? In-Reply-To: <492D4426.1080307@icyb.net.ua> References: <492D2122.4050203@icyb.net.ua> <492D4426.1080307@icyb.net.ua> Message-ID: <4EECB3AA317601AE36BE0DB5@syn> --On November 26, 2008 14:42:14 +0200 Andriy Gapon wrote: > on 26/11/2008 13:24 Goran Lowkrantz said the following: >> Do you have a script or something that creates snapshots running? >> >> I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a >> running scrub is reset by a snapshot creation. > > Yes, I've just realized I had recently enabled zfs-snapshot-mgmt on this > machine. And the time when scrub is "disrupted" is the time when > snapshots are taken. Thank you! > >> If you are using this script, we have tested a modified version that >> suspends snapshot creation on a pool that is scrubbing. > > I would be very grateful for the patch! > > -- > Andriy Gapon Attached is the latest we have from the author. It suspends snapshots during scrubs and support recursive snapshots. /glz --- "There is hopeful symbolism in the fact that flags do not wave in a vacuum." -- Arthur C. Clarke -------------- next part -------------- A non-text attachment was scrubbed... Name: zfs-snapshot-mgmt Type: application/octet-stream Size: 6093 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20081126/f3ae88d9/zfs-snapshot-mgmt.obj From avg at icyb.net.ua Wed Nov 26 05:14:15 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 05:14:22 2008 Subject: endless zpool scrub? 
In-Reply-To: <4EECB3AA317601AE36BE0DB5@syn> References: <492D2122.4050203@icyb.net.ua> <492D4426.1080307@icyb.net.ua> <4EECB3AA317601AE36BE0DB5@syn> Message-ID: <492D4BA2.90905@icyb.net.ua> on 26/11/2008 14:59 Goran Lowkrantz said the following: > --On November 26, 2008 14:42:14 +0200 Andriy Gapon wrote: > >> on 26/11/2008 13:24 Goran Lowkrantz said the following: >>> Do you have a script or something that creates snapshots running? >>> >>> I used sysutils/zfs-snapshot-mgmt and did see the same as you, as a >>> running scrub is reset by a snapshot creation. >> >> Yes, I've just realized I had recently enabled zfs-snapshot-mgmt on this >> machine. And the time when scrub is "disrupted" is the time when >> snapshots are taken. Thank you! >> >>> If you are using this script, we have tested a modified version that >>> suspends snapshot creation on a pool that is scrubbing. >> >> I would be very grateful for the patch! >> >> -- >> Andriy Gapon > > Attached is the latest we have from the author. It suspends snapshots > during scrubs and support recursive snapshots. Thank you and the author! These two features are quite cool. -- Andriy Gapon From des at des.no Wed Nov 26 07:28:38 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Wed Nov 26 07:28:45 2008 Subject: endless zpool scrub? In-Reply-To: <4EECB3AA317601AE36BE0DB5@syn> (Goran Lowkrantz's message of "Wed, 26 Nov 2008 13:59:36 +0100") References: <492D2122.4050203@icyb.net.ua> <492D4426.1080307@icyb.net.ua> <4EECB3AA317601AE36BE0DB5@syn> Message-ID: <861vwy7bh6.fsf@ds4.des.no> Goran Lowkrantz writes: > Attached is the latest we have from the author. It suspends snapshots > during scrubs and support recursive snapshots. Additional feature request: allow the entire snapshot name to be specified (as an strftime format string) instead of just the prefix. 
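To illustrate the request, here is a small sketch of building a full snapshot name from a strftime format string. The helper name, its signature, and the example format "daily-%Y-%m-%d_%H.%M" are assumptions made for illustration; nothing here is taken from zfs-snapshot-mgmt itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Write "<dataset>@<expanded name_format>" into buf. Returns the length of
 * the resulting string, or 0 if it does not fit. The whole part after '@'
 * comes from the strftime format string rather than from a fixed prefix.
 */
size_t snapshot_name(char *buf, size_t buflen, const char *dataset,
                     const char *name_format, const struct tm *when)
{
    assert(buf != NULL && dataset != NULL);
    assert(name_format != NULL && when != NULL);

    size_t dlen = strlen(dataset);
    if (buflen < dlen + 2)      /* room for '@' and a terminating NUL */
        return 0;

    memcpy(buf, dataset, dlen);
    buf[dlen] = '@';

    /* strftime returns 0 when the expansion does not fit in the space left */
    size_t n = strftime(buf + dlen + 1, buflen - dlen - 1, name_format, when);
    if (n == 0)
        return 0;
    return dlen + 1 + n;
}
```

With dataset "tank" and the format above, a timestamp of 26 Nov 2008 12:02 would expand to tank@daily-2008-11-26_12.02.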
DES -- Dag-Erling Smørgrav - des@des.no From avg at icyb.net.ua Wed Nov 26 13:22:48 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed Nov 26 13:22:55 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <022C4222-63B2-4535-8B7E-0426E9CE2BEA@mac.com> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> <4922FB81.50608@icyb.net.ua> <022C4222-63B2-4535-8B7E-0426E9CE2BEA@mac.com> Message-ID: <492DBE1F.2040501@icyb.net.ua> on 18/11/2008 21:49 Marcel Moolenaar said the following: > > On Nov 18, 2008, at 9:29 AM, Andriy Gapon wrote: > >> I just remembered that I saved old zpool.cache file before "migrating" >> the pool. >> I looked at the diff of hexdumps and there are a number of differences, >> it's hard to understand them because the file is binary (actually it >> seems to contain serialized name-value pairs), but one difference is >> prominent: >> ... >> 00000260 64 65 76 69 64 00 00 00 00 00 00 09 00 00 00 01 >> |devid...........| >> ... >> -00000270 00 00 00 15 61 64 3a 47 45 41 35 33 34 52 46 30 >> |....ad:GEA534RF0| >> -00000280 54 4b 33 35 41 73 31 73 33 00 00 00 00 00 00 28 >> |TK35As1s3......(| >> ... >> +00000270 00 00 00 11 61 64 3a 47 45 41 35 33 34 52 46 30 >> |....ad:GEA534RF0| >> +00000280 54 4b 33 35 41 00 00 00 00 00 00 28 00 00 00 28 >> |TK35A......(...(| >> ... >> >> It looks like old "devid" value is "ad:GEA534RF0TK35As1s3" and new one >> is "ad:GEA534RF0TK35A". Just a reminder: actual zpool device is ad6s2d. >> >> The new value is what is reported by diskinfo: >> $ diskinfo -v ad6 >> ad6 >> ... >> ad:GEA534RF0TK35A # Disk ident. >> >> $ diskinfo -v ad6s2 >> ad6s2 >> ... >> ad:GEA534RF0TK35A # Disk ident. >> >> $ diskinfo -v ad6s2d >> ad6s2d >> ... >> ad:GEA534RF0TK35A # Disk ident. >> >> Hmm, "indent" is reported to be the same for all three entities.
>> >> I don't remember what diskinfo reported with pre-gpart kernel, but I >> suspect that it was something different. >> Could anybody please check this? (on 7.X machine without GEOM_PART). >> >> I quickly glimpsed through sources and it seems that this comes from >> DIOCGIDENT GEOM ioctl i.e. "GEOM::ident" attribute. It seems that >> geom_slice.c code has some special handling for that. > > Interesting. Can you try the attached patch to GPart: > Marcel, unfortunately the patch caused a panic. Unfortunately, again, I wasn't able to catch a proper dump, but I remembered that the panic was in g_part_done+0x16. In general, I am not sure if anything is really needed in this direction. First, I think that pjd has recently committed changes to trunk ZFS, so that it doesn't need device ids anymore and uses some metadata in the devices. Second, there is a migration path that I used - export/import of a pool. So unless this detail of backward compatibility is really needed somewhere else... -- Andriy Gapon From xcllnt at mac.com Wed Nov 26 14:55:19 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Wed Nov 26 14:55:31 2008 Subject: zfs: affected by geom_(mbr|bsd) => geom_part_(mbr|bsd) ? In-Reply-To: <492DBE1F.2040501@icyb.net.ua> References: <4911C3E9.405@icyb.net.ua> <49198A1A.3080600@icyb.net.ua> <49227875.6090902@icyb.net.ua> <93FC5F5D-91CD-450B-B08D-5C5EC5A1C880@mac.com> <4922FB81.50608@icyb.net.ua> <022C4222-63B2-4535-8B7E-0426E9CE2BEA@mac.com> <492DBE1F.2040501@icyb.net.ua> Message-ID: On Nov 26, 2008, at 1:22 PM, Andriy Gapon wrote: > on 18/11/2008 21:49 Marcel Moolenaar said the following: >> On Nov 18, 2008, at 9:29 AM, Andriy Gapon wrote: >>> I just remembered that I saved old zpool.cache file before >>> "migrating" >>> the pool. 
>>> I looked at the diff of hexdumps and there are a number of >>> differences, >>> it's hard to understand them because the file is binary (actually it >>> seems to contain serialized name-value pairs), but one difference is >>> prominent: >>> ... >>> 00000260 64 65 76 69 64 00 00 00 00 00 00 09 00 00 00 01 >>> |devid...........| >>> ... >>> -00000270 00 00 00 15 61 64 3a 47 45 41 35 33 34 52 46 30 >>> |....ad:GEA534RF0| >>> -00000280 54 4b 33 35 41 73 31 73 33 00 00 00 00 00 00 28 >>> |TK35As1s3......(| >>> ... >>> +00000270 00 00 00 11 61 64 3a 47 45 41 35 33 34 52 46 30 >>> |....ad:GEA534RF0| >>> +00000280 54 4b 33 35 41 00 00 00 00 00 00 28 00 00 00 28 >>> |TK35A......(...(| >>> ... >>> >>> It looks like old "devid" value is "ad:GEA534RF0TK35As1s3" and new >>> one >>> is "ad:GEA534RF0TK35A". Just a reminder: actual zpool device is >>> ad6s2d. >>> >>> The new value is what is reported by diskinfo: >>> $ diskinfo -v ad6 >>> ad6 >>> ... >>> ad:GEA534RF0TK35A # Disk ident. >>> >>> $ diskinfo -v ad6s2 >>> ad6s2 >>> ... >>> ad:GEA534RF0TK35A # Disk ident. >>> >>> $ diskinfo -v ad6s2d >>> ad6s2d >>> ... >>> ad:GEA534RF0TK35A # Disk ident. >>> >>> Hmm, "indent" is reported to be the same for all three entities. >>> >>> I don't remember what diskinfo reported with pre-gpart kernel, but I >>> suspect that it was something different. >>> Could anybody please check this? (on 7.X machine without GEOM_PART). >>> >>> I quickly glimpsed through sources and it seems that this comes from >>> DIOCGIDENT GEOM ioctl i.e. "GEOM::ident" attribute. It seems that >>> geom_slice.c code has some special handling for that. >> Interesting. Can you try the attached patch to GPart: > > Marcel, > > unfortunately the patch caused a panic. > Unfortunately, again, I wasn't able to catch a proper dump, but I > remembered that the panic was in g_part_done+0x16. I see :-/ > In general, I am not sure if anything is really needed in this > direction. 
> First, I think that pjd has recently committed changes to trunk ZFS, > so that it doesn't need device ids anymore and uses some metadata in > the devices. > Second, there is a migration path that I used - export/import of a > pool. > > So unless this detail of backward compatibility is really needed > somewhere else... pjd told me that and since it was added for ZFS, I think I'll just drop it. Patching GEOM::ident this way is kinda ugly... Thanks for testing! -- Marcel Moolenaar xcllnt@mac.com From alfred at freebsd.org Thu Nov 27 13:12:59 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Thu Nov 27 13:13:06 2008 Subject: questions about nmount and nfs Message-ID: <20081127205417.GE58709@elvis.mu.org> Hey all (and Craig and Doug), There are some patches floating around for NFS performance. I also have a few trivial ones myself, and there are a few NFS globals that I'd like to make per-mount... How are nfs and nmount working these days? Should I try to use nmount to control various tunables? Or should I make a sysctl tree per-mount and have users do that? I'd _really_ like to be able to see the mount options via just running "mount" like so: /usr/src/sbin/mount % mount /dev/ad0s1a on / (ufs, local) devfs on /dev (devfs, local) /dev/ad0s1d on /usr (ufs, local) mac:/Users/parallels on /vol/mac (nfs,nofsyncclose,negativecache=200) Note: nofsyncclose and negativecache=200 are two options I want to add. What do you guys think? Is nmount up for this? Any pointers to using nmount? Or should I sysctl?
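For reference, a rough sketch of how per-mount options would be packed for nmount(2), which takes an array of struct iovec holding name/value string pairs. The helper add_mount_opt() is hypothetical, and nofsyncclose and negativecache are only the options proposed above; whether the NFS client would accept them this way is exactly the open question. The sketch only builds the array and does not call nmount().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Append one name/value pair in the layout nmount(2) expects: two
 * consecutive iovec entries per option, each a NUL-terminated string whose
 * length includes the terminator. A NULL value marks a flag-style option.
 * Returns the new entry count, or 0 if the array has no room for the pair.
 */
size_t add_mount_opt(struct iovec *iov, size_t used, size_t capacity,
                     const char *name, const char *value)
{
    assert(iov != NULL && name != NULL);
    if (used + 2 > capacity)
        return 0;

    /* iov_base is not const-qualified, so the const is cast away here */
    iov[used].iov_base = (void *)name;
    iov[used].iov_len = strlen(name) + 1;

    iov[used + 1].iov_base = (void *)value;
    iov[used + 1].iov_len = (value != NULL) ? strlen(value) + 1 : 0;

    return used + 2;
}
```

On FreeBSD the filled array and its entry count would then be handed to nmount(iov, niov, flags), with pairs such as "fstype" and "fspath" added alongside the per-mount tunables.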
-- - Alfred Perlstein From gavin at FreeBSD.org Thu Nov 27 14:35:41 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Thu Nov 27 14:35:53 2008 Subject: kern/129231: New UFS mount (norandom) option - mostly useful for building redundant NFS servers Message-ID: <200811272235.mARMZfj1011326@freefall.freebsd.org> Synopsis: New UFS mount (norandom) option - mostly useful for building redundant NFS servers Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: gavin Responsible-Changed-When: Thu Nov 27 22:33:24 UTC 2008 Responsible-Changed-Why: Not sure of the best mailing list to pass this to, make a guess that -fs is probably a good guess. http://www.freebsd.org/cgi/query-pr.cgi?pr=129231 From joao.barros at gmail.com Sat Nov 29 17:08:48 2008 From: joao.barros at gmail.com (Joao Barros) Date: Sat Nov 29 17:08:54 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> Message-ID: <70e8236f0811291708h7ece06dcm1bff0081b5b0fde8@mail.gmail.com> On Tue, Nov 25, 2008 at 1:48 AM, Joao Barros wrote: > On Fri, Nov 21, 2008 at 9:31 PM, Olivier SMEDTS wrote: >> 2008/11/20 Doug Rabson : >>> >>> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >>> >>>> Hello, >>>> >>>> I want to boot off a ZFS pool (version 13) on an USB stick for testing >>>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>>> I'm using a 2 hours old CURRENT. 
>>>> >>>> # kldload usb2_storage_mass >>>> # kldload zfs >>>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>>> # fdisk -BI da0 >>>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>>> bsdlabel: boot code /boot/zfsboot is wrong size >>>> >>>> Is what I'm trying to do with bsdlabel wrong ? >>>> I previously tried with the default bootstrap code but I had an >>>> (expected) "boot: Not ufs" error at boot. >>>> >>>> PS : I'm not subscribed to this list. >>> >>> The process for install zfsboot is a bit manual (and undocumented). Try >>> something like this: >>> >>> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >>> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 >>> >>>Alternatively, you might try using the brand new support for GPT that I committed yesterday: >>> >>> # gpt create -f da0 >>> # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 >>> # gpt add -t freebsd-zfs da0 >>> # zpool create mypool da0p2 >> >> It works ! >> >> Now I'm stuck at loader(8) prompt. > > That's a me too. > > I tried this under vmware with LOADER_ZFS_SUPPORT=yes on make.conf: > # gpart create -s gpt ad0 > # gpart add -b 34 -s 128 -t freebsd-boot ad0 > ad0p1 added > # gpart add -b 162 -s 15078327 -t freebsd-zfs ad0 > ad0p2 added > # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0 > # zpool create tank ad0p2 > # zpool set bootfs = tank tank > > lsdev on loader shows: > cd devices: > disk devices: > disk0: BIOS drive c: > disk0p1: FreeBSD boot > disk0p2: FreeBSD ZFS > pxe devices: > zfs devices: > > Any hints? 
> I'm trying to figure out why loader doesn't see my zfs pool and here's what I got: FreeBSD/i386 boot Default: tank:/boot/loader boot: status pool: tank config: NAME STATE tank ONLINE ad0p2 ONLINE I added some printfs on loader\main.c: guid = kargs->zfspool; unit = zfs_guid_to_unit(guid); if (unit >= 0) { sprintf(devname, "zfs%d", unit); setenv("currdev", devname, 1); } and guid returns the correct guid for my pool but unit returns -1 which by looking at zfs_guid_to_unit means something is not right. Any pointers Doug? -- Joao Barros From dfr at rabson.org Sun Nov 30 01:05:18 2008 From: dfr at rabson.org (Doug Rabson) Date: Sun Nov 30 01:05:25 2008 Subject: ZFSBoot try and bsdlabel bootstrap code In-Reply-To: <70e8236f0811291708h7ece06dcm1bff0081b5b0fde8@mail.gmail.com> References: <367b2c980811191412h5e0af470k165b37edc2fc5853@mail.gmail.com> <16C31872-6A83-4FAB-AC85-213D604CDDE4@rabson.org> <367b2c980811211331v551893a8sde2231c3bc65468c@mail.gmail.com> <70e8236f0811241748w41884a12la50e4e63f83a7542@mail.gmail.com> <70e8236f0811291708h7ece06dcm1bff0081b5b0fde8@mail.gmail.com> Message-ID: On 30 Nov 2008, at 01:08, Joao Barros wrote: > On Tue, Nov 25, 2008 at 1:48 AM, Joao Barros > wrote: >> On Fri, Nov 21, 2008 at 9:31 PM, Olivier SMEDTS >> wrote: >>> 2008/11/20 Doug Rabson : >>>> >>>> On 19 Nov 2008, at 22:12, Olivier SMEDTS wrote: >>>> >>>>> Hello, >>>>> >>>>> I want to boot off a ZFS pool (version 13) on an USB stick for >>>>> testing >>>>> purposes. But I'm stuck with the bsdlabel bootstrap code size... >>>>> I'm using a 2 hours old CURRENT. >>>>> >>>>> # kldload usb2_storage_mass >>>>> # kldload zfs >>>>> # dd if=/dev/zero of=/dev/da0 bs=512 count=32 >>>>> # fdisk -BI da0 >>>>> # dd if=/dev/zero of=/dev/da0s1 bs=512 count=32 >>>>> # bsdlabel -wB -b /boot/zfsboot da0s1 >>>>> bsdlabel: boot code /boot/zfsboot is wrong size >>>>> >>>>> Is what I'm trying to do with bsdlabel wrong ? 
>>>>> I previously tried with the default bootstrap code but I had an >>>>> (expected) "boot: Not ufs" error at boot. >>>>> >>>>> PS : I'm not subscribed to this list. >>>> >>>> The process for install zfsboot is a bit manual (and >>>> undocumented). Try >>>> something like this: >>>> >>>> # dd if=/boot/zfsboot of=/dev/da0s1 count=1 >>>> # dd if=/boot/zfsboot of=/dev/ds0s1 skip=1 seek=1024 >>>> >>>> Alternatively, you might try using the brand new support for GPT >>>> that I committed yesterday: >>>> >>>> # gpt create -f da0 >>>> # gpt boot -b /boot/pmbr -g /boot/gptzfsboot da0 >>>> # gpt add -t freebsd-zfs da0 >>>> # zpool create mypool da0p2 >>> >>> It works ! >>> >>> Now I'm stuck at loader(8) prompt. >> >> That's a me too. >> >> I tried this under vmware with LOADER_ZFS_SUPPORT=yes on make.conf: >> # gpart create -s gpt ad0 >> # gpart add -b 34 -s 128 -t freebsd-boot ad0 >> ad0p1 added >> # gpart add -b 162 -s 15078327 -t freebsd-zfs ad0 >> ad0p2 added >> # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0 >> # zpool create tank ad0p2 >> # zpool set bootfs = tank tank >> >> lsdev on loader shows: >> cd devices: >> disk devices: >> disk0: BIOS drive c: >> disk0p1: FreeBSD boot >> disk0p2: FreeBSD ZFS >> pxe devices: >> zfs devices: >> >> Any hints? >> > > > I'm trying to figure out why loader doesn't see my zfs pool and here's > what I got: > > FreeBSD/i386 boot > Default: tank:/boot/loader > boot: status pool: tank > config: > NAME STATE > tank ONLINE > ad0p2 ONLINE > > I added some printfs on loader\main.c: > > guid = kargs->zfspool; > unit = zfs_guid_to_unit(guid); > if (unit >= 0) { > sprintf(devname, "zfs%d", unit); > setenv("currdev", devname, 1); > } > > and guid returns the correct guid for my pool but unit returns -1 > which by looking at zfs_guid_to_unit means something is not right. > > Any pointers Doug? It looks like loader didn't manage to find the pool for some reason. 
This probing process happens in sys/boot/zfs/zfs.c in the function zfs_dev_init(). It's supposed to taste all the available disks and partitions for the presence of a ZFS pool. The actual tasting process happens in vdev_probe().
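As a toy model of the symptom (this is not the loader's code; the struct and function below are hypothetical stand-ins), a guid-to-unit lookup can only succeed for pools that probing has already registered. If tasting finds nothing, the list stays empty and every lookup yields -1, even when the guid passed in is correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one pool discovered while probing devices. */
struct known_pool {
    uint64_t guid;
};

/*
 * Return the index of the pool with the given guid, or -1 when probing
 * never registered such a pool.
 */
int guid_to_unit(const struct known_pool *pools, size_t npools, uint64_t guid)
{
    assert(pools != NULL || npools == 0);
    for (size_t i = 0; i < npools; i++) {
        if (pools[i].guid == guid)
            return (int)i;
    }
    return -1;
}
```

Under this model, a -1 result with a correct guid points at the probing step rather than at the lookup, which is why printfs in zfs_dev_init() and vdev_probe() are the next place to look.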