From pathiaki2 at yahoo.com  Wed Jan  7 20:15:23 2009
From: pathiaki2 at yahoo.com (Paul Patterson)
Date: Wed Jan  7 20:15:31 2009
Subject: ZFS, NFS and Network tuning (Paul Patterson)
References: <722609.11236.qm@web65412.mail.ac4.yahoo.com>
Message-ID: <640696.48083.qm@web110509.mail.gq1.yahoo.com>

Thanks to all. I think this is the last post on this.

James Chang (don't ever apologize for minor English issues when you're
trying to help. Anyone who would fault a person for that doesn't belong
in the BSD community. :-) Besides, you solved the problem.)

James suggested getting another card. (I had already ordered one.) It has
an Intel chipset and comes up under the em driver. I was getting 500+
Mb/sec consistently over two striped SAS drives with ZFS on them, running
the test from a Linux client over NFS.

I did a quick:

  dd if=/dev/zero of=/mount/foo bs=64k count=10000

(/mount/foo is the FreeBSD server's ZFS drive as mounted on the Linux
client.)

It transferred 6.6 GB in roughly 100 seconds at 66 MB/sec (66 x 8 = 528 Mb).

Zero_copy_sockets was enabled. Polling was enabled in the kernel. Sadly,
when I turned on polling on this card, it consistently ran 10-20 MB/sec
SLOWER.

The good thing is that it looks like either the Broadcom chipset sucks or
the bge driver sucks. Either way, just popping the Intel card in and
moving the cable made my throughput jump by 500%.

Thank you everyone,
Paul

________________________________
From: Michelle Li
To: freebsd-performance@freebsd.org
Sent: Saturday, December 20, 2008 7:25:02 PM
Subject: Re: ZFS, NFS and Network tuning (Paul Patterson)

...and the dmesg? please post

freebsd-performance-request@freebsd.org wrote:

Send freebsd-performance mailing list submissions to
        freebsd-performance@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.freebsd.org/mailman/listinfo/freebsd-performance
or, via email, send a message with subject or body 'help' to
        freebsd-performance-request@freebsd.org

You can reach the person managing the list at
        freebsd-performance-owner@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-performance digest..."

Today's Topics:

   1. Re: ZFS, NFS and Network tuning (Paul Patterson)
   2. Re: ZFS, NFS and Network tuning (Paul Patterson)
   3. Re: ZFS, NFS and Network tuning (Paul Patterson)
   4. intel i7 and Hyperthreading (Mike Tancsa)

----------------------------------------------------------------------

Message: 1
Date: Fri, 19 Dec 2008 06:47:59 -0800 (PST)
From: Paul Patterson
Subject: Re: ZFS, NFS and Network tuning
To: Paul Patterson, freebsd-performance@freebsd.org
Message-ID: <15723.22980.qm@web110511.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hi,

as promised, the parameter tuning I have on the box (does anyone see
anything wrong?)
/boot/loader.conf

  kern.hz="100"
  vm.kmem_size_max="1536M"
  vm.kmem_size="1536M"
  vfs.zfs.prefetch_disble=1

/etc/sysctl.conf

  kern.ipc.maxsockbuf=16777216
  kern.ipc.nmbclusters=32768
  kern.ipc.somaxconn=8192
  kern.maxfiles=65536
  kern.maxfilesperproc=32768
  kern.mxvnodes=600000
  net.inet.tcp.delayed_ack=0
  net.inet.tcp.inflight.enable=0
  net.inet.tcp.path_mtu_discovery=0
  net.inet.tcp.recvbuf_auto=1
  net.inet.tcp.recvbuf_inc=16384
  net.inet.tcp.recvbuf_max=16777216
  net.inet.tcp.recvspace=65536
  net.inet.tcp.rfc1323=1
  net.inet.tcp.sendbuf_auto=1
  net.inet.tcpsendbuf_inc=8192
  net.inet.tcp.sendspace=65536
  net.inet.udp.maxdgram=57344
  net.inet.udp.recvspace=65536
  net.local.stream.recvspace=65536
  net.inet.tcp.sendbuf_max=16777216

________________________________
From: Paul Patterson
To: freebsd-performance@freebsd.org
Sent: Thursday, December 18, 2008 8:04:37 PM
Subject: ZFS, NFS and Network tuning

Hi,

I just set up my first machine with ZFS. (First, ZFS is nothing short of
amazing.) I'm running FreeBSD 7.1-RC1 as an NFS server with ZFS striped
across two volumes (just testing throughput for now).

Anyhow, I was benching this box: 4GB of RAM, the volume is on 2x146 GB
SAS 10K rpm drives, and it's an HP Proliant DL360 with dual Gb interfaces
(device bce).

Now, I believe that I have tuned this box to the hilt with all the
parameters that I can think of (it's at work right now so I'll cut and
paste all the sysctls and loader.conf parameters for ZFS and networking)
and it still seems to have some type of bottleneck.

I have two Debian Linux clients that I use to bench with. I run a script
that writes to the NFS device and, after about 30 minutes, starts to
delete the initial data and follow behind, writing and deleting.

Here's what's happening:

The "other" machine is a NetAPP. It's got 1GB of RAM and it's running
RAID DP with 2 parity drives and 6 data drives, all SATA 750 GB 7200 RPM
drives, with dual Gb interfaces.

The benchmark script manages to write lots of little (all less than 30KB)
files at a rate of 11,000 per minute. However, after 30 minutes, when it
starts deleting, the write throughput drops to 9500 per minute and
deletion runs at 6000 per minute. If I turn on the second node, I get
17,000 writes combined with about 11,000 deletions combined. One way or
another, this will overflow in time. Not good.

Now, on to my pet project. :-) The FreeBSD/ZFS server is only able to
maintain about 3500 writes per minute, but it also deletes at the same
rate! (I would expect deletion to be at least as fast as writing.) The
drives are running at only 20-35% busy while this is going on and only
putting down about 4-5 MB/sec each. So, at 1Gb or ~92MB/sec theoretical
max (is that about right?), there's something wrong somewhere. I'm
assuming it's the network. (I'll post all the tunings tomorrow.)

Thinking something was wrong, I mounted only one client to each server
(they are identical clients and the same configuration as the FreeBSD
box). I did a simple stream of:

  dd if=/dev/zero of=/mnt/nfs bs=1m count=1000

The FreeBSD box wins?! It cranked up the drives to 45-50 MB/sec each and
balanced them perfectly on transactions/sec, KB/sec, etc. from
systat -vm. (Woohoo!) The NetAPP's CPU was at over 35-40% constantly (it
does that while benching, too). I'll post the NetAPP finding tomorrow as
I forgot it for now.

As for the client mounting, it was with the options:

  nfsvers=3,rsize=32768,wsize=32768,hard,intr,async,noatime

I'm trying to figure out why, when running this benchmark, the NetAPP
with WAFL can nearly triple the FreeBSD/ZFS box.

Also, I'm having something strange happen when I try to mount the disk
from the FreeBSD server versus the NetAPP. The FreeBSD server will
sometimes RPC timeout. Mounting the NetAPP is instantaneous.

That's the beginning. If I have a list of things to check tomorrow, I
will. I'd like to see the little machine that could kick the NetAPP's
butt. (No offense to NetAPP.
:-) )

Thank you for reading,
Paul

_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"

------------------------------

Message: 2
Date: Fri, 19 Dec 2008 10:03:14 -0800 (PST)
From: Paul Patterson
Subject: Re: ZFS, NFS and Network tuning
To: Paul Patterson, freebsd-performance@freebsd.org
Message-ID: <400826.77992.qm@web110510.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hello all,

I guess I've got to send this as I've already had about 5 responses
claiming the same thing.

This is not a disk bottleneck. The ZFS partition is capable of performing
at the theoretical max of the drives. The machine is performing at less
than 5 MB combined. I'm assuming that this is a problem with the NFSv3
throughput.

I just 'dd' 1000 1MB records (about 1GB) from the clients to their
respective servers:

  Client 1 to NetAPP:       3 tests for 45.9, 45.1, 46.1   Pretty consistent
  Client 2 to FreeBSD/ZFS:  3 tests for 29.7, 12.5, 19.1   NOT consistent

(Also, the drives were lucky to hit 12% busy.)

I'm about to mount these servers to each client and see if there's a
variation (although they are hw configured the same and were bought at
the same time). I'll write after this.

However, if more people could review the configurations in my earlier
message and see if there's anything glaring.... The lack of consistency
shows something is wrong network-wise.

P.

------------------------------

Message: 3
Date: Fri, 19 Dec 2008 10:59:54 -0800 (PST)
From: Paul Patterson
Subject: Re: ZFS, NFS and Network tuning
To: Paul Patterson, freebsd-performance@freebsd.org
Message-ID: <309927.87042.qm@web110514.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hi,

Well, I got some input on things:

  kern.ipc.somaxconn=32768
  net.inet.tcp.mssdflt=1460

And for fstab:

  rw,tcp,intr,noatime,nfsv3,-w=65536,-r=65536

I tried turning on polling with ifconfig bce0 polling; however, I didn't
see it in ifconfig bce0, so I believe either it isn't active or the card
doesn't support it.

I also removed async from the mounts.

These had a detrimental effect on the FreeBSD server. I now get 64K per
transfer (systat -vm) but I'm still only getting about 4MB/sec on the
disks and their utilization has dropped to about 5%. Throughput from both
clients is ~8.5MB/sec. The tests were run separately. The NetAPP on each
host was over 48.5 MB/sec.

The FreeBSD host still has about 2 GB free.

Paul

------------------------------

Message: 4
Date: Fri, 19 Dec 2008 17:01:46 -0500
From: Mike Tancsa
Subject: intel i7 and Hyperthreading
To: freebsd-performance@freebsd.org
Message-ID: <200812192214.mBJMEj2Q009511@lava.sentex.ca>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Just got our first board to play around with and, unlike in the past,
having hyperthreading enabled seems to help performance.... At least in
buildworld tests.

Doing a make -j4 vs -j6 and make -j8 vs -j10 gives:

  -j    buildworld time    % improvement over -j4
   4    13:57
   6    12:11              13%
   8    11:32              18%
  10    11:43              17%

dmesg below of the hardware...
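As a cross-check on the buildworld figures above, here is a minimal
Python sketch that recomputes the improvement over -j4 from the posted
wall-clock times. The helper names are hypothetical and it assumes
"improvement" means the relative reduction in elapsed time. Under that
assumption the results land within roughly one percentage point of the
posted 13%, 18% and 17%, so the original numbers may have been rounded
differently or taken from unrounded timings.

```python
# Posted buildworld wall-clock times (mm:ss), keyed by the make -j value.
TIMES = {4: "13:57", 6: "12:11", 8: "11:32", 10: "11:43"}


def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def improvement_over(baseline: str, candidate: str) -> float:
    """Percent reduction in elapsed time relative to the baseline run."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


def improvements(times: dict, baseline_jobs: int = 4) -> dict:
    """Map each non-baseline -j value to its percent improvement (1 decimal)."""
    base = times[baseline_jobs]
    return {
        jobs: round(improvement_over(base, elapsed), 1)
        for jobs, elapsed in times.items()
        if jobs != baseline_jobs
    }


if __name__ == "__main__":
    for jobs, pct in improvements(TIMES).items():
        print(f"-j{jobs}: {pct}% less time than -j4")
```

Either way the ordering is the same: -j8 is the fastest of the four runs
and -j10 gives back a little of the gain.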
The CPU seems to run fairly cool, but the board has a lot of nasty hot
heatsinks

eg. running 8 burnP6 procs

0[ns3c]# sysctl -a | grep temperature
dev.cpu.0.temperature: 67
dev.cpu.1.temperature: 67
dev.cpu.2.temperature: 65
dev.cpu.3.temperature: 65
dev.cpu.4.temperature: 66
dev.cpu.5.temperature: 66
dev.cpu.6.temperature: 64
dev.cpu.7.temperature: 64
0[ns3c]#

vs idle

dev.cpu.0.temperature: 46
dev.cpu.1.temperature: 46
dev.cpu.2.temperature: 42
dev.cpu.3.temperature: 42
dev.cpu.4.temperature: 44
dev.cpu.5.temperature: 44
dev.cpu.6.temperature: 40
dev.cpu.7.temperature: 40

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Fri Dec 19 19:48:15 EST 2008
    mdtancsa@ns3c.recycle.net:/usr/obj/usr/src/sys/recycle
Timecounter "i8254" frequency 1193182 Hz quality 0

=== message truncated ===

From carl at cs.mcgill.ca  Thu Jan 15 12:23:36 2009
From: carl at cs.mcgill.ca (carl tropper)
Date: Thu Jan 15 12:23:42 2009
Subject: (no subject)
Message-ID: <55291.72.228.102.77.1232051014.squirrel@mail.cs.mcgill.ca>

What are the performance counters for multicore (unix based) machines
which are relevant to cache behavior? I am interested in determining the
amount of performance loss due to cache misses.
Carl Tropper
Department of Computer Science
McConnell Engineering Building
McGill University
Montreal, Canada, H3A 2A6
tel: (514)398-3743
fax: (514)398-3883
url:www.cs.mcgill.ca/~carl

From kip.macy at gmail.com  Thu Jan 15 19:15:48 2009
From: kip.macy at gmail.com (Kip Macy)
Date: Thu Jan 15 19:16:03 2009
Subject: performance counters was Re: (no subject)
Message-ID: <3c1674c90901151847lbb5e440lb5f45628b4746ccd@mail.gmail.com>

The man pages have a fair amount of documentation; you can also look at
dev/hwpmc/pmc_events.h to find all the event names.

-Kip

On Thu, Jan 15, 2009 at 12:23 PM, carl tropper wrote:
> What are the performance counters for multicore (unix based) machines which
> are relevant to cache behavior? I am interested in determining the amount of
> performance loss due to cache misses.

From redcrash at gmail.com  Fri Jan 16 02:10:28 2009
From: redcrash at gmail.com (Harald Servat)
Date: Fri Jan 16 02:10:34 2009
Subject: performance counters was Re: (no subject)
In-Reply-To: <3c1674c90901151847lbb5e440lb5f45628b4746ccd@mail.gmail.com>
References: <3c1674c90901151847lbb5e440lb5f45628b4746ccd@mail.gmail.com>
Message-ID:

On Fri, Jan 16, 2009 at 3:47 AM, Kip Macy wrote:

> The man pages have a fair amount of documentation; you can also look at
> dev/hwpmc/pmc_events.h to find all the event names.
>
> -Kip
>
> On Thu, Jan 15, 2009 at 12:23 PM, carl tropper wrote:
> > What are the performance counters for multicore (unix based) machines
> > which are relevant to cache behavior? I am interested in determining
> > the amount of performance loss due to cache misses.

Carl,

If you are worried about portability (you just talk about "unix based"),
you can also consider PAPI (http://icl.cs.utk.edu/papi). It provides a
layer built on top of the different available substrates related to
performance counters. For example, it works on FreeBSD on top of libpmc,
on Linux on top of perfctr and/or perfmon, and on AIX on top of PMAPI.

PAPI tries to simplify some performance metrics because the CPU typically
provides counters that are closely tied to its architecture. For example,
I've seen AIX/PowerPC machines that provide cache misses for level 2.5
and 2.75, which is quite peculiar. PAPI is also able to provide direct
access to those native counters, if you are interested.

If you are only considering FreeBSD (as you contacted this list), you can
go directly to the hwpmc and pmc manual pages.

Regards,

From pathiaki2 at yahoo.com  Thu Jan 22 05:11:20 2009
From: pathiaki2 at yahoo.com (Paul Patterson)
Date: Thu Jan 22 05:11:28 2009
Subject: ZFS, NFS and Network tuning (Paul Patterson)
Message-ID: <365391.74205.qm@web110505.mail.gq1.yahoo.com>

Thanks to all. I think this is the last post on this.

James Chang (don't ever apologize for minor English issues when you're
trying to help. Anyone who would fault a person for that doesn't belong
in the BSD community. :-) Besides, you solved the problem.)

James suggested getting another card. (I had already ordered one.) It has
an Intel chipset and comes up under the em driver. I was getting almost
700 Mb/sec consistently over two striped SAS drives with ZFS on them,
running the test from a Linux client over NFS.
I did a quick: dd if=/dev/zero of=/mount/foo bs=64k count=10000 It transferred 6.6 GB in roughly ________________________________ From: Michelle Li To: freebsd-performance@freebsd.org Sent: Saturday, December 20, 2008 7:25:02 PM Subject: Re: ZFS, NFS and Network tuning (Paul Patterson) ...and the dmesg? please post freebsd-performance-request@freebsd.org wrote: Send freebsd-performance mailing list submissions to freebsd-performance@freebsd.org To subscribe or unsubscribe via the World Wide Web, visit http://lists.freebsd.org/mailman/listinfo/freebsd-performance or, via email, send a message with subject or body 'help' to freebsd-performance-request@freebsd.org You can reach the person managing the list at freebsd-performance-owner@freebsd.org When replying, please edit your Subject line so it is more specific than "Re: Contents of freebsd-performance digest..." Today's Topics: 1. Re: ZFS, NFS and Network tuning (Paul Patterson) 2. Re: ZFS, NFS and Network tuning (Paul Patterson) 3. Re: ZFS, NFS and Network tuning (Paul Patterson) 4. intel i7 and Hyperthreading (Mike Tancsa) ---------------------------------------------------------------------- Message: 1 Date: Fri, 19 Dec 2008 06:47:59 -0800 (PST) From: Paul Patterson Subject: Re: ZFS, NFS and Network tuning To: Paul Patterson , freebsd-performance@freebsd.org Message-ID: <15723.22980.qm@web110511.mail.gq1.yahoo.com> Content-Type: text/plain; charset=us-ascii Hi, as promised, the parameter tuning I have on the box (does anyone see anything wrong?) 
/boot/loader.conf kern.hz="100" vm.kmem_size_max="1536M" vm.kmem_size="1536M" vfs.zfs.prefetch_disble=1 /etc/sysctl.conf kern.ipc.maxsockbuf=16777216 kern.ipc.nmbclusters=32768 kern.ipc.somaxconn=8192 kern.maxfiles=65536 kern.maxfilesperproc=32768 kern.mxvnodes=600000 net.inet.tcp.delayed_ack=0 net.inet.tcp.inflight.enable=0 net.inet.tcp.path_mtu_discovery=0 net.inet.tcp.recvbuf_auto=1 net.inet.tcp.recvbuf_inc=16384 net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.recvspace=65536 net.inet.tcp.rfc1323=1 net.inet.tcp.sendbuf_auto=1 net.inet.tcpsendbuf_inc=8192 net.inet.tcp.sendspace=65536 net.inet.udp.maxdgram=57344 net.inet.udp.recvspace=65536 net.local.stream.recvspace=65536 net.inet.tcp.sendbuf_max=16777216 ________________________________ From: Paul Patterson To: freebsd-performance@freebsd.org Sent: Thursday, December 18, 2008 8:04:37 PM Subject: ZFS, NFS and Network tuning Hi, I just set up my first machine with ZFS. (First, ZFS is nothing short of amazing) I'm running FreeBSD 7.1-RC1 as an NFS server with ZFS striped across two volumes (just testing throughput for now.) Anyhow, I was benching this box, 4GB or RAM, the volume is on 2x146 GB SAS 10K rpm drives and it's an HP Proliant DL360 with dual Gb interfaces. (device bce) Now, I believe that I have tuned this box to the hilt with all the parameters that I can think of (it's at work right now so I'll cut and paste all the sysctls and loader.conf parameters for ZFS and networking) and it still seems to have some type of bottleneck. I have two Debian Linux clients that I use to bench with. I run a script that makes calls that writes to the NFS device and, after about 30 minutes, starts to delete the initial data and follow behind writing and deleting. Here's what's happening: The "other" machine is a NetAPP. It's got 1GB of RAM and it's running RAID DP with 2 parity drives and 6 data drives, all SATA 750 GB 7200 RPM drives with dual Gb interfaces. 
The benchmark script manages to write lots of little (all less than 30KB) files at a rate of 11,000 per minute, however, after 30 minutes, when it starts deleting, the throughput on write goes to 9500 and deletion is 6000 per minute. If I turn on the second node, I get 17,000 writing combined with about 11,000 deletions combined. One way or another, this will overflow in time. Not good. Now, on to my pet project. :-) The FreeBSD/ZFS server is only able to maintain about 3500 writes per minute but also deletes at the same rate! (I would expect deletion to be at least as fast as writing) The drives are running at only 20-35% while this is going on and only putting down about 4-5 MB/sec each. So, at 1Gb or ~92MB/sec theoretical max (is that about right?) There's something wrong somewhere. I'm assuming it's the network. (I'll post all the tunings tomorrow.) Thinking something wrong, I mounted only one client to each server (they are identical clients and the same configuration as the FreeBSD box). I did a simple stream of: dd if=/dev/zero of=/mnt/nfs bs=1m count=1000. The FreeBSD box wins?! It cranked up the drives to 45-50 MB/sec each and balanced them perfectly on transactions/sec KB/sec, etc from systat -vm. (Woohoo!) The NetAPPs CPU was at over 35-40% constantly, (it does that while benching, too) I'll post the NetAPP finding tomorrow as I forgot it for now. As for the client mounting, it was with the options: nfsvers=3,rsize=32768,wsize=32768,hard,intr,async,noatime I'm trying to figure out why, when running this benchmark, can the NetAPP with WAFL nearly triple the FreeBSD/ZFS box. Also, I'm having something strange happen when I try to mount the disk from the FreeBSD server versus the NetAPP. The FreeBSD server will sometimes RPC timeout. Mounting the NetAPP is instantaneous. That's the beginning. If I have a list of things to check tomorrow, I will. I'd like to see the little machine that could kick the NetAPPs butt. (No offense to NetAPP. 
:-) ) Thank you for reading, Paul

_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"

------------------------------

Message: 2
Date: Fri, 19 Dec 2008 10:03:14 -0800 (PST)
From: Paul Patterson
Subject: Re: ZFS, NFS and Network tuning
To: Paul Patterson , freebsd-performance@freebsd.org
Message-ID: <400826.77992.qm@web110510.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hello all,

I guess I've got to send this as I've already had about 5 responses claiming the same thing. This is not a disk bottleneck. The ZFS partition is capable of performing at the theoretical max of the drives. The machine is performing at less than 5 MB combined. I'm assuming that this is a problem with the NFSv3 throughput.

I just 'dd' 1000 1MB records (about 1GB) from the clients to their respective servers:

Client 1 to NetAPP: 3 tests for 45.9, 45.1, 46.1 Pretty consistent
Client 2 to FreeBSD/ZFS: 3 tests for 29.7, 12.5, 19.1 NOT consistent (also, the drives were lucky to hit 12% busy.)

I'm about to mount these servers to each client and see if there's a variation (although they are hw configured the same and bought at the same time). I'll write after this. However, if more people could review the configurations below and see if there's anything glaring.... However, the lack of consistency shows something is wrong network-wise.

P.

________________________________
From: Paul Patterson
To: Paul Patterson ; freebsd-performance@freebsd.org
Sent: Friday, December 19, 2008 9:47:59 AM
Subject: Re: ZFS, NFS and Network tuning

Hi, as promised, the parameter tuning I have on the box (does anyone see anything wrong?)

/boot/loader.conf
kern.hz="100"
vm.kmem_size_max="1536M"
vm.kmem_size="1536M"
vfs.zfs.prefetch_disable=1

/etc/sysctl.conf
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
kern.maxvnodes=600000
net.inet.tcp.delayed_ack=0
net.inet.tcp.inflight.enable=0
net.inet.tcp.path_mtu_discovery=0
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.recvbuf_inc=16384
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvspace=65536
net.inet.tcp.rfc1323=1
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_inc=8192
net.inet.tcp.sendspace=65536
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=65536
net.local.stream.recvspace=65536
net.inet.tcp.sendbuf_max=16777216

------------------------------

Message: 3
Date: Fri, 19 Dec 2008 10:59:54 -0800 (PST)
From: Paul Patterson
Subject: Re: ZFS, NFS and Network tuning
To: Paul Patterson , freebsd-performance@freebsd.org
Message-ID: <309927.87042.qm@web110514.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hi,

Well, I got some input on things:

kern.ipc.somaxconn=32768
net.inet.tcp.mssdflt=1460

And for fstab:

rw,tcp,intr,noatime,nfsv3,-w=65536,-r=65536

I tried turning on polling with ifconfig bce0 polling; however, I didn't see it in ifconfig bce0, so I don't believe it to be active, or the card doesn't support it. I also removed async from the mounts.

These had a detrimental effect on the FreeBSD server. I now get 64K per transfer (systat -vm) but I'm still only getting about 4MB/sec on the disks and their utilization has dropped to about 5%. Throughput from both clients is ~8.5MB/sec. The tests were run separately. The NetAPP on each host was over 48.5 MB/sec.

The FreeBSD host still has about 2 GB free.

Paul

------------------------------

Message: 4
Date: Fri, 19 Dec 2008 17:01:46 -0500
From: Mike Tancsa
Subject: intel i7 and Hyperthreading
To: freebsd-performance@freebsd.org
Message-ID: <200812192214.mBJMEj2Q009511@lava.sentex.ca>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Just got our first board to play around with and unlike in the past, having hyperthreading enabled seems to help performance.... At least in buildworld tests. Doing a make -j4 vs -j6, make -j8 vs -j10 gives

-j   buildworld time   % improvement over -j4
4    13:57
6    12:11             13%
8    11:32             18%
10   11:43             17%

dmesg below of the hardware...
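
As a quick check of the buildworld table above, the percentage column can be recomputed from the wall-clock times. This is only a sketch: it assumes the times are minutes:seconds and that "improvement" means the reduction in build time relative to -j4, which the post does not actually state. The function names are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '13:57' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def time_reduction_pct(baseline: str, candidate: str) -> float:
    """Percentage of wall-clock time saved relative to the baseline run."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Build times as listed in the table (make -jN buildworld).
BUILD_TIMES = {4: "13:57", 6: "12:11", 8: "11:32", 10: "11:43"}

if __name__ == "__main__":
    for jobs, elapsed in BUILD_TIMES.items():
        saved = time_reduction_pct(BUILD_TIMES[4], elapsed)
        print(f"-j{jobs:<2} {elapsed}  {saved:5.1f}% less time than -j4")
```

Under that definition the savings come out at roughly 12.7%, 17.3% and 16.0%, close to but slightly below the 13%, 18% and 17% quoted, so the original figures may have been rounded or computed a little differently.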
The CPU seems to run fairly cool, but the board has a lot of nasty hot heatsinks

eg. running 8 burnP6 procs

0[ns3c]# sysctl -a | grep temperature
dev.cpu.0.temperature: 67
dev.cpu.1.temperature: 67
dev.cpu.2.temperature: 65
dev.cpu.3.temperature: 65
dev.cpu.4.temperature: 66
dev.cpu.5.temperature: 66
dev.cpu.6.temperature: 64
dev.cpu.7.temperature: 64
0[ns3c]#

vs idle

dev.cpu.0.temperature: 46
dev.cpu.1.temperature: 46
dev.cpu.2.temperature: 42
dev.cpu.3.temperature: 42
dev.cpu.4.temperature: 44
dev.cpu.5.temperature: 44
dev.cpu.6.temperature: 40
dev.cpu.7.temperature: 40

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Fri Dec 19 19:48:15 EST 2008
mdtancsa@ns3c.recycle.net:/usr/obj/usr/src/sys/recycle
Timecounter "i8254" frequency 1193182 Hz quality 0

=== message truncated ===

From pluknet at gmail.com Tue Jan 27 06:33:05 2009
From: pluknet at gmail.com (pluknet)
Date: Tue Jan 27 06:33:12 2009
Subject: mysql scalability: freebsd vs solaris
Message-ID:

Does anyone have MySQL scalability comparison numbers between FreeBSD7.x/8 and Solaris?

Thanks.

-- 
wbr, pluknet

From pluknet at gmail.com Tue Jan 27 08:47:57 2009
From: pluknet at gmail.com (pluknet)
Date: Tue Jan 27 08:48:03 2009
Subject: mysql scalability: freebsd vs solaris
In-Reply-To:
References:
Message-ID:

2009/1/27 pluknet :
> Does anyone have MySQL scalability comparison numbers between
> FreeBSD7.x/8 and Solaris?
>
> Thanks.

I have a quick'n'dirty scalability benchmark.

HW: 2 x Intel(r) Xeon(r) CPU E5405 @ 2.00GHz, 8 CPUs in total.

Values extracted from sysbench (read/write requests):

              1 thread    8 threads
FreeBSD 6.2   6091.80     5614.48
FreeBSD 7.1   6741.98     28091.60
SunOS 5.10    4286.61     6949.80

Too strange for Solaris..

-- 
wbr, pluknet

From brent at servuhome.net Wed Jan 28 23:44:59 2009
From: brent at servuhome.net (Brent Jones)
Date: Wed Jan 28 23:45:05 2009
Subject: ZFS, NFS and Network tuning
Message-ID:

I'm reviving this, as I too am seeing something eerily similar. I have made my own thread under freebsd-stable, so I will hopefully move that discussion to this list.

I believe we are seeing performance problems when the FreeBSD NFS client issues FSYNC NFS writes instead of ASYNC, sending performance to a mere percentage of what disks and network links are capable of. Further testing tonight demonstrates that other NFSv3 and v4 clients do not issue FSYNC unless they modify attributes and close a file, or append and close a file. The FreeBSD NFS client will issue FSYNCs any time the write size (-w) is reached, instead of just when closing the file. This is not necessary, since NFSv3 and v4 TCP have provisions for safe async writes that 'guarantee' state of NFS writes.

Here are the contents of what I wrote there, verbatim:

http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/048063.html

-------

Hello FreeBSD users,

I am running into some performance problems with NFSv3/v4 mounts. I have a Sun X4540 running OpenSolaris 2008.11 with ZFS exporting NFS shares. The NFS clients are a FreeBSD 6.3 32 bit, quad core xeon with 4GB ram and a FreeBSD 7.1 32bit with the same hardware.

The issue I am seeing is that for certain file types, the FreeBSD NFS client will either issue an ASYNC write, or an FSYNC. However, NFSv3 and v4 both support "safe" ASYNC writes in the TCP versions of the protocol, so that should be the default. Issuing FSYNCs for every complete block transmitted adds substantial overhead and slows everything down.

The two test files I have that can reproduce this data are a file created by 'dump' which is just binary data:

$ file testbinery
testbinery: data

ASCII text file from a Maildir format:

$ file ascittest
ascittest: ASCII mail text

My NFS mount command lines I have tried to get all data to ASYNC write:

$ mount_nfs -3T -o async 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
$ mount_nfs -3T 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
$ mount_nfs -4TL 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/

Here is an excerpt from a snoop from the binary data file:

$ snoop rpc nfs

obsmtp02.local -> pdxfilu01 NFS C ACCESS3 FH=57D3 (read,lookup,modify,extend,delete,execute)
pdxfilu01 -> obsmtp02.local NFS R ACCESS3 OK (read,modify,extend)
obsmtp02.local -> pdxfilu01 NFS C LOOKUP3 FH=BB85 testbinery
pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 OK FH=57D3
obsmtp02.local -> pdxfilu01 NFS C ACCESS3 FH=57D3 (read,lookup,modify,extend,delete,execute)
pdxfilu01 -> obsmtp02.local NFS R ACCESS3 OK (read,modify,extend)
obsmtp02.local -> pdxfilu01 NFS C SETATTR3 FH=57D3
pdxfilu01 -> obsmtp02.local NFS R SETATTR3 OK
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=57D3 at 0 for 32768 (ASYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=57D3 at 582647808 for 32768 (ASYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=57D3 at 592871424 for 32768 (ASYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=57D3 at 605421568 for 32768 (ASYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)

And on and on..
it will achieve near full wire-speed, about 110MB/sec during the copy.

Here is the same snoop, only copying the ASCII mail file:

$ snoop rpc nfs

obsmtp02.local -> pdxfilu01 NFS C LOOKUP3 FH=BB85 ascittest
pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 No such file or directory
obsmtp02.local -> pdxfilu01 NFS C LOOKUP3 FH=BB85 ascittest
pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 No such file or directory
obsmtp02.local -> pdxfilu01 NFS C CREATE3 FH=BB85 (UNCHECKED) ascittest
pdxfilu01 -> obsmtp02.local NFS R CREATE3 OK FH=69D3
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=69D3 at 0 for 32768 (FSYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=69D3 at 32768 for 32768 (FSYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)
obsmtp02.local -> pdxfilu01 NFS C WRITE3 FH=69D3 at 65536 for 32768 (FSYNC)
pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)

And so on. I've reproduced this with several files, and the only difference between tests is the file type. Is the FreeBSD NFS client requesting FSYNC or ASYNC depending on the file type/contents? If so, is there a tuneable setting to make all writes ASYNC? Otherwise, FSYNC'ing for every block written over NFS will cause so many IOPS on the NFS server that performance will degrade severely.

Testing with an OpenSolaris 2008.11 client will issue ASYNC writes for any file type, if mounted with NFSv3 or NFSv4 (TCP).

Any ideas?

Thanks in advance!

-- 
Brent Jones
brent@servuhome.net

From brent at servuhome.net Thu Jan 29 00:43:08 2009
From: brent at servuhome.net (Brent Jones)
Date: Thu Jan 29 00:43:14 2009
Subject: ZFS, NFS and Network tuning
In-Reply-To:
References:
Message-ID:

On Wed, Jan 28, 2009 at 11:21 PM, Brent Jones wrote:
> I'm reviving this, as I too am seeing something eerily similar. I have
> made my own thread under freebsd-stable, so I will hopefully move that
> discussion to this list.
> ...

I have found a 4 year old bug, which may be related to this. cp uses mmap for small files (and I imagine lots of things use mmap for file operations) and causes slowdowns via NFS, due to the fsync data provided above.

http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/87792

That bug ID accurately describes the issue. Is there any way to attach more 'interested parties' or additional details to that bug?
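
To see why a commit on every WRITE can hurt so much, here is a back-of-the-envelope sketch. It assumes the 32768-byte write size seen in the snoop output above and a client that waits for each synchronous write to finish before sending the next. The latencies used are hypothetical placeholders, not measurements from this thread, and the function names are made up for illustration.

```python
import math

WSIZE = 32768  # bytes per WRITE3 request, as seen in the snoop trace


def write_rpcs(file_size: int, wsize: int = WSIZE) -> int:
    """Number of WRITE requests needed to send file_size bytes."""
    return math.ceil(file_size / wsize)


def serialized_throughput_mb_s(latency_ms: float, wsize: int = WSIZE) -> float:
    """Throughput in MB/s (10**6 bytes) when each write must complete
    before the next one is issued, i.e. no pipelining at all."""
    return wsize / (latency_ms / 1000.0) / 1e6


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(f"1 MiB file -> {write_rpcs(one_mib)} WRITE requests")
    # Hypothetical per-request latencies: a committed write that waits for
    # stable storage versus one that is acknowledged from server memory.
    for latency_ms in (30.0, 0.3):
        rate = serialized_throughput_mb_s(latency_ms)
        print(f"{latency_ms:5.1f} ms per write -> {rate:6.1f} MB/s")
```

With the made-up 30 ms per committed write the model gives roughly 1 MB/s, and with 0.3 ms it gives roughly 109 MB/s. Real numbers depend on the server's storage and on how many writes the client keeps in flight.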
-- 
Brent Jones
brent@servuhome.net

From pluknet at gmail.com Thu Jan 29 05:07:51 2009
From: pluknet at gmail.com (pluknet)
Date: Thu Jan 29 05:07:58 2009
Subject: mysql scalability: freebsd vs solaris
In-Reply-To:
References:
Message-ID:

2009/1/27 pluknet :
> read/write requests 1 thread / 8 threads
>
> FreeBSD 6.2 6091.80 5614.48
> FreeBSD 7.1 6741.98 28091.60
> SunOS 5.10 4286.61 6949.80
>

Err.. It seems we overoptimized our MySQL or whatever.. Official 5.0.67 build for SunOS gives us:

SunOS 5.10 5676.93 25085.87

-- 
wbr, pluknet

From brde at optusnet.com.au Thu Jan 29 12:19:31 2009
From: brde at optusnet.com.au (Bruce Evans)
Date: Thu Jan 29 12:19:43 2009
Subject: ZFS, NFS and Network tuning
In-Reply-To:
References:
Message-ID: <20090129234158.B46285@delplex.bde.org>

On Thu, 29 Jan 2009, Brent Jones wrote:

> On Wed, Jan 28, 2009 at 11:21 PM, Brent Jones wrote:
>> ...
>> The issue I am seeing, is that for certain file types, the FreeBSD NFS
>> client will either issue an ASYNC write, or an FSYNC.
>> However, NFSv3 and v4 both support "safe" ASYNC writes in the TCP
>> versions of the protocol, so that should be the default.
>> Issuing FSYNC's for every complete block transmitted adds substantial
>> overhead and slows everything down.

I use some patches (mainly for nfs write clustering on the server) by Bjorn Gronwall and some local fixes (mainly for vfs write clustering on the server, and turning off excessive nfs[io]d daemons which get in each other's way due to poor scheduling, and things that only help for lots of small files), and see reasonable performance in all cases (~90% of disk bandwidth with all-async mounts, and half that with the client mounted noasync on an old version of FreeBSD. The client in -current is faster.) Writing is actually faster than reading here.

>> ...
>> My NFS mount command lines I have tried to get all data to ASYNC write:
>>
>> $ mount_nfs -3T -o async 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
>> $ mount_nfs -3T 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
>> $ mount_nfs -4TL 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/

Also try -r16384 -w16384, and udp, and async on the server. I think block sizes default to 8K for udp and 32K for tcp. 8K is too small, and 32K may be too large (it increases latency for little benefit if the server fs block size is 16K). udp gives lower latency. async on the server makes little difference provided the server block size is not too small.

> I have found a 4 year old bug, which may be related to this. cp uses
> mmap for small files (and I imagine lots of things use mmap for file
> operations) and causes slowdowns via NFS, due to the fsync data
> provided above.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/87792

mmap apparently breaks the async mount preference in the following code from vnode_pager.c:

% 	/*
% 	 * pageouts are already clustered, use IO_ASYNC to force a bawrite()
% 	 * rather then a bdwrite() to prevent paging I/O from saturating
% 	 * the buffer cache. Dummy-up the sequential heuristic to cause
% 	 * large ranges to cluster. If neither IO_SYNC or IO_ASYNC is set,
% 	 * the system decides how to cluster.
% 	 */
% 	ioflags = IO_VMIO;
% 	if (flags & (VM_PAGER_PUT_SYNC | VM_PAGER_PUT_INVAL))
% 		ioflags |= IO_SYNC;

This apparently gives lots of sync writes. (Sync writes are the default for nfs, but we mount with async to try to get async writes.)

% 	else if ((flags & VM_PAGER_CLUSTER_OK) == 0)
% 		ioflags |= IO_ASYNC;

nfs doesn't even support this flag. In fact, ffs is the only file system that supports it, and here is the only place that sets it. This might explain some slowness.

One of the bugs in vfs clustering that I don't have is related to this. IIRC, mounting the server with -o async doesn't work as well as it should because the buffer cache becomes congested with i/o that should have been sent to the disk. Some writes must be done async as explained above, but one place in vfs_cache.c is too aggressive in delaying async writes for file systems that are mounted async. This problem is more noticeable for nfs, at least with networks not much faster than disks, since it results in the client and server taking turns waiting for each other. (The names here are very confusing -- the async mount flag normally delays both sync and async writes for as long as possible, except for nfs it doesn't affect delays but asks for async writes instead of sync writes on the server, while the IO_ASYNC flag asks for async writes and thus often has the opposite sense to the async mount flag.)

% 	ioflags |= (flags & VM_PAGER_PUT_INVAL) ? IO_INVAL: 0;
% 	ioflags |= IO_SEQMAX << IO_SEQSHIFT;

Bruce

From brent at servuhome.net Fri Jan 30 16:08:03 2009
From: brent at servuhome.net (Brent Jones)
Date: Fri Jan 30 16:08:09 2009
Subject: NFS writes calling FSYNC and ASYNC not consistent
In-Reply-To: <20090129230247.GF4375@elvis.mu.org>
References: <20090129230247.GF4375@elvis.mu.org>
Message-ID:

On Thu, Jan 29, 2009 at 3:02 PM, Alfred Perlstein wrote:
> Apologies for being terse, in a hurry here.
>
> 1) -o async doesn't work with NFS, don't use that.
> 2) how big are the text versus binary files?

I tested with a 6MB text file, and a 2GB binary file. The text file would go ~1MB/sec, issuing FSYNCs for every block write size (32KB default).

One thing I found that another FreeBSD user discovered exactly explains my situation: in bin/cp/utils.c (source) there is a check; if the file is less than 8MB or so, it uses mmap, and if the file is larger, it uses write().

I modified the source and recompiled to -never- use mmap, only to use write(), and my performance increased about 100 fold (from 1MB/sec over NFS, to over 100MB/sec).
Changed line 143:

original: fs->st_size <= 8 * 1048576) {
New:      fs->st_size <= 8 * 8) {

It will still use mmap if the file is 64 bytes or smaller (if it uses bytes there, pretty sure it does), but it is much faster for files from ~1KB to 8MB now.

Regards

> 3) how are you copying them over nfs?
>
> I suspect, (could be wrong of course) that the ascii files
> are a lot smaller than the binary files, so what's happening
> is that for binary files, the client is issuing write-behind
> async, however for ascii files its issuing the writes at
> close time which will force the sync flag.
>
> -Alfred
>

-- 
Brent Jones
brent@servuhome.net
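
The size check described above can be modelled in a few lines. This is only a sketch of the decision, written in Python rather than the C of bin/cp/utils.c. It assumes the rule is "a non-empty file no larger than the threshold is copied via mmap, anything else via write()", based on the `fs->st_size <= ...` condition quoted in the message; the non-empty part is an assumption, and the function and constant names are made up for illustration.

```python
STOCK_THRESHOLD = 8 * 1048576  # 8 MB, the original condition
PATCHED_THRESHOLD = 8 * 8      # 64 bytes, the modified condition


def copy_method(size: int, threshold: int) -> str:
    """Return which copy path a file of `size` bytes would take."""
    # Assumption: empty files never take the mmap path.
    return "mmap" if 0 < size <= threshold else "write"


if __name__ == "__main__":
    text_file = 6 * 1024 * 1024    # the 6MB text file from the test
    binary_file = 2 * 1024 ** 3    # the 2GB binary file from the test
    for name, size in (("text", text_file), ("binary", binary_file)):
        stock = copy_method(size, STOCK_THRESHOLD)
        patched = copy_method(size, PATCHED_THRESHOLD)
        print(f"{name:6} stock={stock:5} patched={patched}")
```

Under this model the 6MB text file takes the mmap path with the stock threshold while the 2GB binary file takes the write() path, which lines up with the FSYNC versus ASYNC split seen in the traces. With the 64-byte threshold both go through write().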