From rmosher at he.net Fri May 1 03:20:33 2009
From: rmosher at he.net (Rob Mosher)
Date: Fri May 1 03:20:41 2009
Subject: Poll packet loss from tunnel traffic
Message-ID: <49FA643A.1070505@he.net>

Hi,

I'm seeing loss with polling due to the tunnel driver. Any information
that can help me resolve this will be greatly appreciated.

I'll start with a little bit of background information. This is an IPv6
tunnel server, with 1500 gif interfaces configured. There is also 1
teredo tunnel configured. After upgrading to 7.1 and enabling polling,
the poll loss only shows up when there is tunnel traffic passing
through. If I disable polling, the tunnel driver (used by miredo)
starts dropping packets at a high rate. There is no packet loss to the
machine directly when using interrupts, but the teredo tunnel
introduces loss to traffic being sent through it.

During the test below, there was about 5000pps of tunnel traffic going
to the box. I generated about 65kpps of packets to send to the box, and
you can see the input errors skyrocket. During this test I filtered the
tunnel traffic to the machine, and the input errors disappeared and it
was receiving 67kpps without any issues. As soon as I restored the
traffic, the errors started again.

Does anyone have any input on why tunnel traffic is causing polling
drops?

I have modified if.h to change IFQ_MAXLEN to 1000, since 50 was not
enough for the teredo tunnel. The same problem existed before this
change. This has also been tested at 1000hz with no difference.

My settings are below. If I left anything off please let me know.
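A note on reading the per-second counters that follow: each data row of `netstat -I bge0 -w 1` covers one second, and the share of input lost in that second can be estimated as errs / (packets + errs). The sketch below assumes input errors are counted in addition to received packets, which fits the totals in this post but is not stated in it; the function name `drop_pct` is made up for illustration.

```shell
# drop_pct: read "netstat -I <if> -w 1" rows on stdin and print, per row,
# input packets, input errors, and the estimated share of input dropped.
# Assumption: errs are counted in addition to packets, so the offered
# load for the interval is roughly packets + errs.
drop_pct() {
    awk '
        # Data rows start with two numeric fields; header rows do not.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            total = $1 + $2
            pct = (total > 0) ? 100 * $2 / total : 0
            printf "%d %d %.1f%%\n", $1, $2, pct
        }
    '
}

# Example (on the FreeBSD host): netstat -I bge0 -w 1 | drop_pct
```

Under that assumption, the first row of the output below (54867 packets, 13988 errs) works out to roughly a fifth of the offered input being dropped.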
netstat -I bge0 -w 1
            input         (bge0)           output
   packets  errs      bytes    packets  errs      bytes colls
     54867 13988    3404372        492     0      97696     0
     53407 15668    3306569        444     0      73218     0
     56372 12560    3478105        494     0      77937     0
     56765 12512    3492866        392     0      64412     0
     57100 11230    3519765        373     0      65237     0
     55354 13617    3409092        477     0      77209     0
     67886   789    4089003        220     0      17898     0
     67489     0    4068003        240     0      20439     0
     67488     0    4062953        211     0      17574     0
     67430     0    4056830        213     0      16777     0
     67482     0    4064063        241     0      17981     0
     67434     0    4057161        150     0      12777     0
     67318     0    4048591        163     0      12296     0
     67485     0    4057702        229     0      16279     0
     67863     0    4082756        220     0      17994     0
     56381 24676    3490003        297     0      51457     0
     47338 35006    2918266        323     0      51448     0
     56400 19233    3465354        307     0      48882     0
     56364 12603    3467025        342     0      57955     0
     57968 12039    3562743        284     0      53378     0
     53421 11315    3350914        324     0      47294     0
     33038 22232    2019793        171     0      29951     0
     50857 33673    3103288        289     0      41463     0

[root@tserv3 /usr/src/sys/net]# netstat -nr | wc -l #this is high because teredo adds routes.
36444

[root@tserv3 /usr/src/sys/net]# sysctl -a kern
kern.ostype: FreeBSD
kern.osrelease: 7.1-RELEASE-p4
kern.osrevision: 199506
kern.version: FreeBSD 7.1-RELEASE-p4 #8: Mon Dec 31 16:32:13 PST 2001 root@:/usr/obj/usr/src/sys/GENERIC
kern.maxvnodes: 100000
kern.maxproc: 6164
kern.maxfiles: 12328
kern.argmax: 262144
kern.securelevel: -1
kern.hostname: tserv3.fmt2.ipv6.he.net
kern.hostid: 2180312168
kern.clockrate: { hz = 2000, tick = 500, profhz = 2000, stathz = 133 }
kern.posix1version: 200112
kern.ngroups: 16
kern.job_control: 1
kern.saved_ids: 0
kern.boottime: { sec = 1241138975, usec = 851402 } Thu Apr 30 17:49:35 2009
kern.domainname:
kern.osreldate: 701000
kern.bootfile: /boot/kernel/kernel
kern.maxfilesperproc: 11095
kern.maxprocperuid: 5547
kern.ipc.maxsockbuf: 1000000
kern.ipc.sockbuf_waste_factor: 8
kern.ipc.somaxconn: 128
kern.ipc.max_linkhdr: 16
kern.ipc.max_protohdr: 60
kern.ipc.max_hdr: 76
kern.ipc.max_datalen: 128
kern.ipc.nmbjumbo16: 3200
kern.ipc.nmbjumbo9: 6400
kern.ipc.nmbjumbop: 12800
kern.ipc.nmbclusters: 100000
kern.ipc.piperesizeallowed: 1
kern.ipc.piperesizefail: 0
kern.ipc.pipeallocfail: 0
kern.ipc.pipefragretry: 0
kern.ipc.pipekva: 32768
kern.ipc.maxpipekva: 16777216
kern.ipc.msgseg: 2048
kern.ipc.msgssz: 8
kern.ipc.msgtql: 40
kern.ipc.msgmnb: 2048
kern.ipc.msgmni: 40
kern.ipc.msgmax: 16384
kern.ipc.semaem: 16384
kern.ipc.semvmx: 32767
kern.ipc.semusz: 92
kern.ipc.semume: 10
kern.ipc.semopm: 100
kern.ipc.semmsl: 60
kern.ipc.semmnu: 30
kern.ipc.semmns: 60
kern.ipc.semmni: 10
kern.ipc.semmap: 30
kern.ipc.shm_allow_removed: 0
kern.ipc.shm_use_phys: 0
kern.ipc.shmall: 8192
kern.ipc.shmseg: 128
kern.ipc.shmmni: 192
kern.ipc.shmmin: 1
kern.ipc.shmmax: 33554432
kern.ipc.maxsockets: 25600
kern.ipc.zero_copy.send: 1
kern.ipc.zero_copy.receive: 1
kern.ipc.numopensockets: 43
kern.ipc.nsfbufsused: 0
kern.ipc.nsfbufspeak: 27
kern.ipc.nsfbufs: 6656
kern.dummy: 0
kern.ps_strings: 3217031152
kern.usrstack: 3217031168
kern.logsigexit: 1
kern.iov_max: 1024
kern.hostuuid: 00020003-0004-0005-0006-000700080009
kern.cam.cam_srch_hi: 0
kern.cam.scsi_delay: 5000
kern.cam.cd.changer.max_busy_seconds: 15
kern.cam.cd.changer.min_busy_seconds: 5
kern.cam.da.da_send_ordered: 1
kern.cam.da.default_timeout: 60
kern.cam.da.retry_count: 4
kern.dcons.poll_hz: 100
kern.disks: ad6 ad4
kern.geom.collectstats: 1
kern.geom.debugflags: 0
kern.geom.label.debug: 0
kern.elf32.fallback_brand: -1
kern.init_shutdown_timeout: 120
kern.init_path: /sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init:/stand/sysinstall
kern.acct_suspended: 0
kern.acct_configured: 0
kern.acct_chkfreq: 15
kern.acct_resume: 4
kern.acct_suspend: 2
kern.cp_times: 17030 0 127675 277305 457322 19681 0 201650 179434 478628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
kern.cp_time: 36711 0 329325 456739 935950
kern.openfiles: 94
kern.kq_calloutmax: 4096
kern.ps_arg_cache_limit: 256
kern.stackprot: 7
kern.randompid: 0
kern.lastpid: 76148
kern.ktrace.request_pool: 100
kern.ktrace.genio_size: 4096
kern.module_path: /boot/kernel;/boot/modules
kern.malloc_count: 247
kern.fallback_elf_brand: -1
kern.features.compat_freebsd6: 1
kern.features.compat_freebsd5: 1
kern.features.compat_freebsd4: 1
kern.maxusers: 384
kern.ident: GENERIC
kern.polling.idlepoll_sleeping: 1
kern.polling.stalled: 11236
kern.polling.suspect: 625362
kern.polling.phase: 0
kern.polling.enable: 0
kern.polling.handlers: 1
kern.polling.residual_burst: 0
kern.polling.pending_polls: 0
kern.polling.lost_polls: 5605120
kern.polling.short_ticks: 2031
kern.polling.reg_frac: 1
kern.polling.user_frac: 1
kern.polling.idle_poll: 0
kern.polling.each_burst: 1000
kern.polling.burst_max: 1000
kern.polling.burst: 29
kern.kstack_pages: 2
kern.shutdown.kproc_shutdown_wait: 60
kern.shutdown.poweroff_delay: 5000
kern.sync_on_panic: 0
kern.corefile: %N.core
kern.nodump_coredump: 0
kern.coredump: 1
kern.sugid_coredump: 0
kern.sigqueue.alloc_fail: 0
kern.sigqueue.overflow: 0
kern.sigqueue.preallocate: 1024
kern.sigqueue.max_pending_per_proc: 128
kern.forcesigexit: 1
kern.fscale: 2048
kern.timecounter.tick: 2
kern.timecounter.choice: TSC(-100) ACPI-safe(850) i8254(0) dummy(-1000000)
kern.timecounter.hardware: ACPI-safe
kern.timecounter.nsetclock: 3
kern.timecounter.ngetmicrotime: 700031
kern.timecounter.ngetnanotime: 12814
kern.timecounter.ngetbintime: 0
kern.timecounter.ngetmicrouptime: 510350
kern.timecounter.ngetnanouptime: 3766
kern.timecounter.ngetbinuptime: 20127
kern.timecounter.nmicrotime: 808701
kern.timecounter.nnanotime: 6634
kern.timecounter.nbintime: 815336
kern.timecounter.nmicrouptime: 37590443
kern.timecounter.nnanouptime: 119
kern.timecounter.nbinuptime: 40052546
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 4041
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.ACPI-safe.mask: 4294967295
kern.timecounter.tc.ACPI-safe.counter: 2482401764
kern.timecounter.tc.ACPI-safe.frequency: 3579545
kern.timecounter.tc.ACPI-safe.quality: 850
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 487861987
kern.timecounter.tc.TSC.frequency: 2593518990
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.threads.virtual_cpu: 2
kern.threads.max_threads_hits: 0
kern.threads.max_threads_per_proc: 1500
kern.ccpu: 0
kern.sched.preemption: 1
kern.sched.topology: 0
kern.sched.steal_thresh: 1
kern.sched.steal_idle: 1
kern.sched.steal_htt: 1
kern.sched.balance_interval: 133
kern.sched.balance: 1
kern.sched.tryself: 1
kern.sched.affinity: 3
kern.sched.pick_pri: 1
kern.sched.preempt_thresh: 64
kern.sched.interact: 30
kern.sched.slice: 13
kern.sched.name: ULE
kern.devstat.version: 6
kern.devstat.generation: 129
kern.devstat.numdevs: 2
kern.kobj_methodcount: 140
kern.log_wakeups_per_second: 5
kern.msgbuf_clear: 0
kern.msgbuf:
kern.always_console_output: 0
kern.log_console_output: 1
kern.smp.forward_roundrobin_enabled: 1
kern.smp.forward_signal_enabled: 1
kern.smp.cpus: 2
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 16
kern.smp.maxid: 15
kern.nselcoll: 0
kern.tty_nout: 12126931
kern.tty_nin: 1974280
kern.drainwait: 300
kern.constty_wakeups_per_second: 5
kern.consmsgbuf_size: 8192
kern.consmute: 0
kern.console: consolectl,dcons,/dcons,consolectl,ttyd0,
kern.minvnodes: 25000
kern.metadelay: 28
kern.dirdelay: 29
kern.filedelay: 30
kern.chroot_allow_open_directories: 1
kern.rpc.invalid: 0
kern.rpc.unexpected: 0
kern.rpc.timeouts: 0
kern.rpc.request: 0
kern.rpc.retries: 0
kern.random.yarrow.gengateinterval: 10
kern.random.yarrow.bins: 10
kern.random.yarrow.fastthresh: 192
kern.random.yarrow.slowthresh: 256
kern.random.yarrow.slowoverthresh: 2
kern.random.sys.seeded: 1
kern.random.sys.harvest.ethernet: 1
kern.random.sys.harvest.point_to_point: 1
kern.random.sys.harvest.interrupt: 1
kern.random.sys.harvest.swi: 0

[root@tserv3 /usr/src/sys/net]# sysctl -a net
net.local.stream.recvspace: 100000
net.local.stream.sendspace: 100000
net.local.dgram.recvspace: 100000
net.local.dgram.maxdgram: 100000
net.local.recycled: 0
net.local.taskcount: 0
net.local.inflight: 0
net.inet.ip.portrange.randomtime: 45
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.forwarding: 1
net.inet.ip.redirect: 1
net.inet.ip.ttl: 64
net.inet.ip.rtexpire: 3600
net.inet.ip.rtminexpire: 10
net.inet.ip.rtmaxcache: 128
net.inet.ip.sourceroute: 0
net.inet.ip.intr_queue_maxlen: 1024
net.inet.ip.intr_queue_drops: 0
net.inet.ip.accept_sourceroute: 0
net.inet.ip.keepfaith: 0
net.inet.ip.gifttl: 30
net.inet.ip.same_prefix_carp_only: 0
net.inet.ip.subnets_are_local: 0
net.inet.ip.fastforwarding: 1
net.inet.ip.maxfragpackets: 3125
net.inet.ip.maxfragsperpacket: 16
net.inet.ip.fragpackets: 0
net.inet.ip.check_interface: 0
net.inet.ip.random_id: 0
net.inet.ip.sendsourcequench: 0
net.inet.ip.process_options: 1
net.inet.icmp.maskrepl: 0
net.inet.icmp.icmplim: 200
net.inet.icmp.bmcastecho: 0
net.inet.icmp.quotelen: 8
net.inet.icmp.reply_from_interface: 0
net.inet.icmp.reply_src:
net.inet.icmp.icmplim_output: 1
net.inet.icmp.log_redirect: 0
net.inet.icmp.drop_redirect: 0
net.inet.icmp.maskfake: 0
net.inet.tcp.rfc1323: 1
net.inet.tcp.mssdflt: 512
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.v6mssdflt: 1024
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.count: 3
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.recvbuf_max: 262144
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.recvbuf_auto: 1
net.inet.tcp.insecure_rst: 0
net.inet.tcp.rfc3390: 1
net.inet.tcp.rfc3042: 1
net.inet.tcp.drop_synfin: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.blackhole: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.sendbuf_max: 262144
net.inet.tcp.sendbuf_inc: 8192
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.tso: 1
net.inet.tcp.newreno: 1
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.reass.overflows: 0
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 0
net.inet.tcp.reass.maxsegments: 6250
net.inet.tcp.sack.globalholes: 0
net.inet.tcp.sack.globalmaxholes: 65536
net.inet.tcp.sack.maxholes: 128
net.inet.tcp.sack.enable: 1
net.inet.tcp.inflight.stab: 20
net.inet.tcp.inflight.max: 1073725440
net.inet.tcp.inflight.min: 6144
net.inet.tcp.inflight.rttthresh: 10
net.inet.tcp.inflight.debug: 0
net.inet.tcp.inflight.enable: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.pcbcount: 21
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.tcbhashsize: 512
net.inet.tcp.log_debug: 0
net.inet.tcp.minmss: 216
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.cachelimit: 15360
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncookies_only: 0
net.inet.tcp.syncookies: 1
net.inet.tcp.timer_race: 0
net.inet.tcp.finwait2_timeout: 60000
net.inet.tcp.fast_finwait2_recycle: 0
net.inet.tcp.always_keepalive: 1
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.rexmit_min: 30
net.inet.tcp.msl: 30000
net.inet.tcp.nolocaltimewait: 0
net.inet.tcp.maxtcptw: 5120
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 100000
net.inet.udp.recvspace: 100000
net.inet.udp.soreceive_dgram_enabled: 0
net.inet.udp.blackhole: 0
net.inet.udp.log_in_vain: 0
net.inet.sctp.enable_sack_immediately: 0
net.inet.sctp.udp_tunneling_port: 0
net.inet.sctp.udp_tunneling_for_client_enable: 0
net.inet.sctp.mobility_fasthandoff: 0
net.inet.sctp.mobility_base: 0
net.inet.sctp.default_frag_interleave: 1
net.inet.sctp.default_cc_module: 0
net.inet.sctp.log_level: 0
net.inet.sctp.max_retran_chunk: 30
net.inet.sctp.min_residual: 1452
net.inet.sctp.strict_data_order: 0
net.inet.sctp.abort_at_limit: 0
net.inet.sctp.hb_max_burst: 4
net.inet.sctp.do_sctp_drain: 1
net.inet.sctp.max_chained_mbufs: 5
net.inet.sctp.abc_l_var: 1
net.inet.sctp.nat_friendly: 1
net.inet.sctp.auth_disable: 0
net.inet.sctp.asconf_auth_nochk: 0
net.inet.sctp.early_fast_retran_msec: 250
net.inet.sctp.early_fast_retran: 0
net.inet.sctp.cwnd_maxburst: 1
net.inet.sctp.cmt_pf: 0
net.inet.sctp.cmt_use_dac: 0
net.inet.sctp.cmt_on_off: 0
net.inet.sctp.outgoing_streams: 10
net.inet.sctp.add_more_on_output: 1452
net.inet.sctp.path_rtx_max: 5
net.inet.sctp.assoc_rtx_max: 10
net.inet.sctp.init_rtx_max: 8
net.inet.sctp.valid_cookie_life: 60000
net.inet.sctp.init_rto_max: 60000
net.inet.sctp.rto_initial: 3000
net.inet.sctp.rto_min: 1000
net.inet.sctp.rto_max: 60000
net.inet.sctp.secret_lifetime: 3600
net.inet.sctp.shutdown_guard_time: 180
net.inet.sctp.pmtu_raise_time: 600
net.inet.sctp.heartbeat_interval: 30000
net.inet.sctp.asoc_resource: 10
net.inet.sctp.sys_resource: 1000
net.inet.sctp.sack_freq: 2
net.inet.sctp.delayed_sack_time: 200
net.inet.sctp.chunkscale: 10
net.inet.sctp.min_split_point: 2904
net.inet.sctp.pcbhashsize: 256
net.inet.sctp.tcbhashsize: 1024
net.inet.sctp.maxchunks: 3200
net.inet.sctp.maxburst: 4
net.inet.sctp.peer_chkoh: 256
net.inet.sctp.strict_init: 1
net.inet.sctp.loopback_nocsum: 1
net.inet.sctp.strict_sacks: 0
net.inet.sctp.ecn_nonce: 0
net.inet.sctp.ecn_enable: 1
net.inet.sctp.auto_asconf: 1
net.inet.sctp.recvspace: 233016
net.inet.sctp.sendspace: 233016
net.inet.raw.recvspace: 100000
net.inet.raw.maxdgram: 100000
net.inet.accf.unloadable: 0
net.link.generic.system.ifcount: 1509
net.link.ether.inet.log_arp_permanent_modify: 1
net.link.ether.inet.log_arp_movements: 1
net.link.ether.inet.log_arp_wrong_iface: 1
net.link.ether.inet.proxyall: 0
net.link.ether.inet.useloopback: 1
net.link.ether.inet.maxtries: 5
net.link.ether.inet.max_age: 1200
net.link.ether.ipfw: 0
net.link.stf.route_cache: 1
net.link.gif.parallel_tunnels: 0
net.link.gif.max_nesting: 1
net.link.log_link_state_change: 1
net.link.tun.devfs_cloning: 1
net.inet6.ip6.forwarding: 1
net.inet6.ip6.redirect: 1
net.inet6.ip6.hlim: 64
net.inet6.ip6.maxfragpackets: 25000
net.inet6.ip6.accept_rtadv: 0
net.inet6.ip6.keepfaith: 0
net.inet6.ip6.log_interval: 5
net.inet6.ip6.hdrnestlimit: 15
net.inet6.ip6.dad_count: 1
net.inet6.ip6.auto_flowlabel: 1
net.inet6.ip6.defmcasthlim: 1
net.inet6.ip6.gifhlim: 30
net.inet6.ip6.kame_version: FreeBSD
net.inet6.ip6.use_deprecated: 1
net.inet6.ip6.rr_prune: 5
net.inet6.ip6.v6only: 1
net.inet6.ip6.rtexpire: 3600
net.inet6.ip6.rtminexpire: 10
net.inet6.ip6.rtmaxcache: 128
net.inet6.ip6.use_tempaddr: 0
net.inet6.ip6.temppltime: 86400
net.inet6.ip6.tempvltime: 604800
net.inet6.ip6.auto_linklocal: 1
net.inet6.ip6.prefer_tempaddr: 0
net.inet6.ip6.use_defaultzone: 0
net.inet6.ip6.maxfrags: 25000
net.inet6.ip6.mcast_pmtu: 0
net.inet6.icmp6.rediraccept: 1
net.inet6.icmp6.redirtimeout: 600
net.inet6.icmp6.nd6_prune: 1
net.inet6.icmp6.nd6_delay: 5
net.inet6.icmp6.nd6_umaxtries: 3
net.inet6.icmp6.nd6_mmaxtries: 3
net.inet6.icmp6.nd6_useloopback: 1
net.inet6.icmp6.nodeinfo: 3
net.inet6.icmp6.errppslimit: 100
net.inet6.icmp6.nd6_maxnudhint: 0
net.inet6.icmp6.nd6_debug: 0
net.inet6.icmp6.nd6_maxqueuelen: 1
net.inet6.icmp6.nd6_onlink_ns_rfc4861: 0
net.bpf.maxinsns: 512
net.bpf.maxbufsize: 524288
net.bpf.bufsize: 4096
net.isr.swi_count: 67634604
net.isr.drop: 0
net.isr.queued: 183166
net.isr.deferred: 41476140
net.isr.directed: 41346528
net.isr.count: 82819191
net.isr.direct: 1
net.raw.recvspace: 100000
net.raw.sendspace: 100000
net.my_fibnum: 0
net.add_addr_allfibs: 1
net.fibs: 1
net.route.netisr_maxqlen: 256
net.wlan.recv_bar: 1
net.wlan.debug: 0

[root@tserv3 /usr/src/sys/net]# sysctl -a dev.bge.0
dev.bge.0.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2100
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=3 function=0
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1648 subvendor=0x15d9 subdevice=0x1648 class=0x020000
dev.bge.0.%parent: pci2
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 2997701
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 287023
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 33654645
dev.bge.0.stats.DmaReadQueueFull: 434236
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 31463597
dev.bge.0.stats.RingStatusUpdate: 64933072
dev.bge.0.stats.Interrupts: 75393
dev.bge.0.stats.AvoidedInterrupts: 64857679
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 79971511
dev.bge.0.stats.rx.Fragments: 1
dev.bge.0.stats.rx.UcastPkts: 33966088
dev.bge.0.stats.rx.MulticastPkts: 0
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 142844970
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 31339486
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 7
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0

--
--
Rob Mosher
Network Engineer
Hurricane Electric / AS6939

From markwkm at gmail.com Tue May 5 05:57:32 2009
From: markwkm at gmail.com (Mark Wong)
Date: Tue May 5 05:58:05 2009
Subject: filesystem performance
Message-ID: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com>

Hi everyone,

We (PostgreSQL community) have a HP DL380 G5 that we were using to do
some very basic filesystem characterizations as part of a database
performance tuning project, so we wanted to give FreeBSD a try out of
the box. For this set of data we used 7.1. We (the few of us running
the tests) are fairly unfamiliar with the community here, so I'll be as
brief as I can. We're basically wondering whether the data we're
getting out of the box is expected, and we'd welcome any tuning
guidelines, including what changes in performance we should expect to
see.

So that said, I'll start with some of the charted results:

Scaling from 1 device to 4 devices using RAID 0:

http://207.173.203.223/~markwkm/community10/fio/freebsd-7.1/raid0-ufs.png

Comparing RAID configurations of similar capacity:

http://207.173.203.223/~markwkm/community10/fio/freebsd-7.1/capacity-ufs.png

Comparing RAID configuration using same number of drives (4):

http://207.173.203.223/~markwkm/community10/fio/freebsd-7.1/4-disk-ufs.png

All of the raw data (fio output, iostat, and vmstat), including charts
of some of the iostat and vmstat data, are buried in here:

http://207.173.203.223/~markwkm/community10/fio/freebsd-7.1/

We used fio v1.23 to run the tests and picked parameters based on what
PostgreSQL is capable of.
The profiles used are here:

http://git.postgresql.org/gitweb?p=performance-tuning.git;a=tree;f=contrib/freebsd/fio;h=5ae97420fba010a3685c12c24cd69266d83b4daf;hb=HEAD

Hardware details are here (note we only used 4 of the internal drives
for this test, none in the MSA 70, and you may notice we have a lot of
Linux data on this page too):

http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide#Hardware_Details

I know I've been brief and at the same time dumped a lot of raw data.
Any pointers to tuning guides that we have obviously not seen would be
appreciated. We hope this data is of interest and is helpful.

Regards,
Mark

From ssanders at opnet.com Tue May 5 15:39:24 2009
From: ssanders at opnet.com (Stephen Sanders)
Date: Tue May 5 15:39:31 2009
Subject: VM sysctl tuning
Message-ID: <4A005935.40206@opnet.com>

Does anyone know if there is any advantage to tuning sysctls for the VM
when one has an insane amount of memory in their machine?

We have a system with 16GB of RAM on which we are attempting to
optimize memory copy and disk write performance.

Thanks.

From leccine at gmail.com Tue May 5 18:26:46 2009
From: leccine at gmail.com (István)
Date: Tue May 5 18:26:53 2009
Subject: VM sysctl tuning
In-Reply-To: <4A005935.40206@opnet.com>
References: <4A005935.40206@opnet.com>
Message-ID:

I guess you can find some useful information here:

http://wiki.freebsd.org/ZFSTuningGuide

Please note the difference between 7.2 and all the previous releases.

"FreeBSD 7.2+ has improved kernel memory allocation strategy and no
tuning may be necessary on systems with more than 2 GB of RAM."

It is also worth playing a bit with kern.maxvnodes, as the document
says.

Not to forget:

http://www.freebsd.org/doc/en/books/arch-handbook/vm-tuning.html

Regards,
Istvan

On Tue, May 5, 2009 at 4:20 PM, Stephen Sanders wrote:

> Does any know if there is any advantage to tuning sysctl's for the VM
> when one has an insane amount of memory in their machine?
>
> We've a system with 16GB of RAM that we are attempting to optimize the
> system's memory copy and disk write performance.
>
> Thanks.
>
>
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to "
> freebsd-performance-unsubscribe@freebsd.org"
>

--
the sun shines for all

From ap00 at mail.ru Wed May 6 09:50:04 2009
From: ap00 at mail.ru (Anthony Pankov)
Date: Wed May 6 09:50:10 2009
Subject: filesystem performance
In-Reply-To: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com>
References: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com>
Message-ID: <5871156390.20090506132550@mail.ru>

Hello Mark,

May I ask a question while more experienced people are waking up?

I don't fully understand the target. What should the filesystem be
optimized for?

I would expect patterns of recorded I/O calls made while pgsql performs
typical operations, with statistics and in-depth analysis.

Are you sure there is a strong relation between fio benchmark results
and PostgreSQL performance?

Tuesday, May 05, 2009, 9:30:47 AM, you wrote:

MW> Hi everyone,
MW> We (PostgreSQL community) have a HP DL380 G5 that we were using to do
MW> some very basic filesystem characterizations as part of a database
MW> performance tuning project, so we wanted to give FreeBSD a try out of
MW> the box. For this set of data we used 7.1. We're (us few that are
MW> running the tests) are fairly unfamiliar with the community here, so
MW> I'll be as brief as I can. We're basically wondering if the data
MW> we're getting out of the box is expected, and any tuning guidelines
MW> including what changes we should expect to see in the performance.
--
Best regards,
Anthony mailto:ap00@mail.ru

From om-lists-bsd at omx.ch Wed May 6 12:21:31 2009
From: om-lists-bsd at omx.ch (Olivier Mueller)
Date: Wed May 6 12:21:44 2009
Subject: filesystem: 12h to delete 32GB of data
Message-ID: <1241610888.16418.64.camel@ompc.insign.local>

Hello,

$ df -m ; date ; rm -r templates_c ; df -m ; date
Filesystem  1M-blocks   Used Avail Capacity  Mounted on
/dev/da0s1a       989     45   864     5%    /
/dev/da0s1f    128631 102179 16160    86%    /usr
[...]
Wed May 6 00:23:01 CEST 2009

Filesystem  1M-blocks   Used Avail Capacity  Mounted on
/dev/da0s1a       989     45   864     5%    /
/dev/da0s1f    128631  69844 48496    59%    /usr
Wed May 6 12:21:02 CEST 2009

-> it took about 12 hours to delete these 30GB of files and
sub-directories (smarty cache files: many small files in many dirs).
It's a little bit surprising, as it's on a recent HP proliant DL360 g5
with SAS disks (Raid1) running freebsd 6.x
( /dev/da0s1f on /usr (ufs, local, soft-updates) )

Surprisingly, cpu load remained quite low during the operation (apache
stayed responsive). Is it a known problem on this kind of hardware or
something related to the filesystem? Is there a way to improve this?
Even on my $500 PC with IDE disks this goes quicker... :)

I checked
http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html but
I'm not sure if this would help in this case. Any suggestion how I can
"fix" that?

Regards,
Olivier

From wmoran at potentialtech.com Wed May 6 13:08:09 2009
From: wmoran at potentialtech.com (Bill Moran)
Date: Wed May 6 13:08:22 2009
Subject: filesystem: 12h to delete 32GB of data
In-Reply-To: <1241610888.16418.64.camel@ompc.insign.local>
References: <1241610888.16418.64.camel@ompc.insign.local>
Message-ID: <20090506084834.61600c42.wmoran@potentialtech.com>

In response to Olivier Mueller:

> Hello,
>
> $ df -m ; date ; rm -r templates_c ; df -m ; date
> Filesystem  1M-blocks   Used Avail Capacity  Mounted on
> /dev/da0s1a       989     45   864     5%    /
> /dev/da0s1f    128631 102179 16160    86%    /usr
> [...]
> Wed May 6 00:23:01 CEST 2009
>
> Filesystem  1M-blocks   Used Avail Capacity  Mounted on
> /dev/da0s1a       989     45   864     5%    /
> /dev/da0s1f    128631  69844 48496    59%    /usr
> Wed May 6 12:21:02 CEST 2009
>
>
> -> it took about 12 hours to delete these 30GB of files and
> sub-directories (smarty cache files: many small files in many dirs).
> It's a little bit surprising, as it's on a recent HP proliant DL360 g5
> with SAS disks (Raid1) running freebsd 6.x
> ( /dev/da0s1f on /usr (ufs, local, soft-updates) )
>
> Surprisingly, cpu load remained quite low during the operation (apache
> stayed responsive). Is it a known problem on this kind of hardware or
> something related to the filesystem? Is there a way to improve this?
> Even on my $500 PC with IDE disks this goes quicker... :)
>
> I checked
> http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html but
> I'm not sure if this would help in this case. Any suggestion how I can
> "fix" that?

With lots of small files, the time involved is far less dependent on
the size of data, and much more dependent on the number of files, and
the resultant number of directory entries that need to be updated.
"Lots" isn't a particularly accurate count of the # of files, but if
you're talking web cache files, I'll guess they average 5k each, which
means you had 6 million files. df -i would have been more useful in
the output above.

This brings a number of questions up:
* Are you _sure_ softupdates is enabled on that partition? That's
  going to make the biggest improvement in speed.
* Are these 7200RPM disks or 15,000? Again, going to make a big
  difference.
* If apache was still running, is it possible that it was creating
  enough disk activity to slow the activity down? Running
  top -m io will show you how much disk IO each process is creating.
* When you compared the speed to your laptop, did you delete 6 million
  files from the laptop? If you deleted a single 30G file, then you're
  comparing apples to atom bombs.
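The file-count estimate in this reply (about 30 GB of cache at a guessed 5k per file, giving roughly 6 million files) is plain division. The sketch below reproduces it in shell; the 5 KB average is the guess made above, not a measurement, and `estimate_files` is a hypothetical helper name.

```shell
# estimate_files: rough file count from data size and average file size.
#   $1 = total size in MB, $2 = assumed average file size in KB
# Integer arithmetic only; the result is an order-of-magnitude estimate.
estimate_files() {
    echo $(( $1 * 1024 / $2 ))
}

# 30 GB expressed in MB, at a guessed 5 KB per file
estimate_files $(( 30 * 1024 )) 5
```

Comparing the iused column of `df -i` before and after the rm would give the real number instead of an estimate.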

If this is a directory that you blow away on a regular schedule, you'd
do much better to make it a dedicated partition and simply reformat
it.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

From om-lists-bsd at omx.ch Wed May 6 13:22:05 2009
From: om-lists-bsd at omx.ch (Olivier Mueller)
Date: Wed May 6 13:22:12 2009
Subject: filesystem: 12h to delete 32GB of data (4 million files)
In-Reply-To: <20090506084834.61600c42.wmoran@potentialtech.com>
References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com>
Message-ID: <1241616121.16418.109.camel@ompc.insign.local>

Thanks for your answer Bill! (and to Will as well),

Some more info I gathered a few minutes ago:

[~/templates_c]$ date; du -s -m ; date
Wed May 6 13:35:15 CEST 2009
2652    .
Wed May 6 13:52:36 CEST 2009

[~/templates_c]$ date ; find . | wc -l ; date
Wed May 6 13:52:56 CEST 2009
305461
Wed May 6 14:09:39 CEST 2009

So this is on the system after a complete cache cleanup (at 00h00).
300'000 files and 2.6GB. So last night, there were probably around 3-4
million files to delete. Deletion may take time, but 20 minutes just
to _count_ all the files seems pretty long to me...

I think I'll say a word to the developers to let them tune their
caching system a bit :)

On Wed, 2009-05-06 at 08:48 -0400, Bill Moran wrote:
> With lots of small files, the time involved is far less dependent on
> the size of data, and much more dependent on the number of files, and
> the resultant number of directory entries that need to be updated.
> "Lots" isn't a particularly accurate count of the # of files, but if
> you're talking web cache files, I'll guess they average 5k each, which
> means you had 6 million files. df -i would have been more useful in
> the output above.

Thanks, noted for next time.
Now it looks like this:

Filesystem  1M-blocks  Used Avail Capacity   iused    ifree %iused  Mounted on
/dev/da0s1f    128631 70544 47795    60%   1913875 15114219   11%   /usr

> This brings a number of questions up:
> * Are you _sure_ softupdates is enabled on that partition? That's
> going to make the biggest improvement in speed.

According to "mount" output, yes. I found no specific message about
that in the syslog or dmesg.

> * Are these 7200RPM disks or 15,000? Again, going to make a big
> difference.

HP 146GB 6G SAS 10K SFF DP ENT HDD (15k were not available at the time
the servers were ordered)
( http://h18004.www1.hp.com/products/servers/proliantstorage/serial/sas/index.html )

> * If apache was still running, is it possible that it was creating
> enough disk activity to slow the activity down? Running
> top -m io will show you how much disk IO each process is creating.

Yes, apache was still running, but the activity was quite low (it was
during the night, and the webpage doesn't get so many hits before 9 am
local time).

While watching "top -m io", the "du" or "find" takes between 80 and
99%, so I guess it's not the problem here:

  PID  UID  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
87996 1002    59    56     0     0     0     0   0.00% php
45389 1002    35    25     0     0     2     2   0.84% php
 3964 1002     0     0     0     0     0     0   0.00% httpd
 3822 1002   151    98     0     0     0     0   0.00% httpd
 3005 1002     0     0     0     0     0     0   0.00% httpd
 4129 1002     0     0     0     0     0     0   0.00% httpd
 3971 1002     0     0     0     0     0     0   0.00% httpd
 4231 1002     1     0     0     0     0     0   0.00% httpd
 4132    0   234     5   234     0     0   234  97.91% find
98862 1002     1     0     0     0     0     0   0.00% top
  609    0     0     0     0     0     0     0   0.00% snmpd
[...]

> * When you compared the speed to your laptop, did you delete 6 million
> files from the laptop? If you deleted a single 30G file, then you're
> comparing apples to atom bombs.

Yes sorry, I know :)

> If this is a directory that you blow away on a regular schedule, you'd
> do much better to make it a dedicated partition and simply reformat
> it.

Yes, it is one of the best options.
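The soft-updates question above was answered by reading `mount` output by eye. A small sketch of the same check in shell follows; it assumes the FreeBSD `mount` line format quoted earlier in this thread (`/dev/da0s1f on /usr (ufs, local, soft-updates)`), and `has_softupdates` is a hypothetical helper name.

```shell
# has_softupdates: succeed if the mount point given as $1 is listed with
# the "soft-updates" option in mount(8) output read from stdin.
has_softupdates() {
    awk -v mp="$1" '
        # Expected line shape: <device> on <mountpoint> (<opt>, <opt>, ...)
        $2 == "on" && $3 == mp && /[(, ]soft-updates[,)]/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example (on the FreeBSD host):
#   mount | has_softupdates /usr && echo "soft updates enabled on /usr"
```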
My initial goal was to delete all files older than N days by cron (find | xargs | rm, etc.), but if each cronjob takes 2 hours (and takes so much cpu time), it's probably not the best way. I'll make some more tests on an test-server later this week and speak with the devs. Thanks again for your very constructive feedback! Regards, Olivier From arkadijs.sislovs at affecto.lv Wed May 6 13:56:55 2009 From: arkadijs.sislovs at affecto.lv (Arkadi Shishlov) Date: Wed May 6 14:14:15 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <20090506084834.61600c42.wmoran@potentialtech.com> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> Message-ID: <20090506161543.062ba223@hal9000.mebius.lv> Its probably "dirhash' that is not enabled or its cache is too small for the task. From flo at kasimir.com Wed May 6 14:37:21 2009 From: flo at kasimir.com (Florian Smeets) Date: Wed May 6 14:37:28 2009 Subject: filesystem performance In-Reply-To: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com> References: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com> Message-ID: <4A019C24.6010804@kasimir.com> On 05.05.09 07:30, Mark Wong wrote: > Hi everyone, > > We (PostgreSQL community) have a HP DL380 G5 that we were using to do > some very basic filesystem characterizations as part of a database > performance tuning project, so we wanted to give FreeBSD a try out of > the box. For this set of data we used 7.1. We're (us few that are > running the tests) are fairly unfamiliar with the community here, so > I'll be as brief as I can. We're basically wondering if the data > we're getting out of the box is expected, and any tuning guidelines > including what changes we should expect to see in the performance. > I guess you are using the ciss driver in this box? There was a performance regression in this driver in 7.1. This should be fixed in 7.2, which came out recently. 
It is believed that you should get a whole lot better IO performance with 7.2 if you are using the ciss driver. From the 7.2 release notes: A bug in the ciss(4) driver which caused low "max device openings" count and led to poor performance has been fixed. HTH, Florian From markwkm at gmail.com Wed May 6 15:01:30 2009 From: markwkm at gmail.com (Mark Wong) Date: Wed May 6 15:01:36 2009 Subject: filesystem performance In-Reply-To: <5871156390.20090506132550@mail.ru> References: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com> <5871156390.20090506132550@mail.ru> Message-ID: <70c01d1d0905060801r1eb7b9f7o5c1c9505130a7667@mail.gmail.com> On Wed, May 6, 2009 at 2:25 AM, Anthony Pankov wrote: > Hello Mark, > > May i ask a question while more expierenced people is waking up? > > I don't fully understand the target. For what filesystem should be > optimized? > > I expect a patterns of recorded IO calls when pgsql perform typical > operations with statistics and in-depth analysis. The angle we're trying to look at is from a sizing perspective. In other words, we want to have an idea of what to expect before we do it. For example, if I have 10 drives, what can I expect if I configure them in a RAID 10 configuration? > Are you sure there is a strong relation between fio benchmark result and > PostgreSQL performance? Sorry, this was something I was trying to make clearer originally. No, I don't think there is a strong relationship between fio and PostgreSQL, but these i/o patterns we are simulating do give us a rough estimate of what we can expect. For example, workloads with lots of updates and inserts into a database will generate a lot of sequential writes to the database logs, which we can physically isolate onto its own lun. Similarly, in some warehousing applications there may be a table that is always scanned and read sequentially, which also can be on its own physical lun. 
Regards, Mark From om-lists-bsd at omx.ch Wed May 6 15:02:35 2009 From: om-lists-bsd at omx.ch (Olivier Mueller) Date: Wed May 6 15:02:42 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <20090506161543.062ba223@hal9000.mebius.lv> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <20090506161543.062ba223@hal9000.mebius.lv> Message-ID: <1241622146.16418.128.camel@ompc.insign.local> On Wed, 2009-05-06 at 16:15 +0300, Arkadi Shishlov wrote: > Its probably "dirhash' that is not enabled or its cache is too small for the task. $ sysctl -a |grep dirha UFS dirhash 1262 286K - 9715683 16,32,64,128,256,512,1024,2048,4096 vfs.ufs.dirhash_docheck: 0 vfs.ufs.dirhash_mem: 2087495 vfs.ufs.dirhash_maxmem: 2097152 vfs.ufs.dirhash_minsize: 2560 So it's active, but probably too small as you suggest. Can I update this value "on the fly" or does it require a reboot (+ settings in loader.conf) ? regards, Olivier From markwkm at gmail.com Wed May 6 15:03:01 2009 From: markwkm at gmail.com (Mark Wong) Date: Wed May 6 15:03:08 2009 Subject: filesystem performance In-Reply-To: <4A019C24.6010804@kasimir.com> References: <70c01d1d0905042230v3357622cgf4c8e52a2a4ead96@mail.gmail.com> <4A019C24.6010804@kasimir.com> Message-ID: <70c01d1d0905060802y3eeb80c7m59451d29d99ae7cc@mail.gmail.com> On Wed, May 6, 2009 at 7:18 AM, Florian Smeets wrote: > On 05.05.09 07:30, Mark Wong wrote: >> >> Hi everyone, >> >> We (PostgreSQL community) have a HP DL380 G5 that we were using to do >> some very basic filesystem characterizations as part of a database >> performance tuning project, so we wanted to give FreeBSD a try out of >> the box. For this set of data we used 7.1. We're (us few that are >> running the tests) are fairly unfamiliar with the community here, so >> I'll be as brief as I can. 
We're basically wondering if the data >> we're getting out of the box is expected, and any tuning guidelines >> including what changes we should expect to see in the performance. >> > > I guess you are using the ciss driver in this box? There was a performance > regression in this driver in 7.1. This should be fixed in 7.2, which came > out recently. It is believed that you should get a whole lot better IO > performance with 7.2 if you are using the ciss driver. > > From the 7.2 release notes: > > A bug in the ciss(4) driver which caused low "max device openings" count and > led to poor performance has been fixed. We'll have to make time to try that. :) Thanks (and to the others) for pointing that out. Regards, Mark From wmoran at collaborativefusion.com Wed May 6 15:05:54 2009 From: wmoran at collaborativefusion.com (Bill Moran) Date: Wed May 6 15:06:00 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <20090506161543.062ba223@hal9000.mebius.lv> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <20090506161543.062ba223@hal9000.mebius.lv> Message-ID: <20090506105542.45212e34.wmoran@collaborativefusion.com> In response to Arkadi Shishlov : > Its probably "dirhash' that is not enabled or its cache is too small for the task. I'm no expert, but I thought dirhash only improved read speed. His bottleneck would be writes. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023 **************************************************************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. 
Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. **************************************************************** From wmoran at potentialtech.com Wed May 6 15:10:25 2009 From: wmoran at potentialtech.com (Bill Moran) Date: Wed May 6 15:10:40 2009 Subject: filesystem: 12h to delete 32GB of data (4 million files) In-Reply-To: <1241616121.16418.109.camel@ompc.insign.local> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <1241616121.16418.109.camel@ompc.insign.local> Message-ID: <20090506111022.05d06f1a.wmoran@potentialtech.com> In response to Olivier Mueller : > > Yes, it is one of the best options. My initial goal was to delete all > files older than N days by cron (find | xargs | rm, etc.), but if each > cronjob takes 2 hours (and takes so much cpu time), it's probably not > the best way. > > I'll make some more tests on an test-server later this week and speak > with the devs. Thanks again for your very constructive feedback! Based on your comments here, it really sounds like your devs need to implement some sort of cache cleaning algo into their code. If it's just deleting the oldest files, then you could probably run it far more frequently if you simply created a new cache directory each hour, and deleted the previous one. Honestly, I'm really confused -- if you can just throw away the cache each night, then why are you caching to begin with? If you just need temp files, why doesn't the app clean up its temp files when it's done with them? 
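The hourly-rotation idea above (create a new cache directory, then delete the previous one) could look roughly like the following. This is only a sketch under stated assumptions: the `current` symlink, the `bucket.*` naming and the idea that the application writes through that symlink are hypothetical and not taken from Olivier's setup. Note that `rm -rf` still has to unlink every file; the gain is only that each run handles one hour's worth of files instead of millions.

```shell
#!/bin/sh
# Hypothetical layout: <root>/current is a symlink to <root>/bucket.<timestamp>.<pid>
# and the application is assumed to write its cache files through <root>/current.
rotate_cache() {
    root=$1
    new="$root/bucket.$(date +%Y%m%d%H%M%S).$$"
    mkdir -p "$new" || return 1

    # Remember which bucket the link points at right now (empty on first run).
    old=$(readlink "$root/current" 2>/dev/null)

    # Repoint the link before deleting anything, so writers always have a bucket.
    ln -sfn "$new" "$root/current" || return 1

    # Drop the previous bucket; this is the only expensive step.
    if [ -n "$old" ] && [ -d "$old" ] && [ "$old" != "$new" ]; then
        rm -rf "$old"
    fi
}
```

It would be called from cron as something like `rotate_cache /path/to/cache` once per hour; the path is a placeholder.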
If you have access to the developers, I think you'll be able to come up with a much better solution by working with them. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ From wojtek at wojtek.tensor.gdynia.pl Wed May 6 15:50:50 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Wed May 6 16:01:17 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <1241610888.16418.64.camel@ompc.insign.local> References: <1241610888.16418.64.camel@ompc.insign.local> Message-ID: > -> it took about 12 hours to delete these 30GB of files and > sub-directories (smarty cache files: many small files in many dirs). > It's a little bit surprising, as it's on a recent HP proliant DL360 g5 > with SAS disks (Raid1) running freebsd 6.x > ( /dev/da0s1f on /usr (ufs, local, soft-updates) ) > if you would use no raid or software raid it will behave normally. it takes <30 minutes for me to delete 300GB of squid files on ordinary SATA disk , millions of small files. From wojtek at wojtek.tensor.gdynia.pl Wed May 6 15:50:51 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Wed May 6 16:01:28 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <20090506084834.61600c42.wmoran@potentialtech.com> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> Message-ID: > means you had 6 million files. df -i would have been more useful in > the output above. > > This brings a number of questions up: > * Are you _sure_ softupdates is enabled on that partition? That's he showed mount output - he has softdeps on. > * Are these 7200RPM disks or 15,000? Again, going to make a big > difference. on 7200 RPM ordinary SATA disk i deleted 15 million files taking 300GB (squid cache) in less than 30 minutes. for sure it's because of his "hardware raid". i've NEVER seen "hardware raid" that is actually faster than non-raid config, or gmirror/gstripe config. 
usually it's far much slower From benjamin at seattlefenix.net Wed May 6 17:16:21 2009 From: benjamin at seattlefenix.net (Benjamin Krueger) Date: Wed May 6 17:16:27 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> Message-ID: <4A01C202.8080803@seattlefenix.net> Wojciech Puchar wrote: >> means you had 6 million files. df -i would have been more useful in > > >> the output above. >> >> This brings a number of questions up: >> * Are you _sure_ softupdates is enabled on that partition? That's > > he showed mount output - he has softdeps on. > >> * Are these 7200RPM disks or 15,000? Again, going to make a big >> difference. > > on 7200 RPM ordinary SATA disk i deleted 15 million files taking 300GB > (squid cache) in less than 30 minutes. > > for sure it's because of his "hardware raid". > > i've NEVER seen "hardware raid" that is actually faster than non-raid > config, or gmirror/gstripe config. > > usually it's far much slower Sorry, but my experience with that very server using a P400 controller with 256MB write cache is very different. My benchmarks showed that controller using Raid5 (with only 4 disks) is significantly faster than software layouts. The days when hardware controllers could automatically be considered slow are long gone. The hardware does get faster over time. Don't make any assumptions without doing benchmarks. 
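In the spirit of measuring rather than assuming, a small-file delete test along these lines can be run on each storage layout being compared. It is a minimal sketch, not the benchmark anyone in this thread used: the function name, the 100-subdirectory spread and the one-second timer resolution are arbitrary choices, and because the files are created immediately before being removed, much of the metadata will still be cached, so the result is likely optimistic compared with deleting an old, cold tree.

```shell
#!/bin/sh
# bench_delete DIR COUNT
# Create COUNT tiny files spread over 100 subdirectories of DIR,
# then report how many whole seconds "rm -rf DIR" takes.
bench_delete() {
    dir=$1
    count=$2

    d=0
    while [ "$d" -lt 100 ]; do
        mkdir -p "$dir/d$d" || return 1
        d=$((d + 1))
    done

    i=0
    while [ "$i" -lt "$count" ]; do
        echo x > "$dir/d$((i % 100))/f$i"
        i=$((i + 1))
    done

    start=$(date +%s)
    rm -rf "$dir"
    end=$(date +%s)
    echo "deleted $count files in $((end - start))s"
}
```

For example, `bench_delete /usr/tmp/deltest 300000` roughly matches the file count Olivier reported after cleanup; the target path is a placeholder and should be a directory that does not exist yet.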
From wmoran at potentialtech.com Wed May 6 18:29:54 2009 From: wmoran at potentialtech.com (Bill Moran) Date: Wed May 6 18:30:02 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> Message-ID: <20090506142951.2a27284d.wmoran@potentialtech.com> In response to "Gary Gatten" : > It could just be me, but I swear Hardware RAID has been faster for many > many years, especially with RAID5 arrays - or anything that requires > parity calcs. Most of my benchmarking was done on SCO OpenServer and > Novell UnixWare and Netware, but hardware RAID controllers were always > faster and of course required far less host CPU resources. Raid > 0/1/10/0+1/whatever arrays, I recall weren't as drastic, but I can't > imagine the controller making as big a difference as the drives in the > array - unless of course the drive for said controller sux! Keep in mind that there are a LOT of RAID controllers out there, and yes, some of them suck royally. Especially the consumer-grade stuff intended for people to use on their home systems. I'd be willing to bet that software RAID is faster than 90% of the consumer grade RAID cards, and probably more reliable than most of them as well. Controllers make a huge difference, even in server class RAID (in my experience). There is a significant gap in performance between the good stuff and the good enough stuff. 
-- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ From Ggatten at waddell.com Wed May 6 18:31:07 2009 From: Ggatten at waddell.com (Gary Gatten) Date: Wed May 6 18:50:18 2009 Subject: filesystem: 12h to delete 32GB of data Message-ID: <70C0964126D66F458E688618E1CD008A0793EBD2@WADPEXV0.waddell.com> Sorry, "drive" in last sentence should be "driver"! ----- Original Message ----- From: owner-freebsd-questions@freebsd.org To: Benjamin Krueger ; Wojciech Puchar Cc: freebsd-performance@freebsd.org ; Olivier Mueller ; Bill Moran ; freebsd-questions@freebsd.org Sent: Wed May 06 13:08:46 2009 Subject: RE: filesystem: 12h to delete 32GB of data It could just be me, but I swear Hardware RAID has been faster for many many years, especially with RAID5 arrays - or anything that requires parity calcs. Most of my benchmarking was done on SCO OpenServer and Novell UnixWare and Netware, but hardware RAID controllers were always faster and of course required far less host CPU resources. Raid 0/1/10/0+1/whatever arrays, I recall weren't as drastic, but I can't imagine the controller making as big a difference as the drives in the array - unless of course the drive for said controller sux!
"This email is intended to be reviewed by only the intended recipient and may contain information that is privileged and/or confidential. If you are not the intended recipient, you are hereby notified that any review, use, dissemination, disclosure or copying of this email and its attachments, if any, is strictly prohibited. If you have received this email in error, please immediately notify the sender by return email and delete this email from your system."
_______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
From wojtek at wojtek.tensor.gdynia.pl Wed May 6 18:31:14 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Wed May 6 18:50:36 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <4A01C202.8080803@seattlefenix.net> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> Message-ID: >> config, or gmirror/gstripe config. >> >> usually it's far much slower > > Sorry, but my experience with that very server using a P400 controller with > 256MB write cache is very different. My benchmarks showed that controller > using Raid5 (with only 4 disks) is significantly faster than software > layouts. possibly with RAID5, but for sure slower than single drive > The days when hardware controllers could automatically be considered slow are > long gone. unfortunately not. From wojtek at wojtek.tensor.gdynia.pl Wed May 6 18:32:12 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Wed May 6 18:50:47 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> Message-ID: > It could just be me, but I swear Hardware RAID has been faster for many > many years, especially with RAID5 arrays - or anything that requires maybe with RAID5, but using RAID5 today (huge disk sizes, little sense to save on disk space) instead of RAID1/10 doesn't make much sense, as RAID5 is slow on writes by design From wojtek at wojtek.tensor.gdynia.pl Wed May 6 18:32:47 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Wed May 6 18:50:53 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <20090506142951.2a27284d.wmoran@potentialtech.com> References: 
<1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> <20090506142951.2a27284d.wmoran@potentialtech.com> Message-ID: > yes, some of them suck royally. you should rather say "some of them doesn't suck". From Ggatten at waddell.com Wed May 6 18:50:23 2009 From: Ggatten at waddell.com (Gary Gatten) Date: Wed May 6 19:06:54 2009 Subject: filesystem: 12h to delete 32GB of data Message-ID: <70C0964126D66F458E688618E1CD008A0793EBD4@WADPEXV0.waddell.com> OT now, but in high i/o envs with high concurrency needs, RAID5 is still the way to go, esp if 90% of i/o is reads. Of course it depends on file size / type as well... Anyway, let's sum it up with "a storage subsystem is only as fast as its slowest link" ----- Original Message ----- From: Wojciech Puchar To: Bill Moran Cc: Gary Gatten; Benjamin Krueger ; freebsd-performance@freebsd.org ; Olivier Mueller ; freebsd-questions@freebsd.org Sent: Wed May 06 13:31:53 2009 Subject: Re: filesystem: 12h to delete 32GB of data > yes, some of them suck royally. you should rather say "some of them doesn't suck".
From Ggatten at waddell.com Wed May 6 18:54:37 2009 From: Ggatten at waddell.com (Gary Gatten) Date: Wed May 6 19:07:24 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <4A01C202.8080803@seattlefenix.net> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> Message-ID: <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> It could just be me, but I swear Hardware RAID has been faster for many many years, especially with RAID5 arrays - or anything that requires parity calcs. Most of my benchmarking was done on SCO OpenServer and Novell UnixWare and Netware, but hardware RAID controllers were always faster and of course required far less host CPU resources. Raid 0/1/10/0+1/whatever arrays, I recall weren't as drastic, but I can't imagine the controller making as big a difference as the drives in the array - unless of course the drive for said controller sux!
From m.seaman at infracaninophile.co.uk Wed May 6 19:21:58 2009 From: m.seaman at infracaninophile.co.uk (Matthew Seaman) Date: Wed May 6 19:22:10 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <70C0964126D66F458E688618E1CD008A0793EBD4@WADPEXV0.waddell.com> References: <70C0964126D66F458E688618E1CD008A0793EBD4@WADPEXV0.waddell.com> Message-ID: <4A01E343.4020608@infracaninophile.co.uk> Gary Gatten wrote: > OT now, but in high i/o envs with high concurrency needs, RAID5 is > still the way to go, esp if 90% of i/o is reads. Of course it depends > on file size / type as well... Anyway, let's sum it up with "a > storage subsystem is only as fast as its slowest link" It's not just the balance of reads over writes. It's the size and sequential location of the IO requests. RAID5 is good for sequential reads -- eg. streaming a video -- where the system can read whole blocks from all the drives involved, calculate parity over the whole lot and then push all that blob of data up to the CPU. RAID5 is pretty pessimal if your usage pattern is small reads or writes randomly scattered over your storage area -- eg. typical RDBMS behaviour -- which works a great deal better on RAID10. I'd also contend that the essential difference between a really good fast hardware raid controller and something disappointingly mundane is a decent amount of non-volatile cache memory. For most H/W raid that equates to using a battery backup unit. I've been thinking though that a few GB of fast solid-state hard drive configured as a gjournal for a RAID10 (ie gstripe +gmirror) might achieve the same effect for rather less outlay... It would probably not be too shabby with RAID5 even, but of course you'ld lose the benefit of offloading parity calculations onto the RAID controller's CPU. Still, modern multi-core CPUs are probably fast enough nowadays to make that viable for many purposes. Cheers, Matthew -- Dr Matthew J Seaman MA, D.Phil. 
7 Priory Courtyard Flat 3 PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate Kent, CT11 9PW From fjwcash at gmail.com Wed May 6 20:51:10 2009 From: fjwcash at gmail.com (Freddie Cash) Date: Wed May 6 20:51:23 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <4A01E343.4020608@infracaninophile.co.uk> References: <70C0964126D66F458E688618E1CD008A0793EBD4@WADPEXV0.waddell.com> <4A01E343.4020608@infracaninophile.co.uk> Message-ID: On Wed, May 6, 2009 at 12:21 PM, Matthew Seaman wrote: > Gary Gatten wrote: >> OT now, but in high i/o envs with high concurrency needs, RAID5 is >> still the way to go, esp if 90% of i/o is reads. Of course it depends >> on file size / type as well... Anyway, let's sum it up with "a >> storage subsystem is only as fast as its slowest link" > > It's not just the balance of reads over writes. It's the size and > sequential location of the IO requests. RAID5 is good for sequential reads -- eg. > streaming a video -- where the system can read whole blocks from all the > drives involved, calculate parity over the whole lot and then push all that > blob of data up to the CPU. > > RAID5 is pretty pessimal if your usage pattern is small reads or writes > randomly scattered over your storage area -- eg. typical RDBMS behaviour > -- which works a great deal better on RAID10. > > I'd also contend that the essential difference between a really good fast > hardware raid controller and something disappointingly mundane is a decent > amount of non-volatile cache memory. For most H/W raid that equates to > using a battery backup unit. 
I've been thinking though that a few GB of > fast solid-state hard drive configured as a gjournal for a RAID10 (ie > gstripe +gmirror) might achieve the same effect for rather less outlay... It > would probably not be too shabby with RAID5 even, but of course you'ld > lose the benefit of offloading parity calculations onto the RAID > controller's CPU. Still, modern multi-core CPUs are probably fast enough nowadays to > make that viable for many purposes. Depending on the number of drives you are using, ZFS would also be worth looking at. The raidz implementation works quite nicely, and (in theory) doesn't suffer from the major issues that RAID5/6 does. It also does implicit striping across all vdevs, so you can make some very fancy RAID layouts (each vdev can be mirrored, raidz1, raidz2, or just a bunch of disks). I don't know if the version of ZFS in FreeBSD 7.x supports hybrid pools, but the version in FreeBSD 8.0 should, which lets you add SSDs to the pool to be used automatically as "cache" in-between RAM and harddrives. -- Freddie Cash fjwcash@gmail.com From pathiaki2 at yahoo.com Thu May 7 02:05:19 2009 From: pathiaki2 at yahoo.com (Paul Patterson) Date: Thu May 7 02:05:25 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> Message-ID: <83156.91671.qm@web110507.mail.gq1.yahoo.com> Sorry. This statement is incorrect. If you aren't using ZFS, or even a GEOM volume with mirror/RAID5/softup/etc, you cannot make the statement that hardware RAID is faster. I learned that 3 years ago. It takes about 30 minutes to mirror 1.5TB on ZFS. Try that on hardware RAID. I did the same with 80 GB SATA drives a couple of years ago. 
Gmirror killed hardware mirror by 50% When your processor on your hardware RAID card is junk and you have a kickass processor and good chunk of memory on your main system and decent controller that isn't getting maxed, the "hardware RAID is always faster" paradigm walked out the door a few years ago. This does not go for EMC, IBM, Hitachi high-end storage arrays where you write to TBs of RAM Cache. P. ________________________________ From: Wojciech Puchar To: Gary Gatten Cc: freebsd-questions@freebsd.org; Benjamin Krueger ; Olivier Mueller ; freebsd-performance@freebsd.org; Bill Moran Sent: Wednesday, May 6, 2009 2:31:16 PM Subject: RE: filesystem: 12h to delete 32GB of data > It could just be me, but I swear Hardware RAID has been faster for many > many years, especially with RAID5 arrays - or anything that requires maybe with RAID5, but using RAID5 today (huge disk sizes, little sense to save on disk space) instead of RAID1/10 doesn't make much sense, as RAID5 is slow on writes by design _______________________________________________ freebsd-performance@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-performance To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" From wojtek at wojtek.tensor.gdynia.pl Thu May 7 12:16:45 2009 From: wojtek at wojtek.tensor.gdynia.pl (Wojciech Puchar) Date: Thu May 7 12:27:59 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <83156.91671.qm@web110507.mail.gq1.yahoo.com> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <4A01C202.8080803@seattlefenix.net> <70C0964126D66F458E688618E1CD008A0793EBD1@WADPEXV0.waddell.com> <83156.91671.qm@web110507.mail.gq1.yahoo.com> Message-ID: > If you aren't using ZFS, or even a GEOM volume with mirror/RAID5/softup/etc, > you cannot make the statement that hardware RAID is faster. I learned > that 3 years ago. i state exactly opposite. 
all hardware raid cards are made just to suck money from those who believe in it. like "performance is not enough - buy better/more expensive model." > This does not go for EMC, IBM, Hitachi high-end storage arrays where you write to TBs of RAM Cache. having same amount of extra memory on FreeBSD server directly will make better use of it. From seklecki at noc.cfi.pgh.pa.us Fri May 8 11:43:46 2009 From: seklecki at noc.cfi.pgh.pa.us (Brian A. Seklecki) Date: Fri May 8 11:48:47 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <1241610888.16418.64.camel@ompc.insign.local> References: <1241610888.16418.64.camel@ompc.insign.local> Message-ID: <1241782516.2053.10.camel@soundwave.ws.pitbpa0.priv.collaborativefusion.com> On Wed, 2009-05-06 at 13:54 +0200, Olivier Mueller wrote: > -> it took about 12 hours to delete these 30GB of files and > sub-directories (smarty cache files: many small files in many dirs). Haven't you ever had the pleasure of running Sendmail on Solaris? :) Move this data store to a separate partition. When it comes time to burn the queue, stop the service, unmount the partition, newfs it, remount, restart svc. Long live Pisces v2. ~BAS From ivoras at freebsd.org Fri May 15 11:29:47 2009 From: ivoras at freebsd.org (Ivan Voras) Date: Fri May 15 11:29:54 2009 Subject: filesystem: 12h to delete 32GB of data In-Reply-To: <1241622146.16418.128.camel@ompc.insign.local> References: <1241610888.16418.64.camel@ompc.insign.local> <20090506084834.61600c42.wmoran@potentialtech.com> <20090506161543.062ba223@hal9000.mebius.lv> <1241622146.16418.128.camel@ompc.insign.local> Message-ID: Olivier Mueller wrote: > On Wed, 2009-05-06 at 16:15 +0300, Arkadi Shishlov wrote: >> Its probably "dirhash' that is not enabled or its cache is too small for the task. 
> > $ sysctl -a |grep dirha > UFS dirhash 1262 286K - 9715683 16,32,64,128,256,512,1024,2048,4096 > vfs.ufs.dirhash_docheck: 0 > vfs.ufs.dirhash_mem: 2087495 > vfs.ufs.dirhash_maxmem: 2097152 > vfs.ufs.dirhash_minsize: 2560 > > So it's active, but probably too small as you suggest. Can I update this > value "on the fly" or does it require a reboot (+ settings in > loader.conf) ? It's a regular sysctl - you can update it on runtime. Try setting it to 8 M and only increase if it needs to be increased. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://lists.freebsd.org/pipermail/freebsd-performance/attachments/20090515/25a4ae8b/signature.pgp From nysalsa at hotmail.it Wed May 27 14:44:20 2009 From: nysalsa at hotmail.it (justme2) Date: Wed May 27 20:26:54 2009 Subject: ipfw top packet processing rate Message-ID: <23743694.post@talk.nabble.com> Hi, are you aware of ipfw's performance processing packets on a 10Gbe link with 5Gb of traffic (or on a study/report about it) ? The box has the following configuration: - 2x Intel Xeon 5160 dual-core 3.00 GHz CPU (1333MHz FSB/4MB L2 cache) - 4x 512 MB FB (Fully Buffered) DDR2/667 memory - NIC Intel PRO/10Gbe Thanks Rob -- View this message in context: http://www.nabble.com/ipfw-top-packet-processing-rate-tp23743694p23743694.html Sent from the freebsd-performance mailing list archive at Nabble.com.