From ntapfaq at gmail.com Wed Apr 15 03:30:39 2009
From: ntapfaq at gmail.com (ntap faq)
Date: Wed Apr 15 03:47:29 2009
Subject: Q on Flat profiling
Message-ID: <20ef3b320904150303x64c1665enf2aa2b24929ba86d@mail.gmail.com>

Hi,

I am doing flat profiling for custom kernel code on hardware with multiple CPUs. I just wanted to verify that I am doing things correctly. If you're still interested, read on.

The way I collect the gmon.* files is by doing:

    For each CPU [0..3] {
        # switch CPU by using sysctl...
        kgmon -r    # reset
        kgmon -b    # start
    }

    (sleep for 1 min / 10 mins and let the code do its job)

    For each CPU [0..3] {
        # switch CPU by using sysctl...
        kgmon -h    # stop
    }

    # dumps the gmon.* files, which I then make sense of by using gprof.

For 1-minute delays, I get smooth, consistent sampling profiles:

*Sampling when profiling is done for 60 seconds/1 min*

gprof.out.0.4:granularity: each sample hit covers 16 byte(s) for 0.00% of 65.06 seconds
gprof.out.1.4:granularity: each sample hit covers 16 byte(s) for 0.00% of 65.02 seconds
gprof.out.2.4:granularity: each sample hit covers 16 byte(s) for 0.00% of 65.05 seconds
gprof.out.3.4:granularity: each sample hit covers 16 byte(s) for 0.00% of 65.12 seconds

It turns out that the sampling numbers are skewed once the sampling time goes to 10 mins...

*Sampling when profiling is done for 600 seconds/10 mins*

gprof.out.0:granularity: each sample hit covers 16 byte(s) for 0.00% of 304.74 seconds
gprof.out.1:granularity: each sample hit covers 16 byte(s) for 0.00% of 403.11 seconds
gprof.out.2:granularity: each sample hit covers 16 byte(s) for 0.00% of 501.55 seconds
gprof.out.3:granularity: each sample hit covers 16 byte(s) for 0.00% of 206.47 seconds

No, I can't fathom why. If you do have any idea why, I'd appreciate it.

Thanks
rohit

From david at catwhisker.org Fri Apr 24 17:27:13 2009
From: david at catwhisker.org (David Wolfskill)
Date: Fri Apr 24 17:28:21 2009
Subject: Presentation of performance data & analysis?
Message-ID: <20090424165012.GB1387@albert.catwhisker.org>

I apologize, as this is a bit tangential to the description of the list.

I've been doing some measurements of workloads of interest (in my case, the workload is building some software, and the metric of greatest interest is "elapsed time" (which I obtain via /usr/bin/time)).

And I've been using phk's ministat (/usr/src/tools/tools/ministat, for any who aren't aware of it). It is quite useful (so yeah, I owe phk a beer), but I'm trying to figure out how to present results to management-types.

While I don't have any PHBs in my direct management chain, I've seen some PHB tendencies in the management of the folks I'm supporting. And I get the message that "complicated" won't communicate to them. Nor will "nuanced."

Even a pointer to some examples of approaches that seem to work well for this sort of thing would help a great deal -- my training hasn't exactly been in statistical analysis or in presentation of data. :-}

I'll be happy to summarize responses that are not sent to the list (unless you'd rather I didn't, of course).

Thanks.

Peace,
david
-- 
David H. Wolfskill    david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

From ivoras at freebsd.org Fri Apr 24 18:09:57 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Fri Apr 24 18:10:04 2009
Subject: Presentation of performance data & analysis?
In-Reply-To: <20090424165012.GB1387@albert.catwhisker.org>
References: <20090424165012.GB1387@albert.catwhisker.org>
Message-ID:

David Wolfskill wrote:
> I apologize, as this is a bit tangential to the description of the list.
>
> I've been doing some measurements of workloads of interest (in my case,
> the workload is building some software, and the metric of greatest
> interest is "elapsed time" (which I obtain via /usr/bin/time)).
>
> And I've been using phk's ministat (/usr/src/tools/tools/ministat, for
> any who aren't aware of it). It is quite useful (so yeah, I owe phk a
> beer), but I'm trying to figure out how to present results to
> management-types.

Um, I may be missing your point, but what is wrong with putting the text "55 bogons per second difference" in large bold letters centered on the page and "with 95% probability" in small plain letters, as a footnote?

They probably wouldn't have use for the statistical graph even if they knew how to parse it, so you might as well put it in the background and assign 90% transparency to it, to serve as eye candy only.

From cswiger at mac.com Fri Apr 24 18:58:03 2009
From: cswiger at mac.com (Chuck Swiger)
Date: Fri Apr 24 18:58:09 2009
Subject: Presentation of performance data & analysis?
In-Reply-To: <20090424165012.GB1387@albert.catwhisker.org>
References: <20090424165012.GB1387@albert.catwhisker.org>
Message-ID: <60B82B59-50EF-475B-8F5A-D233F2C06FD8@mac.com>

Hi, David--

On Apr 24, 2009, at 9:50 AM, David Wolfskill wrote:
> While I don't have any PHBs in my direct management chain, I've seen
> some PHB tendencies in the management of the folks I'm supporting. And
> I get the message that "complicated" won't communicate to them. Nor
> will "nuanced."
>
> Even a pointer to some examples of approaches that seem to work well
> for this sort of thing would help a great deal -- my training hasn't
> exactly been in statistical analysis or in presentation of data. :-}

Presentation to PHBs and statistical analysis are remarkably different skills. Save the latter for an appendix or footnotes, so the engineering types have some confidence that you've actually done some work in testing and your results are likely to be sane, if/when the PHBs for the client ask their local tech gurus about it afterwards.

For the "presentation to PHBs" part, it's simple: start with an intro that briefly mentions what you want to talk about and why they should care; typically, how much money they can save if they make the change (or how much more they can make, depending, etc.).

The most direct example I can recall of this was from a professor of human-computer interaction (HCI), who was studying things like supermarket checkout line scanners and telephone operator systems. It turns out that if you pre-record the initial greeting (i.e., where you dial 0 and the operator says "Hello, this is AT&T [or whomever], how may I help you?"), so that the operator can focus on the type of incoming call (i.e., residential line operator request, pay phone, fire/police/emergency, jail/prison calls, etc.) instead of speaking a rote response, this saves a few seconds (about 3) per call in processing. At the time this study was done (1990ish), that represented on the order of $50 million per year in savings to the phone company.

Then go into more details such as what the change would entail, what benefits should occur, what tradeoffs might apply, any caveats, and then summarize with a repeat of the core idea and cost/benefit or savings they get for the conclusion.
If this sounds to you like the way the classic 5-paragraph essay works (i.e., paragraph 1: intro, tell them what you're saying; paragraphs 2-4: three points; paragraph 5: conclusion, where you tell them again what you've just said :), well, you're getting the idea....

Regards,
-- 
-Chuck

From kuan.joe at gmail.com Fri Apr 24 23:08:41 2009
From: kuan.joe at gmail.com (Joseph Kuan)
Date: Fri Apr 24 23:36:11 2009
Subject: FreeBSD 7.1 taskq em performance
Message-ID: <40bb871a0904241542o3f4d6c6ap62ff71876074bbea@mail.gmail.com>

Hi all,

I have been hitting a barrier with FreeBSD 7.1 network performance. I have written an application which contains two kernel threads that take mbufs directly from one network interface and forward them to another network interface. The idea is to simulate different network environments.

I have been using FreeBSD 6.4 amd64 and tested with an Ixia box (specialised hardware that fires packets at a very high rate). The PC was a Core2 2.6 GHz with a dual-port Intel PCIe Gigabit network card. It can manage up to 1.2 million pps.

I have a higher-spec PC with FreeBSD 7.1 amd64, a quad-core 2.3 GHz CPU and a PCIe Gigabit network card, but it only achieves up to 600k pps. I notice that 'taskq em0' and 'taskq em1' are at a solid 100% CPU, which is not the case in FreeBSD 6.4.

Any advice?

Many thanks in advance

Joe

From pieter at degoeje.nl Mon Apr 27 00:02:49 2009
From: pieter at degoeje.nl (Pieter de Goeje)
Date: Mon Apr 27 00:02:56 2009
Subject: ACPI-fast default timecounter, but HPET 83% faster
Message-ID: <200904270150.31912.pieter@degoeje.nl>

Dear hackers,

While fiddling with the sysctl kern.timecounter.hardware, I found out that on my system HPET is significantly faster than ACPI-fast. Using the program below I measured the number of clock_gettime() calls the system can execute per second.
I ran the program 10 times for each configuration and here are the results:

x ACPI-fast
+ HPET
+-------------------------------------------------------------------------+
|x                                                                       +|
|x                                                                       +|
|x                                                                      ++|
|x                                                                      ++|
|x                                                                      ++|
|x                                                                      ++|
|A                                                                      |A|
+-------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10        822032        823752        823551      823397.8     509.43254
+  10       1498348       1506862       1502830     1503267.4     2842.9779
Difference at 95.0% confidence
        679870 +/- 1918.94
        82.5688% +/- 0.233052%
        (Student's t, pooled s = 2042.31)

System details: Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz (3200.02-MHz 686-class CPU), Gigabyte P35-DS3R motherboard running i386 -CURRENT updated today.

Unfortunately I only have one system with an HPET timecounter, so I cannot verify these results on another system. If similar results are obtained on other machines, I think the HPET timecounter quality needs to be increased beyond that of ACPI-fast.

Regards,

Pieter de Goeje

----- 8< ----- clock_gettime.c ----- 8< ------
#include <stdio.h>
#include <time.h>

#define COUNT 1000000

int main(void) {
        struct timespec ts_start, ts_stop, ts_read;
        double time;    /* elapsed wall-clock seconds for COUNT calls */
        int i;

        /* Time COUNT back-to-back reads of the monotonic clock. */
        clock_gettime(CLOCK_MONOTONIC, &ts_start);
        for (i = 0; i < COUNT; i++) {
                clock_gettime(CLOCK_MONOTONIC, &ts_read);
        }
        clock_gettime(CLOCK_MONOTONIC, &ts_stop);

        time = (ts_stop.tv_sec - ts_start.tv_sec) +
            (ts_stop.tv_nsec - ts_start.tv_nsec) * 1E-9;
        /* Report clock_gettime() calls per second. */
        printf("%.0f\n", COUNT / time);
        return (0);
}

From yanefbsd at gmail.com Mon Apr 27 03:00:32 2009
From: yanefbsd at gmail.com (Garrett Cooper)
Date: Mon Apr 27 03:14:34 2009
Subject: ACPI-fast default timecounter, but HPET 83% faster
In-Reply-To: <200904270150.31912.pieter@degoeje.nl>
References: <200904270150.31912.pieter@degoeje.nl>
Message-ID: <7d6fde3d0904261927s1a67cf85jc982c1a68e30e081@mail.gmail.com>

On Sun, Apr 26, 2009 at 4:50 PM, Pieter de Goeje wrote:
> Dear hackers,
>
> While fiddling with the sysctl kern.timecounter.hardware, I found out that on
> my system HPET is significantly faster than ACPI-fast.
> Using the program below I measured the number of clock_gettime() calls the
> system can execute per second. I ran the program 10 times for each
> configuration and here are the results:
>
> x ACPI-fast
> + HPET
> +-------------------------------------------------------------------------+
> |x                                                                       +|
> |x                                                                       +|
> |x                                                                      ++|
> |x                                                                      ++|
> |x                                                                      ++|
> |x                                                                      ++|
> |A                                                                      |A|
> +-------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10        822032        823752        823551      823397.8     509.43254
> +  10       1498348       1506862       1502830     1503267.4     2842.9779
> Difference at 95.0% confidence
>         679870 +/- 1918.94
>         82.5688% +/- 0.233052%
>         (Student's t, pooled s = 2042.31)
>
> System details: Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz (3200.02-MHz
> 686-class CPU), Gigabyte P35-DS3R motherboard running i386 -CURRENT updated
> today.
>
> Unfortunately I only have one system with a HPET timecounter, so I cannot
> verify these results on another system. If similar results are obtained on
> other machines, I think the HPET timecounter quality needs to be increased
> beyond that of ACPI-fast.
>
> Regards,
>
> Pieter de Goeje
>
> ----- 8< ----- clock_gettime.c ----- 8< ------
> #include <stdio.h>
> #include <time.h>
>
> #define COUNT 1000000
>
> int main() {
>         struct timespec ts_start, ts_stop, ts_read;
>         double time;
>         int i;
>
>         clock_gettime(CLOCK_MONOTONIC, &ts_start);
>         for(i = 0; i < COUNT; i++) {
>                 clock_gettime(CLOCK_MONOTONIC, &ts_read);
>         }
>         clock_gettime(CLOCK_MONOTONIC, &ts_stop);
>
>         time = (ts_stop.tv_sec - ts_start.tv_sec) + (ts_stop.tv_nsec -
> ts_start.tv_nsec) * 1E-9;
>         printf("%.0f\n", COUNT / time);
> }

I'm seeing similar results.

[root@orangebox /usr/home/gcooper]# dmesg | grep 'Timecounter "'
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Timecounter "HPET" frequency 14318180 Hz quality 900
[root@orangebox /usr/home/gcooper]# ./cgt
1369355
[root@orangebox /usr/home/gcooper]# sysctl kern.timecounter.hardware="ACPI-fast"
kern.timecounter.hardware: HPET -> ACPI-fast
[root@orangebox /usr/home/gcooper]# ./cgt
772289

Why's the default ACPI-fast? For power-saving functionality or because of the `quality' factor? What are the criteria that determine the `quality' of a clock as reported above (I know what determines the quality of a clock visually from an oscilloscope =])?

Thanks,
-Garrett

From raykinsella78 at gmail.com Mon Apr 27 08:43:45 2009
From: raykinsella78 at gmail.com (Ray Kinsella)
Date: Mon Apr 27 08:44:02 2009
Subject: FreeBSD 7.1 taskq em performance
In-Reply-To: <40bb871a0904241542o3f4d6c6ap62ff71876074bbea@mail.gmail.com>
References: <40bb871a0904241542o3f4d6c6ap62ff71876074bbea@mail.gmail.com>
Message-ID: <584ec6bb0904270118v37795ee2k24c9262d4c1abd80@mail.gmail.com>

Joseph,

I would recommend that you start with pmcstat and figure out where the bottleneck is. Given that you have two threads and your CPU is at 100%, my a priori guess would be contention for a spinlock, so I might also try LOCK_PROFILING to get a handle on this.

Regards
Ray Kinsella

On Fri, Apr 24, 2009 at 11:42 PM, Joseph Kuan wrote:
> Hi all,
> I have been hitting some barrier with FreeBSD 7.1 network performance.
> I have written an application which contains two kernel threads that takes
> mbufs directly from a network interface and forwards to another network
> interface. This idea is to simulate different network environment.
>
> I have been using FreeBSD 6.4 amd64 and tested with an Ixia box
> (specialised hardware firing very high packet rate). The PC was a Core2 2.6
> GHz with dual ports Intel PCIE Gigabit network card. It can manage up to
> 1.2 million pps.
>
> I have a higher spec PC with FreeBSD 7.1 amd64 and Quadcore 2.3 GHz and
> PCIE Gigabit network card. The performance can only achieve up to 600k pps.
> I notice the 'taskq em0' and 'taskq em1' is solid 100% CPU but it is not in
> FreeBSD 6.4.
>
> Any advice?
>
> Many thanks in advance
>
> Joe

From jhb at freebsd.org Thu Apr 30 21:41:24 2009
From: jhb at freebsd.org (John Baldwin)
Date: Thu Apr 30 22:28:10 2009
Subject: ACPI-fast default timecounter, but HPET 83% faster
In-Reply-To: <7d6fde3d0904261927s1a67cf85jc982c1a68e30e081@mail.gmail.com>
References: <200904270150.31912.pieter@degoeje.nl> <7d6fde3d0904261927s1a67cf85jc982c1a68e30e081@mail.gmail.com>
Message-ID: <200904300846.41576.jhb@freebsd.org>

On Sunday 26 April 2009 10:27:42 pm Garrett Cooper wrote:
> I'm seeing similar results.
>
> [root@orangebox /usr/home/gcooper]# dmesg | grep 'Timecounter "'
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> Timecounter "HPET" frequency 14318180 Hz quality 900
> [root@orangebox /usr/home/gcooper]# ./cgt
> 1369355
> [root@orangebox /usr/home/gcooper]# sysctl
> kern.timecounter.hardware="ACPI-fast"
> kern.timecounter.hardware: HPET -> ACPI-fast
> [root@orangebox /usr/home/gcooper]# ./cgt
> 772289
>
> Why's the default ACPI-fast? For power-saving functionality or because
> of the `quality' factor? What is the criteria that determines the
> `quality' of a clock as what's being reported above (I know what
> determines the quality of a clock visually from a oscilloscope =])?

I suspect that the quality of the HPET driver is lower simply because no one had measured it previously and HPET is newer and less "proven".

-- 
John Baldwin

From bruce at cran.org.uk Thu Apr 30 21:52:52 2009
From: bruce at cran.org.uk (Bruce Cran)
Date: Thu Apr 30 22:28:20 2009
Subject: ACPI-fast default timecounter, but HPET 83% faster
In-Reply-To: <200904300846.41576.jhb@freebsd.org>
References: <200904270150.31912.pieter@degoeje.nl> <7d6fde3d0904261927s1a67cf85jc982c1a68e30e081@mail.gmail.com> <200904300846.41576.jhb@freebsd.org>
Message-ID: <20090430225245.538d073e@gluon.draftnet>

On Thu, 30 Apr 2009 08:46:41 -0400
John Baldwin wrote:

> On Sunday 26 April 2009 10:27:42 pm Garrett Cooper wrote:
> > Why's the default ACPI-fast? For power-saving functionality or
> > because of the `quality' factor? What is the criteria that
> > determines the `quality' of a clock as what's being reported above
> > (I know what determines the quality of a clock visually from a
> > oscilloscope =])?
>
> I suspect that the quality of the HPET driver is lower simply because
> no one had measured it previously and HPET is newer and less "proven".
>

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/acpica/acpi_hpet.c shows some of the history behind the decision. Apparently HPET used to be slower, but it was hoped that it would get faster as systems supported it better. I guess that's happening now.

-- 
Bruce Cran