Why is PCE not set in CR4?

Grumble invalid at kma.eu.org
Wed Oct 1 02:45:25 PDT 2003


[ First posted to freebsd-questions and freebsd-ia32 ]
[ Add freebsd-hackers which I hope is appropriate    ]

The References: and In-Reply-To: headers are missing from this 
message. If your mail client does not thread it correctly, please 
accept my apologies. Before mailman, I could display messages in raw 
format, with full headers, e.g.

docs.freebsd.org/cgi/getmsg.cgi?fetch=2771331+0+archive/2003/freebsd-questions/20030928.freebsd-questions+raw

>> I've been playing with my Athlon's timestamp counter for a while,
>> and I would like to experiment with the performance-monitoring
>> counters now.
>>
>> I can execute the RDTSC instruction from ring 3 because the TSD
>> (TimeStamp Disable) bit in CR4 (Control Register 4) is cleared.
>>
>> However, I am not allowed to use the RDPMC instruction from ring 3
>> because the PCE (Performance-monitoring Counters Enable) bit is not set.
> 
> You can do it with /dev/perfmon. man 4 perfmon.

I have read the perfmon documentation and source code. For several 
reasons, I do not think it fits my situation.

It was designed in 1996 with the Pentium Pro in mind, which 
apparently has only two performance counters:

   #define NPMC 2
   if (pmc < 0 || pmc >= NPMC) return EINVAL;

I mentioned kernel modules because I want to avoid having to 
recompile my kernel. Even if I did set NPMC to 4 and recompiled, 
I am not convinced that perfmon would still work.

void perfmon_init(void)
{
   ...
   case CPUCLASS_686:
     perfmon_cpuok = 1;
     msr_ctl[0] = 0x186;
     msr_ctl[1] = 0x187;
     msr_pmc[0] = 0xc1;
     msr_pmc[1] = 0xc2;
     writectl = writectl6;
     break;

/* if NPMC>2 then msr_ctl[] and msr_pmc[] are not completely
 * initialized, is this a problem? */
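
To make the concern concrete: with NPMC raised to 4, entries 2 and 3 
of both arrays would need addresses as well. Below is a rough sketch 
of what a K7 case might fill in. It assumes the Athlon layout from 
AMD's documentation (PerfEvtSel0-3 at 0xC0010000-0xC0010003, 
PerfCtr0-3 at 0xC0010004-0xC0010007). K7_NPMC, the K7_MSR_* macros 
and k7_fill_msrs() are made-up names, not part of perfmon; only the 
array names mirror the excerpt above.

```c
#include <stdint.h>

#define K7_NPMC            4            /* hypothetical: K7 counter count */
#define K7_MSR_PERFEVTSEL0 0xc0010000u  /* event-select (control) MSRs 0..3 */
#define K7_MSR_PERFCTR0    0xc0010004u  /* counter MSRs 0..3 */

static uint32_t msr_ctl[K7_NPMC];
static uint32_t msr_pmc[K7_NPMC];

/* Hypothetical helper: fill every slot, so none is left uninitialized. */
static void k7_fill_msrs(void)
{
    for (int i = 0; i < K7_NPMC; i++) {
        msr_ctl[i] = K7_MSR_PERFEVTSEL0 + (uint32_t)i;
        msr_pmc[i] = K7_MSR_PERFCTR0 + (uint32_t)i;
    }
}
```

Whether writectl6 is the right control-write routine for the K7 is a 
separate question that this sketch does not address.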

Assume I get perfmon to work with my K7's 4 performance-monitoring 
counters. Since PCE is not set, I still cannot execute RDPMC from 
ring 3, so I have to make a system call just to read the counters.
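
For reference, the rule that decides whether each instruction faults 
can be written out as below. This is only a sketch: the two function 
names are made up for illustration, and the CR4 value has to be 
supplied by code running at ring 0, since user code cannot read CR4. 
The bit positions are the ones given in the IA-32 manuals (TSD is 
bit 2, PCE is bit 8).

```c
#include <stdint.h>

#define CR4_TSD (1u << 2)  /* Time Stamp Disable: 1 = RDTSC only at ring 0 */
#define CR4_PCE (1u << 8)  /* Perf Counter Enable: 1 = RDPMC at any ring   */

/* Hypothetical helper: may code at privilege level cpl execute RDTSC? */
static int rdtsc_allowed(uint32_t cr4, int cpl)
{
    return cpl == 0 || (cr4 & CR4_TSD) == 0;
}

/* Hypothetical helper: may code at privilege level cpl execute RDPMC? */
static int rdpmc_allowed(uint32_t cr4, int cpl)
{
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```

Note the opposite polarity: RDTSC is available to ring 3 when TSD is 
clear, whereas RDPMC is available to ring 3 only when PCE is set.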

A system call costs far more cycles than a single instruction. But 
more importantly, entering the kernel will wreck the cache, and 
possibly the TLB.

There is no point in monitoring an event if the monitoring tools 
disturb the environment too much.

>> Is there a reason (security? performance? other?) why FreeBSD does
>> not set PCE at boot time?

Is it just an oversight that FreeBSD does not set PCE at boot time, 
or is there a reason?

I can provide a patch if nobody opposes the idea. Or write a kernel 
module that will do it when loaded.

>> On a related subject, is there a way for a kernel module to catch a
>> general-protection fault caused by an application trying to execute
>> RDMSR or WRMSR, and have the kernel module execute the instruction
>> for the application? Or is it cleaner to register two new system
>> calls to achieve the same thing?
> 
> That would (probably) require adding superuser-configurable permissions
> to read/write to a specific MSR, as some of them are critical. I doubt
> it's worth creating extra device nodes, and I wonder if there's a
> "cleaner" way to do that.

My intent is to allow an application access to the 4 
performance-monitoring control registers ONLY. The application would 
try to execute WRMSR (a privileged instruction), which would cause a 
GPF. The kernel module would catch the fault, sanity-check the 
arguments, and proceed with the WRMSR when the arguments are valid.
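
The address part of that sanity check could be as small as the 
sketch below. msr_write_permitted() is a made-up name, and the range 
assumes the K7 event-select registers sit at 0xC0010000-0xC0010003. 
A real handler would presumably also need to vet the value being 
written, which is not attempted here.

```c
#include <stdint.h>

#define K7_MSR_PERFEVTSEL0 0xc0010000u  /* first event-select MSR */
#define K7_MSR_PERFEVTSEL3 0xc0010003u  /* last event-select MSR  */

/*
 * Hypothetical whitelist check. WRMSR takes the MSR index in ECX, so
 * the fault handler would pass the faulting context's ECX here.
 * Returns 1 if the write may be emulated, 0 if it must be refused.
 */
static int msr_write_permitted(uint32_t ecx)
{
    return ecx >= K7_MSR_PERFEVTSEL0 && ecx <= K7_MSR_PERFEVTSEL3;
}
```

Anything outside the four control registers, including the counter 
registers themselves, is refused by this check.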

Could you point me to some documentation, or is the source the only 
documentation available in this situation? :-)

-- 
Shill (shill at free dot fr)



