kern/129164: Wrong priority value for normal processes

David Xu davidxu at freebsd.org
Thu Nov 27 18:46:07 PST 2008


Jeff Roberson wrote:
> On Thu, 27 Nov 2008, Unga wrote:
> 
>> --- On Tue, 11/25/08, Unga <unga888 at yahoo.com> wrote:
>>
>>> The priority value for root and other normal processes is
>>> 65504 (rtp.prio) where zero (0) is expected.
>>>
>>> I checked the program flow from /usr/src/usr.bin/su/su.c to
>>> /usr/src/lib/libutil/login_class.c and it looks like
>>> setusercontext() sets the priority to zero (0) correctly, but
>>> by the time the setusercontext() call returns in su.c, the
>>> priority has already turned to 65504.
>>>
>>> The maximum priority value a normal priority process can
>>> take is 20, not 65504. Normal priority processes are
>>> expected to run at priority zero (0), as specified in
>>> /etc/login.conf under the login class "default".
>>>
>>
>> I have further checked how the rtprio(2) system call sets and 
>> reads priorities.
>>
>> Setting Priority:
>> rtprio(RTP_SET, 0, &rtp)
>>
>> rtprio() => rtprio_thread() => rtp_to_pri()
>>
>> rtp_to_pri() calculates newpri as:
>> newpri = PRI_MIN_TIMESHARE + rtp->prio;
>>
>> PRI_MIN_TIMESHARE is used for normal priority, PRI_MIN_REALTIME for 
>> realtime priority, and so on.
>>
>> Now rtp_to_pri() calls sched_class() to set the priority class. It 
>> sets td->td_pri_class to the priority class given.
>>
>> Then rtp_to_pri() calls sched_user_prio() to set the priority. It sets 
>> the following fields to the calculated priority (newpri):
>> td->td_base_user_pri
>> td->td_user_pri
>>
>> Then rtp_to_pri() calls sched_prio(). It sets the following field to 
>> the calculated priority (newpri):
>> td->td_base_pri
>>
>> sched_prio() in turn calls sched_thread_priority(), which sets 
>> td->td_priority to the calculated priority (newpri).
>>
>> Of course not all td->td_* fields are set in one go. Some are set 
>> conditionally. But the td->td_base_user_pri is always set.
>>
>>
>> Reading Priority:
>> rtprio(RTP_LOOKUP, 0, &rtp)
>>
>> rtprio() => rtprio_thread() => pri_to_rtp()
>>
>> At pri_to_rtp(), rtp->type and rtp->prio are set as follows:
>> rtp->type = td->td_pri_class;
>> rtp->prio = td->td_base_user_pri - PRI_MIN_TIMESHARE;
>>
>> That is, the rtprio(2) system call sets td->td_base_user_pri when 
>> asked to set the priority, and reads td->td_base_user_pri when asked 
>> to read it back.
>>
>> In other words, for rtprio(2) to function properly, 
>> td->td_base_user_pri should not be changed.
>>
>> Since rtp->prio is an unsigned short, for rtp->prio to become a huge 
>> number (65504), td->td_base_user_pri must be less than 
>> PRI_MIN_TIMESHARE, so that the subtraction wraps around.
>>
>>
>> This shows the actual problem is in the scheduler. In this case, 
>> sched_ule.
>>
>> I presume the sched_ule should not touch the td->td_base_user_pri. 
>> Instead, probably, it should use td->td_priority for its internal 
>> purposes.
>>
>> I would appreciate it if Jeffrey Roberson <jeff at freebsd.org> could 
>> shed more light on this issue.
> 
> The base_pri vs td_priority is really jhb's domain.  I added him to the cc.
> 
> Thanks,
> Jeff
> 

This might be caused by the following code in sched_ule.c:

static void
sched_priority(struct thread *td)
{
         int score;
         int pri;

         if (td->td_pri_class != PRI_TIMESHARE)
                 return;
         /*
          * If the score is interactive we place the thread in the realtime
          * queue with a priority that is less than kernel and interrupt
          * priorities.  These threads are not subject to nice restrictions.
          *
          * Scores greater than this are placed on the normal timeshare queue
          * where the priority is partially decided by the most recent cpu
          * utilization and the rest is decided by nice value.
          *
          * The nice value of the process has a linear effect on the calculated
          * score.  Negative nice values make it easier for a thread to be
          * considered interactive.
          */
         score = imax(0, sched_interact_score(td) - td->td_proc->p_nice);
         if (score < sched_interact) {
                 pri = PRI_MIN_REALTIME;
                 pri += ((PRI_MAX_REALTIME - PRI_MIN_REALTIME) / sched_interact)
                     * score;
                 KASSERT(pri >= PRI_MIN_REALTIME && pri <= PRI_MAX_REALTIME,
                     ("sched_priority: invalid interactive priority %d score %d",
                     pri, score));
         } else {


It uses PRI_MIN_REALTIME and then calls sched_user_prio(td, pri), which 
sets td_base_user_pri and td_user_pri. As a result, both fall below 
PRI_MIN_TIMESHARE, outside the range that pri_to_rtp() expects for a 
timeshare thread. Should PRI_MIN_REALTIME and PRI_MAX_REALTIME be 
PRI_MIN_TIMESHARE and PRI_MAX_TIMESHARE here?

>>
>> Since I'm not conversant with sched_ule, I may not be able to 
>> develop a fix for it. I would appreciate it if either Jeffrey or 
>> somebody else could look into a fix for sched_ule. I can certainly 
>> help apply a patch and test it.
>>
>> Best regards
>> Unga
>>
>>
>>
>>