pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer julian at freebsd.org
Fri Feb 17 07:40:10 UTC 2012


Adding jkim, as he seems to be the last person to have worked on the TSC code.


On 2/16/12 6:42 PM, David Xu wrote:
> On 2012/2/17 10:19, Julian Elischer wrote:
>> On 2/16/12 5:56 PM, David Xu wrote:
>>> On 2012/2/17 8:42, Julian Elischer wrote:
>>>> Adding David Xu for his thoughts, since he rewrote the code in
>>>> question in revision 213098.
>>>>
>>>> On 2/16/12 2:57 PM, Julian Elischer wrote:
>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>>>>>> The program fio (an IO test in ports) uses pthreads
>>>>>>>>
>>>>>>>> The following code (from fio-2.0.3, but it's in earlier versions too)
>>>>>>>> has suddenly started misbehaving.
>>>>>>>>
>>>>>>>>          clock_gettime(CLOCK_REALTIME,&t);
>>>>>>>>          t.tv_sec += seconds + 10;
>>>>>>>>
>>>>>>>>          pthread_mutex_lock(&mutex->lock);
>>>>>>>>
>>>>>>>>          while (!mutex->value && !ret) {
>>>>>>>>                  mutex->waiters++;
>>>>>>>>                  ret = pthread_cond_timedwait(&mutex->cond,
>>>>>>>>                                               &mutex->lock, &t);
>>>>>>>>                  mutex->waiters--;
>>>>>>>>          }
>>>>>>>>
>>>>>>>>          if (!ret) {
>>>>>>>>                  mutex->value--;
>>>>>>>>                  pthread_mutex_unlock(&mutex->lock);
>>>>>>>>          }
>>>>>>>>
>>>>>>>>
>>>>>>>> It turns out that 'ret' sometimes comes back instantly (on my 
>>>>>>>> machine) with a
>>>>>>>> value of 60 (ETIMEDOUT)
>>>>>>>> despite the fact that we set the timeout 10 seconds into the 
>>>>>>>> future.
>>>>>>>>
>>>>>>>> Has anyone else seen anything like this?
>>>>>>>> (And yes, the condition variable attributes have been set to
>>>>>>>> use the REALTIME clock.)
>>>>>>> But why?
>>>>>>>
>>>>>>> Just a hypothesis that maybe there is some issue with time 
>>>>>>> keeping on that system.
>>>>>>> How would that code work out for you with MONOTONIC?
>>>>>>
>>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and 
>>>>>> CLOCK_MONOTONIC, and they both had the same problem..
>>>>>> i.e. random early returns with ETIMEDOUT.
>>>>>>
>>>>>> I think we will try moving our machine forward to a newer -stable
>>>>>> to see if that resolves it.
>>>>> Kan upgraded the machine today to today's 9.x branch tip and the 
>>>>> problem still occurs.
>>>>> 8.x does not have this problem.
>>>>>
>>>>> I have not got a 9-RELEASE machine to test on.. so I can not 
>>>>> tell if this came in with the burst of stuff
>>>>> that came in after the 9.x branch was unfrozen after the release 
>>>>> of 9.0.
>>>>>
>>>>>
>>>>
>>> I am trying to reproduce the problem,  do you have complete sample 
>>> code to test ?
>>
>> I'm still looking for the exact set,
>> but on my machine (4 cpus) the program from ports sysutils/fio 
>> exhibits the problem when used with
>> kern.timecounter.hardware=TSC-low and with the following config file:
>>
>> pu05 # cat config.fio
>>
>> [global]
>> #clocksource=cpu
>> direct=1
>> rw=randread
>> bs=4096
>> fill_device=1
>> numjobs=16
>> iodepth=16
>> #ioengine=posixaio
>> #ioengine=psync
>> ioengine=psync
>> group_reporting
>> norandommap
>> time_based
>> runtime=60000
>> randrepeat=0
>>
>> [file1]
>> filename=/dev/ada0
>>
>> pu05 #
>> pu05 # fio config.fio
>> fio: this platform does not support process shared mutexes, forcing 
>> use of threads. Use the 'thread' option to get rid of this warning.
>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>> ...
>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>> fio 2.0.3
>> Starting 15 threads and 1 process
>> fio: job startup hung? exiting.
>> fio: 5 jobs failed to start
>> Segmentation fault (core dumped)
>> pu05#
>>
>>
>> The reason 5 jobs failed to start is because the parent timed out 
>> on them immediately.
>> It didn't time out on 10 of them apparently.
>>
>>
>> If I set the timecounter to ACPI-fast, it works as expected.
> Maybe the following code can check whether TSC-low works, by letting
> the thread run on each CPU in turn.
>
> struct timeval prev, cur;
> int cpu = 0;
>
> gettimeofday(&prev, NULL);
> for (;;) {
>      cpuset_t set;
>
>      cpu = (cpu + 1) % 4;    /* 4 CPUs on the test machine */
>      CPU_ZERO(&set);
>      CPU_SET(cpu, &set);
>      pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>      gettimeofday(&cur, NULL);
>      if (timercmp(&prev, &cur, >))    /* time went backwards */
>         abort();
>      prev = cur;    /* compare against the most recent sample */
> }
>
>
