pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

David Xu listlog2011 at gmail.com
Fri Feb 17 02:42:29 UTC 2012


On 2012/2/17 10:19, Julian Elischer wrote:
> On 2/16/12 5:56 PM, David Xu wrote:
>> On 2012/2/17 8:42, Julian Elischer wrote:
>>> Adding David Xu for his thoughts since he reqrote the code in 
>>> quesiton in revision 213098
>>>
>>> On 2/16/12 2:57 PM, Julian Elischer wrote:
>>>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>>>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>>>>> The program fio (an IO test in ports) uses pthreads
>>>>>>>
>>>>>>> the following code (from fio-2.0.3, but its in earlier code too)
>>>>>>> has suddenly started misbehaving.
>>>>>>>
>>>>>>>          clock_gettime(CLOCK_REALTIME,&t);
>>>>>>>          t.tv_sec += seconds + 10;
>>>>>>>
>>>>>>>          pthread_mutex_lock(&mutex->lock);
>>>>>>>
>>>>>>>          while (!mutex->value&&  !ret) {
>>>>>>>                  mutex->waiters++;
>>>>>>>                  ret = 
>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
>>>>>>>                  mutex->waiters--;
>>>>>>>          }
>>>>>>>
>>>>>>>          if (!ret) {
>>>>>>>                  mutex->value--;
>>>>>>>                  pthread_mutex_unlock(&mutex->lock);
>>>>>>>          }
>>>>>>>
>>>>>>>
>>>>>>> It turns out that 'ret' sometimes comes back instantly (on my 
>>>>>>> machine) with a
>>>>>>> value of 60 (ETIMEDOUT)
>>>>>>> despite the fact that we set the timeout 10 seconds into the 
>>>>>>> future.
>>>>>>>
>>>>>>> Has anyone else seen anything like this?
>>>>>>> (and yes the condition variable attribute have been set to use 
>>>>>>> the REALTIME clock).
>>>>>> But why?
>>>>>>
>>>>>> Just a hypothesis that maybe there is some issue with time 
>>>>>> keeping on that system.
>>>>>> How would that code work out for you with MONOTONIC?
>>>>>
>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, 
>>>>> and they both had the same problem..
>>>>> i.e. random early returns with ETIMEDOUT.
>>>>>
>>>>> I think we will try move out machine forward to a newer -stable to 
>>>>> see if it resolves.
>>>> Kan upgraded the machine today to today's 9.x branch tip and the 
>>>> problem still occurs.
>>>> 8.x does not have this problem.
>>>>
>>>> I have not got a 9-RELEASE machine to test on.. so I can not tell 
>>>> if this came in with the burst of stuff
>>>> that came in after the 9.x branch was unfrozen after the release of 
>>>> 9.0.
>>>>
>>>>
>>>
>> I am trying to reproduce the problem,  do you have complete sample 
>> code to test ?
>
> I'm still looking the exact set
> but on my machine (4 cpus) the program from ports sysutils/fio 
> exhibits the problem when used with
> kern.timecounter.hardware=TSC-low and with the following config file:
>
> pu05 # cat config.fio
>
> [global]
> #clocksource=cpu
> direct=1
> rw=randread
> bs=4096
> fill_device=1
> numjobs=16
> iodepth=16
> #ioengine=posixaio
> #ioengine=psync
> ioengine=psync
> group_reporting
> norandommap
> time_based
> runtime=60000
> randrepeat=0
>
> [file1]
> filename=/dev/ada0
>
> pu05 #
> pu05 # fio config.fio
> fio: this platform does not support process shared mutexes, forcing 
> use of threads. Use the 'thread' option to get rid of this warning.
> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
> ...
> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
> fio 2.0.3
> Starting 15 threads and 1 process
> fio: job startup hung? exiting.
> fio: 5 jobs failed to start
> Segmentation fault (core dumped)
> pu05#
>
>
> The reason 5 jobs failed to start is because the parent timed out on 
> them immediately.
> It didn't time out on 10 of them apparently.
>
>
> if I set the timer to ACPI-fast it works as expected..
maybe following code can check to see if TSC-LOW works by let the thread run
on each cpu.

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
      cpuset_t set;
      cpu = ++cpu % 4;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      gettimeofday(&cur, NULL);
      if ( timercmp(&prev, &cur, >=)) {
         abort();
    }
}



More information about the freebsd-stable mailing list