Query about bhyve's blockif_cancel and the signalling mechanisms
    Tycho Nightingale 
    tychon at freebsd.org
       
    Wed Dec 14 00:07:00 UTC 2016
    
    
  
Hi,
On Dec 13, 2016, at 1:32 AM, Peter Grehan <grehan at freebsd.org> wrote:
> Hi Ian,
> 
>> To recap my understanding of the mechanisms at work (glossing over the
>> queue handling and condvars involved etc), the bhyve block_if
>> infrastructure registers a callback for SIGCONT with the mevent
>> subsystem, which is a kevent/kqueue thing which delivers events to the
>> main thread (mevent_dispatch is the last thing in main()) it also sets
>> SIGCONT to SIG_IGN.
> 
> That's correct. The intent was to have the signal delivered via the kevent callback rather than standard signal delivery.
> 
>> When a disk controller device model wants to
>> cancel a block request (e.g. in ahci_port_stop) it calls
>> blockif_cancel which sends a SIGCONT to the blkio thread which has
>> claimed the request, notionally to kick it out of whatever blocking
>> system call it is in and cause it to return an error to the device
>> model.
> 
> Yep, that's correct.
> 
>> The main thing I do not follow is whether or not the blkio thread is
>> actually interrupted at all when the signal has been configured to be
>> delivered via the kevent/kqueue mechanisms to a 3rd unrelated thread.
> 
> It is interrupted on FreeBSD.
> 
>> I've dug around in the FreeBSD kevent and signal man pages but I
>> cannot find any part which describes anything of the semantics which
>> bhyve seems to be relying on (which seems to be that the system call
>> in the target thread will return EINTR at some point before the thread
>> which is "handling" the signal via kevent/kqueue sees that event).
>> 
>> Have I missed something here or is bhyve relying on some subtle
>> underlying semantics?
> 
> I didn't think it too FreeBSD-specific - if a thread is blocked in a system call, sending a signal should force it to exit on most Unices.
> 
>> I have a secondary concern which is what happens if the IO thread is
>> on its way to making a blocking system call in blockif_proc but has
>> not actually done so when the signal is delivered. It seems like it
>> would simply carry on and make the blocking call with perhaps
>> unexpected consequences (i/o getting wedged, perhaps only until a
>> second reset attempt). I've not actually seen this happening though
>> and there's a chance I'm simply over thinking things after staring at
>> them for so long!
> 
> I believe this case is handled - I discussed this at length with Tycho when the code was committed a while back.
> 
> Tycho - any thoughts ?
ahci_port_stop() is called under the protection of the port soft-state lock, so that will stem any further requests from landing in the blockif queue.  That’s the easy case.
As for blockif requests which are queued, those are simply completed.  The ones that are in-flight all have their status set to BST_BUSY when they are moved from the pending queue to the busy queue, just prior to being sent to blockif_proc().  It’s therefore possible that an in-flight request (one on the busy list) has yet to call blockif_proc(), is already inside blockif_proc(), or has just completed blockif_proc().  In all cases, however, BST_BUSY is cleared in blockif_complete().  The key is therefore that regardless of where the thread is, blockif_cancel() will continue to issue pthread_kill() until the request reaches blockif_complete(), breaking it out of system calls as necessary.
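To make the retry idea concrete, here is a rough standalone sketch. It is not the block_if.c code, and everything in it is a stand-in: REQ_BUSY/REQ_DONE play the role of BST_BUSY and its cleared state, struct request, worker(), cancel_request() and demo_cancel() are hypothetical names, and the blocking I/O is simulated by a read() on a pipe that never receives data. It also assumes SIGUSR1 with an empty handler installed without SA_RESTART (instead of SIGCONT delivered through mevent) and a timed condvar wait between kicks, purely to keep the example portable. The short sleep in the worker stands for the window before the blocking call is actually made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for BST_BUSY and the state set on completion. */
enum req_status { REQ_BUSY, REQ_DONE };

struct request {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    enum req_status status;
    int             pipe_rd;  /* read end that never gets data: a "stuck" I/O */
    int             result;   /* errno from the blocking call, 0 on success */
};

/* Empty handler: its only job is to make blocking calls fail with EINTR. */
static void noop_handler(int sig) { (void)sig; }

/* Worker: the request was already marked REQ_BUSY before it was handed over. */
static void *worker(void *arg)
{
    struct request *rq = arg;
    struct timespec window = { 0, 20 * 1000 * 1000 };
    char c;

    nanosleep(&window, NULL);   /* a kick landing here does not stop the read */
    ssize_t n = read(rq->pipe_rd, &c, 1);
    int err = (n < 0) ? errno : 0;

    /* Equivalent of reaching blockif_complete(): clear the busy state. */
    pthread_mutex_lock(&rq->mtx);
    rq->result = err;
    rq->status = REQ_DONE;
    pthread_cond_broadcast(&rq->cond);
    pthread_mutex_unlock(&rq->mtx);
    return NULL;
}

/* Keep kicking the worker until the request is no longer busy. */
static int cancel_request(struct request *rq, pthread_t tid)
{
    int kicks = 0;

    pthread_mutex_lock(&rq->mtx);
    while (rq->status == REQ_BUSY) {
        struct timespec ts;

        pthread_kill(tid, SIGUSR1);
        kicks++;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 10 * 1000 * 1000;          /* re-check after ~10 ms */
        if (ts.tv_nsec >= 1000000000L) {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(&rq->cond, &rq->mtx, &ts);
    }
    pthread_mutex_unlock(&rq->mtx);
    return kicks;
}

/* Returns the worker's errno (or -1 on setup failure); kick count via *kicks_out. */
static int demo_cancel(int *kicks_out)
{
    struct request rq;
    struct sigaction sa;
    pthread_t tid;
    int fds[2];
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;                /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    rc = pthread_mutex_init(&rq.mtx, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&rq.cond, NULL);
    assert(rc == 0);
    rq.status = REQ_BUSY;
    rq.pipe_rd = fds[0];
    rq.result = 0;

    if (pthread_create(&tid, NULL, worker, &rq) != 0)
        return -1;
    *kicks_out = cancel_request(&rq, tid);
    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return rq.result;
}
```

Calling demo_cancel() from a small main() should, on a POSIX system, return EINTR: an early kick may land before the read() has started and be "wasted", but because the canceller only stops once the busy state is cleared, a later kick interrupts the read. The number of kicks depends on timing.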
Does that make sense?
Tycho
    
    