svn commit: r277199 - in head/sys: fs/devfs kern

Sun Jan 18 11:07:40 UTC 2015

Hi Konstantin,

On 01/18/15 11:54, Konstantin Belousov wrote:
> On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote:
>> Hi Konstantin,
>>
>> On 01/16/15 09:03, Konstantin Belousov wrote:
>>> My opinion is that you should have tried to handle the issue at the driver
>>> level, instead of making this devfs issue.  I.e., if you already have
>>> cdev node with the correct name, driver should have replaced the private
>>> data to point to new device.
>>
>> I think this way cannot be implemented in a clean way, because of
>> locking order reversal. And if you try to avoid the LOR you end up with
>> the race. Chess mate sort of ;-)  Let me explain:
>>
>> Thread 1:
>> usb_sx_lock();
>>     cdev = Look in freelist for existing device();
>> else
>>     cdev = make_dev();
>> usb_sx_unlock();
>>
>> Thread 2:
>> usb_sx_lock();
>> put cdev on freelist
>> usb_sx_unlock();
>>
>> Thread 3:
>> usb_sx_lock();
>> cdev = remove first entry in freelist
>> usb_sx_unlock();
>>
>> /*
>>    * XXX because USB needs to call destroy_dev() unlocked we
>>    * are racing with Thread 1 again
>>    */
>> destroy_dev(cdev);
> I do not understand the 'freelist' part.
>
> You have some usb (or whatever other kind) device, which is represented by
> some data structure.  This data structure references cdev.  In reverse,
> cdev points to this data structure by si_drv1 or similar driver-owned
> member of struct cdev.
>
> The structures naming usb devices have some mapping to/from usb bus addresses.
> When device is destroyed or created due to external plug event, there is
> some mechanism that ensures mapping consistence.  It could be lock, it
> could be single-threaded process which discover the bus, or something
> else, I do not know.
>
> While the structure notes that device with some address goes out and comes
> in, it can look up the corresponding cdev and just change the si_drv1 pointer
> to point to the new device.

What about events for "devd" that a new character devices is present in 
the system or being destroyed for user-space applications?

>
> I find it very strange that you need sleepless operation which can
> _both_ destroy and create cdev. At least one operation of creation or
> destruction must sleep, at least in the current devfs design. It could
> be made sleepless, by making VFS visible part even less connected to the
> cdev part, but this is not how it (can) currently work.

I think it would be good practice that all resources needed for creating 
a character device can be allocated prior to registration. That means we 
first should allocate all resources needed, but not register as a single 
function.

For example:

Once make_dev() has returned, we can get an "d_open" callback. But 
"si_drv1/2" is always set by drivers _after_ that "make_dev()" has 
returned! This causes a race we could simply avoid by splitting the 
make_dev() like I suggest. Instead of putting more logic and code inside 
the drivers to handle the race!

>>
>>>
>>> This would also close a window where /dev node is non-existent or operate
>>> erronously.
>>
>> I'm not saying I plan so, but I think "cdevs" at some point need to
>> understand mutexes and locks. That means like with other API's in the
>> kernel we can associate a lock with the "cdev", and this lock is then
>> used to ensure an atomic shutdown of the system in a non-blocking
>> fashion. In my past experience multithreaded APIs should be high level
>> implemented like this:
>>
>> NON-BLOCKING methods:
>> lock(); **
>> xxx_start();
>> xxx_stop();
>> unlock();
>>
>> BLOCKING methods:
>> setup(); // init
>> unsetup(); // drain
>>
>> Any callbacks should always be called locked **
>>
>> In devfs there was no non-blocking stop before I added the delist_dev()
>> function.
>>
>>>
>>> You do not understand my point.
>>>
>>> I object against imposing one additional global reference on all cdevs
>>> just to cope with the delist hack.  See the patch at the end of the message.
>>
>> It's fine by me.
>>>
>>> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
>>> delist_dev(), the node still exists in the /dev. It is only removed
>>> after populate loop is run sometime later. dev_sched() KPI is inheritly
>>> racy, drivers must handle the races for other reasons.
>>
>> The populate loop is all running under the dev_lock() from what I can
>> see and make_dev() is also keeping the same lock when inserting and
>> removing new cdevs. The populate loop should always give a consistent
> No, populate loop is not run under dev_mtx.

Are you sure:

> static int
> devfs_populate_loop(struct devfs_mount *dm, int cleanup)
> {
>         struct cdev_priv *cdp;
>         struct devfs_dirent *de;
>         struct devfs_dirent *dd, *dt;
>         struct cdev *pdev;
>         int de_flags, depth, j;
>         char *q, *s;
>
>         sx_assert(&dm->dm_lock, SX_XLOCKED);
>         dev_lock();
           ^^^^ what is this ?
>         TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {

> It would not be able to
> allocate memory, even in M_NOWAIT fashion.  The dev_mtx is after the
> vm and vnode locks.  Only iteration over the cdevp_list is protected by
> dev_mtx, which is dropped right after something which require an
> action, is found on the list. Populate() needs some way to avoid
> reading inconsistent data from the cdev layer, but there is not
> guarantee that we are always up to date.
>
>> view of the character devices available, and I don't see how "cdev"
>> structures without the CDP_ACTIVE flag can appear with recently created
>> ones, even if the name is the same.
> It cannot simply because it is not synchronized with the device
> creation/destruction.

> Lookup (or any other VFS-level code) is not protected by dev_mtx, it
> is only dm lock which ensures that populate loop is not run in parallel,
> modifying direntries.

See comment above.

>
>>
>> That system functions can still call into the dangling read/write/ioctl
>> functions is another story, and that is why I tell, that in order to
>> simplify this teardown, we possibly should associate a client selectable
>> lock with each CDEV, for teardown purposes. Like done for several years
>> in the callout and USB APIs and possibly many other places.
> There is d_purge method which provides the tear-down semantic for
> drivers which needs it.  Only tty uses the method AFAIK.
>
> Devfs ensures that no client code is active when cdev is removed for real.

--HPS