crash via vm_page_sleep_if_busy() and contigmalloc

Brian Fundakowski Feldman green at FreeBSD.org
Fri Jul 16 18:29:54 PDT 2004


On Fri, Jul 16, 2004 at 08:32:05PM -0400, Brian Fundakowski Feldman wrote:
> Anyone VM-y enough to be up to the task: please take a look at this
> current vm_contig.c code and the crash that I have.
> 
> This crash is not common -- this is the first time I've seen it --
> but the problem certainly doesn't seem unique.
> 
> What seems to happen is that vm_page_sleep_if_busy() is called from
> a place that expects that a page may go away, but it does not really
> realize this.  If it has to sleep because the page is busy, it will
> afterward happily dereference m->object which may now be NULL or belong
> to something else, and unlock its mutex (which may be locked).
> 
> It seems that this is a generic problem that needs to be solved by not
> dicking around with vm_object inside vm_page_sleep_if_busy(): pass it
> in locked all of the time, return it unlocked all of the time if the
> page queue mutex was relinquished.  Also, assumptions should be removed
> from other callers of vm_page_sleep_if_busy() such that they know the
> object may not exist after return, so if the page queue lock is gone
> then the object is gone and it must not reference it anymore.
> 
> Essentially every bit of code that calls vm_page_sleep_if_busy() without
> explicit knowledge of the backing object is in violation of this.  As
> such, I think callers need to either lock the vm_object in every case
> before locking the page queues, or if they hold the page queues' mutex,
> do a trylock before trying to call vm_page_sleep_if_busy(), and be able
> to handle both of the locks being relinquished on a return of TRUE.
> 
> Comments?

Actually, I think there's a hacky fix for vm_page_sleep_if_busy() that is
only just as hacky as the current vm_page_sleep_if_busy() implementation
itself.  Only if we locked the vm_object is the code that is going to
possibly dereference a NULL (freed) m->object going to be called, so if
we make sure the caller holds a reference to the object if it holds the lock,
which in most cases it does except for in contigmalloc() where it looks up
the object from the page instead and then locks it, caching m->object into
a local variable should provide the semantics to fix this problem.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green at FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\


More information about the freebsd-hackers mailing list