Merge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "The xfs folks have been running into weird and very rare lockups for
  some time now.  I didn't think this could have been from workqueue
  side because no one else was reporting it.  This time, Eric had a
  kdump which we looked into and it turned out this actually was a
  workqueue bug and the bug has been there since the beginning of
  concurrency managed workqueue.

  A worker pool ensures forward progress of the workqueues associated
  with it by always having at least one worker reserved from executing
  work items.  When the pool is under contention, the idle one tries to
  create more workers for the pool and if that doesn't succeed quickly
  enough, it calls the rescuers to the pool.

  This logic had a subtle race condition in an early exit path.  When a
  worker invokes this manager function, the function may return %false
  indicating that the caller may proceed to executing work items either
  because another worker is already performing the role or conditions
  have changed and the pool is no longer under contention.

  The latter part depended on the assumption that whether more workers
  are necessary or not remains stable while the pool is locked; however,
  pool->nr_running (concurrency count) may change asynchronously and it
  getting bumped from zero asynchronously could send off the last idle
  worker to execute work items.

  The race window is fairly narrow, and, even when it gets triggered,
  the pool deadlocks iff if all work items get blocked on pending work
  items of the pool, which is highly unlikely but can be triggered by
  xfs.

  The patch removes the race window by removing the early exit path,
  which doesn't server any purpose anymore anyway"

* 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix subtle pool management issue which can stall whole worker_pool
This commit is contained in:
Linus Torvalds 2015-01-21 07:51:46 +12:00
commit d4b2d0061d

View File

@ -1841,17 +1841,11 @@ static void pool_mayday_timeout(unsigned long __pool)
* spin_lock_irq(pool->lock) which may be released and regrabbed
* multiple times. Does GFP_KERNEL allocations. Called only from
* manager.
*
* Return:
* %false if no action was taken and pool->lock stayed locked, %true
* otherwise.
*/
static bool maybe_create_worker(struct worker_pool *pool)
static void maybe_create_worker(struct worker_pool *pool)
__releases(&pool->lock)
__acquires(&pool->lock)
{
if (!need_to_create_worker(pool))
return false;
restart:
spin_unlock_irq(&pool->lock);
@ -1877,7 +1871,6 @@ restart:
*/
if (need_to_create_worker(pool))
goto restart;
return true;
}
/**
@ -1897,16 +1890,14 @@ restart:
* multiple times. Does GFP_KERNEL allocations.
*
* Return:
* %false if the pool don't need management and the caller can safely start
* processing works, %true indicates that the function released pool->lock
* and reacquired it to perform some management function and that the
* conditions that the caller verified while holding the lock before
* calling the function might no longer be true.
* %false if the pool doesn't need management and the caller can safely
* start processing works, %true if management function was performed and
* the conditions that the caller verified before calling the function may
* no longer be true.
*/
static bool manage_workers(struct worker *worker)
{
struct worker_pool *pool = worker->pool;
bool ret = false;
/*
* Anyone who successfully grabs manager_arb wins the arbitration
@ -1919,12 +1910,12 @@ static bool manage_workers(struct worker *worker)
* actual management, the pool may stall indefinitely.
*/
if (!mutex_trylock(&pool->manager_arb))
return ret;
return false;
ret |= maybe_create_worker(pool);
maybe_create_worker(pool);
mutex_unlock(&pool->manager_arb);
return ret;
return true;
}
/**