Revision f6ba488073fe8159851fe398cc3c5ee383bb4c7a authored by Vladimir Davydov on 18 August 2017, 22:16:08 UTC, committed by Linus Torvalds on 18 August 2017, 22:32:01 UTC
To avoid a possible deadlock, sysfs_slab_remove() schedules an asynchronous work to delete sysfs entries corresponding to the kmem cache. To ensure the cache isn't freed before the work function is called, it takes a reference to the cache kobject. The reference is supposed to be released by the work function. However, the work function (sysfs_slab_remove_workfn()) does nothing in case the cache sysfs entry has already been deleted, leaking the kobject and the corresponding cache. This may happen on a per memcg cache destruction, because sysfs entries of a per memcg cache are deleted on memcg offline if the cache is empty (see __kmemcg_cache_deactivate()). The kmemleak report looks like this: unreferenced object 0xffff9f798a79f540 (size 32): comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.554s) hex dump (first 32 bytes): 6b 6d 61 6c 6c 6f 63 2d 31 36 28 31 35 39 39 3a kmalloc-16(1599: 6e 65 77 72 6f 6f 74 29 00 23 6b c0 ff ff ff ff newroot).#k..... backtrace: kmemleak_alloc+0x4a/0xa0 __kmalloc_track_caller+0x148/0x2c0 kvasprintf+0x66/0xd0 kasprintf+0x49/0x70 memcg_create_kmem_cache+0xe6/0x160 memcg_kmem_cache_create_func+0x20/0x110 process_one_work+0x205/0x5d0 worker_thread+0x4e/0x3a0 kthread+0x109/0x140 ret_from_fork+0x2a/0x40 unreferenced object 0xffff9f79b6136840 (size 416): comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.573s) hex dump (first 32 bytes): 40 fb 80 c2 3e 33 00 00 00 00 00 40 00 00 00 00 @...>3.....@.... 00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00 ................ backtrace: kmemleak_alloc+0x4a/0xa0 kmem_cache_alloc+0x128/0x280 create_cache+0x3b/0x1e0 memcg_create_kmem_cache+0x118/0x160 memcg_kmem_cache_create_func+0x20/0x110 process_one_work+0x205/0x5d0 worker_thread+0x4e/0x3a0 kthread+0x109/0x140 ret_from_fork+0x2a/0x40 Fix the leak by adding the missing call to kobject_put() to sysfs_slab_remove_workfn(). Link: http://lkml.kernel.org/r/20170812181134.25027-1-vdavydov.dev@gmail.com Fixes: 3b7b314053d02 ("slub: make sysfs file removal asynchronous") Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com> Reported-by: Andrei Vagin <avagin@gmail.com> Tested-by: Andrei Vagin <avagin@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> [4.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 3010f87
refcount.c
/*
* Variant of atomic_t specialized for reference counts.
*
* The interface matches the atomic_t interface (to aid in porting) but only
* provides the few functions one should use for reference counting.
*
* It differs in that the counter saturates at UINT_MAX and will not move once
* there. This avoids wrapping the counter and causing 'spurious'
* use-after-free issues.
*
* Memory ordering rules are slightly relaxed wrt regular atomic_t functions
* and provide only what is strictly required for refcounts.
*
* The increments are fully relaxed; these will not provide ordering. The
* rationale is that whatever is used to obtain the object we're increasing the
* reference count on will provide the ordering. For locked data structures,
* its the lock acquire, for RCU/lockless data structures its the dependent
* load.
*
* Do note that inc_not_zero() provides a control dependency which will order
* future stores against the inc, this ensures we'll never modify the object
* if we did not in fact acquire a reference.
*
* The decrements will provide release order, such that all the prior loads and
* stores will be issued before, it also provides a control dependency, which
* will order us against the subsequent free().
*
* The control dependency is against the load of the cmpxchg (ll/sc) that
* succeeded. This means the stores aren't fully ordered, but this is fine
* because the 1->0 transition indicates no concurrency.
*
* Note that the allocator is responsible for ordering things between free()
* and alloc().
*
*/
#include <linux/refcount.h>
#include <linux/bug.h>
#ifdef CONFIG_REFCOUNT_FULL
/**
* refcount_add_not_zero - add a value to a refcount unless it is 0
* @i: the value to add to the refcount
* @r: the refcount
*
* Will saturate at UINT_MAX and WARN.
*
* Provides no memory ordering, it is assumed the caller has guaranteed the
* object memory to be stable (RCU, etc.). It does provide a control dependency
* and thereby orders future stores. See the comment on top.
*
* Use of this function is not recommended for the normal reference counting
* use case in which references are taken and released one at a time. In these
* cases, refcount_inc(), or one of its variants, should instead be used to
* increment a reference count.
*
* Return: false if the passed refcount is 0, true otherwise
*/
bool refcount_add_not_zero(unsigned int i, refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
do {
if (!val)
return false;
if (unlikely(val == UINT_MAX))
return true;
new = val + i;
if (new < val)
new = UINT_MAX;
} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
return true;
}
EXPORT_SYMBOL(refcount_add_not_zero);
/**
* refcount_add - add a value to a refcount
* @i: the value to add to the refcount
* @r: the refcount
*
* Similar to atomic_add(), but will saturate at UINT_MAX and WARN.
*
* Provides no memory ordering, it is assumed the caller has guaranteed the
* object memory to be stable (RCU, etc.). It does provide a control dependency
* and thereby orders future stores. See the comment on top.
*
* Use of this function is not recommended for the normal reference counting
* use case in which references are taken and released one at a time. In these
* cases, refcount_inc(), or one of its variants, should instead be used to
* increment a reference count.
*/
void refcount_add(unsigned int i, refcount_t *r)
{
WARN_ONCE(!refcount_add_not_zero(i, r), "refcount_t: addition on 0; use-after-free.\n");
}
EXPORT_SYMBOL(refcount_add);
/**
* refcount_inc_not_zero - increment a refcount unless it is 0
* @r: the refcount to increment
*
* Similar to atomic_inc_not_zero(), but will saturate at UINT_MAX and WARN.
*
* Provides no memory ordering, it is assumed the caller has guaranteed the
* object memory to be stable (RCU, etc.). It does provide a control dependency
* and thereby orders future stores. See the comment on top.
*
* Return: true if the increment was successful, false otherwise
*/
bool refcount_inc_not_zero(refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
do {
new = val + 1;
if (!val)
return false;
if (unlikely(!new))
return true;
} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
return true;
}
EXPORT_SYMBOL(refcount_inc_not_zero);
/**
* refcount_inc - increment a refcount
* @r: the refcount to increment
*
* Similar to atomic_inc(), but will saturate at UINT_MAX and WARN.
*
* Provides no memory ordering, it is assumed the caller already has a
* reference on the object.
*
* Will WARN if the refcount is 0, as this represents a possible use-after-free
* condition.
*/
void refcount_inc(refcount_t *r)
{
WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
}
EXPORT_SYMBOL(refcount_inc);
/**
* refcount_sub_and_test - subtract from a refcount and test if it is 0
* @i: amount to subtract from the refcount
* @r: the refcount
*
* Similar to atomic_dec_and_test(), but it will WARN, return false and
* ultimately leak on underflow and will fail to decrement when saturated
* at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
* before, and provides a control dependency such that free() must come after.
* See the comment on top.
*
* Use of this function is not recommended for the normal reference counting
* use case in which references are taken and released one at a time. In these
* cases, refcount_dec(), or one of its variants, should instead be used to
* decrement a reference count.
*
* Return: true if the resulting refcount is 0, false otherwise
*/
bool refcount_sub_and_test(unsigned int i, refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
do {
if (unlikely(val == UINT_MAX))
return false;
new = val - i;
if (new > val) {
WARN_ONCE(new > val, "refcount_t: underflow; use-after-free.\n");
return false;
}
} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
return !new;
}
EXPORT_SYMBOL(refcount_sub_and_test);
/**
* refcount_dec_and_test - decrement a refcount and test if it is 0
* @r: the refcount
*
* Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
* decrement when saturated at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
* before, and provides a control dependency such that free() must come after.
* See the comment on top.
*
* Return: true if the resulting refcount is 0, false otherwise
*/
bool refcount_dec_and_test(refcount_t *r)
{
return refcount_sub_and_test(1, r);
}
EXPORT_SYMBOL(refcount_dec_and_test);
/**
* refcount_dec - decrement a refcount
* @r: the refcount
*
* Similar to atomic_dec(), it will WARN on underflow and fail to decrement
* when saturated at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
* before.
*/
void refcount_dec(refcount_t *r)
{
WARN_ONCE(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n");
}
EXPORT_SYMBOL(refcount_dec);
#endif /* CONFIG_REFCOUNT_FULL */
/**
* refcount_dec_if_one - decrement a refcount if it is 1
* @r: the refcount
*
* No atomic_t counterpart, it attempts a 1 -> 0 transition and returns the
* success thereof.
*
* Like all decrement operations, it provides release memory order and provides
* a control dependency.
*
* It can be used like a try-delete operator; this explicit case is provided
* and not cmpxchg in generic, because that would allow implementing unsafe
* operations.
*
* Return: true if the resulting refcount is 0, false otherwise
*/
bool refcount_dec_if_one(refcount_t *r)
{
int val = 1;
return atomic_try_cmpxchg_release(&r->refs, &val, 0);
}
EXPORT_SYMBOL(refcount_dec_if_one);
/**
* refcount_dec_not_one - decrement a refcount if it is not 1
* @r: the refcount
*
* No atomic_t counterpart, it decrements unless the value is 1, in which case
* it will return false.
*
* Was often done like: atomic_add_unless(&var, -1, 1)
*
* Return: true if the decrement operation was successful, false otherwise
*/
bool refcount_dec_not_one(refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
do {
if (unlikely(val == UINT_MAX))
return true;
if (val == 1)
return false;
new = val - 1;
if (new > val) {
WARN_ONCE(new > val, "refcount_t: underflow; use-after-free.\n");
return true;
}
} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
return true;
}
EXPORT_SYMBOL(refcount_dec_not_one);
/**
* refcount_dec_and_mutex_lock - return holding mutex if able to decrement
* refcount to 0
* @r: the refcount
* @lock: the mutex to be locked
*
* Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
* to decrement when saturated at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
* before, and provides a control dependency such that free() must come after.
* See the comment on top.
*
* Return: true and hold mutex if able to decrement refcount to 0, false
* otherwise
*/
bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
{
if (refcount_dec_not_one(r))
return false;
mutex_lock(lock);
if (!refcount_dec_and_test(r)) {
mutex_unlock(lock);
return false;
}
return true;
}
EXPORT_SYMBOL(refcount_dec_and_mutex_lock);
/**
* refcount_dec_and_lock - return holding spinlock if able to decrement
* refcount to 0
* @r: the refcount
* @lock: the spinlock to be locked
*
* Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
* decrement when saturated at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
* before, and provides a control dependency such that free() must come after.
* See the comment on top.
*
* Return: true and hold spinlock if able to decrement refcount to 0, false
* otherwise
*/
bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
{
if (refcount_dec_not_one(r))
return false;
spin_lock(lock);
if (!refcount_dec_and_test(r)) {
spin_unlock(lock);
return false;
}
return true;
}
EXPORT_SYMBOL(refcount_dec_and_lock);
Computing file changes ...