Revision - efad4e4 - mm, memory_hotplug: is_mem_section_removable do not pass the end [...]

Revision efad4e475c312456edb3c789d0996d12ed744c13 authored by Michal Hocko on 01 February 2019, 22:20:34 UTC, committed by Linus Torvalds on 01 February 2019, 23:46:23 UTC

mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged and cause a
regression [2][3].  The reason is that there might be memory layouts
when two NUMA nodes share the same memory section so the merged fix is
simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an unitialized struct page.  Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1 parent 9bcdeb5

Files
Changes

Permalinks

flexible-arrays.txt

===================================
Using flexible arrays in the kernel
===================================

:Updated: Last updated for 2.6.32
:Author: Jonathan Corbet <corbet@lwn.net>

Large contiguous memory allocations can be unreliable in the Linux kernel.
Kernel programmers will sometimes respond to this problem by allocating
pages with vmalloc().  This solution not ideal, though.  On 32-bit systems,
memory from vmalloc() must be mapped into a relatively small address space;
it's easy to run out.  On SMP systems, the page table changes required by
vmalloc() allocations can require expensive cross-processor interrupts on
all CPUs.  And, on all systems, use of space in the vmalloc() range
increases pressure on the translation lookaside buffer (TLB), reducing the
performance of the system.

In many cases, the need for memory from vmalloc() can be eliminated by
piecing together an array from smaller parts; the flexible array library
exists to make this task easier.

A flexible array holds an arbitrary (within limits) number of fixed-sized
objects, accessed via an integer index.  Sparse arrays are handled
reasonably well.  Only single-page allocations are made, so memory
allocation failures should be relatively rare.  The down sides are that the
arrays cannot be indexed directly, individual object size cannot exceed the
system page size, and putting data into a flexible array requires a copy
operation.  It's also worth noting that flexible arrays do no internal
locking at all; if concurrent access to an array is possible, then the
caller must arrange for appropriate mutual exclusion.

The creation of a flexible array is done with::

    #include <linux/flex_array.h>

    struct flex_array *flex_array_alloc(int element_size,
					unsigned int total,
					gfp_t flags);

The individual object size is provided by element_size, while total is the
maximum number of objects which can be stored in the array.  The flags
argument is passed directly to the internal memory allocation calls.  With
the current code, using flags to ask for high memory is likely to lead to
notably unpleasant side effects.

It is also possible to define flexible arrays at compile time with::

    DEFINE_FLEX_ARRAY(name, element_size, total);

This macro will result in a definition of an array with the given name; the
element size and total will be checked for validity at compile time.

Storing data into a flexible array is accomplished with a call to::

    int flex_array_put(struct flex_array *array, unsigned int element_nr,
    		       void *src, gfp_t flags);

This call will copy the data from src into the array, in the position
indicated by element_nr (which must be less than the maximum specified when
the array was created).  If any memory allocations must be performed, flags
will be used.  The return value is zero on success, a negative error code
otherwise.

There might possibly be a need to store data into a flexible array while
running in some sort of atomic context; in this situation, sleeping in the
memory allocator would be a bad thing.  That can be avoided by using
GFP_ATOMIC for the flags value, but, often, there is a better way.  The
trick is to ensure that any needed memory allocations are done before
entering atomic context, using::

    int flex_array_prealloc(struct flex_array *array, unsigned int start,
			    unsigned int nr_elements, gfp_t flags);

This function will ensure that memory for the elements indexed in the range
defined by start and nr_elements has been allocated.  Thereafter, a
flex_array_put() call on an element in that range is guaranteed not to
block.

Getting data back out of the array is done with::

    void *flex_array_get(struct flex_array *fa, unsigned int element_nr);

The return value is a pointer to the data element, or NULL if that
particular element has never been allocated.

Note that it is possible to get back a valid pointer for an element which
has never been stored in the array.  Memory for array elements is allocated
one page at a time; a single allocation could provide memory for several
adjacent elements.  Flexible array elements are normally initialized to the
value FLEX_ARRAY_FREE (defined as 0x6c in <linux/poison.h>), so errors
involving that number probably result from use of unstored array entries.
Note that, if array elements are allocated with __GFP_ZERO, they will be
initialized to zero and this poisoning will not happen.

Individual elements in the array can be cleared with::

    int flex_array_clear(struct flex_array *array, unsigned int element_nr);

This function will set the given element to FLEX_ARRAY_FREE and return
zero.  If storage for the indicated element is not allocated for the array,
flex_array_clear() will return -EINVAL instead.  Note that clearing an
element does not release the storage associated with it; to reduce the
allocated size of an array, call::

    int flex_array_shrink(struct flex_array *array);

The return value will be the number of pages of memory actually freed.
This function works by scanning the array for pages containing nothing but
FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work
if the array's pages are allocated with __GFP_ZERO.

It is possible to remove all elements of an array with a call to::

    void flex_array_free_parts(struct flex_array *array);

This call frees all elements, but leaves the array itself in place.
Freeing the entire array is done with::

    void flex_array_free(struct flex_array *array);

As of this writing, there are no users of flexible arrays in the mainline
kernel.  The functions described here are also not exported to modules;
that will probably be fixed when somebody comes up with a need for it.

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...