Revision ec3937107ab43f3e8b2bc9dad95710043c462ff7 authored by Baoquan He on 04 April 2019, 02:03:13 UTC, committed by Borislav Petkov on 18 April 2019, 08:42:58 UTC
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is the smaller of the following two values:

  (actual system RAM size + padding size) vs (max system RAM size supported)

This calculation has been wrong since commit

  b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").

In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.

As long as the RAM size plus the padding stays below 4 PB, the first
value is always the smaller one, so the size of the direct mapping
section is always the actual system RAM size plus the padding size.

Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously, KASLR will be weakened significantly.

   |____|_______actual RAM_______|_padding_|______the rest_______|
   0            64TB                                            ~120TB

Instead, it should be like this:

   |____|_______actual RAM_______|_________the rest______________|
   0            64TB                                            ~120TB

The size of the padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.

The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.

Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
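
The effect of the replacement can be illustrated with a small user-space
sketch. It is not the kernel code: direct_map_size_tb() and TB_SHIFT as
used here are hypothetical names chosen for illustration, and sizes are
kept in whole TB for simplicity. The helper takes the smaller of
(RAM + padding) and the maximum RAM implied by a given number of
physical address bits.

```c
#include <assert.h>
#include <stdint.h>

#define TB_SHIFT 40 /* 1 TB == 2^40 bytes */

/*
 * Hypothetical helper (illustration only): size in TB of the direct
 * mapping section, i.e. min(RAM + padding, max RAM supported).
 * phys_bits stands in for __PHYSICAL_MASK_SHIFT (always 52) or for
 * MAX_PHYSMEM_BITS (46 with 4-level paging).
 */
static uint64_t direct_map_size_tb(uint64_t ram_tb, uint64_t padding_tb,
				   unsigned int phys_bits)
{
	uint64_t max_tb, want_tb;

	/* Keep the shift below well defined. */
	assert(phys_bits >= TB_SHIFT && phys_bits <= 64);

	max_tb = 1ULL << (phys_bits - TB_SHIFT);
	want_tb = ram_tb + padding_tb;

	return want_tb < max_tb ? want_tb : max_tb;
}
```

With 64 TB of RAM and 10 TB of padding, a shift of 52 gives a cap of
4096 TB, so the result is 74 TB. A shift of 46 gives a cap of 64 TB, so
the result is 64 TB and the padding no longer eats into the space
available for randomization. With zero padding both shifts give 64 TB,
which matches the statement that the issue only exists with non-zero
padding.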

 [ bp: Massage commit message. ]

Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
1 parent a943245
Kconfig.debug
config PAGE_EXTENSION
	bool "Extend memmap on extra space for more information on page"
	---help---
	  Extend memmap on extra space for more information on page. This
	  could be used for debugging features that need to insert extra
	  field for every page. This extension enables us to save memory
	  by not allocating this extra memory according to boottime
	  configuration.

config DEBUG_PAGEALLOC
	bool "Debug page memory allocations"
	depends on DEBUG_KERNEL
	depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
	select PAGE_EXTENSION
	select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
	---help---
	  Unmap pages from the kernel linear mapping after free_pages().
	  Depending on runtime enablement, this results in a small or large
	  slowdown, but helps to find certain types of memory corruption.

	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
	  fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages().  Additionally,
	  this option cannot be enabled in combination with hibernation as
	  that would result in incorrect warnings of memory corruption after
	  a resume because free pages are not saved to the suspend image.

	  By default this option will have a small overhead, e.g. by not
	  allowing the kernel mapping to be backed by large pages on some
	  architectures. Even bigger overhead comes when the debugging is
	  enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
	  command line parameter.

config DEBUG_PAGEALLOC_ENABLE_DEFAULT
	bool "Enable debug page memory allocations by default?"
	default n
	depends on DEBUG_PAGEALLOC
	---help---
	  Enable debug page memory allocations by default? This value
	  can be overridden by debug_pagealloc=off|on.

config PAGE_OWNER
	bool "Track page owner"
	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
	select DEBUG_FS
	select STACKTRACE
	select STACKDEPOT
	select PAGE_EXTENSION
	help
	  This keeps track of which call chain is the owner of a page, which
	  may help to find bare alloc_page(s) leaks. Even if you include this
	  feature in your build, it is disabled by default. You should pass
	  "page_owner=on" as a boot parameter in order to enable it. Eats
	  a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
	  for a user-space helper.

	  If unsure, say N.

config PAGE_POISONING
	bool "Poison pages after freeing"
	select PAGE_POISONING_NO_SANITY if HIBERNATION
	---help---
	  Fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). The filling of the memory helps
	  reduce the risk of information leaks from freed data. This does
	  have a potential performance impact if enabled with the
	  "page_poison=1" kernel boot option.

	  Note that "poison" here is not the same thing as the "HWPoison"
	  for CONFIG_MEMORY_FAILURE. This is software poisoning only.

	  If unsure, say N.

config PAGE_POISONING_NO_SANITY
	depends on PAGE_POISONING
	bool "Only poison, don't sanity check"
	---help---
	   Skip the sanity checking on alloc, only fill the pages with
	   poison on free. This reduces some of the overhead of the
	   poisoning feature.

	   If you are only interested in sanitization, say Y. Otherwise
	   say N.

config PAGE_POISONING_ZERO
	bool "Use zero for poisoning instead of debugging value"
	depends on PAGE_POISONING
	---help---
	   Instead of using the existing poison value, fill the pages with
	   zeros. This makes it harder to detect when errors are occurring
	   due to sanitization but the zeroing at free means that it is
	   no longer necessary to write zeros when GFP_ZERO is used on
	   allocation.

	   If unsure, say N.

config DEBUG_PAGE_REF
	bool "Enable tracepoint to track down page reference manipulation"
	depends on DEBUG_KERNEL
	depends on TRACEPOINTS
	---help---
	  This feature adds tracepoints for tracking down page reference
	  manipulation. This tracking is useful to diagnose functional failure
	  due to migration failures caused by page reference mismatches.  Be
	  careful when enabling this feature because it adds about 30 KB to the
	  kernel code.  However the runtime performance overhead is virtually
	  nil until the tracepoints are actually enabled.

config DEBUG_RODATA_TEST
	bool "Testcase for the marking rodata read-only"
	depends on STRICT_KERNEL_RWX
	---help---
	  This option enables a testcase for setting rodata read-only.