Revision df9576def004d2cd5beedc00cb6e8901427634b9 authored by Yang Shi on 03 August 2019, 04:48:37 UTC, committed by Linus Torvalds on 03 August 2019, 14:02:00 UTC
When running ltp's oom test with kmemleak enabled, the warning below was
triggered because the kernel detected that __GFP_NOFAIL and
~__GFP_DIRECT_RECLAIM were passed in together:

  WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
  Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
  CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
  ...
   kmemleak_alloc+0x4e/0xb0
   kmem_cache_alloc+0x2a7/0x3e0
   mempool_alloc_slab+0x2d/0x40
   mempool_alloc+0x118/0x2b0
   bio_alloc_bioset+0x19d/0x350
   get_swap_bio+0x80/0x230
   __swap_writepage+0x5ff/0xb20

mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, but kmemleak has
__GFP_NOFAIL set all the time due to d9570ee3bd1d4f2 ("kmemleak:
allow to coexist with fault injection").  It doesn't make any sense
to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
time.
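
The flag conflict described above can be sketched in plain C.  This is a
minimal userspace illustration, not kernel code: the sketch_* names and the
bit values are hypothetical stand-ins (the real definitions live in
include/linux/gfp.h), and the helper only models the sequence the message
describes, where the caller drops direct reclaim and a hook then forces
NOFAIL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp bits (assumed values). */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u
#define SKETCH_GFP_NOFAIL         0x2u

/* True when a request must not fail yet is not allowed to reclaim. */
static bool sketch_gfp_conflict(sketch_gfp_t flags)
{
	return (flags & SKETCH_GFP_NOFAIL) &&
	       !(flags & SKETCH_GFP_DIRECT_RECLAIM);
}

/*
 * Models the path in the message: the caller clears direct reclaim, then
 * a metadata hook optionally ORs in NOFAIL on top of the caller's flags.
 */
static sketch_gfp_t sketch_metadata_flags(sketch_gfp_t caller_flags,
					  bool force_nofail)
{
	sketch_gfp_t flags = caller_flags & ~SKETCH_GFP_DIRECT_RECLAIM;

	if (force_nofail)
		flags |= SKETCH_GFP_NOFAIL;
	return flags;
}

/* Walks through both cases: NOFAIL forced, and NOFAIL left alone. */
static void sketch_demo(void)
{
	sketch_gfp_t caller = SKETCH_GFP_DIRECT_RECLAIM;

	/* Forcing NOFAIL after reclaim was cleared yields the conflict. */
	assert(sketch_gfp_conflict(sketch_metadata_flags(caller, true)));
	/* Without the forced NOFAIL, the combination is consistent. */
	assert(!sketch_gfp_conflict(sketch_metadata_flags(caller, false)));
}
```

Calling sketch_demo() exercises both cases; dropping the forced NOFAIL
corresponds to the effect of the revert in this sketch.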

According to the discussion on the mailing list, the commit should be
reverted as a short-term solution.  Catalin Marinas will follow up with
a better long-term solution.

The failure rate of kmemleak metadata allocation may increase in some
circumstances, but this should be an expected side effect.

Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: d9570ee3bd1d4f2 ("kmemleak: allow to coexist with fault injection")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 68d8681
Kconfig
# SPDX-License-Identifier: GPL-2.0
#
# Block layer core configuration
#
menuconfig BLOCK
	bool "Enable the block layer" if EXPERT
	default y
	select SBITMAP
	select SRCU
	help
	 Provide block layer support for the kernel.

	 Disable this option to remove the block layer support from the
	 kernel. This may be useful for embedded devices.

	 If this option is disabled:

	   - block device files will become unusable
	   - some filesystems (such as ext3) will become unavailable.

	 Also, SCSI character devices and USB storage will be disabled since
	 they make use of various block layer definitions and facilities.

	 Say Y here unless you know you really don't want to mount disks and
	 suchlike.

if BLOCK

config BLK_SCSI_REQUEST
	bool

config BLK_DEV_BSG
	bool "Block layer SG support v4"
	default y
	select BLK_SCSI_REQUEST
	help
	  Saying Y here will enable generic SG (SCSI generic) v4 support
	  for any block device.

	  Unlike SG v3 (aka block/scsi_ioctl.c and drivers/scsi/sg.c), SG v4
	  can handle complicated SCSI commands: tagged variable length cdbs
	  with bidirectional data transfers and generic request/response
	  protocols (e.g. Task Management Functions and SMP in Serial
	  Attached SCSI).

	  This option is required by recent UDEV versions to properly
	  access device serial numbers, etc.

	  If unsure, say Y.

config BLK_DEV_BSGLIB
	bool "Block layer SG support v4 helper lib"
	select BLK_DEV_BSG
	select BLK_SCSI_REQUEST
	help
	  Subsystems will normally enable this if needed. Users will not
	  normally need to manually enable this.

	  If unsure, say N.

config BLK_DEV_INTEGRITY
	bool "Block layer data integrity support"
	select CRC_T10DIF if BLK_DEV_INTEGRITY
	---help---
	Some storage devices allow extra information to be
	stored/retrieved to help protect the data.  The block layer
	data integrity option provides hooks which can be used by
	filesystems to ensure better data integrity.

	Say yes here if you have a storage device that provides the
	T10/SCSI Data Integrity Field or the T13/ATA External Path
	Protection.  If in doubt, say N.

config BLK_DEV_ZONED
	bool "Zoned block device support"
	select MQ_IOSCHED_DEADLINE
	---help---
	Block layer zoned block device support. This option enables
	support for ZAC/ZBC host-managed and host-aware zoned block devices.

	Say yes here if you have a ZAC or ZBC storage device.

config BLK_DEV_THROTTLING
	bool "Block layer bio throttling support"
	depends on BLK_CGROUP=y
	---help---
	Block layer bio throttling support. It can be used to limit
	the IO rate to a device. IO rate policies are per cgroup and
	one needs to mount and use blkio cgroup controller for creating
	cgroups and specifying per device IO rate policies.

	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.

config BLK_DEV_THROTTLING_LOW
	bool "Block throttling .low limit interface support (EXPERIMENTAL)"
	depends on BLK_DEV_THROTTLING
	---help---
	Add .low limit interface for block throttling. The low limit is a best
	effort limit to prioritize cgroups. Depending on the setting, the limit
	can be used to protect cgroups in terms of bandwidth/iops and to make
	better use of disk resources.

	Note, this is an experimental interface and could be changed someday.

config BLK_CMDLINE_PARSER
	bool "Block device command line partition parser"
	---help---
	Enabling this option allows you to specify the partition layout from
	the kernel boot args.  This is typically of use for embedded devices
	which don't otherwise have any standardized method for listing the
	partitions on a block device.

	See Documentation/block/cmdline-partition.rst for more information.

config BLK_WBT
	bool "Enable support for block device writeback throttling"
	---help---
	Enabling this option allows the block layer to throttle buffered
	background writeback from the VM, making it smoother and reducing
	its impact on foreground operations. The throttling is done
	dynamically, using an algorithm loosely based on CoDel that factors
	in the realtime performance of the disk.

config BLK_CGROUP_IOLATENCY
	bool "Enable support for latency based cgroup IO protection"
	depends on BLK_CGROUP=y
	---help---
	Enabling this option enables the .latency interface for IO throttling.
	The IO controller will attempt to maintain average IO latencies below
	the configured latency target, throttling anybody with a higher latency
	target than the victimized group.

	Note, this is an experimental interface and could be changed someday.

config BLK_WBT_MQ
	bool "Multiqueue writeback throttling"
	default y
	depends on BLK_WBT
	---help---
	Enable writeback throttling by default on multiqueue devices.
	Multiqueue currently doesn't have support for IO scheduling, so
	enabling this option is recommended.

config BLK_DEBUG_FS
	bool "Block layer debugging information in debugfs"
	default y
	depends on DEBUG_FS
	---help---
	Include block layer debugging information in debugfs. This information
	is mostly useful for kernel developers, but it doesn't incur any cost
	at runtime.

	Unless you are building a kernel for a tiny system, you should
	say Y here.

config BLK_DEBUG_FS_ZONED
	bool
	default BLK_DEBUG_FS && BLK_DEV_ZONED

config BLK_SED_OPAL
	bool "Logic for interfacing with Opal enabled SEDs"
	---help---
	Builds logic for interfacing with Opal-enabled controllers.
	Enabling this option lets users set up, lock, and unlock
	locking ranges on SED devices using the Opal protocol.

menu "Partition Types"

source "block/partitions/Kconfig"

endmenu

endif # BLOCK

config BLOCK_COMPAT
	bool
	depends on BLOCK && COMPAT
	default y

config BLK_MQ_PCI
	bool
	depends on BLOCK && PCI
	default y

config BLK_MQ_VIRTIO
	bool
	depends on BLOCK && VIRTIO
	default y

config BLK_MQ_RDMA
	bool
	depends on BLOCK && INFINIBAND
	default y

config BLK_PM
	def_bool BLOCK && PM

source "block/Kconfig.iosched"