https://github.com/torvalds/linux

sort by:
Revision Author Date Message Commit Date
1e28eed Linux 5.12-rc3 14 March 2021, 21:41:02 UTC
c995f12 prctl: fix PR_SET_MM_AUXV kernel stack leak Doing a prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1); will copy 1 byte from userspace to (quite big) on-stack array and then stash everything to mm->saved_auxv. AT_NULL terminator will be inserted at the very end. /proc/*/auxv handler will find that AT_NULL terminator and copy original stack contents to userspace. This devious scheme requires CAP_SYS_RESOURCE. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 14 March 2021, 21:33:27 UTC
70404fe Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of irqchip updates: - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct - Add a missing DT compatible string for the Ingenic driver - Remove the pointless debugfs_file pointer from struct irqdomain" * tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/ingenic: Add support for the JZ4760 dt-bindings/irq: Add compatible string for the JZ4760B irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly irqdomain: Remove debugfs_file from struct irq_domain 14 March 2021, 20:33:33 UTC
802b31c Merge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix in for hrtimers to prevent an interrupt storm caused by the lack of reevaluation of the timers which expire in softirq context under certain circumstances, e.g. when the clock was set" * tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event() 14 March 2021, 20:29:38 UTC
c72cbc9 Merge tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "A set of scheduler updates: - Prevent a NULL pointer dereference in the migration_stop_cpu() mechanims - Prevent self concurrency of affine_move_task() - Small fixes and cleanups related to task migration/affinity setting - Ensure that sync_runqueues_membarrier_state() is invoked on the current CPU when it is in the cpu mask" * tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/membarrier: fix missing local execution of ipi_sync_rq_state() sched: Simplify set_affinity_pending refcounts sched: Fix affine_move_task() self-concurrency sched: Optimize migration_cpu_stop() sched: Collate affine_move_task() stoppers sched: Simplify migration_cpu_stop() sched: Fix migration_cpu_stop() requeueing 14 March 2021, 20:27:06 UTC
19469d2 Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Thomas Gleixner: "A single objtool fix to handle the PUSHF/POPF validation correctly for the paravirt changes which modified arch_local_irq_restore not to use popf" * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool,x86: Fix uaccess PUSHF/POPF validation 14 March 2021, 20:15:55 UTC
fa509ff Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A couple of locking fixes: - A fix for the static_call mechanism so it handles unaligned addresses correctly. - Make u64_stats_init() a macro so every instance gets a seperate lockdep key. - Make seqcount_latch_init() a macro as well to preserve the static variable which is used for the lockdep key" * tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: seqlock,lockdep: Fix seqcount_latch_init() u64_stats,lockdep: Fix u64_stats_init() vs lockdep static_call: Fix the module key fixup 14 March 2021, 20:03:21 UTC
75013c6 Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure PMU internal buffers are flushed for per-CPU events too and properly handle PID/TID for large PEBS. - Handle the case properly when there's no PMU and therefore return an empty list of perf MSRs for VMX to switch instead of reading random garbage from the stack. * tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR perf/core: Flush PMU internal buffers for per-CPU events 14 March 2021, 19:57:17 UTC
836d7f0 Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Ard Biesheuvel via Borislav Petkov: "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was added v5.10, but failed to take the SetVirtualAddressMap() RT service into account" * tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table 14 March 2021, 19:54:56 UTC
0a7c10d Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A couple of SEV-ES fixes and robustifications: verify usermode stack pointer in NMI is not coming from the syscall gap, correctly track IRQ states in the #VC handler and access user insn bytes atomically in same handler as latter cannot sleep. - Balance 32-bit fast syscall exit path to do the proper work on exit and thus not confuse audit and ptrace frameworks. - Two fixes for the ORC unwinder going "off the rails" into KASAN redzones and when ORC data is missing. * tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __copy_from_user_inatomic() x86/sev-es: Correctly track IRQ states in runtime #VC handler x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack x86/sev-es: Introduce ip_within_syscall_gap() helper x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls x86/unwind/orc: Silence warnings caused by missing ORC data x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2 14 March 2021, 19:48:10 UTC
c3c7579 Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.12: - Fix wrong instruction encoding for lis in ppc_function_entry(), which could potentially lead to missed kprobes. - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of non-volatile GPRs immediately after exec. - Clean up a missed SRR specifier in the recent interrupt rework. - Don't treat unrecoverable_exception() as an interrupt handler, it's called from other handlers so shouldn't do the interrupt entry/exit accounting itself. - Fix build errors caused by missing declarations for [en/dis]able_kernel_vsx(). Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri Olsa, Naveen N. Rao, and Nicholas Piggin" * tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/traps: unrecoverable_exception() is not an interrupt handler powerpc: Fix missing declaration of [en/dis]able_kernel_vsx() powerpc/64s/exception: Clean up a missed SRR specifier powerpc: Fix inverted SET_FULL_REGS bitop powerpc/64s: Use symbolic macros for function entry encoding powerpc/64s: Fix instruction encoding for lis in ppc_function_entry() 14 March 2021, 19:37:43 UTC
9d0c8e7 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "More fixes for ARM and x86" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: LAPIC: Advancing the timer expiration on guest initiated write KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged kvm: x86: annotate RCU pointers KVM: arm64: Fix exclusive limit for IPA size KVM: arm64: Reject VM creation when the default IPA size is unsupported KVM: arm64: Ensure I-cache isolation between vcpus of a same VM KVM: arm64: Don't use cbz/adr with external symbols KVM: arm64: Fix range alignment when walking page tables KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key KVM: arm64: Fix nVHE hyp panic host context restore KVM: arm64: Avoid corrupting vCPU context register in guest exit KVM: arm64: nvhe: Save the SPE context early kvm: x86: use NULL instead of using plain integer as pointer KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled' KVM: x86: Ensure deadline timer has truly expired before posting its IRQ 14 March 2021, 19:35:02 UTC
50eb842 Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "28 patches. Subsystems affected by this series: mm (memblock, pagealloc, hugetlb, highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and ia64" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits) zram: fix broken page writeback zram: fix return value on writeback_store mm/memcg: set memcg when splitting page mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls mm/userfaultfd: fix memory corruption due to writeprotect kasan: fix KASAN_STACK dependency for HW_TAGS kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC mm/madvise: replace ptrace attach requirement for process_madvise include/linux/sched/mm.h: use rcu_dereference in in_vfork() kfence: fix reports if constant function prefixes exist kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations kfence: fix printk format for ptrdiff_t linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* MAINTAINERS: exclude uapi directories in API/ABI section binfmt_misc: fix possible deadlock in bm_register_write mm/highmem.c: fix zero_user_segments() with start > end hugetlb: do early cow when page pinned on src mm mm: use is_cow_mapping() across tree where proper ... 14 March 2021, 19:23:34 UTC
b470ebc Merge tag 'irqchip-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - More compatible strings for the Ingenic irqchip (introducing the JZ4760B SoC) - Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform - Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip Kconfig, now relying on the architecture to get it right - Drop the debugfs_file field from struct irq_domain, now that debugfs can track things on its own 14 March 2021, 15:34:35 UTC
88fe492 Merge tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small misc/char driver fixes to resolve some reported problems: - habanalabs driver fixes - Acrn build fixes (reported many times) - pvpanic module table export fix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc/pvpanic: Export module FDT device table misc: fastrpc: restrict user apps from sending kernel RPC messages virt: acrn: Correct type casting of argument of copy_from_user() virt: acrn: Use EPOLLIN instead of POLLIN virt: acrn: Use vfs_poll() instead of f_op->poll() virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP habanalabs: fix debugfs address translation habanalabs: Disable file operations after device is removed habanalabs: Call put_pid() when releasing control device drivers: habanalabs: remove unused dentry pointer for debugfs files habanalabs: mark hl_eq_inc_ptr() as static 13 March 2021, 20:38:44 UTC
be61af3 Merge tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small staging driver fixes for reported problems. They include: - wfx header file cleanup patch reverted as it could cause problems - comedi driver endian fixes - buffer overflow problems for staging wifi drivers - build dependency issue for rtl8192e driver All have been in linux-next for a while with no reported problems" * tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits) Revert "staging: wfx: remove unused included header files" staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan() staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data() staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan() staging: comedi: pcl726: Use 16-bit 0 for interrupt data staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data staging: comedi: ni_6527: Use 16-bit 0 for interrupt data staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data staging: comedi: pcl818: Fix endian problem for AI command data staging: comedi: pcl711: Fix endian problem for AI command data staging: comedi: me4000: Fix endian problem for AI command data staging: comedi: dmm32at: Fix endian problem for AI command data staging: comedi: das800: Fix endian problem for AI command data staging: comedi: das6402: Fix endian problem for AI command data staging: comedi: adv_pci1710: Fix endian problem for AI command data staging: comedi: addi_apci_1500: Fix endian problem for command sample staging: comedi: addi_apci_1032: Fix endian problem for COS sample staging: ks7010: prevent buffer overflow in ks_wlan_set_scan() staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd ... 13 March 2021, 20:36:53 UTC
cc14086 Merge tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small tty and serial driver fixes to resolve some reported problems: - led tty trigger fixes based on review and were acked by the led maintainer - revert a max310x serial driver patch as it was causing problems - revert a pty change as it was also causing problems All of these have been in linux-next for a while with no reported problems" * tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "drivers:tty:pty: Fix a race causing data loss on close" Revert "serial: max310x: rework RX interrupt handling" leds: trigger/tty: Use led_set_brightness_sync() from workqueue leds: trigger: Fix error path to not unlock the unlocked mutex 13 March 2021, 20:34:29 UTC
5c7bdbf Merge tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch of reported issues: - usbip fixups for issues found by syzbot - xhci driver fixes and quirk additions - gadget driver fixes - dwc3 QCOM driver fix - usb-serial new ids and fixes - usblp fix for a long-time issue - cdc-acm quirk addition - other tiny fixes for reported problems All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits) xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing xhci: Improve detection of device initiated wake signal. usb: xhci: do not perform Soft Retry for some xHCI hosts usbip: fix vudc usbip_sockfd_store races leading to gpf usbip: fix vhci_hcd attach_store() races leading to gpf usbip: fix stub_dev usbip_sockfd_store() races leading to gpf usbip: fix vudc to check for stream socket usbip: fix vhci_hcd to check for stream socket usbip: fix stub_dev to check for stream socket usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement USB: usblp: fix a hang in poll() if disconnected USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe() usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM usb: dwc3: qcom: Honor wakeup enabled/disabled state usb: gadget: f_uac1: stop playback on function disable usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot USB: gadget: u_ether: Fix a configfs return code usb: dwc3: qcom: add ACPI device id for sc8180x Goodix Fingerprint device is not a modem ... 13 March 2021, 20:32:57 UTC
4206234 Merge tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "Fix an urgent regression introduced by commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number"), which could cause unexpected hung since linux 5.12-rc1. Resolve it by avoiding using bio->bi_max_vecs completely" * tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix bio->bi_max_vecs behavior change 13 March 2021, 20:26:22 UTC
e83bad7 Merge tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - avoid 'make image_name' invoking syncconfig - fix a couple of bugs in scripts/dummy-tools - fix LLD_VENDOR and locale issues in scripts/ld-version.sh - rebuild GCC plugins when the compiler is upgraded - allow LTO to be enabled with KASAN_HW_TAGS - allow LTO to be enabled without LLVM=1 * tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: fix ld-version.sh to not be affected by locale kbuild: remove meaningless parameter to $(call if_changed_rule,dtc) kbuild: remove LLVM=1 test from HAS_LTO_CLANG kbuild: remove unneeded -O option to dtc kbuild: dummy-tools: adjust to scripts/cc-version.sh kbuild: Allow LTO to be selected with KASAN_HW_TAGS kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc kbuild: rebuild GCC plugins when the compiler is upgraded kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR kbuild: dummy-tools: fix inverted tests for gcc kbuild: add image_name to no-sync-config-targets 13 March 2021, 20:18:59 UTC
2766f18 zram: fix broken page writeback commit 0d8359620d9b ("zram: support page writeback") introduced two problems. It overwrites writeback_store's return value as kstrtol's return value, which makes return value zero so user could see zero as return value of write syscall even though it wrote data successfully. It also breaks index value in the loop in that it doesn't increase the index any longer. It means it can write only first starting block index so user couldn't write all idle pages in the zram so lose memory saving chance. This patch fixes those issues. Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org Fixes: 0d8359620d9b("zram: support page writeback") Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Amos Bianchi <amosbianchi@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
57e0076 zram: fix return value on writeback_store writeback_store's return value is overwritten by submit_bio_wait's return value. Thus, writeback_store will return zero since there was no IO error. In the end, write syscall from userspace will see the zero as return value, which could make the process stall to keep trying the write until it will succeed. Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
e1baddf mm/memcg: set memcg when splitting page As described in the split_page() comment, for the non-compound high order page, the sub-pages must be freed individually. If the memcg of the first page is valid, the tail pages cannot be uncharged when be freed. For example, when alloc_pages_exact is used to allocate 1MB continuous physical memory, 2MB is charged(kmemcg is enabled and __GFP_ACCOUNT is set). When make_alloc_exact free the unused 1MB and free_pages_exact free the applied 1MB, actually, only 4KB(one page) is uncharged. Therefore, the memcg of the tail page needs to be set when splitting a page. Michel: There are at least two explicit users of __GFP_ACCOUNT with alloc_exact_pages added recently. See 7efe8ef274024 ("KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713 ("KVM: s390: Add memcg accounting to KVM allocations"), so this is not just a theoretical issue. Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Weilong Chen <chenweilong@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
be6c898 mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass in page number argument. In this way, the interface name is more common and can be used by potential users. In addition, the complete info(memcg and flag) of the memcg needs to be set to the tail pages. Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Weilong Chen <chenweilong@huawei.com> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
61bf318 ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign In https://bugs.gentoo.org/769614 Dmitry noticed that `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly. The bug is in mismatch between get/set errors: static inline long syscall_get_error(struct task_struct *task, struct pt_regs *regs) { return regs->r10 == -1 ? regs->r8:0; } static inline long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs) { return regs->r8; } static inline void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs, int error, long val) { if (error) { /* error < 0, but ia64 uses > 0 return value */ regs->r8 = -error; regs->r10 = -1; } else { regs->r8 = val; regs->r10 = 0; } } Tested on v5.10 on rx3600 machine (ia64 9040 CPU). Link: https://lkml.kernel.org/r/20210221002554.333076-2-slyfox@gentoo.org Link: https://bugs.gentoo.org/769614 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reported-by: Dmitry V. Levin <ldv@altlinux.org> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
0ceb1ac ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls In https://bugs.gentoo.org/769614 Dmitry noticed that `ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via glibc's syscall() wrapper. ia64 has two ways to call syscalls from userspace: via `break` and via `eps` instructions. The difference is in stack layout: 1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8} 2. `break` uses userspace stack frame: may be locals (glibc provides one), in{0..7} == out{0..8}. Both work fine in syscall handling cde itself. But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses unwind mechanism to re-extract syscall arguments but it does not account for locals. The change always skips locals registers. It should not change `eps` path as kernel's handler already enforces locals=0 and fixes `break`. Tested on v5.10 on rx3600 machine (ia64 9040 CPU). Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org Link: https://bugs.gentoo.org/769614 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reported-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
6ce6442 mm/userfaultfd: fix memory corruption due to writeprotect Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0 cpu1 cpu2 ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0 cpu1 cpu2 cpu3 ---- ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ... [ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit <namit@vmware.com> Suggested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
d9b571c kasan: fix KASAN_STACK dependency for HW_TAGS There's a runtime failure when running HW_TAGS-enabled kernel built with GCC on hardware that doesn't support MTE. GCC-built kernels always have CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't supported by HW_TAGS. Having that config enabled causes KASAN to issue MTE-only instructions to unpoison kernel stacks, which causes the failure. Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used. (The commit that introduced CONFIG_KASAN_HW_TAGS specified proper dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.) Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.1615215441.git.andreyknvl@google.com Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
f9d79e8 kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called after debug_pagealloc_unmap_pages(). This causes a crash when debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an unmapped page. This patch puts kasan_free_nondeferred_pages() before debug_pagealloc_unmap_pages() and arch_free_page(), which can also make the page unavailable. Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:31 UTC
96cfe2c mm/madvise: replace ptrace attach requirement for process_madvise process_madvise currently requires ptrace attach capability. PTRACE_MODE_ATTACH gives one process complete control over another process. It effectively removes the security boundary between the two processes (in one direction). Granting ptrace attach capability even to a system process is considered dangerous since it creates an attack surface. This severely limits the usage of this API. The operations process_madvise can perform do not affect the correctness of the operation of the target process; they only affect where the data is physically located (and therefore, how fast it can be accessed). What we want is the ability for one process to influence another process in order to optimize performance across the entire system while leaving the security boundary intact. Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process performance. Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
149fc78 include/linux/sched/mm.h: use rcu_dereference in in_vfork() Fix a sparse warning by using rcu_dereference(). Technically this is a bug and a sufficiently aggressive compiler could reload the `real_parent' pointer outside the protection of the rcu lock (and access freed memory), but I think it's pretty unlikely to happen. Link: https://lkml.kernel.org/r/20210221194207.1351703-1-willy@infradead.org Fixes: b18dc5f291c0 ("mm, oom: skip vforked tasks from being selected") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
0aa41ca kfence: fix reports if constant function prefixes exist Some architectures prefix all functions with a constant string ('.' on ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in <asm/kfence.h>, so that get_stack_skipnr() can work properly. Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f058@csgroup.eu Link: https://lkml.kernel.org/r/20210304144000.1148590-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
df3ae2c kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations cache_alloc_debugcheck_after() performs checks on an object, including adjusting the returned pointer. None of this should apply to KFENCE objects. While for non-bulk allocations, the checks are skipped when we allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after() is called via cache_alloc_debugcheck_after_bulk(). Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects. Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
702b16d kfence: fix printk format for ptrdiff_t Use %td for ptrdiff_t. Link: https://lkml.kernel.org/r/3abbe4c9-16ad-c168-a90f-087978ccd8f7@csgroup.eu Link: https://lkml.kernel.org/r/20210303121157.3430807-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Jann Horn <jannh@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
97e4910 linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* Separating compiler-clang.h from compiler-gcc.h inadventently dropped the definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling back to the open-coded version and hoping that the compiler detects it. Since all versions of clang support the __builtin_bswap interfaces, add back the flags and have the headers pick these up automatically. This results in a 4% improvement of compilation speed for arm defconfig. Note: it might also be worth revisiting which architectures set CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other, today this is set on six architectures (arm32, csky, mips, powerpc, s390, x86), while another ten architectures define custom helpers (alpha, arc, ia64, m68k, mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64, h8300, hexagon, microblaze, nds32, openrisc, riscv) just get the unoptimized version and rely on the compiler to detect it. A long time ago, the compiler builtins were architecture specific, but nowadays, all compilers that are able to build the kernel have correct implementations of them, though some may not be as optimized as the inline asm versions. The patch that dropped the optimization landed in v4.19, so as discussed it would be fairly safe to backport this revert to stable kernels to the 4.19/5.4/5.10 stable kernels, but there is a remaining risk for regressions, and it has no known side-effects besides compile speed. Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/ Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Marco Elver <elver@google.com> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
f0b15b6 MAINTAINERS: exclude uapi directories in API/ABI section Commit 7b4693e644cb ("MAINTAINERS: add uapi directories to API/ABI section") added include/uapi/ and arch/*/include/uapi/ so that patches modifying them CC linux-api. However that was already done in the past and resulted in too much noise and thus later removed, as explained in b14fd334ff3d ("MAINTAINERS: trim the file triggers for ABI/API") To prevent another round of addition and removal in the future, change the entries to X: (explicit exclusion) for documentation purposes, although they are not subdirectories of broader included directories, as there is apparently no defined way to add plain comments in subsystem sections. Link: https://lkml.kernel.org/r/20210301100255.25229-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
e7850f4 binfmt_misc: fix possible deadlock in bm_register_write There is a deadlock in bm_register_write: First, in the begining of the function, a lock is taken on the binfmt_misc root inode with inode_lock(d_inode(root)). Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call open_exec on the user-provided interpreter. open_exec will call a path lookup, and if the path lookup process includes the root of binfmt_misc, it will try to take a shared lock on its inode again, but it is already locked, and the code will get stuck in a deadlock To reproduce the bug: $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register backtrace of where the lock occurs (#5): 0 schedule () at ./arch/x86/include/asm/current.h:15 1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992 2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213 3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222 4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355 5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783 6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177 7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366 8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396 9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913 10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948 11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682 12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603 13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658 14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46 15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120 To solve the issue, the open_exec call is moved to before the write lock is taken by bm_register_write Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers") Signed-off-by: Lior Ribak <liorribak@gmail.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
184cee5 mm/highmem.c: fix zero_user_segments() with start > end zero_user_segments() is used from __block_write_begin_int(), for example like the following zero_user_segments(page, 4096, 1024, 512, 918) But new the zero_user_segments() implementation for for HIGHMEM + TRANSPARENT_HUGEPAGE doesn't handle "start > end" case correctly, and hits BUG_ON(). (we can fix __block_write_begin_int() instead though, it is the old and multiple usage) Also it calls kmap_atomic() unnecessarily while start == end == 0. Link: https://lkml.kernel.org/r/87v9ab60r4.fsf@mail.parknet.co.jp Fixes: 0060ef3b4e6d ("mm: support THPs in zero_user_segments") Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
4eae4ef hugetlb: do early cow when page pinned on src mm This is the last missing piece of the COW-during-fork effort when there're pinned pages found. One can reference 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes", 2020-09-27) for more information, since we do similar things here rather than pte this time, but just for hugetlb. Note that after Jason's recent work on 57efa1fe5957 ("mm/gup: prevent gup_fast from racing with COW during fork", 2020-12-15) which is safer and easier to understand, we're safe now within the whole copy_page_range() against gup-fast, we don't need the wr-protect trick that proposed in 70e806e4e645 anymore. Link: https://lkml.kernel.org/r/20210217233547.93892-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gal Pressman <galpress@amazon.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Roland Scheidegger <sroland@vmware.com> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Cc: Wei Zhang <wzam@amazon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
ca6eb14 mm: use is_cow_mapping() across tree where proper After is_cow_mapping() is exported in mm.h, replace some manual checks elsewhere throughout the tree but start to use the new helper. Link: https://lkml.kernel.org/r/20210217233547.93892-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Cc: Roland Scheidegger <sroland@vmware.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gal Pressman <galpress@amazon.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Wei Zhang <wzam@amazon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
97a7e47 mm: introduce page_needs_cow_for_dma() for deciding whether cow We've got quite a few places (pte, pmd, pud) that explicitly checked against whether we should break the cow right now during fork(). It's easier to provide a helper, especially before we work the same thing on hugetlbfs. Since we'll reference is_cow_mapping() in mm.h, move it there too. Actually it suites mm.h more since internal.h is mm/ only, but mm.h is exported to the whole kernel. With that we should expect another patch to use is_cow_mapping() whenever we can across the kernel since we do use it quite a lot but it's always done with raw code against VM_* flags. Link: https://lkml.kernel.org/r/20210217233547.93892-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gal Pressman <galpress@amazon.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Roland Scheidegger <sroland@vmware.com> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Cc: Wei Zhang <wzam@amazon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
ca7e045 hugetlb: break earlier in add_reservation_in_range() when we can All the regions maintained in hugetlb reserved map is inclusive on "from" but exclusive on "to". We can break earlier even if rg->from==t because it already means no possible intersection. This does not need a Fixes in all cases because when it happens (rg->from==t) we'll not break out of the loop while we should, however the next thing we'd do is still add the last file_region we'd need and quit the loop in the next round. So this change is not a bugfix (since the old code should still run okay iiuc), but we'd better still touch it up to make it logically sane. Link: https://lkml.kernel.org/r/20210217233547.93892-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gal Pressman <galpress@amazon.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Roland Scheidegger <sroland@vmware.com> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Cc: Wei Zhang <wzam@amazon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
2103cf9 hugetlb: dedup the code to add a new file_region Patch series "mm/hugetlb: Early cow on fork, and a few cleanups", v5. As reported by Gal [1], we still miss the code clip to handle early cow for hugetlb case, which is true. Again, it still feels odd to fork() after using a few huge pages, especially if they're privately mapped to me.. However I do agree with Gal and Jason in that we should still have that since that'll complete the early cow on fork effort at least, and it'll still fix issues where buffers are not well under control and not easy to apply MADV_DONTFORK. The first two patches (1-2) are some cleanups I noticed when reading into the hugetlb reserve map code. I think it's good to have but they're not necessary for fixing the fork issue. The last two patches (3-4) are the real fix. I tested this with a fork() after some vfio-pci assignment, so I'm pretty sure the page copy path could trigger well (page will be accounted right after the fork()), but I didn't do data check since the card I assigned is some random nic. https://github.com/xzpeter/linux/tree/fork-cow-pin-huge [1] https://lore.kernel.org/lkml/27564187-4a08-f187-5a84-3df50009f6ca@amazon.com/ Introduce hugetlb_resv_map_add() helper to add a new file_region rather than duplication the similar code twice in add_reservation_in_range(). Link: https://lkml.kernel.org/r/20210217233547.93892-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210217233547.93892-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Gal Pressman <galpress@amazon.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Wei Zhang <wzam@amazon.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jann Horn <jannh@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Roland Scheidegger <sroland@vmware.com> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
82e69a1 mm/fork: clear PASID for new mm When a new mm is created, its PASID should be cleared, i.e. the PASID is initialized to its init state 0 on both ARM and X86. This patch was part of the series introducing mm->pasid, but got lost along the way [1]. It still makes sense to have it, because each address space has a different PASID. And the IOMMU code in iommu_sva_alloc_pasid() expects the pasid field of a new mm struct to be cleared. [1] https://lore.kernel.org/linux-iommu/YDgh53AcQHT+T3L0@otcwcpicx3.sc.intel.com/ Link: https://lkml.kernel.org/r/20210302103837.2562625-1-jean-philippe@linaro.org Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
0740a50 mm/page_alloc.c: refactor initialization of struct page for holes in memory layout There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields of this page are set to default values and it is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") the holes inside a zone were re-initialized during memmap_init() and got their zone/node links right. However, after that commit nothing updates the struct pages representing such holes. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); in set_pfnblock_flags_mask() when called with a struct page from a range other than E820_TYPE_RAM because there are pages in the range of ZONE_DMA32 but the unset zone link in struct page makes them appear as a part of ZONE_DMA. Interleave initialization of the unavailable pages with the normal initialization of memory map, so that zone and node information will be properly set on struct pages that are not backed by the actual memory. With this change the pages for holes inside a zone will get proper zone/node links and the pages that are not spanned by any node will get links to the adjacent zone/node. The holes between nodes will be prepended to the zone/node above the hole and the trailing pages in the last section that will be appended to the zone/node below. [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64] Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Qian Cai <cai@lca.pw> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Łukasz Majczak <lma@semihalf.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
ea29b20 init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM I read the commit log of the following two: - bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML") - 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390") Both are talking about HAS_IOMEM dependency missing in many drivers. So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me. This does not change the behavior of UML. UML still cannot enable COMPILE_TEST because it does not provide HAS_IOMEM. The current dependency for S390 is too strong. Under the condition of CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST. I also removed the meaningless 'default n'. Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KP Singh <kpsingh@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Perret <qperret@google.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
cbf78d8 stop_machine: mark helpers __always_inline With clang-13, some functions only get partially inlined, with a specialized version referring to a global variable. This triggers a harmless build-time check for the intel-rng driver: WARNING: modpost: drivers/char/hw_random/intel-rng.o(.text+0xe): Section mismatch in reference from the function stop_machine() to the function .init.text:intel_rng_hw_init() The function stop_machine() references the function __init intel_rng_hw_init(). This is often because stop_machine lacks a __init annotation or the annotation of intel_rng_hw_init is wrong. In this instance, an easy workaround is to force the stop_machine() function to be inline, along with related interfaces that did not show the same behavior at the moment, but theoretically could. The combination of the two patches listed below triggers the behavior in clang-13, but individually these commits are correct. Link: https://lkml.kernel.org/r/20210225130153.1956990-1-arnd@kernel.org Fixes: fe5595c07400 ("stop_machine: Provide stop_machine_cpuslocked()") Fixes: ee527cd3a20c ("Use stop_machine_run in the Intel RNG driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:30 UTC
34dc2ef memblock: fix section mismatch warning The inlining logic in clang-13 is rewritten to often not inline some functions that were inlined by all earlier compilers. In case of the memblock interfaces, this exposed a harmless bug of a missing __init annotation: WARNING: modpost: vmlinux.o(.text+0x507c0a): Section mismatch in reference from the function memblock_bottom_up() to the variable .meminit.data:memblock The function memblock_bottom_up() references the variable __meminitdata memblock. This is often because memblock_bottom_up lacks a __meminitdata annotation or the annotation of memblock is wrong. Interestingly, these annotations were present originally, but got removed with the explanation that the __init annotation prevents the function from getting inlined. I checked this again and found that while this is the case with clang, gcc (version 7 through 10, did not test others) does inline the functions regardless. As the previous change was apparently intended to help the clang builds, reverting it to help the newer clang versions seems appropriate as well. gcc builds don't seem to care either way. Link: https://lkml.kernel.org/r/20210225133808.2188581-1-arnd@kernel.org Fixes: 5bdba520c1b3 ("mm: memblock: drop __init from memblock functions to make it inline") Reference: 2cfb3665e864 ("include/linux/memblock.h: add __init to memblock_set_bottom_up()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Aslan Bakirov <aslan@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 13 March 2021, 19:27:29 UTC
bcbcf50 kbuild: fix ld-version.sh to not be affected by locale ld-version.sh checks the output from $(LD) --version, but it has a problem on some locales. For example, in Italian: $ LC_MESSAGES=it_IT.UTF-8 ld --version | head -n 1 ld di GNU (GNU Binutils for Debian) 2.35.2 This makes ld-version.sh fail because it expects "GNU ld" for the BFD linker case. Add LC_ALL=C to override the user's locale. BTW, setting LC_MESSAGES=C (or LANG=C) is not enough because it is ineffective if LC_ALL is set on the user's environment. Link: https://bugzilla.kernel.org/show_bug.cgi?id=212105 Reported-by: Marco Scardovi Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Recensito-da: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> 13 March 2021, 02:12:13 UTC
f296bfd Merge tag 'nfs-for-5.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client bugfixes from Anna Schumaker: "These are mostly fixes for issues discovered at the recent NFS bakeathon: - Fix PNFS_FLEXFILE_LAYOUT kconfig so it is possible to build into the kernel - Correct size calculationn for create reply length - Set memalloc_nofs_save() for sync tasks to prevent deadlocks - Don't revalidate directory permissions on lookup failure - Don't clear inode cache when lookup fails - Change functions to use nfs_set_cache_invalid() for proper delegation handling - Fix return value of _nfs4_get_security_label() - Return an error when attempting to remove system.nfs4_acl" * tag 'nfs-for-5.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs: we don't support removing system.nfs4_acl NFSv4.2: fix return value of _nfs4_get_security_label() NFS: Fix open coded versions of nfs_set_cache_invalid() in NFSv4 NFS: Fix open coded versions of nfs_set_cache_invalid() NFS: Clean up function nfs_mark_dir_for_revalidate() NFS: Don't gratuitously clear the inode cache when lookup failed NFS: Don't revalidate the directory permissions on a lookup failure SUNRPC: Set memalloc_nofs_save() for sync tasks NFS: Correct size calculation for create reply length nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default 12 March 2021, 22:19:35 UTC
b6b8aa2 Merge branch 'for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "Removing the ambiguity broke userspace so this reverts the change" * 'for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities") 12 March 2021, 21:58:04 UTC
9afc116 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Ten updates: one non code maintainer update for vmw_pvscsi, five code updates for ibmvfc and four for UFS. All are either trivial patches or bug fixes" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pvscsi: MAINTAINERS: Update maintainer scsi: ufs: Convert sysfs sprintf/snprintf family to sysfs_emit scsi: ufs: Remove redundant checks of !hba in suspend/resume callbacks scsi: ufs: ufs-qcom: Disable interrupt in reset path scsi: ufs: Minor adjustments to error handling scsi: ibmvfc: Reinitialize sub-CRQs and perform channel enquiry after LPM scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup scsi: ibmvfc: Treat H_CLOSED as success during sub-CRQ registration scsi: ibmvfc: Fix invalid sub-CRQ handles after hard reset scsi: ibmvfc: Simplify handling of sub-CRQ initialization 12 March 2021, 21:37:18 UTC
3b0c2d3 Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities") It turns out that there are in fact userspace implementations that care and this recent change caused a regression. https://github.com/containers/buildah/issues/3071 As the motivation for the original change was future development, and the impact is existing real world code just revert this change and allow the ambiguity in v3 file caps. Cc: stable@vger.kernel.org Fixes: 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> 12 March 2021, 21:27:14 UTC
ce30708 Merge tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Mostly just random fixes all over the map. The only odd-one-out change is finally getting the rename of BIO_MAX_PAGES to BIO_MAX_VECS done. This should've been done with the multipage bvec change, but it's been left. Do it now to avoid hassles around changes piling up for the next merge window. Summary: - NVMe pull request: - one more quirk (Dmitry Monakhov) - fix max_zone_append_sectors initialization (Chaitanya Kulkarni) - nvme-fc reset/create race fix (James Smart) - fix status code on aborts/resets (Hannes Reinecke) - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni) - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong) - Follow-up NVMe error fix for NULL 'id' (Christoph) - Fixup for the bd_size_lock being IRQ safe, now that the offending driver has been dropped (Damien). - rsxx probe failure error return (Jia-Ju) - umem probe failure error return (Wei) - s390/dasd unbind fixes (Stefan) - blk-cgroup stats summing fix (Xunlei) - zone reset handling fix (Damien) - Rename BIO_MAX_PAGES to BIO_MAX_VECS (Christoph) - Suppress uevent trigger for hidden devices (Daniel) - Fix handling of discard on busy device (Jan) - Fix stale cache issue with zone reset (Shin'ichiro)" * tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-block: nvme: fix the nsid value to print in nvme_validate_or_alloc_ns block: Discard page cache of zone reset target range block: Suppress uevent for hidden device when removed block: rename BIO_MAX_PAGES to BIO_MAX_VECS nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done nvme-core: check ctrl css before setting up zns nvme-fc: fix racing controller reset and create association nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange() nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request() nvme: simplify error logic in nvme_validate_ns() nvme: set max_zone_append_sectors nvme_revalidate_zones block: rsxx: fix error return code of rsxx_pci_probe() block: Fix REQ_OP_ZONE_RESET_ALL handling umem: fix error return code in mm_pci_probe() blk-cgroup: Fix the recursive blkg rwstat s390/dasd: fix hanging IO request during DASD driver unbind s390/dasd: fix hanging DASD driver unbind block: Try to handle busy underlying device on discard 12 March 2021, 21:25:49 UTC
9278be9 Merge tag 'io_uring-5.12-2021-03-12' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Not quite as small this week as I had hoped, but at least this should be the end of it. All the little known issues have been ironed out - most of it little stuff, but cancelations being the bigger part. Only minor tweaks and/or regular fixes expected beyond this point. - Fix the creds tracking for async (io-wq and SQPOLL) - Various SQPOLL fixes related to parking, sharing, forking, IOPOLL, completions, and life times. Much simpler now. - Make IO threads unfreezable by default, on account of a bug report that had them spinning on resume. Honestly not quite sure why thawing leaves us with a perpetual signal pending (causing the spin), but for now make them unfreezable like there were in 5.11 and prior. - Move personality_idr to xarray, solving a use-after-free related to removing an entry from the iterator callback. Buffer idr needs the same treatment. - Re-org around and task vs context tracking, enabling the fixing of cancelations, and then cancelation fixes on top. - Various little bits of cleanups and hardening, and removal of now dead parts" * tag 'io_uring-5.12-2021-03-12' of git://git.kernel.dk/linux-block: (34 commits) io_uring: fix OP_ASYNC_CANCEL across tasks io_uring: cancel sqpoll via task_work io_uring: prevent racy sqd->thread checks io_uring: remove useless ->startup completion io_uring: cancel deferred requests in try_cancel io_uring: perform IOPOLL reaping if canceler is thread itself io_uring: force creation of separate context for ATTACH_WQ and non-threads io_uring: remove indirect ctx into sqo injection io_uring: fix invalid ctx->sq_thread_idle kernel: make IO threads unfreezable by default io_uring: always wait for sqd exited when stopping SQPOLL thread io_uring: remove unneeded variable 'ret' io_uring: move all io_kiocb init early in io_init_req() io-wq: fix ref leak for req in case of exit cancelations io_uring: fix complete_post races for linked req io_uring: add io_disarm_next() helper io_uring: fix io_sq_offload_create error handling io-wq: remove unused 'user' member of io_wq io_uring: Convert personality_idr to XArray io_uring: clean R_DISABLED startup mess ... 12 March 2021, 21:13:57 UTC
2614100 Merge tag 'devprop-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fixes from Rafael Wysocki: "Prevent software nodes from being registered before their parents and fix a recent mistake causing already registered software nodes to be registered again in some cases (Heikki Krogerus)" * tag 'devprop-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: Fix device_add_software_node() software node: Fix node registration 12 March 2021, 21:09:29 UTC
3077f02 Merge tag 'pm-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an operating performance point (OPP) reference counting issue and three issues in ARM cpufreq drivers. Specifics: - Add a flag to mark OPPs that are not referenced by he OPP core any more to prevent OPPs from being freed prematurely by mistake (Beata Michalska). - Add ARM Vexpress platforms to the cpufreq-dt-platdev blacklist since the actual scaling of them is handled elsewhere (Sudeep Holla). - Fix a function return value check and a possible use-after-free in the qcom-hw cpufreq driver (Shawn Guo, Wei Yongjun)" * tag 'pm-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: opp: Don't drop extra references to OPPs accidentally cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init() cpufreq: qcom-hw: fix dereferencing freed memory 'data' 12 March 2021, 20:28:03 UTC
f4f9fc2 nvme: fix the nsid value to print in nvme_validate_or_alloc_ns ns can be NULL at this point, and my move of the check from the original patch by Chaitanya broke this. Fixes: 0ec84df4953b ("nvme-core: check ctrl css before setting up zns") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 20:17:45 UTC
3441783 Merge tag 'sound-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "No surprise here, only a collection of device-specific fixes for USB-audio and HD-audio at this time" * tag 'sound-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/hdmi: Cancel pending works before suspend ALSA: hda: Avoid spurious unsol event handling during S3/S4 ALSA: hda: Flush pending unsolicited events before suspend ALSA: usb-audio: fix use after free in usb_audio_disconnect ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support ALSA: hda: Drop the BATCH workaround for AMD controllers ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5 ALSA: usb-audio: Apply the control quirk to Plantronics headsets ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar ALSA: hda: ignore invalid NHLT table ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend() ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk 12 March 2021, 20:01:26 UTC
568099a Merge tag 'mmc-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix partition switch time for eMMC MMC host: - mmci: Enforce R1B response to fix busy detection for the stm32 variants - cqhci: Fix crash when removing mmc module/card" * tag 'mmc-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: cqhci: Fix random crash when remove mmc module/card mmc: core: Fix partition switch time for eMMC mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants 12 March 2021, 19:58:33 UTC
270c055 Merge tag 'regulator-fix-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A small collection fo driver specific fixes that have arrived since the merge window" * tag 'regulator-fix-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: mt6315: Fix off-by-one for .n_voltages regulator: rt4831: Fix return value check in rt4831_regulator_probe() regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting regulator: qcom-rpmh: Use correct buck for S1C regulator regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck regulator: pca9450: Fix return value when failing to get sd-vsel GPIO regulator: mt6315: Return REGULATOR_MODE_INVALID for invalid mode 12 March 2021, 19:55:58 UTC
8d9d53d Merge tag 'configfs-for-5.12' of git://git.infradead.org/users/hch/configfs Pull configfs fix from Christoph Hellwig: - fix a use-after-free in __configfs_open_file (Daiyue Zhang) * tag 'configfs-for-5.12' of git://git.infradead.org/users/hch/configfs: configfs: fix a use-after-free in __configfs_open_file 12 March 2021, 19:48:14 UTC
b77b5fd Merge tag 'gfs2-v5.12-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes" * tag 'gfs2-v5.12-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: bypass log flush if the journal is not live gfs2: bypass signal_our_withdraw if no journal gfs2: fix use-after-free in trans_drain gfs2: make function gfs2_make_fs_ro() to void type 12 March 2021, 19:46:09 UTC
17f8fc1 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "We've got a smattering of changes all over the place which we've acrued since -rc1. To my knowledge, there aren't any pending issues at the moment, but there's still plenty of time for something else to crop up... Summary: - Fix booting a 52-bit-VA-aware kernel on Qualcomm Amberwing - Fix pfn_valid() not to reject all ZONE_DEVICE memory - Fix memory tagging setup for hotplugged memory regions - Fix KASAN tagging in page_alloc() when DEBUG_VIRTUAL is enabled - Fix accidental truncation of CPU PMU event counters - Fix error code initialisation when failing probe of DMC620 PMU - Fix return value initialisation for sve-ptrace selftest - Drop broken support for CMDLINE_EXTEND" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: perf/arm_dmc620_pmu: Fix error return code in dmc620_pmu_device_probe() arm64: mm: remove unused __cpu_uses_extended_idmap[_level()] arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds arm64: perf: Fix 64-bit event counter read truncation arm64/mm: Fix __enable_mmu() for new TGRAN range values kselftest: arm64: Fix exit code of sve-ptrace arm64: mte: Map hotplugged memory as Normal Tagged arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL arm64/mm: Reorganize pfn_valid() arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER arm64/mm: Drop redundant ARCH_WANT_HUGE_PMD_SHARE arm64: Drop support for CMDLINE_EXTEND arm64: cpufeatures: Fix handling of CONFIG_CMDLINE for idreg overrides 12 March 2021, 19:39:53 UTC
6bf8819 Merge tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fix series and a single cleanup: - a small cleanup patch to remove unneeded symbol exports - a series to cleanup Xen grant handling (avoiding allocations in some cases, and using common defines for "invalid" values) - a series to address a race issue in Xen event channel handling" * tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Xen/gntdev: don't needlessly use kvcalloc() Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF} Xen/gntdev: don't needlessly allocate k{,un}map_ops[] Xen: drop exports of {set,clear}_foreign_p2m_mapping() xen/events: avoid handling the same event on two cpus at the same time xen/events: don't unmask an event channel when an eoi is pending xen/events: reset affinity of 2-level event when tearing it down 12 March 2021, 19:34:36 UTC
35737d2 KVM: LAPIC: Advancing the timer expiration on guest initiated write Advancing the timer expiration should only be necessary on guest initiated writes. When we cancel the timer and clear .pending during state restore, clear expired_tscdeadline as well. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 12 March 2021, 18:18:52 UTC
8df9f1a KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to REMOVED_SPTE when recursively zapping SPTEs as part of shadow page removal. The concurrent write protections provided by REMOVED_SPTE are not needed, there are no backing page side effects to record, and MMIO SPTEs can be left as is since they are protected by the memslot generation, not by ensuring that the MMIO SPTE is unreachable (which is racy with respect to lockless walks regardless of zapping behavior). Skipping !PRESENT drastically reduces the number of updates needed to tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could be skipped. Avoiding the write itself is likely close to a wash, but avoiding __handle_changed_spte() is a clear-cut win as that involves saving and restoring all non-volatile GPRs (it's a subtly big function), as well as several conditional branches before bailing out. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210310003029.1250571-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 12 March 2021, 18:18:52 UTC
d7eb79c KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 88 On-line CPU(s) list: 0-63 Off-line CPU(s) list: 64-87 # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall # echo 1 > /sys/devices/system/cpu/cpu76/online -bash: echo: write error: Cannot allocate memory The per-cpu vsyscall pvclock data pointer assigns either an element of the static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall and sometimes you end up with vsyscall disabled if the host does something strange. This patch fixes it by allocating this dynamically memory unconditionally even if vsyscall is disabled. Fixes: 6a1cac56f4 ("x86/kvm: Use __bss_decrypted attribute in shared variables") Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org#v4.19-rc5+ Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 12 March 2021, 18:18:16 UTC
6fcd9cb kvm: x86: annotate RCU pointers This patch adds the annotation to fix the following sparse errors: arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map * arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map * arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter * arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map * arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map * arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305191123.GA497469@LEGION> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 12 March 2021, 18:17:41 UTC
7180323 Merge branch 'pm-opp' * pm-opp: opp: Don't drop extra references to OPPs accidentally 12 March 2021, 17:47:22 UTC
bee7359 Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull an operating performance points (OPP) framework fix for 5.12 from Viresh Kumar: "Fix OPP refcount issue noticed by Beata." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: Don't drop extra references to OPPs accidentally 12 March 2021, 17:45:44 UTC
58f9937 io_uring: fix OP_ASYNC_CANCEL across tasks IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task. If it fails go over tctx_list and try it out for every single tctx. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 16:42:56 UTC
521d6a7 io_uring: cancel sqpoll via task_work 1) The first problem is io_uring_cancel_sqpoll() -> io_uring_cancel_task_requests() basically doing park(); park(); and so hanging. 2) Another one is more subtle, when the master task is doing cancellations, but SQPOLL task submits in-between the end of the cancellation but before finish() requests taking a ref to the ctx, and so eternally locking it up. 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and same io_uring_cancel_sqpoll() from the owner task, they race for tctx->wait events. And there probably more of them. Instead do SQPOLL cancellations from within SQPOLL task context via task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal park()/unpark() during cancellation, which is ugly, subtle and anyway doesn't allow to do io_run_task_work() properly. io_uring_cancel_sqpoll() is called only from SQPOLL task context and under sqd locking, so all parking is removed from there. And so, io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by SQPOLL task, and that spare us from some headache. Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll, which is not used anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 16:42:55 UTC
26984fb io_uring: prevent racy sqd->thread checks SQPOLL thread to which we're trying to attach may be going away, it's not nice but a more serious problem is if io_sq_offload_create() sees sqd->thread==NULL, and tries to init it with a new thread. There are tons of ways it can be exploited or fail. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 16:42:53 UTC
262b003 KVM: arm64: Fix exclusive limit for IPA size When registering a memslot, we check the size and location of that memslot against the IPA size to ensure that we can provide guest access to the whole of the memory. Unfortunately, this check rejects memslot that end-up at the exact limit of the addressing capability for a given IPA size. For example, it refuses the creation of a 2GB memslot at 0x8000000 with a 32bit IPA space. Fix it by relaxing the check to accept a memslot reaching the limit of the IPA space. Fixes: c3058d5da222 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org 12 March 2021, 15:43:22 UTC
7d71755 KVM: arm64: Reject VM creation when the default IPA size is unsupported KVM/arm64 has forever used a 40bit default IPA space, partially due to its 32bit heritage (where the only choice is 40bit). However, there are implementations in the wild that have a *cough* much smaller *cough* IPA space, which leads to a misprogramming of VTCR_EL2, and a guest that is stuck on its first memory access if userspace dares to ask for the default IPA setting (which most VMMs do). Instead, blundly reject the creation of such VM, as we can't satisfy the requirements from userspace (with a one-off warning). Also clarify the boot warning, and document that the VM creation will fail when an unsupported IPA size is provided. Although this is an ABI change, it doesn't really change much for userspace: - the guest couldn't run before this change, but no error was returned. At least userspace knows what is happening. - a memory slot that was accepted because it did fit the default IPA space now doesn't even get a chance to be registered. The other thing that is left doing is to convince userspace to actually use the IPA space setting instead of relying on the antiquated default. Fixes: 233a7cb23531 ("kvm: arm64: Allow tuning the physical address size for VM") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org 12 March 2021, 15:42:57 UTC
0efc497 gfs2: bypass log flush if the journal is not live Patch fe3e397668775 ("gfs2: Rework the log space allocation logic") changed gfs2_log_flush to reserve a set of journal blocks in case no transaction is active. However, gfs2_log_flush also gets called in cases where we don't have an active journal, for example, for spectator mounts. In that case, trying to reserve blocks would sleep forever, but we want gfs2_log_flush to be a no-op instead. Fixes: fe3e397668775 ("gfs2: Rework the log space allocation logic") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> 12 March 2021, 14:52:48 UTC
0df8ea6 io_uring: remove useless ->startup completion We always do complete(&sqd->startup) almost right after sqd->thread creation, either in the success path or in io_sq_thread_finish(). It's specifically created not started for us to be able to set some stuff like sqd->thread and io_uring_alloc_task_context() before following right after wake_up_new_task(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 14:23:01 UTC
e1915f7 io_uring: cancel deferred requests in try_cancel As io_uring_cancel_files() and others let SQO to run between io_uring_try_cancel_requests(), SQO may generate new deferred requests, so it's safer to try to cancel them in it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> 12 March 2021, 14:23:00 UTC
d4b64fd Merge tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme into block-5.12 Pull NVMe fixes from Christoph: "nvme fixes for 5.12: - one more quirk (Dmitry Monakhov) - fix max_zone_append_sectors initialization (Chaitanya Kulkarni) - nvme-fc reset/create race fix (James Smart) - fix status code on aborts/resets (Hannes Reinecke) - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni) - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)" * tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme: nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done nvme-core: check ctrl css before setting up zns nvme-fc: fix racing controller reset and create association nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange() nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request() nvme: simplify error logic in nvme_validate_ns() nvme: set max_zone_append_sectors nvme_revalidate_zones 12 March 2021, 14:21:15 UTC
d5bf630 gfs2: bypass signal_our_withdraw if no journal Before this patch, function signal_our_withdraw referenced the journal inode immediately. But corrupt file systems may have some invalid journals, in which case our attempt to read it in will withdraw and the resulting signal_our_withdraw would dereference the NULL value. This patch adds a check to signal_our_withdraw so that if the journal has not yet been initialized, it simply returns and does the old-style withdraw. Thanks, Andy Price, for his analysis. Reported-by: syzbot+50a8a9cf8127f2c6f5df@syzkaller.appspotmail.com Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> 12 March 2021, 13:55:23 UTC
c8e3866 perf/arm_dmc620_pmu: Fix error return code in dmc620_pmu_device_probe() Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: 53c218da220c ("driver/perf: Add PMU driver for the ARM DMC-620 memory controller") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20210312080421.277562-1-weiyongjun1@huawei.com Signed-off-by: Will Deacon <will@kernel.org> 12 March 2021, 11:30:31 UTC
ba08abc objtool,x86: Fix uaccess PUSHF/POPF validation Commit ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not use popf") replaced "push %reg; popf" with something like: "test $0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry that commit ea24213d8088 ("objtool: Add UACCESS validation") relies on. The result is: drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted Meanwhile, commit c9c324dc22aa ("objtool: Support stack layout changes in alternatives") makes that we can actually use stack-ops in alternatives, which means we can revert 1ff865e343c2 ("x86,smap: Fix smap_{save,restore}() alternatives"). That in turn means we can limit the PUSHF/POPF handling of ea24213d8088 to those instructions that are in alternatives. Fixes: ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not use popf") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net 12 March 2021, 08:15:49 UTC
606a5d4 opp: Don't drop extra references to OPPs accidentally We are required to call dev_pm_opp_put() from outside of the opp_table->lock as debugfs removal needs to happen lock-less to avoid circular dependency issues. commit cf1fac943c63 ("opp: Reduce the size of critical section in _opp_kref_release()") tried to fix that introducing a new routine _opp_get_next() which keeps returning OPPs that can be freed by the callers and this routine shall be called without holding the opp_table->lock. Though the commit overlooked the fact that the OPPs can be referenced by other users as well and this routine will end up dropping references which were taken by other users and hence freeing the OPPs prematurely. In effect, other users of the OPPs will end up having invalid pointers at hand. We didn't see any crash reports earlier as the exact situation never happened, though it is certainly possible. We need a way to mark which OPPs are no longer referenced by the OPP core, so we don't drop extra references to them accidentally. This commit adds another OPP flag, "removed", which is used to track this. And now we should never end up dropping extra references to the OPPs. Cc: v5.11+ <stable@vger.kernel.org> # v5.11+ Fixes: cf1fac943c63 ("opp: Reduce the size of critical section in _opp_kref_release()") Signed-off-by: Beata Michalska <beata.michalska@arm.com> [ Viresh: Almost rewrote entire patch, added new "removed" field, rewrote commit log and added the correct Fixes tag. ] Co-developed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> 12 March 2021, 03:56:52 UTC
f78d76e Merge tag 'drm-fixes-2021-03-12-1' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Regular fixes for rc3. The i915 pull was based on the rc1 tag so I just cherry-picked the single fix from there to avoid it. The misc and amd trees seem to be on okay bases. It's a bunch of fixes across the tree, amdgpu has most of them a few ttm fixes around qxl, and nouveau. core: - Clear holes when converting compat ioctl's between 32-bits and 64-bits. docs: - Use gitlab for drm bugzilla now. ttm: - Fix ttm page pool accounting. fbdev: - Fix oops in drm_fbdev_cleanup() shmem: - Assorted fixes for shmem helpers. qxl: - unpin qxl bos created as pinned when freeing them, and make ttm only warn once on this behavior. - Zero head.surface_id correctly in qxl. atyfb: - Use LCD management for atyfb on PPC_MAC. meson: - Shutdown kms poll helper in meson correctly. nouveau: - fix regression in bo syncing i915: - Wedge the GPU if command parser setup fails amdgpu: - Fix aux backlight control - Add a backlight override parameter - Various display fixes - PCIe DPM fix for vega - Polaris watermark fixes - Additional S0ix fix radeon: - Fix GEM regression - Fix AGP dependency handling" * tag 'drm-fixes-2021-03-12-1' of git://anongit.freedesktop.org/drm/drm: (33 commits) drm/nouveau: fix dma syncing for loops (v2) drm/i915: Wedge the GPU if command parser setup fails drm/compat: Clear bounce structures drm/shmem-helpers: vunmap: Don't put pages for dma-buf drm: meson_drv add shutdown function drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff drm/shmem-helper: Check for purged buffers in fault handler qxl: Fix uninitialised struct field head.surface_id drm/ttm: Fix TTM page pool accounting drm/ttm: soften TTM warnings drm: Use USB controller's DMA mask when importing dmabufs MAINTAINERS: update drm bug reporting URL fbdev: atyfb: use LCD management functions for PPC_PMAC also fbdev: atyfb: always declare aty_{ld,st}_lcd() drm/qxl: fix lockdep issue in qxl_alloc_release_reserved drm/qxl: unpin release objects drm/fb-helper: only unmap if buffer not null drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m drm/radeon: fix AGP dependency drm/radeon: also init GEM funcs in radeon_gem_prime_import_sg_table ... 12 March 2021, 01:38:49 UTC
4042160 drm/nouveau: fix dma syncing for loops (v2) The index variable should only be increased in one place. Noticed this while trying to track down another oops. v2: use while loop. Fixes: f295c8cfec83 ("drm/nouveau: fix dma syncing warning with debugging on.") Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210311043527.5376-1-airlied@gmail.com 12 March 2021, 01:21:47 UTC
a829f03 drm/i915: Wedge the GPU if command parser setup fails Commit 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing") introduced mandatory command parsing but setup failures were not translated into wedging the GPU which was probably the intent. Possible errors come in two categories. Either the sanity check on internal tables has failed, which should be caught in CI unless an affected platform would be missed in testing; or memory allocation failure happened during driver load, which should be extremely unlikely but for correctness should still be handled. v2: * Tidy coding style. (Chris) [airlied: cherry-picked to avoid rc1 base] Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing") Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210302114213.1102223-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit 5a1a659762d35a6dc51047c9127c011303c77b7f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> 12 March 2021, 01:20:50 UTC
fb19848 Merge tag 'amd-drm-fixes-5.12-2021-03-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.12-2021-03-10: amdgpu: - Fix aux backlight control - Add a backlight override parameter - Various display fixes - PCIe DPM fix for vega - Polaris watermark fixes - Additional S0ix fix radeon: - Fix GEM regression - Fix AGP dependency handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210310221141.3974-1-alexander.deucher@amd.com 12 March 2021, 01:20:02 UTC
e0da968 Merge tag 'drm-misc-fixes-2021-03-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for rc3, rebased on rc2: - Fix oops in drm_fbdev_cleanup() - unpin qxl bos created as pinned when freeing them, and make ttm only warn once on this behavior. - Use LCD management for atyfb on PPC_MAC. - Use gitlab for drm bugzilla now. - Fix ttm page pool accounting. - Zero head.surface_id correctly in qxl. - Assorted fixes for shmem helpers. - Shutdown kms poll helper in meson correctly. - Clear holes when converting compat ioctl's between 32-bits and 64-bits. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4606f08e-d0e8-c543-5e96-cee2fd728a41@linux.intel.com 12 March 2021, 00:30:36 UTC
0b73688 powerpc/traps: unrecoverable_exception() is not an interrupt handler unrecoverable_exception() is called from interrupt handlers or after an interrupt handler has failed. Make it a standard function to avoid doubling the actions performed on interrupt entry (e.g.: user time accounting). Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ae96c59fa2cb7f24a8929c58cfa2c909cb8ff1f1.1615291471.git.christophe.leroy@csgroup.eu 12 March 2021, 00:02:12 UTC
e511350 block: Discard page cache of zone reset target range When zone reset ioctl and data read race for a same zone on zoned block devices, the data read leaves stale page cache even though the zone reset ioctl zero clears all the zone data on the device. To avoid non-zero data read from the stale page cache after zone reset, discard page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce the helper function blkdev_truncate_zone_range() to discard the page cache. Ensure the page cache discarded by calling the helper function before and after zone reset in same manner as fallocate does. This patch can be applied back to the stable kernel version v5.10.y. Rework is needed for older stable kernels. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Cc: <stable@vger.kernel.org> # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> 11 March 2021, 18:49:25 UTC
9ec4914 block: Suppress uevent for hidden device when removed register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN but enables uevents at the end again in order to announce disk after possible partitions are created. When the device is removed the uevents are still on and user land sees 'remove' messages for devices which were never 'add'ed to the system. KERNEL[95481.571887] remove /devices/virtual/nvme-fabrics/ctl/nvme5/nvme0c5n1 (block) Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the uevents at all. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20210311151917.136091-1-dwagner@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> 11 March 2021, 18:48:25 UTC
28806e4 Merge tag 'media/v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "A couple of fixes: - fix a build issue with CEC - fix a deadlock at usbtv driver - fix some null pointer address issues at vsp1 driver - fix a wrong bitmap setting at rkisp1 driver" * tag 'media/v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: rkisp1: params: fix wrong bits settings media: v4l: vsp1: Fix uif null pointer access media: v4l: vsp1: Fix bru null pointer access media: usbtv: Fix deadlock on suspend media: rc: compile rc-cec.c into rc-core 11 March 2021, 18:39:18 UTC
4f8be1f nfs: we don't support removing system.nfs4_acl The NFSv4 protocol doesn't have any notion of reomoving an attribute, so removexattr(path,"system.nfs4_acl") doesn't make sense. There's no documented return value. Arguably it could be EOPNOTSUPP but I'm a little worried an application might take that to mean that we don't support ACLs or xattrs. How about EINVAL? Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> 11 March 2021, 18:17:42 UTC
d052d1d io_uring: perform IOPOLL reaping if canceler is thread itself We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread, but if it's the thread itself invoking cancelations, then we still need to perform it or no one will. Fixes: 9936c7c2bc76 ("io_uring: deduplicate core cancellations sequence") Signed-off-by: Jens Axboe <axboe@kernel.dk> 11 March 2021, 17:49:20 UTC
5c2469e io_uring: force creation of separate context for ATTACH_WQ and non-threads Earlier kernels had SQPOLL threads that could share across anything, as we grabbed the context we needed on a per-ring basis. This is no longer the case, so only allow attaching directly if we're in the same thread group. That is the common use case. For non-group tasks, just setup a new context and thread as we would've done if sharing wasn't set. This isn't 100% ideal in terms of CPU utilization for the forked and share case, but hopefully that isn't much of a concern. If it is, there are plans in motion for how to improve that. Most importantly, we want to avoid app side regressions where sharing worked before and now doesn't. With this patch, functionality is equivalent to previous kernels that supported IORING_SETUP_ATTACH_WQ with SQPOLL. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> 11 March 2021, 17:17:56 UTC
a8affc0 block: rename BIO_MAX_PAGES to BIO_MAX_VECS Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> 11 March 2021, 14:47:48 UTC
d450293 regulator: mt6315: Fix off-by-one for .n_voltages The valid selector is 0 ~ 0xbf, so the .n_voltages should be 0xc0. Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20210311020558.579597-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org> 11 March 2021, 13:23:21 UTC
30b2675 arm64: mm: remove unused __cpu_uses_extended_idmap[_level()] These routines lost all existing users during the latest merge window so we can remove them. This avoids the need to fix them in the context of fixing a regression related to the ID map on 52-bit VA kernels. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310171515.416643-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org> 11 March 2021, 13:04:28 UTC
7ba8f2b arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds 52-bit VA kernels can run on hardware that is only 48-bit capable, but configure the ID map as 52-bit by default. This was not a problem until recently, because the special T0SZ value for a 52-bit VA space was never programmed into the TCR register anwyay, and because a 52-bit ID map happens to use the same number of translation levels as a 48-bit one. This behavior was changed by commit 1401bef703a4 ("arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ value for a 52-bit VA to be programmed into TCR_EL1. While some hardware simply ignores this, Mark reports that Amberwing systems choke on this, resulting in a broken boot. But even before that commit, the unsupported idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly as well. Given that we already have to deal with address spaces being either 48-bit or 52-bit in size, the cleanest approach seems to be to simply default to a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the kernel in DRAM requires it. This is guaranteed not to happen unless the system is actually 52-bit VA capable. Fixes: 90ec95cda91a ("arm64: mm: Introduce VA_BITS_MIN") Reported-by: Mark Salter <msalter@redhat.com> Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org> 11 March 2021, 13:04:28 UTC
back to top