https://github.com/torvalds/linux

sort by:
Revision Author Date Message Commit Date
5838d03 xfs: reset child dir '..' entry when unlinking child While running xfs/168, I noticed a second source of post-shrink corruption errors causing shutdowns. Let's say that directory B has a low inode number and is a child of directory A, which has a high number. If B is empty but open, and unlinked from A, B's dotdot link continues to point to A. If A is then unlinked and the filesystem shrunk so that A is no longer a valid inode, a subsequent AIL push of B will trip the inode verifiers because the dotdot entry points outside of the filesystem. To avoid this problem, reset B's dotdot entry to the root directory when unlinking directories, since the root directory cannot be removed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> 15 July 2021, 16:58:42 UTC
da062d1 xfs: check for sparse inode clusters that cross new EOAG when shrinking While running xfs/168, I noticed occasional write verifier shutdowns involving inodes at the very end of the filesystem. Existing inode btree validation code checks that all inode clusters are fully contained within the filesystem. However, due to inadequate checking in the fs shrink code, it's possible that there could be a sparse inode cluster at the end of the filesystem where the upper inodes of the cluster are marked as holes and the corresponding blocks are free. In this case, the last blocks in the AG are listed in the bnobt. This enables the shrink to proceed but results in a filesystem that trips the inode verifiers. Fix this by disallowing the shrink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> 15 July 2021, 16:58:41 UTC
229adf3 iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor Now that we create those objects in iomap_writepage_map when needed, there's no need to pre-create them in iomap_page_mkwrite_actor anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> 15 July 2021, 16:58:06 UTC
637d337 iomap: Don't create iomap_page objects for inline files In iomap_readpage_actor, don't create iop objects for inline inodes. Otherwise, iomap_read_inline_data will set PageUptodate without setting iop->uptodate, and iomap_page_release will eventually complain. To prevent this kind of bug from occurring in the future, make sure the page doesn't have private data attached in iomap_read_inline_data. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> 15 July 2021, 16:58:05 UTC
8e1bcef iomap: Permit pages without an iop to enter writeback Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. We'll initially do that for pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> 15 July 2021, 16:58:05 UTC
49694d1 iomap: remove the length variable in iomap_seek_hole The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> 15 July 2021, 16:58:04 UTC
3ac1d42 iomap: remove the length variable in iomap_seek_data The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> 15 July 2021, 16:58:04 UTC
e6f85cb arm64: entry: fix KCOV suppression We suppress KCOV for entry.o rather than entry-common.o. As entry.o is built from entry.S, this is pointless, and permits instrumentation of entry-common.o, which is built from entry-common.c. Fix the Makefile to suppress KCOV for entry-common.o, as we had intended to begin with. I've verified with objdump that this is working as expected. Fixes: bf6fa2c0dda7 ("arm64: entry: don't instrument entry code with KCOV") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210715123049.9990-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> 15 July 2021, 16:37:55 UTC
31a7f0f arm64: entry: add missing noinstr We intend that all the early exception handling code is marked as `noinstr`, but we forgot this for __el0_error_handler_common(), which is called before we have completed entry from user mode. If it were instrumented, we could run into problems with RCU, lockdep, etc. Mark it as `noinstr` to prevent this. The few other functions in entry-common.c which do not have `noinstr` are called once we've completed entry, and are safe to instrument. Fixes: bb8e93a287a5 ("arm64: entry: convert SError handlers to C") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210714172801.16475-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> 15 July 2021, 16:36:51 UTC
59f4406 arm64: mte: fix restoration of GCR_EL1 from suspend Since commit: bad1e1c663e0a72f ("arm64: mte: switch GCR_EL1 in kernel entry and exit") we saved/restored the user GCR_EL1 value at exception boundaries, and update_gcr_el1_excl() is no longer used for this. However it is used to restore the kernel's GCR_EL1 value when returning from a suspend state. Thus, the comment is misleading (and an ISB is necessary). When restoring the kernel's GCR value, we need an ISB to ensure this is used by subsequent instructions. We don't necessarily get an ISB by other means (e.g. if the kernel is built without support for pointer authentication). As __cpu_setup() initialised GCR_EL1.Exclude to 0xffff, until a context synchronization event, allocation tag 0 may be used rather than the desired set of tags. This patch drops the misleading comment, adds the missing ISB, and for clarity folds update_gcr_el1_excl() into its only user. Fixes: bad1e1c663e0 ("arm64: mte: switch GCR_EL1 in kernel entry and exit") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210714143843.56537-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> 15 July 2021, 16:34:46 UTC
295cf15 arm64: Avoid premature usercopy failure Al reminds us that the usercopy API must only return complete failure if absolutely nothing could be copied. Currently, if userspace does something silly like giving us an unaligned pointer to Device memory, or a size which overruns MTE tag bounds, we may fail to honour that requirement when faulting on a multi-byte access even though a smaller access could have succeeded. Add a mitigation to the fixup routines to fall back to a single-byte copy if we faulted on a larger access before anything has been written to the destination, to guarantee making *some* forward progress. We needn't be too concerned about the overall performance since this should only occur when callers are doing something a bit dodgy in the first place. Particularly broken userspace might still be able to trick generic_perform_write() into an infinite loop by targeting write() at an mmap() of some read-only device register where the fault-in load succeeds but any store synchronously aborts such that copy_to_user() is genuinely unable to make progress, but, well, don't do that... CC: stable@vger.kernel.org Reported-by: Chen Huang <chenhuang5@huawei.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> 15 July 2021, 16:29:14 UTC
05d69d9 xen-blkfront: sanitize the removal state machine xen-blkfront has a weird protocol where close message from the remote side can be delayed, and where hot removals are treated somewhat differently from regular removals, all leading to potential NULL pointer removals, and a del_gendisk from the block device release method, which will deadlock. Fix this by just performing normal hot removals even when the device is opened like all other Linux block drivers. Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> 15 July 2021, 15:32:34 UTC
a347c15 Merge tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme into block-5.14 Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.14 - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)" * tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme: nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme-pci: fix multiple races in nvme_setup_io_queues nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE 15 July 2021, 15:31:36 UTC
16ad3db nbd: fix order of cleaning up the queue and freeing the tagset We must release the queue before freeing the tagset. Fixes: 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-and-tested-by: syzbot+9ca43ff47167c0ee3466@syzkaller.appspotmail.com Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706040016.1360412-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> 15 July 2021, 15:30:15 UTC
58b63e0 pd: fix order of cleaning up the queue and freeing the tagset We must release the queue before freeing the tagset. Fixes: 262d431f9000 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706010734.1356066-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> 15 July 2021, 15:29:22 UTC
f88321a dt-bindings: Move fixed string 'patternProperties' to 'properties' There's no need for fixed strings to be under 'patternProperties', so move them under 'properties' instead. Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Kishon Vijay Abraham I <kishon@ti.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: Saravanan Sekar <sravanhome@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Jagan Teki <jagan@amarulasolutions.com> Cc: Troy Kisky <troy.kisky@boundarydevices.com> Cc: linux-hwmon@vger.kernel.org Cc: linux-phy@lists.infradead.org Cc: linux-spi@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210713193514.690894-1-robh@kernel.org 15 July 2021, 14:45:33 UTC
e891726 dt-bindings: More dropping redundant minItems/maxItems Another round of removing redundant minItems/maxItems from new schema in the recent merge window. If a property has an 'items' list, then a 'minItems' or 'maxItems' with the same size as the list is redundant and can be dropped. Note that is DT schema specific behavior and not standard json-schema behavior. The tooling will fixup the final schema adding any unspecified minItems/maxItems. This condition is partially checked with the meta-schema already, but only if both 'minItems' and 'maxItems' are equal to the 'items' length. An improved meta-schema is pending. Cc: Stephen Boyd <sboyd@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Kamal Dasu <kdasu.kdev@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: linux-clk@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: linux-mtd@lists.infradead.org Cc: linux-rtc@vger.kernel.org Cc: linux-usb@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20210713193453.690290-1-robh@kernel.org 15 July 2021, 14:45:27 UTC
d951b22 KVM: selftests: smm_test: Test SMM enter from L2 Two additional tests are added: - SMM triggered from L2 does not currupt L1 host state. - Save/restore during SMM triggered from L2 does not corrupt guest/host state. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-7-vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:44 UTC
bb00bd9 KVM: nSVM: Restore nested control upon leaving SMM If the VM was migrated while in SMM, no nested state was saved/restored, and therefore svm_leave_smm has to load both save and control area of the vmcb12. Save area is already loaded from HSAVE area, so now load the control area as well from the vmcb12. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:44 UTC
37be407 KVM: nSVM: Fix L1 state corruption upon return from SMM VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") broke return from SMM when we entered there from guest (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem manifests itself like this: kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000 kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000 nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000 npt: on kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002 intercepts: fd44bfeb 0000217f 00000000 kvm_entry: vcpu 0, rip 0xffffffffffbbe119 kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info 200000006 1ab000 kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1 0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000 error_code 0x00000000 kvm_page_fault: address 1ab000 error_code 6 kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000 int_info 0 int_info_err 0 kvm_entry: vcpu 0, rip 0x7ffbb280 kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_inj_exception: #GP (0x0) Note: return to L2 succeeded but upon first exit to L1 its RIP points to 'RSM' instruction but we're not in SMM. The problem appears to be that VMCB01 gets irreversibly destroyed during SMM execution. Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from VMCB02. Pre-split (working) flow looked like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() restores L1's state from 'hsave' - SMM -> RSM - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have pre-SMM (and pre L2 VMRUN) L1's state there - L2's state is restored from SMRAM - upon first exit L1's state is restored from L1. This was always broken with regards to svm_get_nested_state()/ svm_set_nested_state(): 'hsave' was never a part of what's being save and restored so migration happening during SMM triggered from L2 would never restore L1's state correctly. Post-split flow (broken) looks like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() switches to VMCB01 from VMCB02 - SMM -> RSM - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01 is already lost. - L2's state is restored from SMRAM - upon first exit L1's state is restored from VMCB01 but it is corrupted (reflects the state during 'RSM' execution). VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest and host state so when we switch back to VMCS02 L1's state is intact there. To resolve the issue we need to save L1's state somewhere. We could've created a third VMCB for SMM but that would require us to modify saved state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of L1's state there. Currently, KVM does 'none'. Note, for nested state migration to succeed, both source and destination hypervisors must have the fix. We, however, don't need to create a new flag indicating the fact that HSAVE area is now populated as migration during SMM triggered from L2 was always broken. Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:44 UTC
0a75829 KVM: nSVM: Introduce svm_copy_vmrun_state() Separate the code setting non-VMLOAD-VMSAVE state from svm_set_nested_state() into its own function. This is going to be re-used from svm_enter_smm()/svm_leave_smm(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:43 UTC
fb79f56 KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN APM states that "The address written to the VM_HSAVE_PA MSR, which holds the address of the page used to save the host state on a VMRUN, must point to a hypervisor-owned page. If this check fails, the WRMSR will fail with a #GP(0) exception. Note that a value of 0 is not considered valid for the VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will fail with a #GP(0) exception." svm_set_msr() already checks that the supplied address is valid, so only check for '0' is missing. Add it to nested_svm_vmrun(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-3-vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:43 UTC
fce7e15 KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when the supplied address is not page-aligned or is outside of "maximum supported physical address for this implementation". page_address_valid() check seems suitable. Also, forcefully page-align the address when it's written from VMM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [Add comment about behavior for host-provided values. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:43 UTC
c7a1b2b KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities Use IS_ERR() instead of checking for a NULL pointer when querying for sev_pin_memory() failures. sev_pin_memory() always returns an error code cast to a pointer, or a valid pointer; it never returns NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210506175826.2166383-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:43 UTC
b4a6939 KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails Return -EFAULT if copy_to_user() fails; if accessing user memory faults, copy_to_user() returns the number of bytes remaining, not an error code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210506175826.2166383-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:42 UTC
4b639a9 KVM: SVM: add module param to control the #SMI interception In theory there are no side effects of not intercepting #SMI, because then #SMI becomes transparent to the OS and the KVM. Plus an observation on recent Zen2 CPUs reveals that these CPUs ignore #SMI interception and never deliver #SMI VMexits. This is also useful to test nested KVM to see that L1 handles #SMIs correctly in case when L1 doesn't intercept #SMI. Finally the default remains the same, the SMI are intercepted by default thus this patch doesn't have any effect unless non default module param value is used. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:42 UTC
896707c KVM: SVM: remove INIT intercept handler Kernel never sends real INIT even to CPUs, other than on boot. Thus INIT interception is an error which should be caught by a check for an unknown VMexit reason. On top of that, the current INIT VM exit handler skips the current instruction which is wrong. That was added in commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"). Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:42 UTC
991afbb KVM: SVM: #SMI interception must not skip the instruction Commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"), unfortunately made a mistake of treating nop_on_interception and nop_interception in the same way. Former does truly nothing while the latter skips the instruction. SMI VM exit handler should do nothing. (SMI itself is handled by the host when we do STGI) Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:42 UTC
c0e1303 KVM: VMX: Remove vmx_msr_index from vmx.h vmx_msr_index was used to record the list of MSRs which can be lazily restored when kvm returns to userspace. It is now reimplemented as kvm_uret_msrs_list, a common x86 list which is only used inside x86.c. So just remove the obsolete declaration in vmx.h. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210707235702.31595-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:41 UTC
f85d401 KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() When the host is using debug registers but the guest is not using them nor is the guest in guest-debug state, the kvm code does not reset the host debug registers before kvm_x86->run(). Rather, it relies on the hardware vmentry instruction to automatically reset the dr7 registers which ensures that the host breakpoints do not affect the guest. This however violates the non-instrumentable nature around VM entry and exit; for example, when a host breakpoint is set on vcpu->arch.cr2, Another issue is consistency. When the guest debug registers are active, the host breakpoints are reset before kvm_x86->run(). But when the guest debug registers are inactive, the host breakpoints are delayed to be disabled. The host tracing tools may see different results depending on what the guest is doing. To fix the problems, we clear %db7 unconditionally before kvm_x86->run() if the host has set any breakpoints, no matter if the guest is using them or not. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org [Only clear %db7 instead of reloading all debug registers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:41 UTC
6f2f86e KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc Commit a75a895e6457 ("KVM: selftests: Unconditionally use memslot 0 for vaddr allocations") removed the memslot parameters from vm_vaddr_alloc. It addressed all callers except one under lib/aarch64/, due to a race with commit e3db7579ef35 ("KVM: selftests: Add exception handling support for aarch64") Fix the vm_vaddr_alloc call in lib/aarch64/processor.c. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Message-Id: <20210702201042.4036162-1-ricarkol@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:41 UTC
004d62e kvm: debugfs: fix memory leak in kvm_create_vm_debugfs In commit bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") loop for filling debugfs_stat_data was copy-pasted 2 times, but in the second loop pointers are saved over pointers allocated in the first loop. All this causes is a memory leak, fix it. Fixes: bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210701195500.27097-1-paskripkin@gmail.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 15 July 2021, 14:19:40 UTC
d549c66 dt-bindings: net: dsa: sja1105: Fix indentation warnings Some of the lines aren't properly indented, causing yamllint to warn about them: .../nxp,sja1105.yaml:70:17: [warning] wrong indentation: expected 18 but found 16 (indentation) Use the proper indentation to fix those warnings. Signed-off-by: Thierry Reding <treding@nvidia.com> Fixes: 070f5b701d559ae1 ("dt-bindings: net: dsa: sja1105: add SJA1110 bindings") Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20210622113327.3613595-1-thierry.reding@gmail.com Signed-off-by: Rob Herring <robh@kernel.org> 15 July 2021, 13:35:15 UTC
530c437 docs/zh_CN: add a missing space character "LinusTorvalds" is not pretty. Replace it with "Linus Torvalds". Signed-off-by: Hu Haowen <src.res@email.cn> Link: https://lore.kernel.org/r/20210620010444.24813-1-src.res@email.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net> 15 July 2021, 12:33:44 UTC
d3fb382 Documentation/features: Add THREAD_INFO_IN_TASK feature matrix Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/YN2nhV5F0hBVNPuX@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> 15 July 2021, 12:33:44 UTC
842f697 Documentation/features: Update the ARCH_HAS_TICK_BROADCAST entry Risc-V gained support recently. Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/YN2nqOVHgGDt4Iid@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> 15 July 2021, 12:33:44 UTC
21de80b LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" A couple of exotic quote characters came in with this license text; they can confuse software that is not expecting non-ASCII text. Switch to normal quotes here, with no changes to the actual license text. Reported-by: Rahul T R <r-ravikumar@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210703012931.30604-1-nm@ti.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> 15 July 2021, 12:31:24 UTC
4a5c155 MAINTAINERS: Add Suravee Suthikulpanit as Reviewer for AMD IOMMU (AMD-Vi) To help review changes related to AMD IOMMU. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/1626296542-30454-1-git-send-email-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> 15 July 2021, 07:00:16 UTC
775da83 drm/amdgpu: add another Renoir DID Add new PCI device id. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.11.x 14 July 2021, 19:08:55 UTC
83d1fc9 perf cs-etm: Split Coresight decode by aux records Populate the auxtrace queues using AUX records rather than whole auxtrace buffers so that the decoder is reset between each aux record. This is similar to the auxtrace_queues__process_index() -> auxtrace_queues__add_indexed_event() flow where perf_session__peek_event() is used to read AUXTRACE events out of random positions in the file based on the auxtrace index. But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer using the index, and add a fragment of that buffer to the auxtrace queues. No other changes to decoding were made, apart from populating the auxtrace queues. The result of decoding is identical to before, except in cases where decoding failed completely, due to not resetting the decoder. The reason for this change is because AUX records are emitted any time tracing is disabled, for example when the process is scheduled out. Because ETM was disabled and enabled again, the decoder also needs to be reset to force the search for a sync packet. Otherwise there would be fatal decoding errors. Testing ======= Testing was done with the following script, to diff the decoding results between the patched and un-patched versions of perf: #!/bin/bash set -ex $1 script -i $3 $4 > split.script $2 script -i $3 $4 > default.script diff split.script default.script | head -n 20 And it was run like this, with various itrace options depending on the quantity of synthesised events: compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns No changes in output were observed in the following scenarios: * Simple per-cpu perf record -e cs_etm/@tmc_etr0/u top * Per-thread, single thread perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C * Per-thread multiple threads (but only one thread collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-thread multiple threads (both threads collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-cpu explicit threads: perf record -e cs_etm/@tmc_etr0/u --pid 853,854 * System-wide (per-cpu): perf record -e cs_etm/@tmc_etr0/u -a * No data collected (no aux buffers) Can happen with any command when run for a short period * Containing truncated records Can happen with any command * Containing aux records with 0 size Can happen with any command * Snapshot mode (various files with and without buffer wrap) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Some differences were observed in the following scenario: * Snapshot mode (with duplicate buffers) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Fewer samples are generated in snapshot mode if duplicate buffers were gathered because buffers with the same offset are now only added once. This gives different, but more correct results and no duplicate data is decoded any more. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Branislav Rankov <branislav.rankov@arm.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 17:42:36 UTC
fa2c02e tools headers: Remove broken definition of __LITTLE_ENDIAN The linux/kconfig.h file was copied from the kernel but the line where with the generated/autoconf.h include from where the CONFIG_ entries would come from was deleted, as tools/ build system don't create that file, so we ended up always defining just __LITTLE_ENDIAN as CONFIG_CPU_BIG_ENDIAN was nowhere to be found. This in turn ended up breaking the build in some systems where __LITTLE_ENDIAN was already defined, such as the androind NDK. So just ditch that block that depends on the CONFIG_CPU_BIG_ENDIAN define. The kconfig.h file was copied just to get IS_ENABLED() and a 'make -C tools/all' doesn't breaks with this removal. Fixes: 93281c4a96572a34 ("x86/insn: Add an insn_decode() API") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/YO8hK7lqJcIWuBzx@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 17:39:36 UTC
8096acd Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ... 14 July 2021, 16:24:32 UTC
d1d488d fs: add vfs_parse_fs_param_source() helper Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks. Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 14 July 2021, 16:19:06 UTC
3b04627 cgroup: verify that source is a string The following sequence can be used to trigger a UAF: int fscontext_fd = fsopen("cgroup"); int fd_null = open("/dev/null, O_RDONLY); int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null); close_range(3, ~0U, 0); The cgroup v1 specific fs parser expects a string for the "source" parameter. However, it is perfectly legitimate to e.g. specify a file descriptor for the "source" parameter. The fs parser doesn't know what a filesystem allows there. So it's a bug to assume that "source" is always of type fs_value_is_string when it can reasonably also be fs_value_is_file. This assumption in the cgroup code causes a UAF because struct fs_parameter uses a union for the actual value. Access to that union is guarded by the param->type member. Since the cgroup paramter parser didn't check param->type but unconditionally moved param->string into fc->source a close on the fscontext_fd would trigger a UAF during put_fs_context() which frees fc->source thereby freeing the file stashed in param->file causing a UAF during a close of the fd_null. Fix this by verifying that param->type is actually a string and report an error if not. In follow up patches I'll add a new generic helper that can be used here and by other filesystems instead of this error-prone copy-pasta fix. But fixing it in here first makes backporting a it to stable a lot easier. Fixes: 8d2451f4994f ("cgroup1: switch to option-by-option parsing") Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@kernel.org> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 14 July 2021, 16:19:06 UTC
7234c36 KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM The AMD platform does not support the functions Ah CPUID leaf. The returned results for this entry should all remain zero just like the native does: AMD host: 0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000 (uncanny) AMD guest: 0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000 Fixes: cadbaa039b99 ("perf/x86/intel: Make anythread filter support conditional") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20210628074354.33848-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:56 UTC
23fa2e4 KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183 Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269 CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 4269: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 kmem_cache_alloc_trace include/linux/slab.h:450 [inline] kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146 kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 4269: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102 kvm_iodevice_destructor include/kvm/iodev.h:61 [inline] kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374 kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor() inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list and free the dev, but kvm_iodevice_destructor() is called again, it will lead the above issue. Let's check the the return value of kvm_io_bus_unregister_dev(), only call kvm_iodevice_destructor() if the return value is 0. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com> Cc: stable@vger.kernel.org Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:56 UTC
76ff371 KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for non-SEV guests, and for SEV guests the C-bit is dropped before the GPA hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM to mishandle #NPFs with that collide with the host's C-bit. Although the APM doesn't explicitly state that the C-bit is not reserved for non-SEV, Tom Lendacky confirmed that the following snippet about the effective reduction due to the C-bit does indeed apply only to SEV guests. Note that because guest physical addresses are always translated through the nested page tables, the size of the guest physical address space is not impacted by any physical address space reduction indicated in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however, the guest physical address space is effectively reduced by 1 bit. And for SEV guests, the APM clearly states that the bit is dropped before walking the nested page tables. If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Consequently, the hypervisor does not need to be aware of which pages the guest has chosen to mark private. Note, the bogus C-bit clearing was removed from legacy #PF handler in commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception"). Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address") Cc: Peter Gonda <pgonda@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210625020354.431829-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:56 UTC
fc9bf2e KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs Ignore "dynamic" host adjustments to the physical address mask when generating the masks for guest PTEs, i.e. the guest PA masks. The host physical address space and guest physical address space are two different beasts, e.g. even though SEV's C-bit is the same bit location for both host and guest, disabling SME in the host (which clears shadow_me_mask) does not affect the guest PTE->GPA "translation". For non-SEV guests, not dropping bits is the correct behavior. Assuming KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits that are lost as collateral damage from memory encryption are treated as reserved bits, i.e. KVM will never get to the point where it attempts to generate a gfn using the affected bits. And if userspace wants to create a bogus vCPU, then userspace gets to deal with the fallout of hardware doing odd things with bad GPAs. For SEV guests, not dropping the C-bit is technically wrong, but it's a moot point because KVM can't read SEV guest's page tables in any case since they're always encrypted. Not to mention that the current KVM code is also broken since sme_me_mask does not have to be non-zero for SEV to be supported by KVM. The proper fix would be to teach all of KVM to correctly handle guest private memory, but that's a task for the future. Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-5-seanjc@google.com> [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:56 UTC
e39f00f KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest version is only relevant to NPT/TDP). When using shadow paging, any reductions to the host's MAXPHYADDR apply to KVM and its guests as well, i.e. using the raw CPUID info will cause KVM to misreport the number of PA bits available to the guest. Unconditionally zero out the "Physical Address bit reduction" entry. For !TDP, the adjustment is already done, and for TDP enumerating the host's reduction is wrong as the reduction does not apply to GPAs. Fixes: 9af9b94068fb ("x86/cpu/AMD: Handle SME reduction in physical address size") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:55 UTC
4bf48e3 KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled Ignore the guest MAXPHYADDR reported by CPUID.0x8000_0008 if TDP, i.e. NPT, is disabled, and instead use the host's MAXPHYADDR. Per AMD'S APM: Maximum guest physical address size in bits. This number applies only to guests using nested paging. When this field is zero, refer to the PhysAddrSize field for the maximum guest physical address size. Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:55 UTC
f0414b0 Revert "KVM: x86: WARN and reject loading KVM if NX is supported but not enabled" Let KVM load if EFER.NX=0 even if NX is supported, the analysis and testing (or lack thereof) for the non-PAE host case was garbage. If the kernel won't be using PAE paging, .Ldefault_entry in head_32.S skips over the entire EFER sequence. Hopefully that can be changed in the future to allow KVM to require EFER.NX, but the motivation behind KVM's requirement isn't yet merged. Reverting and revisiting the mess at a later date is by far the safest approach. This reverts commit 8bbed95d2cb6e5de8a342d761a89b0a04faed7be. Fixes: 8bbed95d2cb6 ("KVM: x86: WARN and reject loading KVM if NX is supported but not enabled") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210625001853.318148-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:17:55 UTC
f8f0eda KVM: selftests: x86: Address missing vm_install_exception_handler conversions Commit b78f4a59669 ("KVM: selftests: Rename vm_handle_exception") raced with a couple of new x86 tests, missing two vm_handle_exception to vm_install_exception_handler conversions. Help the two broken tests to catch up with the new world. Cc: Andrew Jones <drjones@redhat.com> CC: Ricardo Koller <ricarkol@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210701071928.2971053-1-maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 July 2021, 16:15:05 UTC
f3cf800 Merge tag 'kvm-s390-master-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: selftests: Fixes - provide memory model for IBM z196 and zEC12 - do not require 64GB of memory 14 July 2021, 16:14:27 UTC
b7eb335 Makefile: Enable -Wimplicit-fallthrough for Clang With the recent fixes for fallthrough warnings, it is now possible to enable -Wimplicit-fallthrough for Clang. It's important to mention that since we have adopted the use of the pseudo-keyword macro fallthrough; we also want to avoid having more /* fall through */ comments being introduced. Notice that contrary to GCC, Clang doesn't recognize any comments as implicit fall-through markings when the -Wimplicit-fallthrough option is enabled. So, in order to avoid having more comments being introduced, we have to use the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will cause a warning in case a code comment is intended to be used as a fall-through marking. Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 14 July 2021, 16:12:21 UTC
104aba8 powerpc/smp: Fix fall-through warning for Clang Fix the following fallthrough warning: arch/powerpc/platforms/powermac/smp.c:149:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60ef0750.I8J+C6KAtb0xVOAa%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 14 July 2021, 16:10:40 UTC
d08c84e perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) In fedora rawhide the PTHREAD_STACK_MIN define may end up expanded to a sysconf() call, and that will return 'long int', breaking the build: 45 fedora:rawhide : FAIL gcc version 11.1.1 20210623 (Red Hat 11.1.1-6) (GCC) builtin-sched.c: In function 'create_tasks': /git/perf-5.14.0-rc1/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror] 43 | (void) (&_max1 == &_max2); \ | ^~ builtin-sched.c:673:34: note: in expansion of macro 'max' 673 | (size_t) max(16 * 1024, PTHREAD_STACK_MIN)); | ^~~ cc1: all warnings being treated as errors $ grep __sysconf /usr/include/*/*.h /usr/include/bits/pthread_stack_min-dynamic.h:extern long int __sysconf (int __name) __THROW; /usr/include/bits/pthread_stack_min-dynamic.h:# define PTHREAD_STACK_MIN __sysconf (__SC_THREAD_STACK_MIN_VALUE) /usr/include/bits/time.h:extern long int __sysconf (int); /usr/include/bits/time.h:# define CLK_TCK ((__clock_t) __sysconf (2)) /* 2 is _SC_CLK_TCK */ $ So cast it to int to cope with that. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 16:06:38 UTC
afbd0d2 dmaengine: mpc512x: Fix fall-through warning for Clang Fix the following fallthrough warning (powerpc-randconfig): drivers/dma/mpc512x_dma.c:816:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60ef0750.I8J+C6KAtb0xVOAa%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 14 July 2021, 16:05:55 UTC
14158aa usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang Fix the following fallthrough warning (powerpc-randconfig): drivers/usb/gadget/udc/fsl_qe_udc.c:589:4: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60ef0750.I8J+C6KAtb0xVOAa%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 14 July 2021, 16:02:37 UTC
9e5c772 drm/ttm: add a check against null pointer dereference When calling ttm_range_man_fini(), 'man' may be uninitialized, which may cause a null pointer dereference bug. Fix this by checking if it is a null pointer. This log reveals it: [ 7.902580 ] BUG: kernel NULL pointer dereference, address: 0000000000000058 [ 7.905721 ] RIP: 0010:ttm_range_man_fini+0x40/0x160 [ 7.911826 ] Call Trace: [ 7.911826 ] radeon_ttm_fini+0x167/0x210 [ 7.911826 ] radeon_bo_fini+0x15/0x40 [ 7.913767 ] rs400_fini+0x55/0x80 [ 7.914358 ] radeon_device_fini+0x3c/0x140 [ 7.914358 ] radeon_driver_unload_kms+0x5c/0xe0 [ 7.914358 ] radeon_driver_load_kms+0x13a/0x200 [ 7.914358 ] ? radeon_driver_unload_kms+0xe0/0xe0 [ 7.914358 ] drm_dev_register+0x1db/0x290 [ 7.914358 ] radeon_pci_probe+0x16a/0x230 [ 7.914358 ] local_pci_probe+0x4a/0xb0 Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1626274459-8148-1-git-send-email-zheyuma97@gmail.com Signed-off-by: Christian König <christian.koenig@amd.com> 14 July 2021, 15:16:16 UTC
c9c9c68 cifs: fix the out of range assignment to bit fields in parse_server_interfaces Because the out of range assignment to bit fields are compiler-dependant, the fields could have wrong value. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> 14 July 2021, 15:06:33 UTC
50630b3 cifs: Do not use the original cruid when following DFS links for multiuser mounts Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213565 cruid should only be used for the initial mount and after this we should use the current users credentials. Ignore the original cruid mount argument when creating a new context for a multiuser mount following a DFS link. Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Cc: stable@vger.kernel.org # 5.11+ Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> 14 July 2021, 15:06:33 UTC
506c1da cifs: use the expiry output of dns_query to schedule next resolution We recently fixed DNS resolution of the server hostname during reconnect. However, server IP address may change, even when the old one continues to server (although sub-optimally). We should schedule the next DNS resolution based on the TTL of the DNS record used for the last resolution. This way, we resolve the server hostname again when a DNS record expires. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: <stable@vger.kernel.org> # v5.11+ Signed-off-by: Steve French <stfrench@microsoft.com> 14 July 2021, 15:06:03 UTC
50e9892 libperf: Fix build error with LIBPFM4=1 Fix build error with LIBPFM4=1: CC util/pfm.o util/pfm.c: In function ‘parse_libpfm_events_option’: util/pfm.c:102:30: error: ‘struct evsel’ has no member named ‘leader’ 102 | evsel->leader = grp_leader; | ^~ Committer notes: There is this entry in 'make -C tools/perf build-test' to test the build with libpfm: $ grep libpfm tools/perf/tests/make make_with_libpfm4 := LIBPFM4=1 run += make_with_libpfm4 $ But the test machine lacked libpfm-devel, now its installed and further cases like this shouldn't happen. Committer testing: Before this patch this fails, after applying it: $ make -C tools/perf build-test make: Entering directory '/var/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_static: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 -j24 DESTDIR=/tmp/tmp.KzFSfvGRQa <SNIP> make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_with_libpfm4_O: make LIBPFM4=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_auxtrace_O: make NO_AUXTRACE=1 <SNIP> $ rpm -q libpfm-devel libpfm-devel-4.11.0-4.fc34.x86_64 $ FIXME: This shows a need for 'build-test' to bail out when a build option is specified that has no required library devel files installed. Fixes: fba7c86601e2e42d ("libperf: Move 'leader' from tools/perf to perf_evsel::leader") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210713091907.1555560-1-hca@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
376a947 tools headers UAPI: Sync files changed by the memfd_secret new syscall To pick the changes in this cset: 7bb7f2ac24a028b2 ("arch, mm: wire up memfd_secret system call where relevant") That silences these perf build warnings and add support for those new syscalls in tools such as 'perf trace'. For instance, this is now possible: # perf trace -v -e memfd_secret event qualifier tracepoint filter: (common_pid != 13375 && common_pid != 3713) && (id == 447) ^C# That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ grep memfd_secret tools/perf/arch/x86/entry/syscalls/syscall_64.tbl 447 common memfd_secret sys_memfd_secret $ This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/unistd.h' differs from latest version at 'arch/arm64/include/uapi/asm/unistd.h' diff -u tools/arch/arm64/include/uapi/asm/unistd.h arch/arm64/include/uapi/asm/unistd.h Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
e0a7ef2 perf stat: Merge uncore events by default for hybrid platform On a hybrid platform, by default 'perf stat' aggregates and reports the event counts per PMU. For example, # perf stat -e cycles -a true Performance counter stats for 'system wide': 1,400,445 cpu_core/cycles/ 680,881 cpu_atom/cycles/ 0.001770773 seconds time elapsed But for uncore events that's not a suitable method. Uncore has nothing to do with hybrid. So for uncore events, we aggregate event counts from all PMUs and report the counts without PMUs. Before: # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true Performance counter stats for 'system wide': 2,058 uncore_arb_0/event=0x81,umask=0x1/ 2,028 uncore_arb_1/event=0x81,umask=0x1/ 0 uncore_arb_0/event=0x84,umask=0x1/ 0 uncore_arb_1/event=0x84,umask=0x1/ 0.000614498 seconds time elapsed After: # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true Performance counter stats for 'system wide': 3,996 arb/event=0x81,umask=0x1/ 0 arb/event=0x84,umask=0x1/ 0.000630046 seconds time elapsed Of course, we also keep the '--no-merge' working for uncore events. # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true Performance counter stats for 'system wide': 1,952 uncore_arb_0/event=0x81,umask=0x1/ 1,921 uncore_arb_1/event=0x81,umask=0x1/ 0 uncore_arb_0/event=0x84,umask=0x1/ 0 uncore_arb_1/event=0x84,umask=0x1/ 0.000575536 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
de3d5fd perf tests: Fix 'Convert perf time to TSC' on core-only system If the atom CPUs are offlined, the 'cpu_atom' is not valid. We don't need the test case for 'cpu_atom'. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210708013701.20347-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
212f3d9 perf tests: Fix 'Roundtrip evsel->name' on core-only system If the atom CPUs are offlined, the 'cpu_atom' is not valid. Perf will not create two events for one hw event, so the evsel->idx doesn't need to be divided by 2 before comparing. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210708013701.20347-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
490e9a8 perf tests: Fix 'Parse event definition strings' on core-only system If the atom CPUs are offlined, the 'cpu_atom' is not valid. We don't need the test case for 'cpu_atom'. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210708013701.20347-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
49afa7f perf pmu: Skip invalid hybrid pmu On hybrid platform, such as Alderlake, if atom CPUs are offlined, the kernel still exports the sysfs path '/sys/devices/cpu_atom/' for 'cpu_atom' pmu but the file '/sys/devices/cpu_atom/cpus' is empty, which indicates this is an invalid pmu. Need to check and skip the invalid hybrid pmu. Before: # perf list ... branch-instructions OR cpu_atom/branch-instructions/ [Kernel PMU event] branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event] branch-misses OR cpu_atom/branch-misses/ [Kernel PMU event] branch-misses OR cpu_core/branch-misses/ [Kernel PMU event] bus-cycles OR cpu_atom/bus-cycles/ [Kernel PMU event] bus-cycles OR cpu_core/bus-cycles/ [Kernel PMU event] ... The cpu_atom events are still displayed even if atom CPUs are offlined. After: # perf list ... branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event] branch-misses OR cpu_core/branch-misses/ [Kernel PMU event] bus-cycles OR cpu_core/bus-cycles/ [Kernel PMU event] ... Now only cpu_core events are displayed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210708013701.20347-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 14 July 2021, 13:05:35 UTC
0abb33b drm/i915/gtt: drop the page table optimisation We skip filling out the pt with scratch entries if the va range covers the entire pt, since we later have to fill it with the PTEs for the object pages anyway. However this might leave open a small window where the PTEs don't point to anything valid for the HW to consume. When for example using 2M GTT pages this fill_px() showed up as being quite significant in perf measurements, and ends up being completely wasted since we ignore the pt and just use the pde directly. Anyway, currently we have our PTE construction split between alloc and insert, which is probably slightly iffy nowadays, since the alloc doesn't actually allocate anything anymore, instead it just sets up the page directories and points the PTEs at the scratch page. Later when we do the insert step we re-program the PTEs again. Better might be to squash the alloc and insert into a single step, then bringing back this optimisation(along with some others) should be possible. Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: <stable@vger.kernel.org> # v4.15+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matthew.auld@intel.com (cherry picked from commit 8f88ca76b3942d82e2c1cea8735ec368d89ecc15) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> 14 July 2021, 12:46:18 UTC
c987b65 iommu/rockchip: Fix physical address decoding Restore bits 39 to 32 at correct position. It reverses the operation done in rk_dma_addr_dte_v2(). Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20210712101232.318589-1-benjamin.gaignard@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de> 14 July 2021, 10:58:07 UTC
474dd1c iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries The commit 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") fixes an issue of "sub-device is removed where the context entry is cleared for all aliases". But this commit didn't consider the PASID entry and PASID table in VT-d scalable mode. This fix increases the coverage of scalable mode. Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Fixes: 8038bdb855331 ("iommu/vt-d: Only clear real DMA device's context entries") Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Cc: stable@vger.kernel.org # v5.6+ Cc: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> 14 July 2021, 10:58:07 UTC
37764b9 iommu/vt-d: Global devTLB flush when present context entry changed This fixes a bug in context cache clear operation. The code was not following the correct invalidation flow. A global device TLB invalidation should be added after the IOTLB invalidation. At the same time, it uses the domain ID from the context entry. But in scalable mode, the domain ID is in PASID table entry, not context entry. Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> 14 July 2021, 10:57:39 UTC
ce36c94 iommu/qcom: Revert "iommu/arm: Cleanup resources in case of probe error path" QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller, what fails for the second and latter IOMMU devices. This is intended and must be not fatal to the driver registration process. Also the cleanup path should take care of the runtime PM state, what is missing in the current patch. Revert relevant changes to the QCOM IOMMU driver until a proper fix is prepared. This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155. Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210705065657.30356-1-m.szyprowski@samsung.com Signed-off-by: Joerg Roedel <jroedel@suse.de> 14 July 2021, 10:56:24 UTC
479857a powerpc/powernv: Fix fall-through warning for Clang Fix the following fallthrough warnings (powernv_defconfig and powerpc64): drivers/char/powernv-op-panel.c:78:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 14 July 2021, 00:21:41 UTC
bcb9928 net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() This was not caught because there is no switch driver which implements the .port_bridge_join but not .port_bridge_leave method, but it should nonetheless be fixed, as in certain conditions (driver development) it might lead to NULL pointer dereference. Fixes: f66a6a69f97a ("net: dsa: permit cross-chip bridging between all trees in the system") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 21:47:10 UTC
cf6678a MIPS: Fix unreachable code issue Fix the following warning (mips-randconfig): arch/mips/include/asm/fpu.h:79:3: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] Originally, the /* fallthrough */ comment was introduced by commit: 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries") and it was wrongly replaced with fallthrough; by commit: c9b029903466 ("MIPS: Use fallthrough for arch/mips") As the original comment is actually useful, fix this issue by removing unreachable fallthrough; statement and place the original /* fallthrough */ comment back. Fixes: c9b029903466 ("MIPS: Use fallthrough for arch/mips") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 20:08:15 UTC
bc431d2 MIPS: Fix fall-through warnings for Clang Fix the following fallthrough warnings: arch/mips/mm/tlbex.c:1386:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] arch/mips/mm/tlbex.c:2173:3: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 20:00:23 UTC
4796372 ASoC: Mediatek: MT8183: Fix fall-through warning for Clang Fix the following fallthrough warning: sound/soc/mediatek/mt8183/mt8183-dai-adda.c:342:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 19:58:18 UTC
2feeb52 drm/i915/gt: Fix -EDEADLK handling regression The conversion to ww mutexes failed to address the fence code which already returns -EDEADLK when we run out of fences. Ww mutexes on the other hand treat -EDEADLK as an internal errno value indicating a need to restart the operation due to a deadlock. So now when the fence code returns -EDEADLK the higher level code erroneously restarts everything instead of returning the error to userspace as is expected. To remedy this let's switch the fence code to use a different errno value for this. -ENOBUFS seems like a semi-reasonable unique choice. Apart from igt the only user of this I could find is sna, and even there all we do is dump the current fence registers from debugfs into the X server log. So no user visible functionality is affected. If we really cared about preserving this we could of course convert back to -EDEADLK higher up, but doesn't seem like that's worth the hassle here. Not quite sure which commit specifically broke this, but I'll just attribute it to the general gem ww mutex work. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> Testcase: igt/gem_pread/exhaustion Testcase: igt/gem_pwrite/basic-exhaustion Testcase: igt/gem_fenced_exec_thrash/too-many-fences Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210630164413.25481-1-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit 78d2ad7eb4e1f0e9cd5d79788446b6092c21d3e0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> 13 July 2021, 19:56:21 UTC
b51883d power: supply: Fix fall-through warnings for Clang Fix the following fallthrough warnings: drivers/power/supply/ab8500_fg.c:1730:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] drivers/power/supply/abx500_chargalg.c:1155:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 19:50:47 UTC
d6a48a4 dmaengine: ti: k3-udma: Fix fall-through warning for Clang Fix the following fallthrough warning: drivers/dma/ti/k3-udma.c:4951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 19:48:28 UTC
d4e8134 s390: Fix fall-through warnings for Clang Fix the following fallthrough warnings: drivers/s390/net/ctcm_fsms.c:1457:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] drivers/s390/net/qeth_l3_main.c:437:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] drivers/s390/char/tape_char.c:374:4: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] arch/s390/kernel/uprobes.c:129:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 19:43:09 UTC
4161401 dmaengine: ipu: Fix fall-through warning for Clang Fix the following fallthrough warnings (arm64-randconfig): drivers/dma/ipu/ipu_idmac.c:621:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] rivers/dma/ipu/ipu_idmac.c:981:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 19:38:47 UTC
40226a3 Merge tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux Pull vboxsf fixes from Hans de Goede: "This adds support for the atomic_open directory-inode op to vboxsf. Note this is not just an enhancement this also fixes an actual issue which users are hitting, see the commit message of the "boxsf: Add support for the atomic_open directory-inode" patch" * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: Add support for the atomic_open directory-inode op vboxsf: Add vboxsf_[create|release]_sf_handle() helpers vboxsf: Make vboxsf_dir_create() return the handle for the created file vboxsf: Honor excl flag to the dir-inode create op 13 July 2021, 19:07:18 UTC
f02bf85 Merge tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs zoned mode fixes from David Sterba: - fix deadlock when allocating system chunk - fix wrong mutex unlock on an error path - fix extent map splitting for append operation - update and fix message reporting unusable chunk space - don't block when background zone reclaim runs with balance in parallel * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree btrfs: don't block if we can't acquire the reclaim lock btrfs: properly split extent_map for REQ_OP_ZONE_APPEND btrfs: rework chunk allocation to avoid exhaustion of the system chunk array btrfs: fix deadlock with concurrent chunk allocations involving system chunks btrfs: zoned: print unusable percentage when reclaiming block groups btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work 13 July 2021, 19:02:07 UTC
5a1ab5c iommu/arm-smmu-v3: Fix fall-through warning for Clang Fix the following fallthrough warning (arm64-randconfig with Clang): drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:382:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:59:31 UTC
f95deae mmc: jz4740: Fix fall-through warning for Clang Fix the following fallthrough warning (mips-randconfig with Clang): drivers/mmc/host/jz4740_mmc.c:792:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:59:22 UTC
54325d0 PCI: Fix fall-through warning for Clang Fix the following fallthrough warning (arm64-randconfig with Clang): drivers/pci/proc.c:234:3: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:59:12 UTC
223fa87 scsi: libsas: Fix fall-through warning for Clang Fix the following fallthrough warning (arm64-randconfig with Clang): drivers/scsi/libsas/sas_discover.c:467:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:58:56 UTC
c869834 video: fbdev: Fix fall-through warning for Clang Fix the following fallthrough warning (arm64-randconfig with Clang): drivers/video/fbdev/xilinxfb.c:244:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:58:29 UTC
f336a00 math-emu: Fix fall-through warning Fix the following fallthrough warning (nds32-randconfig with GCC): include/math-emu/op-common.h:332:8: warning: this statement may fall through [-Wimplicit-fallthrough=] Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 18:57:44 UTC
420405e configfs: fix the read and write iterators Commit 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods") changed the simple_read_from_buffer() calls into copy_to_iter() calls and the simple_write_to_buffer() calls into copy_from_iter() calls. The simple*buffer() methods update the file offset (*ppos) but the read and write iterators not yet. Make the read and write iterators update the file offset (iocb->ki_pos). This patch has been tested as follows: # modprobe target_core_user # dd if=/sys/kernel/config/target/dbroot bs=1 /var/target 12+0 records in 12+0 records out 12 bytes copied, 9.5539e-05 s, 126 kB/s # cd /sys/kernel/config/acpi/table # mkdir test # cd test # dmesg -c >/dev/null; printf 'SSDT\x8\0\0\0abcdefghijklmnopqrstuvwxyz' | dd of=aml bs=1; dmesg -c 34+0 records in 34+0 records out 34 bytes copied, 0.010627 s, 3.2 kB/s [ 261.056551] ACPI configfs: invalid table length Reported-by: Yanko Kaneti <yaneti@declera.com> Cc: Yanko Kaneti <yaneti@declera.com> Fixes: 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> 13 July 2021, 18:56:24 UTC
28efd20 Merge branch 'sfc-tx-queues' Íñigo Huguet says: ==================== sfc: Fix lack of XDP TX queues A change introduced in commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT because it unintentionally reduced the number of XDP TX queues, letting not enough queues to have one per CPU, which leaded to errors if XDP TX/REDIRECT was done from a high numbered CPU. This patchs make the following changes: - Fix the bug mentioned above - Revert commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") which intended to fix a related problem, created by mentioned bug, but it's no longer necessary - Add a new error log message if there are not enough resources to make XDP_TX/REDIRECT work V1 -> V2: keep the calculation of how many tx queues can handle a single event queue, but apply the "max. tx queues per channel" upper limit. V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the expected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 17:02:41 UTC
d2a16bd sfc: add logs explaining XDP_TX/REDIRECT is not available If it's not possible to allocate enough channels for XDP, XDP_TX and XDP_REDIRECT don't work. However, only a message saying that not enough channels were available was shown, but not saying what are the consequences in that case. The user didn't know if he/she can use XDP or not, if the performance is reduced, or what. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 17:02:41 UTC
788bc00 sfc: ensure correct number of XDP queues Commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same. Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist. Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future. Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 17:02:41 UTC
f28100c sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Fixes: e26ca4b53582 sfc: reduce the number of requested xdp ev queues The buggy commit intended to allocate less channels for XDP in order to be more unlikely to reach the limit of 32 channels of the driver. The idea was to use each IRQ/eventqeue for more XDP TX queues than before, calculating which is the maximum number of TX queues that one event queue can handle. For example, in EF10 each event queue could handle up to 8 queues, better than the 4 they were handling before the change. This way, it would have to allocate half of channels than before for XDP TX. The problem is that the TX queues are also contained inside the channel structs, and there are only 4 queues per channel. Reducing the number of channels means also reducing the number of queues, resulting in not having the desired number of 1 queue per CPU. This leads to getting errors on XDP_TX and XDP_REDIRECT if they're executed from a high numbered CPU, because there only exist queues for the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low numbered CPU, the error doesn't happen. This is the error in the logs (repeated many times, even rate limited): sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) This errors happens in function efx_xdp_tx_buffers, where it expects to have a dedicated XDP TX queue per CPU. Reverting the change makes again more likely to reach the limit of 32 channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT will be possible at all, and we will have this log error messages: At interface probe: sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32) At every subsequent XDP_TX/REDIRECT failure, rate limited: sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) However, without reverting the change, it makes the user to think that everything is OK at probe time, but later it fails in an unpredictable way, depending on the CPU that handles the packet. It is better to restore the predictable behaviour. If the user sees the error message at probe time, he/she can try to configure the best way it fits his/her needs. At least, he/she will have 2 options: - Accept that XDP_TX/REDIRECT is not available (he/she may not need it) - Load sfc module with modparam 'rss_cpus' with a lower number, thus creating less normal RX queues/channels, letting more free resources for XDP, with some performance penalty. Anyway, let the calculation of maximum TX queues that can be handled by a single event queue, and use it only if it's less than the number of TX queues per channel. This doesn't happen in practice, but could happen if some constant values are tweaked in the future, such us EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE. Related mailing list thread: https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/ Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 17:02:41 UTC
2e7ea96 cpufreq: Fix fall-through warning for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a fallthrough warning by simply dropping the empty default case at the bottom. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> 13 July 2021, 16:53:07 UTC
deb7178 net: fddi: fix UAF in fza_probe fp is netdev private data and it cannot be used after free_netdev() call. Using fp after free_netdev() can cause UAF bug. Fix it by moving free_netdev() after error message. Fixes: 61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 16:43:50 UTC
b0b33b0 net: dsa: sja1105: fix address learning getting disabled on the CPU port In May 2019 when commit 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") was introduced, the comment that "STP does not get called for the CPU port" was true. This changed after commit 0394a63acfe2 ("net: dsa: enable and disable all ports") in August 2019 and went largely unnoticed, because the sja1105_bridge_stp_state_set() method did nothing different compared to the static setup done by sja1105_init_mac_settings(). With the ability to turn address learning off introduced by the blamed commit, there is a new priv->learn_ena port mask in the driver. When sja1105_bridge_stp_state_set() gets called and we are in BR_STATE_LEARNING or later, address learning is enabled or not depending on priv->learn_ena & BIT(port). So what happens is that priv->learn_ena is not being set from anywhere for the CPU port, and the static configuration done by sja1105_init_mac_settings() is being overwritten. To solve this, acknowledge that the static configuration of STP state is no longer necessary because the STP state is being set by the DSA core now, but what is necessary is to set priv->learn_ena for the CPU port. Fixes: 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 July 2021, 16:32:41 UTC
back to top